Training: 2022-04-10 23:13:50,324-rank_id: 0
Training: 2022-04-10 23:14:18,195-: margin_list              [1.0, 0.0, 0.4]
Training: 2022-04-10 23:14:18,196-: network                  r100
Training: 2022-04-10 23:14:18,196-: resume                   False
Training: 2022-04-10 23:14:18,196-: output                   work_dirs/glint360k_r100
Training: 2022-04-10 23:14:18,196-: embedding_size           512
Training: 2022-04-10 23:14:18,196-: sample_rate              1.0
Training: 2022-04-10 23:14:18,196-: interclass_filtering_threshold 0
Training: 2022-04-10 23:14:18,196-: fp16                     True
Training: 2022-04-10 23:14:18,196-: batch_size               128
Training: 2022-04-10 23:14:18,196-: optimizer                sgd
Training: 2022-04-10 23:14:18,196-: lr                       0.1
Training: 2022-04-10 23:14:18,196-: momentum                 0.9
Training: 2022-04-10 23:14:18,196-: weight_decay             0.0001
Training: 2022-04-10 23:14:18,196-: verbose                  2000
Training: 2022-04-10 23:14:18,196-: frequent                 10
Training: 2022-04-10 23:14:18,196-: dali                     False
Training: 2022-04-10 23:14:18,196-: rec                      /train_tmp/glint360k
Training: 2022-04-10 23:14:18,196-: num_classes              360232
Training: 2022-04-10 23:14:18,196-: num_image                17091657
Training: 2022-04-10 23:14:18,196-: num_epoch                20
Training: 2022-04-10 23:14:18,197-: warmup_epoch             0
Training: 2022-04-10 23:14:18,197-: val_targets              ['lfw', 'cfp_fp', 'agedb_30']
Training: 2022-04-10 23:14:18,197-: total_batch_size         1024
Training: 2022-04-10 23:14:18,197-: warmup_step              0
Training: 2022-04-10 23:14:18,197-: total_step               333820
Training: 2022-04-10 23:15:42,576-Reducer buckets have been rebuilt in this iteration.
Training: 2022-04-10 23:15:48,390-Speed 3190.01 samples/sec   Loss 42.7738   LearningRate 0.1000   Epoch: 0   Global Step: 20   Fp16 Grad Scale: 8192   Required: 142 hours
Training: 2022-04-10 23:15:51,562-Speed 3229.39 samples/sec   Loss 43.6656   LearningRate 0.1000   Epoch: 0   Global Step: 30   Fp16 Grad Scale: 8192   Required: 106 hours
Training: 2022-04-10 23:15:54,678-Speed 3287.58 samples/sec   Loss 43.7943   LearningRate 0.1000   Epoch: 0   Global Step: 40   Fp16 Grad Scale: 8192   Required: 87 hours
Training: 2022-04-10 23:15:57,800-Speed 3280.99 samples/sec   Loss 43.4347   LearningRate 0.1000   Epoch: 0   Global Step: 50   Fp16 Grad Scale: 8192   Required: 76 hours
Training: 2022-04-10 23:16:00,889-Speed 3315.61 samples/sec   Loss 43.7649   LearningRate 0.1000   Epoch: 0   Global Step: 60   Fp16 Grad Scale: 8192   Required: 68 hours
Training: 2022-04-10 23:16:03,978-Speed 3316.22 samples/sec   Loss 43.6740   LearningRate 0.1000   Epoch: 0   Global Step: 70   Fp16 Grad Scale: 8192   Required: 62 hours
Training: 2022-04-10 23:16:07,052-Speed 3331.59 samples/sec   Loss 43.6139   LearningRate 0.1000   Epoch: 0   Global Step: 80   Fp16 Grad Scale: 8192   Required: 58 hours
Training: 2022-04-10 23:16:10,140-Speed 3317.86 samples/sec   Loss 43.6880   LearningRate 0.0999   Epoch: 0   Global Step: 90   Fp16 Grad Scale: 8192   Required: 55 hours
Training: 2022-04-10 23:16:13,224-Speed 3320.81 samples/sec   Loss 43.3655   LearningRate 0.0999   Epoch: 0   Global Step: 100   Fp16 Grad Scale: 8192   Required: 52 hours
Training: 2022-04-10 23:16:16,547-Speed 3082.00 samples/sec   Loss 43.0799   LearningRate 0.0999   Epoch: 0   Global Step: 110   Fp16 Grad Scale: 16384   Required: 50 hours
Training: 2022-04-10 23:16:19,832-Speed 3118.57 samples/sec   Loss 42.9779   LearningRate 0.0999   Epoch: 0   Global Step: 120   Fp16 Grad Scale: 16384   Required: 49 hours
Training: 2022-04-10 23:16:22,907-Speed 3330.81 samples/sec   Loss 42.9618   LearningRate 0.0999   Epoch: 0   Global Step: 130   Fp16 Grad Scale: 16384   Required: 47 hours
Training: 2022-04-10 23:16:26,013-Speed 3298.26 samples/sec   Loss 42.7053   LearningRate 0.0999   Epoch: 0   Global Step: 140   Fp16 Grad Scale: 16384   Required: 46 hours
Training: 2022-04-10 23:16:29,099-Speed 3318.92 samples/sec   Loss 42.5933   LearningRate 0.0999   Epoch: 0   Global Step: 150   Fp16 Grad Scale: 16384   Required: 45 hours
Training: 2022-04-10 23:16:32,215-Speed 3286.40 samples/sec   Loss 42.5476   LearningRate 0.0999   Epoch: 0   Global Step: 160   Fp16 Grad Scale: 16384   Required: 44 hours
Training: 2022-04-10 23:16:35,374-Speed 3243.06 samples/sec   Loss 42.4165   LearningRate 0.0999   Epoch: 0   Global Step: 170   Fp16 Grad Scale: 16384   Required: 43 hours
Training: 2022-04-10 23:16:38,428-Speed 3353.26 samples/sec   Loss 42.3072   LearningRate 0.0999   Epoch: 0   Global Step: 180   Fp16 Grad Scale: 16384   Required: 42 hours
Training: 2022-04-10 23:16:41,489-Speed 3346.27 samples/sec   Loss 42.3054   LearningRate 0.0999   Epoch: 0   Global Step: 190   Fp16 Grad Scale: 16384   Required: 41 hours
Training: 2022-04-10 23:16:44,562-Speed 3332.83 samples/sec   Loss 42.0460   LearningRate 0.0999   Epoch: 0   Global Step: 200   Fp16 Grad Scale: 16384   Required: 41 hours
Training: 2022-04-10 23:16:47,699-Speed 3264.99 samples/sec   Loss 42.0371   LearningRate 0.0999   Epoch: 0   Global Step: 210   Fp16 Grad Scale: 32768   Required: 40 hours
Training: 2022-04-10 23:16:50,812-Speed 3290.83 samples/sec   Loss 41.8871   LearningRate 0.0999   Epoch: 0   Global Step: 220   Fp16 Grad Scale: 32768   Required: 40 hours
Training: 2022-04-10 23:16:53,922-Speed 3293.43 samples/sec   Loss 41.6252   LearningRate 0.0999   Epoch: 0   Global Step: 230   Fp16 Grad Scale: 32768   Required: 39 hours
Training: 2022-04-10 23:16:57,026-Speed 3299.50 samples/sec   Loss 41.5288   LearningRate 0.0999   Epoch: 0   Global Step: 240   Fp16 Grad Scale: 32768   Required: 39 hours
Training: 2022-04-10 23:17:00,146-Speed 3282.81 samples/sec   Loss 41.5211   LearningRate 0.0999   Epoch: 0   Global Step: 250   Fp16 Grad Scale: 32768   Required: 38 hours
Training: 2022-04-10 23:17:03,391-Speed 3156.52 samples/sec   Loss 41.5412   LearningRate 0.0998   Epoch: 0   Global Step: 260   Fp16 Grad Scale: 32768   Required: 38 hours
Training: 2022-04-10 23:17:06,464-Speed 3332.87 samples/sec   Loss 41.3137   LearningRate 0.0998   Epoch: 0   Global Step: 270   Fp16 Grad Scale: 32768   Required: 38 hours
Training: 2022-04-10 23:17:09,525-Speed 3346.32 samples/sec   Loss 41.1793   LearningRate 0.0998   Epoch: 0   Global Step: 280   Fp16 Grad Scale: 32768   Required: 37 hours
Training: 2022-04-10 23:17:12,665-Speed 3262.40 samples/sec   Loss 41.0245   LearningRate 0.0998   Epoch: 0   Global Step: 290   Fp16 Grad Scale: 32768   Required: 37 hours
Training: 2022-04-10 23:17:15,733-Speed 3338.86 samples/sec   Loss 41.0004   LearningRate 0.0998   Epoch: 0   Global Step: 300   Fp16 Grad Scale: 32768   Required: 37 hours
Training: 2022-04-10 23:17:18,795-Speed 3345.38 samples/sec   Loss 40.8876   LearningRate 0.0998   Epoch: 0   Global Step: 310   Fp16 Grad Scale: 65536   Required: 37 hours
Training: 2022-04-10 23:17:21,948-Speed 3247.87 samples/sec   Loss 40.7150   LearningRate 0.0998   Epoch: 0   Global Step: 320   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-10 23:17:25,080-Speed 3270.77 samples/sec   Loss 40.5775   LearningRate 0.0998   Epoch: 0   Global Step: 330   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-10 23:17:28,157-Speed 3328.20 samples/sec   Loss 40.4539   LearningRate 0.0998   Epoch: 0   Global Step: 340   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-10 23:17:31,262-Speed 3298.63 samples/sec   Loss 40.4193   LearningRate 0.0998   Epoch: 0   Global Step: 350   Fp16 Grad Scale: 65536   Required: 36 hours
Training: 2022-04-10 23:17:34,368-Speed 3298.41 samples/sec   Loss 40.2545   LearningRate 0.0998   Epoch: 0   Global Step: 360   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-10 23:17:37,440-Speed 3333.61 samples/sec   Loss 40.1867   LearningRate 0.0998   Epoch: 0   Global Step: 370   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-10 23:17:40,527-Speed 3318.15 samples/sec   Loss 39.9966   LearningRate 0.0998   Epoch: 0   Global Step: 380   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-10 23:17:43,604-Speed 3328.64 samples/sec   Loss 40.0132   LearningRate 0.0998   Epoch: 0   Global Step: 390   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-10 23:17:46,688-Speed 3320.39 samples/sec   Loss 39.8144   LearningRate 0.0998   Epoch: 0   Global Step: 400   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-04-10 23:17:49,792-Speed 3300.92 samples/sec   Loss 39.7034   LearningRate 0.0998   Epoch: 0   Global Step: 410   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-04-10 23:17:52,891-Speed 3304.90 samples/sec   Loss 39.5675   LearningRate 0.0997   Epoch: 0   Global Step: 420   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-10 23:17:55,972-Speed 3324.57 samples/sec   Loss 39.5376   LearningRate 0.0997   Epoch: 0   Global Step: 430   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-10 23:17:59,066-Speed 3309.75 samples/sec   Loss 39.4817   LearningRate 0.0997   Epoch: 0   Global Step: 440   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-10 23:18:02,154-Speed 3316.57 samples/sec   Loss 39.3354   LearningRate 0.0997   Epoch: 0   Global Step: 450   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-10 23:18:05,214-Speed 3347.01 samples/sec   Loss 39.2907   LearningRate 0.0997   Epoch: 0   Global Step: 460   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-10 23:18:08,286-Speed 3334.52 samples/sec   Loss 39.0824   LearningRate 0.0997   Epoch: 0   Global Step: 470   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-10 23:18:11,354-Speed 3339.43 samples/sec   Loss 39.0343   LearningRate 0.0997   Epoch: 0   Global Step: 480   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-10 23:18:14,442-Speed 3317.04 samples/sec   Loss 38.9839   LearningRate 0.0997   Epoch: 0   Global Step: 490   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-10 23:18:17,501-Speed 3348.41 samples/sec   Loss 38.8155   LearningRate 0.0997   Epoch: 0   Global Step: 500   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-10 23:18:20,569-Speed 3338.31 samples/sec   Loss 38.6331   LearningRate 0.0997   Epoch: 0   Global Step: 510   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-04-10 23:18:23,630-Speed 3346.57 samples/sec   Loss 38.6084   LearningRate 0.0997   Epoch: 0   Global Step: 520   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-04-10 23:18:26,693-Speed 3343.35 samples/sec   Loss 38.5173   LearningRate 0.0997   Epoch: 0   Global Step: 530   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-04-10 23:18:29,759-Speed 3340.45 samples/sec   Loss 38.3925   LearningRate 0.0997   Epoch: 0   Global Step: 540   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-04-10 23:18:32,832-Speed 3333.30 samples/sec   Loss 38.3184   LearningRate 0.0997   Epoch: 0   Global Step: 550   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-04-10 23:18:35,910-Speed 3327.43 samples/sec   Loss 38.1573   LearningRate 0.0997   Epoch: 0   Global Step: 560   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-04-10 23:18:39,017-Speed 3297.55 samples/sec   Loss 38.0537   LearningRate 0.0997   Epoch: 0   Global Step: 570   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-04-10 23:18:42,131-Speed 3288.19 samples/sec   Loss 37.9636   LearningRate 0.0997   Epoch: 0   Global Step: 580   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-04-10 23:18:45,209-Speed 3327.58 samples/sec   Loss 37.9146   LearningRate 0.0996   Epoch: 0   Global Step: 590   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-04-10 23:18:48,280-Speed 3335.78 samples/sec   Loss 37.7784   LearningRate 0.0996   Epoch: 0   Global Step: 600   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-04-10 23:18:51,343-Speed 3343.91 samples/sec   Loss 37.6350   LearningRate 0.0996   Epoch: 0   Global Step: 610   Fp16 Grad Scale: 524288   Required: 33 hours
Training: 2022-04-10 23:18:54,413-Speed 3335.47 samples/sec   Loss 37.5418   LearningRate 0.0996   Epoch: 0   Global Step: 620   Fp16 Grad Scale: 524288   Required: 33 hours
Training: 2022-04-10 23:18:57,480-Speed 3339.85 samples/sec   Loss 37.5163   LearningRate 0.0996   Epoch: 0   Global Step: 630   Fp16 Grad Scale: 262144   Required: 32 hours
Training: 2022-04-10 23:19:00,559-Speed 3327.17 samples/sec   Loss 37.3522   LearningRate 0.0996   Epoch: 0   Global Step: 640   Fp16 Grad Scale: 262144   Required: 32 hours
Training: 2022-04-10 23:19:03,629-Speed 3336.61 samples/sec   Loss 37.2742   LearningRate 0.0996   Epoch: 0   Global Step: 650   Fp16 Grad Scale: 262144   Required: 32 hours
Training: 2022-04-10 23:19:06,691-Speed 3344.46 samples/sec   Loss 37.3587   LearningRate 0.0996   Epoch: 0   Global Step: 660   Fp16 Grad Scale: 262144   Required: 32 hours
Training: 2022-04-10 23:19:09,763-Speed 3334.40 samples/sec   Loss 37.0781   LearningRate 0.0996   Epoch: 0   Global Step: 670   Fp16 Grad Scale: 262144   Required: 32 hours
Training: 2022-04-10 23:19:12,856-Speed 3311.60 samples/sec   Loss 36.9784   LearningRate 0.0996   Epoch: 0   Global Step: 680   Fp16 Grad Scale: 262144   Required: 32 hours
Training: 2022-04-10 23:19:15,915-Speed 3348.02 samples/sec   Loss 36.9748   LearningRate 0.0996   Epoch: 0   Global Step: 690   Fp16 Grad Scale: 262144   Required: 32 hours
Training: 2022-04-10 23:19:18,977-Speed 3344.34 samples/sec   Loss 36.7309   LearningRate 0.0996   Epoch: 0   Global Step: 700   Fp16 Grad Scale: 262144   Required: 32 hours
Training: 2022-04-10 23:19:22,083-Speed 3298.46 samples/sec   Loss 36.7128   LearningRate 0.0996   Epoch: 0   Global Step: 710   Fp16 Grad Scale: 262144   Required: 32 hours
Training: 2022-04-10 23:19:25,154-Speed 3335.05 samples/sec   Loss 36.6252   LearningRate 0.0996   Epoch: 0   Global Step: 720   Fp16 Grad Scale: 262144   Required: 32 hours
Training: 2022-04-10 23:19:28,234-Speed 3326.23 samples/sec   Loss 36.4379   LearningRate 0.0996   Epoch: 0   Global Step: 730   Fp16 Grad Scale: 524288   Required: 32 hours
Training: 2022-04-10 23:19:31,295-Speed 3345.80 samples/sec   Loss 36.4156   LearningRate 0.0996   Epoch: 0   Global Step: 740   Fp16 Grad Scale: 262144   Required: 32 hours
Training: 2022-04-10 23:19:34,406-Speed 3291.97 samples/sec   Loss 36.2215   LearningRate 0.0996   Epoch: 0   Global Step: 750   Fp16 Grad Scale: 262144   Required: 32 hours
Training: 2022-04-10 23:19:37,466-Speed 3348.35 samples/sec   Loss 36.1999   LearningRate 0.0995   Epoch: 0   Global Step: 760   Fp16 Grad Scale: 262144   Required: 32 hours
Training: 2022-04-10 23:19:40,579-Speed 3289.69 samples/sec   Loss 36.1623   LearningRate 0.0995   Epoch: 0   Global Step: 770   Fp16 Grad Scale: 262144   Required: 32 hours
Training: 2022-04-10 23:19:43,652-Speed 3332.90 samples/sec   Loss 35.9928   LearningRate 0.0995   Epoch: 0   Global Step: 780   Fp16 Grad Scale: 262144   Required: 32 hours
Training: 2022-04-10 23:19:46,727-Speed 3330.46 samples/sec   Loss 35.8523   LearningRate 0.0995   Epoch: 0   Global Step: 790   Fp16 Grad Scale: 262144   Required: 32 hours
Training: 2022-04-10 23:19:49,873-Speed 3256.12 samples/sec   Loss 35.8527   LearningRate 0.0995   Epoch: 0   Global Step: 800   Fp16 Grad Scale: 262144   Required: 32 hours
Training: 2022-04-10 23:19:52,949-Speed 3330.17 samples/sec   Loss 35.7958   LearningRate 0.0995   Epoch: 0   Global Step: 810   Fp16 Grad Scale: 262144   Required: 32 hours
Training: 2022-04-10 23:19:56,071-Speed 3281.32 samples/sec   Loss 35.7808   LearningRate 0.0995   Epoch: 0   Global Step: 820   Fp16 Grad Scale: 262144   Required: 32 hours
Training: 2022-04-10 23:19:59,148-Speed 3327.83 samples/sec   Loss 35.5061   LearningRate 0.0995   Epoch: 0   Global Step: 830   Fp16 Grad Scale: 262144   Required: 31 hours
Training: 2022-04-10 23:20:02,224-Speed 3330.68 samples/sec   Loss 35.4202   LearningRate 0.0995   Epoch: 0   Global Step: 840   Fp16 Grad Scale: 262144   Required: 31 hours
Training: 2022-04-10 23:20:05,288-Speed 3341.91 samples/sec   Loss 35.4539   LearningRate 0.0995   Epoch: 0   Global Step: 850   Fp16 Grad Scale: 262144   Required: 31 hours
Training: 2022-04-10 23:20:08,353-Speed 3342.20 samples/sec   Loss 35.3571   LearningRate 0.0995   Epoch: 0   Global Step: 860   Fp16 Grad Scale: 262144   Required: 31 hours
Training: 2022-04-10 23:20:11,429-Speed 3329.66 samples/sec   Loss 35.1764   LearningRate 0.0995   Epoch: 0   Global Step: 870   Fp16 Grad Scale: 262144   Required: 31 hours
Training: 2022-04-10 23:20:14,509-Speed 3325.77 samples/sec   Loss 35.0474   LearningRate 0.0995   Epoch: 0   Global Step: 880   Fp16 Grad Scale: 262144   Required: 31 hours
Training: 2022-04-10 23:20:17,633-Speed 3278.69 samples/sec   Loss 35.0270   LearningRate 0.0995   Epoch: 0   Global Step: 890   Fp16 Grad Scale: 262144   Required: 31 hours
Training: 2022-04-10 23:20:20,732-Speed 3305.14 samples/sec   Loss 34.9771   LearningRate 0.0995   Epoch: 0   Global Step: 900   Fp16 Grad Scale: 262144   Required: 31 hours
Training: 2022-04-10 23:20:23,809-Speed 3328.85 samples/sec   Loss 34.7316   LearningRate 0.0995   Epoch: 0   Global Step: 910   Fp16 Grad Scale: 262144   Required: 31 hours
Training: 2022-04-10 23:20:26,876-Speed 3339.20 samples/sec   Loss 34.6524   LearningRate 0.0994   Epoch: 0   Global Step: 920   Fp16 Grad Scale: 262144   Required: 31 hours
Training: 2022-04-10 23:20:29,972-Speed 3308.51 samples/sec   Loss 34.6686   LearningRate 0.0994   Epoch: 0   Global Step: 930   Fp16 Grad Scale: 262144   Required: 31 hours
Training: 2022-04-10 23:20:33,034-Speed 3345.65 samples/sec   Loss 34.5280   LearningRate 0.0994   Epoch: 0   Global Step: 940   Fp16 Grad Scale: 262144   Required: 31 hours
Training: 2022-04-10 23:20:36,128-Speed 3310.41 samples/sec   Loss 34.5473   LearningRate 0.0994   Epoch: 0   Global Step: 950   Fp16 Grad Scale: 262144   Required: 31 hours
Training: 2022-04-10 23:20:39,210-Speed 3323.95 samples/sec   Loss 34.2082   LearningRate 0.0994   Epoch: 0   Global Step: 960   Fp16 Grad Scale: 262144   Required: 31 hours
Training: 2022-04-10 23:20:42,314-Speed 3299.36 samples/sec   Loss 34.2576   LearningRate 0.0994   Epoch: 0   Global Step: 970   Fp16 Grad Scale: 262144   Required: 31 hours
Training: 2022-04-10 23:20:45,386-Speed 3334.57 samples/sec   Loss 34.2252   LearningRate 0.0994   Epoch: 0   Global Step: 980   Fp16 Grad Scale: 262144   Required: 31 hours
Training: 2022-04-10 23:20:48,468-Speed 3323.62 samples/sec   Loss 34.0012   LearningRate 0.0994   Epoch: 0   Global Step: 990   Fp16 Grad Scale: 262144   Required: 31 hours
Training: 2022-04-10 23:20:51,620-Speed 3248.92 samples/sec   Loss 33.9353   LearningRate 0.0994   Epoch: 0   Global Step: 1000   Fp16 Grad Scale: 262144   Required: 31 hours
Training: 2022-04-10 23:20:54,701-Speed 3324.22 samples/sec   Loss 33.9179   LearningRate 0.0994   Epoch: 0   Global Step: 1010   Fp16 Grad Scale: 262144   Required: 31 hours
Training: 2022-04-10 23:20:57,775-Speed 3332.61 samples/sec   Loss 33.7719   LearningRate 0.0994   Epoch: 0   Global Step: 1020   Fp16 Grad Scale: 262144   Required: 31 hours
Training: 2022-04-10 23:21:00,868-Speed 3312.06 samples/sec   Loss 33.8345   LearningRate 0.0994   Epoch: 0   Global Step: 1030   Fp16 Grad Scale: 262144   Required: 31 hours
Training: 2022-04-10 23:21:04,018-Speed 3250.81 samples/sec   Loss 33.5847   LearningRate 0.0994   Epoch: 0   Global Step: 1040   Fp16 Grad Scale: 524288   Required: 31 hours
Training: 2022-04-10 23:21:07,151-Speed 3269.42 samples/sec   Loss 33.6596   LearningRate 0.0994   Epoch: 0   Global Step: 1050   Fp16 Grad Scale: 524288   Required: 31 hours
Training: 2022-04-10 23:21:10,249-Speed 3306.79 samples/sec   Loss 33.4516   LearningRate 0.0994   Epoch: 0   Global Step: 1060   Fp16 Grad Scale: 524288   Required: 31 hours
Training: 2022-04-10 23:21:13,325-Speed 3329.31 samples/sec   Loss 33.3663   LearningRate 0.0994   Epoch: 0   Global Step: 1070   Fp16 Grad Scale: 524288   Required: 31 hours
Training: 2022-04-10 23:21:16,469-Speed 3258.07 samples/sec   Loss 33.1734   LearningRate 0.0994   Epoch: 0   Global Step: 1080   Fp16 Grad Scale: 524288   Required: 31 hours
Training: 2022-04-10 23:21:19,573-Speed 3300.11 samples/sec   Loss 33.2505   LearningRate 0.0993   Epoch: 0   Global Step: 1090   Fp16 Grad Scale: 262144   Required: 31 hours
Training: 2022-04-10 23:21:22,659-Speed 3318.54 samples/sec   Loss 33.0568   LearningRate 0.0993   Epoch: 0   Global Step: 1100   Fp16 Grad Scale: 262144   Required: 31 hours
Training: 2022-04-10 23:21:25,723-Speed 3343.34 samples/sec   Loss 32.9422   LearningRate 0.0993   Epoch: 0   Global Step: 1110   Fp16 Grad Scale: 262144   Required: 31 hours
Training: 2022-04-10 23:21:28,792-Speed 3338.09 samples/sec   Loss 32.7482   LearningRate 0.0993   Epoch: 0   Global Step: 1120   Fp16 Grad Scale: 262144   Required: 31 hours
Training: 2022-04-10 23:21:31,863-Speed 3334.01 samples/sec   Loss 32.8507   LearningRate 0.0993   Epoch: 0   Global Step: 1130   Fp16 Grad Scale: 262144   Required: 31 hours
Training: 2022-04-10 23:21:34,927-Speed 3343.20 samples/sec   Loss 32.7850   LearningRate 0.0993   Epoch: 0   Global Step: 1140   Fp16 Grad Scale: 262144   Required: 31 hours
Training: 2022-04-10 23:21:37,992-Speed 3341.76 samples/sec   Loss 32.6411   LearningRate 0.0993   Epoch: 0   Global Step: 1150   Fp16 Grad Scale: 262144   Required: 31 hours
Training: 2022-04-10 23:21:41,144-Speed 3250.25 samples/sec   Loss 32.5898   LearningRate 0.0993   Epoch: 0   Global Step: 1160   Fp16 Grad Scale: 262144   Required: 31 hours
Training: 2022-04-10 23:21:44,230-Speed 3318.31 samples/sec   Loss 32.3741   LearningRate 0.0993   Epoch: 0   Global Step: 1170   Fp16 Grad Scale: 262144   Required: 31 hours
Training: 2022-04-10 23:21:47,298-Speed 3338.62 samples/sec   Loss 32.4438   LearningRate 0.0993   Epoch: 0   Global Step: 1180   Fp16 Grad Scale: 262144   Required: 31 hours
Training: 2022-04-10 23:21:50,365-Speed 3339.57 samples/sec   Loss 32.3363   LearningRate 0.0993   Epoch: 0   Global Step: 1190   Fp16 Grad Scale: 524288   Required: 31 hours
Training: 2022-04-10 23:21:53,433-Speed 3338.68 samples/sec   Loss 32.1877   LearningRate 0.0993   Epoch: 0   Global Step: 1200   Fp16 Grad Scale: 524288   Required: 31 hours
Training: 2022-04-10 23:21:56,488-Speed 3353.64 samples/sec   Loss 31.9120   LearningRate 0.0993   Epoch: 0   Global Step: 1210   Fp16 Grad Scale: 262144   Required: 31 hours
Training: 2022-04-10 23:21:59,553-Speed 3341.54 samples/sec   Loss 32.0996   LearningRate 0.0993   Epoch: 0   Global Step: 1220   Fp16 Grad Scale: 262144   Required: 31 hours
Training: 2022-04-10 23:22:02,629-Speed 3328.77 samples/sec   Loss 31.8859   LearningRate 0.0993   Epoch: 0   Global Step: 1230   Fp16 Grad Scale: 262144   Required: 31 hours
Training: 2022-04-10 23:22:05,694-Speed 3342.31 samples/sec   Loss 31.9352   LearningRate 0.0993   Epoch: 0   Global Step: 1240   Fp16 Grad Scale: 262144   Required: 30 hours
Training: 2022-04-10 23:22:08,762-Speed 3338.82 samples/sec   Loss 31.7676   LearningRate 0.0993   Epoch: 0   Global Step: 1250   Fp16 Grad Scale: 262144   Required: 30 hours
Training: 2022-04-10 23:22:11,828-Speed 3340.86 samples/sec   Loss 31.6050   LearningRate 0.0992   Epoch: 0   Global Step: 1260   Fp16 Grad Scale: 262144   Required: 30 hours
Training: 2022-04-10 23:22:14,896-Speed 3338.24 samples/sec   Loss 31.6977   LearningRate 0.0992   Epoch: 0   Global Step: 1270   Fp16 Grad Scale: 262144   Required: 30 hours
Training: 2022-04-10 23:22:17,970-Speed 3332.36 samples/sec   Loss 31.5033   LearningRate 0.0992   Epoch: 0   Global Step: 1280   Fp16 Grad Scale: 262144   Required: 30 hours
Training: 2022-04-10 23:22:21,049-Speed 3326.75 samples/sec   Loss 31.4714   LearningRate 0.0992   Epoch: 0   Global Step: 1290   Fp16 Grad Scale: 262144   Required: 30 hours
Training: 2022-04-10 23:22:24,120-Speed 3334.90 samples/sec   Loss 31.3849   LearningRate 0.0992   Epoch: 0   Global Step: 1300   Fp16 Grad Scale: 262144   Required: 30 hours
Training: 2022-04-10 23:22:27,190-Speed 3336.79 samples/sec   Loss 31.2500   LearningRate 0.0992   Epoch: 0   Global Step: 1310   Fp16 Grad Scale: 524288   Required: 30 hours
Training: 2022-04-10 23:22:30,258-Speed 3338.07 samples/sec   Loss 31.1957   LearningRate 0.0992   Epoch: 0   Global Step: 1320   Fp16 Grad Scale: 524288   Required: 30 hours
Training: 2022-04-10 23:22:33,368-Speed 3293.71 samples/sec   Loss 31.1391   LearningRate 0.0992   Epoch: 0   Global Step: 1330   Fp16 Grad Scale: 524288   Required: 30 hours
Training: 2022-04-10 23:22:36,468-Speed 3303.55 samples/sec   Loss 31.1511   LearningRate 0.0992   Epoch: 0   Global Step: 1340   Fp16 Grad Scale: 524288   Required: 30 hours
Training: 2022-04-10 23:22:39,621-Speed 3248.44 samples/sec   Loss 30.8943   LearningRate 0.0992   Epoch: 0   Global Step: 1350   Fp16 Grad Scale: 524288   Required: 30 hours
Training: 2022-04-10 23:22:42,701-Speed 3325.88 samples/sec   Loss 30.6862   LearningRate 0.0992   Epoch: 0   Global Step: 1360   Fp16 Grad Scale: 262144   Required: 30 hours
Training: 2022-04-10 23:22:45,771-Speed 3336.81 samples/sec   Loss 30.7624   LearningRate 0.0992   Epoch: 0   Global Step: 1370   Fp16 Grad Scale: 262144   Required: 30 hours
Training: 2022-04-10 23:22:48,837-Speed 3340.46 samples/sec   Loss 30.7274   LearningRate 0.0992   Epoch: 0   Global Step: 1380   Fp16 Grad Scale: 262144   Required: 30 hours
Training: 2022-04-10 23:22:51,900-Speed 3343.67 samples/sec   Loss 30.4870   LearningRate 0.0992   Epoch: 0   Global Step: 1390   Fp16 Grad Scale: 262144   Required: 30 hours
Training: 2022-04-10 23:22:54,969-Speed 3337.72 samples/sec   Loss 30.7046   LearningRate 0.0992   Epoch: 0   Global Step: 1400   Fp16 Grad Scale: 262144   Required: 30 hours
Training: 2022-04-10 23:22:58,054-Speed 3320.50 samples/sec   Loss 30.5731   LearningRate 0.0992   Epoch: 0   Global Step: 1410   Fp16 Grad Scale: 262144   Required: 30 hours
Training: 2022-04-10 23:23:01,121-Speed 3339.23 samples/sec   Loss 30.3831   LearningRate 0.0992   Epoch: 0   Global Step: 1420   Fp16 Grad Scale: 262144   Required: 30 hours
Training: 2022-04-10 23:23:04,189-Speed 3339.15 samples/sec   Loss 30.2275   LearningRate 0.0991   Epoch: 0   Global Step: 1430   Fp16 Grad Scale: 262144   Required: 30 hours
Training: 2022-04-10 23:23:07,297-Speed 3295.58 samples/sec   Loss 30.0229   LearningRate 0.0991   Epoch: 0   Global Step: 1440   Fp16 Grad Scale: 262144   Required: 30 hours
Training: 2022-04-10 23:23:10,437-Speed 3261.80 samples/sec   Loss 30.1764   LearningRate 0.0991   Epoch: 0   Global Step: 1450   Fp16 Grad Scale: 262144   Required: 30 hours
Training: 2022-04-10 23:23:13,597-Speed 3241.14 samples/sec   Loss 30.1363   LearningRate 0.0991   Epoch: 0   Global Step: 1460   Fp16 Grad Scale: 524288   Required: 30 hours
Training: 2022-04-10 23:23:16,699-Speed 3301.66 samples/sec   Loss 29.5485   LearningRate 0.0991   Epoch: 0   Global Step: 1470   Fp16 Grad Scale: 262144   Required: 30 hours
Training: 2022-04-10 23:23:19,777-Speed 3327.60 samples/sec   Loss 29.7707   LearningRate 0.0991   Epoch: 0   Global Step: 1480   Fp16 Grad Scale: 262144   Required: 30 hours
Training: 2022-04-10 23:23:22,884-Speed 3297.45 samples/sec   Loss 29.5758   LearningRate 0.0991   Epoch: 0   Global Step: 1490   Fp16 Grad Scale: 262144   Required: 30 hours
Training: 2022-04-10 23:23:25,999-Speed 3287.52 samples/sec   Loss 29.7483   LearningRate 0.0991   Epoch: 0   Global Step: 1500   Fp16 Grad Scale: 262144   Required: 30 hours
Training: 2022-04-10 23:23:29,075-Speed 3330.29 samples/sec   Loss 29.6450   LearningRate 0.0991   Epoch: 0   Global Step: 1510   Fp16 Grad Scale: 262144   Required: 30 hours
Training: 2022-04-10 23:23:32,140-Speed 3341.89 samples/sec   Loss 29.5682   LearningRate 0.0991   Epoch: 0   Global Step: 1520   Fp16 Grad Scale: 262144   Required: 30 hours
Training: 2022-04-10 23:23:35,241-Speed 3302.80 samples/sec   Loss 29.4164   LearningRate 0.0991   Epoch: 0   Global Step: 1530   Fp16 Grad Scale: 262144   Required: 30 hours
Training: 2022-04-10 23:23:38,399-Speed 3243.68 samples/sec   Loss 29.3170   LearningRate 0.0991   Epoch: 0   Global Step: 1540   Fp16 Grad Scale: 262144   Required: 30 hours
Training: 2022-04-10 23:23:41,463-Speed 3343.41 samples/sec   Loss 29.3415   LearningRate 0.0991   Epoch: 0   Global Step: 1550   Fp16 Grad Scale: 262144   Required: 30 hours
Training: 2022-04-10 23:23:44,543-Speed 3324.97 samples/sec   Loss 29.2975   LearningRate 0.0991   Epoch: 0   Global Step: 1560   Fp16 Grad Scale: 262144   Required: 30 hours
Training: 2022-04-10 23:23:47,612-Speed 3337.83 samples/sec   Loss 29.1485   LearningRate 0.0991   Epoch: 0   Global Step: 1570   Fp16 Grad Scale: 524288   Required: 30 hours
Training: 2022-04-10 23:23:50,676-Speed 3342.83 samples/sec   Loss 28.9568   LearningRate 0.0991   Epoch: 0   Global Step: 1580   Fp16 Grad Scale: 262144   Required: 30 hours
Training: 2022-04-10 23:23:53,780-Speed 3300.18 samples/sec   Loss 29.0739   LearningRate 0.0990   Epoch: 0   Global Step: 1590   Fp16 Grad Scale: 262144   Required: 30 hours
Training: 2022-04-10 23:23:56,847-Speed 3339.21 samples/sec   Loss 28.8710   LearningRate 0.0990   Epoch: 0   Global Step: 1600   Fp16 Grad Scale: 262144   Required: 30 hours
Training: 2022-04-10 23:23:59,928-Speed 3325.16 samples/sec   Loss 28.8060   LearningRate 0.0990   Epoch: 0   Global Step: 1610   Fp16 Grad Scale: 262144   Required: 30 hours
Training: 2022-04-10 23:24:02,990-Speed 3344.58 samples/sec   Loss 28.5490   LearningRate 0.0990   Epoch: 0   Global Step: 1620   Fp16 Grad Scale: 262144   Required: 30 hours
Training: 2022-04-10 23:24:06,076-Speed 3319.70 samples/sec   Loss 28.6894   LearningRate 0.0990   Epoch: 0   Global Step: 1630   Fp16 Grad Scale: 262144   Required: 30 hours
Training: 2022-04-10 23:24:09,192-Speed 3286.81 samples/sec   Loss 28.4821   LearningRate 0.0990   Epoch: 0   Global Step: 1640   Fp16 Grad Scale: 262144   Required: 30 hours
Training: 2022-04-10 23:24:12,245-Speed 3355.28 samples/sec   Loss 28.5370   LearningRate 0.0990   Epoch: 0   Global Step: 1650   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-10 23:24:15,329-Speed 3320.74 samples/sec   Loss 28.3125   LearningRate 0.0990   Epoch: 0   Global Step: 1660   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-10 23:24:18,398-Speed 3338.83 samples/sec   Loss 28.2707   LearningRate 0.0990   Epoch: 0   Global Step: 1670   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-10 23:24:21,479-Speed 3324.67 samples/sec   Loss 28.0945   LearningRate 0.0990   Epoch: 0   Global Step: 1680   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-10 23:24:24,603-Speed 3279.17 samples/sec   Loss 28.0293   LearningRate 0.0990   Epoch: 0   Global Step: 1690   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-10 23:24:27,706-Speed 3300.64 samples/sec   Loss 28.1405   LearningRate 0.0990   Epoch: 0   Global Step: 1700   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-10 23:24:30,776-Speed 3336.42 samples/sec   Loss 27.9890   LearningRate 0.0990   Epoch: 0   Global Step: 1710   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-10 23:24:33,843-Speed 3339.26 samples/sec   Loss 28.0131   LearningRate 0.0990   Epoch: 0   Global Step: 1720   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-10 23:24:36,909-Speed 3340.70 samples/sec   Loss 27.7709   LearningRate 0.0990   Epoch: 0   Global Step: 1730   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-10 23:24:39,975-Speed 3341.13 samples/sec   Loss 27.7783   LearningRate 0.0990   Epoch: 0   Global Step: 1740   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-10 23:24:43,112-Speed 3265.17 samples/sec   Loss 27.5669   LearningRate 0.0990   Epoch: 0   Global Step: 1750   Fp16 Grad Scale: 262144   Required: 30 hours
Training: 2022-04-10 23:24:46,185-Speed 3332.22 samples/sec   Loss 27.5285   LearningRate 0.0989   Epoch: 0   Global Step: 1760   Fp16 Grad Scale: 262144   Required: 30 hours
Training: 2022-04-10 23:24:49,254-Speed 3338.41 samples/sec   Loss 27.5531   LearningRate 0.0989   Epoch: 0   Global Step: 1770   Fp16 Grad Scale: 262144   Required: 30 hours
Training: 2022-04-10 23:24:52,322-Speed 3338.92 samples/sec   Loss 27.4147   LearningRate 0.0989   Epoch: 0   Global Step: 1780   Fp16 Grad Scale: 262144   Required: 30 hours
Training: 2022-04-10 23:24:55,445-Speed 3279.03 samples/sec   Loss 27.4892   LearningRate 0.0989   Epoch: 0   Global Step: 1790   Fp16 Grad Scale: 262144   Required: 30 hours
Training: 2022-04-10 23:24:58,534-Speed 3316.34 samples/sec   Loss 27.3741   LearningRate 0.0989   Epoch: 0   Global Step: 1800   Fp16 Grad Scale: 262144   Required: 30 hours
Training: 2022-04-10 23:25:01,610-Speed 3329.69 samples/sec   Loss 27.2565   LearningRate 0.0989   Epoch: 0   Global Step: 1810   Fp16 Grad Scale: 262144   Required: 30 hours
Training: 2022-04-10 23:25:04,675-Speed 3341.77 samples/sec   Loss 27.1003   LearningRate 0.0989   Epoch: 0   Global Step: 1820   Fp16 Grad Scale: 262144   Required: 30 hours
Training: 2022-04-10 23:25:07,737-Speed 3344.57 samples/sec   Loss 27.2034   LearningRate 0.0989   Epoch: 0   Global Step: 1830   Fp16 Grad Scale: 262144   Required: 30 hours
Training: 2022-04-10 23:25:10,801-Speed 3343.28 samples/sec   Loss 27.0379   LearningRate 0.0989   Epoch: 0   Global Step: 1840   Fp16 Grad Scale: 262144   Required: 30 hours
Training: 2022-04-10 23:25:13,860-Speed 3347.95 samples/sec   Loss 27.0951   LearningRate 0.0989   Epoch: 0   Global Step: 1850   Fp16 Grad Scale: 262144   Required: 30 hours
Training: 2022-04-10 23:25:16,924-Speed 3343.11 samples/sec   Loss 26.8267   LearningRate 0.0989   Epoch: 0   Global Step: 1860   Fp16 Grad Scale: 262144   Required: 30 hours
Training: 2022-04-10 23:25:20,007-Speed 3322.65 samples/sec   Loss 26.7697   LearningRate 0.0989   Epoch: 0   Global Step: 1870   Fp16 Grad Scale: 262144   Required: 30 hours
Training: 2022-04-10 23:25:23,089-Speed 3323.53 samples/sec   Loss 26.7016   LearningRate 0.0989   Epoch: 0   Global Step: 1880   Fp16 Grad Scale: 262144   Required: 30 hours
Training: 2022-04-10 23:25:26,154-Speed 3342.06 samples/sec   Loss 26.6833   LearningRate 0.0989   Epoch: 0   Global Step: 1890   Fp16 Grad Scale: 262144   Required: 30 hours
Training: 2022-04-10 23:25:29,223-Speed 3337.07 samples/sec   Loss 26.6814   LearningRate 0.0989   Epoch: 0   Global Step: 1900   Fp16 Grad Scale: 262144   Required: 30 hours
Training: 2022-04-10 23:25:32,299-Speed 3330.20 samples/sec   Loss 26.5558   LearningRate 0.0989   Epoch: 0   Global Step: 1910   Fp16 Grad Scale: 262144   Required: 30 hours
Training: 2022-04-10 23:25:35,370-Speed 3335.18 samples/sec   Loss 26.5563   LearningRate 0.0989   Epoch: 0   Global Step: 1920   Fp16 Grad Scale: 262144   Required: 30 hours
Training: 2022-04-10 23:25:38,437-Speed 3339.67 samples/sec   Loss 26.2465   LearningRate 0.0988   Epoch: 0   Global Step: 1930   Fp16 Grad Scale: 262144   Required: 30 hours
Training: 2022-04-10 23:25:41,510-Speed 3332.99 samples/sec   Loss 26.2302   LearningRate 0.0988   Epoch: 0   Global Step: 1940   Fp16 Grad Scale: 262144   Required: 30 hours
Training: 2022-04-10 23:25:44,579-Speed 3337.50 samples/sec   Loss 26.2929   LearningRate 0.0988   Epoch: 0   Global Step: 1950   Fp16 Grad Scale: 262144   Required: 30 hours
Training: 2022-04-10 23:25:47,645-Speed 3340.92 samples/sec   Loss 26.2544   LearningRate 0.0988   Epoch: 0   Global Step: 1960   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-10 23:25:50,714-Speed 3337.47 samples/sec   Loss 26.0747   LearningRate 0.0988   Epoch: 0   Global Step: 1970   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-10 23:25:53,792-Speed 3328.44 samples/sec   Loss 25.9518   LearningRate 0.0988   Epoch: 0   Global Step: 1980   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-10 23:25:56,868-Speed 3328.82 samples/sec   Loss 25.9816   LearningRate 0.0988   Epoch: 0   Global Step: 1990   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-10 23:26:00,037-Speed 3232.85 samples/sec   Loss 26.1082   LearningRate 0.0988   Epoch: 0   Global Step: 2000   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-10 23:26:44,322-[lfw][2000]XNorm: 22.371925
Training: 2022-04-10 23:26:44,323-[lfw][2000]Accuracy-Flip: 0.97317+-0.00831
Training: 2022-04-10 23:26:44,323-[lfw][2000]Accuracy-Highest: 0.97317
Training: 2022-04-10 23:27:35,734-[cfp_fp][2000]XNorm: 19.827943
Training: 2022-04-10 23:27:35,735-[cfp_fp][2000]Accuracy-Flip: 0.75371+-0.01658
Training: 2022-04-10 23:27:35,735-[cfp_fp][2000]Accuracy-Highest: 0.75371
Training: 2022-04-10 23:28:19,974-[agedb_30][2000]XNorm: 21.274121
Training: 2022-04-10 23:28:19,975-[agedb_30][2000]Accuracy-Flip: 0.82617+-0.02399
Training: 2022-04-10 23:28:19,976-[agedb_30][2000]Accuracy-Highest: 0.82617
Training: 2022-04-10 23:28:23,063-Speed 71.60 samples/sec   Loss 25.9271   LearningRate 0.0988   Epoch: 0   Global Step: 2010   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-04-10 23:28:26,135-Speed 3333.79 samples/sec   Loss 25.7280   LearningRate 0.0988   Epoch: 0   Global Step: 2020   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-04-10 23:28:29,188-Speed 3355.23 samples/sec   Loss 25.6644   LearningRate 0.0988   Epoch: 0   Global Step: 2030   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-04-10 23:28:32,244-Speed 3351.33 samples/sec   Loss 25.7383   LearningRate 0.0988   Epoch: 0   Global Step: 2040   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-04-10 23:28:35,301-Speed 3351.25 samples/sec   Loss 25.7031   LearningRate 0.0988   Epoch: 0   Global Step: 2050   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-04-10 23:28:38,360-Speed 3347.38 samples/sec   Loss 25.3564   LearningRate 0.0988   Epoch: 0   Global Step: 2060   Fp16 Grad Scale: 262144   Required: 36 hours
Training: 2022-04-10 23:28:41,520-Speed 3241.09 samples/sec   Loss 25.3670   LearningRate 0.0988   Epoch: 0   Global Step: 2070   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-04-10 23:28:44,667-Speed 3258.70 samples/sec   Loss 25.4718   LearningRate 0.0988   Epoch: 0   Global Step: 2080   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-04-10 23:28:47,741-Speed 3331.98 samples/sec   Loss 25.3322   LearningRate 0.0988   Epoch: 0   Global Step: 2090   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-04-10 23:28:50,890-Speed 3252.45 samples/sec   Loss 25.1353   LearningRate 0.0987   Epoch: 0   Global Step: 2100   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-04-10 23:28:53,982-Speed 3313.04 samples/sec   Loss 25.2391   LearningRate 0.0987   Epoch: 0   Global Step: 2110   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-04-10 23:28:57,077-Speed 3310.04 samples/sec   Loss 25.1325   LearningRate 0.0987   Epoch: 0   Global Step: 2120   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-04-10 23:29:00,143-Speed 3339.97 samples/sec   Loss 25.1806   LearningRate 0.0987   Epoch: 0   Global Step: 2130   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-04-10 23:29:03,209-Speed 3341.76 samples/sec   Loss 24.9666   LearningRate 0.0987   Epoch: 0   Global Step: 2140   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-04-10 23:29:06,297-Speed 3317.01 samples/sec   Loss 24.9614   LearningRate 0.0987   Epoch: 0   Global Step: 2150   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-04-10 23:29:09,367-Speed 3335.64 samples/sec   Loss 25.0060   LearningRate 0.0987   Epoch: 0   Global Step: 2160   Fp16 Grad Scale: 131072   Required: 36 hours
Training: 2022-04-10 23:29:12,510-Speed 3259.46 samples/sec   Loss 24.7145   LearningRate 0.0987   Epoch: 0   Global Step: 2170   Fp16 Grad Scale: 262144   Required: 35 hours
Training: 2022-04-10 23:29:15,587-Speed 3328.80 samples/sec   Loss 24.7310   LearningRate 0.0987   Epoch: 0   Global Step: 2180   Fp16 Grad Scale: 262144   Required: 35 hours
Training: 2022-04-10 23:29:18,666-Speed 3326.39 samples/sec   Loss 24.9099   LearningRate 0.0987   Epoch: 0   Global Step: 2190   Fp16 Grad Scale: 262144   Required: 35 hours
Training: 2022-04-10 23:29:21,744-Speed 3327.34 samples/sec   Loss 24.4240   LearningRate 0.0987   Epoch: 0   Global Step: 2200   Fp16 Grad Scale: 262144   Required: 35 hours
Training: 2022-04-10 23:29:24,832-Speed 3316.62 samples/sec   Loss 24.5845   LearningRate 0.0987   Epoch: 0   Global Step: 2210   Fp16 Grad Scale: 262144   Required: 35 hours
Training: 2022-04-10 23:29:27,901-Speed 3337.24 samples/sec   Loss 24.6872   LearningRate 0.0987   Epoch: 0   Global Step: 2220   Fp16 Grad Scale: 262144   Required: 35 hours
Training: 2022-04-10 23:29:30,975-Speed 3332.83 samples/sec   Loss 24.3298   LearningRate 0.0987   Epoch: 0   Global Step: 2230   Fp16 Grad Scale: 262144   Required: 35 hours
Training: 2022-04-10 23:29:34,077-Speed 3302.08 samples/sec   Loss 24.1964   LearningRate 0.0987   Epoch: 0   Global Step: 2240   Fp16 Grad Scale: 262144   Required: 35 hours
Training: 2022-04-10 23:29:37,196-Speed 3283.47 samples/sec   Loss 24.2335   LearningRate 0.0987   Epoch: 0   Global Step: 2250   Fp16 Grad Scale: 262144   Required: 35 hours
Training: 2022-04-10 23:29:40,281-Speed 3319.88 samples/sec   Loss 24.1638   LearningRate 0.0987   Epoch: 0   Global Step: 2260   Fp16 Grad Scale: 262144   Required: 35 hours
Training: 2022-04-10 23:29:43,374-Speed 3312.00 samples/sec   Loss 24.1049   LearningRate 0.0986   Epoch: 0   Global Step: 2270   Fp16 Grad Scale: 262144   Required: 35 hours
Training: 2022-04-10 23:29:46,436-Speed 3344.81 samples/sec   Loss 24.1101   LearningRate 0.0986   Epoch: 0   Global Step: 2280   Fp16 Grad Scale: 262144   Required: 35 hours
Training: 2022-04-10 23:29:49,504-Speed 3339.03 samples/sec   Loss 23.8899   LearningRate 0.0986   Epoch: 0   Global Step: 2290   Fp16 Grad Scale: 262144   Required: 35 hours
Training: 2022-04-10 23:29:52,575-Speed 3335.57 samples/sec   Loss 24.0541   LearningRate 0.0986   Epoch: 0   Global Step: 2300   Fp16 Grad Scale: 262144   Required: 35 hours
Training: 2022-04-10 23:29:55,652-Speed 3328.02 samples/sec   Loss 23.7941   LearningRate 0.0986   Epoch: 0   Global Step: 2310   Fp16 Grad Scale: 262144   Required: 35 hours
Training: 2022-04-10 23:29:58,727-Speed 3330.78 samples/sec   Loss 23.8110   LearningRate 0.0986   Epoch: 0   Global Step: 2320   Fp16 Grad Scale: 262144   Required: 35 hours
Training: 2022-04-10 23:30:01,803-Speed 3331.41 samples/sec   Loss 23.8667   LearningRate 0.0986   Epoch: 0   Global Step: 2330   Fp16 Grad Scale: 262144   Required: 35 hours
Training: 2022-04-10 23:30:04,863-Speed 3346.96 samples/sec   Loss 23.7164   LearningRate 0.0986   Epoch: 0   Global Step: 2340   Fp16 Grad Scale: 262144   Required: 35 hours
Training: 2022-04-10 23:30:07,926-Speed 3344.83 samples/sec   Loss 23.7478   LearningRate 0.0986   Epoch: 0   Global Step: 2350   Fp16 Grad Scale: 262144   Required: 35 hours
Training: 2022-04-10 23:30:10,986-Speed 3346.21 samples/sec   Loss 23.3387   LearningRate 0.0986   Epoch: 0   Global Step: 2360   Fp16 Grad Scale: 262144   Required: 35 hours
Training: 2022-04-10 23:30:14,047-Speed 3346.35 samples/sec   Loss 23.6252   LearningRate 0.0986   Epoch: 0   Global Step: 2370   Fp16 Grad Scale: 262144   Required: 35 hours
Training: 2022-04-10 23:30:17,109-Speed 3345.87 samples/sec   Loss 23.1616   LearningRate 0.0986   Epoch: 0   Global Step: 2380   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-04-10 23:30:20,217-Speed 3294.86 samples/sec   Loss 23.4500   LearningRate 0.0986   Epoch: 0   Global Step: 2390   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-04-10 23:30:23,280-Speed 3344.44 samples/sec   Loss 23.4636   LearningRate 0.0986   Epoch: 0   Global Step: 2400   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-04-10 23:30:26,339-Speed 3347.45 samples/sec   Loss 23.2141   LearningRate 0.0986   Epoch: 0   Global Step: 2410   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-04-10 23:30:29,401-Speed 3345.37 samples/sec   Loss 23.2592   LearningRate 0.0986   Epoch: 0   Global Step: 2420   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-04-10 23:30:32,469-Speed 3338.44 samples/sec   Loss 23.3446   LearningRate 0.0985   Epoch: 0   Global Step: 2430   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-04-10 23:30:35,525-Speed 3351.67 samples/sec   Loss 23.0154   LearningRate 0.0985   Epoch: 0   Global Step: 2440   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-04-10 23:30:38,600-Speed 3330.98 samples/sec   Loss 23.0408   LearningRate 0.0985   Epoch: 0   Global Step: 2450   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-04-10 23:30:41,660-Speed 3347.67 samples/sec   Loss 23.0951   LearningRate 0.0985   Epoch: 0   Global Step: 2460   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-04-10 23:30:44,719-Speed 3347.83 samples/sec   Loss 22.8596   LearningRate 0.0985   Epoch: 0   Global Step: 2470   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-04-10 23:30:47,795-Speed 3330.24 samples/sec   Loss 22.9810   LearningRate 0.0985   Epoch: 0   Global Step: 2480   Fp16 Grad Scale: 262144   Required: 35 hours
Training: 2022-04-10 23:30:50,867-Speed 3333.86 samples/sec   Loss 22.8960   LearningRate 0.0985   Epoch: 0   Global Step: 2490   Fp16 Grad Scale: 262144   Required: 35 hours
Training: 2022-04-10 23:30:53,933-Speed 3341.01 samples/sec   Loss 22.9053   LearningRate 0.0985   Epoch: 0   Global Step: 2500   Fp16 Grad Scale: 262144   Required: 35 hours
Training: 2022-04-10 23:30:57,006-Speed 3332.73 samples/sec   Loss 22.7283   LearningRate 0.0985   Epoch: 0   Global Step: 2510   Fp16 Grad Scale: 262144   Required: 34 hours
Training: 2022-04-10 23:31:00,073-Speed 3339.91 samples/sec   Loss 22.7560   LearningRate 0.0985   Epoch: 0   Global Step: 2520   Fp16 Grad Scale: 262144   Required: 34 hours
Training: 2022-04-10 23:31:03,133-Speed 3348.53 samples/sec   Loss 22.4990   LearningRate 0.0985   Epoch: 0   Global Step: 2530   Fp16 Grad Scale: 262144   Required: 34 hours
Training: 2022-04-10 23:31:06,204-Speed 3335.49 samples/sec   Loss 22.6208   LearningRate 0.0985   Epoch: 0   Global Step: 2540   Fp16 Grad Scale: 262144   Required: 34 hours
Training: 2022-04-10 23:31:09,265-Speed 3346.24 samples/sec   Loss 22.5144   LearningRate 0.0985   Epoch: 0   Global Step: 2550   Fp16 Grad Scale: 262144   Required: 34 hours
Training: 2022-04-10 23:31:12,314-Speed 3358.85 samples/sec   Loss 22.4380   LearningRate 0.0985   Epoch: 0   Global Step: 2560   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-10 23:31:15,377-Speed 3343.59 samples/sec   Loss 22.5329   LearningRate 0.0985   Epoch: 0   Global Step: 2570   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-10 23:31:18,435-Speed 3351.67 samples/sec   Loss 22.5527   LearningRate 0.0985   Epoch: 0   Global Step: 2580   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-10 23:31:21,504-Speed 3337.64 samples/sec   Loss 22.4266   LearningRate 0.0985   Epoch: 0   Global Step: 2590   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-10 23:31:24,579-Speed 3330.52 samples/sec   Loss 22.3650   LearningRate 0.0984   Epoch: 0   Global Step: 2600   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-10 23:31:27,698-Speed 3284.72 samples/sec   Loss 22.1469   LearningRate 0.0984   Epoch: 0   Global Step: 2610   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-10 23:31:30,758-Speed 3346.57 samples/sec   Loss 22.0486   LearningRate 0.0984   Epoch: 0   Global Step: 2620   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-10 23:31:33,822-Speed 3342.55 samples/sec   Loss 22.1096   LearningRate 0.0984   Epoch: 0   Global Step: 2630   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-10 23:31:36,905-Speed 3322.20 samples/sec   Loss 21.8647   LearningRate 0.0984   Epoch: 0   Global Step: 2640   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-10 23:31:39,995-Speed 3314.96 samples/sec   Loss 21.9030   LearningRate 0.0984   Epoch: 0   Global Step: 2650   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-10 23:31:43,085-Speed 3314.72 samples/sec   Loss 21.9241   LearningRate 0.0984   Epoch: 0   Global Step: 2660   Fp16 Grad Scale: 262144   Required: 34 hours
Training: 2022-04-10 23:31:46,171-Speed 3318.98 samples/sec   Loss 21.7268   LearningRate 0.0984   Epoch: 0   Global Step: 2670   Fp16 Grad Scale: 262144   Required: 34 hours
Training: 2022-04-10 23:31:49,253-Speed 3323.41 samples/sec   Loss 21.8260   LearningRate 0.0984   Epoch: 0   Global Step: 2680   Fp16 Grad Scale: 262144   Required: 34 hours
Training: 2022-04-10 23:31:52,345-Speed 3312.90 samples/sec   Loss 21.8979   LearningRate 0.0984   Epoch: 0   Global Step: 2690   Fp16 Grad Scale: 262144   Required: 34 hours
Training: 2022-04-10 23:31:55,458-Speed 3290.51 samples/sec   Loss 21.7509   LearningRate 0.0984   Epoch: 0   Global Step: 2700   Fp16 Grad Scale: 262144   Required: 34 hours
Training: 2022-04-10 23:31:58,657-Speed 3201.20 samples/sec   Loss 21.7565   LearningRate 0.0984   Epoch: 0   Global Step: 2710   Fp16 Grad Scale: 262144   Required: 34 hours
Training: 2022-04-10 23:32:01,885-Speed 3173.98 samples/sec   Loss 21.6550   LearningRate 0.0984   Epoch: 0   Global Step: 2720   Fp16 Grad Scale: 262144   Required: 34 hours
Training: 2022-04-10 23:32:04,962-Speed 3328.32 samples/sec   Loss 21.4713   LearningRate 0.0984   Epoch: 0   Global Step: 2730   Fp16 Grad Scale: 262144   Required: 34 hours
Training: 2022-04-10 23:32:08,028-Speed 3340.01 samples/sec   Loss 21.5415   LearningRate 0.0984   Epoch: 0   Global Step: 2740   Fp16 Grad Scale: 262144   Required: 34 hours
Training: 2022-04-10 23:32:11,089-Speed 3346.38 samples/sec   Loss 21.4587   LearningRate 0.0984   Epoch: 0   Global Step: 2750   Fp16 Grad Scale: 262144   Required: 34 hours
Training: 2022-04-10 23:32:14,158-Speed 3338.44 samples/sec   Loss 21.4288   LearningRate 0.0984   Epoch: 0   Global Step: 2760   Fp16 Grad Scale: 524288   Required: 34 hours
Training: 2022-04-10 23:32:17,216-Speed 3349.12 samples/sec   Loss 21.4114   LearningRate 0.0983   Epoch: 0   Global Step: 2770   Fp16 Grad Scale: 262144   Required: 34 hours
Training: 2022-04-10 23:32:20,277-Speed 3346.18 samples/sec   Loss 21.3347   LearningRate 0.0983   Epoch: 0   Global Step: 2780   Fp16 Grad Scale: 262144   Required: 34 hours
Training: 2022-04-10 23:32:23,334-Speed 3349.93 samples/sec   Loss 21.4098   LearningRate 0.0983   Epoch: 0   Global Step: 2790   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-10 23:32:26,418-Speed 3321.73 samples/sec   Loss 21.3381   LearningRate 0.0983   Epoch: 0   Global Step: 2800   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-10 23:32:29,536-Speed 3285.07 samples/sec   Loss 21.0669   LearningRate 0.0983   Epoch: 0   Global Step: 2810   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-10 23:32:32,616-Speed 3324.87 samples/sec   Loss 21.1838   LearningRate 0.0983   Epoch: 0   Global Step: 2820   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-10 23:32:35,680-Speed 3343.55 samples/sec   Loss 21.0878   LearningRate 0.0983   Epoch: 0   Global Step: 2830   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-10 23:32:38,744-Speed 3342.53 samples/sec   Loss 21.0033   LearningRate 0.0983   Epoch: 0   Global Step: 2840   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-10 23:32:41,805-Speed 3345.18 samples/sec   Loss 20.9788   LearningRate 0.0983   Epoch: 0   Global Step: 2850   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-10 23:32:44,879-Speed 3332.49 samples/sec   Loss 20.9368   LearningRate 0.0983   Epoch: 0   Global Step: 2860   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-10 23:32:47,947-Speed 3338.69 samples/sec   Loss 20.9448   LearningRate 0.0983   Epoch: 0   Global Step: 2870   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-10 23:32:51,060-Speed 3290.15 samples/sec   Loss 20.7291   LearningRate 0.0983   Epoch: 0   Global Step: 2880   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-10 23:32:54,195-Speed 3267.53 samples/sec   Loss 20.7662   LearningRate 0.0983   Epoch: 0   Global Step: 2890   Fp16 Grad Scale: 262144   Required: 34 hours
Training: 2022-04-10 23:32:57,403-Speed 3192.88 samples/sec   Loss 20.6753   LearningRate 0.0983   Epoch: 0   Global Step: 2900   Fp16 Grad Scale: 262144   Required: 34 hours
Training: 2022-04-10 23:33:00,507-Speed 3299.95 samples/sec   Loss 20.6484   LearningRate 0.0983   Epoch: 0   Global Step: 2910   Fp16 Grad Scale: 262144   Required: 34 hours
Training: 2022-04-10 23:33:03,606-Speed 3305.12 samples/sec   Loss 20.4271   LearningRate 0.0983   Epoch: 0   Global Step: 2920   Fp16 Grad Scale: 262144   Required: 34 hours
Training: 2022-04-10 23:33:06,706-Speed 3303.88 samples/sec   Loss 20.3544   LearningRate 0.0983   Epoch: 0   Global Step: 2930   Fp16 Grad Scale: 262144   Required: 34 hours
Training: 2022-04-10 23:33:09,798-Speed 3313.02 samples/sec   Loss 20.4687   LearningRate 0.0982   Epoch: 0   Global Step: 2940   Fp16 Grad Scale: 262144   Required: 34 hours
Training: 2022-04-10 23:33:12,877-Speed 3327.95 samples/sec   Loss 20.4692   LearningRate 0.0982   Epoch: 0   Global Step: 2950   Fp16 Grad Scale: 262144   Required: 34 hours
Training: 2022-04-10 23:33:15,957-Speed 3325.40 samples/sec   Loss 20.2655   LearningRate 0.0982   Epoch: 0   Global Step: 2960   Fp16 Grad Scale: 262144   Required: 34 hours
Training: 2022-04-10 23:33:19,046-Speed 3316.15 samples/sec   Loss 20.3528   LearningRate 0.0982   Epoch: 0   Global Step: 2970   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-04-10 23:33:22,110-Speed 3341.75 samples/sec   Loss 20.1840   LearningRate 0.0982   Epoch: 0   Global Step: 2980   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-04-10 23:33:25,164-Speed 3354.27 samples/sec   Loss 20.1449   LearningRate 0.0982   Epoch: 0   Global Step: 2990   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-04-10 23:33:28,246-Speed 3323.76 samples/sec   Loss 20.2561   LearningRate 0.0982   Epoch: 0   Global Step: 3000   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-04-10 23:33:31,324-Speed 3327.52 samples/sec   Loss 20.0992   LearningRate 0.0982   Epoch: 0   Global Step: 3010   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-04-10 23:33:34,392-Speed 3338.33 samples/sec   Loss 20.1598   LearningRate 0.0982   Epoch: 0   Global Step: 3020   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-04-10 23:33:37,465-Speed 3332.66 samples/sec   Loss 20.0622   LearningRate 0.0982   Epoch: 0   Global Step: 3030   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-04-10 23:33:40,546-Speed 3325.03 samples/sec   Loss 20.0349   LearningRate 0.0982   Epoch: 0   Global Step: 3040   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-04-10 23:33:43,611-Speed 3341.52 samples/sec   Loss 19.9123   LearningRate 0.0982   Epoch: 0   Global Step: 3050   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-04-10 23:33:46,692-Speed 3324.77 samples/sec   Loss 19.7062   LearningRate 0.0982   Epoch: 0   Global Step: 3060   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-04-10 23:33:49,764-Speed 3334.13 samples/sec   Loss 19.8872   LearningRate 0.0982   Epoch: 0   Global Step: 3070   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-04-10 23:33:52,828-Speed 3342.38 samples/sec   Loss 19.8704   LearningRate 0.0982   Epoch: 0   Global Step: 3080   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-04-10 23:33:55,885-Speed 3350.62 samples/sec   Loss 19.8452   LearningRate 0.0982   Epoch: 0   Global Step: 3090   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-04-10 23:33:58,957-Speed 3333.87 samples/sec   Loss 19.8162   LearningRate 0.0982   Epoch: 0   Global Step: 3100   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-04-10 23:34:02,028-Speed 3335.39 samples/sec   Loss 19.7496   LearningRate 0.0981   Epoch: 0   Global Step: 3110   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-04-10 23:34:05,094-Speed 3341.48 samples/sec   Loss 19.5674   LearningRate 0.0981   Epoch: 0   Global Step: 3120   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-04-10 23:34:08,162-Speed 3337.68 samples/sec   Loss 19.6761   LearningRate 0.0981   Epoch: 0   Global Step: 3130   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-04-10 23:34:11,233-Speed 3335.67 samples/sec   Loss 19.5345   LearningRate 0.0981   Epoch: 0   Global Step: 3140   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-04-10 23:34:14,301-Speed 3338.70 samples/sec   Loss 19.5442   LearningRate 0.0981   Epoch: 0   Global Step: 3150   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-04-10 23:34:17,375-Speed 3332.10 samples/sec   Loss 19.3958   LearningRate 0.0981   Epoch: 0   Global Step: 3160   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-04-10 23:34:20,440-Speed 3342.40 samples/sec   Loss 19.4209   LearningRate 0.0981   Epoch: 0   Global Step: 3170   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-04-10 23:34:23,523-Speed 3321.86 samples/sec   Loss 19.4434   LearningRate 0.0981   Epoch: 0   Global Step: 3180   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-04-10 23:34:26,576-Speed 3354.68 samples/sec   Loss 19.3856   LearningRate 0.0981   Epoch: 0   Global Step: 3190   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-04-10 23:34:29,656-Speed 3325.11 samples/sec   Loss 19.2465   LearningRate 0.0981   Epoch: 0   Global Step: 3200   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-04-10 23:34:32,742-Speed 3319.31 samples/sec   Loss 19.3734   LearningRate 0.0981   Epoch: 0   Global Step: 3210   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-04-10 23:34:35,854-Speed 3291.91 samples/sec   Loss 19.1490   LearningRate 0.0981   Epoch: 0   Global Step: 3220   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-04-10 23:34:38,926-Speed 3333.79 samples/sec   Loss 19.3033   LearningRate 0.0981   Epoch: 0   Global Step: 3230   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-04-10 23:34:41,990-Speed 3343.21 samples/sec   Loss 19.1933   LearningRate 0.0981   Epoch: 0   Global Step: 3240   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-10 23:34:45,062-Speed 3334.39 samples/sec   Loss 19.2121   LearningRate 0.0981   Epoch: 0   Global Step: 3250   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-10 23:34:48,143-Speed 3324.82 samples/sec   Loss 19.0650   LearningRate 0.0981   Epoch: 0   Global Step: 3260   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-10 23:34:51,213-Speed 3336.68 samples/sec   Loss 19.1485   LearningRate 0.0981   Epoch: 0   Global Step: 3270   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-10 23:34:54,295-Speed 3324.04 samples/sec   Loss 18.8300   LearningRate 0.0980   Epoch: 0   Global Step: 3280   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-10 23:34:57,366-Speed 3335.33 samples/sec   Loss 18.9363   LearningRate 0.0980   Epoch: 0   Global Step: 3290   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-10 23:35:00,472-Speed 3297.69 samples/sec   Loss 18.9832   LearningRate 0.0980   Epoch: 0   Global Step: 3300   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-10 23:35:03,568-Speed 3308.26 samples/sec   Loss 18.7963   LearningRate 0.0980   Epoch: 0   Global Step: 3310   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-10 23:35:06,651-Speed 3322.08 samples/sec   Loss 19.0049   LearningRate 0.0980   Epoch: 0   Global Step: 3320   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-10 23:35:09,784-Speed 3268.67 samples/sec   Loss 18.5513   LearningRate 0.0980   Epoch: 0   Global Step: 3330   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-10 23:35:12,900-Speed 3288.09 samples/sec   Loss 18.6787   LearningRate 0.0980   Epoch: 0   Global Step: 3340   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-04-10 23:35:15,996-Speed 3307.80 samples/sec   Loss 18.7435   LearningRate 0.0980   Epoch: 0   Global Step: 3350   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-04-10 23:35:19,071-Speed 3331.15 samples/sec   Loss 18.6339   LearningRate 0.0980   Epoch: 0   Global Step: 3360   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-04-10 23:35:22,152-Speed 3324.36 samples/sec   Loss 18.6092   LearningRate 0.0980   Epoch: 0   Global Step: 3370   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-04-10 23:35:25,218-Speed 3340.80 samples/sec   Loss 18.5760   LearningRate 0.0980   Epoch: 0   Global Step: 3380   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-04-10 23:35:28,287-Speed 3337.51 samples/sec   Loss 18.5482   LearningRate 0.0980   Epoch: 0   Global Step: 3390   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-04-10 23:35:31,363-Speed 3329.24 samples/sec   Loss 18.6302   LearningRate 0.0980   Epoch: 0   Global Step: 3400   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-04-10 23:35:34,442-Speed 3327.46 samples/sec   Loss 18.2369   LearningRate 0.0980   Epoch: 0   Global Step: 3410   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-04-10 23:35:37,508-Speed 3340.36 samples/sec   Loss 18.3161   LearningRate 0.0980   Epoch: 0   Global Step: 3420   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-04-10 23:35:40,579-Speed 3335.63 samples/sec   Loss 18.3736   LearningRate 0.0980   Epoch: 0   Global Step: 3430   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-10 23:35:43,648-Speed 3337.17 samples/sec   Loss 18.1804   LearningRate 0.0979   Epoch: 0   Global Step: 3440   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-10 23:35:46,721-Speed 3332.86 samples/sec   Loss 18.2881   LearningRate 0.0979   Epoch: 0   Global Step: 3450   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-10 23:35:49,785-Speed 3343.06 samples/sec   Loss 18.2413   LearningRate 0.0979   Epoch: 0   Global Step: 3460   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-10 23:35:52,902-Speed 3286.13 samples/sec   Loss 18.0568   LearningRate 0.0979   Epoch: 0   Global Step: 3470   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-10 23:35:55,983-Speed 3325.69 samples/sec   Loss 18.2652   LearningRate 0.0979   Epoch: 0   Global Step: 3480   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-10 23:35:59,043-Speed 3346.28 samples/sec   Loss 18.1080   LearningRate 0.0979   Epoch: 0   Global Step: 3490   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-10 23:36:02,107-Speed 3343.05 samples/sec   Loss 18.2014   LearningRate 0.0979   Epoch: 0   Global Step: 3500   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-10 23:36:05,177-Speed 3336.48 samples/sec   Loss 18.1247   LearningRate 0.0979   Epoch: 0   Global Step: 3510   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-10 23:36:08,242-Speed 3342.01 samples/sec   Loss 18.1619   LearningRate 0.0979   Epoch: 0   Global Step: 3520   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-10 23:36:11,423-Speed 3220.10 samples/sec   Loss 18.0188   LearningRate 0.0979   Epoch: 0   Global Step: 3530   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-04-10 23:36:14,623-Speed 3200.86 samples/sec   Loss 17.8522   LearningRate 0.0979   Epoch: 0   Global Step: 3540   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-04-10 23:36:17,684-Speed 3346.04 samples/sec   Loss 17.8373   LearningRate 0.0979   Epoch: 0   Global Step: 3550   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-04-10 23:36:20,802-Speed 3284.50 samples/sec   Loss 17.8159   LearningRate 0.0979   Epoch: 0   Global Step: 3560   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-04-10 23:36:23,869-Speed 3340.31 samples/sec   Loss 17.8099   LearningRate 0.0979   Epoch: 0   Global Step: 3570   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-04-10 23:36:26,938-Speed 3336.59 samples/sec   Loss 17.8200   LearningRate 0.0979   Epoch: 0   Global Step: 3580   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-04-10 23:36:30,012-Speed 3332.78 samples/sec   Loss 17.6835   LearningRate 0.0979   Epoch: 0   Global Step: 3590   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-04-10 23:36:33,099-Speed 3316.98 samples/sec   Loss 17.8987   LearningRate 0.0979   Epoch: 0   Global Step: 3600   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-04-10 23:36:36,164-Speed 3342.06 samples/sec   Loss 17.7894   LearningRate 0.0978   Epoch: 0   Global Step: 3610   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-04-10 23:36:39,238-Speed 3332.10 samples/sec   Loss 17.6256   LearningRate 0.0978   Epoch: 0   Global Step: 3620   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-10 23:36:42,307-Speed 3336.88 samples/sec   Loss 17.7021   LearningRate 0.0978   Epoch: 0   Global Step: 3630   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-10 23:36:45,388-Speed 3324.42 samples/sec   Loss 17.5927   LearningRate 0.0978   Epoch: 0   Global Step: 3640   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-10 23:36:48,483-Speed 3309.66 samples/sec   Loss 17.6126   LearningRate 0.0978   Epoch: 0   Global Step: 3650   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-10 23:36:51,543-Speed 3347.38 samples/sec   Loss 17.4561   LearningRate 0.0978   Epoch: 0   Global Step: 3660   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-10 23:36:54,612-Speed 3337.93 samples/sec   Loss 17.4250   LearningRate 0.0978   Epoch: 0   Global Step: 3670   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-10 23:36:57,690-Speed 3326.89 samples/sec   Loss 17.4219   LearningRate 0.0978   Epoch: 0   Global Step: 3680   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-10 23:37:00,755-Speed 3342.42 samples/sec   Loss 17.5395   LearningRate 0.0978   Epoch: 0   Global Step: 3690   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-10 23:37:03,819-Speed 3342.77 samples/sec   Loss 17.5290   LearningRate 0.0978   Epoch: 0   Global Step: 3700   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-10 23:37:06,884-Speed 3342.09 samples/sec   Loss 17.3923   LearningRate 0.0978   Epoch: 0   Global Step: 3710   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-10 23:37:09,969-Speed 3319.60 samples/sec   Loss 17.4074   LearningRate 0.0978   Epoch: 0   Global Step: 3720   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-10 23:37:13,059-Speed 3314.61 samples/sec   Loss 17.3833   LearningRate 0.0978   Epoch: 0   Global Step: 3730   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-10 23:37:16,214-Speed 3247.36 samples/sec   Loss 17.3609   LearningRate 0.0978   Epoch: 0   Global Step: 3740   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-10 23:37:19,284-Speed 3335.73 samples/sec   Loss 17.2321   LearningRate 0.0978   Epoch: 0   Global Step: 3750   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-10 23:37:22,352-Speed 3338.43 samples/sec   Loss 17.3572   LearningRate 0.0978   Epoch: 0   Global Step: 3760   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-10 23:37:25,433-Speed 3324.81 samples/sec   Loss 17.3697   LearningRate 0.0978   Epoch: 0   Global Step: 3770   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-10 23:37:28,528-Speed 3309.37 samples/sec   Loss 17.1190   LearningRate 0.0977   Epoch: 0   Global Step: 3780   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-10 23:37:31,614-Speed 3319.81 samples/sec   Loss 17.0945   LearningRate 0.0977   Epoch: 0   Global Step: 3790   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-10 23:37:34,712-Speed 3305.27 samples/sec   Loss 17.1244   LearningRate 0.0977   Epoch: 0   Global Step: 3800   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-10 23:37:37,801-Speed 3316.50 samples/sec   Loss 17.1017   LearningRate 0.0977   Epoch: 0   Global Step: 3810   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-10 23:37:40,927-Speed 3276.43 samples/sec   Loss 17.0591   LearningRate 0.0977   Epoch: 0   Global Step: 3820   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-10 23:37:43,996-Speed 3337.19 samples/sec   Loss 17.0155   LearningRate 0.0977   Epoch: 0   Global Step: 3830   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-10 23:37:47,075-Speed 3327.05 samples/sec   Loss 17.0781   LearningRate 0.0977   Epoch: 0   Global Step: 3840   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-10 23:37:50,150-Speed 3330.63 samples/sec   Loss 17.0488   LearningRate 0.0977   Epoch: 0   Global Step: 3850   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-10 23:37:53,234-Speed 3321.41 samples/sec   Loss 17.0458   LearningRate 0.0977   Epoch: 0   Global Step: 3860   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-10 23:37:56,314-Speed 3325.88 samples/sec   Loss 16.8396   LearningRate 0.0977   Epoch: 0   Global Step: 3870   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-10 23:37:59,401-Speed 3318.18 samples/sec   Loss 16.8454   LearningRate 0.0977   Epoch: 0   Global Step: 3880   Fp16 Grad Scale: 262144   Required: 32 hours
Training: 2022-04-10 23:38:02,464-Speed 3343.18 samples/sec   Loss 16.8337   LearningRate 0.0977   Epoch: 0   Global Step: 3890   Fp16 Grad Scale: 262144   Required: 32 hours
Training: 2022-04-10 23:38:05,533-Speed 3337.72 samples/sec   Loss 16.8376   LearningRate 0.0977   Epoch: 0   Global Step: 3900   Fp16 Grad Scale: 262144   Required: 32 hours
Training: 2022-04-10 23:38:08,582-Speed 3359.19 samples/sec   Loss 16.6377   LearningRate 0.0977   Epoch: 0   Global Step: 3910   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-10 23:38:11,645-Speed 3343.78 samples/sec   Loss 16.8371   LearningRate 0.0977   Epoch: 0   Global Step: 3920   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-10 23:38:14,721-Speed 3330.23 samples/sec   Loss 16.4798   LearningRate 0.0977   Epoch: 0   Global Step: 3930   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-10 23:38:17,830-Speed 3294.00 samples/sec   Loss 16.6665   LearningRate 0.0977   Epoch: 0   Global Step: 3940   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-10 23:38:20,899-Speed 3337.59 samples/sec   Loss 16.5203   LearningRate 0.0976   Epoch: 0   Global Step: 3950   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-10 23:38:23,967-Speed 3338.45 samples/sec   Loss 16.5978   LearningRate 0.0976   Epoch: 0   Global Step: 3960   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-10 23:38:27,035-Speed 3338.43 samples/sec   Loss 16.6030   LearningRate 0.0976   Epoch: 0   Global Step: 3970   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-10 23:38:30,110-Speed 3330.40 samples/sec   Loss 16.5513   LearningRate 0.0976   Epoch: 0   Global Step: 3980   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-10 23:38:33,178-Speed 3338.31 samples/sec   Loss 16.5722   LearningRate 0.0976   Epoch: 0   Global Step: 3990   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-10 23:38:36,258-Speed 3325.94 samples/sec   Loss 16.3990   LearningRate 0.0976   Epoch: 0   Global Step: 4000   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-10 23:39:20,345-[lfw][4000]XNorm: 23.358341
Training: 2022-04-10 23:39:20,346-[lfw][4000]Accuracy-Flip: 0.99000+-0.00471
Training: 2022-04-10 23:39:20,346-[lfw][4000]Accuracy-Highest: 0.99000
Training: 2022-04-10 23:40:11,831-[cfp_fp][4000]XNorm: 21.006666
Training: 2022-04-10 23:40:11,832-[cfp_fp][4000]Accuracy-Flip: 0.88471+-0.01284
Training: 2022-04-10 23:40:11,832-[cfp_fp][4000]Accuracy-Highest: 0.88471
Training: 2022-04-10 23:40:56,304-[agedb_30][4000]XNorm: 22.891768
Training: 2022-04-10 23:40:56,305-[agedb_30][4000]Accuracy-Flip: 0.90267+-0.01750
Training: 2022-04-10 23:40:56,305-[agedb_30][4000]Accuracy-Highest: 0.90267
Training: 2022-04-10 23:40:59,383-Speed 71.55 samples/sec   Loss 16.3469   LearningRate 0.0976   Epoch: 0   Global Step: 4010   Fp16 Grad Scale: 262144   Required: 35 hours
Training: 2022-04-10 23:41:02,451-Speed 3337.92 samples/sec   Loss 16.3846   LearningRate 0.0976   Epoch: 0   Global Step: 4020   Fp16 Grad Scale: 262144   Required: 35 hours
Training: 2022-04-10 23:41:05,505-Speed 3354.62 samples/sec   Loss 16.4279   LearningRate 0.0976   Epoch: 0   Global Step: 4030   Fp16 Grad Scale: 262144   Required: 35 hours
Training: 2022-04-10 23:41:08,772-Speed 3135.13 samples/sec   Loss 16.2523   LearningRate 0.0976   Epoch: 0   Global Step: 4040   Fp16 Grad Scale: 262144   Required: 35 hours
Training: 2022-04-10 23:41:11,844-Speed 3333.91 samples/sec   Loss 16.4456   LearningRate 0.0976   Epoch: 0   Global Step: 4050   Fp16 Grad Scale: 262144   Required: 35 hours
Training: 2022-04-10 23:41:15,051-Speed 3193.18 samples/sec   Loss 16.1268   LearningRate 0.0976   Epoch: 0   Global Step: 4060   Fp16 Grad Scale: 262144   Required: 35 hours
Training: 2022-04-10 23:41:18,204-Speed 3249.18 samples/sec   Loss 15.9846   LearningRate 0.0976   Epoch: 0   Global Step: 4070   Fp16 Grad Scale: 262144   Required: 35 hours
Training: 2022-04-10 23:41:21,244-Speed 3368.94 samples/sec   Loss 16.0773   LearningRate 0.0976   Epoch: 0   Global Step: 4080   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-04-10 23:41:24,314-Speed 3336.17 samples/sec   Loss 15.9743   LearningRate 0.0976   Epoch: 0   Global Step: 4090   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-04-10 23:41:27,455-Speed 3261.09 samples/sec   Loss 16.2596   LearningRate 0.0976   Epoch: 0   Global Step: 4100   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-04-10 23:41:30,561-Speed 3297.46 samples/sec   Loss 16.1184   LearningRate 0.0976   Epoch: 0   Global Step: 4110   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-04-10 23:41:33,624-Speed 3343.69 samples/sec   Loss 16.0398   LearningRate 0.0975   Epoch: 0   Global Step: 4120   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-04-10 23:41:36,781-Speed 3244.09 samples/sec   Loss 15.9967   LearningRate 0.0975   Epoch: 0   Global Step: 4130   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-04-10 23:41:39,880-Speed 3304.78 samples/sec   Loss 16.0780   LearningRate 0.0975   Epoch: 0   Global Step: 4140   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-04-10 23:41:42,945-Speed 3342.71 samples/sec   Loss 16.0638   LearningRate 0.0975   Epoch: 0   Global Step: 4150   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-04-10 23:41:46,120-Speed 3225.52 samples/sec   Loss 15.7937   LearningRate 0.0975   Epoch: 0   Global Step: 4160   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-04-10 23:41:49,330-Speed 3190.41 samples/sec   Loss 15.8767   LearningRate 0.0975   Epoch: 0   Global Step: 4170   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-04-10 23:41:52,441-Speed 3292.91 samples/sec   Loss 16.0731   LearningRate 0.0975   Epoch: 0   Global Step: 4180   Fp16 Grad Scale: 262144   Required: 35 hours
Training: 2022-04-10 23:41:55,504-Speed 3343.93 samples/sec   Loss 15.9834   LearningRate 0.0975   Epoch: 0   Global Step: 4190   Fp16 Grad Scale: 262144   Required: 35 hours
Training: 2022-04-10 23:41:58,569-Speed 3341.36 samples/sec   Loss 16.1104   LearningRate 0.0975   Epoch: 0   Global Step: 4200   Fp16 Grad Scale: 262144   Required: 35 hours
Training: 2022-04-10 23:42:01,706-Speed 3265.44 samples/sec   Loss 15.9383   LearningRate 0.0975   Epoch: 0   Global Step: 4210   Fp16 Grad Scale: 262144   Required: 35 hours
Training: 2022-04-10 23:42:04,792-Speed 3318.78 samples/sec   Loss 15.9781   LearningRate 0.0975   Epoch: 0   Global Step: 4220   Fp16 Grad Scale: 262144   Required: 35 hours
Training: 2022-04-10 23:42:07,852-Speed 3347.35 samples/sec   Loss 15.8118   LearningRate 0.0975   Epoch: 0   Global Step: 4230   Fp16 Grad Scale: 262144   Required: 35 hours
Training: 2022-04-10 23:42:10,922-Speed 3336.28 samples/sec   Loss 15.7330   LearningRate 0.0975   Epoch: 0   Global Step: 4240   Fp16 Grad Scale: 262144   Required: 35 hours
Training: 2022-04-10 23:42:13,983-Speed 3346.24 samples/sec   Loss 15.6863   LearningRate 0.0975   Epoch: 0   Global Step: 4250   Fp16 Grad Scale: 262144   Required: 35 hours
Training: 2022-04-10 23:42:17,031-Speed 3360.00 samples/sec   Loss 15.7561   LearningRate 0.0975   Epoch: 0   Global Step: 4260   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-04-10 23:42:20,118-Speed 3317.77 samples/sec   Loss 15.6957   LearningRate 0.0975   Epoch: 0   Global Step: 4270   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-04-10 23:42:23,177-Speed 3349.13 samples/sec   Loss 15.7600   LearningRate 0.0975   Epoch: 0   Global Step: 4280   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-04-10 23:42:26,276-Speed 3305.23 samples/sec   Loss 15.9373   LearningRate 0.0974   Epoch: 0   Global Step: 4290   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-04-10 23:42:29,354-Speed 3326.96 samples/sec   Loss 15.5243   LearningRate 0.0974   Epoch: 0   Global Step: 4300   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-04-10 23:42:32,435-Speed 3324.76 samples/sec   Loss 15.6376   LearningRate 0.0974   Epoch: 0   Global Step: 4310   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-04-10 23:42:35,502-Speed 3339.66 samples/sec   Loss 15.5724   LearningRate 0.0974   Epoch: 0   Global Step: 4320   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-04-10 23:42:38,642-Speed 3261.74 samples/sec   Loss 15.4677   LearningRate 0.0974   Epoch: 0   Global Step: 4330   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-04-10 23:42:41,717-Speed 3331.50 samples/sec   Loss 15.4972   LearningRate 0.0974   Epoch: 0   Global Step: 4340   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-04-10 23:42:44,872-Speed 3246.46 samples/sec   Loss 15.3693   LearningRate 0.0974   Epoch: 0   Global Step: 4350   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-04-10 23:42:48,058-Speed 3214.44 samples/sec   Loss 15.6622   LearningRate 0.0974   Epoch: 0   Global Step: 4360   Fp16 Grad Scale: 262144   Required: 35 hours
Training: 2022-04-10 23:42:51,145-Speed 3317.79 samples/sec   Loss 15.4211   LearningRate 0.0974   Epoch: 0   Global Step: 4370   Fp16 Grad Scale: 262144   Required: 35 hours
Training: 2022-04-10 23:42:54,209-Speed 3343.30 samples/sec   Loss 15.4540   LearningRate 0.0974   Epoch: 0   Global Step: 4380   Fp16 Grad Scale: 262144   Required: 35 hours
Training: 2022-04-10 23:42:57,385-Speed 3225.75 samples/sec   Loss 15.2338   LearningRate 0.0974   Epoch: 0   Global Step: 4390   Fp16 Grad Scale: 262144   Required: 35 hours
Training: 2022-04-10 23:43:00,471-Speed 3318.83 samples/sec   Loss 15.2962   LearningRate 0.0974   Epoch: 0   Global Step: 4400   Fp16 Grad Scale: 262144   Required: 35 hours
Training: 2022-04-10 23:43:03,545-Speed 3331.99 samples/sec   Loss 15.4283   LearningRate 0.0974   Epoch: 0   Global Step: 4410   Fp16 Grad Scale: 262144   Required: 35 hours
Training: 2022-04-10 23:43:06,626-Speed 3324.82 samples/sec   Loss 15.2768   LearningRate 0.0974   Epoch: 0   Global Step: 4420   Fp16 Grad Scale: 262144   Required: 35 hours
Training: 2022-04-10 23:43:09,771-Speed 3256.68 samples/sec   Loss 15.2443   LearningRate 0.0974   Epoch: 0   Global Step: 4430   Fp16 Grad Scale: 262144   Required: 35 hours
Training: 2022-04-10 23:43:12,833-Speed 3344.27 samples/sec   Loss 15.2797   LearningRate 0.0974   Epoch: 0   Global Step: 4440   Fp16 Grad Scale: 262144   Required: 35 hours
Training: 2022-04-10 23:43:15,906-Speed 3333.55 samples/sec   Loss 15.3214   LearningRate 0.0974   Epoch: 0   Global Step: 4450   Fp16 Grad Scale: 262144   Required: 35 hours
Training: 2022-04-10 23:43:18,957-Speed 3356.63 samples/sec   Loss 15.3390   LearningRate 0.0973   Epoch: 0   Global Step: 4460   Fp16 Grad Scale: 262144   Required: 35 hours
Training: 2022-04-10 23:43:22,129-Speed 3229.25 samples/sec   Loss 15.1456   LearningRate 0.0973   Epoch: 0   Global Step: 4470   Fp16 Grad Scale: 262144   Required: 34 hours
Training: 2022-04-10 23:43:25,193-Speed 3343.38 samples/sec   Loss 14.9928   LearningRate 0.0973   Epoch: 0   Global Step: 4480   Fp16 Grad Scale: 262144   Required: 34 hours
Training: 2022-04-10 23:43:28,239-Speed 3361.86 samples/sec   Loss 15.1637   LearningRate 0.0973   Epoch: 0   Global Step: 4490   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-10 23:43:31,302-Speed 3344.19 samples/sec   Loss 15.2171   LearningRate 0.0973   Epoch: 0   Global Step: 4500   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-10 23:43:34,365-Speed 3343.75 samples/sec   Loss 15.1622   LearningRate 0.0973   Epoch: 0   Global Step: 4510   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-10 23:43:37,422-Speed 3350.29 samples/sec   Loss 15.0961   LearningRate 0.0973   Epoch: 0   Global Step: 4520   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-10 23:43:40,482-Speed 3348.45 samples/sec   Loss 15.1515   LearningRate 0.0973   Epoch: 0   Global Step: 4530   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-10 23:43:43,554-Speed 3334.71 samples/sec   Loss 14.7784   LearningRate 0.0973   Epoch: 0   Global Step: 4540   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-10 23:43:46,621-Speed 3339.14 samples/sec   Loss 15.0225   LearningRate 0.0973   Epoch: 0   Global Step: 4550   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-10 23:43:49,684-Speed 3343.55 samples/sec   Loss 14.8119   LearningRate 0.0973   Epoch: 0   Global Step: 4560   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-10 23:43:52,752-Speed 3339.01 samples/sec   Loss 14.8735   LearningRate 0.0973   Epoch: 0   Global Step: 4570   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-10 23:43:55,811-Speed 3348.39 samples/sec   Loss 14.8098   LearningRate 0.0973   Epoch: 0   Global Step: 4580   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-10 23:43:58,902-Speed 3313.53 samples/sec   Loss 14.8457   LearningRate 0.0973   Epoch: 0   Global Step: 4590   Fp16 Grad Scale: 262144   Required: 34 hours
Training: 2022-04-10 23:44:01,963-Speed 3345.79 samples/sec   Loss 14.9573   LearningRate 0.0973   Epoch: 0   Global Step: 4600   Fp16 Grad Scale: 262144   Required: 34 hours
Training: 2022-04-10 23:44:05,042-Speed 3327.17 samples/sec   Loss 14.7254   LearningRate 0.0973   Epoch: 0   Global Step: 4610   Fp16 Grad Scale: 262144   Required: 34 hours
Training: 2022-04-10 23:44:08,122-Speed 3325.44 samples/sec   Loss 14.6984   LearningRate 0.0973   Epoch: 0   Global Step: 4620   Fp16 Grad Scale: 262144   Required: 34 hours
Training: 2022-04-10 23:44:11,197-Speed 3330.92 samples/sec   Loss 14.7744   LearningRate 0.0972   Epoch: 0   Global Step: 4630   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-10 23:44:14,260-Speed 3344.22 samples/sec   Loss 14.8515   LearningRate 0.0972   Epoch: 0   Global Step: 4640   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-10 23:44:17,334-Speed 3331.63 samples/sec   Loss 14.6949   LearningRate 0.0972   Epoch: 0   Global Step: 4650   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-10 23:44:20,401-Speed 3339.19 samples/sec   Loss 14.7677   LearningRate 0.0972   Epoch: 0   Global Step: 4660   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-10 23:44:23,462-Speed 3347.37 samples/sec   Loss 14.7669   LearningRate 0.0972   Epoch: 0   Global Step: 4670   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-10 23:44:26,524-Speed 3344.28 samples/sec   Loss 14.7221   LearningRate 0.0972   Epoch: 0   Global Step: 4680   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-10 23:44:29,617-Speed 3311.90 samples/sec   Loss 14.6940   LearningRate 0.0972   Epoch: 0   Global Step: 4690   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-10 23:44:32,713-Speed 3307.38 samples/sec   Loss 14.7464   LearningRate 0.0972   Epoch: 0   Global Step: 4700   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-10 23:44:35,949-Speed 3165.48 samples/sec   Loss 14.7824   LearningRate 0.0972   Epoch: 0   Global Step: 4710   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-10 23:44:39,100-Speed 3250.31 samples/sec   Loss 14.6010   LearningRate 0.0972   Epoch: 0   Global Step: 4720   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-10 23:44:42,162-Speed 3346.04 samples/sec   Loss 14.4889   LearningRate 0.0972   Epoch: 0   Global Step: 4730   Fp16 Grad Scale: 262144   Required: 34 hours
Training: 2022-04-10 23:44:45,228-Speed 3340.11 samples/sec   Loss 14.5902   LearningRate 0.0972   Epoch: 0   Global Step: 4740   Fp16 Grad Scale: 262144   Required: 34 hours
Training: 2022-04-10 23:44:48,290-Speed 3345.02 samples/sec   Loss 14.4695   LearningRate 0.0972   Epoch: 0   Global Step: 4750   Fp16 Grad Scale: 262144   Required: 34 hours
Training: 2022-04-10 23:44:51,328-Speed 3371.32 samples/sec   Loss 14.5418   LearningRate 0.0972   Epoch: 0   Global Step: 4760   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-10 23:44:54,387-Speed 3348.97 samples/sec   Loss 14.3838   LearningRate 0.0972   Epoch: 0   Global Step: 4770   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-10 23:44:57,444-Speed 3350.32 samples/sec   Loss 14.5652   LearningRate 0.0972   Epoch: 0   Global Step: 4780   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-10 23:45:00,514-Speed 3336.10 samples/sec   Loss 14.5007   LearningRate 0.0972   Epoch: 0   Global Step: 4790   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-10 23:45:03,585-Speed 3334.96 samples/sec   Loss 14.2959   LearningRate 0.0971   Epoch: 0   Global Step: 4800   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-10 23:45:06,719-Speed 3267.80 samples/sec   Loss 14.4434   LearningRate 0.0971   Epoch: 0   Global Step: 4810   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-10 23:45:09,941-Speed 3179.20 samples/sec   Loss 14.2134   LearningRate 0.0971   Epoch: 0   Global Step: 4820   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-10 23:45:13,010-Speed 3337.07 samples/sec   Loss 14.3033   LearningRate 0.0971   Epoch: 0   Global Step: 4830   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-10 23:45:16,084-Speed 3332.66 samples/sec   Loss 14.3575   LearningRate 0.0971   Epoch: 0   Global Step: 4840   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-10 23:45:19,269-Speed 3215.94 samples/sec   Loss 14.2953   LearningRate 0.0971   Epoch: 0   Global Step: 4850   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-10 23:45:22,385-Speed 3287.32 samples/sec   Loss 14.4246   LearningRate 0.0971   Epoch: 0   Global Step: 4860   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-10 23:45:25,447-Speed 3344.05 samples/sec   Loss 14.1443   LearningRate 0.0971   Epoch: 0   Global Step: 4870   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-10 23:45:28,509-Speed 3344.76 samples/sec   Loss 14.1488   LearningRate 0.0971   Epoch: 0   Global Step: 4880   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-10 23:45:31,578-Speed 3338.25 samples/sec   Loss 14.4485   LearningRate 0.0971   Epoch: 0   Global Step: 4890   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-10 23:45:34,649-Speed 3335.43 samples/sec   Loss 14.3116   LearningRate 0.0971   Epoch: 0   Global Step: 4900   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-10 23:45:37,716-Speed 3339.02 samples/sec   Loss 14.3164   LearningRate 0.0971   Epoch: 0   Global Step: 4910   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-10 23:45:40,777-Speed 3346.88 samples/sec   Loss 14.1565   LearningRate 0.0971   Epoch: 0   Global Step: 4920   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-10 23:45:43,842-Speed 3341.92 samples/sec   Loss 14.2120   LearningRate 0.0971   Epoch: 0   Global Step: 4930   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-10 23:45:46,916-Speed 3331.38 samples/sec   Loss 14.1609   LearningRate 0.0971   Epoch: 0   Global Step: 4940   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-10 23:45:50,004-Speed 3316.60 samples/sec   Loss 14.2446   LearningRate 0.0971   Epoch: 0   Global Step: 4950   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-10 23:45:53,073-Speed 3337.55 samples/sec   Loss 14.1517   LearningRate 0.0971   Epoch: 0   Global Step: 4960   Fp16 Grad Scale: 262144   Required: 34 hours
Training: 2022-04-10 23:45:56,139-Speed 3340.81 samples/sec   Loss 14.1427   LearningRate 0.0970   Epoch: 0   Global Step: 4970   Fp16 Grad Scale: 262144   Required: 34 hours
Training: 2022-04-10 23:45:59,203-Speed 3343.10 samples/sec   Loss 14.0459   LearningRate 0.0970   Epoch: 0   Global Step: 4980   Fp16 Grad Scale: 262144   Required: 34 hours
Training: 2022-04-10 23:46:02,285-Speed 3323.51 samples/sec   Loss 13.9772   LearningRate 0.0970   Epoch: 0   Global Step: 4990   Fp16 Grad Scale: 262144   Required: 34 hours
Training: 2022-04-10 23:46:05,346-Speed 3346.23 samples/sec   Loss 14.0022   LearningRate 0.0970   Epoch: 0   Global Step: 5000   Fp16 Grad Scale: 262144   Required: 34 hours
Training: 2022-04-10 23:46:08,408-Speed 3345.01 samples/sec   Loss 14.0105   LearningRate 0.0970   Epoch: 0   Global Step: 5010   Fp16 Grad Scale: 262144   Required: 34 hours
Training: 2022-04-10 23:46:11,472-Speed 3343.63 samples/sec   Loss 13.8916   LearningRate 0.0970   Epoch: 0   Global Step: 5020   Fp16 Grad Scale: 262144   Required: 34 hours
Training: 2022-04-10 23:46:14,536-Speed 3342.30 samples/sec   Loss 13.9741   LearningRate 0.0970   Epoch: 0   Global Step: 5030   Fp16 Grad Scale: 262144   Required: 34 hours
Training: 2022-04-10 23:46:17,602-Speed 3340.35 samples/sec   Loss 13.9544   LearningRate 0.0970   Epoch: 0   Global Step: 5040   Fp16 Grad Scale: 262144   Required: 34 hours
Training: 2022-04-10 23:46:20,672-Speed 3336.87 samples/sec   Loss 13.9423   LearningRate 0.0970   Epoch: 0   Global Step: 5050   Fp16 Grad Scale: 262144   Required: 34 hours
Training: 2022-04-10 23:46:23,732-Speed 3346.67 samples/sec   Loss 13.9145   LearningRate 0.0970   Epoch: 0   Global Step: 5060   Fp16 Grad Scale: 262144   Required: 34 hours
Training: 2022-04-10 23:46:26,810-Speed 3327.41 samples/sec   Loss 13.7823   LearningRate 0.0970   Epoch: 0   Global Step: 5070   Fp16 Grad Scale: 262144   Required: 34 hours
Training: 2022-04-10 23:46:29,873-Speed 3344.21 samples/sec   Loss 13.8759   LearningRate 0.0970   Epoch: 0   Global Step: 5080   Fp16 Grad Scale: 262144   Required: 34 hours
Training: 2022-04-10 23:46:32,922-Speed 3359.34 samples/sec   Loss 13.9516   LearningRate 0.0970   Epoch: 0   Global Step: 5090   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-10 23:46:36,012-Speed 3315.33 samples/sec   Loss 14.0039   LearningRate 0.0970   Epoch: 0   Global Step: 5100   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-10 23:46:39,153-Speed 3261.45 samples/sec   Loss 13.8135   LearningRate 0.0970   Epoch: 0   Global Step: 5110   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-10 23:46:42,214-Speed 3346.12 samples/sec   Loss 13.8058   LearningRate 0.0970   Epoch: 0   Global Step: 5120   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-10 23:46:45,283-Speed 3336.96 samples/sec   Loss 13.7444   LearningRate 0.0970   Epoch: 0   Global Step: 5130   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-10 23:46:48,345-Speed 3344.92 samples/sec   Loss 13.7333   LearningRate 0.0969   Epoch: 0   Global Step: 5140   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-10 23:46:51,428-Speed 3322.04 samples/sec   Loss 13.8047   LearningRate 0.0969   Epoch: 0   Global Step: 5150   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-10 23:46:54,508-Speed 3326.37 samples/sec   Loss 13.8318   LearningRate 0.0969   Epoch: 0   Global Step: 5160   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-10 23:46:57,587-Speed 3326.40 samples/sec   Loss 13.7793   LearningRate 0.0969   Epoch: 0   Global Step: 5170   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-10 23:47:00,651-Speed 3342.41 samples/sec   Loss 13.8286   LearningRate 0.0969   Epoch: 0   Global Step: 5180   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-10 23:47:03,716-Speed 3342.16 samples/sec   Loss 13.5400   LearningRate 0.0969   Epoch: 0   Global Step: 5190   Fp16 Grad Scale: 262144   Required: 34 hours
Training: 2022-04-10 23:47:06,802-Speed 3318.91 samples/sec   Loss 13.7063   LearningRate 0.0969   Epoch: 0   Global Step: 5200   Fp16 Grad Scale: 262144   Required: 34 hours
Training: 2022-04-10 23:47:09,876-Speed 3331.80 samples/sec   Loss 13.6137   LearningRate 0.0969   Epoch: 0   Global Step: 5210   Fp16 Grad Scale: 262144   Required: 34 hours
Training: 2022-04-10 23:47:12,925-Speed 3359.29 samples/sec   Loss 13.6313   LearningRate 0.0969   Epoch: 0   Global Step: 5220   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-10 23:47:16,018-Speed 3311.58 samples/sec   Loss 13.6446   LearningRate 0.0969   Epoch: 0   Global Step: 5230   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-10 23:47:19,112-Speed 3310.09 samples/sec   Loss 13.5831   LearningRate 0.0969   Epoch: 0   Global Step: 5240   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-10 23:47:22,176-Speed 3343.61 samples/sec   Loss 13.5210   LearningRate 0.0969   Epoch: 0   Global Step: 5250   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-10 23:47:25,249-Speed 3332.67 samples/sec   Loss 13.4761   LearningRate 0.0969   Epoch: 0   Global Step: 5260   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-10 23:47:28,308-Speed 3348.34 samples/sec   Loss 13.3880   LearningRate 0.0969   Epoch: 0   Global Step: 5270   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-10 23:47:31,397-Speed 3315.26 samples/sec   Loss 13.6845   LearningRate 0.0969   Epoch: 0   Global Step: 5280   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-10 23:47:34,461-Speed 3343.41 samples/sec   Loss 13.5028   LearningRate 0.0969   Epoch: 0   Global Step: 5290   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-10 23:47:37,530-Speed 3338.02 samples/sec   Loss 13.3762   LearningRate 0.0968   Epoch: 0   Global Step: 5300   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-10 23:47:40,592-Speed 3344.48 samples/sec   Loss 13.4718   LearningRate 0.0968   Epoch: 0   Global Step: 5310   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-10 23:47:43,702-Speed 3293.31 samples/sec   Loss 13.4654   LearningRate 0.0968   Epoch: 0   Global Step: 5320   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-10 23:47:46,776-Speed 3332.34 samples/sec   Loss 13.2336   LearningRate 0.0968   Epoch: 0   Global Step: 5330   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-10 23:47:49,840-Speed 3343.12 samples/sec   Loss 13.2893   LearningRate 0.0968   Epoch: 0   Global Step: 5340   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-10 23:47:52,918-Speed 3328.17 samples/sec   Loss 13.4275   LearningRate 0.0968   Epoch: 0   Global Step: 5350   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-10 23:47:55,978-Speed 3346.80 samples/sec   Loss 13.3417   LearningRate 0.0968   Epoch: 0   Global Step: 5360   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-10 23:47:59,041-Speed 3343.32 samples/sec   Loss 13.4075   LearningRate 0.0968   Epoch: 0   Global Step: 5370   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-10 23:48:02,103-Speed 3344.77 samples/sec   Loss 13.2651   LearningRate 0.0968   Epoch: 0   Global Step: 5380   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-10 23:48:05,185-Speed 3323.71 samples/sec   Loss 13.2598   LearningRate 0.0968   Epoch: 0   Global Step: 5390   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-10 23:48:08,257-Speed 3334.32 samples/sec   Loss 13.2210   LearningRate 0.0968   Epoch: 0   Global Step: 5400   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-10 23:48:11,335-Speed 3328.05 samples/sec   Loss 13.3336   LearningRate 0.0968   Epoch: 0   Global Step: 5410   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-10 23:48:14,396-Speed 3346.03 samples/sec   Loss 13.2367   LearningRate 0.0968   Epoch: 0   Global Step: 5420   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-04-10 23:48:17,454-Speed 3349.14 samples/sec   Loss 13.2396   LearningRate 0.0968   Epoch: 0   Global Step: 5430   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-10 23:48:20,517-Speed 3344.00 samples/sec   Loss 13.3275   LearningRate 0.0968   Epoch: 0   Global Step: 5440   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-10 23:48:23,602-Speed 3320.37 samples/sec   Loss 13.0505   LearningRate 0.0968   Epoch: 0   Global Step: 5450   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-10 23:48:26,681-Speed 3326.61 samples/sec   Loss 13.3138   LearningRate 0.0968   Epoch: 0   Global Step: 5460   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-10 23:48:29,748-Speed 3339.15 samples/sec   Loss 13.2317   LearningRate 0.0967   Epoch: 0   Global Step: 5470   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-10 23:48:32,846-Speed 3306.47 samples/sec   Loss 13.1082   LearningRate 0.0967   Epoch: 0   Global Step: 5480   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-10 23:48:36,066-Speed 3181.52 samples/sec   Loss 13.1774   LearningRate 0.0967   Epoch: 0   Global Step: 5490   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-10 23:48:39,183-Speed 3285.68 samples/sec   Loss 12.9938   LearningRate 0.0967   Epoch: 0   Global Step: 5500   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-10 23:48:42,260-Speed 3328.55 samples/sec   Loss 13.2314   LearningRate 0.0967   Epoch: 0   Global Step: 5510   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-10 23:48:45,361-Speed 3302.76 samples/sec   Loss 13.1777   LearningRate 0.0967   Epoch: 0   Global Step: 5520   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-10 23:48:48,441-Speed 3325.29 samples/sec   Loss 13.0937   LearningRate 0.0967   Epoch: 0   Global Step: 5530   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-04-10 23:48:51,618-Speed 3223.55 samples/sec   Loss 12.9117   LearningRate 0.0967   Epoch: 0   Global Step: 5540   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-04-10 23:48:54,675-Speed 3350.52 samples/sec   Loss 13.0751   LearningRate 0.0967   Epoch: 0   Global Step: 5550   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-10 23:48:57,735-Speed 3347.33 samples/sec   Loss 13.0576   LearningRate 0.0967   Epoch: 0   Global Step: 5560   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-10 23:49:00,802-Speed 3340.33 samples/sec   Loss 13.1177   LearningRate 0.0967   Epoch: 0   Global Step: 5570   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-10 23:49:03,886-Speed 3321.13 samples/sec   Loss 13.0692   LearningRate 0.0967   Epoch: 0   Global Step: 5580   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-10 23:49:06,956-Speed 3335.95 samples/sec   Loss 12.9556   LearningRate 0.0967   Epoch: 0   Global Step: 5590   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-10 23:49:10,020-Speed 3342.92 samples/sec   Loss 12.8789   LearningRate 0.0967   Epoch: 0   Global Step: 5600   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-10 23:49:13,118-Speed 3306.59 samples/sec   Loss 13.0295   LearningRate 0.0967   Epoch: 0   Global Step: 5610   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-10 23:49:16,186-Speed 3337.99 samples/sec   Loss 13.0154   LearningRate 0.0967   Epoch: 0   Global Step: 5620   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-10 23:49:19,317-Speed 3271.12 samples/sec   Loss 12.7755   LearningRate 0.0967   Epoch: 0   Global Step: 5630   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-10 23:49:22,391-Speed 3332.18 samples/sec   Loss 12.8534   LearningRate 0.0966   Epoch: 0   Global Step: 5640   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-10 23:49:25,457-Speed 3340.75 samples/sec   Loss 12.7830   LearningRate 0.0966   Epoch: 0   Global Step: 5650   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-10 23:49:28,529-Speed 3334.81 samples/sec   Loss 13.0644   LearningRate 0.0966   Epoch: 0   Global Step: 5660   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-10 23:49:31,600-Speed 3335.05 samples/sec   Loss 13.0711   LearningRate 0.0966   Epoch: 0   Global Step: 5670   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-10 23:49:34,669-Speed 3336.74 samples/sec   Loss 12.7435   LearningRate 0.0966   Epoch: 0   Global Step: 5680   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-10 23:49:37,735-Speed 3341.53 samples/sec   Loss 12.7856   LearningRate 0.0966   Epoch: 0   Global Step: 5690   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-10 23:49:40,805-Speed 3335.66 samples/sec   Loss 12.8527   LearningRate 0.0966   Epoch: 0   Global Step: 5700   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-10 23:49:43,904-Speed 3305.08 samples/sec   Loss 12.7713   LearningRate 0.0966   Epoch: 0   Global Step: 5710   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-10 23:49:46,970-Speed 3340.24 samples/sec   Loss 12.4527   LearningRate 0.0966   Epoch: 0   Global Step: 5720   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-10 23:49:50,042-Speed 3334.56 samples/sec   Loss 12.6854   LearningRate 0.0966   Epoch: 0   Global Step: 5730   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-10 23:49:53,114-Speed 3334.46 samples/sec   Loss 12.7981   LearningRate 0.0966   Epoch: 0   Global Step: 5740   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-10 23:49:56,186-Speed 3334.60 samples/sec   Loss 12.6612   LearningRate 0.0966   Epoch: 0   Global Step: 5750   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-10 23:49:59,282-Speed 3308.14 samples/sec   Loss 12.7206   LearningRate 0.0966   Epoch: 0   Global Step: 5760   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-04-10 23:50:02,413-Speed 3271.49 samples/sec   Loss 12.7370   LearningRate 0.0966   Epoch: 0   Global Step: 5770   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-04-10 23:50:05,500-Speed 3317.10 samples/sec   Loss 12.7269   LearningRate 0.0966   Epoch: 0   Global Step: 5780   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-04-10 23:50:08,591-Speed 3313.53 samples/sec   Loss 12.6349   LearningRate 0.0966   Epoch: 0   Global Step: 5790   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-04-10 23:50:11,660-Speed 3337.91 samples/sec   Loss 12.6296   LearningRate 0.0966   Epoch: 0   Global Step: 5800   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-04-10 23:50:14,717-Speed 3350.62 samples/sec   Loss 12.8911   LearningRate 0.0965   Epoch: 0   Global Step: 5810   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-10 23:50:17,833-Speed 3287.07 samples/sec   Loss 12.5694   LearningRate 0.0965   Epoch: 0   Global Step: 5820   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-10 23:50:20,923-Speed 3314.88 samples/sec   Loss 12.5880   LearningRate 0.0965   Epoch: 0   Global Step: 5830   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-10 23:50:24,003-Speed 3325.09 samples/sec   Loss 12.5218   LearningRate 0.0965   Epoch: 0   Global Step: 5840   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-10 23:50:27,094-Speed 3314.00 samples/sec   Loss 12.5447   LearningRate 0.0965   Epoch: 0   Global Step: 5850   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-10 23:50:30,180-Speed 3318.96 samples/sec   Loss 12.6419   LearningRate 0.0965   Epoch: 0   Global Step: 5860   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-10 23:50:33,247-Speed 3339.36 samples/sec   Loss 12.5677   LearningRate 0.0965   Epoch: 0   Global Step: 5870   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-10 23:50:36,318-Speed 3335.84 samples/sec   Loss 12.4990   LearningRate 0.0965   Epoch: 0   Global Step: 5880   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-10 23:50:39,397-Speed 3326.33 samples/sec   Loss 12.7043   LearningRate 0.0965   Epoch: 0   Global Step: 5890   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-10 23:50:42,467-Speed 3336.01 samples/sec   Loss 12.7077   LearningRate 0.0965   Epoch: 0   Global Step: 5900   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-10 23:50:45,537-Speed 3336.69 samples/sec   Loss 12.5875   LearningRate 0.0965   Epoch: 0   Global Step: 5910   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-04-10 23:50:48,701-Speed 3237.71 samples/sec   Loss 12.3888   LearningRate 0.0965   Epoch: 0   Global Step: 5920   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-04-10 23:50:51,767-Speed 3339.65 samples/sec   Loss 12.3489   LearningRate 0.0965   Epoch: 0   Global Step: 5930   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-10 23:50:54,958-Speed 3210.55 samples/sec   Loss 12.4020   LearningRate 0.0965   Epoch: 0   Global Step: 5940   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-10 23:50:58,038-Speed 3325.69 samples/sec   Loss 12.4577   LearningRate 0.0965   Epoch: 0   Global Step: 5950   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-10 23:51:01,106-Speed 3338.43 samples/sec   Loss 12.4637   LearningRate 0.0965   Epoch: 0   Global Step: 5960   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-10 23:51:04,180-Speed 3331.86 samples/sec   Loss 12.4887   LearningRate 0.0965   Epoch: 0   Global Step: 5970   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-10 23:51:07,245-Speed 3341.98 samples/sec   Loss 12.2809   LearningRate 0.0964   Epoch: 0   Global Step: 5980   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-10 23:51:10,312-Speed 3339.15 samples/sec   Loss 12.2628   LearningRate 0.0964   Epoch: 0   Global Step: 5990   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-10 23:51:13,380-Speed 3337.90 samples/sec   Loss 12.3469   LearningRate 0.0964   Epoch: 0   Global Step: 6000   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-10 23:51:57,889-[lfw][6000]XNorm: 23.810201
Training: 2022-04-10 23:51:57,889-[lfw][6000]Accuracy-Flip: 0.99383+-0.00380
Training: 2022-04-10 23:51:57,890-[lfw][6000]Accuracy-Highest: 0.99383
Training: 2022-04-10 23:52:49,304-[cfp_fp][6000]XNorm: 21.597567
Training: 2022-04-10 23:52:49,304-[cfp_fp][6000]Accuracy-Flip: 0.93529+-0.00943
Training: 2022-04-10 23:52:49,305-[cfp_fp][6000]Accuracy-Highest: 0.93529
Training: 2022-04-10 23:53:33,436-[agedb_30][6000]XNorm: 23.468731
Training: 2022-04-10 23:53:33,436-[agedb_30][6000]Accuracy-Flip: 0.93900+-0.01455
Training: 2022-04-10 23:53:33,437-[agedb_30][6000]Accuracy-Highest: 0.93900
Training: 2022-04-10 23:53:36,498-Speed 71.55 samples/sec   Loss 12.1791   LearningRate 0.0964   Epoch: 0   Global Step: 6010   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-04-10 23:53:39,555-Speed 3350.29 samples/sec   Loss 12.3741   LearningRate 0.0964   Epoch: 0   Global Step: 6020   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-04-10 23:53:42,609-Speed 3354.25 samples/sec   Loss 12.4063   LearningRate 0.0964   Epoch: 0   Global Step: 6030   Fp16 Grad Scale: 262144   Required: 35 hours
Training: 2022-04-10 23:53:45,672-Speed 3343.88 samples/sec   Loss 12.2706   LearningRate 0.0964   Epoch: 0   Global Step: 6040   Fp16 Grad Scale: 262144   Required: 35 hours
Training: 2022-04-10 23:53:48,724-Speed 3356.01 samples/sec   Loss 12.2211   LearningRate 0.0964   Epoch: 0   Global Step: 6050   Fp16 Grad Scale: 262144   Required: 35 hours
Training: 2022-04-10 23:53:51,790-Speed 3339.75 samples/sec   Loss 12.2627   LearningRate 0.0964   Epoch: 0   Global Step: 6060   Fp16 Grad Scale: 262144   Required: 35 hours
Training: 2022-04-10 23:53:54,857-Speed 3340.57 samples/sec   Loss 12.3481   LearningRate 0.0964   Epoch: 0   Global Step: 6070   Fp16 Grad Scale: 262144   Required: 35 hours
Training: 2022-04-10 23:53:57,920-Speed 3342.94 samples/sec   Loss 12.2093   LearningRate 0.0964   Epoch: 0   Global Step: 6080   Fp16 Grad Scale: 262144   Required: 35 hours
Training: 2022-04-10 23:54:00,971-Speed 3357.28 samples/sec   Loss 12.2567   LearningRate 0.0964   Epoch: 0   Global Step: 6090   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-04-10 23:54:04,033-Speed 3345.20 samples/sec   Loss 12.2114   LearningRate 0.0964   Epoch: 0   Global Step: 6100   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-04-10 23:54:07,089-Speed 3351.06 samples/sec   Loss 12.1932   LearningRate 0.0964   Epoch: 0   Global Step: 6110   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-04-10 23:54:10,151-Speed 3345.00 samples/sec   Loss 12.0787   LearningRate 0.0964   Epoch: 0   Global Step: 6120   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-04-10 23:54:13,292-Speed 3261.96 samples/sec   Loss 12.2689   LearningRate 0.0964   Epoch: 0   Global Step: 6130   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-04-10 23:54:16,357-Speed 3341.93 samples/sec   Loss 12.1546   LearningRate 0.0964   Epoch: 0   Global Step: 6140   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-04-10 23:54:19,484-Speed 3275.63 samples/sec   Loss 12.1486   LearningRate 0.0963   Epoch: 0   Global Step: 6150   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-04-10 23:54:22,545-Speed 3345.07 samples/sec   Loss 12.0692   LearningRate 0.0963   Epoch: 0   Global Step: 6160   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-04-10 23:54:25,758-Speed 3187.79 samples/sec   Loss 12.0140   LearningRate 0.0963   Epoch: 0   Global Step: 6170   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-04-10 23:54:28,922-Speed 3237.00 samples/sec   Loss 12.1114   LearningRate 0.0963   Epoch: 0   Global Step: 6180   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-04-10 23:54:32,755-Speed 2672.34 samples/sec   Loss 12.0185   LearningRate 0.0963   Epoch: 0   Global Step: 6190   Fp16 Grad Scale: 262144   Required: 35 hours
Training: 2022-04-10 23:54:35,814-Speed 3348.47 samples/sec   Loss 12.1615   LearningRate 0.0963   Epoch: 0   Global Step: 6200   Fp16 Grad Scale: 262144   Required: 35 hours
Training: 2022-04-10 23:54:38,869-Speed 3352.56 samples/sec   Loss 12.1888   LearningRate 0.0963   Epoch: 0   Global Step: 6210   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-04-10 23:54:41,928-Speed 3348.23 samples/sec   Loss 12.1591   LearningRate 0.0963   Epoch: 0   Global Step: 6220   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-04-10 23:54:44,985-Speed 3349.88 samples/sec   Loss 12.0651   LearningRate 0.0963   Epoch: 0   Global Step: 6230   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-04-10 23:54:48,042-Speed 3350.86 samples/sec   Loss 11.8653   LearningRate 0.0963   Epoch: 0   Global Step: 6240   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-04-10 23:54:51,107-Speed 3341.93 samples/sec   Loss 12.1227   LearningRate 0.0963   Epoch: 0   Global Step: 6250   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-04-10 23:54:54,164-Speed 3350.73 samples/sec   Loss 11.8875   LearningRate 0.0963   Epoch: 0   Global Step: 6260   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-04-10 23:54:57,231-Speed 3339.85 samples/sec   Loss 11.8719   LearningRate 0.0963   Epoch: 0   Global Step: 6270   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-04-10 23:55:00,295-Speed 3343.05 samples/sec   Loss 11.9308   LearningRate 0.0963   Epoch: 0   Global Step: 6280   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-04-10 23:55:03,357-Speed 3344.63 samples/sec   Loss 12.0507   LearningRate 0.0963   Epoch: 0   Global Step: 6290   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-04-10 23:55:06,422-Speed 3341.58 samples/sec   Loss 11.8631   LearningRate 0.0963   Epoch: 0   Global Step: 6300   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-04-10 23:55:09,529-Speed 3297.05 samples/sec   Loss 11.8572   LearningRate 0.0963   Epoch: 0   Global Step: 6310   Fp16 Grad Scale: 262144   Required: 35 hours
Training: 2022-04-10 23:55:12,597-Speed 3338.60 samples/sec   Loss 11.8282   LearningRate 0.0962   Epoch: 0   Global Step: 6320   Fp16 Grad Scale: 262144   Required: 34 hours
Training: 2022-04-10 23:55:15,657-Speed 3346.93 samples/sec   Loss 12.0272   LearningRate 0.0962   Epoch: 0   Global Step: 6330   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-10 23:55:18,715-Speed 3349.84 samples/sec   Loss 11.9427   LearningRate 0.0962   Epoch: 0   Global Step: 6340   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-10 23:55:21,774-Speed 3347.42 samples/sec   Loss 11.8477   LearningRate 0.0962   Epoch: 0   Global Step: 6350   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-10 23:55:24,846-Speed 3334.71 samples/sec   Loss 11.7680   LearningRate 0.0962   Epoch: 0   Global Step: 6360   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-10 23:55:27,905-Speed 3347.68 samples/sec   Loss 11.9771   LearningRate 0.0962   Epoch: 0   Global Step: 6370   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-10 23:55:30,968-Speed 3344.95 samples/sec   Loss 11.8346   LearningRate 0.0962   Epoch: 0   Global Step: 6380   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-10 23:55:34,032-Speed 3341.93 samples/sec   Loss 11.8648   LearningRate 0.0962   Epoch: 0   Global Step: 6390   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-10 23:55:37,086-Speed 3354.05 samples/sec   Loss 11.8361   LearningRate 0.0962   Epoch: 0   Global Step: 6400   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-10 23:55:42,620-Speed 1850.76 samples/sec   Loss 11.6811   LearningRate 0.0962   Epoch: 0   Global Step: 6410   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-10 23:55:45,673-Speed 3353.87 samples/sec   Loss 11.8184   LearningRate 0.0962   Epoch: 0   Global Step: 6420   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-10 23:55:48,733-Speed 3348.15 samples/sec   Loss 11.8468   LearningRate 0.0962   Epoch: 0   Global Step: 6430   Fp16 Grad Scale: 262144   Required: 34 hours
Training: 2022-04-10 23:55:51,807-Speed 3331.52 samples/sec   Loss 11.8400   LearningRate 0.0962   Epoch: 0   Global Step: 6440   Fp16 Grad Scale: 262144   Required: 34 hours
Training: 2022-04-10 23:55:54,874-Speed 3339.43 samples/sec   Loss 11.7154   LearningRate 0.0962   Epoch: 0   Global Step: 6450   Fp16 Grad Scale: 262144   Required: 34 hours
Training: 2022-04-10 23:55:57,961-Speed 3318.63 samples/sec   Loss 11.8214   LearningRate 0.0962   Epoch: 0   Global Step: 6460   Fp16 Grad Scale: 262144   Required: 34 hours
Training: 2022-04-10 23:56:01,073-Speed 3291.17 samples/sec   Loss 11.7413   LearningRate 0.0962   Epoch: 0   Global Step: 6470   Fp16 Grad Scale: 262144   Required: 34 hours
Training: 2022-04-10 23:56:04,158-Speed 3320.05 samples/sec   Loss 11.8470   LearningRate 0.0962   Epoch: 0   Global Step: 6480   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-10 23:56:07,215-Speed 3350.53 samples/sec   Loss 11.6674   LearningRate 0.0961   Epoch: 0   Global Step: 6490   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-10 23:56:10,273-Speed 3349.06 samples/sec   Loss 11.5777   LearningRate 0.0961   Epoch: 0   Global Step: 6500   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-10 23:56:13,333-Speed 3347.09 samples/sec   Loss 11.6526   LearningRate 0.0961   Epoch: 0   Global Step: 6510   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-10 23:56:16,445-Speed 3292.62 samples/sec   Loss 11.7201   LearningRate 0.0961   Epoch: 0   Global Step: 6520   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-10 23:56:19,534-Speed 3315.85 samples/sec   Loss 11.5953   LearningRate 0.0961   Epoch: 0   Global Step: 6530   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-10 23:56:22,592-Speed 3348.73 samples/sec   Loss 11.5344   LearningRate 0.0961   Epoch: 0   Global Step: 6540   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-10 23:56:25,656-Speed 3342.70 samples/sec   Loss 11.7552   LearningRate 0.0961   Epoch: 0   Global Step: 6550   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-10 23:56:28,733-Speed 3329.31 samples/sec   Loss 11.6085   LearningRate 0.0961   Epoch: 0   Global Step: 6560   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-10 23:56:31,818-Speed 3320.08 samples/sec   Loss 11.5651   LearningRate 0.0961   Epoch: 0   Global Step: 6570   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-10 23:56:34,882-Speed 3343.49 samples/sec   Loss 11.7381   LearningRate 0.0961   Epoch: 0   Global Step: 6580   Fp16 Grad Scale: 262144   Required: 34 hours
Training: 2022-04-10 23:56:37,945-Speed 3343.40 samples/sec   Loss 11.5328   LearningRate 0.0961   Epoch: 0   Global Step: 6590   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-10 23:56:41,005-Speed 3350.61 samples/sec   Loss 11.6515   LearningRate 0.0961   Epoch: 0   Global Step: 6600   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-10 23:56:44,070-Speed 3341.18 samples/sec   Loss 11.6740   LearningRate 0.0961   Epoch: 0   Global Step: 6610   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-10 23:56:47,124-Speed 3354.24 samples/sec   Loss 11.5879   LearningRate 0.0961   Epoch: 0   Global Step: 6620   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-10 23:56:50,193-Speed 3337.90 samples/sec   Loss 11.5783   LearningRate 0.0961   Epoch: 0   Global Step: 6630   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-10 23:56:53,264-Speed 3334.32 samples/sec   Loss 11.5610   LearningRate 0.0961   Epoch: 0   Global Step: 6640   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-10 23:56:56,328-Speed 3343.18 samples/sec   Loss 11.5399   LearningRate 0.0961   Epoch: 0   Global Step: 6650   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-10 23:56:59,401-Speed 3333.05 samples/sec   Loss 11.5272   LearningRate 0.0960   Epoch: 0   Global Step: 6660   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-10 23:57:02,479-Speed 3327.65 samples/sec   Loss 11.5590   LearningRate 0.0960   Epoch: 0   Global Step: 6670   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-10 23:57:05,545-Speed 3340.94 samples/sec   Loss 11.6058   LearningRate 0.0960   Epoch: 0   Global Step: 6680   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-10 23:57:08,603-Speed 3348.97 samples/sec   Loss 11.5780   LearningRate 0.0960   Epoch: 0   Global Step: 6690   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-10 23:57:11,698-Speed 3310.68 samples/sec   Loss 11.6961   LearningRate 0.0960   Epoch: 0   Global Step: 6700   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-10 23:57:14,758-Speed 3347.51 samples/sec   Loss 11.5549   LearningRate 0.0960   Epoch: 0   Global Step: 6710   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-10 23:57:17,852-Speed 3309.88 samples/sec   Loss 11.4074   LearningRate 0.0960   Epoch: 0   Global Step: 6720   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-10 23:57:20,910-Speed 3349.36 samples/sec   Loss 11.4321   LearningRate 0.0960   Epoch: 0   Global Step: 6730   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-10 23:57:23,976-Speed 3340.70 samples/sec   Loss 11.4272   LearningRate 0.0960   Epoch: 0   Global Step: 6740   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-10 23:57:27,038-Speed 3345.27 samples/sec   Loss 11.3904   LearningRate 0.0960   Epoch: 0   Global Step: 6750   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-10 23:57:30,131-Speed 3311.56 samples/sec   Loss 11.2907   LearningRate 0.0960   Epoch: 0   Global Step: 6760   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-10 23:57:33,286-Speed 3246.09 samples/sec   Loss 11.3999   LearningRate 0.0960   Epoch: 0   Global Step: 6770   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-10 23:57:36,385-Speed 3306.09 samples/sec   Loss 11.2511   LearningRate 0.0960   Epoch: 0   Global Step: 6780   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-10 23:57:39,455-Speed 3336.36 samples/sec   Loss 11.3043   LearningRate 0.0960   Epoch: 0   Global Step: 6790   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-10 23:57:42,525-Speed 3335.69 samples/sec   Loss 11.2700   LearningRate 0.0960   Epoch: 0   Global Step: 6800   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-10 23:57:45,586-Speed 3346.29 samples/sec   Loss 11.4464   LearningRate 0.0960   Epoch: 0   Global Step: 6810   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-10 23:57:48,648-Speed 3344.80 samples/sec   Loss 11.3118   LearningRate 0.0960   Epoch: 0   Global Step: 6820   Fp16 Grad Scale: 262144   Required: 34 hours
Training: 2022-04-10 23:57:51,706-Speed 3350.14 samples/sec   Loss 11.3981   LearningRate 0.0959   Epoch: 0   Global Step: 6830   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-10 23:57:54,768-Speed 3344.87 samples/sec   Loss 11.2068   LearningRate 0.0959   Epoch: 0   Global Step: 6840   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-10 23:57:57,867-Speed 3304.78 samples/sec   Loss 11.2989   LearningRate 0.0959   Epoch: 0   Global Step: 6850   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-10 23:58:00,943-Speed 3329.87 samples/sec   Loss 11.4292   LearningRate 0.0959   Epoch: 0   Global Step: 6860   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-10 23:58:04,017-Speed 3332.28 samples/sec   Loss 11.1177   LearningRate 0.0959   Epoch: 0   Global Step: 6870   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-10 23:58:07,084-Speed 3339.09 samples/sec   Loss 11.2576   LearningRate 0.0959   Epoch: 0   Global Step: 6880   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-10 23:58:10,145-Speed 3346.72 samples/sec   Loss 11.2532   LearningRate 0.0959   Epoch: 0   Global Step: 6890   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-10 23:58:13,205-Speed 3346.03 samples/sec   Loss 11.3950   LearningRate 0.0959   Epoch: 0   Global Step: 6900   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-10 23:58:16,279-Speed 3332.71 samples/sec   Loss 11.2805   LearningRate 0.0959   Epoch: 0   Global Step: 6910   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-10 23:58:19,340-Speed 3345.88 samples/sec   Loss 11.3311   LearningRate 0.0959   Epoch: 0   Global Step: 6920   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-10 23:58:22,412-Speed 3334.85 samples/sec   Loss 11.0398   LearningRate 0.0959   Epoch: 0   Global Step: 6930   Fp16 Grad Scale: 262144   Required: 34 hours
Training: 2022-04-10 23:58:25,467-Speed 3352.59 samples/sec   Loss 11.2206   LearningRate 0.0959   Epoch: 0   Global Step: 6940   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-10 23:58:28,529-Speed 3345.10 samples/sec   Loss 11.0813   LearningRate 0.0959   Epoch: 0   Global Step: 6950   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-10 23:58:31,592-Speed 3344.05 samples/sec   Loss 11.3399   LearningRate 0.0959   Epoch: 0   Global Step: 6960   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-10 23:58:34,661-Speed 3337.07 samples/sec   Loss 11.2516   LearningRate 0.0959   Epoch: 0   Global Step: 6970   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-10 23:58:37,731-Speed 3336.35 samples/sec   Loss 11.3358   LearningRate 0.0959   Epoch: 0   Global Step: 6980   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-10 23:58:40,794-Speed 3343.85 samples/sec   Loss 11.2791   LearningRate 0.0959   Epoch: 0   Global Step: 6990   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-10 23:58:43,861-Speed 3339.00 samples/sec   Loss 11.2395   LearningRate 0.0959   Epoch: 0   Global Step: 7000   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-10 23:58:46,938-Speed 3329.15 samples/sec   Loss 11.1317   LearningRate 0.0958   Epoch: 0   Global Step: 7010   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-10 23:58:50,054-Speed 3287.28 samples/sec   Loss 11.1160   LearningRate 0.0958   Epoch: 0   Global Step: 7020   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-10 23:58:53,151-Speed 3306.92 samples/sec   Loss 11.1893   LearningRate 0.0958   Epoch: 0   Global Step: 7030   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-10 23:58:56,207-Speed 3351.99 samples/sec   Loss 11.1248   LearningRate 0.0958   Epoch: 0   Global Step: 7040   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-10 23:58:59,276-Speed 3337.52 samples/sec   Loss 11.1934   LearningRate 0.0958   Epoch: 0   Global Step: 7050   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-10 23:59:02,339-Speed 3343.75 samples/sec   Loss 11.2319   LearningRate 0.0958   Epoch: 0   Global Step: 7060   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-10 23:59:05,401-Speed 3344.87 samples/sec   Loss 11.0408   LearningRate 0.0958   Epoch: 0   Global Step: 7070   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-10 23:59:08,481-Speed 3324.91 samples/sec   Loss 10.9880   LearningRate 0.0958   Epoch: 0   Global Step: 7080   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-10 23:59:11,542-Speed 3346.51 samples/sec   Loss 11.1342   LearningRate 0.0958   Epoch: 0   Global Step: 7090   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-10 23:59:14,603-Speed 3346.64 samples/sec   Loss 11.2227   LearningRate 0.0958   Epoch: 0   Global Step: 7100   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-10 23:59:17,668-Speed 3341.76 samples/sec   Loss 11.0389   LearningRate 0.0958   Epoch: 0   Global Step: 7110   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-10 23:59:20,763-Speed 3308.76 samples/sec   Loss 11.0989   LearningRate 0.0958   Epoch: 0   Global Step: 7120   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-10 23:59:23,821-Speed 3349.52 samples/sec   Loss 11.0652   LearningRate 0.0958   Epoch: 0   Global Step: 7130   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-10 23:59:26,874-Speed 3355.27 samples/sec   Loss 11.1028   LearningRate 0.0958   Epoch: 0   Global Step: 7140   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-10 23:59:29,925-Speed 3356.47 samples/sec   Loss 11.0417   LearningRate 0.0958   Epoch: 0   Global Step: 7150   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-10 23:59:32,987-Speed 3344.44 samples/sec   Loss 10.9610   LearningRate 0.0958   Epoch: 0   Global Step: 7160   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-10 23:59:36,054-Speed 3340.35 samples/sec   Loss 10.9488   LearningRate 0.0958   Epoch: 0   Global Step: 7170   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-10 23:59:39,115-Speed 3346.47 samples/sec   Loss 10.9578   LearningRate 0.0957   Epoch: 0   Global Step: 7180   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-10 23:59:42,176-Speed 3345.74 samples/sec   Loss 10.8627   LearningRate 0.0957   Epoch: 0   Global Step: 7190   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-10 23:59:45,240-Speed 3342.72 samples/sec   Loss 10.9818   LearningRate 0.0957   Epoch: 0   Global Step: 7200   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-10 23:59:48,318-Speed 3327.83 samples/sec   Loss 10.9860   LearningRate 0.0957   Epoch: 0   Global Step: 7210   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-10 23:59:51,402-Speed 3320.79 samples/sec   Loss 11.0131   LearningRate 0.0957   Epoch: 0   Global Step: 7220   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-10 23:59:54,474-Speed 3334.51 samples/sec   Loss 10.8544   LearningRate 0.0957   Epoch: 0   Global Step: 7230   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-10 23:59:57,546-Speed 3334.21 samples/sec   Loss 10.9196   LearningRate 0.0957   Epoch: 0   Global Step: 7240   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-11 00:00:00,612-Speed 3339.87 samples/sec   Loss 10.8926   LearningRate 0.0957   Epoch: 0   Global Step: 7250   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-11 00:00:03,683-Speed 3335.39 samples/sec   Loss 10.9113   LearningRate 0.0957   Epoch: 0   Global Step: 7260   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-11 00:00:06,745-Speed 3345.89 samples/sec   Loss 10.8165   LearningRate 0.0957   Epoch: 0   Global Step: 7270   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-11 00:00:09,805-Speed 3346.44 samples/sec   Loss 10.8341   LearningRate 0.0957   Epoch: 0   Global Step: 7280   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-11 00:00:12,874-Speed 3337.34 samples/sec   Loss 10.9976   LearningRate 0.0957   Epoch: 0   Global Step: 7290   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-11 00:00:15,944-Speed 3336.14 samples/sec   Loss 10.8140   LearningRate 0.0957   Epoch: 0   Global Step: 7300   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-11 00:00:19,017-Speed 3333.77 samples/sec   Loss 10.7526   LearningRate 0.0957   Epoch: 0   Global Step: 7310   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-11 00:00:22,100-Speed 3322.46 samples/sec   Loss 10.9737   LearningRate 0.0957   Epoch: 0   Global Step: 7320   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-11 00:00:25,164-Speed 3342.65 samples/sec   Loss 10.8644   LearningRate 0.0957   Epoch: 0   Global Step: 7330   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-11 00:00:28,241-Speed 3329.01 samples/sec   Loss 10.9106   LearningRate 0.0957   Epoch: 0   Global Step: 7340   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-11 00:00:31,301-Speed 3347.25 samples/sec   Loss 10.8175   LearningRate 0.0956   Epoch: 0   Global Step: 7350   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-04-11 00:00:34,353-Speed 3355.84 samples/sec   Loss 10.8617   LearningRate 0.0956   Epoch: 0   Global Step: 7360   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 00:00:37,414-Speed 3345.42 samples/sec   Loss 10.8946   LearningRate 0.0956   Epoch: 0   Global Step: 7370   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 00:00:40,537-Speed 3279.67 samples/sec   Loss 10.7912   LearningRate 0.0956   Epoch: 0   Global Step: 7380   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 00:00:43,616-Speed 3327.04 samples/sec   Loss 10.7752   LearningRate 0.0956   Epoch: 0   Global Step: 7390   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 00:00:46,695-Speed 3325.78 samples/sec   Loss 10.7010   LearningRate 0.0956   Epoch: 0   Global Step: 7400   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 00:00:49,758-Speed 3344.15 samples/sec   Loss 10.7553   LearningRate 0.0956   Epoch: 0   Global Step: 7410   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 00:00:52,827-Speed 3337.96 samples/sec   Loss 10.6354   LearningRate 0.0956   Epoch: 0   Global Step: 7420   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 00:00:55,904-Speed 3329.00 samples/sec   Loss 10.8945   LearningRate 0.0956   Epoch: 0   Global Step: 7430   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 00:00:58,989-Speed 3319.84 samples/sec   Loss 10.8632   LearningRate 0.0956   Epoch: 0   Global Step: 7440   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 00:01:02,052-Speed 3344.09 samples/sec   Loss 10.5755   LearningRate 0.0956   Epoch: 0   Global Step: 7450   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 00:01:05,123-Speed 3335.24 samples/sec   Loss 10.7386   LearningRate 0.0956   Epoch: 0   Global Step: 7460   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 00:01:08,186-Speed 3343.96 samples/sec   Loss 10.7107   LearningRate 0.0956   Epoch: 0   Global Step: 7470   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 00:01:11,247-Speed 3345.89 samples/sec   Loss 10.8401   LearningRate 0.0956   Epoch: 0   Global Step: 7480   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 00:01:14,320-Speed 3333.33 samples/sec   Loss 10.7081   LearningRate 0.0956   Epoch: 0   Global Step: 7490   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 00:01:17,391-Speed 3335.20 samples/sec   Loss 10.6108   LearningRate 0.0956   Epoch: 0   Global Step: 7500   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 00:01:20,458-Speed 3339.72 samples/sec   Loss 10.6860   LearningRate 0.0956   Epoch: 0   Global Step: 7510   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 00:01:23,519-Speed 3345.88 samples/sec   Loss 10.6256   LearningRate 0.0955   Epoch: 0   Global Step: 7520   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 00:01:26,582-Speed 3343.68 samples/sec   Loss 10.7395   LearningRate 0.0955   Epoch: 0   Global Step: 7530   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 00:01:29,647-Speed 3341.55 samples/sec   Loss 10.7167   LearningRate 0.0955   Epoch: 0   Global Step: 7540   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 00:01:32,722-Speed 3331.17 samples/sec   Loss 10.6952   LearningRate 0.0955   Epoch: 0   Global Step: 7550   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 00:01:35,786-Speed 3343.01 samples/sec   Loss 10.6666   LearningRate 0.0955   Epoch: 0   Global Step: 7560   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 00:01:38,860-Speed 3331.91 samples/sec   Loss 10.7763   LearningRate 0.0955   Epoch: 0   Global Step: 7570   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 00:01:41,927-Speed 3339.67 samples/sec   Loss 10.6294   LearningRate 0.0955   Epoch: 0   Global Step: 7580   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 00:01:45,018-Speed 3313.45 samples/sec   Loss 10.5807   LearningRate 0.0955   Epoch: 0   Global Step: 7590   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 00:01:48,100-Speed 3322.91 samples/sec   Loss 10.5482   LearningRate 0.0955   Epoch: 0   Global Step: 7600   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 00:01:51,193-Speed 3311.87 samples/sec   Loss 10.5044   LearningRate 0.0955   Epoch: 0   Global Step: 7610   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 00:01:54,281-Speed 3317.42 samples/sec   Loss 10.4930   LearningRate 0.0955   Epoch: 0   Global Step: 7620   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 00:01:57,498-Speed 3183.29 samples/sec   Loss 10.5361   LearningRate 0.0955   Epoch: 0   Global Step: 7630   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 00:02:00,733-Speed 3166.61 samples/sec   Loss 10.5304   LearningRate 0.0955   Epoch: 0   Global Step: 7640   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 00:02:03,869-Speed 3265.47 samples/sec   Loss 10.5280   LearningRate 0.0955   Epoch: 0   Global Step: 7650   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 00:02:06,944-Speed 3330.81 samples/sec   Loss 10.4558   LearningRate 0.0955   Epoch: 0   Global Step: 7660   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 00:02:10,103-Speed 3243.08 samples/sec   Loss 10.6857   LearningRate 0.0955   Epoch: 0   Global Step: 7670   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 00:02:13,348-Speed 3156.01 samples/sec   Loss 10.5874   LearningRate 0.0955   Epoch: 0   Global Step: 7680   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 00:02:16,421-Speed 3333.03 samples/sec   Loss 10.5334   LearningRate 0.0954   Epoch: 0   Global Step: 7690   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 00:02:19,500-Speed 3326.68 samples/sec   Loss 10.4597   LearningRate 0.0954   Epoch: 0   Global Step: 7700   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 00:02:22,719-Speed 3181.87 samples/sec   Loss 10.4841   LearningRate 0.0954   Epoch: 0   Global Step: 7710   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 00:02:25,833-Speed 3289.15 samples/sec   Loss 10.4540   LearningRate 0.0954   Epoch: 0   Global Step: 7720   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 00:02:28,899-Speed 3340.71 samples/sec   Loss 10.5523   LearningRate 0.0954   Epoch: 0   Global Step: 7730   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-04-11 00:02:31,973-Speed 3332.29 samples/sec   Loss 10.5059   LearningRate 0.0954   Epoch: 0   Global Step: 7740   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-04-11 00:02:35,039-Speed 3339.42 samples/sec   Loss 10.5199   LearningRate 0.0954   Epoch: 0   Global Step: 7750   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-04-11 00:02:38,138-Speed 3305.43 samples/sec   Loss 10.4559   LearningRate 0.0954   Epoch: 0   Global Step: 7760   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-04-11 00:02:41,209-Speed 3334.63 samples/sec   Loss 10.4268   LearningRate 0.0954   Epoch: 0   Global Step: 7770   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-04-11 00:02:44,297-Speed 3317.66 samples/sec   Loss 10.3213   LearningRate 0.0954   Epoch: 0   Global Step: 7780   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-04-11 00:02:47,365-Speed 3338.51 samples/sec   Loss 10.5483   LearningRate 0.0954   Epoch: 0   Global Step: 7790   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-04-11 00:02:50,429-Speed 3342.57 samples/sec   Loss 10.4730   LearningRate 0.0954   Epoch: 0   Global Step: 7800   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 00:02:53,499-Speed 3337.18 samples/sec   Loss 10.4366   LearningRate 0.0954   Epoch: 0   Global Step: 7810   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 00:02:56,558-Speed 3348.19 samples/sec   Loss 10.2983   LearningRate 0.0954   Epoch: 0   Global Step: 7820   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 00:02:59,651-Speed 3311.31 samples/sec   Loss 10.5207   LearningRate 0.0954   Epoch: 0   Global Step: 7830   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 00:03:02,748-Speed 3306.84 samples/sec   Loss 10.3698   LearningRate 0.0954   Epoch: 0   Global Step: 7840   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 00:03:05,829-Speed 3324.28 samples/sec   Loss 10.3824   LearningRate 0.0954   Epoch: 0   Global Step: 7850   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 00:03:08,907-Speed 3328.22 samples/sec   Loss 10.6480   LearningRate 0.0953   Epoch: 0   Global Step: 7860   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 00:03:12,041-Speed 3267.99 samples/sec   Loss 10.3550   LearningRate 0.0953   Epoch: 0   Global Step: 7870   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 00:03:15,170-Speed 3273.64 samples/sec   Loss 10.3722   LearningRate 0.0953   Epoch: 0   Global Step: 7880   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 00:03:18,344-Speed 3226.53 samples/sec   Loss 10.3877   LearningRate 0.0953   Epoch: 0   Global Step: 7890   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 00:03:21,457-Speed 3290.96 samples/sec   Loss 10.3807   LearningRate 0.0953   Epoch: 0   Global Step: 7900   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 00:03:24,638-Speed 3219.06 samples/sec   Loss 10.2656   LearningRate 0.0953   Epoch: 0   Global Step: 7910   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 00:03:27,707-Speed 3337.03 samples/sec   Loss 10.3203   LearningRate 0.0953   Epoch: 0   Global Step: 7920   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 00:03:30,772-Speed 3342.37 samples/sec   Loss 10.3159   LearningRate 0.0953   Epoch: 0   Global Step: 7930   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 00:03:33,873-Speed 3303.27 samples/sec   Loss 10.2863   LearningRate 0.0953   Epoch: 0   Global Step: 7940   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 00:03:36,965-Speed 3311.64 samples/sec   Loss 10.2288   LearningRate 0.0953   Epoch: 0   Global Step: 7950   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 00:03:40,044-Speed 3327.42 samples/sec   Loss 10.1500   LearningRate 0.0953   Epoch: 0   Global Step: 7960   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 00:03:43,128-Speed 3321.20 samples/sec   Loss 10.2787   LearningRate 0.0953   Epoch: 0   Global Step: 7970   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 00:03:46,220-Speed 3312.84 samples/sec   Loss 10.2216   LearningRate 0.0953   Epoch: 0   Global Step: 7980   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 00:03:49,329-Speed 3293.54 samples/sec   Loss 10.1673   LearningRate 0.0953   Epoch: 0   Global Step: 7990   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 00:03:52,422-Speed 3312.58 samples/sec   Loss 10.2442   LearningRate 0.0953   Epoch: 0   Global Step: 8000   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 00:04:36,747-[lfw][8000]XNorm: 23.446368
Training: 2022-04-11 00:04:36,748-[lfw][8000]Accuracy-Flip: 0.99500+-0.00269
Training: 2022-04-11 00:04:36,748-[lfw][8000]Accuracy-Highest: 0.99500
Training: 2022-04-11 00:05:28,095-[cfp_fp][8000]XNorm: 21.446333
Training: 2022-04-11 00:05:28,096-[cfp_fp][8000]Accuracy-Flip: 0.94314+-0.01053
Training: 2022-04-11 00:05:28,096-[cfp_fp][8000]Accuracy-Highest: 0.94314
Training: 2022-04-11 00:06:12,248-[agedb_30][8000]XNorm: 23.388890
Training: 2022-04-11 00:06:12,248-[agedb_30][8000]Accuracy-Flip: 0.94683+-0.00993
Training: 2022-04-11 00:06:12,249-[agedb_30][8000]Accuracy-Highest: 0.94683
Training: 2022-04-11 00:06:15,318-Speed 71.66 samples/sec   Loss 10.0513   LearningRate 0.0953   Epoch: 0   Global Step: 8010   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-04-11 00:06:18,374-Speed 3351.55 samples/sec   Loss 10.2954   LearningRate 0.0953   Epoch: 0   Global Step: 8020   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-04-11 00:06:21,485-Speed 3292.19 samples/sec   Loss 10.2943   LearningRate 0.0952   Epoch: 0   Global Step: 8030   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-04-11 00:06:24,568-Speed 3322.26 samples/sec   Loss 10.0909   LearningRate 0.0952   Epoch: 0   Global Step: 8040   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-04-11 00:06:27,633-Speed 3342.44 samples/sec   Loss 10.2151   LearningRate 0.0952   Epoch: 0   Global Step: 8050   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-04-11 00:06:30,687-Speed 3353.36 samples/sec   Loss 10.1632   LearningRate 0.0952   Epoch: 0   Global Step: 8060   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-04-11 00:06:33,741-Speed 3353.89 samples/sec   Loss 10.3873   LearningRate 0.0952   Epoch: 0   Global Step: 8070   Fp16 Grad Scale: 131072   Required: 35 hours
Training: 2022-04-11 00:06:36,836-Speed 3309.00 samples/sec   Loss 10.2947   LearningRate 0.0952   Epoch: 0   Global Step: 8080   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-11 00:06:39,896-Speed 3347.53 samples/sec   Loss 10.1278   LearningRate 0.0952   Epoch: 0   Global Step: 8090   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-11 00:06:42,977-Speed 3323.60 samples/sec   Loss 10.3104   LearningRate 0.0952   Epoch: 0   Global Step: 8100   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-11 00:06:46,037-Speed 3347.45 samples/sec   Loss 10.2782   LearningRate 0.0952   Epoch: 0   Global Step: 8110   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-11 00:06:49,115-Speed 3328.43 samples/sec   Loss 10.1683   LearningRate 0.0952   Epoch: 0   Global Step: 8120   Fp16 Grad Scale: 262144   Required: 34 hours
Training: 2022-04-11 00:06:52,203-Speed 3315.76 samples/sec   Loss 10.1873   LearningRate 0.0952   Epoch: 0   Global Step: 8130   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-11 00:06:55,273-Speed 3336.40 samples/sec   Loss 10.1305   LearningRate 0.0952   Epoch: 0   Global Step: 8140   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-11 00:06:58,367-Speed 3310.53 samples/sec   Loss 10.1158   LearningRate 0.0952   Epoch: 0   Global Step: 8150   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-11 00:07:01,446-Speed 3326.20 samples/sec   Loss 10.1380   LearningRate 0.0952   Epoch: 0   Global Step: 8160   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-11 00:07:04,508-Speed 3344.97 samples/sec   Loss 10.0548   LearningRate 0.0952   Epoch: 0   Global Step: 8170   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-11 00:07:07,608-Speed 3304.88 samples/sec   Loss 10.1436   LearningRate 0.0952   Epoch: 0   Global Step: 8180   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-11 00:07:10,687-Speed 3326.72 samples/sec   Loss 10.1226   LearningRate 0.0952   Epoch: 0   Global Step: 8190   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-11 00:07:13,749-Speed 3344.46 samples/sec   Loss 10.2665   LearningRate 0.0951   Epoch: 0   Global Step: 8200   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-11 00:07:16,813-Speed 3343.31 samples/sec   Loss 10.0522   LearningRate 0.0951   Epoch: 0   Global Step: 8210   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-11 00:07:19,881-Speed 3338.45 samples/sec   Loss 10.1548   LearningRate 0.0951   Epoch: 0   Global Step: 8220   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-11 00:07:22,977-Speed 3308.24 samples/sec   Loss 10.2272   LearningRate 0.0951   Epoch: 0   Global Step: 8230   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-11 00:07:26,040-Speed 3343.76 samples/sec   Loss 10.1971   LearningRate 0.0951   Epoch: 0   Global Step: 8240   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-11 00:07:29,110-Speed 3335.76 samples/sec   Loss 10.1816   LearningRate 0.0951   Epoch: 0   Global Step: 8250   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-11 00:07:32,182-Speed 3334.03 samples/sec   Loss 10.1914   LearningRate 0.0951   Epoch: 0   Global Step: 8260   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-11 00:07:35,244-Speed 3345.17 samples/sec   Loss 10.0441   LearningRate 0.0951   Epoch: 0   Global Step: 8270   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-11 00:07:38,320-Speed 3330.21 samples/sec   Loss 10.1623   LearningRate 0.0951   Epoch: 0   Global Step: 8280   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-11 00:07:41,379-Speed 3348.19 samples/sec   Loss 9.9174   LearningRate 0.0951   Epoch: 0   Global Step: 8290   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-11 00:07:44,453-Speed 3331.41 samples/sec   Loss 10.0479   LearningRate 0.0951   Epoch: 0   Global Step: 8300   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-11 00:07:47,614-Speed 3240.76 samples/sec   Loss 10.0969   LearningRate 0.0951   Epoch: 0   Global Step: 8310   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-11 00:07:50,743-Speed 3273.41 samples/sec   Loss 9.9253   LearningRate 0.0951   Epoch: 0   Global Step: 8320   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-11 00:07:53,807-Speed 3342.07 samples/sec   Loss 9.8810   LearningRate 0.0951   Epoch: 0   Global Step: 8330   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-11 00:07:56,923-Speed 3287.40 samples/sec   Loss 9.9691   LearningRate 0.0951   Epoch: 0   Global Step: 8340   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-11 00:07:59,987-Speed 3342.80 samples/sec   Loss 9.9592   LearningRate 0.0951   Epoch: 0   Global Step: 8350   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-11 00:08:03,067-Speed 3325.36 samples/sec   Loss 9.9304   LearningRate 0.0951   Epoch: 0   Global Step: 8360   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-11 00:08:06,158-Speed 3313.71 samples/sec   Loss 10.0138   LearningRate 0.0950   Epoch: 0   Global Step: 8370   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-11 00:08:09,215-Speed 3350.61 samples/sec   Loss 10.0756   LearningRate 0.0950   Epoch: 0   Global Step: 8380   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-11 00:08:12,376-Speed 3240.59 samples/sec   Loss 10.0201   LearningRate 0.0950   Epoch: 0   Global Step: 8390   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-11 00:08:15,485-Speed 3294.43 samples/sec   Loss 10.0234   LearningRate 0.0950   Epoch: 0   Global Step: 8400   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-11 00:08:18,602-Speed 3285.77 samples/sec   Loss 9.8865   LearningRate 0.0950   Epoch: 0   Global Step: 8410   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-11 00:08:21,660-Speed 3349.10 samples/sec   Loss 9.9121   LearningRate 0.0950   Epoch: 0   Global Step: 8420   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-11 00:08:24,800-Speed 3262.85 samples/sec   Loss 9.9441   LearningRate 0.0950   Epoch: 0   Global Step: 8430   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-11 00:08:27,877-Speed 3328.27 samples/sec   Loss 10.0032   LearningRate 0.0950   Epoch: 0   Global Step: 8440   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-11 00:08:31,005-Speed 3274.23 samples/sec   Loss 9.9492   LearningRate 0.0950   Epoch: 0   Global Step: 8450   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-11 00:08:34,072-Speed 3339.67 samples/sec   Loss 9.9854   LearningRate 0.0950   Epoch: 0   Global Step: 8460   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-11 00:08:37,142-Speed 3336.80 samples/sec   Loss 9.9653   LearningRate 0.0950   Epoch: 0   Global Step: 8470   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-11 00:08:40,223-Speed 3323.28 samples/sec   Loss 9.8294   LearningRate 0.0950   Epoch: 0   Global Step: 8480   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-11 00:08:43,294-Speed 3335.88 samples/sec   Loss 9.9806   LearningRate 0.0950   Epoch: 0   Global Step: 8490   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-11 00:08:46,337-Speed 3365.15 samples/sec   Loss 9.7824   LearningRate 0.0950   Epoch: 0   Global Step: 8500   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-11 00:08:49,402-Speed 3342.41 samples/sec   Loss 9.9180   LearningRate 0.0950   Epoch: 0   Global Step: 8510   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-11 00:08:52,467-Speed 3341.59 samples/sec   Loss 9.8647   LearningRate 0.0950   Epoch: 0   Global Step: 8520   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-11 00:08:55,532-Speed 3341.69 samples/sec   Loss 9.8422   LearningRate 0.0950   Epoch: 0   Global Step: 8530   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-11 00:08:58,609-Speed 3329.14 samples/sec   Loss 9.8714   LearningRate 0.0949   Epoch: 0   Global Step: 8540   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-11 00:09:01,683-Speed 3331.71 samples/sec   Loss 10.0246   LearningRate 0.0949   Epoch: 0   Global Step: 8550   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-11 00:09:04,744-Speed 3345.30 samples/sec   Loss 9.9097   LearningRate 0.0949   Epoch: 0   Global Step: 8560   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-11 00:09:07,811-Speed 3340.27 samples/sec   Loss 9.8768   LearningRate 0.0949   Epoch: 0   Global Step: 8570   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-11 00:09:10,874-Speed 3344.16 samples/sec   Loss 9.8178   LearningRate 0.0949   Epoch: 0   Global Step: 8580   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-11 00:09:13,953-Speed 3326.15 samples/sec   Loss 9.8547   LearningRate 0.0949   Epoch: 0   Global Step: 8590   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-11 00:09:17,019-Speed 3340.44 samples/sec   Loss 9.8731   LearningRate 0.0949   Epoch: 0   Global Step: 8600   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-11 00:09:20,085-Speed 3341.35 samples/sec   Loss 9.9601   LearningRate 0.0949   Epoch: 0   Global Step: 8610   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-11 00:09:23,154-Speed 3336.97 samples/sec   Loss 9.8051   LearningRate 0.0949   Epoch: 0   Global Step: 8620   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-11 00:09:26,295-Speed 3261.91 samples/sec   Loss 9.7779   LearningRate 0.0949   Epoch: 0   Global Step: 8630   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-11 00:09:29,361-Speed 3340.35 samples/sec   Loss 9.8182   LearningRate 0.0949   Epoch: 0   Global Step: 8640   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-11 00:09:32,441-Speed 3324.75 samples/sec   Loss 9.8088   LearningRate 0.0949   Epoch: 0   Global Step: 8650   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-11 00:09:35,595-Speed 3247.30 samples/sec   Loss 9.9267   LearningRate 0.0949   Epoch: 0   Global Step: 8660   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-11 00:09:38,723-Speed 3274.49 samples/sec   Loss 9.7387   LearningRate 0.0949   Epoch: 0   Global Step: 8670   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-11 00:09:41,820-Speed 3307.52 samples/sec   Loss 9.7789   LearningRate 0.0949   Epoch: 0   Global Step: 8680   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-11 00:09:44,911-Speed 3314.30 samples/sec   Loss 9.7816   LearningRate 0.0949   Epoch: 0   Global Step: 8690   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-11 00:09:47,978-Speed 3338.87 samples/sec   Loss 9.7131   LearningRate 0.0949   Epoch: 0   Global Step: 8700   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-11 00:09:51,045-Speed 3339.83 samples/sec   Loss 9.8186   LearningRate 0.0948   Epoch: 0   Global Step: 8710   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-11 00:09:54,145-Speed 3303.51 samples/sec   Loss 9.7400   LearningRate 0.0948   Epoch: 0   Global Step: 8720   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-11 00:09:57,231-Speed 3319.86 samples/sec   Loss 9.7819   LearningRate 0.0948   Epoch: 0   Global Step: 8730   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-11 00:10:00,343-Speed 3290.54 samples/sec   Loss 9.6609   LearningRate 0.0948   Epoch: 0   Global Step: 8740   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-11 00:10:03,445-Speed 3302.08 samples/sec   Loss 9.6479   LearningRate 0.0948   Epoch: 0   Global Step: 8750   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-11 00:10:06,559-Speed 3289.60 samples/sec   Loss 9.5891   LearningRate 0.0948   Epoch: 0   Global Step: 8760   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-11 00:10:09,643-Speed 3321.09 samples/sec   Loss 9.7034   LearningRate 0.0948   Epoch: 0   Global Step: 8770   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-11 00:10:12,712-Speed 3337.64 samples/sec   Loss 9.7924   LearningRate 0.0948   Epoch: 0   Global Step: 8780   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-11 00:10:15,775-Speed 3343.11 samples/sec   Loss 9.7884   LearningRate 0.0948   Epoch: 0   Global Step: 8790   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-11 00:10:18,853-Speed 3327.87 samples/sec   Loss 9.8394   LearningRate 0.0948   Epoch: 0   Global Step: 8800   Fp16 Grad Scale: 262144   Required: 34 hours
Training: 2022-04-11 00:10:21,923-Speed 3336.12 samples/sec   Loss 9.7421   LearningRate 0.0948   Epoch: 0   Global Step: 8810   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-11 00:10:25,047-Speed 3280.21 samples/sec   Loss 9.6455   LearningRate 0.0948   Epoch: 0   Global Step: 8820   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-11 00:10:28,111-Speed 3343.03 samples/sec   Loss 9.7228   LearningRate 0.0948   Epoch: 0   Global Step: 8830   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-11 00:10:31,185-Speed 3332.69 samples/sec   Loss 9.5820   LearningRate 0.0948   Epoch: 0   Global Step: 8840   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-11 00:10:34,259-Speed 3331.37 samples/sec   Loss 9.6067   LearningRate 0.0948   Epoch: 0   Global Step: 8850   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-11 00:10:37,329-Speed 3336.43 samples/sec   Loss 9.7451   LearningRate 0.0948   Epoch: 0   Global Step: 8860   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-11 00:10:40,445-Speed 3286.52 samples/sec   Loss 9.5995   LearningRate 0.0948   Epoch: 0   Global Step: 8870   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-11 00:10:43,545-Speed 3304.25 samples/sec   Loss 9.5767   LearningRate 0.0948   Epoch: 0   Global Step: 8880   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-11 00:10:46,645-Speed 3304.00 samples/sec   Loss 9.6176   LearningRate 0.0947   Epoch: 0   Global Step: 8890   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-11 00:10:49,717-Speed 3333.42 samples/sec   Loss 9.5656   LearningRate 0.0947   Epoch: 0   Global Step: 8900   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-11 00:10:52,798-Speed 3324.92 samples/sec   Loss 9.6494   LearningRate 0.0947   Epoch: 0   Global Step: 8910   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-11 00:10:55,861-Speed 3343.57 samples/sec   Loss 9.6408   LearningRate 0.0947   Epoch: 0   Global Step: 8920   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-11 00:10:58,930-Speed 3337.98 samples/sec   Loss 9.7550   LearningRate 0.0947   Epoch: 0   Global Step: 8930   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-11 00:11:02,013-Speed 3321.84 samples/sec   Loss 9.7617   LearningRate 0.0947   Epoch: 0   Global Step: 8940   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-11 00:11:05,089-Speed 3330.60 samples/sec   Loss 9.7234   LearningRate 0.0947   Epoch: 0   Global Step: 8950   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-11 00:11:08,218-Speed 3273.24 samples/sec   Loss 9.5605   LearningRate 0.0947   Epoch: 0   Global Step: 8960   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-11 00:11:11,296-Speed 3326.89 samples/sec   Loss 9.4998   LearningRate 0.0947   Epoch: 0   Global Step: 8970   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-11 00:11:14,378-Speed 3323.37 samples/sec   Loss 9.5305   LearningRate 0.0947   Epoch: 0   Global Step: 8980   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-11 00:11:17,471-Speed 3312.27 samples/sec   Loss 9.6664   LearningRate 0.0947   Epoch: 0   Global Step: 8990   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-11 00:11:20,555-Speed 3320.58 samples/sec   Loss 9.7118   LearningRate 0.0947   Epoch: 0   Global Step: 9000   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-11 00:11:23,626-Speed 3335.07 samples/sec   Loss 9.6569   LearningRate 0.0947   Epoch: 0   Global Step: 9010   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-11 00:11:26,702-Speed 3329.53 samples/sec   Loss 9.4898   LearningRate 0.0947   Epoch: 0   Global Step: 9020   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-11 00:11:29,796-Speed 3310.62 samples/sec   Loss 9.5321   LearningRate 0.0947   Epoch: 0   Global Step: 9030   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-11 00:11:32,890-Speed 3310.29 samples/sec   Loss 9.5800   LearningRate 0.0947   Epoch: 0   Global Step: 9040   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-11 00:11:35,961-Speed 3335.18 samples/sec   Loss 9.4594   LearningRate 0.0947   Epoch: 0   Global Step: 9050   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-11 00:11:39,027-Speed 3341.57 samples/sec   Loss 9.6220   LearningRate 0.0946   Epoch: 0   Global Step: 9060   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-11 00:11:42,093-Speed 3340.79 samples/sec   Loss 9.6651   LearningRate 0.0946   Epoch: 0   Global Step: 9070   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-11 00:11:45,161-Speed 3338.09 samples/sec   Loss 9.5789   LearningRate 0.0946   Epoch: 0   Global Step: 9080   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-11 00:11:48,227-Speed 3340.44 samples/sec   Loss 9.5751   LearningRate 0.0946   Epoch: 0   Global Step: 9090   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-11 00:11:51,305-Speed 3327.50 samples/sec   Loss 9.5735   LearningRate 0.0946   Epoch: 0   Global Step: 9100   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-11 00:11:54,391-Speed 3319.15 samples/sec   Loss 9.4974   LearningRate 0.0946   Epoch: 0   Global Step: 9110   Fp16 Grad Scale: 262144   Required: 34 hours
Training: 2022-04-11 00:11:57,474-Speed 3321.66 samples/sec   Loss 9.4519   LearningRate 0.0946   Epoch: 0   Global Step: 9120   Fp16 Grad Scale: 262144   Required: 34 hours
Training: 2022-04-11 00:12:00,540-Speed 3341.35 samples/sec   Loss 9.4536   LearningRate 0.0946   Epoch: 0   Global Step: 9130   Fp16 Grad Scale: 262144   Required: 34 hours
Training: 2022-04-11 00:12:03,631-Speed 3313.08 samples/sec   Loss 9.4192   LearningRate 0.0946   Epoch: 0   Global Step: 9140   Fp16 Grad Scale: 262144   Required: 34 hours
Training: 2022-04-11 00:12:06,735-Speed 3300.30 samples/sec   Loss 9.5150   LearningRate 0.0946   Epoch: 0   Global Step: 9150   Fp16 Grad Scale: 262144   Required: 34 hours
Training: 2022-04-11 00:12:09,805-Speed 3336.24 samples/sec   Loss 9.6778   LearningRate 0.0946   Epoch: 0   Global Step: 9160   Fp16 Grad Scale: 262144   Required: 34 hours
Training: 2022-04-11 00:12:12,906-Speed 3303.34 samples/sec   Loss 9.4985   LearningRate 0.0946   Epoch: 0   Global Step: 9170   Fp16 Grad Scale: 262144   Required: 34 hours
Training: 2022-04-11 00:12:16,009-Speed 3300.45 samples/sec   Loss 9.4989   LearningRate 0.0946   Epoch: 0   Global Step: 9180   Fp16 Grad Scale: 262144   Required: 34 hours
Training: 2022-04-11 00:12:19,100-Speed 3313.93 samples/sec   Loss 9.4989   LearningRate 0.0946   Epoch: 0   Global Step: 9190   Fp16 Grad Scale: 262144   Required: 34 hours
Training: 2022-04-11 00:12:22,158-Speed 3350.44 samples/sec   Loss 9.5267   LearningRate 0.0946   Epoch: 0   Global Step: 9200   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-11 00:12:25,232-Speed 3331.13 samples/sec   Loss 9.5000   LearningRate 0.0946   Epoch: 0   Global Step: 9210   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-11 00:12:28,296-Speed 3343.94 samples/sec   Loss 9.5212   LearningRate 0.0946   Epoch: 0   Global Step: 9220   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-11 00:12:31,361-Speed 3340.90 samples/sec   Loss 9.4989   LearningRate 0.0945   Epoch: 0   Global Step: 9230   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-11 00:12:34,427-Speed 3341.43 samples/sec   Loss 9.3577   LearningRate 0.0945   Epoch: 0   Global Step: 9240   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-11 00:12:37,486-Speed 3347.94 samples/sec   Loss 9.4951   LearningRate 0.0945   Epoch: 0   Global Step: 9250   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-11 00:12:40,555-Speed 3337.40 samples/sec   Loss 9.5800   LearningRate 0.0945   Epoch: 0   Global Step: 9260   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-11 00:12:43,630-Speed 3330.79 samples/sec   Loss 9.4957   LearningRate 0.0945   Epoch: 0   Global Step: 9270   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-11 00:12:46,705-Speed 3331.60 samples/sec   Loss 9.2816   LearningRate 0.0945   Epoch: 0   Global Step: 9280   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-11 00:12:49,773-Speed 3338.16 samples/sec   Loss 9.4989   LearningRate 0.0945   Epoch: 0   Global Step: 9290   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-11 00:12:52,840-Speed 3339.45 samples/sec   Loss 9.5958   LearningRate 0.0945   Epoch: 0   Global Step: 9300   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-11 00:12:55,913-Speed 3332.54 samples/sec   Loss 9.4203   LearningRate 0.0945   Epoch: 0   Global Step: 9310   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 00:12:58,989-Speed 3329.67 samples/sec   Loss 9.4582   LearningRate 0.0945   Epoch: 0   Global Step: 9320   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 00:13:02,056-Speed 3340.04 samples/sec   Loss 9.3948   LearningRate 0.0945   Epoch: 0   Global Step: 9330   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 00:13:05,135-Speed 3326.60 samples/sec   Loss 9.4020   LearningRate 0.0945   Epoch: 0   Global Step: 9340   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 00:13:08,207-Speed 3334.09 samples/sec   Loss 9.3962   LearningRate 0.0945   Epoch: 0   Global Step: 9350   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 00:13:11,274-Speed 3339.81 samples/sec   Loss 9.3448   LearningRate 0.0945   Epoch: 0   Global Step: 9360   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 00:13:14,339-Speed 3341.24 samples/sec   Loss 9.4247   LearningRate 0.0945   Epoch: 0   Global Step: 9370   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 00:13:17,408-Speed 3338.17 samples/sec   Loss 9.4071   LearningRate 0.0945   Epoch: 0   Global Step: 9380   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 00:13:20,521-Speed 3290.05 samples/sec   Loss 9.5333   LearningRate 0.0945   Epoch: 0   Global Step: 9390   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 00:13:23,588-Speed 3339.27 samples/sec   Loss 9.3589   LearningRate 0.0944   Epoch: 0   Global Step: 9400   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 00:13:26,656-Speed 3338.85 samples/sec   Loss 9.1293   LearningRate 0.0944   Epoch: 0   Global Step: 9410   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 00:13:29,720-Speed 3342.40 samples/sec   Loss 9.3040   LearningRate 0.0944   Epoch: 0   Global Step: 9420   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 00:13:32,819-Speed 3305.28 samples/sec   Loss 9.2390   LearningRate 0.0944   Epoch: 0   Global Step: 9430   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 00:13:35,931-Speed 3291.64 samples/sec   Loss 9.3286   LearningRate 0.0944   Epoch: 0   Global Step: 9440   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 00:13:39,094-Speed 3237.50 samples/sec   Loss 9.3578   LearningRate 0.0944   Epoch: 0   Global Step: 9450   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 00:13:42,158-Speed 3343.86 samples/sec   Loss 9.2244   LearningRate 0.0944   Epoch: 0   Global Step: 9460   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 00:13:45,228-Speed 3335.90 samples/sec   Loss 9.3058   LearningRate 0.0944   Epoch: 0   Global Step: 9470   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 00:13:48,294-Speed 3340.02 samples/sec   Loss 9.3071   LearningRate 0.0944   Epoch: 0   Global Step: 9480   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 00:13:51,403-Speed 3294.02 samples/sec   Loss 9.3176   LearningRate 0.0944   Epoch: 0   Global Step: 9490   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 00:13:54,477-Speed 3332.63 samples/sec   Loss 9.3484   LearningRate 0.0944   Epoch: 0   Global Step: 9500   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 00:13:57,547-Speed 3336.05 samples/sec   Loss 9.3409   LearningRate 0.0944   Epoch: 0   Global Step: 9510   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 00:14:00,619-Speed 3334.20 samples/sec   Loss 9.2958   LearningRate 0.0944   Epoch: 0   Global Step: 9520   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 00:14:03,682-Speed 3343.89 samples/sec   Loss 9.3093   LearningRate 0.0944   Epoch: 0   Global Step: 9530   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 00:14:06,749-Speed 3341.55 samples/sec   Loss 9.2583   LearningRate 0.0944   Epoch: 0   Global Step: 9540   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 00:14:09,814-Speed 3340.92 samples/sec   Loss 9.2439   LearningRate 0.0944   Epoch: 0   Global Step: 9550   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 00:14:12,882-Speed 3339.41 samples/sec   Loss 9.3213   LearningRate 0.0944   Epoch: 0   Global Step: 9560   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 00:14:15,955-Speed 3332.35 samples/sec   Loss 9.1871   LearningRate 0.0943   Epoch: 0   Global Step: 9570   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 00:14:19,060-Speed 3299.27 samples/sec   Loss 9.2882   LearningRate 0.0943   Epoch: 0   Global Step: 9580   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 00:14:22,126-Speed 3340.67 samples/sec   Loss 9.1888   LearningRate 0.0943   Epoch: 0   Global Step: 9590   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 00:14:25,212-Speed 3318.67 samples/sec   Loss 9.2216   LearningRate 0.0943   Epoch: 0   Global Step: 9600   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 00:14:28,298-Speed 3319.97 samples/sec   Loss 9.3607   LearningRate 0.0943   Epoch: 0   Global Step: 9610   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-04-11 00:14:31,362-Speed 3343.01 samples/sec   Loss 9.1194   LearningRate 0.0943   Epoch: 0   Global Step: 9620   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-04-11 00:14:34,479-Speed 3285.87 samples/sec   Loss 9.2331   LearningRate 0.0943   Epoch: 0   Global Step: 9630   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 00:14:37,549-Speed 3335.56 samples/sec   Loss 9.1053   LearningRate 0.0943   Epoch: 0   Global Step: 9640   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 00:14:40,625-Speed 3330.68 samples/sec   Loss 9.1179   LearningRate 0.0943   Epoch: 0   Global Step: 9650   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 00:14:43,699-Speed 3331.58 samples/sec   Loss 9.2149   LearningRate 0.0943   Epoch: 0   Global Step: 9660   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 00:14:46,777-Speed 3328.09 samples/sec   Loss 9.2180   LearningRate 0.0943   Epoch: 0   Global Step: 9670   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 00:14:49,840-Speed 3343.88 samples/sec   Loss 9.1789   LearningRate 0.0943   Epoch: 0   Global Step: 9680   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 00:14:52,906-Speed 3340.09 samples/sec   Loss 9.1448   LearningRate 0.0943   Epoch: 0   Global Step: 9690   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 00:14:55,972-Speed 3340.50 samples/sec   Loss 9.1418   LearningRate 0.0943   Epoch: 0   Global Step: 9700   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 00:14:59,037-Speed 3341.79 samples/sec   Loss 9.2109   LearningRate 0.0943   Epoch: 0   Global Step: 9710   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 00:15:02,145-Speed 3295.18 samples/sec   Loss 9.1758   LearningRate 0.0943   Epoch: 0   Global Step: 9720   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 00:15:05,367-Speed 3180.37 samples/sec   Loss 9.2982   LearningRate 0.0943   Epoch: 0   Global Step: 9730   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 00:15:08,441-Speed 3332.39 samples/sec   Loss 9.1369   LearningRate 0.0942   Epoch: 0   Global Step: 9740   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 00:15:11,512-Speed 3335.01 samples/sec   Loss 9.0644   LearningRate 0.0942   Epoch: 0   Global Step: 9750   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 00:15:14,578-Speed 3340.97 samples/sec   Loss 9.1298   LearningRate 0.0942   Epoch: 0   Global Step: 9760   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 00:15:17,649-Speed 3335.16 samples/sec   Loss 9.1144   LearningRate 0.0942   Epoch: 0   Global Step: 9770   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 00:15:20,754-Speed 3298.53 samples/sec   Loss 9.1563   LearningRate 0.0942   Epoch: 0   Global Step: 9780   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 00:15:23,836-Speed 3323.11 samples/sec   Loss 9.0921   LearningRate 0.0942   Epoch: 0   Global Step: 9790   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 00:15:26,890-Speed 3353.71 samples/sec   Loss 9.0497   LearningRate 0.0942   Epoch: 0   Global Step: 9800   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 00:15:29,962-Speed 3333.79 samples/sec   Loss 9.0840   LearningRate 0.0942   Epoch: 0   Global Step: 9810   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 00:15:33,041-Speed 3326.73 samples/sec   Loss 9.0460   LearningRate 0.0942   Epoch: 0   Global Step: 9820   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 00:15:36,112-Speed 3335.07 samples/sec   Loss 9.1512   LearningRate 0.0942   Epoch: 0   Global Step: 9830   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 00:15:39,195-Speed 3322.30 samples/sec   Loss 9.0327   LearningRate 0.0942   Epoch: 0   Global Step: 9840   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 00:15:42,263-Speed 3338.99 samples/sec   Loss 9.1270   LearningRate 0.0942   Epoch: 0   Global Step: 9850   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 00:15:45,348-Speed 3319.54 samples/sec   Loss 9.0562   LearningRate 0.0942   Epoch: 0   Global Step: 9860   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 00:15:48,412-Speed 3343.27 samples/sec   Loss 8.9672   LearningRate 0.0942   Epoch: 0   Global Step: 9870   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 00:15:51,478-Speed 3339.91 samples/sec   Loss 9.2124   LearningRate 0.0942   Epoch: 0   Global Step: 9880   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 00:15:54,560-Speed 3323.22 samples/sec   Loss 9.0966   LearningRate 0.0942   Epoch: 0   Global Step: 9890   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 00:15:57,634-Speed 3332.03 samples/sec   Loss 9.1028   LearningRate 0.0942   Epoch: 0   Global Step: 9900   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 00:16:00,702-Speed 3339.61 samples/sec   Loss 9.0214   LearningRate 0.0942   Epoch: 0   Global Step: 9910   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 00:16:03,768-Speed 3339.68 samples/sec   Loss 9.1997   LearningRate 0.0941   Epoch: 0   Global Step: 9920   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 00:16:06,833-Speed 3342.44 samples/sec   Loss 9.0122   LearningRate 0.0941   Epoch: 0   Global Step: 9930   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 00:16:09,909-Speed 3329.32 samples/sec   Loss 8.9808   LearningRate 0.0941   Epoch: 0   Global Step: 9940   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 00:16:12,983-Speed 3331.72 samples/sec   Loss 9.0075   LearningRate 0.0941   Epoch: 0   Global Step: 9950   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 00:16:16,063-Speed 3325.95 samples/sec   Loss 9.0497   LearningRate 0.0941   Epoch: 0   Global Step: 9960   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 00:16:19,127-Speed 3342.02 samples/sec   Loss 8.9136   LearningRate 0.0941   Epoch: 0   Global Step: 9970   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 00:16:22,204-Speed 3329.69 samples/sec   Loss 8.9936   LearningRate 0.0941   Epoch: 0   Global Step: 9980   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 00:16:25,281-Speed 3328.73 samples/sec   Loss 9.0871   LearningRate 0.0941   Epoch: 0   Global Step: 9990   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 00:16:28,349-Speed 3338.50 samples/sec   Loss 9.0645   LearningRate 0.0941   Epoch: 0   Global Step: 10000   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 00:17:11,909-[lfw][10000]XNorm: 21.683097
Training: 2022-04-11 00:17:11,910-[lfw][10000]Accuracy-Flip: 0.99633+-0.00314
Training: 2022-04-11 00:17:11,910-[lfw][10000]Accuracy-Highest: 0.99633
Training: 2022-04-11 00:18:02,518-[cfp_fp][10000]XNorm: 20.045754
Training: 2022-04-11 00:18:02,519-[cfp_fp][10000]Accuracy-Flip: 0.94586+-0.01113
Training: 2022-04-11 00:18:02,519-[cfp_fp][10000]Accuracy-Highest: 0.94586
Training: 2022-04-11 00:18:45,997-[agedb_30][10000]XNorm: 21.611275
Training: 2022-04-11 00:18:45,998-[agedb_30][10000]Accuracy-Flip: 0.95700+-0.00849
Training: 2022-04-11 00:18:45,998-[agedb_30][10000]Accuracy-Highest: 0.95700
Training: 2022-04-11 00:18:49,054-Speed 72.78 samples/sec   Loss 9.0100   LearningRate 0.0941   Epoch: 0   Global Step: 10010   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-11 00:18:52,106-Speed 3355.70 samples/sec   Loss 8.8664   LearningRate 0.0941   Epoch: 0   Global Step: 10020   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-11 00:18:55,165-Speed 3347.97 samples/sec   Loss 9.0640   LearningRate 0.0941   Epoch: 0   Global Step: 10030   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-11 00:18:58,221-Speed 3351.56 samples/sec   Loss 8.9411   LearningRate 0.0941   Epoch: 0   Global Step: 10040   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-11 00:19:01,280-Speed 3348.55 samples/sec   Loss 8.9480   LearningRate 0.0941   Epoch: 0   Global Step: 10050   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-11 00:19:04,351-Speed 3334.84 samples/sec   Loss 9.0525   LearningRate 0.0941   Epoch: 0   Global Step: 10060   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-11 00:19:07,426-Speed 3331.08 samples/sec   Loss 8.9032   LearningRate 0.0941   Epoch: 0   Global Step: 10070   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-11 00:19:10,494-Speed 3338.83 samples/sec   Loss 8.9469   LearningRate 0.0941   Epoch: 0   Global Step: 10080   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-11 00:19:13,590-Speed 3308.21 samples/sec   Loss 9.0088   LearningRate 0.0940   Epoch: 0   Global Step: 10090   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-11 00:19:16,666-Speed 3329.79 samples/sec   Loss 8.9309   LearningRate 0.0940   Epoch: 0   Global Step: 10100   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-11 00:19:19,762-Speed 3308.01 samples/sec   Loss 9.0795   LearningRate 0.0940   Epoch: 0   Global Step: 10110   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-11 00:19:22,845-Speed 3322.16 samples/sec   Loss 8.7856   LearningRate 0.0940   Epoch: 0   Global Step: 10120   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-11 00:19:25,907-Speed 3345.11 samples/sec   Loss 8.9557   LearningRate 0.0940   Epoch: 0   Global Step: 10130   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-11 00:19:28,967-Speed 3347.51 samples/sec   Loss 9.0435   LearningRate 0.0940   Epoch: 0   Global Step: 10140   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-11 00:19:32,028-Speed 3345.60 samples/sec   Loss 9.0055   LearningRate 0.0940   Epoch: 0   Global Step: 10150   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-11 00:19:35,097-Speed 3337.83 samples/sec   Loss 8.8546   LearningRate 0.0940   Epoch: 0   Global Step: 10160   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-11 00:19:38,202-Speed 3298.57 samples/sec   Loss 8.9378   LearningRate 0.0940   Epoch: 0   Global Step: 10170   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-11 00:19:41,302-Speed 3304.19 samples/sec   Loss 8.8481   LearningRate 0.0940   Epoch: 0   Global Step: 10180   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-11 00:19:44,534-Speed 3168.88 samples/sec   Loss 8.8664   LearningRate 0.0940   Epoch: 0   Global Step: 10190   Fp16 Grad Scale: 262144   Required: 34 hours
Training: 2022-04-11 00:19:47,664-Speed 3271.48 samples/sec   Loss 8.9568   LearningRate 0.0940   Epoch: 0   Global Step: 10200   Fp16 Grad Scale: 262144   Required: 34 hours
Training: 2022-04-11 00:19:50,753-Speed 3316.78 samples/sec   Loss 8.9777   LearningRate 0.0940   Epoch: 0   Global Step: 10210   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-11 00:19:53,877-Speed 3278.88 samples/sec   Loss 9.0014   LearningRate 0.0940   Epoch: 0   Global Step: 10220   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-11 00:19:56,995-Speed 3284.65 samples/sec   Loss 8.7893   LearningRate 0.0940   Epoch: 0   Global Step: 10230   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-11 00:20:00,078-Speed 3322.01 samples/sec   Loss 8.8888   LearningRate 0.0940   Epoch: 0   Global Step: 10240   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-11 00:20:03,152-Speed 3332.42 samples/sec   Loss 8.9571   LearningRate 0.0940   Epoch: 0   Global Step: 10250   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-11 00:20:06,394-Speed 3159.09 samples/sec   Loss 8.8342   LearningRate 0.0939   Epoch: 0   Global Step: 10260   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-11 00:20:09,616-Speed 3178.42 samples/sec   Loss 8.8284   LearningRate 0.0939   Epoch: 0   Global Step: 10270   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-11 00:20:12,684-Speed 3338.46 samples/sec   Loss 8.9182   LearningRate 0.0939   Epoch: 0   Global Step: 10280   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-11 00:20:15,774-Speed 3315.37 samples/sec   Loss 8.9937   LearningRate 0.0939   Epoch: 0   Global Step: 10290   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-11 00:20:18,841-Speed 3339.73 samples/sec   Loss 8.8221   LearningRate 0.0939   Epoch: 0   Global Step: 10300   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-11 00:20:21,905-Speed 3342.57 samples/sec   Loss 8.8579   LearningRate 0.0939   Epoch: 0   Global Step: 10310   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-11 00:20:24,985-Speed 3325.57 samples/sec   Loss 8.7881   LearningRate 0.0939   Epoch: 0   Global Step: 10320   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-11 00:20:28,049-Speed 3342.83 samples/sec   Loss 9.0012   LearningRate 0.0939   Epoch: 0   Global Step: 10330   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-11 00:20:31,113-Speed 3342.89 samples/sec   Loss 8.9227   LearningRate 0.0939   Epoch: 0   Global Step: 10340   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-11 00:20:34,177-Speed 3342.44 samples/sec   Loss 8.8085   LearningRate 0.0939   Epoch: 0   Global Step: 10350   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-11 00:20:37,237-Speed 3346.54 samples/sec   Loss 8.7988   LearningRate 0.0939   Epoch: 0   Global Step: 10360   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-11 00:20:40,300-Speed 3344.06 samples/sec   Loss 8.7627   LearningRate 0.0939   Epoch: 0   Global Step: 10370   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-11 00:20:43,373-Speed 3334.40 samples/sec   Loss 8.8172   LearningRate 0.0939   Epoch: 0   Global Step: 10380   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-11 00:20:46,436-Speed 3343.02 samples/sec   Loss 8.8523   LearningRate 0.0939   Epoch: 0   Global Step: 10390   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-11 00:20:49,533-Speed 3307.92 samples/sec   Loss 8.8190   LearningRate 0.0939   Epoch: 0   Global Step: 10400   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-11 00:20:52,607-Speed 3331.77 samples/sec   Loss 8.7653   LearningRate 0.0939   Epoch: 0   Global Step: 10410   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-11 00:20:55,668-Speed 3346.13 samples/sec   Loss 8.7857   LearningRate 0.0939   Epoch: 0   Global Step: 10420   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-11 00:20:58,755-Speed 3317.30 samples/sec   Loss 8.7091   LearningRate 0.0938   Epoch: 0   Global Step: 10430   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-11 00:21:01,863-Speed 3296.11 samples/sec   Loss 8.7249   LearningRate 0.0938   Epoch: 0   Global Step: 10440   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-11 00:21:04,933-Speed 3335.72 samples/sec   Loss 8.7679   LearningRate 0.0938   Epoch: 0   Global Step: 10450   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-11 00:21:07,994-Speed 3346.19 samples/sec   Loss 8.7126   LearningRate 0.0938   Epoch: 0   Global Step: 10460   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-11 00:21:11,065-Speed 3336.05 samples/sec   Loss 8.7466   LearningRate 0.0938   Epoch: 0   Global Step: 10470   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-11 00:21:14,144-Speed 3326.31 samples/sec   Loss 8.8348   LearningRate 0.0938   Epoch: 0   Global Step: 10480   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-11 00:21:17,231-Speed 3318.07 samples/sec   Loss 8.7306   LearningRate 0.0938   Epoch: 0   Global Step: 10490   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-11 00:21:20,296-Speed 3341.00 samples/sec   Loss 8.7195   LearningRate 0.0938   Epoch: 0   Global Step: 10500   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-11 00:21:23,362-Speed 3340.78 samples/sec   Loss 8.8153   LearningRate 0.0938   Epoch: 0   Global Step: 10510   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-11 00:21:26,451-Speed 3315.69 samples/sec   Loss 8.6503   LearningRate 0.0938   Epoch: 0   Global Step: 10520   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-11 00:21:29,587-Speed 3266.68 samples/sec   Loss 8.6352   LearningRate 0.0938   Epoch: 0   Global Step: 10530   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-11 00:21:32,679-Speed 3312.50 samples/sec   Loss 8.7823   LearningRate 0.0938   Epoch: 0   Global Step: 10540   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-11 00:21:35,873-Speed 3207.00 samples/sec   Loss 8.8024   LearningRate 0.0938   Epoch: 0   Global Step: 10550   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-11 00:21:38,947-Speed 3331.64 samples/sec   Loss 8.7615   LearningRate 0.0938   Epoch: 0   Global Step: 10560   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-11 00:21:42,014-Speed 3339.68 samples/sec   Loss 8.8126   LearningRate 0.0938   Epoch: 0   Global Step: 10570   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-11 00:21:45,116-Speed 3301.99 samples/sec   Loss 8.6602   LearningRate 0.0938   Epoch: 0   Global Step: 10580   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-11 00:21:48,186-Speed 3335.88 samples/sec   Loss 8.8568   LearningRate 0.0938   Epoch: 0   Global Step: 10590   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-11 00:21:51,255-Speed 3337.66 samples/sec   Loss 8.7062   LearningRate 0.0938   Epoch: 0   Global Step: 10600   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-11 00:21:54,342-Speed 3318.25 samples/sec   Loss 8.7073   LearningRate 0.0937   Epoch: 0   Global Step: 10610   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-11 00:21:57,415-Speed 3332.59 samples/sec   Loss 8.6624   LearningRate 0.0937   Epoch: 0   Global Step: 10620   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-11 00:22:00,486-Speed 3335.16 samples/sec   Loss 8.6601   LearningRate 0.0937   Epoch: 0   Global Step: 10630   Fp16 Grad Scale: 262144   Required: 34 hours
Training: 2022-04-11 00:22:03,577-Speed 3314.28 samples/sec   Loss 8.5755   LearningRate 0.0937   Epoch: 0   Global Step: 10640   Fp16 Grad Scale: 262144   Required: 34 hours
Training: 2022-04-11 00:22:06,648-Speed 3335.60 samples/sec   Loss 8.6467   LearningRate 0.0937   Epoch: 0   Global Step: 10650   Fp16 Grad Scale: 262144   Required: 34 hours
Training: 2022-04-11 00:22:09,696-Speed 3359.69 samples/sec   Loss 8.6639   LearningRate 0.0937   Epoch: 0   Global Step: 10660   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-11 00:22:12,766-Speed 3336.15 samples/sec   Loss 8.6915   LearningRate 0.0937   Epoch: 0   Global Step: 10670   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-11 00:22:15,834-Speed 3338.60 samples/sec   Loss 8.6767   LearningRate 0.0937   Epoch: 0   Global Step: 10680   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-11 00:22:18,912-Speed 3328.08 samples/sec   Loss 8.6995   LearningRate 0.0937   Epoch: 0   Global Step: 10690   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-11 00:22:22,070-Speed 3242.80 samples/sec   Loss 8.7134   LearningRate 0.0937   Epoch: 0   Global Step: 10700   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-11 00:22:25,134-Speed 3343.49 samples/sec   Loss 8.6866   LearningRate 0.0937   Epoch: 0   Global Step: 10710   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-11 00:22:28,194-Speed 3346.94 samples/sec   Loss 8.5423   LearningRate 0.0937   Epoch: 0   Global Step: 10720   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-11 00:22:31,265-Speed 3335.30 samples/sec   Loss 8.7901   LearningRate 0.0937   Epoch: 0   Global Step: 10730   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-11 00:22:34,329-Speed 3343.06 samples/sec   Loss 8.5788   LearningRate 0.0937   Epoch: 0   Global Step: 10740   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-11 00:22:37,386-Speed 3349.80 samples/sec   Loss 8.6520   LearningRate 0.0937   Epoch: 0   Global Step: 10750   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-11 00:22:40,447-Speed 3346.17 samples/sec   Loss 8.5469   LearningRate 0.0937   Epoch: 0   Global Step: 10760   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-11 00:22:43,525-Speed 3328.70 samples/sec   Loss 8.5183   LearningRate 0.0937   Epoch: 0   Global Step: 10770   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-11 00:22:46,588-Speed 3343.17 samples/sec   Loss 8.4938   LearningRate 0.0936   Epoch: 0   Global Step: 10780   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-11 00:22:49,741-Speed 3248.29 samples/sec   Loss 8.7119   LearningRate 0.0936   Epoch: 0   Global Step: 10790   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-11 00:22:52,814-Speed 3333.10 samples/sec   Loss 8.6527   LearningRate 0.0936   Epoch: 0   Global Step: 10800   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-11 00:22:55,899-Speed 3320.41 samples/sec   Loss 8.6120   LearningRate 0.0936   Epoch: 0   Global Step: 10810   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-11 00:22:58,969-Speed 3336.89 samples/sec   Loss 8.6330   LearningRate 0.0936   Epoch: 0   Global Step: 10820   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-11 00:23:02,035-Speed 3340.47 samples/sec   Loss 8.6157   LearningRate 0.0936   Epoch: 0   Global Step: 10830   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-11 00:23:05,112-Speed 3328.53 samples/sec   Loss 8.5043   LearningRate 0.0936   Epoch: 0   Global Step: 10840   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-11 00:23:08,200-Speed 3317.41 samples/sec   Loss 8.4960   LearningRate 0.0936   Epoch: 0   Global Step: 10850   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-11 00:23:11,358-Speed 3242.85 samples/sec   Loss 8.6671   LearningRate 0.0936   Epoch: 0   Global Step: 10860   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-11 00:23:14,541-Speed 3217.69 samples/sec   Loss 8.7119   LearningRate 0.0936   Epoch: 0   Global Step: 10870   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-11 00:23:17,631-Speed 3315.17 samples/sec   Loss 8.4938   LearningRate 0.0936   Epoch: 0   Global Step: 10880   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-11 00:23:20,699-Speed 3338.12 samples/sec   Loss 8.4603   LearningRate 0.0936   Epoch: 0   Global Step: 10890   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-11 00:23:23,772-Speed 3332.42 samples/sec   Loss 8.5766   LearningRate 0.0936   Epoch: 0   Global Step: 10900   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-11 00:23:26,844-Speed 3335.20 samples/sec   Loss 8.5477   LearningRate 0.0936   Epoch: 0   Global Step: 10910   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-11 00:23:29,991-Speed 3254.22 samples/sec   Loss 8.6097   LearningRate 0.0936   Epoch: 0   Global Step: 10920   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-11 00:23:33,075-Speed 3321.09 samples/sec   Loss 8.4680   LearningRate 0.0936   Epoch: 0   Global Step: 10930   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-11 00:23:36,138-Speed 3343.88 samples/sec   Loss 8.5579   LearningRate 0.0936   Epoch: 0   Global Step: 10940   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-11 00:23:39,303-Speed 3236.11 samples/sec   Loss 8.5806   LearningRate 0.0935   Epoch: 0   Global Step: 10950   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-11 00:23:42,364-Speed 3345.48 samples/sec   Loss 8.5236   LearningRate 0.0935   Epoch: 0   Global Step: 10960   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-11 00:23:45,426-Speed 3345.77 samples/sec   Loss 8.4706   LearningRate 0.0935   Epoch: 0   Global Step: 10970   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-11 00:23:48,506-Speed 3325.52 samples/sec   Loss 8.5400   LearningRate 0.0935   Epoch: 0   Global Step: 10980   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-11 00:23:51,578-Speed 3333.80 samples/sec   Loss 8.5803   LearningRate 0.0935   Epoch: 0   Global Step: 10990   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-11 00:23:54,659-Speed 3324.91 samples/sec   Loss 8.5342   LearningRate 0.0935   Epoch: 0   Global Step: 11000   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-11 00:23:57,727-Speed 3339.00 samples/sec   Loss 8.6021   LearningRate 0.0935   Epoch: 0   Global Step: 11010   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-11 00:24:00,882-Speed 3246.15 samples/sec   Loss 8.6233   LearningRate 0.0935   Epoch: 0   Global Step: 11020   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-11 00:24:03,988-Speed 3298.25 samples/sec   Loss 8.5987   LearningRate 0.0935   Epoch: 0   Global Step: 11030   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-11 00:24:07,050-Speed 3344.55 samples/sec   Loss 8.5546   LearningRate 0.0935   Epoch: 0   Global Step: 11040   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-11 00:24:10,124-Speed 3331.48 samples/sec   Loss 8.5943   LearningRate 0.0935   Epoch: 0   Global Step: 11050   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-11 00:24:13,196-Speed 3333.77 samples/sec   Loss 8.4756   LearningRate 0.0935   Epoch: 0   Global Step: 11060   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-11 00:24:16,347-Speed 3251.09 samples/sec   Loss 8.4304   LearningRate 0.0935   Epoch: 0   Global Step: 11070   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-11 00:24:19,420-Speed 3333.79 samples/sec   Loss 8.5226   LearningRate 0.0935   Epoch: 0   Global Step: 11080   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-11 00:24:22,488-Speed 3337.83 samples/sec   Loss 8.4488   LearningRate 0.0935   Epoch: 0   Global Step: 11090   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-11 00:24:25,558-Speed 3336.01 samples/sec   Loss 8.4504   LearningRate 0.0935   Epoch: 0   Global Step: 11100   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-11 00:24:28,629-Speed 3335.43 samples/sec   Loss 8.3801   LearningRate 0.0935   Epoch: 0   Global Step: 11110   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-11 00:24:31,707-Speed 3328.16 samples/sec   Loss 8.4958   LearningRate 0.0934   Epoch: 0   Global Step: 11120   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 00:24:34,786-Speed 3326.20 samples/sec   Loss 8.4871   LearningRate 0.0934   Epoch: 0   Global Step: 11130   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 00:24:37,860-Speed 3332.01 samples/sec   Loss 8.5232   LearningRate 0.0934   Epoch: 0   Global Step: 11140   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 00:24:40,944-Speed 3321.07 samples/sec   Loss 8.4807   LearningRate 0.0934   Epoch: 0   Global Step: 11150   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 00:24:44,009-Speed 3341.69 samples/sec   Loss 8.4984   LearningRate 0.0934   Epoch: 0   Global Step: 11160   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 00:24:47,085-Speed 3330.64 samples/sec   Loss 8.4584   LearningRate 0.0934   Epoch: 0   Global Step: 11170   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 00:24:50,179-Speed 3310.30 samples/sec   Loss 8.4809   LearningRate 0.0934   Epoch: 0   Global Step: 11180   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 00:24:53,254-Speed 3329.93 samples/sec   Loss 8.4342   LearningRate 0.0934   Epoch: 0   Global Step: 11190   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 00:24:56,318-Speed 3342.75 samples/sec   Loss 8.4831   LearningRate 0.0934   Epoch: 0   Global Step: 11200   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 00:24:59,399-Speed 3324.60 samples/sec   Loss 8.3752   LearningRate 0.0934   Epoch: 0   Global Step: 11210   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 00:25:02,465-Speed 3340.89 samples/sec   Loss 8.4705   LearningRate 0.0934   Epoch: 0   Global Step: 11220   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-04-11 00:25:05,543-Speed 3327.35 samples/sec   Loss 8.4277   LearningRate 0.0934   Epoch: 0   Global Step: 11230   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-04-11 00:25:08,603-Speed 3347.34 samples/sec   Loss 8.5333   LearningRate 0.0934   Epoch: 0   Global Step: 11240   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-04-11 00:25:11,658-Speed 3353.09 samples/sec   Loss 8.4324   LearningRate 0.0934   Epoch: 0   Global Step: 11250   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 00:25:14,729-Speed 3335.77 samples/sec   Loss 8.3629   LearningRate 0.0934   Epoch: 0   Global Step: 11260   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 00:25:17,793-Speed 3342.13 samples/sec   Loss 8.4235   LearningRate 0.0934   Epoch: 0   Global Step: 11270   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 00:25:20,882-Speed 3316.40 samples/sec   Loss 8.4165   LearningRate 0.0934   Epoch: 0   Global Step: 11280   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 00:25:24,074-Speed 3208.28 samples/sec   Loss 8.4515   LearningRate 0.0934   Epoch: 0   Global Step: 11290   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 00:25:27,137-Speed 3343.32 samples/sec   Loss 8.4501   LearningRate 0.0933   Epoch: 0   Global Step: 11300   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 00:25:30,205-Speed 3339.39 samples/sec   Loss 8.3799   LearningRate 0.0933   Epoch: 0   Global Step: 11310   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 00:25:33,299-Speed 3309.89 samples/sec   Loss 8.4218   LearningRate 0.0933   Epoch: 0   Global Step: 11320   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 00:25:36,375-Speed 3330.54 samples/sec   Loss 8.5018   LearningRate 0.0933   Epoch: 0   Global Step: 11330   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 00:25:39,446-Speed 3335.28 samples/sec   Loss 8.4063   LearningRate 0.0933   Epoch: 0   Global Step: 11340   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 00:25:42,517-Speed 3334.98 samples/sec   Loss 8.4075   LearningRate 0.0933   Epoch: 0   Global Step: 11350   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 00:25:45,655-Speed 3263.92 samples/sec   Loss 8.3701   LearningRate 0.0933   Epoch: 0   Global Step: 11360   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 00:25:48,721-Speed 3340.87 samples/sec   Loss 8.4867   LearningRate 0.0933   Epoch: 0   Global Step: 11370   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 00:25:51,784-Speed 3344.09 samples/sec   Loss 8.4983   LearningRate 0.0933   Epoch: 0   Global Step: 11380   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 00:25:54,848-Speed 3341.99 samples/sec   Loss 8.3409   LearningRate 0.0933   Epoch: 0   Global Step: 11390   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 00:25:57,925-Speed 3329.17 samples/sec   Loss 8.4533   LearningRate 0.0933   Epoch: 0   Global Step: 11400   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 00:26:01,000-Speed 3330.39 samples/sec   Loss 8.4600   LearningRate 0.0933   Epoch: 0   Global Step: 11410   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 00:26:04,092-Speed 3313.35 samples/sec   Loss 8.3465   LearningRate 0.0933   Epoch: 0   Global Step: 11420   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 00:26:07,159-Speed 3339.93 samples/sec   Loss 8.4242   LearningRate 0.0933   Epoch: 0   Global Step: 11430   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 00:26:10,252-Speed 3311.67 samples/sec   Loss 8.2295   LearningRate 0.0933   Epoch: 0   Global Step: 11440   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 00:26:13,325-Speed 3332.14 samples/sec   Loss 8.2576   LearningRate 0.0933   Epoch: 0   Global Step: 11450   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 00:26:16,380-Speed 3353.37 samples/sec   Loss 8.3746   LearningRate 0.0933   Epoch: 0   Global Step: 11460   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 00:26:19,451-Speed 3334.60 samples/sec   Loss 8.4417   LearningRate 0.0932   Epoch: 0   Global Step: 11470   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 00:26:22,545-Speed 3310.89 samples/sec   Loss 8.3796   LearningRate 0.0932   Epoch: 0   Global Step: 11480   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 00:26:25,622-Speed 3327.78 samples/sec   Loss 8.4845   LearningRate 0.0932   Epoch: 0   Global Step: 11490   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 00:26:28,709-Speed 3319.00 samples/sec   Loss 8.3470   LearningRate 0.0932   Epoch: 0   Global Step: 11500   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 00:26:31,772-Speed 3343.13 samples/sec   Loss 8.5272   LearningRate 0.0932   Epoch: 0   Global Step: 11510   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 00:26:34,847-Speed 3331.90 samples/sec   Loss 8.2883   LearningRate 0.0932   Epoch: 0   Global Step: 11520   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 00:26:37,916-Speed 3336.92 samples/sec   Loss 8.3274   LearningRate 0.0932   Epoch: 0   Global Step: 11530   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 00:26:40,983-Speed 3339.89 samples/sec   Loss 8.3940   LearningRate 0.0932   Epoch: 0   Global Step: 11540   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 00:26:44,052-Speed 3336.56 samples/sec   Loss 8.1921   LearningRate 0.0932   Epoch: 0   Global Step: 11550   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 00:26:47,124-Speed 3334.20 samples/sec   Loss 8.3039   LearningRate 0.0932   Epoch: 0   Global Step: 11560   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 00:26:50,193-Speed 3337.99 samples/sec   Loss 8.4377   LearningRate 0.0932   Epoch: 0   Global Step: 11570   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 00:26:53,247-Speed 3353.83 samples/sec   Loss 8.4306   LearningRate 0.0932   Epoch: 0   Global Step: 11580   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 00:26:56,319-Speed 3333.74 samples/sec   Loss 8.2247   LearningRate 0.0932   Epoch: 0   Global Step: 11590   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 00:26:59,386-Speed 3339.40 samples/sec   Loss 8.2380   LearningRate 0.0932   Epoch: 0   Global Step: 11600   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 00:27:02,525-Speed 3263.05 samples/sec   Loss 8.2134   LearningRate 0.0932   Epoch: 0   Global Step: 11610   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 00:27:05,641-Speed 3287.13 samples/sec   Loss 8.2251   LearningRate 0.0932   Epoch: 0   Global Step: 11620   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 00:27:08,764-Speed 3279.87 samples/sec   Loss 8.1667   LearningRate 0.0932   Epoch: 0   Global Step: 11630   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 00:27:11,838-Speed 3331.60 samples/sec   Loss 8.3845   LearningRate 0.0931   Epoch: 0   Global Step: 11640   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 00:27:14,996-Speed 3243.67 samples/sec   Loss 8.2611   LearningRate 0.0931   Epoch: 0   Global Step: 11650   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 00:27:18,106-Speed 3293.14 samples/sec   Loss 8.3495   LearningRate 0.0931   Epoch: 0   Global Step: 11660   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 00:27:21,184-Speed 3327.81 samples/sec   Loss 8.2537   LearningRate 0.0931   Epoch: 0   Global Step: 11670   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 00:27:24,255-Speed 3335.71 samples/sec   Loss 8.3805   LearningRate 0.0931   Epoch: 0   Global Step: 11680   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 00:27:27,333-Speed 3326.89 samples/sec   Loss 8.3208   LearningRate 0.0931   Epoch: 0   Global Step: 11690   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 00:27:30,401-Speed 3339.43 samples/sec   Loss 8.1457   LearningRate 0.0931   Epoch: 0   Global Step: 11700   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 00:27:33,455-Speed 3353.96 samples/sec   Loss 8.3417   LearningRate 0.0931   Epoch: 0   Global Step: 11710   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 00:27:36,526-Speed 3334.81 samples/sec   Loss 8.2544   LearningRate 0.0931   Epoch: 0   Global Step: 11720   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 00:27:39,608-Speed 3322.93 samples/sec   Loss 8.3669   LearningRate 0.0931   Epoch: 0   Global Step: 11730   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 00:27:42,675-Speed 3340.65 samples/sec   Loss 8.1755   LearningRate 0.0931   Epoch: 0   Global Step: 11740   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 00:27:45,738-Speed 3343.83 samples/sec   Loss 8.2096   LearningRate 0.0931   Epoch: 0   Global Step: 11750   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 00:27:48,810-Speed 3333.65 samples/sec   Loss 8.3020   LearningRate 0.0931   Epoch: 0   Global Step: 11760   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 00:27:51,896-Speed 3318.70 samples/sec   Loss 8.2842   LearningRate 0.0931   Epoch: 0   Global Step: 11770   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 00:27:54,997-Speed 3303.07 samples/sec   Loss 8.2210   LearningRate 0.0931   Epoch: 0   Global Step: 11780   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 00:27:58,167-Speed 3230.92 samples/sec   Loss 8.3260   LearningRate 0.0931   Epoch: 0   Global Step: 11790   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 00:28:01,269-Speed 3302.27 samples/sec   Loss 8.1677   LearningRate 0.0931   Epoch: 0   Global Step: 11800   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 00:28:04,380-Speed 3292.10 samples/sec   Loss 8.1334   LearningRate 0.0930   Epoch: 0   Global Step: 11810   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 00:28:07,436-Speed 3352.11 samples/sec   Loss 8.1719   LearningRate 0.0930   Epoch: 0   Global Step: 11820   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 00:28:10,509-Speed 3332.59 samples/sec   Loss 8.1596   LearningRate 0.0930   Epoch: 0   Global Step: 11830   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 00:28:13,575-Speed 3340.91 samples/sec   Loss 8.1057   LearningRate 0.0930   Epoch: 0   Global Step: 11840   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 00:28:16,740-Speed 3236.68 samples/sec   Loss 8.2546   LearningRate 0.0930   Epoch: 0   Global Step: 11850   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 00:28:19,879-Speed 3262.50 samples/sec   Loss 8.0670   LearningRate 0.0930   Epoch: 0   Global Step: 11860   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 00:28:22,962-Speed 3321.94 samples/sec   Loss 8.1854   LearningRate 0.0930   Epoch: 0   Global Step: 11870   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 00:28:26,036-Speed 3332.80 samples/sec   Loss 8.2622   LearningRate 0.0930   Epoch: 0   Global Step: 11880   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 00:28:29,102-Speed 3340.16 samples/sec   Loss 8.1946   LearningRate 0.0930   Epoch: 0   Global Step: 11890   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 00:28:32,169-Speed 3339.11 samples/sec   Loss 8.1262   LearningRate 0.0930   Epoch: 0   Global Step: 11900   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 00:28:35,235-Speed 3340.94 samples/sec   Loss 8.1489   LearningRate 0.0930   Epoch: 0   Global Step: 11910   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 00:28:38,311-Speed 3330.64 samples/sec   Loss 8.0750   LearningRate 0.0930   Epoch: 0   Global Step: 11920   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 00:28:41,401-Speed 3314.97 samples/sec   Loss 8.2518   LearningRate 0.0930   Epoch: 0   Global Step: 11930   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 00:28:44,487-Speed 3319.86 samples/sec   Loss 8.1053   LearningRate 0.0930   Epoch: 0   Global Step: 11940   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 00:28:47,554-Speed 3339.25 samples/sec   Loss 8.3164   LearningRate 0.0930   Epoch: 0   Global Step: 11950   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 00:28:50,633-Speed 3326.70 samples/sec   Loss 8.1237   LearningRate 0.0930   Epoch: 0   Global Step: 11960   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 00:28:53,712-Speed 3325.55 samples/sec   Loss 8.2458   LearningRate 0.0930   Epoch: 0   Global Step: 11970   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 00:28:56,807-Speed 3310.18 samples/sec   Loss 8.0863   LearningRate 0.0930   Epoch: 0   Global Step: 11980   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 00:28:59,875-Speed 3339.10 samples/sec   Loss 8.1578   LearningRate 0.0929   Epoch: 0   Global Step: 11990   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 00:29:02,944-Speed 3337.25 samples/sec   Loss 8.0794   LearningRate 0.0929   Epoch: 0   Global Step: 12000   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 00:29:47,252-[lfw][12000]XNorm: 24.062023
Training: 2022-04-11 00:29:47,253-[lfw][12000]Accuracy-Flip: 0.99617+-0.00358
Training: 2022-04-11 00:29:47,253-[lfw][12000]Accuracy-Highest: 0.99633
Training: 2022-04-11 00:30:38,830-[cfp_fp][12000]XNorm: 22.129733
Training: 2022-04-11 00:30:38,831-[cfp_fp][12000]Accuracy-Flip: 0.95971+-0.00769
Training: 2022-04-11 00:30:38,831-[cfp_fp][12000]Accuracy-Highest: 0.95971
Training: 2022-04-11 00:31:23,042-[agedb_30][12000]XNorm: 23.550760
Training: 2022-04-11 00:31:23,043-[agedb_30][12000]Accuracy-Flip: 0.96233+-0.01036
Training: 2022-04-11 00:31:23,043-[agedb_30][12000]Accuracy-Highest: 0.96233
Training: 2022-04-11 00:31:26,123-Speed 71.52 samples/sec   Loss 8.1388   LearningRate 0.0929   Epoch: 0   Global Step: 12010   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-11 00:31:29,181-Speed 3349.85 samples/sec   Loss 8.1459   LearningRate 0.0929   Epoch: 0   Global Step: 12020   Fp16 Grad Scale: 262144   Required: 34 hours
Training: 2022-04-11 00:31:32,262-Speed 3324.29 samples/sec   Loss 8.1529   LearningRate 0.0929   Epoch: 0   Global Step: 12030   Fp16 Grad Scale: 262144   Required: 34 hours
Training: 2022-04-11 00:31:35,368-Speed 3297.19 samples/sec   Loss 8.1479   LearningRate 0.0929   Epoch: 0   Global Step: 12040   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-11 00:31:38,500-Speed 3270.30 samples/sec   Loss 8.1820   LearningRate 0.0929   Epoch: 0   Global Step: 12050   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-11 00:31:41,647-Speed 3254.64 samples/sec   Loss 8.1051   LearningRate 0.0929   Epoch: 0   Global Step: 12060   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-11 00:31:44,808-Speed 3240.64 samples/sec   Loss 8.2059   LearningRate 0.0929   Epoch: 0   Global Step: 12070   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-11 00:31:47,903-Speed 3308.94 samples/sec   Loss 7.9462   LearningRate 0.0929   Epoch: 0   Global Step: 12080   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-11 00:31:50,964-Speed 3345.82 samples/sec   Loss 8.1185   LearningRate 0.0929   Epoch: 0   Global Step: 12090   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-11 00:31:54,036-Speed 3334.17 samples/sec   Loss 8.1409   LearningRate 0.0929   Epoch: 0   Global Step: 12100   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-11 00:31:57,110-Speed 3332.47 samples/sec   Loss 8.1234   LearningRate 0.0929   Epoch: 0   Global Step: 12110   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-11 00:32:00,173-Speed 3344.57 samples/sec   Loss 8.1570   LearningRate 0.0929   Epoch: 0   Global Step: 12120   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-11 00:32:03,243-Speed 3336.13 samples/sec   Loss 8.0315   LearningRate 0.0929   Epoch: 0   Global Step: 12130   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-11 00:32:06,306-Speed 3344.12 samples/sec   Loss 8.0088   LearningRate 0.0929   Epoch: 0   Global Step: 12140   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-11 00:32:09,392-Speed 3319.21 samples/sec   Loss 8.0320   LearningRate 0.0929   Epoch: 0   Global Step: 12150   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-11 00:32:12,496-Speed 3299.36 samples/sec   Loss 8.0496   LearningRate 0.0928   Epoch: 0   Global Step: 12160   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-11 00:32:15,600-Speed 3299.51 samples/sec   Loss 8.1033   LearningRate 0.0928   Epoch: 0   Global Step: 12170   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-11 00:32:18,698-Speed 3306.61 samples/sec   Loss 8.0359   LearningRate 0.0928   Epoch: 0   Global Step: 12180   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-11 00:32:21,773-Speed 3330.92 samples/sec   Loss 8.1088   LearningRate 0.0928   Epoch: 0   Global Step: 12190   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-11 00:32:24,850-Speed 3329.07 samples/sec   Loss 8.0285   LearningRate 0.0928   Epoch: 0   Global Step: 12200   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-11 00:32:27,935-Speed 3319.97 samples/sec   Loss 8.1371   LearningRate 0.0928   Epoch: 0   Global Step: 12210   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-11 00:32:31,019-Speed 3321.69 samples/sec   Loss 8.1317   LearningRate 0.0928   Epoch: 0   Global Step: 12220   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-11 00:32:34,126-Speed 3296.57 samples/sec   Loss 7.9125   LearningRate 0.0928   Epoch: 0   Global Step: 12230   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-11 00:32:37,213-Speed 3317.24 samples/sec   Loss 8.1926   LearningRate 0.0928   Epoch: 0   Global Step: 12240   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-11 00:32:40,301-Speed 3317.70 samples/sec   Loss 8.1259   LearningRate 0.0928   Epoch: 0   Global Step: 12250   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-11 00:32:43,373-Speed 3333.00 samples/sec   Loss 7.9747   LearningRate 0.0928   Epoch: 0   Global Step: 12260   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-11 00:32:46,448-Speed 3331.37 samples/sec   Loss 8.0822   LearningRate 0.0928   Epoch: 0   Global Step: 12270   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-11 00:32:49,541-Speed 3311.76 samples/sec   Loss 8.1181   LearningRate 0.0928   Epoch: 0   Global Step: 12280   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-11 00:32:52,619-Speed 3328.01 samples/sec   Loss 8.0615   LearningRate 0.0928   Epoch: 0   Global Step: 12290   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-11 00:32:55,701-Speed 3322.66 samples/sec   Loss 7.9940   LearningRate 0.0928   Epoch: 0   Global Step: 12300   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-11 00:32:58,779-Speed 3327.26 samples/sec   Loss 7.9349   LearningRate 0.0928   Epoch: 0   Global Step: 12310   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-11 00:33:01,853-Speed 3333.53 samples/sec   Loss 7.9970   LearningRate 0.0928   Epoch: 0   Global Step: 12320   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-11 00:33:04,974-Speed 3281.82 samples/sec   Loss 8.0633   LearningRate 0.0927   Epoch: 0   Global Step: 12330   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-11 00:33:08,041-Speed 3339.08 samples/sec   Loss 8.1049   LearningRate 0.0927   Epoch: 0   Global Step: 12340   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-11 00:33:11,122-Speed 3325.17 samples/sec   Loss 7.9576   LearningRate 0.0927   Epoch: 0   Global Step: 12350   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-11 00:33:14,262-Speed 3261.31 samples/sec   Loss 7.9847   LearningRate 0.0927   Epoch: 0   Global Step: 12360   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-11 00:33:17,327-Speed 3341.59 samples/sec   Loss 7.9101   LearningRate 0.0927   Epoch: 0   Global Step: 12370   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-11 00:33:20,404-Speed 3329.60 samples/sec   Loss 7.9396   LearningRate 0.0927   Epoch: 0   Global Step: 12380   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-11 00:33:23,469-Speed 3341.48 samples/sec   Loss 7.9728   LearningRate 0.0927   Epoch: 0   Global Step: 12390   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-11 00:33:26,531-Speed 3344.95 samples/sec   Loss 8.0462   LearningRate 0.0927   Epoch: 0   Global Step: 12400   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-11 00:33:29,609-Speed 3327.54 samples/sec   Loss 8.0622   LearningRate 0.0927   Epoch: 0   Global Step: 12410   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-11 00:33:32,721-Speed 3291.67 samples/sec   Loss 8.0520   LearningRate 0.0927   Epoch: 0   Global Step: 12420   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-11 00:33:35,811-Speed 3314.63 samples/sec   Loss 7.9186   LearningRate 0.0927   Epoch: 0   Global Step: 12430   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-11 00:33:38,875-Speed 3342.34 samples/sec   Loss 7.9556   LearningRate 0.0927   Epoch: 0   Global Step: 12440   Fp16 Grad Scale: 262144   Required: 34 hours
Training: 2022-04-11 00:33:41,937-Speed 3346.31 samples/sec   Loss 7.8775   LearningRate 0.0927   Epoch: 0   Global Step: 12450   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-11 00:33:44,997-Speed 3347.39 samples/sec   Loss 8.0633   LearningRate 0.0927   Epoch: 0   Global Step: 12460   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-11 00:33:48,055-Speed 3348.69 samples/sec   Loss 8.0173   LearningRate 0.0927   Epoch: 0   Global Step: 12470   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-11 00:33:51,212-Speed 3244.20 samples/sec   Loss 8.0955   LearningRate 0.0927   Epoch: 0   Global Step: 12480   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-11 00:33:54,304-Speed 3312.57 samples/sec   Loss 7.9044   LearningRate 0.0927   Epoch: 0   Global Step: 12490   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-11 00:33:57,364-Speed 3346.88 samples/sec   Loss 7.9434   LearningRate 0.0927   Epoch: 0   Global Step: 12500   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-11 00:34:00,427-Speed 3344.02 samples/sec   Loss 7.8747   LearningRate 0.0926   Epoch: 0   Global Step: 12510   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-11 00:34:03,493-Speed 3341.43 samples/sec   Loss 8.0607   LearningRate 0.0926   Epoch: 0   Global Step: 12520   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-11 00:34:06,554-Speed 3346.25 samples/sec   Loss 7.9439   LearningRate 0.0926   Epoch: 0   Global Step: 12530   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-11 00:34:09,640-Speed 3318.49 samples/sec   Loss 7.8542   LearningRate 0.0926   Epoch: 0   Global Step: 12540   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-11 00:34:12,731-Speed 3313.90 samples/sec   Loss 7.9364   LearningRate 0.0926   Epoch: 0   Global Step: 12550   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-11 00:34:15,800-Speed 3337.36 samples/sec   Loss 7.9122   LearningRate 0.0926   Epoch: 0   Global Step: 12560   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-11 00:34:18,863-Speed 3343.27 samples/sec   Loss 7.9449   LearningRate 0.0926   Epoch: 0   Global Step: 12570   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-11 00:34:21,928-Speed 3342.21 samples/sec   Loss 8.0885   LearningRate 0.0926   Epoch: 0   Global Step: 12580   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-11 00:34:25,060-Speed 3269.85 samples/sec   Loss 7.9112   LearningRate 0.0926   Epoch: 0   Global Step: 12590   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-11 00:34:28,159-Speed 3305.15 samples/sec   Loss 7.7790   LearningRate 0.0926   Epoch: 0   Global Step: 12600   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-11 00:34:31,246-Speed 3318.54 samples/sec   Loss 8.0432   LearningRate 0.0926   Epoch: 0   Global Step: 12610   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-11 00:34:34,354-Speed 3295.39 samples/sec   Loss 7.9233   LearningRate 0.0926   Epoch: 0   Global Step: 12620   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-11 00:34:37,494-Speed 3262.50 samples/sec   Loss 7.8720   LearningRate 0.0926   Epoch: 0   Global Step: 12630   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-11 00:34:40,716-Speed 3179.15 samples/sec   Loss 7.8657   LearningRate 0.0926   Epoch: 0   Global Step: 12640   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-11 00:34:43,833-Speed 3285.60 samples/sec   Loss 7.9279   LearningRate 0.0926   Epoch: 0   Global Step: 12650   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-11 00:34:46,909-Speed 3329.71 samples/sec   Loss 7.8759   LearningRate 0.0926   Epoch: 0   Global Step: 12660   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-11 00:34:49,978-Speed 3338.05 samples/sec   Loss 7.9287   LearningRate 0.0926   Epoch: 0   Global Step: 12670   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-11 00:34:53,055-Speed 3328.90 samples/sec   Loss 8.0390   LearningRate 0.0925   Epoch: 0   Global Step: 12680   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-11 00:34:56,123-Speed 3338.45 samples/sec   Loss 7.9553   LearningRate 0.0925   Epoch: 0   Global Step: 12690   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-11 00:34:59,190-Speed 3339.48 samples/sec   Loss 7.8358   LearningRate 0.0925   Epoch: 0   Global Step: 12700   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-11 00:35:02,277-Speed 3317.48 samples/sec   Loss 7.9850   LearningRate 0.0925   Epoch: 0   Global Step: 12710   Fp16 Grad Scale: 262144   Required: 34 hours
Training: 2022-04-11 00:35:05,417-Speed 3261.69 samples/sec   Loss 7.9457   LearningRate 0.0925   Epoch: 0   Global Step: 12720   Fp16 Grad Scale: 262144   Required: 34 hours
Training: 2022-04-11 00:35:08,529-Speed 3292.06 samples/sec   Loss 7.9113   LearningRate 0.0925   Epoch: 0   Global Step: 12730   Fp16 Grad Scale: 262144   Required: 34 hours
Training: 2022-04-11 00:35:11,687-Speed 3243.83 samples/sec   Loss 7.8752   LearningRate 0.0925   Epoch: 0   Global Step: 12740   Fp16 Grad Scale: 262144   Required: 34 hours
Training: 2022-04-11 00:35:14,755-Speed 3338.02 samples/sec   Loss 7.8939   LearningRate 0.0925   Epoch: 0   Global Step: 12750   Fp16 Grad Scale: 262144   Required: 34 hours
Training: 2022-04-11 00:35:17,811-Speed 3351.26 samples/sec   Loss 7.9457   LearningRate 0.0925   Epoch: 0   Global Step: 12760   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-11 00:35:20,890-Speed 3326.91 samples/sec   Loss 7.9941   LearningRate 0.0925   Epoch: 0   Global Step: 12770   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-11 00:35:23,964-Speed 3332.69 samples/sec   Loss 7.9379   LearningRate 0.0925   Epoch: 0   Global Step: 12780   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-11 00:35:27,027-Speed 3343.50 samples/sec   Loss 7.8355   LearningRate 0.0925   Epoch: 0   Global Step: 12790   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-11 00:35:30,126-Speed 3304.99 samples/sec   Loss 7.9608   LearningRate 0.0925   Epoch: 0   Global Step: 12800   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-11 00:35:33,202-Speed 3330.00 samples/sec   Loss 7.9408   LearningRate 0.0925   Epoch: 0   Global Step: 12810   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-11 00:35:36,273-Speed 3335.12 samples/sec   Loss 7.8164   LearningRate 0.0925   Epoch: 0   Global Step: 12820   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-11 00:35:39,343-Speed 3336.18 samples/sec   Loss 7.8628   LearningRate 0.0925   Epoch: 0   Global Step: 12830   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-11 00:35:42,424-Speed 3324.91 samples/sec   Loss 7.9522   LearningRate 0.0925   Epoch: 0   Global Step: 12840   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-11 00:35:45,498-Speed 3331.75 samples/sec   Loss 7.9189   LearningRate 0.0924   Epoch: 0   Global Step: 12850   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-11 00:35:48,584-Speed 3319.33 samples/sec   Loss 7.8892   LearningRate 0.0924   Epoch: 0   Global Step: 12860   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 00:35:51,652-Speed 3338.77 samples/sec   Loss 7.9031   LearningRate 0.0924   Epoch: 0   Global Step: 12870   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 00:35:54,723-Speed 3334.58 samples/sec   Loss 7.8455   LearningRate 0.0924   Epoch: 0   Global Step: 12880   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 00:35:57,797-Speed 3332.38 samples/sec   Loss 7.8250   LearningRate 0.0924   Epoch: 0   Global Step: 12890   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 00:36:00,903-Speed 3297.69 samples/sec   Loss 7.8470   LearningRate 0.0924   Epoch: 0   Global Step: 12900   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 00:36:04,062-Speed 3241.75 samples/sec   Loss 7.7867   LearningRate 0.0924   Epoch: 0   Global Step: 12910   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 00:36:07,162-Speed 3304.97 samples/sec   Loss 7.9367   LearningRate 0.0924   Epoch: 0   Global Step: 12920   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 00:36:10,231-Speed 3337.52 samples/sec   Loss 7.9543   LearningRate 0.0924   Epoch: 0   Global Step: 12930   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 00:36:13,313-Speed 3322.96 samples/sec   Loss 7.8253   LearningRate 0.0924   Epoch: 0   Global Step: 12940   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 00:36:16,502-Speed 3211.82 samples/sec   Loss 7.8957   LearningRate 0.0924   Epoch: 0   Global Step: 12950   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 00:36:19,695-Speed 3208.09 samples/sec   Loss 7.8814   LearningRate 0.0924   Epoch: 0   Global Step: 12960   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 00:36:22,812-Speed 3285.99 samples/sec   Loss 7.8343   LearningRate 0.0924   Epoch: 0   Global Step: 12970   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 00:36:25,896-Speed 3321.19 samples/sec   Loss 7.8730   LearningRate 0.0924   Epoch: 0   Global Step: 12980   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 00:36:28,956-Speed 3347.34 samples/sec   Loss 7.8681   LearningRate 0.0924   Epoch: 0   Global Step: 12990   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 00:36:32,053-Speed 3307.17 samples/sec   Loss 7.7792   LearningRate 0.0924   Epoch: 0   Global Step: 13000   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 00:36:35,140-Speed 3317.56 samples/sec   Loss 7.8397   LearningRate 0.0924   Epoch: 0   Global Step: 13010   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 00:36:38,269-Speed 3274.08 samples/sec   Loss 7.7087   LearningRate 0.0924   Epoch: 0   Global Step: 13020   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 00:36:41,422-Speed 3248.51 samples/sec   Loss 7.7407   LearningRate 0.0923   Epoch: 0   Global Step: 13030   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 00:36:44,501-Speed 3327.06 samples/sec   Loss 7.7902   LearningRate 0.0923   Epoch: 0   Global Step: 13040   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 00:36:47,585-Speed 3322.39 samples/sec   Loss 7.7932   LearningRate 0.0923   Epoch: 0   Global Step: 13050   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 00:36:50,655-Speed 3336.21 samples/sec   Loss 7.8245   LearningRate 0.0923   Epoch: 0   Global Step: 13060   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 00:36:53,723-Speed 3338.36 samples/sec   Loss 7.6956   LearningRate 0.0923   Epoch: 0   Global Step: 13070   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 00:36:56,809-Speed 3318.77 samples/sec   Loss 7.7023   LearningRate 0.0923   Epoch: 0   Global Step: 13080   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 00:36:59,881-Speed 3334.41 samples/sec   Loss 7.6362   LearningRate 0.0923   Epoch: 0   Global Step: 13090   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 00:37:02,975-Speed 3310.63 samples/sec   Loss 7.6838   LearningRate 0.0923   Epoch: 0   Global Step: 13100   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 00:37:06,043-Speed 3337.91 samples/sec   Loss 7.6736   LearningRate 0.0923   Epoch: 0   Global Step: 13110   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 00:37:09,154-Speed 3292.43 samples/sec   Loss 7.8213   LearningRate 0.0923   Epoch: 0   Global Step: 13120   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 00:37:12,264-Speed 3294.11 samples/sec   Loss 7.8188   LearningRate 0.0923   Epoch: 0   Global Step: 13130   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 00:37:15,335-Speed 3334.66 samples/sec   Loss 7.8011   LearningRate 0.0923   Epoch: 0   Global Step: 13140   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 00:37:18,406-Speed 3335.41 samples/sec   Loss 7.7071   LearningRate 0.0923   Epoch: 0   Global Step: 13150   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 00:37:21,481-Speed 3330.70 samples/sec   Loss 7.5884   LearningRate 0.0923   Epoch: 0   Global Step: 13160   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 00:37:24,558-Speed 3328.57 samples/sec   Loss 7.7247   LearningRate 0.0923   Epoch: 0   Global Step: 13170   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 00:37:27,795-Speed 3164.34 samples/sec   Loss 7.8273   LearningRate 0.0923   Epoch: 0   Global Step: 13180   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 00:37:30,980-Speed 3216.40 samples/sec   Loss 7.7729   LearningRate 0.0923   Epoch: 0   Global Step: 13190   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 00:37:34,118-Speed 3263.87 samples/sec   Loss 7.5961   LearningRate 0.0922   Epoch: 0   Global Step: 13200   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 00:37:37,205-Speed 3317.48 samples/sec   Loss 7.8718   LearningRate 0.0922   Epoch: 0   Global Step: 13210   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 00:37:40,283-Speed 3327.41 samples/sec   Loss 7.7661   LearningRate 0.0922   Epoch: 0   Global Step: 13220   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 00:37:43,359-Speed 3329.64 samples/sec   Loss 7.6749   LearningRate 0.0922   Epoch: 0   Global Step: 13230   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 00:37:46,459-Speed 3303.98 samples/sec   Loss 7.7509   LearningRate 0.0922   Epoch: 0   Global Step: 13240   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 00:37:49,543-Speed 3321.92 samples/sec   Loss 7.7832   LearningRate 0.0922   Epoch: 0   Global Step: 13250   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 00:37:52,642-Speed 3305.12 samples/sec   Loss 7.7259   LearningRate 0.0922   Epoch: 0   Global Step: 13260   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 00:37:55,726-Speed 3320.95 samples/sec   Loss 7.8076   LearningRate 0.0922   Epoch: 0   Global Step: 13270   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 00:37:58,807-Speed 3324.83 samples/sec   Loss 7.6992   LearningRate 0.0922   Epoch: 0   Global Step: 13280   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 00:38:01,878-Speed 3335.69 samples/sec   Loss 7.7667   LearningRate 0.0922   Epoch: 0   Global Step: 13290   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 00:38:04,958-Speed 3325.34 samples/sec   Loss 7.6867   LearningRate 0.0922   Epoch: 0   Global Step: 13300   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 00:38:08,061-Speed 3300.94 samples/sec   Loss 7.8433   LearningRate 0.0922   Epoch: 0   Global Step: 13310   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 00:38:11,140-Speed 3326.54 samples/sec   Loss 7.6627   LearningRate 0.0922   Epoch: 0   Global Step: 13320   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 00:38:14,218-Speed 3327.99 samples/sec   Loss 7.7599   LearningRate 0.0922   Epoch: 0   Global Step: 13330   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 00:38:17,292-Speed 3331.82 samples/sec   Loss 7.6712   LearningRate 0.0922   Epoch: 0   Global Step: 13340   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 00:38:20,425-Speed 3269.41 samples/sec   Loss 7.6713   LearningRate 0.0922   Epoch: 0   Global Step: 13350   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 00:38:23,495-Speed 3336.43 samples/sec   Loss 7.6147   LearningRate 0.0922   Epoch: 0   Global Step: 13360   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 00:38:26,557-Speed 3344.34 samples/sec   Loss 7.5874   LearningRate 0.0922   Epoch: 0   Global Step: 13370   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 00:38:29,630-Speed 3333.82 samples/sec   Loss 7.7612   LearningRate 0.0921   Epoch: 0   Global Step: 13380   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 00:38:32,711-Speed 3323.55 samples/sec   Loss 7.6905   LearningRate 0.0921   Epoch: 0   Global Step: 13390   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 00:38:35,784-Speed 3333.01 samples/sec   Loss 7.6398   LearningRate 0.0921   Epoch: 0   Global Step: 13400   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 00:38:38,857-Speed 3333.93 samples/sec   Loss 7.6536   LearningRate 0.0921   Epoch: 0   Global Step: 13410   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 00:38:41,936-Speed 3326.13 samples/sec   Loss 7.6413   LearningRate 0.0921   Epoch: 0   Global Step: 13420   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 00:38:45,008-Speed 3334.19 samples/sec   Loss 7.6246   LearningRate 0.0921   Epoch: 0   Global Step: 13430   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 00:38:48,079-Speed 3335.37 samples/sec   Loss 7.6483   LearningRate 0.0921   Epoch: 0   Global Step: 13440   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 00:38:51,156-Speed 3328.72 samples/sec   Loss 7.6477   LearningRate 0.0921   Epoch: 0   Global Step: 13450   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 00:38:54,225-Speed 3337.07 samples/sec   Loss 7.6451   LearningRate 0.0921   Epoch: 0   Global Step: 13460   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 00:38:57,292-Speed 3339.03 samples/sec   Loss 7.6677   LearningRate 0.0921   Epoch: 0   Global Step: 13470   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-04-11 00:39:00,361-Speed 3337.67 samples/sec   Loss 7.5666   LearningRate 0.0921   Epoch: 0   Global Step: 13480   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 00:39:03,520-Speed 3242.44 samples/sec   Loss 7.6132   LearningRate 0.0921   Epoch: 0   Global Step: 13490   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 00:39:06,635-Speed 3288.22 samples/sec   Loss 7.5470   LearningRate 0.0921   Epoch: 0   Global Step: 13500   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 00:39:09,702-Speed 3339.42 samples/sec   Loss 7.5858   LearningRate 0.0921   Epoch: 0   Global Step: 13510   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 00:39:12,789-Speed 3318.18 samples/sec   Loss 7.5961   LearningRate 0.0921   Epoch: 0   Global Step: 13520   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 00:39:15,868-Speed 3326.38 samples/sec   Loss 7.6478   LearningRate 0.0921   Epoch: 0   Global Step: 13530   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 00:39:18,960-Speed 3313.59 samples/sec   Loss 7.6428   LearningRate 0.0921   Epoch: 0   Global Step: 13540   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 00:39:22,044-Speed 3320.64 samples/sec   Loss 7.6397   LearningRate 0.0920   Epoch: 0   Global Step: 13550   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 00:39:25,110-Speed 3341.43 samples/sec   Loss 7.6918   LearningRate 0.0920   Epoch: 0   Global Step: 13560   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 00:39:28,232-Speed 3280.56 samples/sec   Loss 7.7795   LearningRate 0.0920   Epoch: 0   Global Step: 13570   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 00:39:31,296-Speed 3342.43 samples/sec   Loss 7.6750   LearningRate 0.0920   Epoch: 0   Global Step: 13580   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 00:39:34,352-Speed 3351.97 samples/sec   Loss 7.6045   LearningRate 0.0920   Epoch: 0   Global Step: 13590   Fp16 Grad Scale: 32768   Required: 33 hours
Training: 2022-04-11 00:39:37,422-Speed 3336.81 samples/sec   Loss 7.6211   LearningRate 0.0920   Epoch: 0   Global Step: 13600   Fp16 Grad Scale: 32768   Required: 33 hours
Training: 2022-04-11 00:39:40,491-Speed 3336.71 samples/sec   Loss 7.7199   LearningRate 0.0920   Epoch: 0   Global Step: 13610   Fp16 Grad Scale: 32768   Required: 33 hours
Training: 2022-04-11 00:39:43,560-Speed 3338.00 samples/sec   Loss 7.5974   LearningRate 0.0920   Epoch: 0   Global Step: 13620   Fp16 Grad Scale: 32768   Required: 33 hours
Training: 2022-04-11 00:39:46,637-Speed 3329.34 samples/sec   Loss 7.5043   LearningRate 0.0920   Epoch: 0   Global Step: 13630   Fp16 Grad Scale: 32768   Required: 33 hours
Training: 2022-04-11 00:39:49,704-Speed 3338.68 samples/sec   Loss 7.5150   LearningRate 0.0920   Epoch: 0   Global Step: 13640   Fp16 Grad Scale: 32768   Required: 33 hours
Training: 2022-04-11 00:39:52,836-Speed 3270.11 samples/sec   Loss 7.5526   LearningRate 0.0920   Epoch: 0   Global Step: 13650   Fp16 Grad Scale: 32768   Required: 33 hours
Training: 2022-04-11 00:39:55,943-Speed 3297.23 samples/sec   Loss 7.6097   LearningRate 0.0920   Epoch: 0   Global Step: 13660   Fp16 Grad Scale: 32768   Required: 33 hours
Training: 2022-04-11 00:39:59,070-Speed 3275.70 samples/sec   Loss 7.5713   LearningRate 0.0920   Epoch: 0   Global Step: 13670   Fp16 Grad Scale: 32768   Required: 33 hours
Training: 2022-04-11 00:40:02,152-Speed 3323.45 samples/sec   Loss 7.5504   LearningRate 0.0920   Epoch: 0   Global Step: 13680   Fp16 Grad Scale: 32768   Required: 33 hours
Training: 2022-04-11 00:40:05,224-Speed 3333.96 samples/sec   Loss 7.7434   LearningRate 0.0920   Epoch: 0   Global Step: 13690   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 00:40:08,305-Speed 3324.76 samples/sec   Loss 7.5779   LearningRate 0.0920   Epoch: 0   Global Step: 13700   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 00:40:11,383-Speed 3327.58 samples/sec   Loss 7.5525   LearningRate 0.0920   Epoch: 0   Global Step: 13710   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 00:40:14,456-Speed 3332.78 samples/sec   Loss 7.6334   LearningRate 0.0919   Epoch: 0   Global Step: 13720   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 00:40:17,537-Speed 3324.90 samples/sec   Loss 7.6367   LearningRate 0.0919   Epoch: 0   Global Step: 13730   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 00:40:20,648-Speed 3292.38 samples/sec   Loss 7.6194   LearningRate 0.0919   Epoch: 0   Global Step: 13740   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 00:40:23,720-Speed 3334.37 samples/sec   Loss 7.5859   LearningRate 0.0919   Epoch: 0   Global Step: 13750   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 00:40:26,789-Speed 3338.00 samples/sec   Loss 7.5451   LearningRate 0.0919   Epoch: 0   Global Step: 13760   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 00:40:29,861-Speed 3333.28 samples/sec   Loss 7.4994   LearningRate 0.0919   Epoch: 0   Global Step: 13770   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 00:40:32,934-Speed 3333.55 samples/sec   Loss 7.6429   LearningRate 0.0919   Epoch: 0   Global Step: 13780   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 00:40:36,007-Speed 3333.68 samples/sec   Loss 7.6538   LearningRate 0.0919   Epoch: 0   Global Step: 13790   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 00:40:39,083-Speed 3329.10 samples/sec   Loss 7.5644   LearningRate 0.0919   Epoch: 0   Global Step: 13800   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 00:40:42,191-Speed 3295.66 samples/sec   Loss 7.5939   LearningRate 0.0919   Epoch: 0   Global Step: 13810   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 00:40:45,266-Speed 3331.63 samples/sec   Loss 7.5621   LearningRate 0.0919   Epoch: 0   Global Step: 13820   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 00:40:48,342-Speed 3329.64 samples/sec   Loss 7.5702   LearningRate 0.0919   Epoch: 0   Global Step: 13830   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 00:40:51,416-Speed 3332.24 samples/sec   Loss 7.6044   LearningRate 0.0919   Epoch: 0   Global Step: 13840   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 00:40:54,556-Speed 3261.67 samples/sec   Loss 7.6290   LearningRate 0.0919   Epoch: 0   Global Step: 13850   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 00:40:57,627-Speed 3335.29 samples/sec   Loss 7.4863   LearningRate 0.0919   Epoch: 0   Global Step: 13860   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 00:41:00,704-Speed 3329.09 samples/sec   Loss 7.5235   LearningRate 0.0919   Epoch: 0   Global Step: 13870   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 00:41:03,774-Speed 3335.52 samples/sec   Loss 7.5192   LearningRate 0.0919   Epoch: 0   Global Step: 13880   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 00:41:06,869-Speed 3309.73 samples/sec   Loss 7.5272   LearningRate 0.0919   Epoch: 0   Global Step: 13890   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-04-11 00:41:10,012-Speed 3259.01 samples/sec   Loss 7.7238   LearningRate 0.0918   Epoch: 0   Global Step: 13900   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-04-11 00:41:13,112-Speed 3303.27 samples/sec   Loss 7.4685   LearningRate 0.0918   Epoch: 0   Global Step: 13910   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-04-11 00:41:16,191-Speed 3327.04 samples/sec   Loss 7.5425   LearningRate 0.0918   Epoch: 0   Global Step: 13920   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-04-11 00:41:19,274-Speed 3322.78 samples/sec   Loss 7.5148   LearningRate 0.0918   Epoch: 0   Global Step: 13930   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-04-11 00:41:22,334-Speed 3347.42 samples/sec   Loss 7.6431   LearningRate 0.0918   Epoch: 0   Global Step: 13940   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 00:41:25,435-Speed 3302.23 samples/sec   Loss 7.5255   LearningRate 0.0918   Epoch: 0   Global Step: 13950   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 00:41:28,571-Speed 3266.31 samples/sec   Loss 7.4241   LearningRate 0.0918   Epoch: 0   Global Step: 13960   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 00:41:31,644-Speed 3333.96 samples/sec   Loss 7.4140   LearningRate 0.0918   Epoch: 0   Global Step: 13970   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 00:41:34,795-Speed 3250.49 samples/sec   Loss 7.4761   LearningRate 0.0918   Epoch: 0   Global Step: 13980   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 00:41:37,890-Speed 3309.53 samples/sec   Loss 7.5263   LearningRate 0.0918   Epoch: 0   Global Step: 13990   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 00:41:40,968-Speed 3328.13 samples/sec   Loss 7.4126   LearningRate 0.0918   Epoch: 0   Global Step: 14000   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 00:42:24,759-[lfw][14000]XNorm: 22.431370
Training: 2022-04-11 00:42:24,760-[lfw][14000]Accuracy-Flip: 0.99533+-0.00379
Training: 2022-04-11 00:42:24,760-[lfw][14000]Accuracy-Highest: 0.99633
Training: 2022-04-11 00:43:15,716-[cfp_fp][14000]XNorm: 20.762199
Training: 2022-04-11 00:43:15,717-[cfp_fp][14000]Accuracy-Flip: 0.96371+-0.00806
Training: 2022-04-11 00:43:15,717-[cfp_fp][14000]Accuracy-Highest: 0.96371
Training: 2022-04-11 00:43:59,428-[agedb_30][14000]XNorm: 22.679022
Training: 2022-04-11 00:43:59,429-[agedb_30][14000]Accuracy-Flip: 0.96167+-0.00840
Training: 2022-04-11 00:43:59,429-[agedb_30][14000]Accuracy-Highest: 0.96233
Training: 2022-04-11 00:44:02,591-Speed 72.30 samples/sec   Loss 7.6154   LearningRate 0.0918   Epoch: 0   Global Step: 14010   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-11 00:44:05,717-Speed 3276.21 samples/sec   Loss 7.4695   LearningRate 0.0918   Epoch: 0   Global Step: 14020   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-11 00:44:08,770-Speed 3355.05 samples/sec   Loss 7.5169   LearningRate 0.0918   Epoch: 0   Global Step: 14030   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-11 00:44:11,827-Speed 3350.52 samples/sec   Loss 7.5194   LearningRate 0.0918   Epoch: 0   Global Step: 14040   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-11 00:44:14,879-Speed 3356.07 samples/sec   Loss 7.4983   LearningRate 0.0918   Epoch: 0   Global Step: 14050   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-11 00:44:17,931-Speed 3355.86 samples/sec   Loss 7.4511   LearningRate 0.0918   Epoch: 0   Global Step: 14060   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-11 00:44:20,982-Speed 3357.59 samples/sec   Loss 7.4671   LearningRate 0.0917   Epoch: 0   Global Step: 14070   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-11 00:44:24,033-Speed 3356.60 samples/sec   Loss 7.5083   LearningRate 0.0917   Epoch: 0   Global Step: 14080   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-11 00:44:27,091-Speed 3349.96 samples/sec   Loss 7.5300   LearningRate 0.0917   Epoch: 0   Global Step: 14090   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-11 00:44:30,149-Speed 3348.79 samples/sec   Loss 7.4849   LearningRate 0.0917   Epoch: 0   Global Step: 14100   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-11 00:44:33,212-Speed 3344.72 samples/sec   Loss 7.6057   LearningRate 0.0917   Epoch: 0   Global Step: 14110   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-11 00:44:36,278-Speed 3340.04 samples/sec   Loss 7.4810   LearningRate 0.0917   Epoch: 0   Global Step: 14120   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-11 00:44:39,348-Speed 3336.50 samples/sec   Loss 7.5147   LearningRate 0.0917   Epoch: 0   Global Step: 14130   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-11 00:44:42,409-Speed 3346.37 samples/sec   Loss 7.4945   LearningRate 0.0917   Epoch: 0   Global Step: 14140   Fp16 Grad Scale: 262144   Required: 34 hours
Training: 2022-04-11 00:44:45,534-Speed 3277.87 samples/sec   Loss 7.4994   LearningRate 0.0917   Epoch: 0   Global Step: 14150   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-11 00:44:48,658-Speed 3278.59 samples/sec   Loss 7.5663   LearningRate 0.0917   Epoch: 0   Global Step: 14160   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-11 00:44:51,733-Speed 3330.47 samples/sec   Loss 7.3741   LearningRate 0.0917   Epoch: 0   Global Step: 14170   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-11 00:44:54,835-Speed 3302.43 samples/sec   Loss 7.4629   LearningRate 0.0917   Epoch: 0   Global Step: 14180   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-11 00:44:57,897-Speed 3345.20 samples/sec   Loss 7.5690   LearningRate 0.0917   Epoch: 0   Global Step: 14190   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-11 00:45:00,963-Speed 3340.80 samples/sec   Loss 7.5254   LearningRate 0.0917   Epoch: 0   Global Step: 14200   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-11 00:45:04,046-Speed 3321.63 samples/sec   Loss 7.5415   LearningRate 0.0917   Epoch: 0   Global Step: 14210   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-11 00:45:07,222-Speed 3224.65 samples/sec   Loss 7.4526   LearningRate 0.0917   Epoch: 0   Global Step: 14220   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-11 00:45:10,329-Speed 3296.77 samples/sec   Loss 7.4586   LearningRate 0.0917   Epoch: 0   Global Step: 14230   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-11 00:45:13,438-Speed 3295.29 samples/sec   Loss 7.5063   LearningRate 0.0917   Epoch: 0   Global Step: 14240   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-11 00:45:16,507-Speed 3337.42 samples/sec   Loss 7.4055   LearningRate 0.0916   Epoch: 0   Global Step: 14250   Fp16 Grad Scale: 262144   Required: 34 hours
Training: 2022-04-11 00:45:19,646-Speed 3263.45 samples/sec   Loss 7.4728   LearningRate 0.0916   Epoch: 0   Global Step: 14260   Fp16 Grad Scale: 262144   Required: 34 hours
Training: 2022-04-11 00:45:22,781-Speed 3267.14 samples/sec   Loss 7.2891   LearningRate 0.0916   Epoch: 0   Global Step: 14270   Fp16 Grad Scale: 262144   Required: 34 hours
Training: 2022-04-11 00:45:25,840-Speed 3349.69 samples/sec   Loss 7.4585   LearningRate 0.0916   Epoch: 0   Global Step: 14280   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-11 00:45:28,953-Speed 3290.66 samples/sec   Loss 7.3819   LearningRate 0.0916   Epoch: 0   Global Step: 14290   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-11 00:45:32,028-Speed 3330.85 samples/sec   Loss 7.3575   LearningRate 0.0916   Epoch: 0   Global Step: 14300   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-11 00:45:35,148-Speed 3282.46 samples/sec   Loss 7.5276   LearningRate 0.0916   Epoch: 0   Global Step: 14310   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-11 00:45:38,217-Speed 3337.50 samples/sec   Loss 7.2570   LearningRate 0.0916   Epoch: 0   Global Step: 14320   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-11 00:45:41,290-Speed 3333.54 samples/sec   Loss 7.3828   LearningRate 0.0916   Epoch: 0   Global Step: 14330   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-11 00:45:44,366-Speed 3329.36 samples/sec   Loss 7.5171   LearningRate 0.0916   Epoch: 0   Global Step: 14340   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-11 00:45:47,524-Speed 3243.99 samples/sec   Loss 7.3824   LearningRate 0.0916   Epoch: 0   Global Step: 14350   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-11 00:45:50,651-Speed 3275.12 samples/sec   Loss 7.4662   LearningRate 0.0916   Epoch: 0   Global Step: 14360   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-11 00:45:53,782-Speed 3272.03 samples/sec   Loss 7.4462   LearningRate 0.0916   Epoch: 0   Global Step: 14370   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-11 00:45:56,866-Speed 3320.67 samples/sec   Loss 7.4806   LearningRate 0.0916   Epoch: 0   Global Step: 14380   Fp16 Grad Scale: 262144   Required: 34 hours
Training: 2022-04-11 00:45:59,976-Speed 3294.02 samples/sec   Loss 7.3622   LearningRate 0.0916   Epoch: 0   Global Step: 14390   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-11 00:46:03,064-Speed 3316.34 samples/sec   Loss 7.4337   LearningRate 0.0916   Epoch: 0   Global Step: 14400   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-11 00:46:06,171-Speed 3296.27 samples/sec   Loss 7.3945   LearningRate 0.0916   Epoch: 0   Global Step: 14410   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-11 00:46:09,242-Speed 3336.10 samples/sec   Loss 7.3993   LearningRate 0.0915   Epoch: 0   Global Step: 14420   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-11 00:46:12,302-Speed 3346.50 samples/sec   Loss 7.3345   LearningRate 0.0915   Epoch: 0   Global Step: 14430   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-11 00:46:15,372-Speed 3337.20 samples/sec   Loss 7.4033   LearningRate 0.0915   Epoch: 0   Global Step: 14440   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-11 00:46:18,446-Speed 3331.35 samples/sec   Loss 7.3780   LearningRate 0.0915   Epoch: 0   Global Step: 14450   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-11 00:46:21,514-Speed 3339.29 samples/sec   Loss 7.4751   LearningRate 0.0915   Epoch: 0   Global Step: 14460   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-11 00:46:24,587-Speed 3336.88 samples/sec   Loss 7.4870   LearningRate 0.0915   Epoch: 0   Global Step: 14470   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-11 00:46:27,718-Speed 3270.52 samples/sec   Loss 7.4207   LearningRate 0.0915   Epoch: 0   Global Step: 14480   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-11 00:46:30,876-Speed 3243.52 samples/sec   Loss 7.5017   LearningRate 0.0915   Epoch: 0   Global Step: 14490   Fp16 Grad Scale: 262144   Required: 34 hours
Training: 2022-04-11 00:46:33,931-Speed 3352.18 samples/sec   Loss 7.4700   LearningRate 0.0915   Epoch: 0   Global Step: 14500   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-11 00:46:37,047-Speed 3287.14 samples/sec   Loss 7.4441   LearningRate 0.0915   Epoch: 0   Global Step: 14510   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-11 00:46:40,216-Speed 3232.71 samples/sec   Loss 7.4123   LearningRate 0.0915   Epoch: 0   Global Step: 14520   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 00:46:43,323-Speed 3297.07 samples/sec   Loss 7.2630   LearningRate 0.0915   Epoch: 0   Global Step: 14530   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 00:46:46,417-Speed 3309.51 samples/sec   Loss 7.4322   LearningRate 0.0915   Epoch: 0   Global Step: 14540   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 00:46:49,495-Speed 3328.56 samples/sec   Loss 7.4777   LearningRate 0.0915   Epoch: 0   Global Step: 14550   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 00:46:52,586-Speed 3313.27 samples/sec   Loss 7.4124   LearningRate 0.0915   Epoch: 0   Global Step: 14560   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 00:46:55,695-Speed 3294.96 samples/sec   Loss 7.3153   LearningRate 0.0915   Epoch: 0   Global Step: 14570   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 00:46:58,788-Speed 3310.69 samples/sec   Loss 7.3948   LearningRate 0.0915   Epoch: 0   Global Step: 14580   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 00:47:01,947-Speed 3242.54 samples/sec   Loss 7.3725   LearningRate 0.0914   Epoch: 0   Global Step: 14590   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 00:47:05,013-Speed 3340.73 samples/sec   Loss 7.3680   LearningRate 0.0914   Epoch: 0   Global Step: 14600   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 00:47:08,089-Speed 3330.52 samples/sec   Loss 7.3396   LearningRate 0.0914   Epoch: 0   Global Step: 14610   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 00:47:11,153-Speed 3342.74 samples/sec   Loss 7.1932   LearningRate 0.0914   Epoch: 0   Global Step: 14620   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 00:47:14,228-Speed 3330.20 samples/sec   Loss 7.3973   LearningRate 0.0914   Epoch: 0   Global Step: 14630   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 00:47:17,307-Speed 3327.10 samples/sec   Loss 7.3667   LearningRate 0.0914   Epoch: 0   Global Step: 14640   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 00:47:20,386-Speed 3326.98 samples/sec   Loss 7.4306   LearningRate 0.0914   Epoch: 0   Global Step: 14650   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 00:47:23,459-Speed 3333.10 samples/sec   Loss 7.3655   LearningRate 0.0914   Epoch: 0   Global Step: 14660   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 00:47:26,523-Speed 3342.04 samples/sec   Loss 7.2742   LearningRate 0.0914   Epoch: 0   Global Step: 14670   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 00:47:29,594-Speed 3335.88 samples/sec   Loss 7.3381   LearningRate 0.0914   Epoch: 0   Global Step: 14680   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 00:47:32,656-Speed 3345.24 samples/sec   Loss 7.3632   LearningRate 0.0914   Epoch: 0   Global Step: 14690   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 00:47:35,725-Speed 3337.02 samples/sec   Loss 7.3914   LearningRate 0.0914   Epoch: 0   Global Step: 14700   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 00:47:38,825-Speed 3305.65 samples/sec   Loss 7.3073   LearningRate 0.0914   Epoch: 0   Global Step: 14710   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 00:47:41,896-Speed 3335.14 samples/sec   Loss 7.3892   LearningRate 0.0914   Epoch: 0   Global Step: 14720   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 00:47:44,961-Speed 3342.21 samples/sec   Loss 7.4128   LearningRate 0.0914   Epoch: 0   Global Step: 14730   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 00:47:48,045-Speed 3321.45 samples/sec   Loss 7.2950   LearningRate 0.0914   Epoch: 0   Global Step: 14740   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 00:47:51,163-Speed 3284.81 samples/sec   Loss 7.3638   LearningRate 0.0914   Epoch: 0   Global Step: 14750   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 00:47:54,228-Speed 3341.47 samples/sec   Loss 7.2778   LearningRate 0.0914   Epoch: 0   Global Step: 14760   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 00:47:57,290-Speed 3345.61 samples/sec   Loss 7.3846   LearningRate 0.0913   Epoch: 0   Global Step: 14770   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 00:48:00,352-Speed 3345.00 samples/sec   Loss 7.2862   LearningRate 0.0913   Epoch: 0   Global Step: 14780   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 00:48:03,449-Speed 3306.32 samples/sec   Loss 7.3366   LearningRate 0.0913   Epoch: 0   Global Step: 14790   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 00:48:06,558-Speed 3294.65 samples/sec   Loss 7.3517   LearningRate 0.0913   Epoch: 0   Global Step: 14800   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 00:48:09,658-Speed 3304.30 samples/sec   Loss 7.2601   LearningRate 0.0913   Epoch: 0   Global Step: 14810   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 00:48:12,731-Speed 3333.91 samples/sec   Loss 7.3088   LearningRate 0.0913   Epoch: 0   Global Step: 14820   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 00:48:15,782-Speed 3356.39 samples/sec   Loss 7.3407   LearningRate 0.0913   Epoch: 0   Global Step: 14830   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 00:48:18,857-Speed 3331.75 samples/sec   Loss 7.3440   LearningRate 0.0913   Epoch: 0   Global Step: 14840   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 00:48:22,004-Speed 3254.45 samples/sec   Loss 7.3632   LearningRate 0.0913   Epoch: 0   Global Step: 14850   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 00:48:25,149-Speed 3256.08 samples/sec   Loss 7.3234   LearningRate 0.0913   Epoch: 0   Global Step: 14860   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 00:48:28,217-Speed 3338.55 samples/sec   Loss 7.2676   LearningRate 0.0913   Epoch: 0   Global Step: 14870   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 00:48:31,286-Speed 3337.62 samples/sec   Loss 7.3703   LearningRate 0.0913   Epoch: 0   Global Step: 14880   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 00:48:34,349-Speed 3342.88 samples/sec   Loss 7.4336   LearningRate 0.0913   Epoch: 0   Global Step: 14890   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 00:48:37,492-Speed 3259.75 samples/sec   Loss 7.3543   LearningRate 0.0913   Epoch: 0   Global Step: 14900   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 00:48:40,574-Speed 3323.08 samples/sec   Loss 7.2606   LearningRate 0.0913   Epoch: 0   Global Step: 14910   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 00:48:43,649-Speed 3330.44 samples/sec   Loss 7.3993   LearningRate 0.0913   Epoch: 0   Global Step: 14920   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 00:48:46,713-Speed 3343.11 samples/sec   Loss 7.1851   LearningRate 0.0913   Epoch: 0   Global Step: 14930   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 00:48:49,824-Speed 3292.21 samples/sec   Loss 7.2967   LearningRate 0.0912   Epoch: 0   Global Step: 14940   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 00:48:52,902-Speed 3328.17 samples/sec   Loss 7.3094   LearningRate 0.0912   Epoch: 0   Global Step: 14950   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 00:48:56,021-Speed 3283.91 samples/sec   Loss 7.3474   LearningRate 0.0912   Epoch: 0   Global Step: 14960   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 00:48:59,089-Speed 3337.72 samples/sec   Loss 7.2872   LearningRate 0.0912   Epoch: 0   Global Step: 14970   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 00:49:02,189-Speed 3304.31 samples/sec   Loss 7.1607   LearningRate 0.0912   Epoch: 0   Global Step: 14980   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 00:49:05,382-Speed 3207.72 samples/sec   Loss 7.3696   LearningRate 0.0912   Epoch: 0   Global Step: 14990   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 00:49:08,533-Speed 3251.17 samples/sec   Loss 7.2201   LearningRate 0.0912   Epoch: 0   Global Step: 15000   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 00:49:11,602-Speed 3337.47 samples/sec   Loss 7.2595   LearningRate 0.0912   Epoch: 0   Global Step: 15010   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 00:49:14,667-Speed 3341.46 samples/sec   Loss 7.3460   LearningRate 0.0912   Epoch: 0   Global Step: 15020   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 00:49:17,748-Speed 3325.19 samples/sec   Loss 7.2335   LearningRate 0.0912   Epoch: 0   Global Step: 15030   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-04-11 00:49:20,814-Speed 3339.81 samples/sec   Loss 7.3886   LearningRate 0.0912   Epoch: 0   Global Step: 15040   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 00:49:23,931-Speed 3286.49 samples/sec   Loss 7.2114   LearningRate 0.0912   Epoch: 0   Global Step: 15050   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 00:49:27,030-Speed 3304.51 samples/sec   Loss 7.3005   LearningRate 0.0912   Epoch: 0   Global Step: 15060   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 00:49:30,107-Speed 3329.17 samples/sec   Loss 7.3387   LearningRate 0.0912   Epoch: 0   Global Step: 15070   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 00:49:33,173-Speed 3341.02 samples/sec   Loss 7.1777   LearningRate 0.0912   Epoch: 0   Global Step: 15080   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 00:49:36,239-Speed 3341.09 samples/sec   Loss 7.2669   LearningRate 0.0912   Epoch: 0   Global Step: 15090   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 00:49:39,334-Speed 3309.00 samples/sec   Loss 7.1878   LearningRate 0.0912   Epoch: 0   Global Step: 15100   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 00:49:42,396-Speed 3344.74 samples/sec   Loss 7.1479   LearningRate 0.0912   Epoch: 0   Global Step: 15110   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 00:49:45,468-Speed 3334.52 samples/sec   Loss 7.1323   LearningRate 0.0911   Epoch: 0   Global Step: 15120   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 00:49:48,573-Speed 3298.19 samples/sec   Loss 7.2355   LearningRate 0.0911   Epoch: 0   Global Step: 15130   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 00:49:51,651-Speed 3328.32 samples/sec   Loss 7.2854   LearningRate 0.0911   Epoch: 0   Global Step: 15140   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 00:49:54,715-Speed 3343.48 samples/sec   Loss 7.1575   LearningRate 0.0911   Epoch: 0   Global Step: 15150   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 00:49:57,785-Speed 3336.05 samples/sec   Loss 7.2050   LearningRate 0.0911   Epoch: 0   Global Step: 15160   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 00:50:00,881-Speed 3308.06 samples/sec   Loss 7.1143   LearningRate 0.0911   Epoch: 0   Global Step: 15170   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 00:50:03,950-Speed 3337.08 samples/sec   Loss 7.1303   LearningRate 0.0911   Epoch: 0   Global Step: 15180   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 00:50:07,023-Speed 3334.22 samples/sec   Loss 7.1513   LearningRate 0.0911   Epoch: 0   Global Step: 15190   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 00:50:10,090-Speed 3339.72 samples/sec   Loss 7.2025   LearningRate 0.0911   Epoch: 0   Global Step: 15200   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 00:50:13,159-Speed 3336.90 samples/sec   Loss 7.1633   LearningRate 0.0911   Epoch: 0   Global Step: 15210   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 00:50:16,239-Speed 3325.50 samples/sec   Loss 7.2246   LearningRate 0.0911   Epoch: 0   Global Step: 15220   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 00:50:19,320-Speed 3324.56 samples/sec   Loss 7.0842   LearningRate 0.0911   Epoch: 0   Global Step: 15230   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 00:50:22,393-Speed 3333.46 samples/sec   Loss 7.2624   LearningRate 0.0911   Epoch: 0   Global Step: 15240   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 00:50:25,444-Speed 3357.09 samples/sec   Loss 7.1707   LearningRate 0.0911   Epoch: 0   Global Step: 15250   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 00:50:28,510-Speed 3340.89 samples/sec   Loss 7.2699   LearningRate 0.0911   Epoch: 0   Global Step: 15260   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 00:50:31,586-Speed 3329.67 samples/sec   Loss 7.0943   LearningRate 0.0911   Epoch: 0   Global Step: 15270   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 00:50:34,671-Speed 3320.40 samples/sec   Loss 7.2444   LearningRate 0.0911   Epoch: 0   Global Step: 15280   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 00:50:37,740-Speed 3336.46 samples/sec   Loss 7.1337   LearningRate 0.0910   Epoch: 0   Global Step: 15290   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 00:50:40,816-Speed 3330.61 samples/sec   Loss 7.2765   LearningRate 0.0910   Epoch: 0   Global Step: 15300   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 00:50:43,884-Speed 3337.94 samples/sec   Loss 7.1502   LearningRate 0.0910   Epoch: 0   Global Step: 15310   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 00:50:47,001-Speed 3286.33 samples/sec   Loss 7.2248   LearningRate 0.0910   Epoch: 0   Global Step: 15320   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 00:50:50,073-Speed 3333.99 samples/sec   Loss 7.3265   LearningRate 0.0910   Epoch: 0   Global Step: 15330   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 00:50:53,143-Speed 3336.30 samples/sec   Loss 7.1504   LearningRate 0.0910   Epoch: 0   Global Step: 15340   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 00:50:56,210-Speed 3339.64 samples/sec   Loss 7.1958   LearningRate 0.0910   Epoch: 0   Global Step: 15350   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-04-11 00:50:59,272-Speed 3345.01 samples/sec   Loss 7.1994   LearningRate 0.0910   Epoch: 0   Global Step: 15360   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-04-11 00:51:02,348-Speed 3329.77 samples/sec   Loss 7.1391   LearningRate 0.0910   Epoch: 0   Global Step: 15370   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-04-11 00:51:05,398-Speed 3359.06 samples/sec   Loss 7.0907   LearningRate 0.0910   Epoch: 0   Global Step: 15380   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 00:51:08,464-Speed 3340.52 samples/sec   Loss 7.1218   LearningRate 0.0910   Epoch: 0   Global Step: 15390   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 00:51:11,535-Speed 3334.88 samples/sec   Loss 7.1464   LearningRate 0.0910   Epoch: 0   Global Step: 15400   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 00:51:14,601-Speed 3340.95 samples/sec   Loss 7.2576   LearningRate 0.0910   Epoch: 0   Global Step: 15410   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 00:51:17,679-Speed 3327.99 samples/sec   Loss 7.1543   LearningRate 0.0910   Epoch: 0   Global Step: 15420   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 00:51:20,758-Speed 3326.44 samples/sec   Loss 7.0502   LearningRate 0.0910   Epoch: 0   Global Step: 15430   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 00:51:23,831-Speed 3333.65 samples/sec   Loss 7.0878   LearningRate 0.0910   Epoch: 0   Global Step: 15440   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 00:51:26,911-Speed 3324.40 samples/sec   Loss 7.2073   LearningRate 0.0910   Epoch: 0   Global Step: 15450   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 00:51:29,979-Speed 3338.32 samples/sec   Loss 7.0982   LearningRate 0.0910   Epoch: 0   Global Step: 15460   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 00:51:33,044-Speed 3342.38 samples/sec   Loss 7.2136   LearningRate 0.0909   Epoch: 0   Global Step: 15470   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 00:51:36,115-Speed 3335.04 samples/sec   Loss 7.2367   LearningRate 0.0909   Epoch: 0   Global Step: 15480   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-04-11 00:51:39,177-Speed 3345.08 samples/sec   Loss 7.0369   LearningRate 0.0909   Epoch: 0   Global Step: 15490   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-04-11 00:51:42,245-Speed 3339.04 samples/sec   Loss 7.0360   LearningRate 0.0909   Epoch: 0   Global Step: 15500   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 00:51:45,307-Speed 3344.90 samples/sec   Loss 7.1988   LearningRate 0.0909   Epoch: 0   Global Step: 15510   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 00:51:48,372-Speed 3341.42 samples/sec   Loss 7.0641   LearningRate 0.0909   Epoch: 0   Global Step: 15520   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 00:51:51,507-Speed 3267.52 samples/sec   Loss 7.1297   LearningRate 0.0909   Epoch: 0   Global Step: 15530   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 00:51:54,569-Speed 3345.01 samples/sec   Loss 7.0409   LearningRate 0.0909   Epoch: 0   Global Step: 15540   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 00:51:57,635-Speed 3339.60 samples/sec   Loss 7.0320   LearningRate 0.0909   Epoch: 0   Global Step: 15550   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 00:52:00,722-Speed 3318.32 samples/sec   Loss 7.1106   LearningRate 0.0909   Epoch: 0   Global Step: 15560   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 00:52:03,807-Speed 3320.58 samples/sec   Loss 7.1725   LearningRate 0.0909   Epoch: 0   Global Step: 15570   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 00:52:06,876-Speed 3337.22 samples/sec   Loss 7.0126   LearningRate 0.0909   Epoch: 0   Global Step: 15580   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 00:52:09,940-Speed 3342.70 samples/sec   Loss 7.0662   LearningRate 0.0909   Epoch: 0   Global Step: 15590   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 00:52:13,031-Speed 3313.19 samples/sec   Loss 7.2106   LearningRate 0.0909   Epoch: 0   Global Step: 15600   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-04-11 00:52:16,097-Speed 3341.45 samples/sec   Loss 7.1542   LearningRate 0.0909   Epoch: 0   Global Step: 15610   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-04-11 00:52:19,161-Speed 3342.22 samples/sec   Loss 7.0102   LearningRate 0.0909   Epoch: 0   Global Step: 15620   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-04-11 00:52:22,228-Speed 3339.16 samples/sec   Loss 7.2150   LearningRate 0.0909   Epoch: 0   Global Step: 15630   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-04-11 00:52:25,347-Speed 3284.33 samples/sec   Loss 7.0974   LearningRate 0.0908   Epoch: 0   Global Step: 15640   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-04-11 00:52:28,411-Speed 3342.83 samples/sec   Loss 7.0654   LearningRate 0.0908   Epoch: 0   Global Step: 15650   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 00:52:31,484-Speed 3332.81 samples/sec   Loss 7.1050   LearningRate 0.0908   Epoch: 0   Global Step: 15660   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 00:52:34,541-Speed 3350.88 samples/sec   Loss 7.1453   LearningRate 0.0908   Epoch: 0   Global Step: 15670   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 00:52:37,621-Speed 3325.30 samples/sec   Loss 7.1319   LearningRate 0.0908   Epoch: 0   Global Step: 15680   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 00:52:40,697-Speed 3330.04 samples/sec   Loss 7.0603   LearningRate 0.0908   Epoch: 0   Global Step: 15690   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 00:52:43,768-Speed 3335.06 samples/sec   Loss 7.1045   LearningRate 0.0908   Epoch: 0   Global Step: 15700   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 00:52:46,836-Speed 3338.36 samples/sec   Loss 7.0524   LearningRate 0.0908   Epoch: 0   Global Step: 15710   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 00:52:49,914-Speed 3327.97 samples/sec   Loss 7.0359   LearningRate 0.0908   Epoch: 0   Global Step: 15720   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 00:52:52,985-Speed 3334.91 samples/sec   Loss 7.1125   LearningRate 0.0908   Epoch: 0   Global Step: 15730   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 00:52:56,079-Speed 3311.41 samples/sec   Loss 7.2248   LearningRate 0.0908   Epoch: 0   Global Step: 15740   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 00:52:59,175-Speed 3308.25 samples/sec   Loss 7.0123   LearningRate 0.0908   Epoch: 0   Global Step: 15750   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 00:53:02,241-Speed 3340.42 samples/sec   Loss 7.0602   LearningRate 0.0908   Epoch: 0   Global Step: 15760   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 00:53:05,319-Speed 3327.77 samples/sec   Loss 6.9774   LearningRate 0.0908   Epoch: 0   Global Step: 15770   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 00:53:08,397-Speed 3327.97 samples/sec   Loss 7.0702   LearningRate 0.0908   Epoch: 0   Global Step: 15780   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 00:53:11,457-Speed 3346.58 samples/sec   Loss 7.0296   LearningRate 0.0908   Epoch: 0   Global Step: 15790   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 00:53:14,521-Speed 3342.67 samples/sec   Loss 7.0853   LearningRate 0.0908   Epoch: 0   Global Step: 15800   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 00:53:17,681-Speed 3241.51 samples/sec   Loss 7.0545   LearningRate 0.0908   Epoch: 0   Global Step: 15810   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 00:53:20,800-Speed 3283.22 samples/sec   Loss 6.9877   LearningRate 0.0907   Epoch: 0   Global Step: 15820   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 00:53:23,957-Speed 3245.13 samples/sec   Loss 7.0320   LearningRate 0.0907   Epoch: 0   Global Step: 15830   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 00:53:27,026-Speed 3337.98 samples/sec   Loss 7.0934   LearningRate 0.0907   Epoch: 0   Global Step: 15840   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 00:53:30,097-Speed 3334.76 samples/sec   Loss 6.9707   LearningRate 0.0907   Epoch: 0   Global Step: 15850   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 00:53:33,169-Speed 3333.86 samples/sec   Loss 6.9359   LearningRate 0.0907   Epoch: 0   Global Step: 15860   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 00:53:36,240-Speed 3335.11 samples/sec   Loss 7.0039   LearningRate 0.0907   Epoch: 0   Global Step: 15870   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 00:53:39,309-Speed 3337.19 samples/sec   Loss 7.0465   LearningRate 0.0907   Epoch: 0   Global Step: 15880   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 00:53:42,383-Speed 3332.49 samples/sec   Loss 7.0663   LearningRate 0.0907   Epoch: 0   Global Step: 15890   Fp16 Grad Scale: 32768   Required: 33 hours
Training: 2022-04-11 00:53:45,484-Speed 3303.02 samples/sec   Loss 7.1436   LearningRate 0.0907   Epoch: 0   Global Step: 15900   Fp16 Grad Scale: 32768   Required: 33 hours
Training: 2022-04-11 00:53:48,678-Speed 3208.29 samples/sec   Loss 6.9505   LearningRate 0.0907   Epoch: 0   Global Step: 15910   Fp16 Grad Scale: 32768   Required: 33 hours
Training: 2022-04-11 00:53:51,761-Speed 3322.38 samples/sec   Loss 7.0230   LearningRate 0.0907   Epoch: 0   Global Step: 15920   Fp16 Grad Scale: 32768   Required: 33 hours
Training: 2022-04-11 00:53:54,832-Speed 3334.75 samples/sec   Loss 6.9952   LearningRate 0.0907   Epoch: 0   Global Step: 15930   Fp16 Grad Scale: 32768   Required: 33 hours
Training: 2022-04-11 00:53:57,909-Speed 3329.61 samples/sec   Loss 6.9973   LearningRate 0.0907   Epoch: 0   Global Step: 15940   Fp16 Grad Scale: 32768   Required: 33 hours
Training: 2022-04-11 00:54:00,981-Speed 3333.60 samples/sec   Loss 7.0970   LearningRate 0.0907   Epoch: 0   Global Step: 15950   Fp16 Grad Scale: 32768   Required: 33 hours
Training: 2022-04-11 00:54:04,056-Speed 3331.63 samples/sec   Loss 7.0168   LearningRate 0.0907   Epoch: 0   Global Step: 15960   Fp16 Grad Scale: 32768   Required: 33 hours
Training: 2022-04-11 00:54:07,129-Speed 3333.08 samples/sec   Loss 7.0004   LearningRate 0.0907   Epoch: 0   Global Step: 15970   Fp16 Grad Scale: 32768   Required: 33 hours
Training: 2022-04-11 00:54:10,245-Speed 3286.57 samples/sec   Loss 7.0995   LearningRate 0.0907   Epoch: 0   Global Step: 15980   Fp16 Grad Scale: 32768   Required: 33 hours
Training: 2022-04-11 00:54:13,403-Speed 3243.78 samples/sec   Loss 7.0515   LearningRate 0.0906   Epoch: 0   Global Step: 15990   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 00:54:16,577-Speed 3227.01 samples/sec   Loss 6.9011   LearningRate 0.0906   Epoch: 0   Global Step: 16000   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 00:55:01,144-[lfw][16000]XNorm: 22.021432
Training: 2022-04-11 00:55:01,145-[lfw][16000]Accuracy-Flip: 0.99733+-0.00226
Training: 2022-04-11 00:55:01,146-[lfw][16000]Accuracy-Highest: 0.99733
Training: 2022-04-11 00:55:52,468-[cfp_fp][16000]XNorm: 20.099813
Training: 2022-04-11 00:55:52,469-[cfp_fp][16000]Accuracy-Flip: 0.95729+-0.00814
Training: 2022-04-11 00:55:52,469-[cfp_fp][16000]Accuracy-Highest: 0.96371
Training: 2022-04-11 00:56:36,521-[agedb_30][16000]XNorm: 21.365375
Training: 2022-04-11 00:56:36,522-[agedb_30][16000]Accuracy-Flip: 0.96383+-0.00830
Training: 2022-04-11 00:56:36,522-[agedb_30][16000]Accuracy-Highest: 0.96383
Training: 2022-04-11 00:56:39,601-Speed 71.60 samples/sec   Loss 6.9391   LearningRate 0.0906   Epoch: 0   Global Step: 16010   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-11 00:56:42,669-Speed 3339.21 samples/sec   Loss 6.9941   LearningRate 0.0906   Epoch: 0   Global Step: 16020   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-11 00:56:45,817-Speed 3253.14 samples/sec   Loss 7.0394   LearningRate 0.0906   Epoch: 0   Global Step: 16030   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-11 00:56:48,897-Speed 3325.37 samples/sec   Loss 6.9248   LearningRate 0.0906   Epoch: 0   Global Step: 16040   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-11 00:56:51,972-Speed 3331.32 samples/sec   Loss 7.0725   LearningRate 0.0906   Epoch: 0   Global Step: 16050   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-11 00:56:55,069-Speed 3307.33 samples/sec   Loss 7.0073   LearningRate 0.0906   Epoch: 0   Global Step: 16060   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-11 00:56:58,126-Speed 3350.46 samples/sec   Loss 6.9687   LearningRate 0.0906   Epoch: 0   Global Step: 16070   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-11 00:57:01,182-Speed 3352.28 samples/sec   Loss 6.9251   LearningRate 0.0906   Epoch: 0   Global Step: 16080   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-04-11 00:57:04,238-Speed 3350.59 samples/sec   Loss 6.9667   LearningRate 0.0906   Epoch: 0   Global Step: 16090   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-04-11 00:57:07,325-Speed 3318.05 samples/sec   Loss 7.0465   LearningRate 0.0906   Epoch: 0   Global Step: 16100   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 00:57:10,392-Speed 3339.71 samples/sec   Loss 6.8706   LearningRate 0.0906   Epoch: 0   Global Step: 16110   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 00:57:13,439-Speed 3361.67 samples/sec   Loss 7.0383   LearningRate 0.0906   Epoch: 0   Global Step: 16120   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 00:57:16,502-Speed 3343.96 samples/sec   Loss 7.0829   LearningRate 0.0906   Epoch: 0   Global Step: 16130   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 00:57:19,587-Speed 3320.34 samples/sec   Loss 6.9731   LearningRate 0.0906   Epoch: 0   Global Step: 16140   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 00:57:22,666-Speed 3326.30 samples/sec   Loss 7.1038   LearningRate 0.0906   Epoch: 0   Global Step: 16150   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 00:57:25,731-Speed 3342.26 samples/sec   Loss 6.9595   LearningRate 0.0906   Epoch: 0   Global Step: 16160   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 00:57:28,794-Speed 3344.40 samples/sec   Loss 7.0166   LearningRate 0.0905   Epoch: 0   Global Step: 16170   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 00:57:31,873-Speed 3325.44 samples/sec   Loss 6.9245   LearningRate 0.0905   Epoch: 0   Global Step: 16180   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 00:57:34,944-Speed 3336.09 samples/sec   Loss 7.0021   LearningRate 0.0905   Epoch: 0   Global Step: 16190   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 00:57:38,008-Speed 3342.30 samples/sec   Loss 6.9878   LearningRate 0.0905   Epoch: 0   Global Step: 16200   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 00:57:41,076-Speed 3338.32 samples/sec   Loss 6.9867   LearningRate 0.0905   Epoch: 0   Global Step: 16210   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 00:57:44,147-Speed 3335.74 samples/sec   Loss 7.0613   LearningRate 0.0905   Epoch: 0   Global Step: 16220   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 00:57:47,218-Speed 3335.51 samples/sec   Loss 6.9351   LearningRate 0.0905   Epoch: 0   Global Step: 16230   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 00:57:50,281-Speed 3343.09 samples/sec   Loss 7.0316   LearningRate 0.0905   Epoch: 0   Global Step: 16240   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 00:57:53,386-Speed 3299.36 samples/sec   Loss 6.9606   LearningRate 0.0905   Epoch: 0   Global Step: 16250   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 00:57:56,522-Speed 3265.96 samples/sec   Loss 6.9416   LearningRate 0.0905   Epoch: 0   Global Step: 16260   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 00:57:59,588-Speed 3340.41 samples/sec   Loss 6.9954   LearningRate 0.0905   Epoch: 0   Global Step: 16270   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 00:58:02,663-Speed 3330.56 samples/sec   Loss 6.9117   LearningRate 0.0905   Epoch: 0   Global Step: 16280   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 00:58:05,745-Speed 3323.34 samples/sec   Loss 6.9428   LearningRate 0.0905   Epoch: 0   Global Step: 16290   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 00:58:08,815-Speed 3336.44 samples/sec   Loss 6.9117   LearningRate 0.0905   Epoch: 0   Global Step: 16300   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 00:58:11,888-Speed 3334.07 samples/sec   Loss 6.9421   LearningRate 0.0905   Epoch: 0   Global Step: 16310   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 00:58:15,038-Speed 3250.45 samples/sec   Loss 6.8840   LearningRate 0.0905   Epoch: 0   Global Step: 16320   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 00:58:18,220-Speed 3220.15 samples/sec   Loss 7.0337   LearningRate 0.0905   Epoch: 0   Global Step: 16330   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 00:58:21,381-Speed 3240.05 samples/sec   Loss 6.9474   LearningRate 0.0904   Epoch: 0   Global Step: 16340   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 00:58:24,487-Speed 3296.63 samples/sec   Loss 6.8284   LearningRate 0.0904   Epoch: 0   Global Step: 16350   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 00:58:27,564-Speed 3328.58 samples/sec   Loss 6.8221   LearningRate 0.0904   Epoch: 0   Global Step: 16360   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 00:58:30,665-Speed 3303.06 samples/sec   Loss 6.9070   LearningRate 0.0904   Epoch: 0   Global Step: 16370   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 00:58:33,732-Speed 3339.43 samples/sec   Loss 6.9222   LearningRate 0.0904   Epoch: 0   Global Step: 16380   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 00:58:36,804-Speed 3335.12 samples/sec   Loss 6.8103   LearningRate 0.0904   Epoch: 0   Global Step: 16390   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 00:58:39,869-Speed 3341.93 samples/sec   Loss 6.9265   LearningRate 0.0904   Epoch: 0   Global Step: 16400   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 00:58:42,949-Speed 3325.46 samples/sec   Loss 6.9509   LearningRate 0.0904   Epoch: 0   Global Step: 16410   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 00:58:46,015-Speed 3340.51 samples/sec   Loss 6.8551   LearningRate 0.0904   Epoch: 0   Global Step: 16420   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 00:58:49,081-Speed 3340.19 samples/sec   Loss 6.9179   LearningRate 0.0904   Epoch: 0   Global Step: 16430   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 00:58:52,208-Speed 3275.44 samples/sec   Loss 6.9532   LearningRate 0.0904   Epoch: 0   Global Step: 16440   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 00:58:55,281-Speed 3333.29 samples/sec   Loss 6.9857   LearningRate 0.0904   Epoch: 0   Global Step: 16450   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 00:58:58,395-Speed 3288.79 samples/sec   Loss 6.8813   LearningRate 0.0904   Epoch: 0   Global Step: 16460   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 00:59:01,466-Speed 3336.25 samples/sec   Loss 6.9219   LearningRate 0.0904   Epoch: 0   Global Step: 16470   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 00:59:04,638-Speed 3228.30 samples/sec   Loss 6.9725   LearningRate 0.0904   Epoch: 0   Global Step: 16480   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-04-11 00:59:07,807-Speed 3232.69 samples/sec   Loss 6.9454   LearningRate 0.0904   Epoch: 0   Global Step: 16490   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-04-11 00:59:10,875-Speed 3338.54 samples/sec   Loss 6.9042   LearningRate 0.0904   Epoch: 0   Global Step: 16500   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-04-11 00:59:14,012-Speed 3264.46 samples/sec   Loss 6.9419   LearningRate 0.0904   Epoch: 0   Global Step: 16510   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 00:59:17,099-Speed 3318.28 samples/sec   Loss 6.8366   LearningRate 0.0903   Epoch: 0   Global Step: 16520   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 00:59:20,216-Speed 3286.13 samples/sec   Loss 6.9997   LearningRate 0.0903   Epoch: 0   Global Step: 16530   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 00:59:23,301-Speed 3320.56 samples/sec   Loss 6.9357   LearningRate 0.0903   Epoch: 0   Global Step: 16540   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 00:59:26,364-Speed 3342.96 samples/sec   Loss 6.9145   LearningRate 0.0903   Epoch: 0   Global Step: 16550   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 00:59:29,472-Speed 3296.47 samples/sec   Loss 6.8425   LearningRate 0.0903   Epoch: 0   Global Step: 16560   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 00:59:32,554-Speed 3323.30 samples/sec   Loss 6.8607   LearningRate 0.0903   Epoch: 0   Global Step: 16570   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 00:59:35,722-Speed 3232.64 samples/sec   Loss 6.8289   LearningRate 0.0903   Epoch: 0   Global Step: 16580   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 00:59:38,785-Speed 3343.86 samples/sec   Loss 6.9207   LearningRate 0.0903   Epoch: 0   Global Step: 16590   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 00:59:41,863-Speed 3328.12 samples/sec   Loss 6.8365   LearningRate 0.0903   Epoch: 0   Global Step: 16600   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 00:59:44,916-Speed 3354.59 samples/sec   Loss 6.9018   LearningRate 0.0903   Epoch: 0   Global Step: 16610   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 00:59:47,983-Speed 3340.24 samples/sec   Loss 6.8142   LearningRate 0.0903   Epoch: 0   Global Step: 16620   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 00:59:51,083-Speed 3303.11 samples/sec   Loss 6.8772   LearningRate 0.0903   Epoch: 0   Global Step: 16630   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 00:59:54,168-Speed 3320.11 samples/sec   Loss 6.8944   LearningRate 0.0903   Epoch: 0   Global Step: 16640   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 00:59:57,265-Speed 3307.48 samples/sec   Loss 6.8794   LearningRate 0.0903   Epoch: 0   Global Step: 16650   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 01:00:00,329-Speed 3343.31 samples/sec   Loss 6.8982   LearningRate 0.0903   Epoch: 0   Global Step: 16660   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 01:00:03,396-Speed 3339.71 samples/sec   Loss 6.7703   LearningRate 0.0903   Epoch: 0   Global Step: 16670   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 01:00:06,789-Speed 3018.51 samples/sec   Loss 6.8898   LearningRate 0.0903   Epoch: 0   Global Step: 16680   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 01:00:09,858-Speed 3338.11 samples/sec   Loss 6.8973   LearningRate 0.0903   Epoch: 0   Global Step: 16690   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 01:00:41,049-Speed 328.31 samples/sec   Loss 6.2319   LearningRate 0.0902   Epoch: 1   Global Step: 16700   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 01:00:44,264-Speed 3186.41 samples/sec   Loss 6.1680   LearningRate 0.0902   Epoch: 1   Global Step: 16710   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-04-11 01:00:47,372-Speed 3295.57 samples/sec   Loss 6.1998   LearningRate 0.0902   Epoch: 1   Global Step: 16720   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-04-11 01:00:50,491-Speed 3284.35 samples/sec   Loss 6.0872   LearningRate 0.0902   Epoch: 1   Global Step: 16730   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-04-11 01:00:53,560-Speed 3337.57 samples/sec   Loss 6.0899   LearningRate 0.0902   Epoch: 1   Global Step: 16740   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-04-11 01:00:56,610-Speed 3358.68 samples/sec   Loss 6.1860   LearningRate 0.0902   Epoch: 1   Global Step: 16750   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 01:00:59,686-Speed 3330.47 samples/sec   Loss 6.1823   LearningRate 0.0902   Epoch: 1   Global Step: 16760   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 01:01:02,759-Speed 3333.36 samples/sec   Loss 6.0355   LearningRate 0.0902   Epoch: 1   Global Step: 16770   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 01:01:05,848-Speed 3316.10 samples/sec   Loss 6.1330   LearningRate 0.0902   Epoch: 1   Global Step: 16780   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 01:01:09,026-Speed 3222.84 samples/sec   Loss 6.0951   LearningRate 0.0902   Epoch: 1   Global Step: 16790   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 01:01:12,256-Speed 3170.65 samples/sec   Loss 6.1954   LearningRate 0.0902   Epoch: 1   Global Step: 16800   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 01:01:15,798-Speed 2891.86 samples/sec   Loss 6.1614   LearningRate 0.0902   Epoch: 1   Global Step: 16810   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 01:01:18,871-Speed 3333.02 samples/sec   Loss 6.2441   LearningRate 0.0902   Epoch: 1   Global Step: 16820   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 01:01:21,938-Speed 3340.34 samples/sec   Loss 6.1512   LearningRate 0.0902   Epoch: 1   Global Step: 16830   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 01:01:25,028-Speed 3314.53 samples/sec   Loss 6.1127   LearningRate 0.0902   Epoch: 1   Global Step: 16840   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 01:01:28,150-Speed 3280.69 samples/sec   Loss 6.1164   LearningRate 0.0902   Epoch: 1   Global Step: 16850   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 01:01:31,218-Speed 3339.85 samples/sec   Loss 6.0299   LearningRate 0.0902   Epoch: 1   Global Step: 16860   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 01:01:34,348-Speed 3271.93 samples/sec   Loss 6.2368   LearningRate 0.0901   Epoch: 1   Global Step: 16870   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 01:01:37,419-Speed 3335.39 samples/sec   Loss 6.2421   LearningRate 0.0901   Epoch: 1   Global Step: 16880   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 01:01:40,499-Speed 3325.25 samples/sec   Loss 6.1206   LearningRate 0.0901   Epoch: 1   Global Step: 16890   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 01:01:43,580-Speed 3325.42 samples/sec   Loss 6.0947   LearningRate 0.0901   Epoch: 1   Global Step: 16900   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 01:01:46,631-Speed 3356.90 samples/sec   Loss 6.1178   LearningRate 0.0901   Epoch: 1   Global Step: 16910   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 01:01:49,742-Speed 3292.96 samples/sec   Loss 6.1985   LearningRate 0.0901   Epoch: 1   Global Step: 16920   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 01:01:52,862-Speed 3282.66 samples/sec   Loss 6.1143   LearningRate 0.0901   Epoch: 1   Global Step: 16930   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 01:01:56,018-Speed 3244.90 samples/sec   Loss 6.2682   LearningRate 0.0901   Epoch: 1   Global Step: 16940   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 01:01:59,259-Speed 3161.27 samples/sec   Loss 6.1516   LearningRate 0.0901   Epoch: 1   Global Step: 16950   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 01:02:02,333-Speed 3331.73 samples/sec   Loss 6.1544   LearningRate 0.0901   Epoch: 1   Global Step: 16960   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 01:02:05,425-Speed 3313.02 samples/sec   Loss 6.1195   LearningRate 0.0901   Epoch: 1   Global Step: 16970   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 01:02:08,507-Speed 3323.44 samples/sec   Loss 6.1361   LearningRate 0.0901   Epoch: 1   Global Step: 16980   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 01:02:11,599-Speed 3313.46 samples/sec   Loss 6.2187   LearningRate 0.0901   Epoch: 1   Global Step: 16990   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 01:02:14,682-Speed 3322.13 samples/sec   Loss 6.1705   LearningRate 0.0901   Epoch: 1   Global Step: 17000   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 01:02:17,755-Speed 3332.49 samples/sec   Loss 6.1066   LearningRate 0.0901   Epoch: 1   Global Step: 17010   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 01:02:20,841-Speed 3319.52 samples/sec   Loss 6.1478   LearningRate 0.0901   Epoch: 1   Global Step: 17020   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 01:02:23,925-Speed 3320.70 samples/sec   Loss 6.0688   LearningRate 0.0901   Epoch: 1   Global Step: 17030   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 01:02:26,999-Speed 3332.80 samples/sec   Loss 6.0832   LearningRate 0.0901   Epoch: 1   Global Step: 17040   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 01:02:30,135-Speed 3266.10 samples/sec   Loss 6.1411   LearningRate 0.0900   Epoch: 1   Global Step: 17050   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 01:02:33,232-Speed 3308.01 samples/sec   Loss 6.1710   LearningRate 0.0900   Epoch: 1   Global Step: 17060   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 01:02:36,315-Speed 3321.89 samples/sec   Loss 6.0602   LearningRate 0.0900   Epoch: 1   Global Step: 17070   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 01:02:39,476-Speed 3240.69 samples/sec   Loss 6.1500   LearningRate 0.0900   Epoch: 1   Global Step: 17080   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 01:02:42,577-Speed 3303.30 samples/sec   Loss 6.2294   LearningRate 0.0900   Epoch: 1   Global Step: 17090   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 01:02:45,786-Speed 3191.21 samples/sec   Loss 6.2177   LearningRate 0.0900   Epoch: 1   Global Step: 17100   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 01:02:48,933-Speed 3255.30 samples/sec   Loss 6.1602   LearningRate 0.0900   Epoch: 1   Global Step: 17110   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 01:02:52,053-Speed 3283.07 samples/sec   Loss 6.1944   LearningRate 0.0900   Epoch: 1   Global Step: 17120   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 01:02:55,150-Speed 3307.49 samples/sec   Loss 6.2382   LearningRate 0.0900   Epoch: 1   Global Step: 17130   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 01:02:58,229-Speed 3326.81 samples/sec   Loss 6.1375   LearningRate 0.0900   Epoch: 1   Global Step: 17140   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 01:03:01,296-Speed 3340.07 samples/sec   Loss 6.2798   LearningRate 0.0900   Epoch: 1   Global Step: 17150   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 01:03:04,386-Speed 3314.65 samples/sec   Loss 6.1387   LearningRate 0.0900   Epoch: 1   Global Step: 17160   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 01:03:07,479-Speed 3310.94 samples/sec   Loss 6.2315   LearningRate 0.0900   Epoch: 1   Global Step: 17170   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 01:03:10,550-Speed 3335.78 samples/sec   Loss 6.2318   LearningRate 0.0900   Epoch: 1   Global Step: 17180   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 01:03:13,627-Speed 3328.67 samples/sec   Loss 6.0732   LearningRate 0.0900   Epoch: 1   Global Step: 17190   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 01:03:16,705-Speed 3327.92 samples/sec   Loss 6.2167   LearningRate 0.0900   Epoch: 1   Global Step: 17200   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 01:03:19,770-Speed 3341.76 samples/sec   Loss 6.1403   LearningRate 0.0900   Epoch: 1   Global Step: 17210   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 01:03:22,853-Speed 3321.97 samples/sec   Loss 6.2210   LearningRate 0.0899   Epoch: 1   Global Step: 17220   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 01:03:25,921-Speed 3338.73 samples/sec   Loss 6.2318   LearningRate 0.0899   Epoch: 1   Global Step: 17230   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-04-11 01:03:29,020-Speed 3304.70 samples/sec   Loss 6.1222   LearningRate 0.0899   Epoch: 1   Global Step: 17240   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 01:03:32,114-Speed 3311.24 samples/sec   Loss 6.0903   LearningRate 0.0899   Epoch: 1   Global Step: 17250   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 01:03:35,183-Speed 3336.88 samples/sec   Loss 6.1998   LearningRate 0.0899   Epoch: 1   Global Step: 17260   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 01:03:38,274-Speed 3314.27 samples/sec   Loss 6.2278   LearningRate 0.0899   Epoch: 1   Global Step: 17270   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 01:03:41,343-Speed 3337.95 samples/sec   Loss 6.2208   LearningRate 0.0899   Epoch: 1   Global Step: 17280   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 01:03:44,446-Speed 3300.01 samples/sec   Loss 6.2017   LearningRate 0.0899   Epoch: 1   Global Step: 17290   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 01:03:47,625-Speed 3222.65 samples/sec   Loss 6.1929   LearningRate 0.0899   Epoch: 1   Global Step: 17300   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 01:03:50,711-Speed 3318.37 samples/sec   Loss 6.2656   LearningRate 0.0899   Epoch: 1   Global Step: 17310   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 01:03:53,808-Speed 3307.54 samples/sec   Loss 6.2600   LearningRate 0.0899   Epoch: 1   Global Step: 17320   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 01:03:56,898-Speed 3314.26 samples/sec   Loss 6.2937   LearningRate 0.0899   Epoch: 1   Global Step: 17330   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 01:04:00,015-Speed 3286.43 samples/sec   Loss 6.1663   LearningRate 0.0899   Epoch: 1   Global Step: 17340   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-04-11 01:04:03,109-Speed 3310.98 samples/sec   Loss 6.2634   LearningRate 0.0899   Epoch: 1   Global Step: 17350   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-04-11 01:04:06,193-Speed 3320.98 samples/sec   Loss 6.1336   LearningRate 0.0899   Epoch: 1   Global Step: 17360   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-04-11 01:04:09,252-Speed 3348.41 samples/sec   Loss 6.1964   LearningRate 0.0899   Epoch: 1   Global Step: 17370   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 01:04:12,348-Speed 3307.99 samples/sec   Loss 6.2292   LearningRate 0.0899   Epoch: 1   Global Step: 17380   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 01:04:15,571-Speed 3178.42 samples/sec   Loss 6.2084   LearningRate 0.0899   Epoch: 1   Global Step: 17390   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 01:04:18,746-Speed 3225.35 samples/sec   Loss 6.1986   LearningRate 0.0898   Epoch: 1   Global Step: 17400   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 01:04:21,938-Speed 3208.65 samples/sec   Loss 6.1667   LearningRate 0.0898   Epoch: 1   Global Step: 17410   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 01:04:25,015-Speed 3329.15 samples/sec   Loss 6.2653   LearningRate 0.0898   Epoch: 1   Global Step: 17420   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 01:04:28,086-Speed 3335.25 samples/sec   Loss 6.2564   LearningRate 0.0898   Epoch: 1   Global Step: 17430   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 01:04:31,163-Speed 3328.24 samples/sec   Loss 6.1858   LearningRate 0.0898   Epoch: 1   Global Step: 17440   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 01:04:34,237-Speed 3332.23 samples/sec   Loss 6.1413   LearningRate 0.0898   Epoch: 1   Global Step: 17450   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 01:04:37,319-Speed 3323.60 samples/sec   Loss 6.2188   LearningRate 0.0898   Epoch: 1   Global Step: 17460   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 01:04:40,395-Speed 3329.76 samples/sec   Loss 6.3843   LearningRate 0.0898   Epoch: 1   Global Step: 17470   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 01:04:43,460-Speed 3341.40 samples/sec   Loss 6.2598   LearningRate 0.0898   Epoch: 1   Global Step: 17480   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 01:04:46,551-Speed 3313.17 samples/sec   Loss 6.2606   LearningRate 0.0898   Epoch: 1   Global Step: 17490   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 01:04:49,643-Speed 3313.53 samples/sec   Loss 6.2149   LearningRate 0.0898   Epoch: 1   Global Step: 17500   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 01:04:52,811-Speed 3232.59 samples/sec   Loss 6.1987   LearningRate 0.0898   Epoch: 1   Global Step: 17510   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 01:04:55,898-Speed 3318.00 samples/sec   Loss 6.1516   LearningRate 0.0898   Epoch: 1   Global Step: 17520   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 01:04:58,967-Speed 3338.06 samples/sec   Loss 6.2915   LearningRate 0.0898   Epoch: 1   Global Step: 17530   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 01:05:02,036-Speed 3336.83 samples/sec   Loss 6.2130   LearningRate 0.0898   Epoch: 1   Global Step: 17540   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 01:05:05,125-Speed 3316.66 samples/sec   Loss 6.2901   LearningRate 0.0898   Epoch: 1   Global Step: 17550   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 01:05:08,214-Speed 3315.73 samples/sec   Loss 6.1568   LearningRate 0.0898   Epoch: 1   Global Step: 17560   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 01:05:11,299-Speed 3320.64 samples/sec   Loss 6.1997   LearningRate 0.0898   Epoch: 1   Global Step: 17570   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 01:05:14,371-Speed 3333.62 samples/sec   Loss 6.2651   LearningRate 0.0897   Epoch: 1   Global Step: 17580   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 01:05:17,447-Speed 3329.78 samples/sec   Loss 6.1575   LearningRate 0.0897   Epoch: 1   Global Step: 17590   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 01:05:20,546-Speed 3305.02 samples/sec   Loss 6.2498   LearningRate 0.0897   Epoch: 1   Global Step: 17600   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 01:05:23,621-Speed 3330.72 samples/sec   Loss 6.2064   LearningRate 0.0897   Epoch: 1   Global Step: 17610   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 01:05:26,710-Speed 3316.29 samples/sec   Loss 6.2201   LearningRate 0.0897   Epoch: 1   Global Step: 17620   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 01:05:29,796-Speed 3318.62 samples/sec   Loss 6.3007   LearningRate 0.0897   Epoch: 1   Global Step: 17630   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 01:05:32,873-Speed 3328.28 samples/sec   Loss 6.1541   LearningRate 0.0897   Epoch: 1   Global Step: 17640   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 01:05:35,948-Speed 3331.52 samples/sec   Loss 6.2398   LearningRate 0.0897   Epoch: 1   Global Step: 17650   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 01:05:39,041-Speed 3310.92 samples/sec   Loss 6.2406   LearningRate 0.0897   Epoch: 1   Global Step: 17660   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 01:05:42,133-Speed 3314.05 samples/sec   Loss 6.2261   LearningRate 0.0897   Epoch: 1   Global Step: 17670   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 01:05:45,257-Speed 3279.26 samples/sec   Loss 6.2387   LearningRate 0.0897   Epoch: 1   Global Step: 17680   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 01:05:48,342-Speed 3319.14 samples/sec   Loss 6.1906   LearningRate 0.0897   Epoch: 1   Global Step: 17690   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 01:05:51,424-Speed 3324.28 samples/sec   Loss 6.2072   LearningRate 0.0897   Epoch: 1   Global Step: 17700   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 01:05:54,492-Speed 3338.45 samples/sec   Loss 6.1525   LearningRate 0.0897   Epoch: 1   Global Step: 17710   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 01:05:57,561-Speed 3336.95 samples/sec   Loss 6.1254   LearningRate 0.0897   Epoch: 1   Global Step: 17720   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 01:06:00,667-Speed 3297.64 samples/sec   Loss 6.2143   LearningRate 0.0897   Epoch: 1   Global Step: 17730   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 01:06:03,876-Speed 3191.49 samples/sec   Loss 6.2111   LearningRate 0.0897   Epoch: 1   Global Step: 17740   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 01:06:06,951-Speed 3331.86 samples/sec   Loss 6.2735   LearningRate 0.0896   Epoch: 1   Global Step: 17750   Fp16 Grad Scale: 32768   Required: 33 hours
Training: 2022-04-11 01:06:10,019-Speed 3337.80 samples/sec   Loss 6.2088   LearningRate 0.0896   Epoch: 1   Global Step: 17760   Fp16 Grad Scale: 32768   Required: 33 hours
Training: 2022-04-11 01:06:13,089-Speed 3336.42 samples/sec   Loss 6.2959   LearningRate 0.0896   Epoch: 1   Global Step: 17770   Fp16 Grad Scale: 32768   Required: 33 hours
Training: 2022-04-11 01:06:16,159-Speed 3336.56 samples/sec   Loss 6.0902   LearningRate 0.0896   Epoch: 1   Global Step: 17780   Fp16 Grad Scale: 32768   Required: 33 hours
Training: 2022-04-11 01:06:19,228-Speed 3337.10 samples/sec   Loss 6.3344   LearningRate 0.0896   Epoch: 1   Global Step: 17790   Fp16 Grad Scale: 32768   Required: 33 hours
Training: 2022-04-11 01:06:22,304-Speed 3330.17 samples/sec   Loss 6.1628   LearningRate 0.0896   Epoch: 1   Global Step: 17800   Fp16 Grad Scale: 32768   Required: 33 hours
Training: 2022-04-11 01:06:25,372-Speed 3338.26 samples/sec   Loss 6.2368   LearningRate 0.0896   Epoch: 1   Global Step: 17810   Fp16 Grad Scale: 32768   Required: 33 hours
Training: 2022-04-11 01:06:28,448-Speed 3330.55 samples/sec   Loss 6.2207   LearningRate 0.0896   Epoch: 1   Global Step: 17820   Fp16 Grad Scale: 32768   Required: 33 hours
Training: 2022-04-11 01:06:31,522-Speed 3332.39 samples/sec   Loss 6.3581   LearningRate 0.0896   Epoch: 1   Global Step: 17830   Fp16 Grad Scale: 32768   Required: 33 hours
Training: 2022-04-11 01:06:34,597-Speed 3330.73 samples/sec   Loss 6.2911   LearningRate 0.0896   Epoch: 1   Global Step: 17840   Fp16 Grad Scale: 32768   Required: 33 hours
Training: 2022-04-11 01:06:37,684-Speed 3318.13 samples/sec   Loss 6.2544   LearningRate 0.0896   Epoch: 1   Global Step: 17850   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 01:06:40,808-Speed 3278.43 samples/sec   Loss 6.2116   LearningRate 0.0896   Epoch: 1   Global Step: 17860   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 01:06:43,876-Speed 3337.94 samples/sec   Loss 6.2504   LearningRate 0.0896   Epoch: 1   Global Step: 17870   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 01:06:46,960-Speed 3322.74 samples/sec   Loss 6.2789   LearningRate 0.0896   Epoch: 1   Global Step: 17880   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 01:06:50,036-Speed 3329.21 samples/sec   Loss 6.2707   LearningRate 0.0896   Epoch: 1   Global Step: 17890   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 01:06:53,148-Speed 3291.93 samples/sec   Loss 6.1812   LearningRate 0.0896   Epoch: 1   Global Step: 17900   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 01:06:56,296-Speed 3253.54 samples/sec   Loss 6.0953   LearningRate 0.0896   Epoch: 1   Global Step: 17910   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 01:06:59,365-Speed 3337.35 samples/sec   Loss 6.2291   LearningRate 0.0896   Epoch: 1   Global Step: 17920   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 01:07:02,463-Speed 3306.55 samples/sec   Loss 6.2079   LearningRate 0.0895   Epoch: 1   Global Step: 17930   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 01:07:05,599-Speed 3266.24 samples/sec   Loss 6.1571   LearningRate 0.0895   Epoch: 1   Global Step: 17940   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 01:07:08,728-Speed 3272.88 samples/sec   Loss 6.2279   LearningRate 0.0895   Epoch: 1   Global Step: 17950   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 01:07:11,814-Speed 3319.49 samples/sec   Loss 6.1629   LearningRate 0.0895   Epoch: 1   Global Step: 17960   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 01:07:14,894-Speed 3324.91 samples/sec   Loss 6.2528   LearningRate 0.0895   Epoch: 1   Global Step: 17970   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 01:07:17,963-Speed 3338.20 samples/sec   Loss 6.0419   LearningRate 0.0895   Epoch: 1   Global Step: 17980   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 01:07:21,055-Speed 3312.95 samples/sec   Loss 6.2544   LearningRate 0.0895   Epoch: 1   Global Step: 17990   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 01:07:24,145-Speed 3314.55 samples/sec   Loss 6.2991   LearningRate 0.0895   Epoch: 1   Global Step: 18000   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 01:08:08,318-[lfw][18000]XNorm: 23.301956
Training: 2022-04-11 01:08:08,318-[lfw][18000]Accuracy-Flip: 0.99633+-0.00332
Training: 2022-04-11 01:08:08,319-[lfw][18000]Accuracy-Highest: 0.99733
Training: 2022-04-11 01:08:59,473-[cfp_fp][18000]XNorm: 21.433828
Training: 2022-04-11 01:08:59,474-[cfp_fp][18000]Accuracy-Flip: 0.96457+-0.00797
Training: 2022-04-11 01:08:59,474-[cfp_fp][18000]Accuracy-Highest: 0.96457
Training: 2022-04-11 01:09:43,744-[agedb_30][18000]XNorm: 23.306870
Training: 2022-04-11 01:09:43,744-[agedb_30][18000]Accuracy-Flip: 0.96600+-0.00883
Training: 2022-04-11 01:09:43,745-[agedb_30][18000]Accuracy-Highest: 0.96600
Training: 2022-04-11 01:09:46,825-Speed 71.77 samples/sec   Loss 6.2280   LearningRate 0.0895   Epoch: 1   Global Step: 18010   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 01:09:49,906-Speed 3324.97 samples/sec   Loss 6.3454   LearningRate 0.0895   Epoch: 1   Global Step: 18020   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 01:09:52,988-Speed 3323.67 samples/sec   Loss 6.2519   LearningRate 0.0895   Epoch: 1   Global Step: 18030   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 01:09:56,041-Speed 3355.29 samples/sec   Loss 6.2482   LearningRate 0.0895   Epoch: 1   Global Step: 18040   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 01:09:59,104-Speed 3343.80 samples/sec   Loss 6.1959   LearningRate 0.0895   Epoch: 1   Global Step: 18050   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 01:10:02,156-Speed 3355.72 samples/sec   Loss 6.2349   LearningRate 0.0895   Epoch: 1   Global Step: 18060   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 01:10:05,261-Speed 3298.90 samples/sec   Loss 6.2750   LearningRate 0.0895   Epoch: 1   Global Step: 18070   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 01:10:08,324-Speed 3343.20 samples/sec   Loss 6.2639   LearningRate 0.0895   Epoch: 1   Global Step: 18080   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 01:10:11,397-Speed 3333.82 samples/sec   Loss 6.2081   LearningRate 0.0895   Epoch: 1   Global Step: 18090   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 01:10:14,453-Speed 3351.23 samples/sec   Loss 6.1600   LearningRate 0.0894   Epoch: 1   Global Step: 18100   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 01:10:17,514-Speed 3346.52 samples/sec   Loss 6.1944   LearningRate 0.0894   Epoch: 1   Global Step: 18110   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 01:10:20,562-Speed 3359.90 samples/sec   Loss 6.2591   LearningRate 0.0894   Epoch: 1   Global Step: 18120   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 01:10:23,627-Speed 3341.91 samples/sec   Loss 6.2178   LearningRate 0.0894   Epoch: 1   Global Step: 18130   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 01:10:26,685-Speed 3349.05 samples/sec   Loss 6.2822   LearningRate 0.0894   Epoch: 1   Global Step: 18140   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 01:10:29,759-Speed 3332.42 samples/sec   Loss 6.2651   LearningRate 0.0894   Epoch: 1   Global Step: 18150   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 01:10:32,820-Speed 3346.25 samples/sec   Loss 6.0778   LearningRate 0.0894   Epoch: 1   Global Step: 18160   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 01:10:35,885-Speed 3341.57 samples/sec   Loss 6.2285   LearningRate 0.0894   Epoch: 1   Global Step: 18170   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 01:10:38,971-Speed 3319.33 samples/sec   Loss 6.1364   LearningRate 0.0894   Epoch: 1   Global Step: 18180   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 01:10:42,072-Speed 3303.46 samples/sec   Loss 6.1871   LearningRate 0.0894   Epoch: 1   Global Step: 18190   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 01:10:45,136-Speed 3342.05 samples/sec   Loss 6.2385   LearningRate 0.0894   Epoch: 1   Global Step: 18200   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 01:10:48,223-Speed 3317.93 samples/sec   Loss 6.1353   LearningRate 0.0894   Epoch: 1   Global Step: 18210   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 01:10:51,295-Speed 3334.66 samples/sec   Loss 6.2354   LearningRate 0.0894   Epoch: 1   Global Step: 18220   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 01:10:54,361-Speed 3339.88 samples/sec   Loss 6.2550   LearningRate 0.0894   Epoch: 1   Global Step: 18230   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 01:10:57,436-Speed 3330.98 samples/sec   Loss 6.2870   LearningRate 0.0894   Epoch: 1   Global Step: 18240   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 01:11:00,533-Speed 3307.41 samples/sec   Loss 6.3345   LearningRate 0.0894   Epoch: 1   Global Step: 18250   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 01:11:03,680-Speed 3255.27 samples/sec   Loss 6.2346   LearningRate 0.0894   Epoch: 1   Global Step: 18260   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 01:11:06,800-Speed 3282.60 samples/sec   Loss 6.1982   LearningRate 0.0894   Epoch: 1   Global Step: 18270   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 01:11:09,861-Speed 3345.61 samples/sec   Loss 6.2821   LearningRate 0.0893   Epoch: 1   Global Step: 18280   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 01:11:12,944-Speed 3323.00 samples/sec   Loss 6.1760   LearningRate 0.0893   Epoch: 1   Global Step: 18290   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 01:11:16,045-Speed 3302.75 samples/sec   Loss 6.1966   LearningRate 0.0893   Epoch: 1   Global Step: 18300   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 01:11:19,206-Speed 3240.67 samples/sec   Loss 6.2258   LearningRate 0.0893   Epoch: 1   Global Step: 18310   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 01:11:22,366-Speed 3240.65 samples/sec   Loss 6.2888   LearningRate 0.0893   Epoch: 1   Global Step: 18320   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 01:11:25,428-Speed 3345.96 samples/sec   Loss 6.1746   LearningRate 0.0893   Epoch: 1   Global Step: 18330   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 01:11:28,486-Speed 3349.07 samples/sec   Loss 6.2516   LearningRate 0.0893   Epoch: 1   Global Step: 18340   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 01:11:31,610-Speed 3278.84 samples/sec   Loss 6.1640   LearningRate 0.0893   Epoch: 1   Global Step: 18350   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 01:11:34,758-Speed 3253.65 samples/sec   Loss 6.2395   LearningRate 0.0893   Epoch: 1   Global Step: 18360   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 01:11:37,926-Speed 3233.57 samples/sec   Loss 6.2335   LearningRate 0.0893   Epoch: 1   Global Step: 18370   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 01:11:41,046-Speed 3282.15 samples/sec   Loss 6.2726   LearningRate 0.0893   Epoch: 1   Global Step: 18380   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 01:11:44,144-Speed 3306.87 samples/sec   Loss 6.2240   LearningRate 0.0893   Epoch: 1   Global Step: 18390   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 01:11:47,205-Speed 3345.97 samples/sec   Loss 6.2511   LearningRate 0.0893   Epoch: 1   Global Step: 18400   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 01:11:50,265-Speed 3346.95 samples/sec   Loss 6.2168   LearningRate 0.0893   Epoch: 1   Global Step: 18410   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 01:11:53,334-Speed 3337.46 samples/sec   Loss 6.1049   LearningRate 0.0893   Epoch: 1   Global Step: 18420   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 01:11:56,395-Speed 3346.49 samples/sec   Loss 6.2314   LearningRate 0.0893   Epoch: 1   Global Step: 18430   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 01:11:59,453-Speed 3349.19 samples/sec   Loss 6.1179   LearningRate 0.0893   Epoch: 1   Global Step: 18440   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-04-11 01:12:02,515-Speed 3345.28 samples/sec   Loss 6.3097   LearningRate 0.0893   Epoch: 1   Global Step: 18450   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-04-11 01:12:05,643-Speed 3274.13 samples/sec   Loss 6.1530   LearningRate 0.0892   Epoch: 1   Global Step: 18460   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 01:12:08,735-Speed 3312.16 samples/sec   Loss 6.2653   LearningRate 0.0892   Epoch: 1   Global Step: 18470   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 01:12:11,798-Speed 3343.99 samples/sec   Loss 6.2122   LearningRate 0.0892   Epoch: 1   Global Step: 18480   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 01:12:14,858-Speed 3348.31 samples/sec   Loss 6.2193   LearningRate 0.0892   Epoch: 1   Global Step: 18490   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 01:12:17,932-Speed 3331.45 samples/sec   Loss 6.1635   LearningRate 0.0892   Epoch: 1   Global Step: 18500   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 01:12:20,992-Speed 3346.93 samples/sec   Loss 6.1997   LearningRate 0.0892   Epoch: 1   Global Step: 18510   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 01:12:24,060-Speed 3338.80 samples/sec   Loss 6.1517   LearningRate 0.0892   Epoch: 1   Global Step: 18520   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 01:12:27,117-Speed 3350.19 samples/sec   Loss 6.2138   LearningRate 0.0892   Epoch: 1   Global Step: 18530   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 01:12:30,207-Speed 3314.65 samples/sec   Loss 6.2141   LearningRate 0.0892   Epoch: 1   Global Step: 18540   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 01:12:33,265-Speed 3349.14 samples/sec   Loss 6.2828   LearningRate 0.0892   Epoch: 1   Global Step: 18550   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 01:12:36,328-Speed 3344.42 samples/sec   Loss 6.2477   LearningRate 0.0892   Epoch: 1   Global Step: 18560   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-04-11 01:12:39,454-Speed 3275.71 samples/sec   Loss 6.2283   LearningRate 0.0892   Epoch: 1   Global Step: 18570   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-04-11 01:12:42,513-Speed 3349.32 samples/sec   Loss 6.2665   LearningRate 0.0892   Epoch: 1   Global Step: 18580   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-04-11 01:12:45,604-Speed 3313.25 samples/sec   Loss 6.2654   LearningRate 0.0892   Epoch: 1   Global Step: 18590   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-04-11 01:12:48,689-Speed 3321.08 samples/sec   Loss 6.2133   LearningRate 0.0892   Epoch: 1   Global Step: 18600   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 01:12:51,757-Speed 3338.08 samples/sec   Loss 6.2821   LearningRate 0.0892   Epoch: 1   Global Step: 18610   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 01:12:54,819-Speed 3344.53 samples/sec   Loss 6.2578   LearningRate 0.0892   Epoch: 1   Global Step: 18620   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 01:12:57,901-Speed 3322.95 samples/sec   Loss 6.2210   LearningRate 0.0891   Epoch: 1   Global Step: 18630   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 01:13:00,982-Speed 3324.34 samples/sec   Loss 6.2615   LearningRate 0.0891   Epoch: 1   Global Step: 18640   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 01:13:04,081-Speed 3305.43 samples/sec   Loss 6.2124   LearningRate 0.0891   Epoch: 1   Global Step: 18650   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 01:13:07,182-Speed 3302.38 samples/sec   Loss 6.2106   LearningRate 0.0891   Epoch: 1   Global Step: 18660   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 01:13:10,256-Speed 3332.85 samples/sec   Loss 6.2413   LearningRate 0.0891   Epoch: 1   Global Step: 18670   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 01:13:13,338-Speed 3322.85 samples/sec   Loss 6.2465   LearningRate 0.0891   Epoch: 1   Global Step: 18680   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 01:13:16,435-Speed 3307.97 samples/sec   Loss 6.2495   LearningRate 0.0891   Epoch: 1   Global Step: 18690   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 01:13:19,523-Speed 3316.11 samples/sec   Loss 6.2119   LearningRate 0.0891   Epoch: 1   Global Step: 18700   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-04-11 01:13:22,581-Speed 3349.10 samples/sec   Loss 6.1691   LearningRate 0.0891   Epoch: 1   Global Step: 18710   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-04-11 01:13:25,681-Speed 3304.19 samples/sec   Loss 6.1460   LearningRate 0.0891   Epoch: 1   Global Step: 18720   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-04-11 01:13:28,739-Speed 3349.32 samples/sec   Loss 6.2766   LearningRate 0.0891   Epoch: 1   Global Step: 18730   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-04-11 01:13:31,786-Speed 3361.24 samples/sec   Loss 6.1414   LearningRate 0.0891   Epoch: 1   Global Step: 18740   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 01:13:34,851-Speed 3342.49 samples/sec   Loss 6.2323   LearningRate 0.0891   Epoch: 1   Global Step: 18750   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 01:13:37,909-Speed 3349.45 samples/sec   Loss 6.1476   LearningRate 0.0891   Epoch: 1   Global Step: 18760   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 01:13:40,971-Speed 3345.60 samples/sec   Loss 6.1665   LearningRate 0.0891   Epoch: 1   Global Step: 18770   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 01:13:44,036-Speed 3340.63 samples/sec   Loss 6.2634   LearningRate 0.0891   Epoch: 1   Global Step: 18780   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 01:13:47,098-Speed 3345.57 samples/sec   Loss 6.2920   LearningRate 0.0891   Epoch: 1   Global Step: 18790   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 01:13:50,160-Speed 3345.24 samples/sec   Loss 6.2522   LearningRate 0.0891   Epoch: 1   Global Step: 18800   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 01:13:53,231-Speed 3334.59 samples/sec   Loss 6.2491   LearningRate 0.0890   Epoch: 1   Global Step: 18810   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 01:13:56,297-Speed 3340.82 samples/sec   Loss 6.2738   LearningRate 0.0890   Epoch: 1   Global Step: 18820   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 01:13:59,362-Speed 3342.39 samples/sec   Loss 6.2396   LearningRate 0.0890   Epoch: 1   Global Step: 18830   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 01:14:02,428-Speed 3340.04 samples/sec   Loss 6.1517   LearningRate 0.0890   Epoch: 1   Global Step: 18840   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 01:14:05,490-Speed 3345.19 samples/sec   Loss 6.1553   LearningRate 0.0890   Epoch: 1   Global Step: 18850   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 01:14:08,561-Speed 3335.44 samples/sec   Loss 6.2593   LearningRate 0.0890   Epoch: 1   Global Step: 18860   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 01:14:11,623-Speed 3345.20 samples/sec   Loss 6.2512   LearningRate 0.0890   Epoch: 1   Global Step: 18870   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 01:14:14,685-Speed 3344.50 samples/sec   Loss 6.3315   LearningRate 0.0890   Epoch: 1   Global Step: 18880   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 01:14:17,752-Speed 3340.19 samples/sec   Loss 6.2830   LearningRate 0.0890   Epoch: 1   Global Step: 18890   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 01:14:20,824-Speed 3333.24 samples/sec   Loss 6.2363   LearningRate 0.0890   Epoch: 1   Global Step: 18900   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 01:14:23,887-Speed 3344.50 samples/sec   Loss 6.2873   LearningRate 0.0890   Epoch: 1   Global Step: 18910   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 01:14:26,951-Speed 3342.98 samples/sec   Loss 6.2996   LearningRate 0.0890   Epoch: 1   Global Step: 18920   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 01:14:30,055-Speed 3299.65 samples/sec   Loss 6.1922   LearningRate 0.0890   Epoch: 1   Global Step: 18930   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 01:14:33,189-Speed 3268.41 samples/sec   Loss 6.1779   LearningRate 0.0890   Epoch: 1   Global Step: 18940   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 01:14:36,240-Speed 3356.78 samples/sec   Loss 6.1860   LearningRate 0.0890   Epoch: 1   Global Step: 18950   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 01:14:39,304-Speed 3342.94 samples/sec   Loss 6.2445   LearningRate 0.0890   Epoch: 1   Global Step: 18960   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 01:14:42,382-Speed 3327.74 samples/sec   Loss 6.1912   LearningRate 0.0890   Epoch: 1   Global Step: 18970   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 01:14:45,517-Speed 3267.37 samples/sec   Loss 6.2343   LearningRate 0.0890   Epoch: 1   Global Step: 18980   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 01:14:48,578-Speed 3345.80 samples/sec   Loss 6.1686   LearningRate 0.0889   Epoch: 1   Global Step: 18990   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 01:14:51,648-Speed 3336.63 samples/sec   Loss 6.2313   LearningRate 0.0889   Epoch: 1   Global Step: 19000   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 01:14:54,709-Speed 3345.88 samples/sec   Loss 6.2736   LearningRate 0.0889   Epoch: 1   Global Step: 19010   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 01:14:57,792-Speed 3321.24 samples/sec   Loss 6.2294   LearningRate 0.0889   Epoch: 1   Global Step: 19020   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 01:15:00,866-Speed 3333.32 samples/sec   Loss 6.1481   LearningRate 0.0889   Epoch: 1   Global Step: 19030   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 01:15:03,969-Speed 3300.53 samples/sec   Loss 6.2404   LearningRate 0.0889   Epoch: 1   Global Step: 19040   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 01:15:07,030-Speed 3345.74 samples/sec   Loss 6.2550   LearningRate 0.0889   Epoch: 1   Global Step: 19050   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 01:15:10,080-Speed 3358.69 samples/sec   Loss 6.2162   LearningRate 0.0889   Epoch: 1   Global Step: 19060   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 01:15:13,145-Speed 3341.31 samples/sec   Loss 6.0782   LearningRate 0.0889   Epoch: 1   Global Step: 19070   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 01:15:16,210-Speed 3341.96 samples/sec   Loss 6.2011   LearningRate 0.0889   Epoch: 1   Global Step: 19080   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 01:15:19,276-Speed 3341.13 samples/sec   Loss 6.2373   LearningRate 0.0889   Epoch: 1   Global Step: 19090   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 01:15:22,353-Speed 3328.77 samples/sec   Loss 6.2189   LearningRate 0.0889   Epoch: 1   Global Step: 19100   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 01:15:25,442-Speed 3315.35 samples/sec   Loss 6.1552   LearningRate 0.0889   Epoch: 1   Global Step: 19110   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 01:15:28,507-Speed 3342.29 samples/sec   Loss 6.1753   LearningRate 0.0889   Epoch: 1   Global Step: 19120   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 01:15:31,566-Speed 3348.37 samples/sec   Loss 6.1921   LearningRate 0.0889   Epoch: 1   Global Step: 19130   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 01:15:34,624-Speed 3349.04 samples/sec   Loss 6.2184   LearningRate 0.0889   Epoch: 1   Global Step: 19140   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 01:15:37,683-Speed 3348.83 samples/sec   Loss 6.2099   LearningRate 0.0889   Epoch: 1   Global Step: 19150   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 01:15:40,748-Speed 3341.14 samples/sec   Loss 6.2520   LearningRate 0.0889   Epoch: 1   Global Step: 19160   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 01:15:43,815-Speed 3339.61 samples/sec   Loss 6.1360   LearningRate 0.0888   Epoch: 1   Global Step: 19170   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 01:15:46,874-Speed 3348.12 samples/sec   Loss 6.2036   LearningRate 0.0888   Epoch: 1   Global Step: 19180   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 01:15:49,950-Speed 3330.10 samples/sec   Loss 6.3330   LearningRate 0.0888   Epoch: 1   Global Step: 19190   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 01:15:53,032-Speed 3322.86 samples/sec   Loss 6.2088   LearningRate 0.0888   Epoch: 1   Global Step: 19200   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 01:15:56,101-Speed 3337.57 samples/sec   Loss 6.3055   LearningRate 0.0888   Epoch: 1   Global Step: 19210   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 01:15:59,176-Speed 3331.74 samples/sec   Loss 6.2204   LearningRate 0.0888   Epoch: 1   Global Step: 19220   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 01:16:02,340-Speed 3236.61 samples/sec   Loss 6.2850   LearningRate 0.0888   Epoch: 1   Global Step: 19230   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 01:16:05,405-Speed 3342.42 samples/sec   Loss 6.2492   LearningRate 0.0888   Epoch: 1   Global Step: 19240   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 01:16:08,511-Speed 3296.77 samples/sec   Loss 6.1605   LearningRate 0.0888   Epoch: 1   Global Step: 19250   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 01:16:11,591-Speed 3325.35 samples/sec   Loss 6.2284   LearningRate 0.0888   Epoch: 1   Global Step: 19260   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 01:16:14,676-Speed 3319.97 samples/sec   Loss 6.2412   LearningRate 0.0888   Epoch: 1   Global Step: 19270   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 01:16:17,761-Speed 3320.40 samples/sec   Loss 6.2178   LearningRate 0.0888   Epoch: 1   Global Step: 19280   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 01:16:20,825-Speed 3342.63 samples/sec   Loss 6.2545   LearningRate 0.0888   Epoch: 1   Global Step: 19290   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 01:16:23,905-Speed 3326.65 samples/sec   Loss 6.1924   LearningRate 0.0888   Epoch: 1   Global Step: 19300   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 01:16:27,031-Speed 3276.09 samples/sec   Loss 6.2274   LearningRate 0.0888   Epoch: 1   Global Step: 19310   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 01:16:30,183-Speed 3249.17 samples/sec   Loss 6.1383   LearningRate 0.0888   Epoch: 1   Global Step: 19320   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 01:16:33,288-Speed 3298.87 samples/sec   Loss 6.1625   LearningRate 0.0888   Epoch: 1   Global Step: 19330   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 01:16:36,399-Speed 3293.62 samples/sec   Loss 6.1946   LearningRate 0.0887   Epoch: 1   Global Step: 19340   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 01:16:39,471-Speed 3333.35 samples/sec   Loss 6.2691   LearningRate 0.0887   Epoch: 1   Global Step: 19350   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 01:16:42,543-Speed 3333.92 samples/sec   Loss 6.1706   LearningRate 0.0887   Epoch: 1   Global Step: 19360   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 01:16:45,649-Speed 3297.59 samples/sec   Loss 6.2657   LearningRate 0.0887   Epoch: 1   Global Step: 19370   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 01:16:48,715-Speed 3340.54 samples/sec   Loss 6.1789   LearningRate 0.0887   Epoch: 1   Global Step: 19380   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-04-11 01:16:51,791-Speed 3330.62 samples/sec   Loss 6.1041   LearningRate 0.0887   Epoch: 1   Global Step: 19390   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 01:16:54,863-Speed 3333.85 samples/sec   Loss 6.1945   LearningRate 0.0887   Epoch: 1   Global Step: 19400   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 01:16:57,926-Speed 3344.40 samples/sec   Loss 6.2586   LearningRate 0.0887   Epoch: 1   Global Step: 19410   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 01:17:01,033-Speed 3296.72 samples/sec   Loss 6.1565   LearningRate 0.0887   Epoch: 1   Global Step: 19420   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 01:17:04,103-Speed 3335.98 samples/sec   Loss 6.0651   LearningRate 0.0887   Epoch: 1   Global Step: 19430   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 01:17:07,166-Speed 3343.19 samples/sec   Loss 6.2269   LearningRate 0.0887   Epoch: 1   Global Step: 19440   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 01:17:10,276-Speed 3293.81 samples/sec   Loss 6.1413   LearningRate 0.0887   Epoch: 1   Global Step: 19450   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 01:17:13,388-Speed 3290.54 samples/sec   Loss 6.2407   LearningRate 0.0887   Epoch: 1   Global Step: 19460   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 01:17:16,464-Speed 3329.77 samples/sec   Loss 6.1828   LearningRate 0.0887   Epoch: 1   Global Step: 19470   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 01:17:19,539-Speed 3331.96 samples/sec   Loss 6.1669   LearningRate 0.0887   Epoch: 1   Global Step: 19480   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 01:17:22,603-Speed 3342.80 samples/sec   Loss 6.1741   LearningRate 0.0887   Epoch: 1   Global Step: 19490   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-04-11 01:17:25,670-Speed 3339.43 samples/sec   Loss 6.1333   LearningRate 0.0887   Epoch: 1   Global Step: 19500   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-04-11 01:17:28,737-Speed 3339.36 samples/sec   Loss 6.2385   LearningRate 0.0887   Epoch: 1   Global Step: 19510   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 01:17:31,819-Speed 3323.57 samples/sec   Loss 6.1760   LearningRate 0.0886   Epoch: 1   Global Step: 19520   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 01:17:34,955-Speed 3265.87 samples/sec   Loss 6.1841   LearningRate 0.0886   Epoch: 1   Global Step: 19530   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 01:17:38,150-Speed 3204.91 samples/sec   Loss 6.3007   LearningRate 0.0886   Epoch: 1   Global Step: 19540   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 01:17:41,388-Speed 3163.99 samples/sec   Loss 6.0782   LearningRate 0.0886   Epoch: 1   Global Step: 19550   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 01:17:44,536-Speed 3253.73 samples/sec   Loss 6.1443   LearningRate 0.0886   Epoch: 1   Global Step: 19560   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 01:17:47,654-Speed 3284.52 samples/sec   Loss 6.1771   LearningRate 0.0886   Epoch: 1   Global Step: 19570   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 01:17:50,723-Speed 3337.26 samples/sec   Loss 6.1185   LearningRate 0.0886   Epoch: 1   Global Step: 19580   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 01:17:53,790-Speed 3339.92 samples/sec   Loss 6.2605   LearningRate 0.0886   Epoch: 1   Global Step: 19590   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 01:17:56,867-Speed 3328.41 samples/sec   Loss 6.1414   LearningRate 0.0886   Epoch: 1   Global Step: 19600   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 01:18:00,018-Speed 3250.74 samples/sec   Loss 6.1937   LearningRate 0.0886   Epoch: 1   Global Step: 19610   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 01:18:03,111-Speed 3311.22 samples/sec   Loss 6.1532   LearningRate 0.0886   Epoch: 1   Global Step: 19620   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 01:18:06,181-Speed 3336.76 samples/sec   Loss 6.1525   LearningRate 0.0886   Epoch: 1   Global Step: 19630   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 01:18:09,266-Speed 3319.60 samples/sec   Loss 6.2023   LearningRate 0.0886   Epoch: 1   Global Step: 19640   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 01:18:12,360-Speed 3310.56 samples/sec   Loss 6.0577   LearningRate 0.0886   Epoch: 1   Global Step: 19650   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 01:18:15,425-Speed 3341.73 samples/sec   Loss 6.1718   LearningRate 0.0886   Epoch: 1   Global Step: 19660   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 01:18:18,490-Speed 3342.07 samples/sec   Loss 6.2417   LearningRate 0.0886   Epoch: 1   Global Step: 19670   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 01:18:21,558-Speed 3338.20 samples/sec   Loss 6.2302   LearningRate 0.0886   Epoch: 1   Global Step: 19680   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 01:18:24,628-Speed 3336.86 samples/sec   Loss 6.2208   LearningRate 0.0886   Epoch: 1   Global Step: 19690   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 01:18:27,691-Speed 3344.04 samples/sec   Loss 6.1794   LearningRate 0.0885   Epoch: 1   Global Step: 19700   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 01:18:30,754-Speed 3343.79 samples/sec   Loss 6.1759   LearningRate 0.0885   Epoch: 1   Global Step: 19710   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 01:18:33,830-Speed 3329.91 samples/sec   Loss 6.1864   LearningRate 0.0885   Epoch: 1   Global Step: 19720   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 01:18:36,892-Speed 3344.51 samples/sec   Loss 6.1432   LearningRate 0.0885   Epoch: 1   Global Step: 19730   Fp16 Grad Scale: 32768   Required: 33 hours
Training: 2022-04-11 01:18:39,957-Speed 3342.24 samples/sec   Loss 6.2077   LearningRate 0.0885   Epoch: 1   Global Step: 19740   Fp16 Grad Scale: 32768   Required: 33 hours
Training: 2022-04-11 01:18:43,025-Speed 3338.11 samples/sec   Loss 6.1488   LearningRate 0.0885   Epoch: 1   Global Step: 19750   Fp16 Grad Scale: 32768   Required: 33 hours
Training: 2022-04-11 01:18:46,093-Speed 3338.48 samples/sec   Loss 6.1890   LearningRate 0.0885   Epoch: 1   Global Step: 19760   Fp16 Grad Scale: 32768   Required: 33 hours
Training: 2022-04-11 01:18:49,162-Speed 3337.00 samples/sec   Loss 6.2405   LearningRate 0.0885   Epoch: 1   Global Step: 19770   Fp16 Grad Scale: 32768   Required: 33 hours
Training: 2022-04-11 01:18:52,247-Speed 3320.66 samples/sec   Loss 6.1889   LearningRate 0.0885   Epoch: 1   Global Step: 19780   Fp16 Grad Scale: 32768   Required: 33 hours
Training: 2022-04-11 01:18:55,321-Speed 3331.94 samples/sec   Loss 6.1793   LearningRate 0.0885   Epoch: 1   Global Step: 19790   Fp16 Grad Scale: 32768   Required: 33 hours
Training: 2022-04-11 01:18:58,391-Speed 3336.15 samples/sec   Loss 6.1323   LearningRate 0.0885   Epoch: 1   Global Step: 19800   Fp16 Grad Scale: 32768   Required: 33 hours
Training: 2022-04-11 01:19:01,460-Speed 3337.07 samples/sec   Loss 6.2015   LearningRate 0.0885   Epoch: 1   Global Step: 19810   Fp16 Grad Scale: 32768   Required: 33 hours
Training: 2022-04-11 01:19:04,524-Speed 3342.64 samples/sec   Loss 6.0994   LearningRate 0.0885   Epoch: 1   Global Step: 19820   Fp16 Grad Scale: 32768   Required: 33 hours
Training: 2022-04-11 01:19:07,598-Speed 3332.49 samples/sec   Loss 6.2039   LearningRate 0.0885   Epoch: 1   Global Step: 19830   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 01:19:10,725-Speed 3276.01 samples/sec   Loss 6.0910   LearningRate 0.0885   Epoch: 1   Global Step: 19840   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 01:19:13,877-Speed 3248.93 samples/sec   Loss 6.1214   LearningRate 0.0885   Epoch: 1   Global Step: 19850   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 01:19:16,980-Speed 3300.45 samples/sec   Loss 6.2148   LearningRate 0.0885   Epoch: 1   Global Step: 19860   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 01:19:20,081-Speed 3303.67 samples/sec   Loss 6.1860   LearningRate 0.0884   Epoch: 1   Global Step: 19870   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 01:19:23,187-Speed 3297.52 samples/sec   Loss 6.0580   LearningRate 0.0884   Epoch: 1   Global Step: 19880   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 01:19:26,345-Speed 3242.92 samples/sec   Loss 6.2215   LearningRate 0.0884   Epoch: 1   Global Step: 19890   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 01:19:29,481-Speed 3265.84 samples/sec   Loss 6.1275   LearningRate 0.0884   Epoch: 1   Global Step: 19900   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 01:19:32,578-Speed 3307.79 samples/sec   Loss 6.1003   LearningRate 0.0884   Epoch: 1   Global Step: 19910   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 01:19:35,684-Speed 3297.71 samples/sec   Loss 6.1147   LearningRate 0.0884   Epoch: 1   Global Step: 19920   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 01:19:38,780-Speed 3308.24 samples/sec   Loss 6.1840   LearningRate 0.0884   Epoch: 1   Global Step: 19930   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 01:19:41,864-Speed 3321.81 samples/sec   Loss 6.1509   LearningRate 0.0884   Epoch: 1   Global Step: 19940   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 01:19:44,949-Speed 3319.86 samples/sec   Loss 6.1591   LearningRate 0.0884   Epoch: 1   Global Step: 19950   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 01:19:48,021-Speed 3333.89 samples/sec   Loss 6.1362   LearningRate 0.0884   Epoch: 1   Global Step: 19960   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 01:19:51,105-Speed 3320.32 samples/sec   Loss 6.1439   LearningRate 0.0884   Epoch: 1   Global Step: 19970   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 01:19:54,206-Speed 3303.84 samples/sec   Loss 6.1415   LearningRate 0.0884   Epoch: 1   Global Step: 19980   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 01:19:57,276-Speed 3335.67 samples/sec   Loss 6.0912   LearningRate 0.0884   Epoch: 1   Global Step: 19990   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 01:20:00,356-Speed 3326.25 samples/sec   Loss 6.0817   LearningRate 0.0884   Epoch: 1   Global Step: 20000   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 01:20:43,932-[lfw][20000]XNorm: 21.738991
Training: 2022-04-11 01:20:43,933-[lfw][20000]Accuracy-Flip: 0.99667+-0.00316
Training: 2022-04-11 01:20:43,933-[lfw][20000]Accuracy-Highest: 0.99733
Training: 2022-04-11 01:21:34,444-[cfp_fp][20000]XNorm: 19.928066
Training: 2022-04-11 01:21:34,444-[cfp_fp][20000]Accuracy-Flip: 0.96800+-0.00762
Training: 2022-04-11 01:21:34,445-[cfp_fp][20000]Accuracy-Highest: 0.96800
Training: 2022-04-11 01:22:17,829-[agedb_30][20000]XNorm: 21.824821
Training: 2022-04-11 01:22:17,830-[agedb_30][20000]Accuracy-Flip: 0.96933+-0.00961
Training: 2022-04-11 01:22:17,830-[agedb_30][20000]Accuracy-Highest: 0.96933
Training: 2022-04-11 01:22:20,897-Speed 72.86 samples/sec   Loss 6.1835   LearningRate 0.0884   Epoch: 1   Global Step: 20010   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 01:22:24,053-Speed 3245.45 samples/sec   Loss 6.0538   LearningRate 0.0884   Epoch: 1   Global Step: 20020   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 01:22:27,166-Speed 3289.91 samples/sec   Loss 6.2193   LearningRate 0.0884   Epoch: 1   Global Step: 20030   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-04-11 01:22:30,235-Speed 3337.37 samples/sec   Loss 6.0631   LearningRate 0.0884   Epoch: 1   Global Step: 20040   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 01:22:33,311-Speed 3329.78 samples/sec   Loss 6.2750   LearningRate 0.0883   Epoch: 1   Global Step: 20050   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 01:22:36,360-Speed 3358.84 samples/sec   Loss 6.1789   LearningRate 0.0883   Epoch: 1   Global Step: 20060   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 01:22:39,421-Speed 3346.04 samples/sec   Loss 6.0944   LearningRate 0.0883   Epoch: 1   Global Step: 20070   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 01:22:42,515-Speed 3311.46 samples/sec   Loss 6.0784   LearningRate 0.0883   Epoch: 1   Global Step: 20080   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 01:22:45,663-Speed 3253.14 samples/sec   Loss 6.1407   LearningRate 0.0883   Epoch: 1   Global Step: 20090   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 01:22:48,727-Speed 3343.44 samples/sec   Loss 6.1628   LearningRate 0.0883   Epoch: 1   Global Step: 20100   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 01:22:51,791-Speed 3342.28 samples/sec   Loss 6.1399   LearningRate 0.0883   Epoch: 1   Global Step: 20110   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 01:22:54,852-Speed 3346.45 samples/sec   Loss 6.0979   LearningRate 0.0883   Epoch: 1   Global Step: 20120   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 01:22:57,937-Speed 3319.88 samples/sec   Loss 6.1438   LearningRate 0.0883   Epoch: 1   Global Step: 20130   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 01:23:01,046-Speed 3294.26 samples/sec   Loss 6.1687   LearningRate 0.0883   Epoch: 1   Global Step: 20140   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 01:23:04,119-Speed 3332.86 samples/sec   Loss 6.1693   LearningRate 0.0883   Epoch: 1   Global Step: 20150   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 01:23:07,186-Speed 3339.93 samples/sec   Loss 6.2443   LearningRate 0.0883   Epoch: 1   Global Step: 20160   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 01:23:10,300-Speed 3288.98 samples/sec   Loss 6.1653   LearningRate 0.0883   Epoch: 1   Global Step: 20170   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 01:23:13,416-Speed 3287.73 samples/sec   Loss 6.2132   LearningRate 0.0883   Epoch: 1   Global Step: 20180   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 01:23:16,524-Speed 3294.48 samples/sec   Loss 6.0163   LearningRate 0.0883   Epoch: 1   Global Step: 20190   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 01:23:19,615-Speed 3314.44 samples/sec   Loss 6.1774   LearningRate 0.0883   Epoch: 1   Global Step: 20200   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 01:23:22,706-Speed 3313.97 samples/sec   Loss 6.1714   LearningRate 0.0883   Epoch: 1   Global Step: 20210   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 01:23:25,798-Speed 3311.61 samples/sec   Loss 6.1709   LearningRate 0.0883   Epoch: 1   Global Step: 20220   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 01:23:28,966-Speed 3233.30 samples/sec   Loss 6.0889   LearningRate 0.0882   Epoch: 1   Global Step: 20230   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 01:23:32,057-Speed 3314.12 samples/sec   Loss 6.2005   LearningRate 0.0882   Epoch: 1   Global Step: 20240   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 01:23:35,132-Speed 3330.43 samples/sec   Loss 6.1938   LearningRate 0.0882   Epoch: 1   Global Step: 20250   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 01:23:38,224-Speed 3312.66 samples/sec   Loss 6.1582   LearningRate 0.0882   Epoch: 1   Global Step: 20260   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 01:23:41,303-Speed 3327.21 samples/sec   Loss 6.1162   LearningRate 0.0882   Epoch: 1   Global Step: 20270   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 01:23:44,364-Speed 3346.46 samples/sec   Loss 6.0694   LearningRate 0.0882   Epoch: 1   Global Step: 20280   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 01:23:47,431-Speed 3339.94 samples/sec   Loss 6.2711   LearningRate 0.0882   Epoch: 1   Global Step: 20290   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 01:23:50,492-Speed 3346.00 samples/sec   Loss 6.1672   LearningRate 0.0882   Epoch: 1   Global Step: 20300   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 01:23:53,559-Speed 3339.28 samples/sec   Loss 6.1425   LearningRate 0.0882   Epoch: 1   Global Step: 20310   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 01:23:56,627-Speed 3339.19 samples/sec   Loss 6.1035   LearningRate 0.0882   Epoch: 1   Global Step: 20320   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 01:23:59,687-Speed 3346.79 samples/sec   Loss 6.0530   LearningRate 0.0882   Epoch: 1   Global Step: 20330   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 01:24:02,767-Speed 3325.68 samples/sec   Loss 6.1754   LearningRate 0.0882   Epoch: 1   Global Step: 20340   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 01:24:05,854-Speed 3317.34 samples/sec   Loss 6.2412   LearningRate 0.0882   Epoch: 1   Global Step: 20350   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 01:24:08,975-Speed 3281.96 samples/sec   Loss 6.2014   LearningRate 0.0882   Epoch: 1   Global Step: 20360   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 01:24:12,034-Speed 3348.23 samples/sec   Loss 6.0817   LearningRate 0.0882   Epoch: 1   Global Step: 20370   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 01:24:15,120-Speed 3319.69 samples/sec   Loss 6.0842   LearningRate 0.0882   Epoch: 1   Global Step: 20380   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 01:24:18,179-Speed 3347.91 samples/sec   Loss 6.1603   LearningRate 0.0882   Epoch: 1   Global Step: 20390   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 01:24:21,245-Speed 3340.51 samples/sec   Loss 6.1984   LearningRate 0.0882   Epoch: 1   Global Step: 20400   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 01:24:24,312-Speed 3339.87 samples/sec   Loss 6.1031   LearningRate 0.0881   Epoch: 1   Global Step: 20410   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 01:24:27,392-Speed 3324.91 samples/sec   Loss 6.1431   LearningRate 0.0881   Epoch: 1   Global Step: 20420   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-04-11 01:24:30,448-Speed 3355.76 samples/sec   Loss 6.1255   LearningRate 0.0881   Epoch: 1   Global Step: 20430   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-04-11 01:24:33,516-Speed 3338.10 samples/sec   Loss 6.0665   LearningRate 0.0881   Epoch: 1   Global Step: 20440   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-04-11 01:24:36,597-Speed 3325.08 samples/sec   Loss 6.2112   LearningRate 0.0881   Epoch: 1   Global Step: 20450   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-04-11 01:24:39,742-Speed 3256.64 samples/sec   Loss 6.1962   LearningRate 0.0881   Epoch: 1   Global Step: 20460   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 01:24:42,889-Speed 3254.71 samples/sec   Loss 6.2253   LearningRate 0.0881   Epoch: 1   Global Step: 20470   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 01:24:45,945-Speed 3351.88 samples/sec   Loss 6.1838   LearningRate 0.0881   Epoch: 1   Global Step: 20480   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 01:24:49,011-Speed 3340.50 samples/sec   Loss 6.1845   LearningRate 0.0881   Epoch: 1   Global Step: 20490   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 01:24:52,077-Speed 3340.98 samples/sec   Loss 6.0796   LearningRate 0.0881   Epoch: 1   Global Step: 20500   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 01:24:55,174-Speed 3307.39 samples/sec   Loss 6.2474   LearningRate 0.0881   Epoch: 1   Global Step: 20510   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 01:24:58,252-Speed 3327.43 samples/sec   Loss 6.0293   LearningRate 0.0881   Epoch: 1   Global Step: 20520   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 01:25:01,332-Speed 3325.57 samples/sec   Loss 6.0324   LearningRate 0.0881   Epoch: 1   Global Step: 20530   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 01:25:04,390-Speed 3349.06 samples/sec   Loss 6.0863   LearningRate 0.0881   Epoch: 1   Global Step: 20540   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 01:25:07,477-Speed 3317.93 samples/sec   Loss 6.2033   LearningRate 0.0881   Epoch: 1   Global Step: 20550   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 01:25:10,555-Speed 3328.13 samples/sec   Loss 6.1108   LearningRate 0.0881   Epoch: 1   Global Step: 20560   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-04-11 01:25:13,609-Speed 3354.13 samples/sec   Loss 6.1587   LearningRate 0.0881   Epoch: 1   Global Step: 20570   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-04-11 01:25:16,675-Speed 3340.11 samples/sec   Loss 6.1319   LearningRate 0.0881   Epoch: 1   Global Step: 20580   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-04-11 01:25:19,718-Speed 3365.76 samples/sec   Loss 6.2053   LearningRate 0.0880   Epoch: 1   Global Step: 20590   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 01:25:22,772-Speed 3353.42 samples/sec   Loss 6.1768   LearningRate 0.0880   Epoch: 1   Global Step: 20600   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 01:25:25,836-Speed 3343.00 samples/sec   Loss 6.1062   LearningRate 0.0880   Epoch: 1   Global Step: 20610   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 01:25:28,918-Speed 3323.35 samples/sec   Loss 6.2209   LearningRate 0.0880   Epoch: 1   Global Step: 20620   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 01:25:32,037-Speed 3283.46 samples/sec   Loss 6.1263   LearningRate 0.0880   Epoch: 1   Global Step: 20630   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 01:25:35,145-Speed 3295.53 samples/sec   Loss 6.1373   LearningRate 0.0880   Epoch: 1   Global Step: 20640   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 01:25:38,210-Speed 3342.61 samples/sec   Loss 6.1067   LearningRate 0.0880   Epoch: 1   Global Step: 20650   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 01:25:41,270-Speed 3346.68 samples/sec   Loss 6.2015   LearningRate 0.0880   Epoch: 1   Global Step: 20660   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 01:25:44,346-Speed 3330.00 samples/sec   Loss 6.0536   LearningRate 0.0880   Epoch: 1   Global Step: 20670   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 01:25:47,421-Speed 3331.37 samples/sec   Loss 6.1306   LearningRate 0.0880   Epoch: 1   Global Step: 20680   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 01:25:50,480-Speed 3347.47 samples/sec   Loss 5.9823   LearningRate 0.0880   Epoch: 1   Global Step: 20690   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-04-11 01:25:53,666-Speed 3215.46 samples/sec   Loss 6.1821   LearningRate 0.0880   Epoch: 1   Global Step: 20700   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-04-11 01:25:56,725-Speed 3347.20 samples/sec   Loss 6.0604   LearningRate 0.0880   Epoch: 1   Global Step: 20710   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-04-11 01:25:59,782-Speed 3351.58 samples/sec   Loss 6.1311   LearningRate 0.0880   Epoch: 1   Global Step: 20720   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-04-11 01:26:02,855-Speed 3332.28 samples/sec   Loss 6.0652   LearningRate 0.0880   Epoch: 1   Global Step: 20730   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-04-11 01:26:05,916-Speed 3346.90 samples/sec   Loss 6.1289   LearningRate 0.0880   Epoch: 1   Global Step: 20740   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-04-11 01:26:08,981-Speed 3341.12 samples/sec   Loss 6.0686   LearningRate 0.0880   Epoch: 1   Global Step: 20750   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-04-11 01:26:12,045-Speed 3343.16 samples/sec   Loss 6.1241   LearningRate 0.0879   Epoch: 1   Global Step: 20760   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-04-11 01:26:15,175-Speed 3272.01 samples/sec   Loss 6.1387   LearningRate 0.0879   Epoch: 1   Global Step: 20770   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-04-11 01:26:18,273-Speed 3306.09 samples/sec   Loss 6.1153   LearningRate 0.0879   Epoch: 1   Global Step: 20780   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-04-11 01:26:21,328-Speed 3353.50 samples/sec   Loss 6.0711   LearningRate 0.0879   Epoch: 1   Global Step: 20790   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 01:26:24,446-Speed 3284.61 samples/sec   Loss 6.1325   LearningRate 0.0879   Epoch: 1   Global Step: 20800   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 01:26:27,514-Speed 3337.77 samples/sec   Loss 6.0958   LearningRate 0.0879   Epoch: 1   Global Step: 20810   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 01:26:30,574-Speed 3347.43 samples/sec   Loss 6.1048   LearningRate 0.0879   Epoch: 1   Global Step: 20820   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 01:26:33,631-Speed 3350.69 samples/sec   Loss 6.1028   LearningRate 0.0879   Epoch: 1   Global Step: 20830   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 01:26:36,689-Speed 3349.82 samples/sec   Loss 6.0490   LearningRate 0.0879   Epoch: 1   Global Step: 20840   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 01:26:39,797-Speed 3295.60 samples/sec   Loss 6.2324   LearningRate 0.0879   Epoch: 1   Global Step: 20850   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 01:26:42,863-Speed 3340.13 samples/sec   Loss 6.1694   LearningRate 0.0879   Epoch: 1   Global Step: 20860   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 01:26:45,937-Speed 3331.96 samples/sec   Loss 6.1528   LearningRate 0.0879   Epoch: 1   Global Step: 20870   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 01:26:49,019-Speed 3324.17 samples/sec   Loss 6.0872   LearningRate 0.0879   Epoch: 1   Global Step: 20880   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 01:26:52,077-Speed 3348.60 samples/sec   Loss 5.9636   LearningRate 0.0879   Epoch: 1   Global Step: 20890   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-04-11 01:26:55,132-Speed 3352.72 samples/sec   Loss 6.0862   LearningRate 0.0879   Epoch: 1   Global Step: 20900   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-04-11 01:26:58,251-Speed 3284.56 samples/sec   Loss 6.1693   LearningRate 0.0879   Epoch: 1   Global Step: 20910   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-04-11 01:27:01,315-Speed 3342.25 samples/sec   Loss 6.0357   LearningRate 0.0879   Epoch: 1   Global Step: 20920   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-04-11 01:27:04,375-Speed 3348.00 samples/sec   Loss 6.0484   LearningRate 0.0879   Epoch: 1   Global Step: 20930   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-04-11 01:27:07,490-Speed 3287.90 samples/sec   Loss 6.0653   LearningRate 0.0878   Epoch: 1   Global Step: 20940   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-04-11 01:27:10,553-Speed 3343.67 samples/sec   Loss 6.1091   LearningRate 0.0878   Epoch: 1   Global Step: 20950   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-04-11 01:27:13,606-Speed 3354.99 samples/sec   Loss 6.0276   LearningRate 0.0878   Epoch: 1   Global Step: 20960   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 01:27:16,685-Speed 3325.93 samples/sec   Loss 6.1132   LearningRate 0.0878   Epoch: 1   Global Step: 20970   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 01:27:19,777-Speed 3312.94 samples/sec   Loss 6.0918   LearningRate 0.0878   Epoch: 1   Global Step: 20980   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 01:27:22,916-Speed 3262.97 samples/sec   Loss 6.0394   LearningRate 0.0878   Epoch: 1   Global Step: 20990   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 01:27:25,989-Speed 3333.68 samples/sec   Loss 6.0817   LearningRate 0.0878   Epoch: 1   Global Step: 21000   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 01:27:29,084-Speed 3308.37 samples/sec   Loss 6.0315   LearningRate 0.0878   Epoch: 1   Global Step: 21010   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 01:27:32,146-Speed 3345.73 samples/sec   Loss 6.1352   LearningRate 0.0878   Epoch: 1   Global Step: 21020   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 01:27:35,227-Speed 3323.72 samples/sec   Loss 6.1745   LearningRate 0.0878   Epoch: 1   Global Step: 21030   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 01:27:38,300-Speed 3333.33 samples/sec   Loss 6.0832   LearningRate 0.0878   Epoch: 1   Global Step: 21040   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 01:27:41,409-Speed 3294.18 samples/sec   Loss 6.0229   LearningRate 0.0878   Epoch: 1   Global Step: 21050   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 01:27:44,547-Speed 3264.43 samples/sec   Loss 6.0027   LearningRate 0.0878   Epoch: 1   Global Step: 21060   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-04-11 01:27:47,617-Speed 3336.31 samples/sec   Loss 5.9963   LearningRate 0.0878   Epoch: 1   Global Step: 21070   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 01:27:50,681-Speed 3342.37 samples/sec   Loss 6.0266   LearningRate 0.0878   Epoch: 1   Global Step: 21080   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 01:27:53,823-Speed 3260.41 samples/sec   Loss 6.0157   LearningRate 0.0878   Epoch: 1   Global Step: 21090   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 01:27:56,925-Speed 3302.11 samples/sec   Loss 5.9609   LearningRate 0.0878   Epoch: 1   Global Step: 21100   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 01:27:59,990-Speed 3340.92 samples/sec   Loss 6.0980   LearningRate 0.0878   Epoch: 1   Global Step: 21110   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 01:28:03,049-Speed 3348.45 samples/sec   Loss 6.0805   LearningRate 0.0877   Epoch: 1   Global Step: 21120   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 01:28:06,114-Speed 3341.52 samples/sec   Loss 6.0067   LearningRate 0.0877   Epoch: 1   Global Step: 21130   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 01:28:09,180-Speed 3341.02 samples/sec   Loss 5.9695   LearningRate 0.0877   Epoch: 1   Global Step: 21140   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 01:28:12,301-Speed 3281.66 samples/sec   Loss 5.9910   LearningRate 0.0877   Epoch: 1   Global Step: 21150   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 01:28:15,433-Speed 3269.69 samples/sec   Loss 6.0081   LearningRate 0.0877   Epoch: 1   Global Step: 21160   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 01:28:18,497-Speed 3343.94 samples/sec   Loss 6.0164   LearningRate 0.0877   Epoch: 1   Global Step: 21170   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 01:28:21,657-Speed 3241.48 samples/sec   Loss 5.9996   LearningRate 0.0877   Epoch: 1   Global Step: 21180   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 01:28:24,771-Speed 3288.17 samples/sec   Loss 6.0039   LearningRate 0.0877   Epoch: 1   Global Step: 21190   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 01:28:27,833-Speed 3345.24 samples/sec   Loss 6.0848   LearningRate 0.0877   Epoch: 1   Global Step: 21200   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 01:28:30,889-Speed 3351.32 samples/sec   Loss 5.9973   LearningRate 0.0877   Epoch: 1   Global Step: 21210   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 01:28:33,946-Speed 3350.31 samples/sec   Loss 6.0609   LearningRate 0.0877   Epoch: 1   Global Step: 21220   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 01:28:37,011-Speed 3341.80 samples/sec   Loss 5.9886   LearningRate 0.0877   Epoch: 1   Global Step: 21230   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 01:28:40,067-Speed 3351.95 samples/sec   Loss 5.9676   LearningRate 0.0877   Epoch: 1   Global Step: 21240   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 01:28:43,136-Speed 3337.17 samples/sec   Loss 6.0166   LearningRate 0.0877   Epoch: 1   Global Step: 21250   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 01:28:46,233-Speed 3308.02 samples/sec   Loss 6.1105   LearningRate 0.0877   Epoch: 1   Global Step: 21260   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 01:28:49,278-Speed 3363.42 samples/sec   Loss 6.0626   LearningRate 0.0877   Epoch: 1   Global Step: 21270   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 01:28:52,338-Speed 3347.72 samples/sec   Loss 6.1127   LearningRate 0.0877   Epoch: 1   Global Step: 21280   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 01:28:55,398-Speed 3347.12 samples/sec   Loss 6.0340   LearningRate 0.0877   Epoch: 1   Global Step: 21290   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 01:28:58,489-Speed 3312.56 samples/sec   Loss 6.0824   LearningRate 0.0876   Epoch: 1   Global Step: 21300   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 01:29:01,570-Speed 3324.50 samples/sec   Loss 5.9999   LearningRate 0.0876   Epoch: 1   Global Step: 21310   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 01:29:04,629-Speed 3348.63 samples/sec   Loss 6.0382   LearningRate 0.0876   Epoch: 1   Global Step: 21320   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 01:29:07,696-Speed 3339.27 samples/sec   Loss 6.0159   LearningRate 0.0876   Epoch: 1   Global Step: 21330   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 01:29:10,759-Speed 3343.80 samples/sec   Loss 6.0789   LearningRate 0.0876   Epoch: 1   Global Step: 21340   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 01:29:13,935-Speed 3225.39 samples/sec   Loss 6.0868   LearningRate 0.0876   Epoch: 1   Global Step: 21350   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 01:29:16,997-Speed 3344.82 samples/sec   Loss 6.0064   LearningRate 0.0876   Epoch: 1   Global Step: 21360   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 01:29:20,059-Speed 3344.73 samples/sec   Loss 6.0555   LearningRate 0.0876   Epoch: 1   Global Step: 21370   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 01:29:23,124-Speed 3342.63 samples/sec   Loss 6.1074   LearningRate 0.0876   Epoch: 1   Global Step: 21380   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 01:29:26,189-Speed 3341.17 samples/sec   Loss 6.0549   LearningRate 0.0876   Epoch: 1   Global Step: 21390   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 01:29:29,261-Speed 3333.89 samples/sec   Loss 5.9271   LearningRate 0.0876   Epoch: 1   Global Step: 21400   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 01:29:32,340-Speed 3326.69 samples/sec   Loss 6.0140   LearningRate 0.0876   Epoch: 1   Global Step: 21410   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 01:29:35,415-Speed 3331.11 samples/sec   Loss 5.9923   LearningRate 0.0876   Epoch: 1   Global Step: 21420   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 01:29:38,492-Speed 3329.28 samples/sec   Loss 6.0902   LearningRate 0.0876   Epoch: 1   Global Step: 21430   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 01:29:41,561-Speed 3337.59 samples/sec   Loss 5.9045   LearningRate 0.0876   Epoch: 1   Global Step: 21440   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 01:29:44,708-Speed 3254.05 samples/sec   Loss 5.9838   LearningRate 0.0876   Epoch: 1   Global Step: 21450   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 01:29:47,786-Speed 3327.54 samples/sec   Loss 5.9781   LearningRate 0.0876   Epoch: 1   Global Step: 21460   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 01:29:50,859-Speed 3332.93 samples/sec   Loss 6.0790   LearningRate 0.0876   Epoch: 1   Global Step: 21470   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-04-11 01:29:53,972-Speed 3289.52 samples/sec   Loss 6.1237   LearningRate 0.0875   Epoch: 1   Global Step: 21480   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-04-11 01:29:57,040-Speed 3338.89 samples/sec   Loss 6.0490   LearningRate 0.0875   Epoch: 1   Global Step: 21490   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-04-11 01:30:00,100-Speed 3347.19 samples/sec   Loss 5.9809   LearningRate 0.0875   Epoch: 1   Global Step: 21500   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 01:30:03,185-Speed 3320.10 samples/sec   Loss 5.8732   LearningRate 0.0875   Epoch: 1   Global Step: 21510   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 01:30:06,296-Speed 3292.69 samples/sec   Loss 6.0224   LearningRate 0.0875   Epoch: 1   Global Step: 21520   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 01:30:09,360-Speed 3342.48 samples/sec   Loss 5.9812   LearningRate 0.0875   Epoch: 1   Global Step: 21530   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 01:30:12,441-Speed 3324.22 samples/sec   Loss 6.0680   LearningRate 0.0875   Epoch: 1   Global Step: 21540   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 01:30:15,514-Speed 3333.54 samples/sec   Loss 6.0544   LearningRate 0.0875   Epoch: 1   Global Step: 21550   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 01:30:18,629-Speed 3287.69 samples/sec   Loss 5.9171   LearningRate 0.0875   Epoch: 1   Global Step: 21560   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 01:30:21,697-Speed 3338.17 samples/sec   Loss 6.0216   LearningRate 0.0875   Epoch: 1   Global Step: 21570   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 01:30:24,810-Speed 3290.66 samples/sec   Loss 5.9696   LearningRate 0.0875   Epoch: 1   Global Step: 21580   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 01:30:27,940-Speed 3272.44 samples/sec   Loss 6.0506   LearningRate 0.0875   Epoch: 1   Global Step: 21590   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 01:30:31,029-Speed 3315.28 samples/sec   Loss 5.9962   LearningRate 0.0875   Epoch: 1   Global Step: 21600   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-04-11 01:30:34,099-Speed 3337.11 samples/sec   Loss 5.9946   LearningRate 0.0875   Epoch: 1   Global Step: 21610   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-04-11 01:30:37,155-Speed 3350.90 samples/sec   Loss 5.9567   LearningRate 0.0875   Epoch: 1   Global Step: 21620   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 01:30:40,221-Speed 3340.57 samples/sec   Loss 6.0836   LearningRate 0.0875   Epoch: 1   Global Step: 21630   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 01:30:43,290-Speed 3337.74 samples/sec   Loss 6.0343   LearningRate 0.0875   Epoch: 1   Global Step: 21640   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 01:30:46,359-Speed 3336.93 samples/sec   Loss 6.0383   LearningRate 0.0874   Epoch: 1   Global Step: 21650   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 01:30:49,437-Speed 3327.39 samples/sec   Loss 6.0086   LearningRate 0.0874   Epoch: 1   Global Step: 21660   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 01:30:52,525-Speed 3317.42 samples/sec   Loss 5.9986   LearningRate 0.0874   Epoch: 1   Global Step: 21670   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 01:30:55,590-Speed 3341.55 samples/sec   Loss 5.9222   LearningRate 0.0874   Epoch: 1   Global Step: 21680   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 01:30:58,685-Speed 3309.59 samples/sec   Loss 5.9735   LearningRate 0.0874   Epoch: 1   Global Step: 21690   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 01:31:01,768-Speed 3322.69 samples/sec   Loss 5.9986   LearningRate 0.0874   Epoch: 1   Global Step: 21700   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 01:31:04,834-Speed 3340.28 samples/sec   Loss 6.0066   LearningRate 0.0874   Epoch: 1   Global Step: 21710   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 01:31:07,998-Speed 3236.55 samples/sec   Loss 6.0310   LearningRate 0.0874   Epoch: 1   Global Step: 21720   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-04-11 01:31:11,054-Speed 3352.33 samples/sec   Loss 6.0023   LearningRate 0.0874   Epoch: 1   Global Step: 21730   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 01:31:14,258-Speed 3196.03 samples/sec   Loss 5.9159   LearningRate 0.0874   Epoch: 1   Global Step: 21740   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 01:31:17,505-Speed 3154.99 samples/sec   Loss 6.0476   LearningRate 0.0874   Epoch: 1   Global Step: 21750   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 01:31:20,701-Speed 3203.93 samples/sec   Loss 6.0529   LearningRate 0.0874   Epoch: 1   Global Step: 21760   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 01:31:23,782-Speed 3324.32 samples/sec   Loss 5.9730   LearningRate 0.0874   Epoch: 1   Global Step: 21770   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 01:31:26,879-Speed 3308.06 samples/sec   Loss 5.9863   LearningRate 0.0874   Epoch: 1   Global Step: 21780   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 01:31:29,979-Speed 3303.68 samples/sec   Loss 6.0118   LearningRate 0.0874   Epoch: 1   Global Step: 21790   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 01:31:33,087-Speed 3296.07 samples/sec   Loss 6.0155   LearningRate 0.0874   Epoch: 1   Global Step: 21800   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 01:31:36,148-Speed 3346.35 samples/sec   Loss 6.0123   LearningRate 0.0874   Epoch: 1   Global Step: 21810   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 01:31:39,221-Speed 3332.63 samples/sec   Loss 5.9833   LearningRate 0.0874   Epoch: 1   Global Step: 21820   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 01:31:42,283-Speed 3344.28 samples/sec   Loss 5.9519   LearningRate 0.0873   Epoch: 1   Global Step: 21830   Fp16 Grad Scale: 262144   Required: 32 hours
Training: 2022-04-11 01:31:45,450-Speed 3234.17 samples/sec   Loss 5.9914   LearningRate 0.0873   Epoch: 1   Global Step: 21840   Fp16 Grad Scale: 262144   Required: 32 hours
Training: 2022-04-11 01:31:48,511-Speed 3346.40 samples/sec   Loss 5.9618   LearningRate 0.0873   Epoch: 1   Global Step: 21850   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 01:31:51,586-Speed 3330.46 samples/sec   Loss 6.0144   LearningRate 0.0873   Epoch: 1   Global Step: 21860   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 01:31:54,659-Speed 3333.89 samples/sec   Loss 6.0177   LearningRate 0.0873   Epoch: 1   Global Step: 21870   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 01:31:57,732-Speed 3332.65 samples/sec   Loss 5.8989   LearningRate 0.0873   Epoch: 1   Global Step: 21880   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 01:32:00,906-Speed 3226.68 samples/sec   Loss 6.0980   LearningRate 0.0873   Epoch: 1   Global Step: 21890   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 01:32:04,031-Speed 3278.03 samples/sec   Loss 6.0500   LearningRate 0.0873   Epoch: 1   Global Step: 21900   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 01:32:07,223-Speed 3208.52 samples/sec   Loss 5.9620   LearningRate 0.0873   Epoch: 1   Global Step: 21910   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 01:32:10,355-Speed 3270.27 samples/sec   Loss 6.0880   LearningRate 0.0873   Epoch: 1   Global Step: 21920   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 01:32:13,446-Speed 3313.08 samples/sec   Loss 5.9636   LearningRate 0.0873   Epoch: 1   Global Step: 21930   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 01:32:16,588-Speed 3260.39 samples/sec   Loss 5.9533   LearningRate 0.0873   Epoch: 1   Global Step: 21940   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 01:32:19,715-Speed 3275.20 samples/sec   Loss 5.9570   LearningRate 0.0873   Epoch: 1   Global Step: 21950   Fp16 Grad Scale: 262144   Required: 32 hours
Training: 2022-04-11 01:32:22,768-Speed 3354.83 samples/sec   Loss 5.9742   LearningRate 0.0873   Epoch: 1   Global Step: 21960   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 01:32:25,897-Speed 3273.18 samples/sec   Loss 5.9370   LearningRate 0.0873   Epoch: 1   Global Step: 21970   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 01:32:28,978-Speed 3324.60 samples/sec   Loss 6.0398   LearningRate 0.0873   Epoch: 1   Global Step: 21980   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 01:32:32,045-Speed 3340.31 samples/sec   Loss 5.9314   LearningRate 0.0873   Epoch: 1   Global Step: 21990   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 01:32:35,107-Speed 3344.76 samples/sec   Loss 5.9708   LearningRate 0.0873   Epoch: 1   Global Step: 22000   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-11 01:33:18,737-[lfw][22000]XNorm: 22.601832
Training: 2022-04-11 01:33:18,738-[lfw][22000]Accuracy-Flip: 0.99717+-0.00248
Training: 2022-04-11 01:33:18,739-[lfw][22000]Accuracy-Highest: 0.99733
Training: 2022-04-11 01:34:09,421-[cfp_fp][22000]XNorm: 21.024974
Training: 2022-04-11 01:34:09,422-[cfp_fp][22000]Accuracy-Flip: 0.97429+-0.00579
Training: 2022-04-11 01:34:09,422-[cfp_fp][22000]Accuracy-Highest: 0.97429
Training: 2022-04-11 01:34:53,006-[agedb_30][22000]XNorm: 22.296151
Training: 2022-04-11 01:34:53,006-[agedb_30][22000]Accuracy-Flip: 0.97267+-0.00484
Training: 2022-04-11 01:34:53,007-[agedb_30][22000]Accuracy-Highest: 0.97267
Training: 2022-04-11 01:34:56,069-Speed 72.64 samples/sec   Loss 6.0887   LearningRate 0.0872   Epoch: 1   Global Step: 22010   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 01:34:59,118-Speed 3359.21 samples/sec   Loss 6.0456   LearningRate 0.0872   Epoch: 1   Global Step: 22020   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 01:35:02,169-Speed 3356.58 samples/sec   Loss 5.9794   LearningRate 0.0872   Epoch: 1   Global Step: 22030   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 01:35:05,218-Speed 3359.93 samples/sec   Loss 5.8886   LearningRate 0.0872   Epoch: 1   Global Step: 22040   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 01:35:08,266-Speed 3360.31 samples/sec   Loss 6.0147   LearningRate 0.0872   Epoch: 1   Global Step: 22050   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 01:35:11,324-Speed 3348.92 samples/sec   Loss 5.9088   LearningRate 0.0872   Epoch: 1   Global Step: 22060   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 01:35:14,375-Speed 3356.91 samples/sec   Loss 5.9634   LearningRate 0.0872   Epoch: 1   Global Step: 22070   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 01:35:17,427-Speed 3356.51 samples/sec   Loss 6.0279   LearningRate 0.0872   Epoch: 1   Global Step: 22080   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 01:35:20,477-Speed 3357.93 samples/sec   Loss 6.0228   LearningRate 0.0872   Epoch: 1   Global Step: 22090   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 01:35:23,534-Speed 3350.23 samples/sec   Loss 6.0241   LearningRate 0.0872   Epoch: 1   Global Step: 22100   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 01:35:26,611-Speed 3329.12 samples/sec   Loss 6.0761   LearningRate 0.0872   Epoch: 1   Global Step: 22110   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 01:35:29,667-Speed 3350.65 samples/sec   Loss 6.0137   LearningRate 0.0872   Epoch: 1   Global Step: 22120   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 01:35:32,717-Speed 3358.00 samples/sec   Loss 6.0386   LearningRate 0.0872   Epoch: 1   Global Step: 22130   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 01:35:35,773-Speed 3352.03 samples/sec   Loss 5.9814   LearningRate 0.0872   Epoch: 1   Global Step: 22140   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 01:35:38,845-Speed 3333.55 samples/sec   Loss 6.0038   LearningRate 0.0872   Epoch: 1   Global Step: 22150   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 01:35:41,914-Speed 3337.66 samples/sec   Loss 5.9942   LearningRate 0.0872   Epoch: 1   Global Step: 22160   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 01:35:45,022-Speed 3295.82 samples/sec   Loss 5.9378   LearningRate 0.0872   Epoch: 1   Global Step: 22170   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 01:35:48,086-Speed 3343.42 samples/sec   Loss 5.9391   LearningRate 0.0872   Epoch: 1   Global Step: 22180   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 01:35:51,187-Speed 3301.96 samples/sec   Loss 5.9219   LearningRate 0.0871   Epoch: 1   Global Step: 22190   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 01:35:54,269-Speed 3323.69 samples/sec   Loss 5.8835   LearningRate 0.0871   Epoch: 1   Global Step: 22200   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 01:35:57,338-Speed 3337.59 samples/sec   Loss 5.9183   LearningRate 0.0871   Epoch: 1   Global Step: 22210   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 01:36:00,407-Speed 3336.46 samples/sec   Loss 6.0032   LearningRate 0.0871   Epoch: 1   Global Step: 22220   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 01:36:03,483-Speed 3330.16 samples/sec   Loss 5.8678   LearningRate 0.0871   Epoch: 1   Global Step: 22230   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 01:36:06,566-Speed 3321.93 samples/sec   Loss 6.0652   LearningRate 0.0871   Epoch: 1   Global Step: 22240   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 01:36:09,647-Speed 3325.39 samples/sec   Loss 5.9531   LearningRate 0.0871   Epoch: 1   Global Step: 22250   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 01:36:12,742-Speed 3309.12 samples/sec   Loss 5.8555   LearningRate 0.0871   Epoch: 1   Global Step: 22260   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 01:36:15,820-Speed 3327.40 samples/sec   Loss 5.9317   LearningRate 0.0871   Epoch: 1   Global Step: 22270   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 01:36:18,972-Speed 3249.95 samples/sec   Loss 5.8568   LearningRate 0.0871   Epoch: 1   Global Step: 22280   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 01:36:22,152-Speed 3221.03 samples/sec   Loss 5.9272   LearningRate 0.0871   Epoch: 1   Global Step: 22290   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 01:36:25,989-Speed 2668.81 samples/sec   Loss 5.9729   LearningRate 0.0871   Epoch: 1   Global Step: 22300   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 01:36:29,140-Speed 3250.52 samples/sec   Loss 5.8919   LearningRate 0.0871   Epoch: 1   Global Step: 22310   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 01:36:32,201-Speed 3345.69 samples/sec   Loss 5.8381   LearningRate 0.0871   Epoch: 1   Global Step: 22320   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 01:36:35,289-Speed 3316.92 samples/sec   Loss 5.9114   LearningRate 0.0871   Epoch: 1   Global Step: 22330   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 01:36:38,466-Speed 3223.62 samples/sec   Loss 5.9955   LearningRate 0.0871   Epoch: 1   Global Step: 22340   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 01:36:41,553-Speed 3318.00 samples/sec   Loss 5.9520   LearningRate 0.0871   Epoch: 1   Global Step: 22350   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 01:36:44,627-Speed 3332.61 samples/sec   Loss 5.8734   LearningRate 0.0871   Epoch: 1   Global Step: 22360   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 01:36:47,789-Speed 3239.03 samples/sec   Loss 5.8622   LearningRate 0.0870   Epoch: 1   Global Step: 22370   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 01:36:50,859-Speed 3336.60 samples/sec   Loss 5.9076   LearningRate 0.0870   Epoch: 1   Global Step: 22380   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 01:36:53,956-Speed 3307.18 samples/sec   Loss 5.9613   LearningRate 0.0870   Epoch: 1   Global Step: 22390   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 01:36:57,044-Speed 3316.83 samples/sec   Loss 5.8952   LearningRate 0.0870   Epoch: 1   Global Step: 22400   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 01:37:00,109-Speed 3341.20 samples/sec   Loss 5.8218   LearningRate 0.0870   Epoch: 1   Global Step: 22410   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 01:37:03,169-Speed 3347.03 samples/sec   Loss 5.8975   LearningRate 0.0870   Epoch: 1   Global Step: 22420   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 01:37:06,310-Speed 3261.86 samples/sec   Loss 5.8856   LearningRate 0.0870   Epoch: 1   Global Step: 22430   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-04-11 01:37:09,381-Speed 3334.39 samples/sec   Loss 5.9196   LearningRate 0.0870   Epoch: 1   Global Step: 22440   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 01:37:12,471-Speed 3314.82 samples/sec   Loss 5.9620   LearningRate 0.0870   Epoch: 1   Global Step: 22450   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 01:37:15,563-Speed 3312.80 samples/sec   Loss 5.9838   LearningRate 0.0870   Epoch: 1   Global Step: 22460   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 01:37:18,632-Speed 3337.17 samples/sec   Loss 5.9208   LearningRate 0.0870   Epoch: 1   Global Step: 22470   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 01:37:21,724-Speed 3313.32 samples/sec   Loss 5.9451   LearningRate 0.0870   Epoch: 1   Global Step: 22480   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 01:37:24,813-Speed 3315.81 samples/sec   Loss 5.9543   LearningRate 0.0870   Epoch: 1   Global Step: 22490   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 01:37:27,871-Speed 3349.21 samples/sec   Loss 5.8914   LearningRate 0.0870   Epoch: 1   Global Step: 22500   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 01:37:30,950-Speed 3326.44 samples/sec   Loss 5.9475   LearningRate 0.0870   Epoch: 1   Global Step: 22510   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 01:37:34,021-Speed 3334.48 samples/sec   Loss 5.8613   LearningRate 0.0870   Epoch: 1   Global Step: 22520   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 01:37:37,090-Speed 3338.04 samples/sec   Loss 5.9378   LearningRate 0.0870   Epoch: 1   Global Step: 22530   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 01:37:40,158-Speed 3339.00 samples/sec   Loss 5.9455   LearningRate 0.0870   Epoch: 1   Global Step: 22540   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 01:37:43,233-Speed 3331.25 samples/sec   Loss 5.8967   LearningRate 0.0869   Epoch: 1   Global Step: 22550   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 01:37:46,302-Speed 3336.61 samples/sec   Loss 5.8805   LearningRate 0.0869   Epoch: 1   Global Step: 22560   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 01:37:49,362-Speed 3347.07 samples/sec   Loss 5.8936   LearningRate 0.0869   Epoch: 1   Global Step: 22570   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 01:37:52,490-Speed 3274.79 samples/sec   Loss 5.9778   LearningRate 0.0869   Epoch: 1   Global Step: 22580   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 01:37:55,566-Speed 3329.56 samples/sec   Loss 5.8441   LearningRate 0.0869   Epoch: 1   Global Step: 22590   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 01:37:58,630-Speed 3342.70 samples/sec   Loss 5.9083   LearningRate 0.0869   Epoch: 1   Global Step: 22600   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 01:38:01,702-Speed 3334.93 samples/sec   Loss 6.0094   LearningRate 0.0869   Epoch: 1   Global Step: 22610   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 01:38:04,766-Speed 3342.74 samples/sec   Loss 5.8947   LearningRate 0.0869   Epoch: 1   Global Step: 22620   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 01:38:07,840-Speed 3331.48 samples/sec   Loss 5.9256   LearningRate 0.0869   Epoch: 1   Global Step: 22630   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 01:38:10,912-Speed 3334.48 samples/sec   Loss 5.9322   LearningRate 0.0869   Epoch: 1   Global Step: 22640   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-04-11 01:38:14,032-Speed 3282.71 samples/sec   Loss 5.8551   LearningRate 0.0869   Epoch: 1   Global Step: 22650   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 01:38:17,180-Speed 3253.90 samples/sec   Loss 5.9797   LearningRate 0.0869   Epoch: 1   Global Step: 22660   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 01:38:20,264-Speed 3320.52 samples/sec   Loss 5.8440   LearningRate 0.0869   Epoch: 1   Global Step: 22670   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 01:38:23,357-Speed 3311.30 samples/sec   Loss 5.8516   LearningRate 0.0869   Epoch: 1   Global Step: 22680   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 01:38:26,444-Speed 3318.46 samples/sec   Loss 5.8860   LearningRate 0.0869   Epoch: 1   Global Step: 22690   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 01:38:29,561-Speed 3285.81 samples/sec   Loss 5.8425   LearningRate 0.0869   Epoch: 1   Global Step: 22700   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 01:38:32,708-Speed 3254.27 samples/sec   Loss 5.8216   LearningRate 0.0869   Epoch: 1   Global Step: 22710   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 01:38:35,776-Speed 3338.68 samples/sec   Loss 5.8865   LearningRate 0.0869   Epoch: 1   Global Step: 22720   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 01:38:38,861-Speed 3319.86 samples/sec   Loss 5.8190   LearningRate 0.0868   Epoch: 1   Global Step: 22730   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 01:38:41,929-Speed 3338.74 samples/sec   Loss 5.8328   LearningRate 0.0868   Epoch: 1   Global Step: 22740   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 01:38:47,529-Speed 1828.75 samples/sec   Loss 5.9293   LearningRate 0.0868   Epoch: 1   Global Step: 22750   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 01:38:50,633-Speed 3300.45 samples/sec   Loss 5.8617   LearningRate 0.0868   Epoch: 1   Global Step: 22760   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 01:38:53,754-Speed 3281.22 samples/sec   Loss 5.8190   LearningRate 0.0868   Epoch: 1   Global Step: 22770   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 01:38:56,820-Speed 3340.99 samples/sec   Loss 5.9505   LearningRate 0.0868   Epoch: 1   Global Step: 22780   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 01:38:59,881-Speed 3346.01 samples/sec   Loss 5.7858   LearningRate 0.0868   Epoch: 1   Global Step: 22790   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 01:39:02,941-Speed 3347.59 samples/sec   Loss 5.8571   LearningRate 0.0868   Epoch: 1   Global Step: 22800   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 01:39:06,052-Speed 3291.90 samples/sec   Loss 5.8615   LearningRate 0.0868   Epoch: 1   Global Step: 22810   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 01:39:09,240-Speed 3213.02 samples/sec   Loss 5.9268   LearningRate 0.0868   Epoch: 1   Global Step: 22820   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 01:39:12,360-Speed 3282.04 samples/sec   Loss 5.9003   LearningRate 0.0868   Epoch: 1   Global Step: 22830   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 01:39:15,434-Speed 3331.88 samples/sec   Loss 5.9804   LearningRate 0.0868   Epoch: 1   Global Step: 22840   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 01:39:18,502-Speed 3338.34 samples/sec   Loss 5.8833   LearningRate 0.0868   Epoch: 1   Global Step: 22850   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 01:39:21,585-Speed 3322.54 samples/sec   Loss 5.9002   LearningRate 0.0868   Epoch: 1   Global Step: 22860   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 01:39:24,714-Speed 3274.25 samples/sec   Loss 5.7528   LearningRate 0.0868   Epoch: 1   Global Step: 22870   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 01:39:27,782-Speed 3337.87 samples/sec   Loss 5.9458   LearningRate 0.0868   Epoch: 1   Global Step: 22880   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 01:39:30,852-Speed 3336.78 samples/sec   Loss 5.9133   LearningRate 0.0868   Epoch: 1   Global Step: 22890   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 01:39:33,956-Speed 3299.05 samples/sec   Loss 5.8218   LearningRate 0.0868   Epoch: 1   Global Step: 22900   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 01:39:37,020-Speed 3343.12 samples/sec   Loss 5.9465   LearningRate 0.0867   Epoch: 1   Global Step: 22910   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 01:39:40,090-Speed 3336.25 samples/sec   Loss 5.9348   LearningRate 0.0867   Epoch: 1   Global Step: 22920   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 01:39:43,163-Speed 3332.94 samples/sec   Loss 5.8394   LearningRate 0.0867   Epoch: 1   Global Step: 22930   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 01:39:46,232-Speed 3336.75 samples/sec   Loss 5.8700   LearningRate 0.0867   Epoch: 1   Global Step: 22940   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 01:39:49,356-Speed 3279.24 samples/sec   Loss 5.8568   LearningRate 0.0867   Epoch: 1   Global Step: 22950   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 01:39:52,427-Speed 3335.11 samples/sec   Loss 5.9419   LearningRate 0.0867   Epoch: 1   Global Step: 22960   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 01:39:55,532-Speed 3298.64 samples/sec   Loss 5.7971   LearningRate 0.0867   Epoch: 1   Global Step: 22970   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 01:39:58,595-Speed 3344.10 samples/sec   Loss 5.7664   LearningRate 0.0867   Epoch: 1   Global Step: 22980   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 01:40:01,658-Speed 3343.25 samples/sec   Loss 5.7976   LearningRate 0.0867   Epoch: 1   Global Step: 22990   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 01:40:04,728-Speed 3337.10 samples/sec   Loss 5.8583   LearningRate 0.0867   Epoch: 1   Global Step: 23000   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 01:40:07,797-Speed 3336.47 samples/sec   Loss 5.8594   LearningRate 0.0867   Epoch: 1   Global Step: 23010   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 01:40:10,954-Speed 3244.97 samples/sec   Loss 5.9786   LearningRate 0.0867   Epoch: 1   Global Step: 23020   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-04-11 01:40:14,007-Speed 3354.08 samples/sec   Loss 5.9030   LearningRate 0.0867   Epoch: 1   Global Step: 23030   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 01:40:17,101-Speed 3311.22 samples/sec   Loss 5.8439   LearningRate 0.0867   Epoch: 1   Global Step: 23040   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 01:40:20,175-Speed 3331.54 samples/sec   Loss 6.0393   LearningRate 0.0867   Epoch: 1   Global Step: 23050   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 01:40:23,248-Speed 3334.27 samples/sec   Loss 5.8745   LearningRate 0.0867   Epoch: 1   Global Step: 23060   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 01:40:26,329-Speed 3324.23 samples/sec   Loss 5.8259   LearningRate 0.0867   Epoch: 1   Global Step: 23070   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 01:40:29,403-Speed 3330.80 samples/sec   Loss 5.8780   LearningRate 0.0867   Epoch: 1   Global Step: 23080   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 01:40:32,491-Speed 3317.29 samples/sec   Loss 5.8499   LearningRate 0.0866   Epoch: 1   Global Step: 23090   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 01:40:35,569-Speed 3327.30 samples/sec   Loss 5.8465   LearningRate 0.0866   Epoch: 1   Global Step: 23100   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 01:40:38,639-Speed 3336.14 samples/sec   Loss 5.8508   LearningRate 0.0866   Epoch: 1   Global Step: 23110   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 01:40:41,709-Speed 3336.05 samples/sec   Loss 5.9001   LearningRate 0.0866   Epoch: 1   Global Step: 23120   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 01:40:44,762-Speed 3355.27 samples/sec   Loss 5.8666   LearningRate 0.0866   Epoch: 1   Global Step: 23130   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 01:40:47,836-Speed 3332.69 samples/sec   Loss 5.7804   LearningRate 0.0866   Epoch: 1   Global Step: 23140   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 01:40:50,898-Speed 3344.19 samples/sec   Loss 5.9125   LearningRate 0.0866   Epoch: 1   Global Step: 23150   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 01:40:53,987-Speed 3315.76 samples/sec   Loss 5.9208   LearningRate 0.0866   Epoch: 1   Global Step: 23160   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 01:40:57,058-Speed 3335.70 samples/sec   Loss 5.8190   LearningRate 0.0866   Epoch: 1   Global Step: 23170   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 01:41:00,128-Speed 3336.53 samples/sec   Loss 5.7690   LearningRate 0.0866   Epoch: 1   Global Step: 23180   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 01:41:03,202-Speed 3331.98 samples/sec   Loss 5.7800   LearningRate 0.0866   Epoch: 1   Global Step: 23190   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 01:41:06,265-Speed 3343.21 samples/sec   Loss 5.9052   LearningRate 0.0866   Epoch: 1   Global Step: 23200   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 01:41:09,337-Speed 3333.80 samples/sec   Loss 5.7703   LearningRate 0.0866   Epoch: 1   Global Step: 23210   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 01:41:12,415-Speed 3328.07 samples/sec   Loss 5.8328   LearningRate 0.0866   Epoch: 1   Global Step: 23220   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 01:41:15,535-Speed 3283.54 samples/sec   Loss 5.7410   LearningRate 0.0866   Epoch: 1   Global Step: 23230   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 01:41:18,683-Speed 3252.97 samples/sec   Loss 5.8619   LearningRate 0.0866   Epoch: 1   Global Step: 23240   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 01:41:21,749-Speed 3341.21 samples/sec   Loss 5.8029   LearningRate 0.0866   Epoch: 1   Global Step: 23250   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 01:41:24,887-Speed 3263.51 samples/sec   Loss 5.7937   LearningRate 0.0865   Epoch: 1   Global Step: 23260   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 01:41:27,957-Speed 3335.87 samples/sec   Loss 5.8156   LearningRate 0.0865   Epoch: 1   Global Step: 23270   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 01:41:31,027-Speed 3336.40 samples/sec   Loss 5.8614   LearningRate 0.0865   Epoch: 1   Global Step: 23280   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 01:41:34,117-Speed 3314.57 samples/sec   Loss 5.8304   LearningRate 0.0865   Epoch: 1   Global Step: 23290   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 01:41:37,208-Speed 3314.74 samples/sec   Loss 5.7418   LearningRate 0.0865   Epoch: 1   Global Step: 23300   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 01:41:40,299-Speed 3313.23 samples/sec   Loss 5.8906   LearningRate 0.0865   Epoch: 1   Global Step: 23310   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 01:41:43,366-Speed 3339.15 samples/sec   Loss 5.8881   LearningRate 0.0865   Epoch: 1   Global Step: 23320   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-11 01:41:46,440-Speed 3332.42 samples/sec   Loss 5.8056   LearningRate 0.0865   Epoch: 1   Global Step: 23330   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-11 01:41:49,615-Speed 3225.27 samples/sec   Loss 5.8342   LearningRate 0.0865   Epoch: 1   Global Step: 23340   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-11 01:41:52,685-Speed 3336.34 samples/sec   Loss 5.8364   LearningRate 0.0865   Epoch: 1   Global Step: 23350   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-11 01:41:55,789-Speed 3299.62 samples/sec   Loss 5.8570   LearningRate 0.0865   Epoch: 1   Global Step: 23360   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-11 01:41:58,951-Speed 3239.50 samples/sec   Loss 5.9016   LearningRate 0.0865   Epoch: 1   Global Step: 23370   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-11 01:42:02,020-Speed 3337.56 samples/sec   Loss 5.9068   LearningRate 0.0865   Epoch: 1   Global Step: 23380   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-11 01:42:05,098-Speed 3328.82 samples/sec   Loss 5.8362   LearningRate 0.0865   Epoch: 1   Global Step: 23390   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-11 01:42:08,165-Speed 3338.88 samples/sec   Loss 5.8153   LearningRate 0.0865   Epoch: 1   Global Step: 23400   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-11 01:42:11,238-Speed 3333.17 samples/sec   Loss 5.8478   LearningRate 0.0865   Epoch: 1   Global Step: 23410   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 01:42:14,357-Speed 3283.48 samples/sec   Loss 5.8231   LearningRate 0.0865   Epoch: 1   Global Step: 23420   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 01:42:17,495-Speed 3264.38 samples/sec   Loss 5.7218   LearningRate 0.0865   Epoch: 1   Global Step: 23430   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 01:42:20,561-Speed 3340.10 samples/sec   Loss 5.7456   LearningRate 0.0864   Epoch: 1   Global Step: 23440   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 01:42:23,648-Speed 3317.99 samples/sec   Loss 5.8344   LearningRate 0.0864   Epoch: 1   Global Step: 23450   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 01:42:26,732-Speed 3321.11 samples/sec   Loss 5.8997   LearningRate 0.0864   Epoch: 1   Global Step: 23460   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-11 01:42:29,803-Speed 3336.23 samples/sec   Loss 5.8144   LearningRate 0.0864   Epoch: 1   Global Step: 23470   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-11 01:42:32,870-Speed 3339.00 samples/sec   Loss 5.7866   LearningRate 0.0864   Epoch: 1   Global Step: 23480   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-11 01:42:35,939-Speed 3337.96 samples/sec   Loss 5.8977   LearningRate 0.0864   Epoch: 1   Global Step: 23490   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-11 01:42:39,006-Speed 3339.43 samples/sec   Loss 5.8673   LearningRate 0.0864   Epoch: 1   Global Step: 23500   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-11 01:42:42,079-Speed 3332.02 samples/sec   Loss 5.9467   LearningRate 0.0864   Epoch: 1   Global Step: 23510   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-11 01:42:45,161-Speed 3323.31 samples/sec   Loss 5.7537   LearningRate 0.0864   Epoch: 1   Global Step: 23520   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-11 01:42:48,226-Speed 3342.37 samples/sec   Loss 5.8666   LearningRate 0.0864   Epoch: 1   Global Step: 23530   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-11 01:42:51,295-Speed 3336.97 samples/sec   Loss 5.7614   LearningRate 0.0864   Epoch: 1   Global Step: 23540   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-11 01:42:54,379-Speed 3321.19 samples/sec   Loss 5.8113   LearningRate 0.0864   Epoch: 1   Global Step: 23550   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-11 01:42:57,457-Speed 3328.20 samples/sec   Loss 5.7784   LearningRate 0.0864   Epoch: 1   Global Step: 23560   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 01:43:00,531-Speed 3331.49 samples/sec   Loss 5.8530   LearningRate 0.0864   Epoch: 1   Global Step: 23570   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 01:43:03,615-Speed 3321.48 samples/sec   Loss 5.9026   LearningRate 0.0864   Epoch: 1   Global Step: 23580   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 01:43:06,686-Speed 3334.76 samples/sec   Loss 5.7397   LearningRate 0.0864   Epoch: 1   Global Step: 23590   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 01:43:09,754-Speed 3338.00 samples/sec   Loss 5.7951   LearningRate 0.0864   Epoch: 1   Global Step: 23600   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 01:43:12,823-Speed 3337.72 samples/sec   Loss 5.9327   LearningRate 0.0864   Epoch: 1   Global Step: 23610   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 01:43:15,887-Speed 3342.64 samples/sec   Loss 5.7487   LearningRate 0.0863   Epoch: 1   Global Step: 23620   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 01:43:18,967-Speed 3325.26 samples/sec   Loss 5.8504   LearningRate 0.0863   Epoch: 1   Global Step: 23630   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 01:43:22,072-Speed 3298.58 samples/sec   Loss 5.7033   LearningRate 0.0863   Epoch: 1   Global Step: 23640   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 01:43:25,140-Speed 3338.39 samples/sec   Loss 5.7744   LearningRate 0.0863   Epoch: 1   Global Step: 23650   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 01:43:28,246-Speed 3297.83 samples/sec   Loss 5.7143   LearningRate 0.0863   Epoch: 1   Global Step: 23660   Fp16 Grad Scale: 262144   Required: 32 hours
Training: 2022-04-11 01:43:31,322-Speed 3330.36 samples/sec   Loss 5.7190   LearningRate 0.0863   Epoch: 1   Global Step: 23670   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 01:43:34,397-Speed 3331.05 samples/sec   Loss 5.7231   LearningRate 0.0863   Epoch: 1   Global Step: 23680   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 01:43:37,469-Speed 3334.01 samples/sec   Loss 5.7466   LearningRate 0.0863   Epoch: 1   Global Step: 23690   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 01:43:40,540-Speed 3334.50 samples/sec   Loss 5.7382   LearningRate 0.0863   Epoch: 1   Global Step: 23700   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 01:43:43,638-Speed 3306.37 samples/sec   Loss 5.8602   LearningRate 0.0863   Epoch: 1   Global Step: 23710   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 01:43:46,718-Speed 3325.26 samples/sec   Loss 5.7449   LearningRate 0.0863   Epoch: 1   Global Step: 23720   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 01:43:49,794-Speed 3329.94 samples/sec   Loss 5.8342   LearningRate 0.0863   Epoch: 1   Global Step: 23730   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 01:43:52,879-Speed 3320.23 samples/sec   Loss 5.7650   LearningRate 0.0863   Epoch: 1   Global Step: 23740   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 01:43:55,961-Speed 3323.57 samples/sec   Loss 5.7106   LearningRate 0.0863   Epoch: 1   Global Step: 23750   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 01:43:59,089-Speed 3273.70 samples/sec   Loss 5.8455   LearningRate 0.0863   Epoch: 1   Global Step: 23760   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 01:44:02,252-Speed 3238.04 samples/sec   Loss 5.7827   LearningRate 0.0863   Epoch: 1   Global Step: 23770   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 01:44:05,308-Speed 3352.00 samples/sec   Loss 5.7459   LearningRate 0.0863   Epoch: 1   Global Step: 23780   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-11 01:44:08,379-Speed 3335.29 samples/sec   Loss 5.7530   LearningRate 0.0863   Epoch: 1   Global Step: 23790   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-11 01:44:11,446-Speed 3339.60 samples/sec   Loss 5.8393   LearningRate 0.0862   Epoch: 1   Global Step: 23800   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-11 01:44:14,524-Speed 3327.55 samples/sec   Loss 5.7144   LearningRate 0.0862   Epoch: 1   Global Step: 23810   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-11 01:44:17,624-Speed 3303.46 samples/sec   Loss 5.8120   LearningRate 0.0862   Epoch: 1   Global Step: 23820   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-11 01:44:20,697-Speed 3333.93 samples/sec   Loss 5.9053   LearningRate 0.0862   Epoch: 1   Global Step: 23830   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-11 01:44:23,912-Speed 3185.53 samples/sec   Loss 5.7873   LearningRate 0.0862   Epoch: 1   Global Step: 23840   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-11 01:44:27,123-Speed 3190.40 samples/sec   Loss 5.7904   LearningRate 0.0862   Epoch: 1   Global Step: 23850   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-11 01:44:30,279-Speed 3245.03 samples/sec   Loss 5.7271   LearningRate 0.0862   Epoch: 1   Global Step: 23860   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-11 01:44:33,371-Speed 3312.28 samples/sec   Loss 5.7252   LearningRate 0.0862   Epoch: 1   Global Step: 23870   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-11 01:44:36,439-Speed 3338.83 samples/sec   Loss 5.7853   LearningRate 0.0862   Epoch: 1   Global Step: 23880   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 01:44:39,529-Speed 3313.88 samples/sec   Loss 5.8265   LearningRate 0.0862   Epoch: 1   Global Step: 23890   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 01:44:42,737-Speed 3193.37 samples/sec   Loss 5.7813   LearningRate 0.0862   Epoch: 1   Global Step: 23900   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 01:44:45,804-Speed 3339.31 samples/sec   Loss 5.6573   LearningRate 0.0862   Epoch: 1   Global Step: 23910   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 01:44:48,889-Speed 3320.23 samples/sec   Loss 5.8310   LearningRate 0.0862   Epoch: 1   Global Step: 23920   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 01:44:51,966-Speed 3328.68 samples/sec   Loss 5.8069   LearningRate 0.0862   Epoch: 1   Global Step: 23930   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-11 01:44:55,033-Speed 3339.09 samples/sec   Loss 5.8499   LearningRate 0.0862   Epoch: 1   Global Step: 23940   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-11 01:44:58,100-Speed 3340.29 samples/sec   Loss 5.6574   LearningRate 0.0862   Epoch: 1   Global Step: 23950   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-11 01:45:01,164-Speed 3343.04 samples/sec   Loss 5.7673   LearningRate 0.0862   Epoch: 1   Global Step: 23960   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-11 01:45:04,235-Speed 3334.66 samples/sec   Loss 5.9305   LearningRate 0.0862   Epoch: 1   Global Step: 23970   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-11 01:45:07,309-Speed 3332.37 samples/sec   Loss 5.7112   LearningRate 0.0861   Epoch: 1   Global Step: 23980   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-11 01:45:10,472-Speed 3237.89 samples/sec   Loss 5.7452   LearningRate 0.0861   Epoch: 1   Global Step: 23990   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-11 01:45:13,624-Speed 3249.07 samples/sec   Loss 5.7462   LearningRate 0.0861   Epoch: 1   Global Step: 24000   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-11 01:45:58,058-[lfw][24000]XNorm: 22.484680
Training: 2022-04-11 01:45:58,059-[lfw][24000]Accuracy-Flip: 0.99767+-0.00300
Training: 2022-04-11 01:45:58,059-[lfw][24000]Accuracy-Highest: 0.99767
Training: 2022-04-11 01:46:49,584-[cfp_fp][24000]XNorm: 20.806705
Training: 2022-04-11 01:46:49,585-[cfp_fp][24000]Accuracy-Flip: 0.97386+-0.00720
Training: 2022-04-11 01:46:49,585-[cfp_fp][24000]Accuracy-Highest: 0.97429
Training: 2022-04-11 01:47:33,818-[agedb_30][24000]XNorm: 22.607942
Training: 2022-04-11 01:47:33,819-[agedb_30][24000]Accuracy-Flip: 0.97417+-0.00873
Training: 2022-04-11 01:47:33,819-[agedb_30][24000]Accuracy-Highest: 0.97417
Training: 2022-04-11 01:47:36,913-Speed 71.46 samples/sec   Loss 5.7591   LearningRate 0.0861   Epoch: 1   Global Step: 24010   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 01:47:40,008-Speed 3308.83 samples/sec   Loss 5.7422   LearningRate 0.0861   Epoch: 1   Global Step: 24020   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 01:47:43,128-Speed 3283.55 samples/sec   Loss 5.7870   LearningRate 0.0861   Epoch: 1   Global Step: 24030   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 01:47:46,180-Speed 3355.68 samples/sec   Loss 5.7984   LearningRate 0.0861   Epoch: 1   Global Step: 24040   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 01:47:49,243-Speed 3344.29 samples/sec   Loss 5.7531   LearningRate 0.0861   Epoch: 1   Global Step: 24050   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 01:47:52,446-Speed 3197.18 samples/sec   Loss 5.6993   LearningRate 0.0861   Epoch: 1   Global Step: 24060   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 01:47:55,560-Speed 3289.60 samples/sec   Loss 5.7279   LearningRate 0.0861   Epoch: 1   Global Step: 24070   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 01:47:58,720-Speed 3241.23 samples/sec   Loss 5.7498   LearningRate 0.0861   Epoch: 1   Global Step: 24080   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 01:48:01,884-Speed 3236.19 samples/sec   Loss 5.7738   LearningRate 0.0861   Epoch: 1   Global Step: 24090   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 01:48:04,938-Speed 3353.73 samples/sec   Loss 5.7444   LearningRate 0.0861   Epoch: 1   Global Step: 24100   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 01:48:07,983-Speed 3363.75 samples/sec   Loss 5.7327   LearningRate 0.0861   Epoch: 1   Global Step: 24110   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 01:48:11,038-Speed 3352.76 samples/sec   Loss 5.6780   LearningRate 0.0861   Epoch: 1   Global Step: 24120   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 01:48:14,097-Speed 3348.40 samples/sec   Loss 5.8059   LearningRate 0.0861   Epoch: 1   Global Step: 24130   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 01:48:17,170-Speed 3333.68 samples/sec   Loss 5.7860   LearningRate 0.0861   Epoch: 1   Global Step: 24140   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 01:48:20,233-Speed 3343.02 samples/sec   Loss 5.7986   LearningRate 0.0861   Epoch: 1   Global Step: 24150   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 01:48:23,337-Speed 3300.27 samples/sec   Loss 5.5560   LearningRate 0.0860   Epoch: 1   Global Step: 24160   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 01:48:26,404-Speed 3338.93 samples/sec   Loss 5.7620   LearningRate 0.0860   Epoch: 1   Global Step: 24170   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 01:48:29,482-Speed 3328.06 samples/sec   Loss 5.6749   LearningRate 0.0860   Epoch: 1   Global Step: 24180   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 01:48:32,636-Speed 3247.29 samples/sec   Loss 5.7341   LearningRate 0.0860   Epoch: 1   Global Step: 24190   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 01:48:35,772-Speed 3266.45 samples/sec   Loss 5.7397   LearningRate 0.0860   Epoch: 1   Global Step: 24200   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 01:48:38,910-Speed 3264.17 samples/sec   Loss 5.7347   LearningRate 0.0860   Epoch: 1   Global Step: 24210   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 01:48:42,039-Speed 3272.89 samples/sec   Loss 5.6742   LearningRate 0.0860   Epoch: 1   Global Step: 24220   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 01:48:45,207-Speed 3233.89 samples/sec   Loss 5.6155   LearningRate 0.0860   Epoch: 1   Global Step: 24230   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 01:48:48,287-Speed 3324.82 samples/sec   Loss 5.6672   LearningRate 0.0860   Epoch: 1   Global Step: 24240   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 01:48:51,361-Speed 3331.81 samples/sec   Loss 5.7009   LearningRate 0.0860   Epoch: 1   Global Step: 24250   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 01:48:54,505-Speed 3257.42 samples/sec   Loss 5.7832   LearningRate 0.0860   Epoch: 1   Global Step: 24260   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 01:48:57,650-Speed 3256.69 samples/sec   Loss 5.7356   LearningRate 0.0860   Epoch: 1   Global Step: 24270   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 01:49:00,723-Speed 3333.12 samples/sec   Loss 5.6288   LearningRate 0.0860   Epoch: 1   Global Step: 24280   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 01:49:03,787-Speed 3343.88 samples/sec   Loss 5.6642   LearningRate 0.0860   Epoch: 1   Global Step: 24290   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 01:49:06,864-Speed 3327.99 samples/sec   Loss 5.7791   LearningRate 0.0860   Epoch: 1   Global Step: 24300   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 01:49:09,923-Speed 3348.12 samples/sec   Loss 5.7375   LearningRate 0.0860   Epoch: 1   Global Step: 24310   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-04-11 01:49:12,965-Speed 3367.81 samples/sec   Loss 5.7646   LearningRate 0.0860   Epoch: 1   Global Step: 24320   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 01:49:16,033-Speed 3338.62 samples/sec   Loss 5.6388   LearningRate 0.0860   Epoch: 1   Global Step: 24330   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 01:49:19,109-Speed 3329.08 samples/sec   Loss 5.7149   LearningRate 0.0859   Epoch: 1   Global Step: 24340   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 01:49:22,217-Speed 3296.30 samples/sec   Loss 5.8064   LearningRate 0.0859   Epoch: 1   Global Step: 24350   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 01:49:25,273-Speed 3350.80 samples/sec   Loss 5.7798   LearningRate 0.0859   Epoch: 1   Global Step: 24360   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 01:49:28,359-Speed 3319.37 samples/sec   Loss 5.6598   LearningRate 0.0859   Epoch: 1   Global Step: 24370   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 01:49:31,415-Speed 3352.01 samples/sec   Loss 5.7997   LearningRate 0.0859   Epoch: 1   Global Step: 24380   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 01:49:34,485-Speed 3336.15 samples/sec   Loss 5.7799   LearningRate 0.0859   Epoch: 1   Global Step: 24390   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 01:49:37,546-Speed 3346.25 samples/sec   Loss 5.6594   LearningRate 0.0859   Epoch: 1   Global Step: 24400   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 01:49:40,607-Speed 3345.98 samples/sec   Loss 5.7663   LearningRate 0.0859   Epoch: 1   Global Step: 24410   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 01:49:43,674-Speed 3339.91 samples/sec   Loss 5.6895   LearningRate 0.0859   Epoch: 1   Global Step: 24420   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 01:49:46,758-Speed 3320.94 samples/sec   Loss 5.6797   LearningRate 0.0859   Epoch: 1   Global Step: 24430   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 01:49:49,879-Speed 3281.30 samples/sec   Loss 5.6137   LearningRate 0.0859   Epoch: 1   Global Step: 24440   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 01:49:53,026-Speed 3254.91 samples/sec   Loss 5.7395   LearningRate 0.0859   Epoch: 1   Global Step: 24450   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 01:49:56,118-Speed 3312.03 samples/sec   Loss 5.7133   LearningRate 0.0859   Epoch: 1   Global Step: 24460   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 01:49:59,197-Speed 3326.96 samples/sec   Loss 5.5793   LearningRate 0.0859   Epoch: 1   Global Step: 24470   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 01:50:02,284-Speed 3318.48 samples/sec   Loss 5.8554   LearningRate 0.0859   Epoch: 1   Global Step: 24480   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 01:50:05,349-Speed 3341.50 samples/sec   Loss 5.6637   LearningRate 0.0859   Epoch: 1   Global Step: 24490   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 01:50:08,408-Speed 3347.90 samples/sec   Loss 5.7062   LearningRate 0.0859   Epoch: 1   Global Step: 24500   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 01:50:11,493-Speed 3320.65 samples/sec   Loss 5.6669   LearningRate 0.0859   Epoch: 1   Global Step: 24510   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 01:50:14,579-Speed 3318.07 samples/sec   Loss 5.7622   LearningRate 0.0858   Epoch: 1   Global Step: 24520   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-04-11 01:50:17,645-Speed 3341.55 samples/sec   Loss 5.7223   LearningRate 0.0858   Epoch: 1   Global Step: 24530   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 01:50:20,721-Speed 3329.74 samples/sec   Loss 5.6775   LearningRate 0.0858   Epoch: 1   Global Step: 24540   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 01:50:23,788-Speed 3339.09 samples/sec   Loss 5.7906   LearningRate 0.0858   Epoch: 1   Global Step: 24550   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 01:50:26,855-Speed 3339.62 samples/sec   Loss 5.8583   LearningRate 0.0858   Epoch: 1   Global Step: 24560   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 01:50:29,926-Speed 3335.93 samples/sec   Loss 5.7374   LearningRate 0.0858   Epoch: 1   Global Step: 24570   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 01:50:32,990-Speed 3342.85 samples/sec   Loss 5.6447   LearningRate 0.0858   Epoch: 1   Global Step: 24580   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 01:50:36,088-Speed 3305.07 samples/sec   Loss 5.7048   LearningRate 0.0858   Epoch: 1   Global Step: 24590   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 01:50:39,157-Speed 3337.68 samples/sec   Loss 5.6872   LearningRate 0.0858   Epoch: 1   Global Step: 24600   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 01:50:42,227-Speed 3336.93 samples/sec   Loss 5.7120   LearningRate 0.0858   Epoch: 1   Global Step: 24610   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 01:50:45,300-Speed 3331.87 samples/sec   Loss 5.7419   LearningRate 0.0858   Epoch: 1   Global Step: 24620   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 01:50:48,426-Speed 3277.66 samples/sec   Loss 5.6186   LearningRate 0.0858   Epoch: 1   Global Step: 24630   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 01:50:51,558-Speed 3270.33 samples/sec   Loss 5.6840   LearningRate 0.0858   Epoch: 1   Global Step: 24640   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 01:50:54,689-Speed 3270.70 samples/sec   Loss 5.6698   LearningRate 0.0858   Epoch: 1   Global Step: 24650   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 01:50:57,864-Speed 3225.99 samples/sec   Loss 5.7605   LearningRate 0.0858   Epoch: 1   Global Step: 24660   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 01:51:00,945-Speed 3324.12 samples/sec   Loss 5.6284   LearningRate 0.0858   Epoch: 1   Global Step: 24670   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 01:51:04,020-Speed 3331.37 samples/sec   Loss 5.6943   LearningRate 0.0858   Epoch: 1   Global Step: 24680   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 01:51:07,100-Speed 3324.89 samples/sec   Loss 5.7642   LearningRate 0.0858   Epoch: 1   Global Step: 24690   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 01:51:10,163-Speed 3343.89 samples/sec   Loss 5.7274   LearningRate 0.0857   Epoch: 1   Global Step: 24700   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 01:51:13,254-Speed 3313.77 samples/sec   Loss 5.6655   LearningRate 0.0857   Epoch: 1   Global Step: 24710   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 01:51:16,321-Speed 3340.05 samples/sec   Loss 5.6043   LearningRate 0.0857   Epoch: 1   Global Step: 24720   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 01:51:19,393-Speed 3334.29 samples/sec   Loss 5.6774   LearningRate 0.0857   Epoch: 1   Global Step: 24730   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 01:51:22,489-Speed 3308.18 samples/sec   Loss 5.7163   LearningRate 0.0857   Epoch: 1   Global Step: 24740   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 01:51:25,607-Speed 3285.09 samples/sec   Loss 5.8098   LearningRate 0.0857   Epoch: 1   Global Step: 24750   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-11 01:51:28,670-Speed 3343.71 samples/sec   Loss 5.6700   LearningRate 0.0857   Epoch: 1   Global Step: 24760   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-11 01:51:31,783-Speed 3291.07 samples/sec   Loss 5.6754   LearningRate 0.0857   Epoch: 1   Global Step: 24770   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-11 01:51:34,863-Speed 3325.12 samples/sec   Loss 5.6261   LearningRate 0.0857   Epoch: 1   Global Step: 24780   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-11 01:51:37,931-Speed 3338.32 samples/sec   Loss 5.6298   LearningRate 0.0857   Epoch: 1   Global Step: 24790   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-11 01:51:40,999-Speed 3338.15 samples/sec   Loss 5.7296   LearningRate 0.0857   Epoch: 1   Global Step: 24800   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-11 01:51:44,093-Speed 3311.19 samples/sec   Loss 5.6411   LearningRate 0.0857   Epoch: 1   Global Step: 24810   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-11 01:51:47,152-Speed 3348.74 samples/sec   Loss 5.6642   LearningRate 0.0857   Epoch: 1   Global Step: 24820   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 01:51:50,302-Speed 3250.69 samples/sec   Loss 5.7375   LearningRate 0.0857   Epoch: 1   Global Step: 24830   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 01:51:53,406-Speed 3300.72 samples/sec   Loss 5.6735   LearningRate 0.0857   Epoch: 1   Global Step: 24840   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 01:51:56,470-Speed 3341.88 samples/sec   Loss 5.7016   LearningRate 0.0857   Epoch: 1   Global Step: 24850   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 01:51:59,530-Speed 3347.26 samples/sec   Loss 5.6092   LearningRate 0.0857   Epoch: 1   Global Step: 24860   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 01:52:02,580-Speed 3358.07 samples/sec   Loss 5.6526   LearningRate 0.0857   Epoch: 1   Global Step: 24870   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-11 01:52:05,652-Speed 3334.77 samples/sec   Loss 5.6665   LearningRate 0.0856   Epoch: 1   Global Step: 24880   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-11 01:52:08,795-Speed 3258.51 samples/sec   Loss 5.6893   LearningRate 0.0856   Epoch: 1   Global Step: 24890   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-11 01:52:11,856-Speed 3347.06 samples/sec   Loss 5.6299   LearningRate 0.0856   Epoch: 1   Global Step: 24900   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-11 01:52:14,944-Speed 3316.46 samples/sec   Loss 5.8001   LearningRate 0.0856   Epoch: 1   Global Step: 24910   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-11 01:52:18,009-Speed 3341.70 samples/sec   Loss 5.6565   LearningRate 0.0856   Epoch: 1   Global Step: 24920   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-11 01:52:21,086-Speed 3328.43 samples/sec   Loss 5.6156   LearningRate 0.0856   Epoch: 1   Global Step: 24930   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-11 01:52:24,201-Speed 3287.89 samples/sec   Loss 5.6839   LearningRate 0.0856   Epoch: 1   Global Step: 24940   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-11 01:52:27,319-Speed 3284.74 samples/sec   Loss 5.6276   LearningRate 0.0856   Epoch: 1   Global Step: 24950   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-11 01:52:30,545-Speed 3174.88 samples/sec   Loss 5.6984   LearningRate 0.0856   Epoch: 1   Global Step: 24960   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-11 01:52:33,631-Speed 3319.55 samples/sec   Loss 5.7286   LearningRate 0.0856   Epoch: 1   Global Step: 24970   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 01:52:36,732-Speed 3302.64 samples/sec   Loss 5.7091   LearningRate 0.0856   Epoch: 1   Global Step: 24980   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 01:52:39,790-Speed 3350.53 samples/sec   Loss 5.7580   LearningRate 0.0856   Epoch: 1   Global Step: 24990   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 01:52:42,869-Speed 3326.16 samples/sec   Loss 5.6680   LearningRate 0.0856   Epoch: 1   Global Step: 25000   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 01:52:45,943-Speed 3331.73 samples/sec   Loss 5.6711   LearningRate 0.0856   Epoch: 1   Global Step: 25010   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 01:52:49,004-Speed 3346.34 samples/sec   Loss 5.7260   LearningRate 0.0856   Epoch: 1   Global Step: 25020   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 01:52:52,063-Speed 3347.75 samples/sec   Loss 5.5500   LearningRate 0.0856   Epoch: 1   Global Step: 25030   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 01:52:55,134-Speed 3335.36 samples/sec   Loss 5.6837   LearningRate 0.0856   Epoch: 1   Global Step: 25040   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 01:52:58,229-Speed 3309.70 samples/sec   Loss 5.6594   LearningRate 0.0856   Epoch: 1   Global Step: 25050   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 01:53:01,391-Speed 3238.32 samples/sec   Loss 5.7060   LearningRate 0.0855   Epoch: 1   Global Step: 25060   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 01:53:04,484-Speed 3312.52 samples/sec   Loss 5.5910   LearningRate 0.0855   Epoch: 1   Global Step: 25070   Fp16 Grad Scale: 262144   Required: 32 hours
Training: 2022-04-11 01:53:07,557-Speed 3332.84 samples/sec   Loss 5.6670   LearningRate 0.0855   Epoch: 1   Global Step: 25080   Fp16 Grad Scale: 262144   Required: 32 hours
Training: 2022-04-11 01:53:10,665-Speed 3295.42 samples/sec   Loss 5.5432   LearningRate 0.0855   Epoch: 1   Global Step: 25090   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 01:53:13,799-Speed 3267.58 samples/sec   Loss 5.6709   LearningRate 0.0855   Epoch: 1   Global Step: 25100   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 01:53:17,018-Speed 3182.52 samples/sec   Loss 5.6563   LearningRate 0.0855   Epoch: 1   Global Step: 25110   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 01:53:20,122-Speed 3299.26 samples/sec   Loss 5.5801   LearningRate 0.0855   Epoch: 1   Global Step: 25120   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 01:53:23,290-Speed 3232.82 samples/sec   Loss 5.6073   LearningRate 0.0855   Epoch: 1   Global Step: 25130   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 01:53:26,355-Speed 3341.86 samples/sec   Loss 5.5634   LearningRate 0.0855   Epoch: 1   Global Step: 25140   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 01:53:29,423-Speed 3339.40 samples/sec   Loss 5.5626   LearningRate 0.0855   Epoch: 1   Global Step: 25150   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 01:53:32,506-Speed 3322.26 samples/sec   Loss 5.6445   LearningRate 0.0855   Epoch: 1   Global Step: 25160   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 01:53:35,623-Speed 3284.97 samples/sec   Loss 5.6972   LearningRate 0.0855   Epoch: 1   Global Step: 25170   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 01:53:38,764-Speed 3260.85 samples/sec   Loss 5.5792   LearningRate 0.0855   Epoch: 1   Global Step: 25180   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 01:53:41,886-Speed 3281.92 samples/sec   Loss 5.5504   LearningRate 0.0855   Epoch: 1   Global Step: 25190   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 01:53:44,945-Speed 3348.49 samples/sec   Loss 5.5315   LearningRate 0.0855   Epoch: 1   Global Step: 25200   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-11 01:53:48,022-Speed 3328.14 samples/sec   Loss 5.6105   LearningRate 0.0855   Epoch: 1   Global Step: 25210   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-11 01:53:51,095-Speed 3333.60 samples/sec   Loss 5.6326   LearningRate 0.0855   Epoch: 1   Global Step: 25220   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-11 01:53:54,250-Speed 3246.71 samples/sec   Loss 5.6816   LearningRate 0.0855   Epoch: 1   Global Step: 25230   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-11 01:53:57,367-Speed 3286.48 samples/sec   Loss 5.5802   LearningRate 0.0854   Epoch: 1   Global Step: 25240   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-11 01:54:00,456-Speed 3315.29 samples/sec   Loss 5.6295   LearningRate 0.0854   Epoch: 1   Global Step: 25250   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-11 01:54:03,537-Speed 3324.86 samples/sec   Loss 5.6060   LearningRate 0.0854   Epoch: 1   Global Step: 25260   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-11 01:54:06,734-Speed 3203.83 samples/sec   Loss 5.6554   LearningRate 0.0854   Epoch: 1   Global Step: 25270   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-11 01:54:09,802-Speed 3338.60 samples/sec   Loss 5.5575   LearningRate 0.0854   Epoch: 1   Global Step: 25280   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-11 01:54:12,890-Speed 3315.98 samples/sec   Loss 5.6165   LearningRate 0.0854   Epoch: 1   Global Step: 25290   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-11 01:54:16,005-Speed 3288.16 samples/sec   Loss 5.6835   LearningRate 0.0854   Epoch: 1   Global Step: 25300   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 01:54:19,085-Speed 3325.32 samples/sec   Loss 5.4596   LearningRate 0.0854   Epoch: 1   Global Step: 25310   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 01:54:22,222-Speed 3266.02 samples/sec   Loss 5.7006   LearningRate 0.0854   Epoch: 1   Global Step: 25320   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 01:54:25,286-Speed 3342.95 samples/sec   Loss 5.6487   LearningRate 0.0854   Epoch: 1   Global Step: 25330   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 01:54:28,353-Speed 3339.55 samples/sec   Loss 5.5495   LearningRate 0.0854   Epoch: 1   Global Step: 25340   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 01:54:31,412-Speed 3347.79 samples/sec   Loss 5.7253   LearningRate 0.0854   Epoch: 1   Global Step: 25350   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 01:54:34,474-Speed 3345.16 samples/sec   Loss 5.6993   LearningRate 0.0854   Epoch: 1   Global Step: 25360   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 01:54:37,540-Speed 3340.54 samples/sec   Loss 5.6136   LearningRate 0.0854   Epoch: 1   Global Step: 25370   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 01:54:40,629-Speed 3315.87 samples/sec   Loss 5.6624   LearningRate 0.0854   Epoch: 1   Global Step: 25380   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 01:54:43,710-Speed 3323.73 samples/sec   Loss 5.5739   LearningRate 0.0854   Epoch: 1   Global Step: 25390   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 01:54:46,767-Speed 3350.51 samples/sec   Loss 5.5948   LearningRate 0.0854   Epoch: 1   Global Step: 25400   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 01:54:49,888-Speed 3282.06 samples/sec   Loss 5.5713   LearningRate 0.0854   Epoch: 1   Global Step: 25410   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 01:54:53,011-Speed 3279.84 samples/sec   Loss 5.6270   LearningRate 0.0854   Epoch: 1   Global Step: 25420   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 01:54:56,125-Speed 3289.61 samples/sec   Loss 5.6135   LearningRate 0.0853   Epoch: 1   Global Step: 25430   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 01:54:59,175-Speed 3357.78 samples/sec   Loss 5.6379   LearningRate 0.0853   Epoch: 1   Global Step: 25440   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-11 01:55:02,246-Speed 3334.74 samples/sec   Loss 5.6666   LearningRate 0.0853   Epoch: 1   Global Step: 25450   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-11 01:55:05,319-Speed 3333.52 samples/sec   Loss 5.6237   LearningRate 0.0853   Epoch: 1   Global Step: 25460   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-11 01:55:08,425-Speed 3297.62 samples/sec   Loss 5.6589   LearningRate 0.0853   Epoch: 1   Global Step: 25470   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-11 01:55:11,512-Speed 3318.02 samples/sec   Loss 5.6940   LearningRate 0.0853   Epoch: 1   Global Step: 25480   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-11 01:55:14,577-Speed 3341.78 samples/sec   Loss 5.6576   LearningRate 0.0853   Epoch: 1   Global Step: 25490   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-11 01:55:17,638-Speed 3346.20 samples/sec   Loss 5.6031   LearningRate 0.0853   Epoch: 1   Global Step: 25500   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-11 01:55:20,723-Speed 3319.48 samples/sec   Loss 5.5838   LearningRate 0.0853   Epoch: 1   Global Step: 25510   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-11 01:55:23,794-Speed 3335.63 samples/sec   Loss 5.4997   LearningRate 0.0853   Epoch: 1   Global Step: 25520   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-11 01:55:26,860-Speed 3340.17 samples/sec   Loss 5.5368   LearningRate 0.0853   Epoch: 1   Global Step: 25530   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-11 01:55:29,924-Speed 3343.33 samples/sec   Loss 5.7314   LearningRate 0.0853   Epoch: 1   Global Step: 25540   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 01:55:32,993-Speed 3336.75 samples/sec   Loss 5.6271   LearningRate 0.0853   Epoch: 1   Global Step: 25550   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 01:55:36,070-Speed 3328.60 samples/sec   Loss 5.6474   LearningRate 0.0853   Epoch: 1   Global Step: 25560   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 01:55:39,142-Speed 3334.38 samples/sec   Loss 5.5756   LearningRate 0.0853   Epoch: 1   Global Step: 25570   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 01:55:42,250-Speed 3296.07 samples/sec   Loss 5.5293   LearningRate 0.0853   Epoch: 1   Global Step: 25580   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 01:55:45,315-Speed 3341.63 samples/sec   Loss 5.5625   LearningRate 0.0853   Epoch: 1   Global Step: 25590   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 01:55:48,365-Speed 3357.53 samples/sec   Loss 5.6613   LearningRate 0.0853   Epoch: 1   Global Step: 25600   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-11 01:55:51,437-Speed 3334.92 samples/sec   Loss 5.4974   LearningRate 0.0852   Epoch: 1   Global Step: 25610   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-11 01:55:54,520-Speed 3321.45 samples/sec   Loss 5.6089   LearningRate 0.0852   Epoch: 1   Global Step: 25620   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-11 01:55:57,622-Speed 3301.98 samples/sec   Loss 5.6291   LearningRate 0.0852   Epoch: 1   Global Step: 25630   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-11 01:56:00,744-Speed 3281.26 samples/sec   Loss 5.6706   LearningRate 0.0852   Epoch: 1   Global Step: 25640   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-11 01:56:03,836-Speed 3312.49 samples/sec   Loss 5.6087   LearningRate 0.0852   Epoch: 1   Global Step: 25650   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-11 01:56:06,990-Speed 3247.24 samples/sec   Loss 5.5680   LearningRate 0.0852   Epoch: 1   Global Step: 25660   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-11 01:56:10,083-Speed 3311.26 samples/sec   Loss 5.5468   LearningRate 0.0852   Epoch: 1   Global Step: 25670   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-11 01:56:13,146-Speed 3344.27 samples/sec   Loss 5.5789   LearningRate 0.0852   Epoch: 1   Global Step: 25680   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-11 01:56:16,215-Speed 3336.77 samples/sec   Loss 5.5246   LearningRate 0.0852   Epoch: 1   Global Step: 25690   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-11 01:56:19,344-Speed 3273.31 samples/sec   Loss 5.7031   LearningRate 0.0852   Epoch: 1   Global Step: 25700   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 01:56:22,558-Speed 3187.14 samples/sec   Loss 5.4849   LearningRate 0.0852   Epoch: 1   Global Step: 25710   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 01:56:25,654-Speed 3308.43 samples/sec   Loss 5.6318   LearningRate 0.0852   Epoch: 1   Global Step: 25720   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 01:56:28,759-Speed 3298.94 samples/sec   Loss 5.6442   LearningRate 0.0852   Epoch: 1   Global Step: 25730   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 01:56:31,843-Speed 3321.05 samples/sec   Loss 5.6276   LearningRate 0.0852   Epoch: 1   Global Step: 25740   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 01:56:34,939-Speed 3308.33 samples/sec   Loss 5.5015   LearningRate 0.0852   Epoch: 1   Global Step: 25750   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-11 01:56:38,029-Speed 3315.13 samples/sec   Loss 5.5951   LearningRate 0.0852   Epoch: 1   Global Step: 25760   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-11 01:56:41,099-Speed 3336.21 samples/sec   Loss 5.4904   LearningRate 0.0852   Epoch: 1   Global Step: 25770   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-11 01:56:44,180-Speed 3323.46 samples/sec   Loss 5.5822   LearningRate 0.0852   Epoch: 1   Global Step: 25780   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-11 01:56:47,316-Speed 3266.46 samples/sec   Loss 5.5527   LearningRate 0.0851   Epoch: 1   Global Step: 25790   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-11 01:56:50,390-Speed 3331.63 samples/sec   Loss 5.6361   LearningRate 0.0851   Epoch: 1   Global Step: 25800   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-11 01:56:53,521-Speed 3271.23 samples/sec   Loss 5.6382   LearningRate 0.0851   Epoch: 1   Global Step: 25810   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-11 01:56:56,598-Speed 3328.61 samples/sec   Loss 5.5142   LearningRate 0.0851   Epoch: 1   Global Step: 25820   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-11 01:56:59,666-Speed 3338.85 samples/sec   Loss 5.5068   LearningRate 0.0851   Epoch: 1   Global Step: 25830   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-11 01:57:02,742-Speed 3330.11 samples/sec   Loss 5.5863   LearningRate 0.0851   Epoch: 1   Global Step: 25840   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-11 01:57:05,810-Speed 3338.03 samples/sec   Loss 5.5763   LearningRate 0.0851   Epoch: 1   Global Step: 25850   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 01:57:08,877-Speed 3340.15 samples/sec   Loss 5.6134   LearningRate 0.0851   Epoch: 1   Global Step: 25860   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 01:57:11,950-Speed 3332.56 samples/sec   Loss 5.5372   LearningRate 0.0851   Epoch: 1   Global Step: 25870   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 01:57:15,036-Speed 3318.89 samples/sec   Loss 5.4832   LearningRate 0.0851   Epoch: 1   Global Step: 25880   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 01:57:18,137-Speed 3303.39 samples/sec   Loss 5.5721   LearningRate 0.0851   Epoch: 1   Global Step: 25890   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 01:57:21,215-Speed 3327.15 samples/sec   Loss 5.5550   LearningRate 0.0851   Epoch: 1   Global Step: 25900   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 01:57:24,334-Speed 3284.13 samples/sec   Loss 5.5830   LearningRate 0.0851   Epoch: 1   Global Step: 25910   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 01:57:27,506-Speed 3228.40 samples/sec   Loss 5.5415   LearningRate 0.0851   Epoch: 1   Global Step: 25920   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 01:57:30,603-Speed 3306.94 samples/sec   Loss 5.6339   LearningRate 0.0851   Epoch: 1   Global Step: 25930   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-11 01:57:33,678-Speed 3332.30 samples/sec   Loss 5.5412   LearningRate 0.0851   Epoch: 1   Global Step: 25940   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-11 01:57:36,766-Speed 3315.89 samples/sec   Loss 5.5588   LearningRate 0.0851   Epoch: 1   Global Step: 25950   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-11 01:57:39,918-Speed 3250.23 samples/sec   Loss 5.5414   LearningRate 0.0851   Epoch: 1   Global Step: 25960   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-11 01:57:42,993-Speed 3330.01 samples/sec   Loss 5.4922   LearningRate 0.0850   Epoch: 1   Global Step: 25970   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-11 01:57:46,223-Speed 3171.39 samples/sec   Loss 5.6275   LearningRate 0.0850   Epoch: 1   Global Step: 25980   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-11 01:57:49,292-Speed 3336.81 samples/sec   Loss 5.5164   LearningRate 0.0850   Epoch: 1   Global Step: 25990   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-11 01:57:52,374-Speed 3323.19 samples/sec   Loss 5.6343   LearningRate 0.0850   Epoch: 1   Global Step: 26000   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-11 01:58:35,989-[lfw][26000]XNorm: 22.415484
Training: 2022-04-11 01:58:35,990-[lfw][26000]Accuracy-Flip: 0.99700+-0.00267
Training: 2022-04-11 01:58:35,990-[lfw][26000]Accuracy-Highest: 0.99767
Training: 2022-04-11 01:59:26,756-[cfp_fp][26000]XNorm: 20.440697
Training: 2022-04-11 01:59:26,757-[cfp_fp][26000]Accuracy-Flip: 0.97529+-0.00495
Training: 2022-04-11 01:59:26,757-[cfp_fp][26000]Accuracy-Highest: 0.97529
Training: 2022-04-11 02:00:10,505-[agedb_30][26000]XNorm: 22.288008
Training: 2022-04-11 02:00:10,505-[agedb_30][26000]Accuracy-Flip: 0.97367+-0.00510
Training: 2022-04-11 02:00:10,506-[agedb_30][26000]Accuracy-Highest: 0.97417
Training: 2022-04-11 02:00:13,565-Speed 72.53 samples/sec   Loss 5.5654   LearningRate 0.0850   Epoch: 1   Global Step: 26010   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 02:00:16,627-Speed 3344.13 samples/sec   Loss 5.4132   LearningRate 0.0850   Epoch: 1   Global Step: 26020   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-04-11 02:00:19,698-Speed 3335.47 samples/sec   Loss 5.5396   LearningRate 0.0850   Epoch: 1   Global Step: 26030   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 02:00:22,758-Speed 3346.98 samples/sec   Loss 5.6047   LearningRate 0.0850   Epoch: 1   Global Step: 26040   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 02:00:25,819-Speed 3346.60 samples/sec   Loss 5.5419   LearningRate 0.0850   Epoch: 1   Global Step: 26050   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 02:00:28,883-Speed 3342.90 samples/sec   Loss 5.5284   LearningRate 0.0850   Epoch: 1   Global Step: 26060   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 02:00:32,026-Speed 3258.60 samples/sec   Loss 5.5989   LearningRate 0.0850   Epoch: 1   Global Step: 26070   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 02:00:35,094-Speed 3338.74 samples/sec   Loss 5.5179   LearningRate 0.0850   Epoch: 1   Global Step: 26080   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 02:00:38,159-Speed 3341.14 samples/sec   Loss 5.6133   LearningRate 0.0850   Epoch: 1   Global Step: 26090   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 02:00:41,303-Speed 3257.96 samples/sec   Loss 5.4464   LearningRate 0.0850   Epoch: 1   Global Step: 26100   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 02:00:44,410-Speed 3296.02 samples/sec   Loss 5.5172   LearningRate 0.0850   Epoch: 1   Global Step: 26110   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-04-11 02:00:47,471-Speed 3346.81 samples/sec   Loss 5.5592   LearningRate 0.0850   Epoch: 1   Global Step: 26120   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:00:50,607-Speed 3265.73 samples/sec   Loss 5.5506   LearningRate 0.0850   Epoch: 1   Global Step: 26130   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:00:53,800-Speed 3208.11 samples/sec   Loss 5.5599   LearningRate 0.0850   Epoch: 1   Global Step: 26140   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:00:56,880-Speed 3326.28 samples/sec   Loss 5.6321   LearningRate 0.0849   Epoch: 1   Global Step: 26150   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:00:59,952-Speed 3333.28 samples/sec   Loss 5.6416   LearningRate 0.0849   Epoch: 1   Global Step: 26160   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:01:03,030-Speed 3327.64 samples/sec   Loss 5.5394   LearningRate 0.0849   Epoch: 1   Global Step: 26170   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:01:06,095-Speed 3342.71 samples/sec   Loss 5.6421   LearningRate 0.0849   Epoch: 1   Global Step: 26180   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:01:09,157-Speed 3344.84 samples/sec   Loss 5.5466   LearningRate 0.0849   Epoch: 1   Global Step: 26190   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:01:12,232-Speed 3330.77 samples/sec   Loss 5.5418   LearningRate 0.0849   Epoch: 1   Global Step: 26200   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:01:15,299-Speed 3339.35 samples/sec   Loss 5.5218   LearningRate 0.0849   Epoch: 1   Global Step: 26210   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:01:18,437-Speed 3263.90 samples/sec   Loss 5.5575   LearningRate 0.0849   Epoch: 1   Global Step: 26220   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:01:21,550-Speed 3290.47 samples/sec   Loss 5.5519   LearningRate 0.0849   Epoch: 1   Global Step: 26230   Fp16 Grad Scale: 262144   Required: 32 hours
Training: 2022-04-11 02:01:24,603-Speed 3356.42 samples/sec   Loss 5.5337   LearningRate 0.0849   Epoch: 1   Global Step: 26240   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:01:27,701-Speed 3305.25 samples/sec   Loss 5.4962   LearningRate 0.0849   Epoch: 1   Global Step: 26250   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:01:30,792-Speed 3313.82 samples/sec   Loss 5.5067   LearningRate 0.0849   Epoch: 1   Global Step: 26260   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:01:33,858-Speed 3340.29 samples/sec   Loss 5.5368   LearningRate 0.0849   Epoch: 1   Global Step: 26270   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-11 02:01:36,922-Speed 3342.74 samples/sec   Loss 5.5034   LearningRate 0.0849   Epoch: 1   Global Step: 26280   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-11 02:01:39,989-Speed 3340.18 samples/sec   Loss 5.6388   LearningRate 0.0849   Epoch: 1   Global Step: 26290   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-11 02:01:43,136-Speed 3254.60 samples/sec   Loss 5.5196   LearningRate 0.0849   Epoch: 1   Global Step: 26300   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-11 02:01:46,199-Speed 3343.61 samples/sec   Loss 5.5724   LearningRate 0.0849   Epoch: 1   Global Step: 26310   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-11 02:01:49,271-Speed 3334.13 samples/sec   Loss 5.5281   LearningRate 0.0849   Epoch: 1   Global Step: 26320   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-11 02:01:52,435-Speed 3237.85 samples/sec   Loss 5.6128   LearningRate 0.0848   Epoch: 1   Global Step: 26330   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-11 02:01:55,506-Speed 3335.16 samples/sec   Loss 5.3894   LearningRate 0.0848   Epoch: 1   Global Step: 26340   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-11 02:01:58,593-Speed 3317.28 samples/sec   Loss 5.4136   LearningRate 0.0848   Epoch: 1   Global Step: 26350   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-11 02:02:01,748-Speed 3246.68 samples/sec   Loss 5.4944   LearningRate 0.0848   Epoch: 1   Global Step: 26360   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-11 02:02:04,806-Speed 3349.37 samples/sec   Loss 5.5737   LearningRate 0.0848   Epoch: 1   Global Step: 26370   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:02:07,866-Speed 3347.23 samples/sec   Loss 5.4278   LearningRate 0.0848   Epoch: 1   Global Step: 26380   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:02:10,935-Speed 3337.70 samples/sec   Loss 5.5080   LearningRate 0.0848   Epoch: 1   Global Step: 26390   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:02:14,011-Speed 3329.29 samples/sec   Loss 5.4722   LearningRate 0.0848   Epoch: 1   Global Step: 26400   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:02:17,178-Speed 3234.34 samples/sec   Loss 5.4939   LearningRate 0.0848   Epoch: 1   Global Step: 26410   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:02:20,249-Speed 3336.00 samples/sec   Loss 5.5467   LearningRate 0.0848   Epoch: 1   Global Step: 26420   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:02:23,312-Speed 3343.82 samples/sec   Loss 5.3972   LearningRate 0.0848   Epoch: 1   Global Step: 26430   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:02:26,382-Speed 3336.18 samples/sec   Loss 5.5641   LearningRate 0.0848   Epoch: 1   Global Step: 26440   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:02:29,451-Speed 3336.61 samples/sec   Loss 5.4245   LearningRate 0.0848   Epoch: 1   Global Step: 26450   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:02:32,508-Speed 3350.39 samples/sec   Loss 5.5877   LearningRate 0.0848   Epoch: 1   Global Step: 26460   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:02:35,599-Speed 3314.02 samples/sec   Loss 5.5809   LearningRate 0.0848   Epoch: 1   Global Step: 26470   Fp16 Grad Scale: 262144   Required: 32 hours
Training: 2022-04-11 02:02:38,677-Speed 3327.93 samples/sec   Loss 5.4301   LearningRate 0.0848   Epoch: 1   Global Step: 26480   Fp16 Grad Scale: 262144   Required: 32 hours
Training: 2022-04-11 02:02:41,747-Speed 3335.89 samples/sec   Loss 5.5343   LearningRate 0.0848   Epoch: 1   Global Step: 26490   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:02:44,841-Speed 3310.23 samples/sec   Loss 5.4806   LearningRate 0.0848   Epoch: 1   Global Step: 26500   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:02:47,979-Speed 3264.92 samples/sec   Loss 5.5722   LearningRate 0.0847   Epoch: 1   Global Step: 26510   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:02:51,036-Speed 3350.29 samples/sec   Loss 5.5162   LearningRate 0.0847   Epoch: 1   Global Step: 26520   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:02:54,111-Speed 3331.09 samples/sec   Loss 5.4492   LearningRate 0.0847   Epoch: 1   Global Step: 26530   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:02:57,172-Speed 3345.20 samples/sec   Loss 5.4225   LearningRate 0.0847   Epoch: 1   Global Step: 26540   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:03:00,226-Speed 3354.11 samples/sec   Loss 5.4500   LearningRate 0.0847   Epoch: 1   Global Step: 26550   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-11 02:03:03,330-Speed 3300.08 samples/sec   Loss 5.5401   LearningRate 0.0847   Epoch: 1   Global Step: 26560   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-11 02:03:06,517-Speed 3212.98 samples/sec   Loss 5.5207   LearningRate 0.0847   Epoch: 1   Global Step: 26570   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-11 02:03:09,601-Speed 3321.89 samples/sec   Loss 5.5492   LearningRate 0.0847   Epoch: 1   Global Step: 26580   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-11 02:03:12,732-Speed 3270.99 samples/sec   Loss 5.5043   LearningRate 0.0847   Epoch: 1   Global Step: 26590   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-11 02:03:15,915-Speed 3218.44 samples/sec   Loss 5.5420   LearningRate 0.0847   Epoch: 1   Global Step: 26600   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-11 02:03:18,978-Speed 3343.99 samples/sec   Loss 5.5227   LearningRate 0.0847   Epoch: 1   Global Step: 26610   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-11 02:03:22,038-Speed 3346.86 samples/sec   Loss 5.5463   LearningRate 0.0847   Epoch: 1   Global Step: 26620   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-11 02:03:25,147-Speed 3294.73 samples/sec   Loss 5.4272   LearningRate 0.0847   Epoch: 1   Global Step: 26630   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-11 02:03:28,231-Speed 3320.32 samples/sec   Loss 5.4086   LearningRate 0.0847   Epoch: 1   Global Step: 26640   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-11 02:03:31,293-Speed 3345.34 samples/sec   Loss 5.4252   LearningRate 0.0847   Epoch: 1   Global Step: 26650   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:03:34,377-Speed 3320.87 samples/sec   Loss 5.5045   LearningRate 0.0847   Epoch: 1   Global Step: 26660   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:03:37,450-Speed 3333.86 samples/sec   Loss 5.5566   LearningRate 0.0847   Epoch: 1   Global Step: 26670   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:03:40,495-Speed 3363.81 samples/sec   Loss 5.5303   LearningRate 0.0847   Epoch: 1   Global Step: 26680   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-11 02:03:43,555-Speed 3347.29 samples/sec   Loss 5.5055   LearningRate 0.0846   Epoch: 1   Global Step: 26690   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-11 02:03:46,627-Speed 3333.95 samples/sec   Loss 5.4803   LearningRate 0.0846   Epoch: 1   Global Step: 26700   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-11 02:03:49,715-Speed 3317.07 samples/sec   Loss 5.4694   LearningRate 0.0846   Epoch: 1   Global Step: 26710   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-11 02:03:52,778-Speed 3343.60 samples/sec   Loss 5.4580   LearningRate 0.0846   Epoch: 1   Global Step: 26720   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-11 02:03:55,849-Speed 3335.69 samples/sec   Loss 5.4347   LearningRate 0.0846   Epoch: 1   Global Step: 26730   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-11 02:03:58,915-Speed 3339.76 samples/sec   Loss 5.4942   LearningRate 0.0846   Epoch: 1   Global Step: 26740   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-11 02:04:01,995-Speed 3326.25 samples/sec   Loss 5.4924   LearningRate 0.0846   Epoch: 1   Global Step: 26750   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-11 02:04:05,055-Speed 3346.31 samples/sec   Loss 5.4863   LearningRate 0.0846   Epoch: 1   Global Step: 26760   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-11 02:04:08,117-Speed 3345.85 samples/sec   Loss 5.4943   LearningRate 0.0846   Epoch: 1   Global Step: 26770   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-11 02:04:11,192-Speed 3330.15 samples/sec   Loss 5.4743   LearningRate 0.0846   Epoch: 1   Global Step: 26780   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:04:14,263-Speed 3336.08 samples/sec   Loss 5.5291   LearningRate 0.0846   Epoch: 1   Global Step: 26790   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:04:17,321-Speed 3349.08 samples/sec   Loss 5.5759   LearningRate 0.0846   Epoch: 1   Global Step: 26800   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:04:20,382-Speed 3345.82 samples/sec   Loss 5.5791   LearningRate 0.0846   Epoch: 1   Global Step: 26810   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:04:23,464-Speed 3323.82 samples/sec   Loss 5.5259   LearningRate 0.0846   Epoch: 1   Global Step: 26820   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:04:26,546-Speed 3322.99 samples/sec   Loss 5.5056   LearningRate 0.0846   Epoch: 1   Global Step: 26830   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:04:29,600-Speed 3354.91 samples/sec   Loss 5.4056   LearningRate 0.0846   Epoch: 1   Global Step: 26840   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-11 02:04:32,662-Speed 3345.19 samples/sec   Loss 5.4583   LearningRate 0.0846   Epoch: 1   Global Step: 26850   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-11 02:04:35,726-Speed 3342.84 samples/sec   Loss 5.5408   LearningRate 0.0846   Epoch: 1   Global Step: 26860   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-11 02:04:38,819-Speed 3310.92 samples/sec   Loss 5.5125   LearningRate 0.0845   Epoch: 1   Global Step: 26870   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-11 02:04:41,884-Speed 3342.13 samples/sec   Loss 5.5348   LearningRate 0.0845   Epoch: 1   Global Step: 26880   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-11 02:04:44,998-Speed 3289.35 samples/sec   Loss 5.4447   LearningRate 0.0845   Epoch: 1   Global Step: 26890   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-11 02:04:48,065-Speed 3338.94 samples/sec   Loss 5.5298   LearningRate 0.0845   Epoch: 1   Global Step: 26900   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-11 02:04:51,132-Speed 3339.83 samples/sec   Loss 5.3733   LearningRate 0.0845   Epoch: 1   Global Step: 26910   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-11 02:04:54,195-Speed 3343.68 samples/sec   Loss 5.3618   LearningRate 0.0845   Epoch: 1   Global Step: 26920   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-11 02:04:57,259-Speed 3343.58 samples/sec   Loss 5.4191   LearningRate 0.0845   Epoch: 1   Global Step: 26930   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-11 02:05:00,332-Speed 3333.43 samples/sec   Loss 5.5042   LearningRate 0.0845   Epoch: 1   Global Step: 26940   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:05:03,404-Speed 3334.17 samples/sec   Loss 5.3901   LearningRate 0.0845   Epoch: 1   Global Step: 26950   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:05:06,485-Speed 3323.60 samples/sec   Loss 5.4510   LearningRate 0.0845   Epoch: 1   Global Step: 26960   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:05:09,567-Speed 3323.43 samples/sec   Loss 5.4695   LearningRate 0.0845   Epoch: 1   Global Step: 26970   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:05:12,683-Speed 3287.23 samples/sec   Loss 5.5495   LearningRate 0.0845   Epoch: 1   Global Step: 26980   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:05:15,819-Speed 3266.10 samples/sec   Loss 5.4855   LearningRate 0.0845   Epoch: 1   Global Step: 26990   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:05:18,944-Speed 3276.64 samples/sec   Loss 5.5311   LearningRate 0.0845   Epoch: 1   Global Step: 27000   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:05:22,022-Speed 3328.27 samples/sec   Loss 5.4199   LearningRate 0.0845   Epoch: 1   Global Step: 27010   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:05:25,092-Speed 3335.86 samples/sec   Loss 5.4072   LearningRate 0.0845   Epoch: 1   Global Step: 27020   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:05:28,158-Speed 3341.05 samples/sec   Loss 5.4750   LearningRate 0.0845   Epoch: 1   Global Step: 27030   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:05:31,279-Speed 3281.91 samples/sec   Loss 5.3958   LearningRate 0.0845   Epoch: 1   Global Step: 27040   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:05:34,495-Speed 3184.69 samples/sec   Loss 5.4381   LearningRate 0.0845   Epoch: 1   Global Step: 27050   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:05:37,669-Speed 3226.63 samples/sec   Loss 5.5605   LearningRate 0.0844   Epoch: 1   Global Step: 27060   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:05:40,888-Speed 3182.32 samples/sec   Loss 5.4922   LearningRate 0.0844   Epoch: 1   Global Step: 27070   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:05:44,124-Speed 3165.00 samples/sec   Loss 5.3387   LearningRate 0.0844   Epoch: 1   Global Step: 27080   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:05:47,258-Speed 3268.09 samples/sec   Loss 5.4544   LearningRate 0.0844   Epoch: 1   Global Step: 27090   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:05:50,380-Speed 3281.31 samples/sec   Loss 5.5048   LearningRate 0.0844   Epoch: 1   Global Step: 27100   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:05:53,541-Speed 3240.14 samples/sec   Loss 5.4662   LearningRate 0.0844   Epoch: 1   Global Step: 27110   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:05:56,694-Speed 3248.17 samples/sec   Loss 5.4797   LearningRate 0.0844   Epoch: 1   Global Step: 27120   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:05:59,799-Speed 3298.21 samples/sec   Loss 5.4827   LearningRate 0.0844   Epoch: 1   Global Step: 27130   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:06:02,905-Speed 3297.64 samples/sec   Loss 5.4879   LearningRate 0.0844   Epoch: 1   Global Step: 27140   Fp16 Grad Scale: 262144   Required: 32 hours
Training: 2022-04-11 02:06:05,976-Speed 3335.50 samples/sec   Loss 5.3271   LearningRate 0.0844   Epoch: 1   Global Step: 27150   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:06:09,053-Speed 3328.32 samples/sec   Loss 5.3611   LearningRate 0.0844   Epoch: 1   Global Step: 27160   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:06:12,134-Speed 3324.68 samples/sec   Loss 5.5120   LearningRate 0.0844   Epoch: 1   Global Step: 27170   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:06:15,216-Speed 3323.40 samples/sec   Loss 5.4347   LearningRate 0.0844   Epoch: 1   Global Step: 27180   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:06:18,281-Speed 3341.56 samples/sec   Loss 5.4417   LearningRate 0.0844   Epoch: 1   Global Step: 27190   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:06:21,395-Speed 3289.08 samples/sec   Loss 5.4513   LearningRate 0.0844   Epoch: 1   Global Step: 27200   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:06:24,527-Speed 3271.05 samples/sec   Loss 5.3568   LearningRate 0.0844   Epoch: 1   Global Step: 27210   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:06:27,635-Speed 3295.49 samples/sec   Loss 5.4036   LearningRate 0.0844   Epoch: 1   Global Step: 27220   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:06:30,703-Speed 3337.68 samples/sec   Loss 5.5741   LearningRate 0.0844   Epoch: 1   Global Step: 27230   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:06:33,788-Speed 3320.79 samples/sec   Loss 5.4811   LearningRate 0.0843   Epoch: 1   Global Step: 27240   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:06:36,854-Speed 3340.66 samples/sec   Loss 5.4157   LearningRate 0.0843   Epoch: 1   Global Step: 27250   Fp16 Grad Scale: 262144   Required: 32 hours
Training: 2022-04-11 02:06:39,906-Speed 3355.29 samples/sec   Loss 5.4135   LearningRate 0.0843   Epoch: 1   Global Step: 27260   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:06:42,998-Speed 3313.50 samples/sec   Loss 5.4450   LearningRate 0.0843   Epoch: 1   Global Step: 27270   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:06:46,064-Speed 3339.92 samples/sec   Loss 5.4970   LearningRate 0.0843   Epoch: 1   Global Step: 27280   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:06:49,140-Speed 3329.67 samples/sec   Loss 5.4783   LearningRate 0.0843   Epoch: 1   Global Step: 27290   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:06:52,203-Speed 3344.21 samples/sec   Loss 5.4017   LearningRate 0.0843   Epoch: 1   Global Step: 27300   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:06:55,266-Speed 3343.93 samples/sec   Loss 5.5163   LearningRate 0.0843   Epoch: 1   Global Step: 27310   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:06:58,341-Speed 3330.16 samples/sec   Loss 5.3970   LearningRate 0.0843   Epoch: 1   Global Step: 27320   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:07:01,395-Speed 3354.49 samples/sec   Loss 5.4344   LearningRate 0.0843   Epoch: 1   Global Step: 27330   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-11 02:07:04,464-Speed 3337.20 samples/sec   Loss 5.3682   LearningRate 0.0843   Epoch: 1   Global Step: 27340   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-11 02:07:07,598-Speed 3268.10 samples/sec   Loss 5.4382   LearningRate 0.0843   Epoch: 1   Global Step: 27350   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-11 02:07:10,664-Speed 3340.62 samples/sec   Loss 5.4778   LearningRate 0.0843   Epoch: 1   Global Step: 27360   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-11 02:07:13,724-Speed 3347.27 samples/sec   Loss 5.4876   LearningRate 0.0843   Epoch: 1   Global Step: 27370   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-11 02:07:16,799-Speed 3331.11 samples/sec   Loss 5.4410   LearningRate 0.0843   Epoch: 1   Global Step: 27380   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-11 02:07:19,902-Speed 3300.64 samples/sec   Loss 5.3722   LearningRate 0.0843   Epoch: 1   Global Step: 27390   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-11 02:07:22,981-Speed 3325.79 samples/sec   Loss 5.3383   LearningRate 0.0843   Epoch: 1   Global Step: 27400   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-11 02:07:26,060-Speed 3327.15 samples/sec   Loss 5.3084   LearningRate 0.0843   Epoch: 1   Global Step: 27410   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-11 02:07:29,134-Speed 3331.82 samples/sec   Loss 5.3919   LearningRate 0.0842   Epoch: 1   Global Step: 27420   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-11 02:07:32,199-Speed 3341.40 samples/sec   Loss 5.4046   LearningRate 0.0842   Epoch: 1   Global Step: 27430   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:07:35,288-Speed 3316.31 samples/sec   Loss 5.5581   LearningRate 0.0842   Epoch: 1   Global Step: 27440   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:07:38,347-Speed 3347.95 samples/sec   Loss 5.3871   LearningRate 0.0842   Epoch: 1   Global Step: 27450   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:07:41,528-Speed 3219.91 samples/sec   Loss 5.4451   LearningRate 0.0842   Epoch: 1   Global Step: 27460   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:07:44,597-Speed 3337.40 samples/sec   Loss 5.3634   LearningRate 0.0842   Epoch: 1   Global Step: 27470   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:07:47,671-Speed 3332.51 samples/sec   Loss 5.4408   LearningRate 0.0842   Epoch: 1   Global Step: 27480   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:07:50,754-Speed 3322.25 samples/sec   Loss 5.4909   LearningRate 0.0842   Epoch: 1   Global Step: 27490   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:07:53,842-Speed 3316.76 samples/sec   Loss 5.4233   LearningRate 0.0842   Epoch: 1   Global Step: 27500   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:07:56,918-Speed 3328.91 samples/sec   Loss 5.4288   LearningRate 0.0842   Epoch: 1   Global Step: 27510   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:07:59,989-Speed 3336.20 samples/sec   Loss 5.4596   LearningRate 0.0842   Epoch: 1   Global Step: 27520   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:08:03,076-Speed 3317.41 samples/sec   Loss 5.4625   LearningRate 0.0842   Epoch: 1   Global Step: 27530   Fp16 Grad Scale: 262144   Required: 32 hours
Training: 2022-04-11 02:08:06,188-Speed 3291.44 samples/sec   Loss 5.5196   LearningRate 0.0842   Epoch: 1   Global Step: 27540   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:08:09,248-Speed 3347.83 samples/sec   Loss 5.4408   LearningRate 0.0842   Epoch: 1   Global Step: 27550   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-11 02:08:12,312-Speed 3342.27 samples/sec   Loss 5.3953   LearningRate 0.0842   Epoch: 1   Global Step: 27560   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-11 02:08:15,374-Speed 3344.63 samples/sec   Loss 5.4330   LearningRate 0.0842   Epoch: 1   Global Step: 27570   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-11 02:08:18,448-Speed 3333.38 samples/sec   Loss 5.4678   LearningRate 0.0842   Epoch: 1   Global Step: 27580   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-11 02:08:21,532-Speed 3320.29 samples/sec   Loss 5.4273   LearningRate 0.0842   Epoch: 1   Global Step: 27590   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-11 02:08:24,616-Speed 3321.45 samples/sec   Loss 5.4447   LearningRate 0.0841   Epoch: 1   Global Step: 27600   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-11 02:08:27,711-Speed 3309.86 samples/sec   Loss 5.3999   LearningRate 0.0841   Epoch: 1   Global Step: 27610   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-11 02:08:30,854-Speed 3259.27 samples/sec   Loss 5.4628   LearningRate 0.0841   Epoch: 1   Global Step: 27620   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-11 02:08:33,975-Speed 3280.97 samples/sec   Loss 5.3490   LearningRate 0.0841   Epoch: 1   Global Step: 27630   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-11 02:08:37,048-Speed 3333.57 samples/sec   Loss 5.3545   LearningRate 0.0841   Epoch: 1   Global Step: 27640   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-11 02:08:40,127-Speed 3326.42 samples/sec   Loss 5.4314   LearningRate 0.0841   Epoch: 1   Global Step: 27650   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:08:43,197-Speed 3335.46 samples/sec   Loss 5.3509   LearningRate 0.0841   Epoch: 1   Global Step: 27660   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:08:46,283-Speed 3319.68 samples/sec   Loss 5.4181   LearningRate 0.0841   Epoch: 1   Global Step: 27670   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:08:49,405-Speed 3280.54 samples/sec   Loss 5.4140   LearningRate 0.0841   Epoch: 1   Global Step: 27680   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:08:52,490-Speed 3320.01 samples/sec   Loss 5.3159   LearningRate 0.0841   Epoch: 1   Global Step: 27690   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:08:55,630-Speed 3262.08 samples/sec   Loss 5.4666   LearningRate 0.0841   Epoch: 1   Global Step: 27700   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:08:58,701-Speed 3335.50 samples/sec   Loss 5.3544   LearningRate 0.0841   Epoch: 1   Global Step: 27710   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:09:01,768-Speed 3339.06 samples/sec   Loss 5.3592   LearningRate 0.0841   Epoch: 1   Global Step: 27720   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:09:04,834-Speed 3341.27 samples/sec   Loss 5.2974   LearningRate 0.0841   Epoch: 1   Global Step: 27730   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:09:07,901-Speed 3338.65 samples/sec   Loss 5.3648   LearningRate 0.0841   Epoch: 1   Global Step: 27740   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:09:10,972-Speed 3336.15 samples/sec   Loss 5.3587   LearningRate 0.0841   Epoch: 1   Global Step: 27750   Fp16 Grad Scale: 262144   Required: 32 hours
Training: 2022-04-11 02:09:14,055-Speed 3321.92 samples/sec   Loss 5.4446   LearningRate 0.0841   Epoch: 1   Global Step: 27760   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:09:17,132-Speed 3328.69 samples/sec   Loss 5.3889   LearningRate 0.0841   Epoch: 1   Global Step: 27770   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:09:20,206-Speed 3331.60 samples/sec   Loss 5.3745   LearningRate 0.0840   Epoch: 1   Global Step: 27780   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:09:23,277-Speed 3335.91 samples/sec   Loss 5.4242   LearningRate 0.0840   Epoch: 1   Global Step: 27790   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:09:26,349-Speed 3333.92 samples/sec   Loss 5.3595   LearningRate 0.0840   Epoch: 1   Global Step: 27800   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:09:29,422-Speed 3333.05 samples/sec   Loss 5.3498   LearningRate 0.0840   Epoch: 1   Global Step: 27810   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:09:32,489-Speed 3339.84 samples/sec   Loss 5.3618   LearningRate 0.0840   Epoch: 1   Global Step: 27820   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:09:35,631-Speed 3258.94 samples/sec   Loss 5.3419   LearningRate 0.0840   Epoch: 1   Global Step: 27830   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:09:38,783-Speed 3250.44 samples/sec   Loss 5.3876   LearningRate 0.0840   Epoch: 1   Global Step: 27840   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:09:41,885-Speed 3300.87 samples/sec   Loss 5.4673   LearningRate 0.0840   Epoch: 1   Global Step: 27850   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:09:44,975-Speed 3315.13 samples/sec   Loss 5.3250   LearningRate 0.0840   Epoch: 1   Global Step: 27860   Fp16 Grad Scale: 262144   Required: 32 hours
Training: 2022-04-11 02:09:48,041-Speed 3341.25 samples/sec   Loss 5.4054   LearningRate 0.0840   Epoch: 1   Global Step: 27870   Fp16 Grad Scale: 262144   Required: 32 hours
Training: 2022-04-11 02:09:51,107-Speed 3340.78 samples/sec   Loss 5.3638   LearningRate 0.0840   Epoch: 1   Global Step: 27880   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:09:54,211-Speed 3298.74 samples/sec   Loss 5.3899   LearningRate 0.0840   Epoch: 1   Global Step: 27890   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:09:57,286-Speed 3331.70 samples/sec   Loss 5.3341   LearningRate 0.0840   Epoch: 1   Global Step: 27900   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:10:00,378-Speed 3312.77 samples/sec   Loss 5.3313   LearningRate 0.0840   Epoch: 1   Global Step: 27910   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:10:03,448-Speed 3335.20 samples/sec   Loss 5.3457   LearningRate 0.0840   Epoch: 1   Global Step: 27920   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:10:06,528-Speed 3326.00 samples/sec   Loss 5.4076   LearningRate 0.0840   Epoch: 1   Global Step: 27930   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:10:09,595-Speed 3339.04 samples/sec   Loss 5.3232   LearningRate 0.0840   Epoch: 1   Global Step: 27940   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:10:12,727-Speed 3270.76 samples/sec   Loss 5.4305   LearningRate 0.0840   Epoch: 1   Global Step: 27950   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:10:15,919-Speed 3208.54 samples/sec   Loss 5.3579   LearningRate 0.0839   Epoch: 1   Global Step: 27960   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:10:18,997-Speed 3328.48 samples/sec   Loss 5.4168   LearningRate 0.0839   Epoch: 1   Global Step: 27970   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:10:22,094-Speed 3306.13 samples/sec   Loss 5.3600   LearningRate 0.0839   Epoch: 1   Global Step: 27980   Fp16 Grad Scale: 262144   Required: 32 hours
Training: 2022-04-11 02:10:25,187-Speed 3312.05 samples/sec   Loss 5.4185   LearningRate 0.0839   Epoch: 1   Global Step: 27990   Fp16 Grad Scale: 262144   Required: 32 hours
Training: 2022-04-11 02:10:28,303-Speed 3286.63 samples/sec   Loss 5.2885   LearningRate 0.0839   Epoch: 1   Global Step: 28000   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:11:12,150-[lfw][28000]XNorm: 22.855926
Training: 2022-04-11 02:11:12,151-[lfw][28000]Accuracy-Flip: 0.99733+-0.00291
Training: 2022-04-11 02:11:12,151-[lfw][28000]Accuracy-Highest: 0.99767
Training: 2022-04-11 02:12:03,091-[cfp_fp][28000]XNorm: 21.482712
Training: 2022-04-11 02:12:03,092-[cfp_fp][28000]Accuracy-Flip: 0.97671+-0.00759
Training: 2022-04-11 02:12:03,092-[cfp_fp][28000]Accuracy-Highest: 0.97671
Training: 2022-04-11 02:12:46,948-[agedb_30][28000]XNorm: 22.822563
Training: 2022-04-11 02:12:46,949-[agedb_30][28000]Accuracy-Flip: 0.97417+-0.00716
Training: 2022-04-11 02:12:46,949-[agedb_30][28000]Accuracy-Highest: 0.97417
Training: 2022-04-11 02:12:50,023-Speed 72.26 samples/sec   Loss 5.3901   LearningRate 0.0839   Epoch: 1   Global Step: 28010   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:12:53,083-Speed 3347.05 samples/sec   Loss 5.3345   LearningRate 0.0839   Epoch: 1   Global Step: 28020   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:12:56,136-Speed 3354.63 samples/sec   Loss 5.3346   LearningRate 0.0839   Epoch: 1   Global Step: 28030   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:12:59,246-Speed 3293.12 samples/sec   Loss 5.3233   LearningRate 0.0839   Epoch: 1   Global Step: 28040   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:13:02,295-Speed 3358.69 samples/sec   Loss 5.3161   LearningRate 0.0839   Epoch: 1   Global Step: 28050   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:13:05,348-Speed 3355.64 samples/sec   Loss 5.3185   LearningRate 0.0839   Epoch: 1   Global Step: 28060   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:13:08,420-Speed 3333.59 samples/sec   Loss 5.2913   LearningRate 0.0839   Epoch: 1   Global Step: 28070   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:13:11,477-Speed 3350.39 samples/sec   Loss 5.3205   LearningRate 0.0839   Epoch: 1   Global Step: 28080   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:13:14,578-Speed 3303.17 samples/sec   Loss 5.2954   LearningRate 0.0839   Epoch: 1   Global Step: 28090   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:13:17,668-Speed 3314.86 samples/sec   Loss 5.4195   LearningRate 0.0839   Epoch: 1   Global Step: 28100   Fp16 Grad Scale: 262144   Required: 32 hours
Training: 2022-04-11 02:13:20,732-Speed 3342.69 samples/sec   Loss 5.2844   LearningRate 0.0839   Epoch: 1   Global Step: 28110   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:13:23,876-Speed 3258.07 samples/sec   Loss 5.3035   LearningRate 0.0839   Epoch: 1   Global Step: 28120   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:13:27,041-Speed 3235.57 samples/sec   Loss 5.3080   LearningRate 0.0839   Epoch: 1   Global Step: 28130   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:13:30,152-Speed 3292.22 samples/sec   Loss 5.4114   LearningRate 0.0839   Epoch: 1   Global Step: 28140   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:13:33,317-Speed 3236.42 samples/sec   Loss 5.1959   LearningRate 0.0838   Epoch: 1   Global Step: 28150   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:13:36,464-Speed 3254.82 samples/sec   Loss 5.3272   LearningRate 0.0838   Epoch: 1   Global Step: 28160   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:13:39,568-Speed 3298.72 samples/sec   Loss 5.3357   LearningRate 0.0838   Epoch: 1   Global Step: 28170   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:13:42,626-Speed 3349.75 samples/sec   Loss 5.2743   LearningRate 0.0838   Epoch: 1   Global Step: 28180   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-11 02:13:45,766-Speed 3262.00 samples/sec   Loss 5.2673   LearningRate 0.0838   Epoch: 1   Global Step: 28190   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-11 02:13:48,848-Speed 3323.29 samples/sec   Loss 5.3285   LearningRate 0.0838   Epoch: 1   Global Step: 28200   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-11 02:13:51,940-Speed 3313.24 samples/sec   Loss 5.3149   LearningRate 0.0838   Epoch: 1   Global Step: 28210   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-11 02:13:55,076-Speed 3265.88 samples/sec   Loss 5.3624   LearningRate 0.0838   Epoch: 1   Global Step: 28220   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-11 02:13:58,199-Speed 3279.34 samples/sec   Loss 5.3085   LearningRate 0.0838   Epoch: 1   Global Step: 28230   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-11 02:14:01,268-Speed 3337.22 samples/sec   Loss 5.4765   LearningRate 0.0838   Epoch: 1   Global Step: 28240   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-11 02:14:04,344-Speed 3330.20 samples/sec   Loss 5.3151   LearningRate 0.0838   Epoch: 1   Global Step: 28250   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-11 02:14:07,407-Speed 3343.45 samples/sec   Loss 5.2649   LearningRate 0.0838   Epoch: 1   Global Step: 28260   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-11 02:14:10,475-Speed 3338.08 samples/sec   Loss 5.2692   LearningRate 0.0838   Epoch: 1   Global Step: 28270   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-11 02:14:13,541-Speed 3341.56 samples/sec   Loss 5.3866   LearningRate 0.0838   Epoch: 1   Global Step: 28280   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:14:16,623-Speed 3323.27 samples/sec   Loss 5.3240   LearningRate 0.0838   Epoch: 1   Global Step: 28290   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:14:19,685-Speed 3344.31 samples/sec   Loss 5.4182   LearningRate 0.0838   Epoch: 1   Global Step: 28300   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:14:22,757-Speed 3334.59 samples/sec   Loss 5.4339   LearningRate 0.0838   Epoch: 1   Global Step: 28310   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:14:25,820-Speed 3344.26 samples/sec   Loss 5.4881   LearningRate 0.0838   Epoch: 1   Global Step: 28320   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:14:28,885-Speed 3341.42 samples/sec   Loss 5.2951   LearningRate 0.0837   Epoch: 1   Global Step: 28330   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:14:31,959-Speed 3332.28 samples/sec   Loss 5.2316   LearningRate 0.0837   Epoch: 1   Global Step: 28340   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:14:35,023-Speed 3342.49 samples/sec   Loss 5.2727   LearningRate 0.0837   Epoch: 1   Global Step: 28350   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-11 02:14:38,092-Speed 3337.08 samples/sec   Loss 5.2993   LearningRate 0.0837   Epoch: 1   Global Step: 28360   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-11 02:14:41,158-Speed 3341.10 samples/sec   Loss 5.3297   LearningRate 0.0837   Epoch: 1   Global Step: 28370   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-11 02:14:44,223-Speed 3342.05 samples/sec   Loss 5.2518   LearningRate 0.0837   Epoch: 1   Global Step: 28380   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-11 02:14:47,290-Speed 3338.82 samples/sec   Loss 5.3225   LearningRate 0.0837   Epoch: 1   Global Step: 28390   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-11 02:14:50,353-Speed 3344.44 samples/sec   Loss 5.3164   LearningRate 0.0837   Epoch: 1   Global Step: 28400   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-11 02:14:53,415-Speed 3345.10 samples/sec   Loss 5.3494   LearningRate 0.0837   Epoch: 1   Global Step: 28410   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-11 02:14:56,473-Speed 3349.17 samples/sec   Loss 5.3133   LearningRate 0.0837   Epoch: 1   Global Step: 28420   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-11 02:14:59,530-Speed 3349.91 samples/sec   Loss 5.4226   LearningRate 0.0837   Epoch: 1   Global Step: 28430   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-11 02:15:02,604-Speed 3331.84 samples/sec   Loss 5.2736   LearningRate 0.0837   Epoch: 1   Global Step: 28440   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-11 02:15:05,670-Speed 3341.28 samples/sec   Loss 5.3499   LearningRate 0.0837   Epoch: 1   Global Step: 28450   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:15:08,736-Speed 3340.47 samples/sec   Loss 5.2581   LearningRate 0.0837   Epoch: 1   Global Step: 28460   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:15:11,804-Speed 3338.60 samples/sec   Loss 5.3484   LearningRate 0.0837   Epoch: 1   Global Step: 28470   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:15:14,882-Speed 3327.27 samples/sec   Loss 5.2314   LearningRate 0.0837   Epoch: 1   Global Step: 28480   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:15:17,947-Speed 3341.39 samples/sec   Loss 5.2617   LearningRate 0.0837   Epoch: 1   Global Step: 28490   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:15:21,010-Speed 3344.58 samples/sec   Loss 5.2734   LearningRate 0.0837   Epoch: 1   Global Step: 28500   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:15:24,072-Speed 3345.18 samples/sec   Loss 5.3712   LearningRate 0.0836   Epoch: 1   Global Step: 28510   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:15:27,153-Speed 3324.04 samples/sec   Loss 5.2842   LearningRate 0.0836   Epoch: 1   Global Step: 28520   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:15:30,258-Speed 3298.95 samples/sec   Loss 5.4113   LearningRate 0.0836   Epoch: 1   Global Step: 28530   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:15:33,382-Speed 3278.73 samples/sec   Loss 5.2824   LearningRate 0.0836   Epoch: 1   Global Step: 28540   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:15:36,565-Speed 3217.96 samples/sec   Loss 5.3520   LearningRate 0.0836   Epoch: 1   Global Step: 28550   Fp16 Grad Scale: 262144   Required: 32 hours
Training: 2022-04-11 02:15:39,671-Speed 3297.03 samples/sec   Loss 5.3556   LearningRate 0.0836   Epoch: 1   Global Step: 28560   Fp16 Grad Scale: 262144   Required: 32 hours
Training: 2022-04-11 02:15:42,765-Speed 3311.78 samples/sec   Loss 5.3834   LearningRate 0.0836   Epoch: 1   Global Step: 28570   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:15:45,861-Speed 3308.13 samples/sec   Loss 5.2566   LearningRate 0.0836   Epoch: 1   Global Step: 28580   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:15:48,939-Speed 3326.81 samples/sec   Loss 5.2280   LearningRate 0.0836   Epoch: 1   Global Step: 28590   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:15:52,015-Speed 3329.69 samples/sec   Loss 5.2566   LearningRate 0.0836   Epoch: 1   Global Step: 28600   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:15:55,085-Speed 3336.38 samples/sec   Loss 5.3251   LearningRate 0.0836   Epoch: 1   Global Step: 28610   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:15:58,164-Speed 3326.77 samples/sec   Loss 5.3052   LearningRate 0.0836   Epoch: 1   Global Step: 28620   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:16:01,232-Speed 3338.71 samples/sec   Loss 5.2596   LearningRate 0.0836   Epoch: 1   Global Step: 28630   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:16:04,331-Speed 3305.05 samples/sec   Loss 5.3688   LearningRate 0.0836   Epoch: 1   Global Step: 28640   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:16:07,465-Speed 3268.15 samples/sec   Loss 5.3362   LearningRate 0.0836   Epoch: 1   Global Step: 28650   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:16:10,541-Speed 3330.16 samples/sec   Loss 5.3478   LearningRate 0.0836   Epoch: 1   Global Step: 28660   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:16:13,682-Speed 3260.76 samples/sec   Loss 5.2433   LearningRate 0.0836   Epoch: 1   Global Step: 28670   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:16:16,766-Speed 3320.42 samples/sec   Loss 5.2632   LearningRate 0.0836   Epoch: 1   Global Step: 28680   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:16:19,840-Speed 3332.42 samples/sec   Loss 5.2583   LearningRate 0.0835   Epoch: 1   Global Step: 28690   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:16:22,918-Speed 3326.87 samples/sec   Loss 5.2367   LearningRate 0.0835   Epoch: 1   Global Step: 28700   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:16:26,020-Speed 3302.07 samples/sec   Loss 5.3002   LearningRate 0.0835   Epoch: 1   Global Step: 28710   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:16:29,141-Speed 3281.88 samples/sec   Loss 5.2501   LearningRate 0.0835   Epoch: 1   Global Step: 28720   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:16:32,205-Speed 3343.44 samples/sec   Loss 5.3627   LearningRate 0.0835   Epoch: 1   Global Step: 28730   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:16:35,289-Speed 3321.15 samples/sec   Loss 5.3603   LearningRate 0.0835   Epoch: 1   Global Step: 28740   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:16:38,364-Speed 3330.37 samples/sec   Loss 5.2082   LearningRate 0.0835   Epoch: 1   Global Step: 28750   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:16:41,432-Speed 3338.25 samples/sec   Loss 5.3333   LearningRate 0.0835   Epoch: 1   Global Step: 28760   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:16:44,489-Speed 3351.11 samples/sec   Loss 5.3350   LearningRate 0.0835   Epoch: 1   Global Step: 28770   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:16:47,556-Speed 3339.39 samples/sec   Loss 5.2854   LearningRate 0.0835   Epoch: 1   Global Step: 28780   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:16:50,628-Speed 3333.65 samples/sec   Loss 5.3386   LearningRate 0.0835   Epoch: 1   Global Step: 28790   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:16:53,695-Speed 3339.21 samples/sec   Loss 5.2729   LearningRate 0.0835   Epoch: 1   Global Step: 28800   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:16:56,769-Speed 3333.07 samples/sec   Loss 5.2308   LearningRate 0.0835   Epoch: 1   Global Step: 28810   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:16:59,838-Speed 3337.30 samples/sec   Loss 5.2579   LearningRate 0.0835   Epoch: 1   Global Step: 28820   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:17:02,903-Speed 3341.71 samples/sec   Loss 5.2091   LearningRate 0.0835   Epoch: 1   Global Step: 28830   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-11 02:17:06,050-Speed 3254.11 samples/sec   Loss 5.2290   LearningRate 0.0835   Epoch: 1   Global Step: 28840   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-11 02:17:09,146-Speed 3308.29 samples/sec   Loss 5.2862   LearningRate 0.0835   Epoch: 1   Global Step: 28850   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-11 02:17:12,218-Speed 3334.36 samples/sec   Loss 5.2646   LearningRate 0.0835   Epoch: 1   Global Step: 28860   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-11 02:17:15,309-Speed 3313.18 samples/sec   Loss 5.2854   LearningRate 0.0835   Epoch: 1   Global Step: 28870   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-11 02:17:18,410-Speed 3303.34 samples/sec   Loss 5.3336   LearningRate 0.0834   Epoch: 1   Global Step: 28880   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-11 02:17:21,523-Speed 3290.10 samples/sec   Loss 5.3441   LearningRate 0.0834   Epoch: 1   Global Step: 28890   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-11 02:17:24,589-Speed 3340.31 samples/sec   Loss 5.2837   LearningRate 0.0834   Epoch: 1   Global Step: 28900   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-11 02:17:27,655-Speed 3341.16 samples/sec   Loss 5.2080   LearningRate 0.0834   Epoch: 1   Global Step: 28910   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-11 02:17:30,754-Speed 3304.89 samples/sec   Loss 5.3131   LearningRate 0.0834   Epoch: 1   Global Step: 28920   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-11 02:17:33,820-Speed 3341.13 samples/sec   Loss 5.2296   LearningRate 0.0834   Epoch: 1   Global Step: 28930   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:17:37,023-Speed 3197.05 samples/sec   Loss 5.2339   LearningRate 0.0834   Epoch: 1   Global Step: 28940   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:17:40,125-Speed 3302.15 samples/sec   Loss 5.2597   LearningRate 0.0834   Epoch: 1   Global Step: 28950   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:17:43,204-Speed 3326.74 samples/sec   Loss 5.3034   LearningRate 0.0834   Epoch: 1   Global Step: 28960   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:17:46,282-Speed 3327.26 samples/sec   Loss 5.1275   LearningRate 0.0834   Epoch: 1   Global Step: 28970   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:17:49,376-Speed 3311.38 samples/sec   Loss 5.3435   LearningRate 0.0834   Epoch: 1   Global Step: 28980   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:17:52,443-Speed 3339.18 samples/sec   Loss 5.2387   LearningRate 0.0834   Epoch: 1   Global Step: 28990   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:17:55,510-Speed 3339.63 samples/sec   Loss 5.1625   LearningRate 0.0834   Epoch: 1   Global Step: 29000   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:17:58,566-Speed 3351.18 samples/sec   Loss 5.2927   LearningRate 0.0834   Epoch: 1   Global Step: 29010   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-11 02:18:01,635-Speed 3337.27 samples/sec   Loss 5.1991   LearningRate 0.0834   Epoch: 1   Global Step: 29020   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-11 02:18:04,719-Speed 3321.18 samples/sec   Loss 5.2495   LearningRate 0.0834   Epoch: 1   Global Step: 29030   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-11 02:18:07,784-Speed 3341.30 samples/sec   Loss 5.4085   LearningRate 0.0834   Epoch: 1   Global Step: 29040   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-11 02:18:10,904-Speed 3283.64 samples/sec   Loss 5.2416   LearningRate 0.0834   Epoch: 1   Global Step: 29050   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-11 02:18:14,048-Speed 3257.20 samples/sec   Loss 5.2894   LearningRate 0.0833   Epoch: 1   Global Step: 29060   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-11 02:18:17,181-Speed 3269.53 samples/sec   Loss 5.2080   LearningRate 0.0833   Epoch: 1   Global Step: 29070   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-11 02:18:20,253-Speed 3333.97 samples/sec   Loss 5.3296   LearningRate 0.0833   Epoch: 1   Global Step: 29080   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-11 02:18:23,316-Speed 3343.93 samples/sec   Loss 5.1924   LearningRate 0.0833   Epoch: 1   Global Step: 29090   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-11 02:18:26,382-Speed 3340.73 samples/sec   Loss 5.1959   LearningRate 0.0833   Epoch: 1   Global Step: 29100   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-11 02:18:29,445-Speed 3343.82 samples/sec   Loss 5.2037   LearningRate 0.0833   Epoch: 1   Global Step: 29110   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:18:32,538-Speed 3310.88 samples/sec   Loss 5.2377   LearningRate 0.0833   Epoch: 1   Global Step: 29120   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:18:35,606-Speed 3338.74 samples/sec   Loss 5.1921   LearningRate 0.0833   Epoch: 1   Global Step: 29130   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:18:38,695-Speed 3316.35 samples/sec   Loss 5.2865   LearningRate 0.0833   Epoch: 1   Global Step: 29140   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:18:41,764-Speed 3336.52 samples/sec   Loss 5.2066   LearningRate 0.0833   Epoch: 1   Global Step: 29150   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-11 02:18:44,853-Speed 3316.23 samples/sec   Loss 5.2079   LearningRate 0.0833   Epoch: 1   Global Step: 29160   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-11 02:18:47,926-Speed 3333.24 samples/sec   Loss 5.2542   LearningRate 0.0833   Epoch: 1   Global Step: 29170   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-11 02:18:50,997-Speed 3334.57 samples/sec   Loss 5.2380   LearningRate 0.0833   Epoch: 1   Global Step: 29180   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-11 02:18:54,077-Speed 3325.70 samples/sec   Loss 5.2493   LearningRate 0.0833   Epoch: 1   Global Step: 29190   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-11 02:18:57,154-Speed 3328.46 samples/sec   Loss 5.2634   LearningRate 0.0833   Epoch: 1   Global Step: 29200   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-11 02:19:00,231-Speed 3329.06 samples/sec   Loss 5.2364   LearningRate 0.0833   Epoch: 1   Global Step: 29210   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-11 02:19:03,309-Speed 3326.97 samples/sec   Loss 5.1900   LearningRate 0.0833   Epoch: 1   Global Step: 29220   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-11 02:19:06,391-Speed 3324.07 samples/sec   Loss 5.2135   LearningRate 0.0833   Epoch: 1   Global Step: 29230   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-11 02:19:09,490-Speed 3304.68 samples/sec   Loss 5.2786   LearningRate 0.0832   Epoch: 1   Global Step: 29240   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-11 02:19:12,568-Speed 3327.69 samples/sec   Loss 5.2865   LearningRate 0.0832   Epoch: 1   Global Step: 29250   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:19:15,747-Speed 3222.29 samples/sec   Loss 5.2394   LearningRate 0.0832   Epoch: 1   Global Step: 29260   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:19:18,931-Speed 3216.84 samples/sec   Loss 5.3022   LearningRate 0.0832   Epoch: 1   Global Step: 29270   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:19:22,000-Speed 3336.80 samples/sec   Loss 5.2274   LearningRate 0.0832   Epoch: 1   Global Step: 29280   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:19:25,089-Speed 3316.09 samples/sec   Loss 5.2419   LearningRate 0.0832   Epoch: 1   Global Step: 29290   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:19:28,177-Speed 3316.45 samples/sec   Loss 5.3528   LearningRate 0.0832   Epoch: 1   Global Step: 29300   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:19:31,256-Speed 3326.86 samples/sec   Loss 5.2997   LearningRate 0.0832   Epoch: 1   Global Step: 29310   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:19:34,369-Speed 3289.95 samples/sec   Loss 5.2682   LearningRate 0.0832   Epoch: 1   Global Step: 29320   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:19:37,495-Speed 3276.73 samples/sec   Loss 5.2644   LearningRate 0.0832   Epoch: 1   Global Step: 29330   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:19:40,625-Speed 3272.43 samples/sec   Loss 5.1481   LearningRate 0.0832   Epoch: 1   Global Step: 29340   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:19:43,740-Speed 3288.73 samples/sec   Loss 5.2839   LearningRate 0.0832   Epoch: 1   Global Step: 29350   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:19:46,817-Speed 3328.05 samples/sec   Loss 5.2335   LearningRate 0.0832   Epoch: 1   Global Step: 29360   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:19:49,886-Speed 3337.11 samples/sec   Loss 5.3103   LearningRate 0.0832   Epoch: 1   Global Step: 29370   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:19:52,956-Speed 3337.08 samples/sec   Loss 5.1968   LearningRate 0.0832   Epoch: 1   Global Step: 29380   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:19:56,021-Speed 3341.58 samples/sec   Loss 5.2143   LearningRate 0.0832   Epoch: 1   Global Step: 29390   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:19:59,097-Speed 3329.08 samples/sec   Loss 5.2092   LearningRate 0.0832   Epoch: 1   Global Step: 29400   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:20:02,167-Speed 3336.99 samples/sec   Loss 5.2581   LearningRate 0.0832   Epoch: 1   Global Step: 29410   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:20:05,254-Speed 3317.31 samples/sec   Loss 5.1026   LearningRate 0.0832   Epoch: 1   Global Step: 29420   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:20:08,326-Speed 3334.58 samples/sec   Loss 5.1330   LearningRate 0.0831   Epoch: 1   Global Step: 29430   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:20:11,394-Speed 3339.02 samples/sec   Loss 5.2274   LearningRate 0.0831   Epoch: 1   Global Step: 29440   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:20:14,462-Speed 3337.72 samples/sec   Loss 5.1561   LearningRate 0.0831   Epoch: 1   Global Step: 29450   Fp16 Grad Scale: 262144   Required: 32 hours
Training: 2022-04-11 02:20:17,533-Speed 3335.65 samples/sec   Loss 5.2316   LearningRate 0.0831   Epoch: 1   Global Step: 29460   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:20:20,608-Speed 3330.59 samples/sec   Loss 5.2368   LearningRate 0.0831   Epoch: 1   Global Step: 29470   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:20:23,686-Speed 3327.78 samples/sec   Loss 5.2552   LearningRate 0.0831   Epoch: 1   Global Step: 29480   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:20:26,762-Speed 3329.65 samples/sec   Loss 5.3296   LearningRate 0.0831   Epoch: 1   Global Step: 29490   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:20:29,856-Speed 3310.68 samples/sec   Loss 5.2398   LearningRate 0.0831   Epoch: 1   Global Step: 29500   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:20:33,049-Speed 3206.91 samples/sec   Loss 5.2012   LearningRate 0.0831   Epoch: 1   Global Step: 29510   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-11 02:20:36,138-Speed 3317.18 samples/sec   Loss 5.2115   LearningRate 0.0831   Epoch: 1   Global Step: 29520   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-11 02:20:39,202-Speed 3342.24 samples/sec   Loss 5.2113   LearningRate 0.0831   Epoch: 1   Global Step: 29530   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-11 02:20:42,279-Speed 3329.04 samples/sec   Loss 5.1810   LearningRate 0.0831   Epoch: 1   Global Step: 29540   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-11 02:20:45,415-Speed 3266.18 samples/sec   Loss 5.2571   LearningRate 0.0831   Epoch: 1   Global Step: 29550   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-11 02:20:48,486-Speed 3334.89 samples/sec   Loss 5.1673   LearningRate 0.0831   Epoch: 1   Global Step: 29560   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-11 02:20:51,554-Speed 3338.10 samples/sec   Loss 5.3373   LearningRate 0.0831   Epoch: 1   Global Step: 29570   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-11 02:20:54,635-Speed 3324.77 samples/sec   Loss 5.1862   LearningRate 0.0831   Epoch: 1   Global Step: 29580   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-11 02:20:57,813-Speed 3222.16 samples/sec   Loss 5.2127   LearningRate 0.0831   Epoch: 1   Global Step: 29590   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-11 02:21:00,892-Speed 3326.95 samples/sec   Loss 5.1138   LearningRate 0.0831   Epoch: 1   Global Step: 29600   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-11 02:21:04,000-Speed 3295.39 samples/sec   Loss 5.1779   LearningRate 0.0830   Epoch: 1   Global Step: 29610   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:21:07,079-Speed 3326.67 samples/sec   Loss 5.3241   LearningRate 0.0830   Epoch: 1   Global Step: 29620   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:21:10,264-Speed 3215.67 samples/sec   Loss 5.2279   LearningRate 0.0830   Epoch: 1   Global Step: 29630   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:21:13,329-Speed 3341.71 samples/sec   Loss 5.2938   LearningRate 0.0830   Epoch: 1   Global Step: 29640   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:21:16,415-Speed 3318.67 samples/sec   Loss 5.1622   LearningRate 0.0830   Epoch: 1   Global Step: 29650   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:21:19,484-Speed 3338.31 samples/sec   Loss 5.2281   LearningRate 0.0830   Epoch: 1   Global Step: 29660   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:21:22,550-Speed 3340.43 samples/sec   Loss 5.1306   LearningRate 0.0830   Epoch: 1   Global Step: 29670   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-11 02:21:25,639-Speed 3315.02 samples/sec   Loss 5.1526   LearningRate 0.0830   Epoch: 1   Global Step: 29680   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-11 02:21:28,800-Speed 3240.09 samples/sec   Loss 5.1825   LearningRate 0.0830   Epoch: 1   Global Step: 29690   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-11 02:21:31,897-Speed 3308.20 samples/sec   Loss 5.2266   LearningRate 0.0830   Epoch: 1   Global Step: 29700   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-11 02:21:35,022-Speed 3277.60 samples/sec   Loss 5.1604   LearningRate 0.0830   Epoch: 1   Global Step: 29710   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-11 02:21:38,101-Speed 3326.12 samples/sec   Loss 5.1909   LearningRate 0.0830   Epoch: 1   Global Step: 29720   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-11 02:21:41,167-Speed 3341.11 samples/sec   Loss 5.2549   LearningRate 0.0830   Epoch: 1   Global Step: 29730   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-11 02:21:44,239-Speed 3333.85 samples/sec   Loss 5.1409   LearningRate 0.0830   Epoch: 1   Global Step: 29740   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-11 02:21:47,407-Speed 3232.62 samples/sec   Loss 5.1462   LearningRate 0.0830   Epoch: 1   Global Step: 29750   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-11 02:21:50,515-Speed 3295.57 samples/sec   Loss 5.1246   LearningRate 0.0830   Epoch: 1   Global Step: 29760   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-11 02:21:53,599-Speed 3321.11 samples/sec   Loss 5.2342   LearningRate 0.0830   Epoch: 1   Global Step: 29770   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:21:56,666-Speed 3340.18 samples/sec   Loss 5.1870   LearningRate 0.0830   Epoch: 1   Global Step: 29780   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:21:59,866-Speed 3200.32 samples/sec   Loss 5.2969   LearningRate 0.0829   Epoch: 1   Global Step: 29790   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:22:02,970-Speed 3299.68 samples/sec   Loss 5.1890   LearningRate 0.0829   Epoch: 1   Global Step: 29800   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:22:06,047-Speed 3329.27 samples/sec   Loss 5.1879   LearningRate 0.0829   Epoch: 1   Global Step: 29810   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:22:09,118-Speed 3334.53 samples/sec   Loss 5.1766   LearningRate 0.0829   Epoch: 1   Global Step: 29820   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:22:12,188-Speed 3336.07 samples/sec   Loss 5.1928   LearningRate 0.0829   Epoch: 1   Global Step: 29830   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:22:15,253-Speed 3342.55 samples/sec   Loss 5.1318   LearningRate 0.0829   Epoch: 1   Global Step: 29840   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:22:18,324-Speed 3334.34 samples/sec   Loss 5.2338   LearningRate 0.0829   Epoch: 1   Global Step: 29850   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:22:21,411-Speed 3317.73 samples/sec   Loss 5.1273   LearningRate 0.0829   Epoch: 1   Global Step: 29860   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:22:24,503-Speed 3313.52 samples/sec   Loss 5.2877   LearningRate 0.0829   Epoch: 1   Global Step: 29870   Fp16 Grad Scale: 262144   Required: 32 hours
Training: 2022-04-11 02:22:27,549-Speed 3362.59 samples/sec   Loss 5.1744   LearningRate 0.0829   Epoch: 1   Global Step: 29880   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:22:30,619-Speed 3336.03 samples/sec   Loss 5.1970   LearningRate 0.0829   Epoch: 1   Global Step: 29890   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:22:33,685-Speed 3340.47 samples/sec   Loss 5.0557   LearningRate 0.0829   Epoch: 1   Global Step: 29900   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:22:36,753-Speed 3338.48 samples/sec   Loss 5.1958   LearningRate 0.0829   Epoch: 1   Global Step: 29910   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:22:39,814-Speed 3345.49 samples/sec   Loss 5.1866   LearningRate 0.0829   Epoch: 1   Global Step: 29920   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-11 02:22:42,878-Speed 3343.15 samples/sec   Loss 5.1171   LearningRate 0.0829   Epoch: 1   Global Step: 29930   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-11 02:22:46,015-Speed 3265.00 samples/sec   Loss 5.1974   LearningRate 0.0829   Epoch: 1   Global Step: 29940   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-11 02:22:49,205-Speed 3210.99 samples/sec   Loss 5.1808   LearningRate 0.0829   Epoch: 1   Global Step: 29950   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-11 02:22:52,270-Speed 3341.39 samples/sec   Loss 5.1865   LearningRate 0.0829   Epoch: 1   Global Step: 29960   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-11 02:22:55,351-Speed 3324.31 samples/sec   Loss 5.2387   LearningRate 0.0829   Epoch: 1   Global Step: 29970   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-11 02:22:58,422-Speed 3336.06 samples/sec   Loss 5.1582   LearningRate 0.0828   Epoch: 1   Global Step: 29980   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-11 02:23:01,492-Speed 3335.72 samples/sec   Loss 5.1943   LearningRate 0.0828   Epoch: 1   Global Step: 29990   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-11 02:23:04,558-Speed 3340.93 samples/sec   Loss 5.3689   LearningRate 0.0828   Epoch: 1   Global Step: 30000   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-11 02:23:48,231-[lfw][30000]XNorm: 22.215762
Training: 2022-04-11 02:23:48,232-[lfw][30000]Accuracy-Flip: 0.99583+-0.00300
Training: 2022-04-11 02:23:48,232-[lfw][30000]Accuracy-Highest: 0.99767
Training: 2022-04-11 02:24:39,010-[cfp_fp][30000]XNorm: 20.173684
Training: 2022-04-11 02:24:39,010-[cfp_fp][30000]Accuracy-Flip: 0.97314+-0.00645
Training: 2022-04-11 02:24:39,011-[cfp_fp][30000]Accuracy-Highest: 0.97671
Training: 2022-04-11 02:25:22,962-[agedb_30][30000]XNorm: 22.465723
Training: 2022-04-11 02:25:22,963-[agedb_30][30000]Accuracy-Flip: 0.97500+-0.00810
Training: 2022-04-11 02:25:22,963-[agedb_30][30000]Accuracy-Highest: 0.97500
Training: 2022-04-11 02:25:26,022-Speed 72.39 samples/sec   Loss 5.2182   LearningRate 0.0828   Epoch: 1   Global Step: 30010   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-11 02:25:29,074-Speed 3355.85 samples/sec   Loss 5.1720   LearningRate 0.0828   Epoch: 1   Global Step: 30020   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:25:32,134-Speed 3347.52 samples/sec   Loss 5.1634   LearningRate 0.0828   Epoch: 1   Global Step: 30030   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:25:35,243-Speed 3294.45 samples/sec   Loss 5.1065   LearningRate 0.0828   Epoch: 1   Global Step: 30040   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:25:38,332-Speed 3315.91 samples/sec   Loss 5.0813   LearningRate 0.0828   Epoch: 1   Global Step: 30050   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:25:41,402-Speed 3335.92 samples/sec   Loss 5.2443   LearningRate 0.0828   Epoch: 1   Global Step: 30060   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-11 02:25:44,492-Speed 3313.97 samples/sec   Loss 5.1661   LearningRate 0.0828   Epoch: 1   Global Step: 30070   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-11 02:25:47,603-Speed 3292.82 samples/sec   Loss 5.1465   LearningRate 0.0828   Epoch: 1   Global Step: 30080   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-11 02:25:50,679-Speed 3329.95 samples/sec   Loss 5.1404   LearningRate 0.0828   Epoch: 1   Global Step: 30090   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-11 02:25:53,791-Speed 3291.32 samples/sec   Loss 5.1625   LearningRate 0.0828   Epoch: 1   Global Step: 30100   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-11 02:25:56,853-Speed 3345.30 samples/sec   Loss 5.1695   LearningRate 0.0828   Epoch: 1   Global Step: 30110   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-11 02:25:59,927-Speed 3331.63 samples/sec   Loss 5.1125   LearningRate 0.0828   Epoch: 1   Global Step: 30120   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-11 02:26:02,985-Speed 3349.83 samples/sec   Loss 5.2045   LearningRate 0.0828   Epoch: 1   Global Step: 30130   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-11 02:26:06,111-Speed 3276.03 samples/sec   Loss 5.2669   LearningRate 0.0828   Epoch: 1   Global Step: 30140   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-11 02:26:09,181-Speed 3336.27 samples/sec   Loss 5.1920   LearningRate 0.0828   Epoch: 1   Global Step: 30150   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-11 02:26:12,246-Speed 3341.84 samples/sec   Loss 5.1144   LearningRate 0.0827   Epoch: 1   Global Step: 30160   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:26:15,318-Speed 3334.49 samples/sec   Loss 5.1697   LearningRate 0.0827   Epoch: 1   Global Step: 30170   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:26:18,397-Speed 3326.64 samples/sec   Loss 5.2431   LearningRate 0.0827   Epoch: 1   Global Step: 30180   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-11 02:26:21,458-Speed 3345.73 samples/sec   Loss 5.1078   LearningRate 0.0827   Epoch: 1   Global Step: 30190   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-11 02:26:24,535-Speed 3328.18 samples/sec   Loss 5.0756   LearningRate 0.0827   Epoch: 1   Global Step: 30200   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-11 02:26:27,628-Speed 3312.18 samples/sec   Loss 5.1669   LearningRate 0.0827   Epoch: 1   Global Step: 30210   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-11 02:26:30,749-Speed 3281.26 samples/sec   Loss 5.1395   LearningRate 0.0827   Epoch: 1   Global Step: 30220   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-11 02:26:33,925-Speed 3224.71 samples/sec   Loss 5.0998   LearningRate 0.0827   Epoch: 1   Global Step: 30230   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-11 02:26:37,039-Speed 3289.66 samples/sec   Loss 5.1207   LearningRate 0.0827   Epoch: 1   Global Step: 30240   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-11 02:26:40,111-Speed 3333.83 samples/sec   Loss 5.2378   LearningRate 0.0827   Epoch: 1   Global Step: 30250   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-11 02:26:43,184-Speed 3333.36 samples/sec   Loss 5.2616   LearningRate 0.0827   Epoch: 1   Global Step: 30260   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-11 02:26:46,252-Speed 3338.25 samples/sec   Loss 5.1351   LearningRate 0.0827   Epoch: 1   Global Step: 30270   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-11 02:26:49,324-Speed 3334.85 samples/sec   Loss 5.1540   LearningRate 0.0827   Epoch: 1   Global Step: 30280   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:26:52,416-Speed 3312.18 samples/sec   Loss 5.2188   LearningRate 0.0827   Epoch: 1   Global Step: 30290   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:26:55,569-Speed 3247.71 samples/sec   Loss 5.1394   LearningRate 0.0827   Epoch: 1   Global Step: 30300   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:26:58,636-Speed 3340.13 samples/sec   Loss 5.1384   LearningRate 0.0827   Epoch: 1   Global Step: 30310   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:27:01,707-Speed 3335.58 samples/sec   Loss 5.1190   LearningRate 0.0827   Epoch: 1   Global Step: 30320   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:27:04,783-Speed 3329.17 samples/sec   Loss 5.0784   LearningRate 0.0827   Epoch: 1   Global Step: 30330   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:27:07,862-Speed 3327.23 samples/sec   Loss 5.2429   LearningRate 0.0826   Epoch: 1   Global Step: 30340   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:27:10,965-Speed 3300.67 samples/sec   Loss 5.2186   LearningRate 0.0826   Epoch: 1   Global Step: 30350   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:27:14,034-Speed 3337.28 samples/sec   Loss 5.1281   LearningRate 0.0826   Epoch: 1   Global Step: 30360   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:27:17,132-Speed 3305.58 samples/sec   Loss 5.1222   LearningRate 0.0826   Epoch: 1   Global Step: 30370   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:27:20,238-Speed 3297.85 samples/sec   Loss 5.0513   LearningRate 0.0826   Epoch: 1   Global Step: 30380   Fp16 Grad Scale: 262144   Required: 32 hours
Training: 2022-04-11 02:27:23,307-Speed 3337.78 samples/sec   Loss 5.0773   LearningRate 0.0826   Epoch: 1   Global Step: 30390   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:27:26,420-Speed 3289.42 samples/sec   Loss 5.1396   LearningRate 0.0826   Epoch: 1   Global Step: 30400   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:27:29,503-Speed 3322.68 samples/sec   Loss 5.1630   LearningRate 0.0826   Epoch: 1   Global Step: 30410   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:27:32,706-Speed 3198.15 samples/sec   Loss 5.1127   LearningRate 0.0826   Epoch: 1   Global Step: 30420   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:27:35,756-Speed 3358.35 samples/sec   Loss 5.1055   LearningRate 0.0826   Epoch: 1   Global Step: 30430   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-11 02:27:38,828-Speed 3333.79 samples/sec   Loss 5.1266   LearningRate 0.0826   Epoch: 1   Global Step: 30440   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-11 02:27:41,925-Speed 3307.35 samples/sec   Loss 5.2281   LearningRate 0.0826   Epoch: 1   Global Step: 30450   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-11 02:27:45,154-Speed 3172.37 samples/sec   Loss 5.1141   LearningRate 0.0826   Epoch: 1   Global Step: 30460   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-11 02:27:48,212-Speed 3348.49 samples/sec   Loss 5.2752   LearningRate 0.0826   Epoch: 1   Global Step: 30470   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-11 02:27:51,273-Speed 3346.45 samples/sec   Loss 5.1870   LearningRate 0.0826   Epoch: 1   Global Step: 30480   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-11 02:27:54,341-Speed 3338.71 samples/sec   Loss 5.1566   LearningRate 0.0826   Epoch: 1   Global Step: 30490   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-11 02:27:57,438-Speed 3307.16 samples/sec   Loss 5.0679   LearningRate 0.0826   Epoch: 1   Global Step: 30500   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-11 02:28:00,537-Speed 3305.25 samples/sec   Loss 5.0836   LearningRate 0.0826   Epoch: 1   Global Step: 30510   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-11 02:28:03,620-Speed 3322.01 samples/sec   Loss 5.0755   LearningRate 0.0826   Epoch: 1   Global Step: 30520   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-11 02:28:06,696-Speed 3330.47 samples/sec   Loss 5.1485   LearningRate 0.0825   Epoch: 1   Global Step: 30530   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:28:09,759-Speed 3343.53 samples/sec   Loss 5.1587   LearningRate 0.0825   Epoch: 1   Global Step: 30540   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:28:12,963-Speed 3196.89 samples/sec   Loss 5.0843   LearningRate 0.0825   Epoch: 1   Global Step: 30550   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:28:16,090-Speed 3275.32 samples/sec   Loss 5.1183   LearningRate 0.0825   Epoch: 1   Global Step: 30560   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:28:19,151-Speed 3346.02 samples/sec   Loss 5.1134   LearningRate 0.0825   Epoch: 1   Global Step: 30570   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:28:22,226-Speed 3330.99 samples/sec   Loss 5.2612   LearningRate 0.0825   Epoch: 1   Global Step: 30580   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:28:25,295-Speed 3337.19 samples/sec   Loss 5.0892   LearningRate 0.0825   Epoch: 1   Global Step: 30590   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:28:28,357-Speed 3345.67 samples/sec   Loss 5.1296   LearningRate 0.0825   Epoch: 1   Global Step: 30600   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:28:31,423-Speed 3340.05 samples/sec   Loss 5.1664   LearningRate 0.0825   Epoch: 1   Global Step: 30610   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:28:34,491-Speed 3338.11 samples/sec   Loss 5.0688   LearningRate 0.0825   Epoch: 1   Global Step: 30620   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:28:37,582-Speed 3314.42 samples/sec   Loss 5.0704   LearningRate 0.0825   Epoch: 1   Global Step: 30630   Fp16 Grad Scale: 262144   Required: 32 hours
Training: 2022-04-11 02:28:40,686-Speed 3299.39 samples/sec   Loss 5.0434   LearningRate 0.0825   Epoch: 1   Global Step: 30640   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:28:43,775-Speed 3315.61 samples/sec   Loss 5.0734   LearningRate 0.0825   Epoch: 1   Global Step: 30650   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:28:46,853-Speed 3327.02 samples/sec   Loss 5.0949   LearningRate 0.0825   Epoch: 1   Global Step: 30660   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:28:49,933-Speed 3326.30 samples/sec   Loss 5.1570   LearningRate 0.0825   Epoch: 1   Global Step: 30670   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:28:53,001-Speed 3338.05 samples/sec   Loss 5.1125   LearningRate 0.0825   Epoch: 1   Global Step: 30680   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:28:56,066-Speed 3342.63 samples/sec   Loss 5.1121   LearningRate 0.0825   Epoch: 1   Global Step: 30690   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:28:59,140-Speed 3331.31 samples/sec   Loss 5.1259   LearningRate 0.0825   Epoch: 1   Global Step: 30700   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:29:02,299-Speed 3242.20 samples/sec   Loss 5.0742   LearningRate 0.0824   Epoch: 1   Global Step: 30710   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:29:05,394-Speed 3310.06 samples/sec   Loss 5.0970   LearningRate 0.0824   Epoch: 1   Global Step: 30720   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:29:08,490-Speed 3308.39 samples/sec   Loss 5.1051   LearningRate 0.0824   Epoch: 1   Global Step: 30730   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:29:11,565-Speed 3330.35 samples/sec   Loss 5.1629   LearningRate 0.0824   Epoch: 1   Global Step: 30740   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:29:14,627-Speed 3344.82 samples/sec   Loss 5.1789   LearningRate 0.0824   Epoch: 1   Global Step: 30750   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-11 02:29:17,710-Speed 3322.44 samples/sec   Loss 5.1152   LearningRate 0.0824   Epoch: 1   Global Step: 30760   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-11 02:29:20,793-Speed 3321.98 samples/sec   Loss 5.2487   LearningRate 0.0824   Epoch: 1   Global Step: 30770   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-11 02:29:23,868-Speed 3331.17 samples/sec   Loss 5.2305   LearningRate 0.0824   Epoch: 1   Global Step: 30780   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-11 02:29:27,072-Speed 3196.62 samples/sec   Loss 5.1119   LearningRate 0.0824   Epoch: 1   Global Step: 30790   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-11 02:29:30,152-Speed 3326.36 samples/sec   Loss 5.1773   LearningRate 0.0824   Epoch: 1   Global Step: 30800   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-11 02:29:33,284-Speed 3269.65 samples/sec   Loss 5.1996   LearningRate 0.0824   Epoch: 1   Global Step: 30810   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-11 02:29:36,346-Speed 3344.60 samples/sec   Loss 5.0261   LearningRate 0.0824   Epoch: 1   Global Step: 30820   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-11 02:29:39,477-Speed 3271.70 samples/sec   Loss 5.0989   LearningRate 0.0824   Epoch: 1   Global Step: 30830   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-11 02:29:42,718-Speed 3160.12 samples/sec   Loss 5.1419   LearningRate 0.0824   Epoch: 1   Global Step: 30840   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-11 02:29:45,795-Speed 3328.87 samples/sec   Loss 5.0422   LearningRate 0.0824   Epoch: 1   Global Step: 30850   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-11 02:29:48,871-Speed 3330.00 samples/sec   Loss 5.1357   LearningRate 0.0824   Epoch: 1   Global Step: 30860   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-11 02:29:51,979-Speed 3295.32 samples/sec   Loss 5.1314   LearningRate 0.0824   Epoch: 1   Global Step: 30870   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-11 02:29:55,050-Speed 3335.05 samples/sec   Loss 5.1311   LearningRate 0.0824   Epoch: 1   Global Step: 30880   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-11 02:29:58,116-Speed 3341.17 samples/sec   Loss 5.0970   LearningRate 0.0823   Epoch: 1   Global Step: 30890   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-11 02:30:01,192-Speed 3329.23 samples/sec   Loss 5.0756   LearningRate 0.0823   Epoch: 1   Global Step: 30900   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-11 02:30:04,268-Speed 3329.55 samples/sec   Loss 5.0552   LearningRate 0.0823   Epoch: 1   Global Step: 30910   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-11 02:30:07,338-Speed 3336.85 samples/sec   Loss 5.1895   LearningRate 0.0823   Epoch: 1   Global Step: 30920   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-11 02:30:10,405-Speed 3338.73 samples/sec   Loss 5.0602   LearningRate 0.0823   Epoch: 1   Global Step: 30930   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-11 02:30:13,471-Speed 3341.63 samples/sec   Loss 5.0831   LearningRate 0.0823   Epoch: 1   Global Step: 30940   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-11 02:30:16,534-Speed 3344.36 samples/sec   Loss 5.1175   LearningRate 0.0823   Epoch: 1   Global Step: 30950   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:30:19,602-Speed 3337.26 samples/sec   Loss 5.0816   LearningRate 0.0823   Epoch: 1   Global Step: 30960   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:30:22,659-Speed 3350.77 samples/sec   Loss 5.1339   LearningRate 0.0823   Epoch: 1   Global Step: 30970   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-11 02:30:25,723-Speed 3343.74 samples/sec   Loss 5.0676   LearningRate 0.0823   Epoch: 1   Global Step: 30980   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-11 02:30:28,786-Speed 3343.91 samples/sec   Loss 5.0615   LearningRate 0.0823   Epoch: 1   Global Step: 30990   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-11 02:30:31,863-Speed 3327.95 samples/sec   Loss 5.1621   LearningRate 0.0823   Epoch: 1   Global Step: 31000   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-11 02:30:34,933-Speed 3336.26 samples/sec   Loss 5.0478   LearningRate 0.0823   Epoch: 1   Global Step: 31010   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-11 02:30:38,003-Speed 3336.73 samples/sec   Loss 5.1791   LearningRate 0.0823   Epoch: 1   Global Step: 31020   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-11 02:30:41,068-Speed 3341.97 samples/sec   Loss 5.0644   LearningRate 0.0823   Epoch: 1   Global Step: 31030   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-11 02:30:44,141-Speed 3332.83 samples/sec   Loss 5.1263   LearningRate 0.0823   Epoch: 1   Global Step: 31040   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-11 02:30:47,229-Speed 3317.10 samples/sec   Loss 5.0407   LearningRate 0.0823   Epoch: 1   Global Step: 31050   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-11 02:30:50,415-Speed 3214.08 samples/sec   Loss 5.0383   LearningRate 0.0823   Epoch: 1   Global Step: 31060   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-11 02:30:53,503-Speed 3316.88 samples/sec   Loss 4.9688   LearningRate 0.0823   Epoch: 1   Global Step: 31070   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:30:56,575-Speed 3334.62 samples/sec   Loss 5.1355   LearningRate 0.0822   Epoch: 1   Global Step: 31080   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:30:59,637-Speed 3344.26 samples/sec   Loss 5.0364   LearningRate 0.0822   Epoch: 1   Global Step: 31090   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:31:02,741-Speed 3300.36 samples/sec   Loss 5.1135   LearningRate 0.0822   Epoch: 1   Global Step: 31100   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:31:05,838-Speed 3307.14 samples/sec   Loss 5.0557   LearningRate 0.0822   Epoch: 1   Global Step: 31110   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:31:08,944-Speed 3297.24 samples/sec   Loss 5.0419   LearningRate 0.0822   Epoch: 1   Global Step: 31120   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:31:12,018-Speed 3332.23 samples/sec   Loss 5.0264   LearningRate 0.0822   Epoch: 1   Global Step: 31130   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:31:15,093-Speed 3331.23 samples/sec   Loss 5.0040   LearningRate 0.0822   Epoch: 1   Global Step: 31140   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:31:18,163-Speed 3336.38 samples/sec   Loss 5.0567   LearningRate 0.0822   Epoch: 1   Global Step: 31150   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:31:21,287-Speed 3277.68 samples/sec   Loss 5.0780   LearningRate 0.0822   Epoch: 1   Global Step: 31160   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:31:24,352-Speed 3342.46 samples/sec   Loss 5.0096   LearningRate 0.0822   Epoch: 1   Global Step: 31170   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:31:27,425-Speed 3333.29 samples/sec   Loss 5.0925   LearningRate 0.0822   Epoch: 1   Global Step: 31180   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:31:30,494-Speed 3336.53 samples/sec   Loss 5.0691   LearningRate 0.0822   Epoch: 1   Global Step: 31190   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:31:33,570-Speed 3330.30 samples/sec   Loss 5.1171   LearningRate 0.0822   Epoch: 1   Global Step: 31200   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:31:36,644-Speed 3332.07 samples/sec   Loss 5.0444   LearningRate 0.0822   Epoch: 1   Global Step: 31210   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:31:39,717-Speed 3332.80 samples/sec   Loss 5.0357   LearningRate 0.0822   Epoch: 1   Global Step: 31220   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:31:42,790-Speed 3333.79 samples/sec   Loss 5.1196   LearningRate 0.0822   Epoch: 1   Global Step: 31230   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-11 02:31:45,880-Speed 3314.40 samples/sec   Loss 5.0894   LearningRate 0.0822   Epoch: 1   Global Step: 31240   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-11 02:31:48,975-Speed 3308.63 samples/sec   Loss 5.2191   LearningRate 0.0822   Epoch: 1   Global Step: 31250   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-11 02:31:52,075-Speed 3303.98 samples/sec   Loss 5.1561   LearningRate 0.0821   Epoch: 1   Global Step: 31260   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-11 02:31:55,153-Speed 3327.88 samples/sec   Loss 5.0959   LearningRate 0.0821   Epoch: 1   Global Step: 31270   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-11 02:31:58,224-Speed 3335.42 samples/sec   Loss 5.0117   LearningRate 0.0821   Epoch: 1   Global Step: 31280   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-11 02:32:01,307-Speed 3321.99 samples/sec   Loss 5.1104   LearningRate 0.0821   Epoch: 1   Global Step: 31290   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-11 02:32:04,394-Speed 3317.84 samples/sec   Loss 5.0668   LearningRate 0.0821   Epoch: 1   Global Step: 31300   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-11 02:32:07,463-Speed 3338.51 samples/sec   Loss 5.1252   LearningRate 0.0821   Epoch: 1   Global Step: 31310   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-11 02:32:10,554-Speed 3313.17 samples/sec   Loss 5.0121   LearningRate 0.0821   Epoch: 1   Global Step: 31320   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-11 02:32:13,646-Speed 3313.52 samples/sec   Loss 5.0031   LearningRate 0.0821   Epoch: 1   Global Step: 31330   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:32:16,812-Speed 3234.81 samples/sec   Loss 5.0239   LearningRate 0.0821   Epoch: 1   Global Step: 31340   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:32:19,967-Speed 3245.62 samples/sec   Loss 5.0849   LearningRate 0.0821   Epoch: 1   Global Step: 31350   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:32:23,144-Speed 3223.65 samples/sec   Loss 4.9610   LearningRate 0.0821   Epoch: 1   Global Step: 31360   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:32:26,256-Speed 3291.59 samples/sec   Loss 4.9877   LearningRate 0.0821   Epoch: 1   Global Step: 31370   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:32:29,342-Speed 3318.92 samples/sec   Loss 5.0842   LearningRate 0.0821   Epoch: 1   Global Step: 31380   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:32:32,508-Speed 3235.12 samples/sec   Loss 5.1361   LearningRate 0.0821   Epoch: 1   Global Step: 31390   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:32:35,573-Speed 3341.99 samples/sec   Loss 5.0410   LearningRate 0.0821   Epoch: 1   Global Step: 31400   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-11 02:32:38,655-Speed 3323.57 samples/sec   Loss 5.0652   LearningRate 0.0821   Epoch: 1   Global Step: 31410   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-11 02:32:41,724-Speed 3337.60 samples/sec   Loss 5.0822   LearningRate 0.0821   Epoch: 1   Global Step: 31420   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-11 02:32:44,818-Speed 3310.29 samples/sec   Loss 5.1267   LearningRate 0.0821   Epoch: 1   Global Step: 31430   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-11 02:32:47,893-Speed 3330.24 samples/sec   Loss 5.0103   LearningRate 0.0821   Epoch: 1   Global Step: 31440   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-11 02:32:50,961-Speed 3338.25 samples/sec   Loss 5.0429   LearningRate 0.0820   Epoch: 1   Global Step: 31450   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-11 02:32:54,031-Speed 3337.28 samples/sec   Loss 5.1667   LearningRate 0.0820   Epoch: 1   Global Step: 31460   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-11 02:32:57,107-Speed 3329.63 samples/sec   Loss 4.9972   LearningRate 0.0820   Epoch: 1   Global Step: 31470   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-11 02:33:00,179-Speed 3334.04 samples/sec   Loss 5.0784   LearningRate 0.0820   Epoch: 1   Global Step: 31480   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-11 02:33:03,265-Speed 3318.46 samples/sec   Loss 5.0704   LearningRate 0.0820   Epoch: 1   Global Step: 31490   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-11 02:33:06,335-Speed 3337.14 samples/sec   Loss 5.0983   LearningRate 0.0820   Epoch: 1   Global Step: 31500   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:33:09,410-Speed 3329.98 samples/sec   Loss 4.9986   LearningRate 0.0820   Epoch: 1   Global Step: 31510   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:33:12,483-Speed 3333.61 samples/sec   Loss 4.9967   LearningRate 0.0820   Epoch: 1   Global Step: 31520   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:33:15,683-Speed 3200.21 samples/sec   Loss 5.0569   LearningRate 0.0820   Epoch: 1   Global Step: 31530   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:33:18,760-Speed 3328.88 samples/sec   Loss 5.0603   LearningRate 0.0820   Epoch: 1   Global Step: 31540   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:33:21,954-Speed 3206.21 samples/sec   Loss 5.0001   LearningRate 0.0820   Epoch: 1   Global Step: 31550   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:33:25,069-Speed 3289.02 samples/sec   Loss 5.0036   LearningRate 0.0820   Epoch: 1   Global Step: 31560   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:33:28,144-Speed 3330.64 samples/sec   Loss 4.9295   LearningRate 0.0820   Epoch: 1   Global Step: 31570   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:33:31,242-Speed 3305.88 samples/sec   Loss 5.0607   LearningRate 0.0820   Epoch: 1   Global Step: 31580   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:33:34,319-Speed 3329.11 samples/sec   Loss 5.1026   LearningRate 0.0820   Epoch: 1   Global Step: 31590   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:33:37,379-Speed 3347.09 samples/sec   Loss 5.1142   LearningRate 0.0820   Epoch: 1   Global Step: 31600   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:33:40,483-Speed 3299.70 samples/sec   Loss 5.0277   LearningRate 0.0820   Epoch: 1   Global Step: 31610   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:33:43,567-Speed 3321.31 samples/sec   Loss 5.0629   LearningRate 0.0820   Epoch: 1   Global Step: 31620   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:33:46,725-Speed 3242.46 samples/sec   Loss 5.0063   LearningRate 0.0819   Epoch: 1   Global Step: 31630   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:33:49,807-Speed 3323.82 samples/sec   Loss 4.9080   LearningRate 0.0819   Epoch: 1   Global Step: 31640   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:33:52,884-Speed 3329.20 samples/sec   Loss 5.1407   LearningRate 0.0819   Epoch: 1   Global Step: 31650   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:33:55,988-Speed 3299.55 samples/sec   Loss 5.0665   LearningRate 0.0819   Epoch: 1   Global Step: 31660   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:33:59,067-Speed 3325.85 samples/sec   Loss 4.9136   LearningRate 0.0819   Epoch: 1   Global Step: 31670   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:34:02,134-Speed 3340.13 samples/sec   Loss 5.0157   LearningRate 0.0819   Epoch: 1   Global Step: 31680   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:34:05,236-Speed 3301.61 samples/sec   Loss 5.1125   LearningRate 0.0819   Epoch: 1   Global Step: 31690   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:34:08,286-Speed 3358.53 samples/sec   Loss 5.0540   LearningRate 0.0819   Epoch: 1   Global Step: 31700   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:34:11,399-Speed 3289.73 samples/sec   Loss 4.9874   LearningRate 0.0819   Epoch: 1   Global Step: 31710   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:34:14,551-Speed 3250.03 samples/sec   Loss 4.9325   LearningRate 0.0819   Epoch: 1   Global Step: 31720   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:34:17,650-Speed 3305.28 samples/sec   Loss 5.1193   LearningRate 0.0819   Epoch: 1   Global Step: 31730   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:34:20,821-Speed 3230.24 samples/sec   Loss 4.9656   LearningRate 0.0819   Epoch: 1   Global Step: 31740   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:34:23,918-Speed 3306.61 samples/sec   Loss 5.1583   LearningRate 0.0819   Epoch: 1   Global Step: 31750   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-11 02:34:26,984-Speed 3341.16 samples/sec   Loss 5.0871   LearningRate 0.0819   Epoch: 1   Global Step: 31760   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-11 02:34:30,081-Speed 3306.66 samples/sec   Loss 5.0492   LearningRate 0.0819   Epoch: 1   Global Step: 31770   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-11 02:34:33,159-Speed 3327.43 samples/sec   Loss 4.9964   LearningRate 0.0819   Epoch: 1   Global Step: 31780   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-11 02:34:36,249-Speed 3315.57 samples/sec   Loss 5.0089   LearningRate 0.0819   Epoch: 1   Global Step: 31790   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-11 02:34:39,327-Speed 3327.13 samples/sec   Loss 4.9839   LearningRate 0.0819   Epoch: 1   Global Step: 31800   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-11 02:34:42,437-Speed 3294.06 samples/sec   Loss 4.9802   LearningRate 0.0818   Epoch: 1   Global Step: 31810   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-11 02:34:45,528-Speed 3313.64 samples/sec   Loss 5.0112   LearningRate 0.0818   Epoch: 1   Global Step: 31820   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-11 02:34:48,600-Speed 3334.40 samples/sec   Loss 4.9860   LearningRate 0.0818   Epoch: 1   Global Step: 31830   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-11 02:34:51,698-Speed 3305.72 samples/sec   Loss 5.0583   LearningRate 0.0818   Epoch: 1   Global Step: 31840   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-11 02:34:54,774-Speed 3329.25 samples/sec   Loss 5.0618   LearningRate 0.0818   Epoch: 1   Global Step: 31850   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:34:57,853-Speed 3326.61 samples/sec   Loss 5.0024   LearningRate 0.0818   Epoch: 1   Global Step: 31860   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:35:00,943-Speed 3314.49 samples/sec   Loss 5.0409   LearningRate 0.0818   Epoch: 1   Global Step: 31870   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:35:04,072-Speed 3273.79 samples/sec   Loss 5.0899   LearningRate 0.0818   Epoch: 1   Global Step: 31880   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:35:07,260-Speed 3212.65 samples/sec   Loss 5.0349   LearningRate 0.0818   Epoch: 1   Global Step: 31890   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:35:10,348-Speed 3317.20 samples/sec   Loss 4.9939   LearningRate 0.0818   Epoch: 1   Global Step: 31900   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:35:13,414-Speed 3341.23 samples/sec   Loss 4.9360   LearningRate 0.0818   Epoch: 1   Global Step: 31910   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:35:16,503-Speed 3315.05 samples/sec   Loss 5.1203   LearningRate 0.0818   Epoch: 1   Global Step: 31920   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:35:19,619-Speed 3286.98 samples/sec   Loss 5.0307   LearningRate 0.0818   Epoch: 1   Global Step: 31930   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:35:22,722-Speed 3301.28 samples/sec   Loss 5.1161   LearningRate 0.0818   Epoch: 1   Global Step: 31940   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:35:25,818-Speed 3307.55 samples/sec   Loss 5.0567   LearningRate 0.0818   Epoch: 1   Global Step: 31950   Fp16 Grad Scale: 262144   Required: 32 hours
Training: 2022-04-11 02:35:28,971-Speed 3248.91 samples/sec   Loss 5.0395   LearningRate 0.0818   Epoch: 1   Global Step: 31960   Fp16 Grad Scale: 262144   Required: 32 hours
Training: 2022-04-11 02:35:32,051-Speed 3325.44 samples/sec   Loss 5.0046   LearningRate 0.0818   Epoch: 1   Global Step: 31970   Fp16 Grad Scale: 262144   Required: 32 hours
Training: 2022-04-11 02:35:35,110-Speed 3347.66 samples/sec   Loss 5.0490   LearningRate 0.0818   Epoch: 1   Global Step: 31980   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:35:38,174-Speed 3343.02 samples/sec   Loss 4.9528   LearningRate 0.0818   Epoch: 1   Global Step: 31990   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:35:41,240-Speed 3341.53 samples/sec   Loss 5.0014   LearningRate 0.0817   Epoch: 1   Global Step: 32000   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:36:25,323-[lfw][32000]XNorm: 24.467341
Training: 2022-04-11 02:36:25,324-[lfw][32000]Accuracy-Flip: 0.99683+-0.00337
Training: 2022-04-11 02:36:25,324-[lfw][32000]Accuracy-Highest: 0.99767
Training: 2022-04-11 02:37:16,601-[cfp_fp][32000]XNorm: 22.552203
Training: 2022-04-11 02:37:16,602-[cfp_fp][32000]Accuracy-Flip: 0.97657+-0.00826
Training: 2022-04-11 02:37:16,602-[cfp_fp][32000]Accuracy-Highest: 0.97671
Training: 2022-04-11 02:38:00,650-[agedb_30][32000]XNorm: 24.442474
Training: 2022-04-11 02:38:00,651-[agedb_30][32000]Accuracy-Flip: 0.97667+-0.00606
Training: 2022-04-11 02:38:00,651-[agedb_30][32000]Accuracy-Highest: 0.97667
Training: 2022-04-11 02:38:03,749-Speed 71.86 samples/sec   Loss 5.0343   LearningRate 0.0817   Epoch: 1   Global Step: 32010   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:38:06,877-Speed 3274.64 samples/sec   Loss 5.0545   LearningRate 0.0817   Epoch: 1   Global Step: 32020   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:38:09,949-Speed 3335.38 samples/sec   Loss 4.9677   LearningRate 0.0817   Epoch: 1   Global Step: 32030   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:38:13,056-Speed 3296.61 samples/sec   Loss 4.9552   LearningRate 0.0817   Epoch: 1   Global Step: 32040   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:38:16,110-Speed 3353.03 samples/sec   Loss 5.0573   LearningRate 0.0817   Epoch: 1   Global Step: 32050   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:38:19,166-Speed 3351.29 samples/sec   Loss 5.0707   LearningRate 0.0817   Epoch: 1   Global Step: 32060   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:38:22,223-Speed 3350.65 samples/sec   Loss 4.9576   LearningRate 0.0817   Epoch: 1   Global Step: 32070   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:38:25,388-Speed 3236.63 samples/sec   Loss 5.0544   LearningRate 0.0817   Epoch: 1   Global Step: 32080   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:38:28,616-Speed 3172.72 samples/sec   Loss 4.9909   LearningRate 0.0817   Epoch: 1   Global Step: 32090   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:38:31,826-Speed 3191.22 samples/sec   Loss 5.1154   LearningRate 0.0817   Epoch: 1   Global Step: 32100   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:38:34,883-Speed 3349.85 samples/sec   Loss 5.0346   LearningRate 0.0817   Epoch: 1   Global Step: 32110   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:38:37,945-Speed 3345.44 samples/sec   Loss 4.9495   LearningRate 0.0817   Epoch: 1   Global Step: 32120   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:38:41,002-Speed 3349.93 samples/sec   Loss 5.0905   LearningRate 0.0817   Epoch: 1   Global Step: 32130   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:38:44,069-Speed 3339.98 samples/sec   Loss 5.0341   LearningRate 0.0817   Epoch: 1   Global Step: 32140   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:38:47,135-Speed 3340.37 samples/sec   Loss 4.9797   LearningRate 0.0817   Epoch: 1   Global Step: 32150   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:38:50,252-Speed 3286.02 samples/sec   Loss 4.9924   LearningRate 0.0817   Epoch: 1   Global Step: 32160   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:38:53,335-Speed 3321.88 samples/sec   Loss 4.9581   LearningRate 0.0817   Epoch: 1   Global Step: 32170   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-11 02:38:56,403-Speed 3338.87 samples/sec   Loss 4.9315   LearningRate 0.0816   Epoch: 1   Global Step: 32180   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-11 02:38:59,469-Speed 3340.64 samples/sec   Loss 4.9671   LearningRate 0.0816   Epoch: 1   Global Step: 32190   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-11 02:39:02,573-Speed 3300.41 samples/sec   Loss 4.9883   LearningRate 0.0816   Epoch: 1   Global Step: 32200   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-11 02:39:05,641-Speed 3338.54 samples/sec   Loss 4.9974   LearningRate 0.0816   Epoch: 1   Global Step: 32210   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-11 02:39:08,706-Speed 3340.68 samples/sec   Loss 5.0536   LearningRate 0.0816   Epoch: 1   Global Step: 32220   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-11 02:39:11,774-Speed 3339.13 samples/sec   Loss 5.0106   LearningRate 0.0816   Epoch: 1   Global Step: 32230   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-11 02:39:14,839-Speed 3341.19 samples/sec   Loss 5.1037   LearningRate 0.0816   Epoch: 1   Global Step: 32240   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-11 02:39:17,916-Speed 3329.25 samples/sec   Loss 4.9838   LearningRate 0.0816   Epoch: 1   Global Step: 32250   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-11 02:39:21,032-Speed 3286.87 samples/sec   Loss 4.9345   LearningRate 0.0816   Epoch: 1   Global Step: 32260   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-11 02:39:24,116-Speed 3321.53 samples/sec   Loss 5.0477   LearningRate 0.0816   Epoch: 1   Global Step: 32270   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:39:27,340-Speed 3177.12 samples/sec   Loss 5.0174   LearningRate 0.0816   Epoch: 1   Global Step: 32280   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:39:30,442-Speed 3301.36 samples/sec   Loss 4.8993   LearningRate 0.0816   Epoch: 1   Global Step: 32290   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-11 02:39:33,539-Speed 3307.38 samples/sec   Loss 5.0202   LearningRate 0.0816   Epoch: 1   Global Step: 32300   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-11 02:39:36,636-Speed 3307.54 samples/sec   Loss 4.9646   LearningRate 0.0816   Epoch: 1   Global Step: 32310   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-11 02:39:39,772-Speed 3265.63 samples/sec   Loss 4.9539   LearningRate 0.0816   Epoch: 1   Global Step: 32320   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-11 02:39:42,930-Speed 3242.71 samples/sec   Loss 4.8836   LearningRate 0.0816   Epoch: 1   Global Step: 32330   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-11 02:39:45,998-Speed 3339.27 samples/sec   Loss 4.9911   LearningRate 0.0816   Epoch: 1   Global Step: 32340   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-11 02:39:49,196-Speed 3202.23 samples/sec   Loss 4.9892   LearningRate 0.0816   Epoch: 1   Global Step: 32350   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-11 02:39:52,311-Speed 3288.81 samples/sec   Loss 5.0143   LearningRate 0.0816   Epoch: 1   Global Step: 32360   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-11 02:39:55,375-Speed 3342.22 samples/sec   Loss 4.9902   LearningRate 0.0815   Epoch: 1   Global Step: 32370   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-11 02:39:58,490-Speed 3288.36 samples/sec   Loss 4.9543   LearningRate 0.0815   Epoch: 1   Global Step: 32380   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-11 02:40:01,687-Speed 3204.12 samples/sec   Loss 5.0665   LearningRate 0.0815   Epoch: 1   Global Step: 32390   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:40:04,765-Speed 3327.73 samples/sec   Loss 4.9228   LearningRate 0.0815   Epoch: 1   Global Step: 32400   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:40:07,830-Speed 3341.69 samples/sec   Loss 4.9201   LearningRate 0.0815   Epoch: 1   Global Step: 32410   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:40:10,878-Speed 3359.73 samples/sec   Loss 4.9325   LearningRate 0.0815   Epoch: 1   Global Step: 32420   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-11 02:40:13,933-Speed 3352.90 samples/sec   Loss 5.0118   LearningRate 0.0815   Epoch: 1   Global Step: 32430   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-11 02:40:17,104-Speed 3230.57 samples/sec   Loss 4.9866   LearningRate 0.0815   Epoch: 1   Global Step: 32440   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-11 02:40:20,160-Speed 3351.76 samples/sec   Loss 5.1149   LearningRate 0.0815   Epoch: 1   Global Step: 32450   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-11 02:40:23,233-Speed 3332.93 samples/sec   Loss 4.9904   LearningRate 0.0815   Epoch: 1   Global Step: 32460   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-11 02:40:26,301-Speed 3338.15 samples/sec   Loss 4.9605   LearningRate 0.0815   Epoch: 1   Global Step: 32470   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-11 02:40:29,365-Speed 3343.23 samples/sec   Loss 5.0803   LearningRate 0.0815   Epoch: 1   Global Step: 32480   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-11 02:40:32,459-Speed 3310.19 samples/sec   Loss 4.9615   LearningRate 0.0815   Epoch: 1   Global Step: 32490   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-11 02:40:35,524-Speed 3341.60 samples/sec   Loss 5.0563   LearningRate 0.0815   Epoch: 1   Global Step: 32500   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-11 02:40:38,581-Speed 3350.46 samples/sec   Loss 4.9488   LearningRate 0.0815   Epoch: 1   Global Step: 32510   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-11 02:40:41,736-Speed 3246.40 samples/sec   Loss 4.9253   LearningRate 0.0815   Epoch: 1   Global Step: 32520   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:40:44,799-Speed 3344.21 samples/sec   Loss 5.0001   LearningRate 0.0815   Epoch: 1   Global Step: 32530   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:40:47,866-Speed 3339.94 samples/sec   Loss 4.9722   LearningRate 0.0815   Epoch: 1   Global Step: 32540   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:40:50,926-Speed 3346.65 samples/sec   Loss 4.9639   LearningRate 0.0814   Epoch: 1   Global Step: 32550   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:40:54,028-Speed 3301.98 samples/sec   Loss 4.8703   LearningRate 0.0814   Epoch: 1   Global Step: 32560   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:40:57,083-Speed 3352.61 samples/sec   Loss 4.9699   LearningRate 0.0814   Epoch: 1   Global Step: 32570   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-11 02:41:00,154-Speed 3336.20 samples/sec   Loss 4.9003   LearningRate 0.0814   Epoch: 1   Global Step: 32580   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-11 02:41:03,229-Speed 3329.91 samples/sec   Loss 5.0211   LearningRate 0.0814   Epoch: 1   Global Step: 32590   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-11 02:41:06,294-Speed 3341.89 samples/sec   Loss 4.9340   LearningRate 0.0814   Epoch: 1   Global Step: 32600   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-11 02:41:09,444-Speed 3251.80 samples/sec   Loss 4.8610   LearningRate 0.0814   Epoch: 1   Global Step: 32610   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-11 02:41:12,521-Speed 3328.59 samples/sec   Loss 4.9541   LearningRate 0.0814   Epoch: 1   Global Step: 32620   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-11 02:41:15,589-Speed 3339.19 samples/sec   Loss 4.8965   LearningRate 0.0814   Epoch: 1   Global Step: 32630   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-11 02:41:18,658-Speed 3337.35 samples/sec   Loss 4.9087   LearningRate 0.0814   Epoch: 1   Global Step: 32640   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-11 02:41:21,726-Speed 3338.79 samples/sec   Loss 4.9411   LearningRate 0.0814   Epoch: 1   Global Step: 32650   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-11 02:41:24,848-Speed 3280.71 samples/sec   Loss 5.0359   LearningRate 0.0814   Epoch: 1   Global Step: 32660   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-11 02:41:27,953-Speed 3297.84 samples/sec   Loss 4.8462   LearningRate 0.0814   Epoch: 1   Global Step: 32670   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:41:31,049-Speed 3308.57 samples/sec   Loss 4.9695   LearningRate 0.0814   Epoch: 1   Global Step: 32680   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:41:34,113-Speed 3342.28 samples/sec   Loss 5.1215   LearningRate 0.0814   Epoch: 1   Global Step: 32690   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:41:37,181-Speed 3339.54 samples/sec   Loss 4.9793   LearningRate 0.0814   Epoch: 1   Global Step: 32700   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:41:40,236-Speed 3352.29 samples/sec   Loss 4.9621   LearningRate 0.0814   Epoch: 1   Global Step: 32710   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:41:43,294-Speed 3349.88 samples/sec   Loss 5.0198   LearningRate 0.0814   Epoch: 1   Global Step: 32720   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:41:46,365-Speed 3334.71 samples/sec   Loss 4.9905   LearningRate 0.0814   Epoch: 1   Global Step: 32730   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:41:49,426-Speed 3346.28 samples/sec   Loss 4.9533   LearningRate 0.0813   Epoch: 1   Global Step: 32740   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:41:52,493-Speed 3340.11 samples/sec   Loss 4.8850   LearningRate 0.0813   Epoch: 1   Global Step: 32750   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-11 02:41:55,565-Speed 3333.79 samples/sec   Loss 4.8644   LearningRate 0.0813   Epoch: 1   Global Step: 32760   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-11 02:41:58,627-Speed 3345.64 samples/sec   Loss 4.9707   LearningRate 0.0813   Epoch: 1   Global Step: 32770   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-11 02:42:01,698-Speed 3334.43 samples/sec   Loss 4.9295   LearningRate 0.0813   Epoch: 1   Global Step: 32780   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-11 02:42:04,774-Speed 3331.01 samples/sec   Loss 4.8698   LearningRate 0.0813   Epoch: 1   Global Step: 32790   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-11 02:42:07,836-Speed 3344.77 samples/sec   Loss 5.0200   LearningRate 0.0813   Epoch: 1   Global Step: 32800   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-11 02:42:10,912-Speed 3329.83 samples/sec   Loss 4.8688   LearningRate 0.0813   Epoch: 1   Global Step: 32810   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-11 02:42:14,003-Speed 3312.91 samples/sec   Loss 4.9368   LearningRate 0.0813   Epoch: 1   Global Step: 32820   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-11 02:42:17,075-Speed 3334.40 samples/sec   Loss 4.9022   LearningRate 0.0813   Epoch: 1   Global Step: 32830   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-11 02:42:20,132-Speed 3350.39 samples/sec   Loss 4.9531   LearningRate 0.0813   Epoch: 1   Global Step: 32840   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-11 02:42:23,270-Speed 3263.69 samples/sec   Loss 4.9474   LearningRate 0.0813   Epoch: 1   Global Step: 32850   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:42:26,352-Speed 3323.39 samples/sec   Loss 4.8666   LearningRate 0.0813   Epoch: 1   Global Step: 32860   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:42:29,465-Speed 3290.05 samples/sec   Loss 4.9268   LearningRate 0.0813   Epoch: 1   Global Step: 32870   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:42:32,605-Speed 3262.88 samples/sec   Loss 5.0394   LearningRate 0.0813   Epoch: 1   Global Step: 32880   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:42:35,660-Speed 3352.08 samples/sec   Loss 4.8992   LearningRate 0.0813   Epoch: 1   Global Step: 32890   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:42:38,717-Speed 3351.37 samples/sec   Loss 4.8997   LearningRate 0.0813   Epoch: 1   Global Step: 32900   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:42:41,774-Speed 3350.43 samples/sec   Loss 4.8363   LearningRate 0.0813   Epoch: 1   Global Step: 32910   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:42:44,868-Speed 3309.93 samples/sec   Loss 4.9493   LearningRate 0.0812   Epoch: 1   Global Step: 32920   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:42:47,969-Speed 3302.52 samples/sec   Loss 4.9328   LearningRate 0.0812   Epoch: 1   Global Step: 32930   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:42:51,104-Speed 3267.23 samples/sec   Loss 4.9125   LearningRate 0.0812   Epoch: 1   Global Step: 32940   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:42:54,217-Speed 3290.09 samples/sec   Loss 4.8852   LearningRate 0.0812   Epoch: 1   Global Step: 32950   Fp16 Grad Scale: 262144   Required: 32 hours
Training: 2022-04-11 02:42:57,285-Speed 3338.13 samples/sec   Loss 4.8638   LearningRate 0.0812   Epoch: 1   Global Step: 32960   Fp16 Grad Scale: 262144   Required: 32 hours
Training: 2022-04-11 02:43:00,345-Speed 3347.97 samples/sec   Loss 4.8831   LearningRate 0.0812   Epoch: 1   Global Step: 32970   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:43:03,403-Speed 3349.96 samples/sec   Loss 4.8797   LearningRate 0.0812   Epoch: 1   Global Step: 32980   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:43:06,466-Speed 3343.19 samples/sec   Loss 4.9928   LearningRate 0.0812   Epoch: 1   Global Step: 32990   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:43:09,522-Speed 3351.73 samples/sec   Loss 4.9148   LearningRate 0.0812   Epoch: 1   Global Step: 33000   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-11 02:43:12,592-Speed 3336.81 samples/sec   Loss 4.9141   LearningRate 0.0812   Epoch: 1   Global Step: 33010   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-11 02:43:15,657-Speed 3341.68 samples/sec   Loss 4.9118   LearningRate 0.0812   Epoch: 1   Global Step: 33020   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-11 02:43:18,718-Speed 3346.07 samples/sec   Loss 4.8624   LearningRate 0.0812   Epoch: 1   Global Step: 33030   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-11 02:43:21,780-Speed 3344.51 samples/sec   Loss 4.9368   LearningRate 0.0812   Epoch: 1   Global Step: 33040   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-11 02:43:24,871-Speed 3313.51 samples/sec   Loss 4.9561   LearningRate 0.0812   Epoch: 1   Global Step: 33050   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-11 02:43:27,930-Speed 3349.08 samples/sec   Loss 4.8489   LearningRate 0.0812   Epoch: 1   Global Step: 33060   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-11 02:43:30,988-Speed 3349.14 samples/sec   Loss 4.8376   LearningRate 0.0812   Epoch: 1   Global Step: 33070   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-11 02:43:34,058-Speed 3336.07 samples/sec   Loss 4.9420   LearningRate 0.0812   Epoch: 1   Global Step: 33080   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-11 02:43:37,136-Speed 3327.93 samples/sec   Loss 4.8586   LearningRate 0.0812   Epoch: 1   Global Step: 33090   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-11 02:43:40,218-Speed 3323.41 samples/sec   Loss 4.8578   LearningRate 0.0812   Epoch: 1   Global Step: 33100   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:43:43,370-Speed 3249.50 samples/sec   Loss 4.9130   LearningRate 0.0811   Epoch: 1   Global Step: 33110   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:43:46,457-Speed 3316.89 samples/sec   Loss 5.0010   LearningRate 0.0811   Epoch: 1   Global Step: 33120   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:43:49,530-Speed 3334.13 samples/sec   Loss 4.8655   LearningRate 0.0811   Epoch: 1   Global Step: 33130   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:43:52,593-Speed 3342.79 samples/sec   Loss 4.8505   LearningRate 0.0811   Epoch: 1   Global Step: 33140   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:43:55,663-Speed 3337.52 samples/sec   Loss 4.8566   LearningRate 0.0811   Epoch: 1   Global Step: 33150   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:43:58,734-Speed 3335.15 samples/sec   Loss 4.9078   LearningRate 0.0811   Epoch: 1   Global Step: 33160   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:44:01,820-Speed 3318.38 samples/sec   Loss 4.8779   LearningRate 0.0811   Epoch: 1   Global Step: 33170   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:44:04,880-Speed 3347.27 samples/sec   Loss 4.9298   LearningRate 0.0811   Epoch: 1   Global Step: 33180   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:44:07,943-Speed 3343.94 samples/sec   Loss 4.9587   LearningRate 0.0811   Epoch: 1   Global Step: 33190   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:44:11,044-Speed 3302.65 samples/sec   Loss 4.9783   LearningRate 0.0811   Epoch: 1   Global Step: 33200   Fp16 Grad Scale: 262144   Required: 32 hours
Training: 2022-04-11 02:44:14,129-Speed 3320.29 samples/sec   Loss 4.9165   LearningRate 0.0811   Epoch: 1   Global Step: 33210   Fp16 Grad Scale: 262144   Required: 32 hours
Training: 2022-04-11 02:44:17,217-Speed 3317.28 samples/sec   Loss 4.9702   LearningRate 0.0811   Epoch: 1   Global Step: 33220   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:44:20,365-Speed 3253.70 samples/sec   Loss 4.9283   LearningRate 0.0811   Epoch: 1   Global Step: 33230   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:44:23,437-Speed 3333.76 samples/sec   Loss 4.9657   LearningRate 0.0811   Epoch: 1   Global Step: 33240   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:44:26,511-Speed 3332.75 samples/sec   Loss 4.8555   LearningRate 0.0811   Epoch: 1   Global Step: 33250   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:44:29,597-Speed 3318.59 samples/sec   Loss 4.9311   LearningRate 0.0811   Epoch: 1   Global Step: 33260   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:44:32,668-Speed 3334.68 samples/sec   Loss 4.8814   LearningRate 0.0811   Epoch: 1   Global Step: 33270   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:44:35,748-Speed 3325.87 samples/sec   Loss 4.9273   LearningRate 0.0811   Epoch: 1   Global Step: 33280   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:44:38,818-Speed 3336.00 samples/sec   Loss 4.9983   LearningRate 0.0810   Epoch: 1   Global Step: 33290   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:44:41,879-Speed 3346.69 samples/sec   Loss 4.9413   LearningRate 0.0810   Epoch: 1   Global Step: 33300   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 02:44:44,948-Speed 3337.54 samples/sec   Loss 4.9371   LearningRate 0.0810   Epoch: 1   Global Step: 33310   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 02:44:48,017-Speed 3337.30 samples/sec   Loss 4.8478   LearningRate 0.0810   Epoch: 1   Global Step: 33320   Fp16 Grad Scale: 262144   Required: 31 hours
Training: 2022-04-11 02:44:51,077-Speed 3346.53 samples/sec   Loss 4.9679   LearningRate 0.0810   Epoch: 1   Global Step: 33330   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 02:44:54,140-Speed 3344.46 samples/sec   Loss 4.8156   LearningRate 0.0810   Epoch: 1   Global Step: 33340   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 02:44:57,202-Speed 3344.77 samples/sec   Loss 4.8842   LearningRate 0.0810   Epoch: 1   Global Step: 33350   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 02:45:00,270-Speed 3338.33 samples/sec   Loss 4.8390   LearningRate 0.0810   Epoch: 1   Global Step: 33360   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 02:45:03,371-Speed 3303.31 samples/sec   Loss 4.8343   LearningRate 0.0810   Epoch: 1   Global Step: 33370   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-11 02:45:06,681-Speed 3094.10 samples/sec   Loss 4.8396   LearningRate 0.0810   Epoch: 1   Global Step: 33380   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-11 02:45:37,670-Speed 330.45 samples/sec   Loss 4.4032   LearningRate 0.0810   Epoch: 2   Global Step: 33390   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-11 02:45:40,811-Speed 3261.48 samples/sec   Loss 4.2495   LearningRate 0.0810   Epoch: 2   Global Step: 33400   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-11 02:45:43,877-Speed 3340.22 samples/sec   Loss 4.2460   LearningRate 0.0810   Epoch: 2   Global Step: 33410   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-11 02:45:46,961-Speed 3321.83 samples/sec   Loss 4.2905   LearningRate 0.0810   Epoch: 2   Global Step: 33420   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-11 02:45:50,077-Speed 3287.99 samples/sec   Loss 4.1882   LearningRate 0.0810   Epoch: 2   Global Step: 33430   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-11 02:45:53,154-Speed 3327.62 samples/sec   Loss 4.2180   LearningRate 0.0810   Epoch: 2   Global Step: 33440   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-11 02:45:56,255-Speed 3303.04 samples/sec   Loss 4.1738   LearningRate 0.0810   Epoch: 2   Global Step: 33450   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-11 02:45:59,319-Speed 3342.97 samples/sec   Loss 4.2781   LearningRate 0.0810   Epoch: 2   Global Step: 33460   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-11 02:46:02,432-Speed 3290.23 samples/sec   Loss 4.2436   LearningRate 0.0810   Epoch: 2   Global Step: 33470   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:46:05,505-Speed 3333.29 samples/sec   Loss 4.2525   LearningRate 0.0809   Epoch: 2   Global Step: 33480   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:46:08,669-Speed 3237.80 samples/sec   Loss 4.3760   LearningRate 0.0809   Epoch: 2   Global Step: 33490   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:46:12,400-Speed 2745.05 samples/sec   Loss 4.2579   LearningRate 0.0809   Epoch: 2   Global Step: 33500   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:46:15,538-Speed 3264.62 samples/sec   Loss 4.2638   LearningRate 0.0809   Epoch: 2   Global Step: 33510   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:46:18,824-Speed 3116.73 samples/sec   Loss 4.3011   LearningRate 0.0809   Epoch: 2   Global Step: 33520   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:46:21,894-Speed 3335.57 samples/sec   Loss 4.3017   LearningRate 0.0809   Epoch: 2   Global Step: 33530   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:46:24,975-Speed 3325.00 samples/sec   Loss 4.2541   LearningRate 0.0809   Epoch: 2   Global Step: 33540   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:46:28,040-Speed 3342.33 samples/sec   Loss 4.2005   LearningRate 0.0809   Epoch: 2   Global Step: 33550   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:46:31,128-Speed 3316.35 samples/sec   Loss 4.2912   LearningRate 0.0809   Epoch: 2   Global Step: 33560   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 02:46:34,205-Speed 3328.34 samples/sec   Loss 4.2828   LearningRate 0.0809   Epoch: 2   Global Step: 33570   Fp16 Grad Scale: 262144   Required: 31 hours
Training: 2022-04-11 02:46:37,270-Speed 3342.54 samples/sec   Loss 4.3513   LearningRate 0.0809   Epoch: 2   Global Step: 33580   Fp16 Grad Scale: 262144   Required: 31 hours
Training: 2022-04-11 02:46:40,326-Speed 3352.86 samples/sec   Loss 4.2431   LearningRate 0.0809   Epoch: 2   Global Step: 33590   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 02:46:43,421-Speed 3308.62 samples/sec   Loss 4.2333   LearningRate 0.0809   Epoch: 2   Global Step: 33600   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-11 02:46:46,545-Speed 3278.99 samples/sec   Loss 4.2629   LearningRate 0.0809   Epoch: 2   Global Step: 33610   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-11 02:46:49,664-Speed 3283.34 samples/sec   Loss 4.2897   LearningRate 0.0809   Epoch: 2   Global Step: 33620   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-11 02:46:52,746-Speed 3323.59 samples/sec   Loss 4.3026   LearningRate 0.0809   Epoch: 2   Global Step: 33630   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-11 02:46:55,828-Speed 3324.30 samples/sec   Loss 4.3382   LearningRate 0.0809   Epoch: 2   Global Step: 33640   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-11 02:46:58,905-Speed 3328.97 samples/sec   Loss 4.2951   LearningRate 0.0809   Epoch: 2   Global Step: 33650   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-11 02:47:01,981-Speed 3330.04 samples/sec   Loss 4.2739   LearningRate 0.0809   Epoch: 2   Global Step: 33660   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-11 02:47:05,109-Speed 3274.14 samples/sec   Loss 4.2354   LearningRate 0.0808   Epoch: 2   Global Step: 33670   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-11 02:47:08,207-Speed 3306.71 samples/sec   Loss 4.2289   LearningRate 0.0808   Epoch: 2   Global Step: 33680   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-11 02:47:11,310-Speed 3302.42 samples/sec   Loss 4.2031   LearningRate 0.0808   Epoch: 2   Global Step: 33690   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-11 02:47:14,389-Speed 3326.09 samples/sec   Loss 4.2736   LearningRate 0.0808   Epoch: 2   Global Step: 33700   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 02:47:17,473-Speed 3321.91 samples/sec   Loss 4.2701   LearningRate 0.0808   Epoch: 2   Global Step: 33710   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 02:47:20,533-Speed 3346.82 samples/sec   Loss 4.1648   LearningRate 0.0808   Epoch: 2   Global Step: 33720   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-11 02:47:23,607-Speed 3332.37 samples/sec   Loss 4.3491   LearningRate 0.0808   Epoch: 2   Global Step: 33730   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-11 02:47:26,690-Speed 3321.48 samples/sec   Loss 4.2187   LearningRate 0.0808   Epoch: 2   Global Step: 33740   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-11 02:47:29,777-Speed 3318.36 samples/sec   Loss 4.3242   LearningRate 0.0808   Epoch: 2   Global Step: 33750   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-11 02:47:32,874-Speed 3307.43 samples/sec   Loss 4.2137   LearningRate 0.0808   Epoch: 2   Global Step: 33760   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-11 02:47:35,961-Speed 3317.74 samples/sec   Loss 4.3012   LearningRate 0.0808   Epoch: 2   Global Step: 33770   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-11 02:47:39,044-Speed 3322.92 samples/sec   Loss 4.3181   LearningRate 0.0808   Epoch: 2   Global Step: 33780   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-11 02:47:42,113-Speed 3337.59 samples/sec   Loss 4.3803   LearningRate 0.0808   Epoch: 2   Global Step: 33790   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-11 02:47:45,186-Speed 3333.44 samples/sec   Loss 4.3371   LearningRate 0.0808   Epoch: 2   Global Step: 33800   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-11 02:47:48,258-Speed 3334.04 samples/sec   Loss 4.3064   LearningRate 0.0808   Epoch: 2   Global Step: 33810   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-11 02:47:51,326-Speed 3338.43 samples/sec   Loss 4.3095   LearningRate 0.0808   Epoch: 2   Global Step: 33820   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 02:47:54,398-Speed 3333.40 samples/sec   Loss 4.3320   LearningRate 0.0808   Epoch: 2   Global Step: 33830   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 02:47:57,486-Speed 3317.07 samples/sec   Loss 4.2992   LearningRate 0.0808   Epoch: 2   Global Step: 33840   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 02:48:00,623-Speed 3264.98 samples/sec   Loss 4.3329   LearningRate 0.0807   Epoch: 2   Global Step: 33850   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 02:48:03,701-Speed 3328.27 samples/sec   Loss 4.2707   LearningRate 0.0807   Epoch: 2   Global Step: 33860   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 02:48:06,813-Speed 3291.30 samples/sec   Loss 4.3794   LearningRate 0.0807   Epoch: 2   Global Step: 33870   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 02:48:09,907-Speed 3310.53 samples/sec   Loss 4.3106   LearningRate 0.0807   Epoch: 2   Global Step: 33880   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 02:48:12,975-Speed 3338.41 samples/sec   Loss 4.3522   LearningRate 0.0807   Epoch: 2   Global Step: 33890   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 02:48:16,057-Speed 3322.86 samples/sec   Loss 4.2661   LearningRate 0.0807   Epoch: 2   Global Step: 33900   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 02:48:19,143-Speed 3318.62 samples/sec   Loss 4.3223   LearningRate 0.0807   Epoch: 2   Global Step: 33910   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 02:48:22,218-Speed 3331.37 samples/sec   Loss 4.3352   LearningRate 0.0807   Epoch: 2   Global Step: 33920   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 02:48:25,322-Speed 3298.99 samples/sec   Loss 4.3234   LearningRate 0.0807   Epoch: 2   Global Step: 33930   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 02:48:28,471-Speed 3253.23 samples/sec   Loss 4.4169   LearningRate 0.0807   Epoch: 2   Global Step: 33940   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 02:48:31,600-Speed 3273.59 samples/sec   Loss 4.2595   LearningRate 0.0807   Epoch: 2   Global Step: 33950   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 02:48:34,681-Speed 3324.30 samples/sec   Loss 4.1973   LearningRate 0.0807   Epoch: 2   Global Step: 33960   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 02:48:37,758-Speed 3328.72 samples/sec   Loss 4.3554   LearningRate 0.0807   Epoch: 2   Global Step: 33970   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 02:48:40,833-Speed 3331.46 samples/sec   Loss 4.3579   LearningRate 0.0807   Epoch: 2   Global Step: 33980   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 02:48:43,898-Speed 3341.60 samples/sec   Loss 4.2679   LearningRate 0.0807   Epoch: 2   Global Step: 33990   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 02:48:46,973-Speed 3330.87 samples/sec   Loss 4.4101   LearningRate 0.0807   Epoch: 2   Global Step: 34000   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 02:49:31,165-[lfw][34000]XNorm: 24.389967
Training: 2022-04-11 02:49:31,166-[lfw][34000]Accuracy-Flip: 0.99733+-0.00281
Training: 2022-04-11 02:49:31,166-[lfw][34000]Accuracy-Highest: 0.99767
Training: 2022-04-11 02:50:22,431-[cfp_fp][34000]XNorm: 22.961980
Training: 2022-04-11 02:50:22,432-[cfp_fp][34000]Accuracy-Flip: 0.97757+-0.00655
Training: 2022-04-11 02:50:22,432-[cfp_fp][34000]Accuracy-Highest: 0.97757
Training: 2022-04-11 02:51:06,587-[agedb_30][34000]XNorm: 24.284009
Training: 2022-04-11 02:51:06,587-[agedb_30][34000]Accuracy-Flip: 0.97433+-0.00593
Training: 2022-04-11 02:51:06,588-[agedb_30][34000]Accuracy-Highest: 0.97667
Training: 2022-04-11 02:51:09,680-Speed 71.76 samples/sec   Loss 4.3774   LearningRate 0.0807   Epoch: 2   Global Step: 34010   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:51:12,743-Speed 3343.96 samples/sec   Loss 4.3748   LearningRate 0.0807   Epoch: 2   Global Step: 34020   Fp16 Grad Scale: 262144   Required: 32 hours
Training: 2022-04-11 02:51:15,816-Speed 3332.99 samples/sec   Loss 4.3481   LearningRate 0.0807   Epoch: 2   Global Step: 34030   Fp16 Grad Scale: 262144   Required: 32 hours
Training: 2022-04-11 02:51:18,900-Speed 3321.10 samples/sec   Loss 4.3763   LearningRate 0.0806   Epoch: 2   Global Step: 34040   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:51:21,984-Speed 3321.21 samples/sec   Loss 4.3011   LearningRate 0.0806   Epoch: 2   Global Step: 34050   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:51:25,050-Speed 3339.91 samples/sec   Loss 4.3808   LearningRate 0.0806   Epoch: 2   Global Step: 34060   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:51:28,141-Speed 3314.48 samples/sec   Loss 4.3396   LearningRate 0.0806   Epoch: 2   Global Step: 34070   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:51:31,205-Speed 3343.11 samples/sec   Loss 4.3250   LearningRate 0.0806   Epoch: 2   Global Step: 34080   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:51:34,268-Speed 3343.85 samples/sec   Loss 4.3706   LearningRate 0.0806   Epoch: 2   Global Step: 34090   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:51:37,330-Speed 3345.33 samples/sec   Loss 4.3097   LearningRate 0.0806   Epoch: 2   Global Step: 34100   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:51:40,394-Speed 3342.50 samples/sec   Loss 4.3783   LearningRate 0.0806   Epoch: 2   Global Step: 34110   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:51:43,471-Speed 3328.37 samples/sec   Loss 4.3339   LearningRate 0.0806   Epoch: 2   Global Step: 34120   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:51:46,554-Speed 3321.89 samples/sec   Loss 4.3099   LearningRate 0.0806   Epoch: 2   Global Step: 34130   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:51:49,634-Speed 3325.80 samples/sec   Loss 4.2901   LearningRate 0.0806   Epoch: 2   Global Step: 34140   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:51:52,743-Speed 3294.89 samples/sec   Loss 4.2557   LearningRate 0.0806   Epoch: 2   Global Step: 34150   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:51:55,843-Speed 3303.47 samples/sec   Loss 4.3558   LearningRate 0.0806   Epoch: 2   Global Step: 34160   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:51:58,919-Speed 3330.12 samples/sec   Loss 4.4086   LearningRate 0.0806   Epoch: 2   Global Step: 34170   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:52:01,994-Speed 3331.30 samples/sec   Loss 4.3570   LearningRate 0.0806   Epoch: 2   Global Step: 34180   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:52:05,064-Speed 3335.44 samples/sec   Loss 4.2837   LearningRate 0.0806   Epoch: 2   Global Step: 34190   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:52:08,130-Speed 3341.11 samples/sec   Loss 4.3529   LearningRate 0.0806   Epoch: 2   Global Step: 34200   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:52:11,196-Speed 3340.93 samples/sec   Loss 4.2633   LearningRate 0.0806   Epoch: 2   Global Step: 34210   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:52:14,277-Speed 3324.37 samples/sec   Loss 4.4200   LearningRate 0.0805   Epoch: 2   Global Step: 34220   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:52:17,380-Speed 3300.35 samples/sec   Loss 4.3649   LearningRate 0.0805   Epoch: 2   Global Step: 34230   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:52:20,446-Speed 3340.65 samples/sec   Loss 4.3332   LearningRate 0.0805   Epoch: 2   Global Step: 34240   Fp16 Grad Scale: 262144   Required: 32 hours
Training: 2022-04-11 02:52:23,506-Speed 3347.36 samples/sec   Loss 4.3926   LearningRate 0.0805   Epoch: 2   Global Step: 34250   Fp16 Grad Scale: 262144   Required: 32 hours
Training: 2022-04-11 02:52:26,608-Speed 3301.60 samples/sec   Loss 4.3460   LearningRate 0.0805   Epoch: 2   Global Step: 34260   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:52:29,704-Speed 3308.30 samples/sec   Loss 4.3082   LearningRate 0.0805   Epoch: 2   Global Step: 34270   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:52:32,789-Speed 3320.31 samples/sec   Loss 4.4378   LearningRate 0.0805   Epoch: 2   Global Step: 34280   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:52:35,860-Speed 3335.91 samples/sec   Loss 4.3860   LearningRate 0.0805   Epoch: 2   Global Step: 34290   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-11 02:52:39,000-Speed 3261.68 samples/sec   Loss 4.3201   LearningRate 0.0805   Epoch: 2   Global Step: 34300   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-11 02:52:42,215-Speed 3185.95 samples/sec   Loss 4.4374   LearningRate 0.0805   Epoch: 2   Global Step: 34310   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-11 02:52:45,273-Speed 3348.89 samples/sec   Loss 4.4041   LearningRate 0.0805   Epoch: 2   Global Step: 34320   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-11 02:52:48,332-Speed 3349.29 samples/sec   Loss 4.2430   LearningRate 0.0805   Epoch: 2   Global Step: 34330   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-11 02:52:51,409-Speed 3327.65 samples/sec   Loss 4.4428   LearningRate 0.0805   Epoch: 2   Global Step: 34340   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-11 02:52:54,542-Speed 3269.53 samples/sec   Loss 4.4049   LearningRate 0.0805   Epoch: 2   Global Step: 34350   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-11 02:52:57,656-Speed 3289.23 samples/sec   Loss 4.3086   LearningRate 0.0805   Epoch: 2   Global Step: 34360   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-11 02:53:00,722-Speed 3341.27 samples/sec   Loss 4.2161   LearningRate 0.0805   Epoch: 2   Global Step: 34370   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-11 02:53:03,786-Speed 3342.59 samples/sec   Loss 4.3433   LearningRate 0.0805   Epoch: 2   Global Step: 34380   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-11 02:53:06,878-Speed 3313.07 samples/sec   Loss 4.3841   LearningRate 0.0805   Epoch: 2   Global Step: 34390   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:53:09,992-Speed 3288.75 samples/sec   Loss 4.4438   LearningRate 0.0805   Epoch: 2   Global Step: 34400   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:53:13,126-Speed 3268.56 samples/sec   Loss 4.4332   LearningRate 0.0804   Epoch: 2   Global Step: 34410   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:53:16,209-Speed 3322.61 samples/sec   Loss 4.4751   LearningRate 0.0804   Epoch: 2   Global Step: 34420   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:53:19,280-Speed 3335.03 samples/sec   Loss 4.3869   LearningRate 0.0804   Epoch: 2   Global Step: 34430   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:53:22,342-Speed 3345.10 samples/sec   Loss 4.4084   LearningRate 0.0804   Epoch: 2   Global Step: 34440   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:53:25,433-Speed 3313.94 samples/sec   Loss 4.4681   LearningRate 0.0804   Epoch: 2   Global Step: 34450   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:53:28,515-Speed 3322.91 samples/sec   Loss 4.3912   LearningRate 0.0804   Epoch: 2   Global Step: 34460   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-11 02:53:31,580-Speed 3342.03 samples/sec   Loss 4.3839   LearningRate 0.0804   Epoch: 2   Global Step: 34470   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-11 02:53:34,639-Speed 3348.16 samples/sec   Loss 4.5006   LearningRate 0.0804   Epoch: 2   Global Step: 34480   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-11 02:53:37,705-Speed 3341.34 samples/sec   Loss 4.4788   LearningRate 0.0804   Epoch: 2   Global Step: 34490   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-11 02:53:40,766-Speed 3346.46 samples/sec   Loss 4.4183   LearningRate 0.0804   Epoch: 2   Global Step: 34500   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-11 02:53:43,844-Speed 3326.59 samples/sec   Loss 4.3937   LearningRate 0.0804   Epoch: 2   Global Step: 34510   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-11 02:53:46,934-Speed 3314.64 samples/sec   Loss 4.5067   LearningRate 0.0804   Epoch: 2   Global Step: 34520   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-11 02:53:50,000-Speed 3340.90 samples/sec   Loss 4.4424   LearningRate 0.0804   Epoch: 2   Global Step: 34530   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-11 02:53:53,073-Speed 3333.83 samples/sec   Loss 4.4271   LearningRate 0.0804   Epoch: 2   Global Step: 34540   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-11 02:53:56,145-Speed 3334.41 samples/sec   Loss 4.4866   LearningRate 0.0804   Epoch: 2   Global Step: 34550   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-04-11 02:53:59,206-Speed 3346.29 samples/sec   Loss 4.4221   LearningRate 0.0804   Epoch: 2   Global Step: 34560   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:54:02,327-Speed 3280.78 samples/sec   Loss 4.4767   LearningRate 0.0804   Epoch: 2   Global Step: 34570   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:54:05,394-Speed 3339.82 samples/sec   Loss 4.3778   LearningRate 0.0804   Epoch: 2   Global Step: 34580   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:54:08,452-Speed 3349.43 samples/sec   Loss 4.4607   LearningRate 0.0803   Epoch: 2   Global Step: 34590   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:54:11,531-Speed 3326.40 samples/sec   Loss 4.4684   LearningRate 0.0803   Epoch: 2   Global Step: 34600   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:54:14,658-Speed 3274.87 samples/sec   Loss 4.4023   LearningRate 0.0803   Epoch: 2   Global Step: 34610   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:54:17,730-Speed 3334.71 samples/sec   Loss 4.4466   LearningRate 0.0803   Epoch: 2   Global Step: 34620   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:54:20,805-Speed 3331.20 samples/sec   Loss 4.3809   LearningRate 0.0803   Epoch: 2   Global Step: 34630   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:54:23,911-Speed 3297.83 samples/sec   Loss 4.4263   LearningRate 0.0803   Epoch: 2   Global Step: 34640   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:54:26,985-Speed 3332.12 samples/sec   Loss 4.4051   LearningRate 0.0803   Epoch: 2   Global Step: 34650   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:54:30,062-Speed 3328.12 samples/sec   Loss 4.3538   LearningRate 0.0803   Epoch: 2   Global Step: 34660   Fp16 Grad Scale: 262144   Required: 32 hours
Training: 2022-04-11 02:54:33,126-Speed 3342.54 samples/sec   Loss 4.4721   LearningRate 0.0803   Epoch: 2   Global Step: 34670   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:54:36,188-Speed 3345.78 samples/sec   Loss 4.4826   LearningRate 0.0803   Epoch: 2   Global Step: 34680   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:54:39,251-Speed 3343.30 samples/sec   Loss 4.3487   LearningRate 0.0803   Epoch: 2   Global Step: 34690   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:54:42,322-Speed 3335.42 samples/sec   Loss 4.3686   LearningRate 0.0803   Epoch: 2   Global Step: 34700   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-04-11 02:54:45,383-Speed 3345.89 samples/sec   Loss 4.4391   LearningRate 0.0803   Epoch: 2   Global Step: 34710   Fp16 Grad Scale: 32768   Required: 32 hours
Training: 2022-04-11 02:54:48,462-Speed 3327.36 samples/sec   Loss 4.4111   LearningRate 0.0803   Epoch: 2   Global Step: 34720   Fp16 Grad Scale: 32768   Required: 32 hours
Training: 2022-04-11 02:54:51,524-Speed 3345.30 samples/sec   Loss 4.5014   LearningRate 0.0803   Epoch: 2   Global Step: 34730   Fp16 Grad Scale: 32768   Required: 32 hours
Training: 2022-04-11 02:54:54,607-Speed 3321.84 samples/sec   Loss 4.4336   LearningRate 0.0803   Epoch: 2   Global Step: 34740   Fp16 Grad Scale: 32768   Required: 32 hours
Training: 2022-04-11 02:54:57,666-Speed 3347.53 samples/sec   Loss 4.4722   LearningRate 0.0803   Epoch: 2   Global Step: 34750   Fp16 Grad Scale: 32768   Required: 32 hours
Training: 2022-04-11 02:55:00,764-Speed 3306.38 samples/sec   Loss 4.4859   LearningRate 0.0803   Epoch: 2   Global Step: 34760   Fp16 Grad Scale: 32768   Required: 32 hours
Training: 2022-04-11 02:55:03,839-Speed 3331.40 samples/sec   Loss 4.4589   LearningRate 0.0803   Epoch: 2   Global Step: 34770   Fp16 Grad Scale: 32768   Required: 32 hours
Training: 2022-04-11 02:55:06,898-Speed 3348.15 samples/sec   Loss 4.4585   LearningRate 0.0802   Epoch: 2   Global Step: 34780   Fp16 Grad Scale: 32768   Required: 32 hours
Training: 2022-04-11 02:55:09,976-Speed 3327.57 samples/sec   Loss 4.4189   LearningRate 0.0802   Epoch: 2   Global Step: 34790   Fp16 Grad Scale: 32768   Required: 32 hours
Training: 2022-04-11 02:55:13,094-Speed 3284.59 samples/sec   Loss 4.4668   LearningRate 0.0802   Epoch: 2   Global Step: 34800   Fp16 Grad Scale: 32768   Required: 31 hours
Training: 2022-04-11 02:55:16,177-Speed 3322.21 samples/sec   Loss 4.4786   LearningRate 0.0802   Epoch: 2   Global Step: 34810   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-11 02:55:19,366-Speed 3212.44 samples/sec   Loss 4.3921   LearningRate 0.0802   Epoch: 2   Global Step: 34820   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-11 02:55:22,499-Speed 3269.11 samples/sec   Loss 4.4015   LearningRate 0.0802   Epoch: 2   Global Step: 34830   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-11 02:55:25,559-Speed 3346.83 samples/sec   Loss 4.5770   LearningRate 0.0802   Epoch: 2   Global Step: 34840   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-11 02:55:28,623-Speed 3342.99 samples/sec   Loss 4.4233   LearningRate 0.0802   Epoch: 2   Global Step: 34850   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-11 02:55:31,691-Speed 3337.63 samples/sec   Loss 4.5317   LearningRate 0.0802   Epoch: 2   Global Step: 34860   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-11 02:55:34,754-Speed 3344.12 samples/sec   Loss 4.5202   LearningRate 0.0802   Epoch: 2   Global Step: 34870   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-11 02:55:37,844-Speed 3314.56 samples/sec   Loss 4.5077   LearningRate 0.0802   Epoch: 2   Global Step: 34880   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-11 02:55:40,921-Speed 3329.44 samples/sec   Loss 4.4391   LearningRate 0.0802   Epoch: 2   Global Step: 34890   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-11 02:55:43,988-Speed 3339.47 samples/sec   Loss 4.3793   LearningRate 0.0802   Epoch: 2   Global Step: 34900   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-11 02:55:47,047-Speed 3347.93 samples/sec   Loss 4.4385   LearningRate 0.0802   Epoch: 2   Global Step: 34910   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 02:55:50,120-Speed 3333.47 samples/sec   Loss 4.4184   LearningRate 0.0802   Epoch: 2   Global Step: 34920   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-11 02:55:53,207-Speed 3317.91 samples/sec   Loss 4.4692   LearningRate 0.0802   Epoch: 2   Global Step: 34930   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-11 02:55:56,283-Speed 3329.19 samples/sec   Loss 4.4085   LearningRate 0.0802   Epoch: 2   Global Step: 34940   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-11 02:55:59,356-Speed 3333.90 samples/sec   Loss 4.3492   LearningRate 0.0802   Epoch: 2   Global Step: 34950   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-11 02:56:02,444-Speed 3316.66 samples/sec   Loss 4.4218   LearningRate 0.0802   Epoch: 2   Global Step: 34960   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-11 02:56:05,525-Speed 3323.94 samples/sec   Loss 4.5240   LearningRate 0.0801   Epoch: 2   Global Step: 34970   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-11 02:56:08,595-Speed 3335.86 samples/sec   Loss 4.4973   LearningRate 0.0801   Epoch: 2   Global Step: 34980   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-11 02:56:11,670-Speed 3331.67 samples/sec   Loss 4.4412   LearningRate 0.0801   Epoch: 2   Global Step: 34990   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-11 02:56:14,771-Speed 3303.19 samples/sec   Loss 4.4029   LearningRate 0.0801   Epoch: 2   Global Step: 35000   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-11 02:56:17,948-Speed 3223.38 samples/sec   Loss 4.3765   LearningRate 0.0801   Epoch: 2   Global Step: 35010   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-11 02:56:21,036-Speed 3316.83 samples/sec   Loss 4.4481   LearningRate 0.0801   Epoch: 2   Global Step: 35020   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 02:56:24,120-Speed 3321.60 samples/sec   Loss 4.3847   LearningRate 0.0801   Epoch: 2   Global Step: 35030   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 02:56:27,247-Speed 3274.87 samples/sec   Loss 4.4000   LearningRate 0.0801   Epoch: 2   Global Step: 35040   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 02:56:30,328-Speed 3324.94 samples/sec   Loss 4.4688   LearningRate 0.0801   Epoch: 2   Global Step: 35050   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 02:56:33,417-Speed 3315.03 samples/sec   Loss 4.4673   LearningRate 0.0801   Epoch: 2   Global Step: 35060   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 02:56:36,489-Speed 3334.94 samples/sec   Loss 4.4943   LearningRate 0.0801   Epoch: 2   Global Step: 35070   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 02:56:39,544-Speed 3352.57 samples/sec   Loss 4.4846   LearningRate 0.0801   Epoch: 2   Global Step: 35080   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-11 02:56:42,605-Speed 3346.44 samples/sec   Loss 4.4669   LearningRate 0.0801   Epoch: 2   Global Step: 35090   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-11 02:56:45,743-Speed 3263.22 samples/sec   Loss 4.4807   LearningRate 0.0801   Epoch: 2   Global Step: 35100   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-11 02:56:48,816-Speed 3332.90 samples/sec   Loss 4.4692   LearningRate 0.0801   Epoch: 2   Global Step: 35110   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-11 02:56:51,947-Speed 3271.40 samples/sec   Loss 4.4743   LearningRate 0.0801   Epoch: 2   Global Step: 35120   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-11 02:56:55,130-Speed 3218.35 samples/sec   Loss 4.5105   LearningRate 0.0801   Epoch: 2   Global Step: 35130   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-11 02:56:58,202-Speed 3333.71 samples/sec   Loss 4.4626   LearningRate 0.0801   Epoch: 2   Global Step: 35140   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-11 02:57:01,280-Speed 3327.60 samples/sec   Loss 4.5113   LearningRate 0.0800   Epoch: 2   Global Step: 35150   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-11 02:57:04,344-Speed 3342.30 samples/sec   Loss 4.4008   LearningRate 0.0800   Epoch: 2   Global Step: 35160   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-11 02:57:07,406-Speed 3345.86 samples/sec   Loss 4.4662   LearningRate 0.0800   Epoch: 2   Global Step: 35170   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-11 02:57:10,472-Speed 3340.90 samples/sec   Loss 4.4755   LearningRate 0.0800   Epoch: 2   Global Step: 35180   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 02:57:13,522-Speed 3357.64 samples/sec   Loss 4.4029   LearningRate 0.0800   Epoch: 2   Global Step: 35190   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-11 02:57:16,602-Speed 3324.86 samples/sec   Loss 4.4151   LearningRate 0.0800   Epoch: 2   Global Step: 35200   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-11 02:57:19,665-Speed 3344.51 samples/sec   Loss 4.3167   LearningRate 0.0800   Epoch: 2   Global Step: 35210   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-11 02:57:22,727-Speed 3345.05 samples/sec   Loss 4.3977   LearningRate 0.0800   Epoch: 2   Global Step: 35220   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-11 02:57:25,861-Speed 3268.08 samples/sec   Loss 4.3748   LearningRate 0.0800   Epoch: 2   Global Step: 35230   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-11 02:57:28,929-Speed 3338.38 samples/sec   Loss 4.4172   LearningRate 0.0800   Epoch: 2   Global Step: 35240   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-11 02:57:31,993-Speed 3342.93 samples/sec   Loss 4.4823   LearningRate 0.0800   Epoch: 2   Global Step: 35250   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-11 02:57:35,077-Speed 3321.30 samples/sec   Loss 4.4149   LearningRate 0.0800   Epoch: 2   Global Step: 35260   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-11 02:57:38,192-Speed 3288.09 samples/sec   Loss 4.4813   LearningRate 0.0800   Epoch: 2   Global Step: 35270   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-11 02:57:41,288-Speed 3308.42 samples/sec   Loss 4.4185   LearningRate 0.0800   Epoch: 2   Global Step: 35280   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-11 02:57:44,350-Speed 3344.79 samples/sec   Loss 4.4177   LearningRate 0.0800   Epoch: 2   Global Step: 35290   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 02:57:47,411-Speed 3345.58 samples/sec   Loss 4.4764   LearningRate 0.0800   Epoch: 2   Global Step: 35300   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 02:57:50,478-Speed 3339.79 samples/sec   Loss 4.4601   LearningRate 0.0800   Epoch: 2   Global Step: 35310   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 02:57:53,546-Speed 3338.25 samples/sec   Loss 4.4326   LearningRate 0.0800   Epoch: 2   Global Step: 35320   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 02:57:56,648-Speed 3301.31 samples/sec   Loss 4.4553   LearningRate 0.0800   Epoch: 2   Global Step: 35330   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 02:57:59,710-Speed 3345.75 samples/sec   Loss 4.5202   LearningRate 0.0799   Epoch: 2   Global Step: 35340   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 02:58:02,778-Speed 3338.84 samples/sec   Loss 4.4486   LearningRate 0.0799   Epoch: 2   Global Step: 35350   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 02:58:05,838-Speed 3346.40 samples/sec   Loss 4.5311   LearningRate 0.0799   Epoch: 2   Global Step: 35360   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 02:58:08,901-Speed 3343.80 samples/sec   Loss 4.4288   LearningRate 0.0799   Epoch: 2   Global Step: 35370   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 02:58:12,000-Speed 3305.73 samples/sec   Loss 4.5910   LearningRate 0.0799   Epoch: 2   Global Step: 35380   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 02:58:15,169-Speed 3231.21 samples/sec   Loss 4.4877   LearningRate 0.0799   Epoch: 2   Global Step: 35390   Fp16 Grad Scale: 262144   Required: 31 hours
Training: 2022-04-11 02:58:18,289-Speed 3283.28 samples/sec   Loss 4.4981   LearningRate 0.0799   Epoch: 2   Global Step: 35400   Fp16 Grad Scale: 262144   Required: 31 hours
Training: 2022-04-11 02:58:21,364-Speed 3330.90 samples/sec   Loss 4.4701   LearningRate 0.0799   Epoch: 2   Global Step: 35410   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 02:58:24,449-Speed 3319.68 samples/sec   Loss 4.4546   LearningRate 0.0799   Epoch: 2   Global Step: 35420   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 02:58:27,554-Speed 3298.51 samples/sec   Loss 4.4792   LearningRate 0.0799   Epoch: 2   Global Step: 35430   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 02:58:30,629-Speed 3332.23 samples/sec   Loss 4.5419   LearningRate 0.0799   Epoch: 2   Global Step: 35440   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 02:58:33,711-Speed 3322.72 samples/sec   Loss 4.5153   LearningRate 0.0799   Epoch: 2   Global Step: 35450   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 02:58:36,786-Speed 3330.57 samples/sec   Loss 4.5666   LearningRate 0.0799   Epoch: 2   Global Step: 35460   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 02:58:39,852-Speed 3341.45 samples/sec   Loss 4.4607   LearningRate 0.0799   Epoch: 2   Global Step: 35470   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 02:58:42,920-Speed 3338.14 samples/sec   Loss 4.5467   LearningRate 0.0799   Epoch: 2   Global Step: 35480   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 02:58:45,983-Speed 3343.86 samples/sec   Loss 4.5340   LearningRate 0.0799   Epoch: 2   Global Step: 35490   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 02:58:49,052-Speed 3336.88 samples/sec   Loss 4.5772   LearningRate 0.0799   Epoch: 2   Global Step: 35500   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 02:58:52,120-Speed 3338.33 samples/sec   Loss 4.5689   LearningRate 0.0799   Epoch: 2   Global Step: 35510   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-11 02:58:55,181-Speed 3346.26 samples/sec   Loss 4.5215   LearningRate 0.0799   Epoch: 2   Global Step: 35520   Fp16 Grad Scale: 32768   Required: 31 hours
Training: 2022-04-11 02:58:58,254-Speed 3334.16 samples/sec   Loss 4.5052   LearningRate 0.0798   Epoch: 2   Global Step: 35530   Fp16 Grad Scale: 32768   Required: 31 hours
Training: 2022-04-11 02:59:01,332-Speed 3327.25 samples/sec   Loss 4.5691   LearningRate 0.0798   Epoch: 2   Global Step: 35540   Fp16 Grad Scale: 32768   Required: 31 hours
Training: 2022-04-11 02:59:04,428-Speed 3307.69 samples/sec   Loss 4.5160   LearningRate 0.0798   Epoch: 2   Global Step: 35550   Fp16 Grad Scale: 32768   Required: 31 hours
Training: 2022-04-11 02:59:07,502-Speed 3332.49 samples/sec   Loss 4.5702   LearningRate 0.0798   Epoch: 2   Global Step: 35560   Fp16 Grad Scale: 32768   Required: 31 hours
Training: 2022-04-11 02:59:10,591-Speed 3316.05 samples/sec   Loss 4.4008   LearningRate 0.0798   Epoch: 2   Global Step: 35570   Fp16 Grad Scale: 32768   Required: 31 hours
Training: 2022-04-11 02:59:13,689-Speed 3305.40 samples/sec   Loss 4.6020   LearningRate 0.0798   Epoch: 2   Global Step: 35580   Fp16 Grad Scale: 32768   Required: 31 hours
Training: 2022-04-11 02:59:16,765-Speed 3329.91 samples/sec   Loss 4.5518   LearningRate 0.0798   Epoch: 2   Global Step: 35590   Fp16 Grad Scale: 32768   Required: 31 hours
Training: 2022-04-11 02:59:19,836-Speed 3335.20 samples/sec   Loss 4.5400   LearningRate 0.0798   Epoch: 2   Global Step: 35600   Fp16 Grad Scale: 32768   Required: 31 hours
Training: 2022-04-11 02:59:22,954-Speed 3284.50 samples/sec   Loss 4.5042   LearningRate 0.0798   Epoch: 2   Global Step: 35610   Fp16 Grad Scale: 32768   Required: 31 hours
Training: 2022-04-11 02:59:26,079-Speed 3278.03 samples/sec   Loss 4.4751   LearningRate 0.0798   Epoch: 2   Global Step: 35620   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-11 02:59:29,146-Speed 3340.44 samples/sec   Loss 4.3969   LearningRate 0.0798   Epoch: 2   Global Step: 35630   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-11 02:59:32,212-Speed 3340.51 samples/sec   Loss 4.4128   LearningRate 0.0798   Epoch: 2   Global Step: 35640   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-11 02:59:35,280-Speed 3337.53 samples/sec   Loss 4.5150   LearningRate 0.0798   Epoch: 2   Global Step: 35650   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-11 02:59:38,374-Speed 3311.10 samples/sec   Loss 4.5832   LearningRate 0.0798   Epoch: 2   Global Step: 35660   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-11 02:59:41,455-Speed 3323.65 samples/sec   Loss 4.4957   LearningRate 0.0798   Epoch: 2   Global Step: 35670   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-11 02:59:44,526-Speed 3335.47 samples/sec   Loss 4.4866   LearningRate 0.0798   Epoch: 2   Global Step: 35680   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-11 02:59:47,591-Speed 3341.65 samples/sec   Loss 4.4784   LearningRate 0.0798   Epoch: 2   Global Step: 35690   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-11 02:59:50,674-Speed 3322.50 samples/sec   Loss 4.5932   LearningRate 0.0798   Epoch: 2   Global Step: 35700   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-11 02:59:53,760-Speed 3318.81 samples/sec   Loss 4.4614   LearningRate 0.0797   Epoch: 2   Global Step: 35710   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-11 02:59:56,868-Speed 3295.20 samples/sec   Loss 4.4342   LearningRate 0.0797   Epoch: 2   Global Step: 35720   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 02:59:59,932-Speed 3343.22 samples/sec   Loss 4.5065   LearningRate 0.0797   Epoch: 2   Global Step: 35730   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:00:03,017-Speed 3319.47 samples/sec   Loss 4.5225   LearningRate 0.0797   Epoch: 2   Global Step: 35740   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:00:06,091-Speed 3331.81 samples/sec   Loss 4.4243   LearningRate 0.0797   Epoch: 2   Global Step: 35750   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:00:09,165-Speed 3332.24 samples/sec   Loss 4.4936   LearningRate 0.0797   Epoch: 2   Global Step: 35760   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:00:12,245-Speed 3325.98 samples/sec   Loss 4.5188   LearningRate 0.0797   Epoch: 2   Global Step: 35770   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:00:15,314-Speed 3336.58 samples/sec   Loss 4.4500   LearningRate 0.0797   Epoch: 2   Global Step: 35780   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:00:18,421-Speed 3296.54 samples/sec   Loss 4.4027   LearningRate 0.0797   Epoch: 2   Global Step: 35790   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:00:21,529-Speed 3295.67 samples/sec   Loss 4.4446   LearningRate 0.0797   Epoch: 2   Global Step: 35800   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:00:24,638-Speed 3294.74 samples/sec   Loss 4.5239   LearningRate 0.0797   Epoch: 2   Global Step: 35810   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:00:27,739-Speed 3302.70 samples/sec   Loss 4.4513   LearningRate 0.0797   Epoch: 2   Global Step: 35820   Fp16 Grad Scale: 262144   Required: 31 hours
Training: 2022-04-11 03:00:30,807-Speed 3338.98 samples/sec   Loss 4.4456   LearningRate 0.0797   Epoch: 2   Global Step: 35830   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:00:33,876-Speed 3337.22 samples/sec   Loss 4.4931   LearningRate 0.0797   Epoch: 2   Global Step: 35840   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:00:36,941-Speed 3341.84 samples/sec   Loss 4.4421   LearningRate 0.0797   Epoch: 2   Global Step: 35850   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:00:40,006-Speed 3341.27 samples/sec   Loss 4.5518   LearningRate 0.0797   Epoch: 2   Global Step: 35860   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:00:43,079-Speed 3332.51 samples/sec   Loss 4.4692   LearningRate 0.0797   Epoch: 2   Global Step: 35870   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:00:46,148-Speed 3338.33 samples/sec   Loss 4.4386   LearningRate 0.0797   Epoch: 2   Global Step: 35880   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:00:49,212-Speed 3342.10 samples/sec   Loss 4.5351   LearningRate 0.0797   Epoch: 2   Global Step: 35890   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:00:52,284-Speed 3334.86 samples/sec   Loss 4.5136   LearningRate 0.0796   Epoch: 2   Global Step: 35900   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:00:55,343-Speed 3348.04 samples/sec   Loss 4.4624   LearningRate 0.0796   Epoch: 2   Global Step: 35910   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-11 03:00:58,411-Speed 3338.73 samples/sec   Loss 4.5682   LearningRate 0.0796   Epoch: 2   Global Step: 35920   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-11 03:01:01,477-Speed 3340.58 samples/sec   Loss 4.5628   LearningRate 0.0796   Epoch: 2   Global Step: 35930   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-11 03:01:04,544-Speed 3339.55 samples/sec   Loss 4.6026   LearningRate 0.0796   Epoch: 2   Global Step: 35940   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-11 03:01:07,614-Speed 3335.60 samples/sec   Loss 4.4072   LearningRate 0.0796   Epoch: 2   Global Step: 35950   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-11 03:01:10,686-Speed 3334.70 samples/sec   Loss 4.5571   LearningRate 0.0796   Epoch: 2   Global Step: 35960   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-11 03:01:13,756-Speed 3335.94 samples/sec   Loss 4.4609   LearningRate 0.0796   Epoch: 2   Global Step: 35970   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-11 03:01:16,879-Speed 3280.44 samples/sec   Loss 4.4907   LearningRate 0.0796   Epoch: 2   Global Step: 35980   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-11 03:01:19,982-Speed 3300.70 samples/sec   Loss 4.5542   LearningRate 0.0796   Epoch: 2   Global Step: 35990   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-11 03:01:23,100-Speed 3284.58 samples/sec   Loss 4.5883   LearningRate 0.0796   Epoch: 2   Global Step: 36000   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-11 03:02:06,930-[lfw][36000]XNorm: 22.132697
Training: 2022-04-11 03:02:06,930-[lfw][36000]Accuracy-Flip: 0.99700+-0.00256
Training: 2022-04-11 03:02:06,931-[lfw][36000]Accuracy-Highest: 0.99767
Training: 2022-04-11 03:02:57,949-[cfp_fp][36000]XNorm: 20.001465
Training: 2022-04-11 03:02:57,949-[cfp_fp][36000]Accuracy-Flip: 0.97529+-0.00734
Training: 2022-04-11 03:02:57,950-[cfp_fp][36000]Accuracy-Highest: 0.97757
Training: 2022-04-11 03:03:41,887-[agedb_30][36000]XNorm: 22.096262
Training: 2022-04-11 03:03:41,888-[agedb_30][36000]Accuracy-Flip: 0.97633+-0.00994
Training: 2022-04-11 03:03:41,888-[agedb_30][36000]Accuracy-Highest: 0.97667
Training: 2022-04-11 03:03:44,938-Speed 72.20 samples/sec   Loss 4.5605   LearningRate 0.0796   Epoch: 2   Global Step: 36010   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-11 03:03:47,989-Speed 3356.92 samples/sec   Loss 4.4777   LearningRate 0.0796   Epoch: 2   Global Step: 36020   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-11 03:03:51,092-Speed 3301.09 samples/sec   Loss 4.4463   LearningRate 0.0796   Epoch: 2   Global Step: 36030   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-11 03:03:54,275-Speed 3217.02 samples/sec   Loss 4.5307   LearningRate 0.0796   Epoch: 2   Global Step: 36040   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-11 03:03:57,334-Speed 3349.10 samples/sec   Loss 4.4512   LearningRate 0.0796   Epoch: 2   Global Step: 36050   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-11 03:04:00,519-Speed 3215.15 samples/sec   Loss 4.4399   LearningRate 0.0796   Epoch: 2   Global Step: 36060   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-11 03:04:03,649-Speed 3272.32 samples/sec   Loss 4.5663   LearningRate 0.0796   Epoch: 2   Global Step: 36070   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-11 03:04:06,899-Speed 3151.51 samples/sec   Loss 4.4762   LearningRate 0.0796   Epoch: 2   Global Step: 36080   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-11 03:04:10,149-Speed 3151.28 samples/sec   Loss 4.5679   LearningRate 0.0795   Epoch: 2   Global Step: 36090   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-11 03:04:13,350-Speed 3199.95 samples/sec   Loss 4.5063   LearningRate 0.0795   Epoch: 2   Global Step: 36100   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-11 03:04:16,432-Speed 3323.36 samples/sec   Loss 4.4949   LearningRate 0.0795   Epoch: 2   Global Step: 36110   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:04:19,548-Speed 3287.07 samples/sec   Loss 4.5021   LearningRate 0.0795   Epoch: 2   Global Step: 36120   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:04:22,666-Speed 3284.61 samples/sec   Loss 4.5713   LearningRate 0.0795   Epoch: 2   Global Step: 36130   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:04:25,760-Speed 3311.13 samples/sec   Loss 4.4935   LearningRate 0.0795   Epoch: 2   Global Step: 36140   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:04:28,856-Speed 3308.13 samples/sec   Loss 4.4889   LearningRate 0.0795   Epoch: 2   Global Step: 36150   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:04:31,952-Speed 3308.24 samples/sec   Loss 4.5443   LearningRate 0.0795   Epoch: 2   Global Step: 36160   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:04:35,027-Speed 3330.18 samples/sec   Loss 4.5896   LearningRate 0.0795   Epoch: 2   Global Step: 36170   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:04:38,113-Speed 3320.13 samples/sec   Loss 4.5607   LearningRate 0.0795   Epoch: 2   Global Step: 36180   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:04:41,173-Speed 3346.22 samples/sec   Loss 4.5594   LearningRate 0.0795   Epoch: 2   Global Step: 36190   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:04:44,248-Speed 3331.34 samples/sec   Loss 4.5410   LearningRate 0.0795   Epoch: 2   Global Step: 36200   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:04:47,366-Speed 3285.19 samples/sec   Loss 4.5326   LearningRate 0.0795   Epoch: 2   Global Step: 36210   Fp16 Grad Scale: 262144   Required: 31 hours
Training: 2022-04-11 03:04:50,443-Speed 3328.52 samples/sec   Loss 4.6154   LearningRate 0.0795   Epoch: 2   Global Step: 36220   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:04:53,547-Speed 3299.86 samples/sec   Loss 4.4832   LearningRate 0.0795   Epoch: 2   Global Step: 36230   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:04:56,695-Speed 3252.75 samples/sec   Loss 4.4449   LearningRate 0.0795   Epoch: 2   Global Step: 36240   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:04:59,813-Speed 3284.70 samples/sec   Loss 4.5587   LearningRate 0.0795   Epoch: 2   Global Step: 36250   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:05:02,886-Speed 3333.11 samples/sec   Loss 4.5822   LearningRate 0.0795   Epoch: 2   Global Step: 36260   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:05:05,952-Speed 3341.53 samples/sec   Loss 4.5586   LearningRate 0.0795   Epoch: 2   Global Step: 36270   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:05:09,084-Speed 3269.73 samples/sec   Loss 4.5145   LearningRate 0.0794   Epoch: 2   Global Step: 36280   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:05:12,166-Speed 3324.00 samples/sec   Loss 4.5201   LearningRate 0.0794   Epoch: 2   Global Step: 36290   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:05:15,233-Speed 3339.35 samples/sec   Loss 4.5861   LearningRate 0.0794   Epoch: 2   Global Step: 36300   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:05:18,320-Speed 3317.51 samples/sec   Loss 4.4988   LearningRate 0.0794   Epoch: 2   Global Step: 36310   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:05:21,379-Speed 3348.69 samples/sec   Loss 4.4934   LearningRate 0.0794   Epoch: 2   Global Step: 36320   Fp16 Grad Scale: 262144   Required: 31 hours
Training: 2022-04-11 03:05:24,450-Speed 3335.38 samples/sec   Loss 4.4966   LearningRate 0.0794   Epoch: 2   Global Step: 36330   Fp16 Grad Scale: 262144   Required: 31 hours
Training: 2022-04-11 03:05:27,497-Speed 3361.02 samples/sec   Loss 4.5484   LearningRate 0.0794   Epoch: 2   Global Step: 36340   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:05:30,570-Speed 3332.39 samples/sec   Loss 4.5309   LearningRate 0.0794   Epoch: 2   Global Step: 36350   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:05:33,630-Speed 3347.82 samples/sec   Loss 4.4729   LearningRate 0.0794   Epoch: 2   Global Step: 36360   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:05:36,693-Speed 3344.47 samples/sec   Loss 4.5008   LearningRate 0.0794   Epoch: 2   Global Step: 36370   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:05:39,751-Speed 3349.27 samples/sec   Loss 4.4539   LearningRate 0.0794   Epoch: 2   Global Step: 36380   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:05:42,814-Speed 3343.11 samples/sec   Loss 4.4324   LearningRate 0.0794   Epoch: 2   Global Step: 36390   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:05:45,894-Speed 3325.07 samples/sec   Loss 4.5120   LearningRate 0.0794   Epoch: 2   Global Step: 36400   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:05:48,962-Speed 3338.57 samples/sec   Loss 4.5097   LearningRate 0.0794   Epoch: 2   Global Step: 36410   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:05:52,117-Speed 3247.17 samples/sec   Loss 4.5334   LearningRate 0.0794   Epoch: 2   Global Step: 36420   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:05:55,225-Speed 3295.57 samples/sec   Loss 4.4878   LearningRate 0.0794   Epoch: 2   Global Step: 36430   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:05:58,339-Speed 3288.56 samples/sec   Loss 4.5589   LearningRate 0.0794   Epoch: 2   Global Step: 36440   Fp16 Grad Scale: 262144   Required: 31 hours
Training: 2022-04-11 03:06:01,444-Speed 3298.74 samples/sec   Loss 4.5191   LearningRate 0.0794   Epoch: 2   Global Step: 36450   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:06:04,514-Speed 3336.59 samples/sec   Loss 4.4805   LearningRate 0.0793   Epoch: 2   Global Step: 36460   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:06:07,574-Speed 3347.56 samples/sec   Loss 4.5760   LearningRate 0.0793   Epoch: 2   Global Step: 36470   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:06:10,661-Speed 3317.82 samples/sec   Loss 4.5103   LearningRate 0.0793   Epoch: 2   Global Step: 36480   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:06:13,730-Speed 3337.23 samples/sec   Loss 4.5869   LearningRate 0.0793   Epoch: 2   Global Step: 36490   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:06:16,791-Speed 3346.06 samples/sec   Loss 4.5171   LearningRate 0.0793   Epoch: 2   Global Step: 36500   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:06:19,854-Speed 3344.09 samples/sec   Loss 4.5242   LearningRate 0.0793   Epoch: 2   Global Step: 36510   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:06:22,954-Speed 3303.63 samples/sec   Loss 4.3564   LearningRate 0.0793   Epoch: 2   Global Step: 36520   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:06:26,030-Speed 3329.95 samples/sec   Loss 4.4526   LearningRate 0.0793   Epoch: 2   Global Step: 36530   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:06:29,094-Speed 3343.19 samples/sec   Loss 4.5762   LearningRate 0.0793   Epoch: 2   Global Step: 36540   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:06:32,143-Speed 3358.25 samples/sec   Loss 4.4192   LearningRate 0.0793   Epoch: 2   Global Step: 36550   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:06:35,243-Speed 3303.81 samples/sec   Loss 4.6186   LearningRate 0.0793   Epoch: 2   Global Step: 36560   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:06:38,322-Speed 3327.47 samples/sec   Loss 4.4739   LearningRate 0.0793   Epoch: 2   Global Step: 36570   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:06:41,384-Speed 3344.13 samples/sec   Loss 4.4925   LearningRate 0.0793   Epoch: 2   Global Step: 36580   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:06:44,450-Speed 3341.35 samples/sec   Loss 4.4805   LearningRate 0.0793   Epoch: 2   Global Step: 36590   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:06:47,514-Speed 3342.15 samples/sec   Loss 4.5662   LearningRate 0.0793   Epoch: 2   Global Step: 36600   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:06:50,596-Speed 3323.69 samples/sec   Loss 4.6450   LearningRate 0.0793   Epoch: 2   Global Step: 36610   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:06:53,668-Speed 3333.93 samples/sec   Loss 4.5387   LearningRate 0.0793   Epoch: 2   Global Step: 36620   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:06:56,810-Speed 3260.27 samples/sec   Loss 4.5332   LearningRate 0.0793   Epoch: 2   Global Step: 36630   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:06:59,924-Speed 3288.58 samples/sec   Loss 4.6070   LearningRate 0.0793   Epoch: 2   Global Step: 36640   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:07:03,014-Speed 3314.96 samples/sec   Loss 4.5583   LearningRate 0.0792   Epoch: 2   Global Step: 36650   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:07:06,081-Speed 3339.88 samples/sec   Loss 4.5367   LearningRate 0.0792   Epoch: 2   Global Step: 36660   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:07:09,143-Speed 3344.38 samples/sec   Loss 4.5568   LearningRate 0.0792   Epoch: 2   Global Step: 36670   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:07:12,203-Speed 3347.66 samples/sec   Loss 4.5423   LearningRate 0.0792   Epoch: 2   Global Step: 36680   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:07:15,272-Speed 3337.28 samples/sec   Loss 4.4409   LearningRate 0.0792   Epoch: 2   Global Step: 36690   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:07:18,330-Speed 3349.20 samples/sec   Loss 4.5812   LearningRate 0.0792   Epoch: 2   Global Step: 36700   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:07:21,420-Speed 3314.34 samples/sec   Loss 4.5005   LearningRate 0.0792   Epoch: 2   Global Step: 36710   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:07:24,493-Speed 3334.27 samples/sec   Loss 4.6019   LearningRate 0.0792   Epoch: 2   Global Step: 36720   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:07:27,554-Speed 3345.63 samples/sec   Loss 4.5701   LearningRate 0.0792   Epoch: 2   Global Step: 36730   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:07:30,624-Speed 3336.43 samples/sec   Loss 4.5696   LearningRate 0.0792   Epoch: 2   Global Step: 36740   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:07:33,710-Speed 3318.58 samples/sec   Loss 4.5350   LearningRate 0.0792   Epoch: 2   Global Step: 36750   Fp16 Grad Scale: 262144   Required: 31 hours
Training: 2022-04-11 03:07:36,806-Speed 3308.69 samples/sec   Loss 4.5164   LearningRate 0.0792   Epoch: 2   Global Step: 36760   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:07:39,962-Speed 3244.78 samples/sec   Loss 4.5390   LearningRate 0.0792   Epoch: 2   Global Step: 36770   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:07:43,023-Speed 3346.50 samples/sec   Loss 4.5758   LearningRate 0.0792   Epoch: 2   Global Step: 36780   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:07:46,088-Speed 3340.60 samples/sec   Loss 4.5881   LearningRate 0.0792   Epoch: 2   Global Step: 36790   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:07:49,197-Speed 3295.25 samples/sec   Loss 4.4869   LearningRate 0.0792   Epoch: 2   Global Step: 36800   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:07:52,297-Speed 3303.60 samples/sec   Loss 4.5610   LearningRate 0.0792   Epoch: 2   Global Step: 36810   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-11 03:07:55,365-Speed 3339.26 samples/sec   Loss 4.5356   LearningRate 0.0792   Epoch: 2   Global Step: 36820   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-11 03:07:58,441-Speed 3329.27 samples/sec   Loss 4.4889   LearningRate 0.0792   Epoch: 2   Global Step: 36830   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-11 03:08:01,512-Speed 3335.15 samples/sec   Loss 4.5616   LearningRate 0.0791   Epoch: 2   Global Step: 36840   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-11 03:08:04,599-Speed 3317.87 samples/sec   Loss 4.4495   LearningRate 0.0791   Epoch: 2   Global Step: 36850   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-11 03:08:07,659-Speed 3347.46 samples/sec   Loss 4.5263   LearningRate 0.0791   Epoch: 2   Global Step: 36860   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-11 03:08:10,718-Speed 3347.88 samples/sec   Loss 4.5733   LearningRate 0.0791   Epoch: 2   Global Step: 36870   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-11 03:08:13,795-Speed 3329.29 samples/sec   Loss 4.6153   LearningRate 0.0791   Epoch: 2   Global Step: 36880   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-11 03:08:16,880-Speed 3319.61 samples/sec   Loss 4.5752   LearningRate 0.0791   Epoch: 2   Global Step: 36890   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-11 03:08:19,944-Speed 3343.51 samples/sec   Loss 4.5340   LearningRate 0.0791   Epoch: 2   Global Step: 36900   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-11 03:08:23,001-Speed 3349.46 samples/sec   Loss 4.4855   LearningRate 0.0791   Epoch: 2   Global Step: 36910   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:08:26,072-Speed 3335.92 samples/sec   Loss 4.5735   LearningRate 0.0791   Epoch: 2   Global Step: 36920   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:08:29,173-Speed 3302.53 samples/sec   Loss 4.4960   LearningRate 0.0791   Epoch: 2   Global Step: 36930   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-11 03:08:32,232-Speed 3348.49 samples/sec   Loss 4.5537   LearningRate 0.0791   Epoch: 2   Global Step: 36940   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-11 03:08:35,310-Speed 3326.98 samples/sec   Loss 4.5562   LearningRate 0.0791   Epoch: 2   Global Step: 36950   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-11 03:08:38,384-Speed 3331.72 samples/sec   Loss 4.5506   LearningRate 0.0791   Epoch: 2   Global Step: 36960   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-11 03:08:41,514-Speed 3272.90 samples/sec   Loss 4.4294   LearningRate 0.0791   Epoch: 2   Global Step: 36970   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-11 03:08:44,655-Speed 3261.40 samples/sec   Loss 4.5079   LearningRate 0.0791   Epoch: 2   Global Step: 36980   Fp16 Grad Scale: 32768   Required: 31 hours
Training: 2022-04-11 03:08:47,758-Speed 3300.77 samples/sec   Loss 4.5863   LearningRate 0.0791   Epoch: 2   Global Step: 36990   Fp16 Grad Scale: 32768   Required: 31 hours
Training: 2022-04-11 03:08:50,838-Speed 3325.92 samples/sec   Loss 4.4713   LearningRate 0.0791   Epoch: 2   Global Step: 37000   Fp16 Grad Scale: 32768   Required: 31 hours
Training: 2022-04-11 03:08:53,917-Speed 3325.97 samples/sec   Loss 4.4830   LearningRate 0.0791   Epoch: 2   Global Step: 37010   Fp16 Grad Scale: 32768   Required: 31 hours
Training: 2022-04-11 03:08:56,981-Speed 3342.47 samples/sec   Loss 4.5677   LearningRate 0.0791   Epoch: 2   Global Step: 37020   Fp16 Grad Scale: 32768   Required: 31 hours
Training: 2022-04-11 03:09:00,054-Speed 3333.65 samples/sec   Loss 4.6211   LearningRate 0.0790   Epoch: 2   Global Step: 37030   Fp16 Grad Scale: 32768   Required: 31 hours
Training: 2022-04-11 03:09:03,120-Speed 3340.74 samples/sec   Loss 4.5255   LearningRate 0.0790   Epoch: 2   Global Step: 37040   Fp16 Grad Scale: 32768   Required: 31 hours
Training: 2022-04-11 03:09:06,180-Speed 3346.98 samples/sec   Loss 4.4846   LearningRate 0.0790   Epoch: 2   Global Step: 37050   Fp16 Grad Scale: 32768   Required: 31 hours
Training: 2022-04-11 03:09:09,247-Speed 3339.19 samples/sec   Loss 4.5096   LearningRate 0.0790   Epoch: 2   Global Step: 37060   Fp16 Grad Scale: 32768   Required: 31 hours
Training: 2022-04-11 03:09:12,322-Speed 3331.67 samples/sec   Loss 4.5323   LearningRate 0.0790   Epoch: 2   Global Step: 37070   Fp16 Grad Scale: 32768   Required: 31 hours
Training: 2022-04-11 03:09:15,390-Speed 3338.37 samples/sec   Loss 4.5451   LearningRate 0.0790   Epoch: 2   Global Step: 37080   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-11 03:09:18,594-Speed 3196.05 samples/sec   Loss 4.5291   LearningRate 0.0790   Epoch: 2   Global Step: 37090   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-11 03:09:21,724-Speed 3272.27 samples/sec   Loss 4.4828   LearningRate 0.0790   Epoch: 2   Global Step: 37100   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-11 03:09:24,889-Speed 3236.76 samples/sec   Loss 4.5073   LearningRate 0.0790   Epoch: 2   Global Step: 37110   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-11 03:09:28,033-Speed 3257.08 samples/sec   Loss 4.5548   LearningRate 0.0790   Epoch: 2   Global Step: 37120   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-11 03:09:31,153-Speed 3283.56 samples/sec   Loss 4.5162   LearningRate 0.0790   Epoch: 2   Global Step: 37130   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-11 03:09:34,296-Speed 3257.98 samples/sec   Loss 4.5295   LearningRate 0.0790   Epoch: 2   Global Step: 37140   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-11 03:09:37,399-Speed 3301.41 samples/sec   Loss 4.3699   LearningRate 0.0790   Epoch: 2   Global Step: 37150   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-11 03:09:40,593-Speed 3206.33 samples/sec   Loss 4.5518   LearningRate 0.0790   Epoch: 2   Global Step: 37160   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-11 03:09:43,659-Speed 3341.35 samples/sec   Loss 4.6493   LearningRate 0.0790   Epoch: 2   Global Step: 37170   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-11 03:09:46,806-Speed 3254.27 samples/sec   Loss 4.6317   LearningRate 0.0790   Epoch: 2   Global Step: 37180   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:09:49,940-Speed 3268.55 samples/sec   Loss 4.6014   LearningRate 0.0790   Epoch: 2   Global Step: 37190   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:09:53,053-Speed 3289.31 samples/sec   Loss 4.5639   LearningRate 0.0790   Epoch: 2   Global Step: 37200   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:09:56,114-Speed 3346.78 samples/sec   Loss 4.4985   LearningRate 0.0789   Epoch: 2   Global Step: 37210   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:09:59,184-Speed 3335.72 samples/sec   Loss 4.4815   LearningRate 0.0789   Epoch: 2   Global Step: 37220   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:10:02,249-Speed 3342.61 samples/sec   Loss 4.5018   LearningRate 0.0789   Epoch: 2   Global Step: 37230   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:10:05,318-Speed 3336.56 samples/sec   Loss 4.6040   LearningRate 0.0789   Epoch: 2   Global Step: 37240   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:10:08,381-Speed 3344.20 samples/sec   Loss 4.4699   LearningRate 0.0789   Epoch: 2   Global Step: 37250   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:10:11,460-Speed 3326.64 samples/sec   Loss 4.5678   LearningRate 0.0789   Epoch: 2   Global Step: 37260   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:10:14,540-Speed 3325.22 samples/sec   Loss 4.5357   LearningRate 0.0789   Epoch: 2   Global Step: 37270   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:10:17,611-Speed 3335.00 samples/sec   Loss 4.5773   LearningRate 0.0789   Epoch: 2   Global Step: 37280   Fp16 Grad Scale: 262144   Required: 31 hours
Training: 2022-04-11 03:10:20,663-Speed 3355.69 samples/sec   Loss 4.5344   LearningRate 0.0789   Epoch: 2   Global Step: 37290   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:10:23,727-Speed 3342.91 samples/sec   Loss 4.4995   LearningRate 0.0789   Epoch: 2   Global Step: 37300   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:10:26,798-Speed 3335.49 samples/sec   Loss 4.5354   LearningRate 0.0789   Epoch: 2   Global Step: 37310   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:10:29,881-Speed 3322.11 samples/sec   Loss 4.5236   LearningRate 0.0789   Epoch: 2   Global Step: 37320   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:10:32,954-Speed 3333.21 samples/sec   Loss 4.5084   LearningRate 0.0789   Epoch: 2   Global Step: 37330   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:10:36,020-Speed 3340.46 samples/sec   Loss 4.4552   LearningRate 0.0789   Epoch: 2   Global Step: 37340   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:10:39,089-Speed 3337.72 samples/sec   Loss 4.5713   LearningRate 0.0789   Epoch: 2   Global Step: 37350   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:10:42,222-Speed 3269.52 samples/sec   Loss 4.5842   LearningRate 0.0789   Epoch: 2   Global Step: 37360   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:10:45,365-Speed 3258.57 samples/sec   Loss 4.4426   LearningRate 0.0789   Epoch: 2   Global Step: 37370   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:10:48,473-Speed 3294.74 samples/sec   Loss 4.5537   LearningRate 0.0789   Epoch: 2   Global Step: 37380   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:10:51,523-Speed 3357.75 samples/sec   Loss 4.5266   LearningRate 0.0789   Epoch: 2   Global Step: 37390   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:10:54,612-Speed 3316.92 samples/sec   Loss 4.5567   LearningRate 0.0788   Epoch: 2   Global Step: 37400   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:10:57,673-Speed 3345.58 samples/sec   Loss 4.6109   LearningRate 0.0788   Epoch: 2   Global Step: 37410   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:11:00,748-Speed 3331.40 samples/sec   Loss 4.5186   LearningRate 0.0788   Epoch: 2   Global Step: 37420   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:11:03,817-Speed 3337.07 samples/sec   Loss 4.4685   LearningRate 0.0788   Epoch: 2   Global Step: 37430   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:11:06,886-Speed 3337.90 samples/sec   Loss 4.5340   LearningRate 0.0788   Epoch: 2   Global Step: 37440   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:11:09,959-Speed 3332.34 samples/sec   Loss 4.4853   LearningRate 0.0788   Epoch: 2   Global Step: 37450   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:11:13,053-Speed 3310.64 samples/sec   Loss 4.5717   LearningRate 0.0788   Epoch: 2   Global Step: 37460   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:11:16,126-Speed 3332.59 samples/sec   Loss 4.5983   LearningRate 0.0788   Epoch: 2   Global Step: 37470   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:11:19,212-Speed 3319.82 samples/sec   Loss 4.5901   LearningRate 0.0788   Epoch: 2   Global Step: 37480   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:11:22,283-Speed 3334.04 samples/sec   Loss 4.5887   LearningRate 0.0788   Epoch: 2   Global Step: 37490   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:11:25,354-Speed 3335.71 samples/sec   Loss 4.5518   LearningRate 0.0788   Epoch: 2   Global Step: 37500   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:11:28,438-Speed 3321.20 samples/sec   Loss 4.5879   LearningRate 0.0788   Epoch: 2   Global Step: 37510   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:11:31,518-Speed 3326.42 samples/sec   Loss 4.4911   LearningRate 0.0788   Epoch: 2   Global Step: 37520   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:11:34,587-Speed 3337.20 samples/sec   Loss 4.4869   LearningRate 0.0788   Epoch: 2   Global Step: 37530   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:11:37,662-Speed 3330.65 samples/sec   Loss 4.4916   LearningRate 0.0788   Epoch: 2   Global Step: 37540   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:11:40,729-Speed 3338.94 samples/sec   Loss 4.5715   LearningRate 0.0788   Epoch: 2   Global Step: 37550   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:11:43,792-Speed 3344.33 samples/sec   Loss 4.5105   LearningRate 0.0788   Epoch: 2   Global Step: 37560   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:11:46,875-Speed 3322.13 samples/sec   Loss 4.5586   LearningRate 0.0788   Epoch: 2   Global Step: 37570   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:11:49,936-Speed 3346.13 samples/sec   Loss 4.6388   LearningRate 0.0788   Epoch: 2   Global Step: 37580   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:11:53,019-Speed 3322.95 samples/sec   Loss 4.4984   LearningRate 0.0787   Epoch: 2   Global Step: 37590   Fp16 Grad Scale: 262144   Required: 31 hours
Training: 2022-04-11 03:11:56,079-Speed 3346.73 samples/sec   Loss 4.4758   LearningRate 0.0787   Epoch: 2   Global Step: 37600   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:11:59,152-Speed 3333.58 samples/sec   Loss 4.5223   LearningRate 0.0787   Epoch: 2   Global Step: 37610   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:12:02,229-Speed 3328.19 samples/sec   Loss 4.5232   LearningRate 0.0787   Epoch: 2   Global Step: 37620   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:12:05,295-Speed 3340.48 samples/sec   Loss 4.5870   LearningRate 0.0787   Epoch: 2   Global Step: 37630   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:12:08,424-Speed 3273.22 samples/sec   Loss 4.4772   LearningRate 0.0787   Epoch: 2   Global Step: 37640   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:12:11,499-Speed 3331.28 samples/sec   Loss 4.3806   LearningRate 0.0787   Epoch: 2   Global Step: 37650   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:12:14,574-Speed 3330.47 samples/sec   Loss 4.5562   LearningRate 0.0787   Epoch: 2   Global Step: 37660   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:12:17,644-Speed 3336.23 samples/sec   Loss 4.5346   LearningRate 0.0787   Epoch: 2   Global Step: 37670   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:12:20,734-Speed 3314.96 samples/sec   Loss 4.4655   LearningRate 0.0787   Epoch: 2   Global Step: 37680   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:12:23,812-Speed 3328.04 samples/sec   Loss 4.3793   LearningRate 0.0787   Epoch: 2   Global Step: 37690   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:12:26,880-Speed 3337.67 samples/sec   Loss 4.4885   LearningRate 0.0787   Epoch: 2   Global Step: 37700   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:12:29,945-Speed 3342.40 samples/sec   Loss 4.5817   LearningRate 0.0787   Epoch: 2   Global Step: 37710   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:12:33,010-Speed 3341.49 samples/sec   Loss 4.4939   LearningRate 0.0787   Epoch: 2   Global Step: 37720   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:12:36,075-Speed 3341.90 samples/sec   Loss 4.5458   LearningRate 0.0787   Epoch: 2   Global Step: 37730   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:12:39,186-Speed 3292.43 samples/sec   Loss 4.4329   LearningRate 0.0787   Epoch: 2   Global Step: 37740   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:12:42,272-Speed 3318.59 samples/sec   Loss 4.4718   LearningRate 0.0787   Epoch: 2   Global Step: 37750   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:12:45,338-Speed 3340.94 samples/sec   Loss 4.5888   LearningRate 0.0787   Epoch: 2   Global Step: 37760   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:12:48,413-Speed 3330.52 samples/sec   Loss 4.4546   LearningRate 0.0787   Epoch: 2   Global Step: 37770   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:12:51,475-Speed 3345.64 samples/sec   Loss 4.5589   LearningRate 0.0786   Epoch: 2   Global Step: 37780   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:12:54,549-Speed 3331.87 samples/sec   Loss 4.5498   LearningRate 0.0786   Epoch: 2   Global Step: 37790   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:12:57,611-Speed 3344.64 samples/sec   Loss 4.5666   LearningRate 0.0786   Epoch: 2   Global Step: 37800   Fp16 Grad Scale: 262144   Required: 31 hours
Training: 2022-04-11 03:13:00,687-Speed 3329.34 samples/sec   Loss 4.5006   LearningRate 0.0786   Epoch: 2   Global Step: 37810   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:13:03,788-Speed 3303.72 samples/sec   Loss 4.4889   LearningRate 0.0786   Epoch: 2   Global Step: 37820   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:13:06,965-Speed 3223.39 samples/sec   Loss 4.5017   LearningRate 0.0786   Epoch: 2   Global Step: 37830   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:13:10,055-Speed 3314.40 samples/sec   Loss 4.4487   LearningRate 0.0786   Epoch: 2   Global Step: 37840   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:13:13,136-Speed 3323.96 samples/sec   Loss 4.5211   LearningRate 0.0786   Epoch: 2   Global Step: 37850   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:13:16,199-Speed 3345.42 samples/sec   Loss 4.6045   LearningRate 0.0786   Epoch: 2   Global Step: 37860   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:13:19,265-Speed 3339.73 samples/sec   Loss 4.5213   LearningRate 0.0786   Epoch: 2   Global Step: 37870   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:13:22,356-Speed 3314.11 samples/sec   Loss 4.5535   LearningRate 0.0786   Epoch: 2   Global Step: 37880   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:13:25,445-Speed 3315.24 samples/sec   Loss 4.5381   LearningRate 0.0786   Epoch: 2   Global Step: 37890   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:13:28,530-Speed 3320.57 samples/sec   Loss 4.5133   LearningRate 0.0786   Epoch: 2   Global Step: 37900   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:13:31,595-Speed 3341.11 samples/sec   Loss 4.5681   LearningRate 0.0786   Epoch: 2   Global Step: 37910   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-11 03:13:34,665-Speed 3336.73 samples/sec   Loss 4.4710   LearningRate 0.0786   Epoch: 2   Global Step: 37920   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-11 03:13:37,734-Speed 3336.88 samples/sec   Loss 4.4750   LearningRate 0.0786   Epoch: 2   Global Step: 37930   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-11 03:13:40,819-Speed 3320.71 samples/sec   Loss 4.5279   LearningRate 0.0786   Epoch: 2   Global Step: 37940   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-11 03:13:43,901-Speed 3323.51 samples/sec   Loss 4.4942   LearningRate 0.0786   Epoch: 2   Global Step: 37950   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-11 03:13:46,965-Speed 3343.25 samples/sec   Loss 4.4826   LearningRate 0.0786   Epoch: 2   Global Step: 37960   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-11 03:13:50,051-Speed 3318.63 samples/sec   Loss 4.5626   LearningRate 0.0785   Epoch: 2   Global Step: 37970   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-11 03:13:53,112-Speed 3345.61 samples/sec   Loss 4.4511   LearningRate 0.0785   Epoch: 2   Global Step: 37980   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-11 03:13:56,179-Speed 3339.87 samples/sec   Loss 4.5027   LearningRate 0.0785   Epoch: 2   Global Step: 37990   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-11 03:13:59,239-Speed 3346.64 samples/sec   Loss 4.4193   LearningRate 0.0785   Epoch: 2   Global Step: 38000   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-11 03:14:42,998-[lfw][38000]XNorm: 22.292006
Training: 2022-04-11 03:14:42,998-[lfw][38000]Accuracy-Flip: 0.99733+-0.00291
Training: 2022-04-11 03:14:42,999-[lfw][38000]Accuracy-Highest: 0.99767
Training: 2022-04-11 03:15:33,888-[cfp_fp][38000]XNorm: 20.497811
Training: 2022-04-11 03:15:33,889-[cfp_fp][38000]Accuracy-Flip: 0.98143+-0.00452
Training: 2022-04-11 03:15:33,889-[cfp_fp][38000]Accuracy-Highest: 0.98143
Training: 2022-04-11 03:16:17,562-[agedb_30][38000]XNorm: 22.437197
Training: 2022-04-11 03:16:17,563-[agedb_30][38000]Accuracy-Flip: 0.97467+-0.00903
Training: 2022-04-11 03:16:17,563-[agedb_30][38000]Accuracy-Highest: 0.97667
Training: 2022-04-11 03:16:20,636-Speed 72.42 samples/sec   Loss 4.4865   LearningRate 0.0785   Epoch: 2   Global Step: 38010   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:16:23,697-Speed 3346.46 samples/sec   Loss 4.5176   LearningRate 0.0785   Epoch: 2   Global Step: 38020   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:16:26,766-Speed 3337.57 samples/sec   Loss 4.4998   LearningRate 0.0785   Epoch: 2   Global Step: 38030   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:16:29,838-Speed 3333.96 samples/sec   Loss 4.4981   LearningRate 0.0785   Epoch: 2   Global Step: 38040   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:16:32,901-Speed 3343.10 samples/sec   Loss 4.5047   LearningRate 0.0785   Epoch: 2   Global Step: 38050   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:16:35,970-Speed 3337.68 samples/sec   Loss 4.4945   LearningRate 0.0785   Epoch: 2   Global Step: 38060   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:16:39,024-Speed 3354.77 samples/sec   Loss 4.4811   LearningRate 0.0785   Epoch: 2   Global Step: 38070   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:16:42,083-Speed 3348.35 samples/sec   Loss 4.4961   LearningRate 0.0785   Epoch: 2   Global Step: 38080   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:16:45,245-Speed 3239.10 samples/sec   Loss 4.4950   LearningRate 0.0785   Epoch: 2   Global Step: 38090   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:16:48,303-Speed 3349.30 samples/sec   Loss 4.6078   LearningRate 0.0785   Epoch: 2   Global Step: 38100   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:16:51,364-Speed 3345.33 samples/sec   Loss 4.5044   LearningRate 0.0785   Epoch: 2   Global Step: 38110   Fp16 Grad Scale: 262144   Required: 31 hours
Training: 2022-04-11 03:16:54,412-Speed 3360.37 samples/sec   Loss 4.5311   LearningRate 0.0785   Epoch: 2   Global Step: 38120   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:16:57,475-Speed 3344.25 samples/sec   Loss 4.5646   LearningRate 0.0785   Epoch: 2   Global Step: 38130   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:17:00,555-Speed 3325.63 samples/sec   Loss 4.4584   LearningRate 0.0785   Epoch: 2   Global Step: 38140   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:17:03,610-Speed 3352.26 samples/sec   Loss 4.3712   LearningRate 0.0784   Epoch: 2   Global Step: 38150   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-11 03:17:06,672-Speed 3345.18 samples/sec   Loss 4.6184   LearningRate 0.0784   Epoch: 2   Global Step: 38160   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-11 03:17:09,748-Speed 3330.36 samples/sec   Loss 4.5823   LearningRate 0.0784   Epoch: 2   Global Step: 38170   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-11 03:17:12,823-Speed 3330.45 samples/sec   Loss 4.5845   LearningRate 0.0784   Epoch: 2   Global Step: 38180   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-11 03:17:15,958-Speed 3266.54 samples/sec   Loss 4.5324   LearningRate 0.0784   Epoch: 2   Global Step: 38190   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-11 03:17:19,050-Speed 3313.10 samples/sec   Loss 4.4308   LearningRate 0.0784   Epoch: 2   Global Step: 38200   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-11 03:17:22,113-Speed 3343.71 samples/sec   Loss 4.4964   LearningRate 0.0784   Epoch: 2   Global Step: 38210   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-11 03:17:25,172-Speed 3348.33 samples/sec   Loss 4.4875   LearningRate 0.0784   Epoch: 2   Global Step: 38220   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-11 03:17:28,235-Speed 3343.55 samples/sec   Loss 4.4346   LearningRate 0.0784   Epoch: 2   Global Step: 38230   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-11 03:17:31,449-Speed 3187.51 samples/sec   Loss 4.5237   LearningRate 0.0784   Epoch: 2   Global Step: 38240   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-11 03:17:34,534-Speed 3320.06 samples/sec   Loss 4.5103   LearningRate 0.0784   Epoch: 2   Global Step: 38250   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:17:37,689-Speed 3245.98 samples/sec   Loss 4.5567   LearningRate 0.0784   Epoch: 2   Global Step: 38260   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:17:40,742-Speed 3354.80 samples/sec   Loss 4.4774   LearningRate 0.0784   Epoch: 2   Global Step: 38270   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-11 03:17:43,821-Speed 3326.94 samples/sec   Loss 4.4743   LearningRate 0.0784   Epoch: 2   Global Step: 38280   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-11 03:17:46,890-Speed 3337.46 samples/sec   Loss 4.5329   LearningRate 0.0784   Epoch: 2   Global Step: 38290   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-11 03:17:49,948-Speed 3348.62 samples/sec   Loss 4.4568   LearningRate 0.0784   Epoch: 2   Global Step: 38300   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-11 03:17:53,013-Speed 3342.17 samples/sec   Loss 4.5162   LearningRate 0.0784   Epoch: 2   Global Step: 38310   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-11 03:17:56,076-Speed 3346.53 samples/sec   Loss 4.5004   LearningRate 0.0784   Epoch: 2   Global Step: 38320   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-11 03:17:59,148-Speed 3333.98 samples/sec   Loss 4.5818   LearningRate 0.0784   Epoch: 2   Global Step: 38330   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-11 03:18:02,205-Speed 3350.09 samples/sec   Loss 4.5193   LearningRate 0.0783   Epoch: 2   Global Step: 38340   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-11 03:18:05,275-Speed 3336.16 samples/sec   Loss 4.5577   LearningRate 0.0783   Epoch: 2   Global Step: 38350   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-11 03:18:08,333-Speed 3349.38 samples/sec   Loss 4.5833   LearningRate 0.0783   Epoch: 2   Global Step: 38360   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-11 03:18:11,393-Speed 3347.73 samples/sec   Loss 4.4233   LearningRate 0.0783   Epoch: 2   Global Step: 38370   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:18:14,461-Speed 3337.98 samples/sec   Loss 4.5425   LearningRate 0.0783   Epoch: 2   Global Step: 38380   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:18:17,516-Speed 3352.99 samples/sec   Loss 4.5378   LearningRate 0.0783   Epoch: 2   Global Step: 38390   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:18:20,614-Speed 3305.57 samples/sec   Loss 4.5491   LearningRate 0.0783   Epoch: 2   Global Step: 38400   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:18:23,712-Speed 3306.54 samples/sec   Loss 4.4726   LearningRate 0.0783   Epoch: 2   Global Step: 38410   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:18:26,775-Speed 3343.90 samples/sec   Loss 4.5229   LearningRate 0.0783   Epoch: 2   Global Step: 38420   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:18:29,834-Speed 3349.04 samples/sec   Loss 4.5452   LearningRate 0.0783   Epoch: 2   Global Step: 38430   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:18:32,894-Speed 3347.28 samples/sec   Loss 4.5972   LearningRate 0.0783   Epoch: 2   Global Step: 38440   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:18:35,964-Speed 3335.71 samples/sec   Loss 4.5922   LearningRate 0.0783   Epoch: 2   Global Step: 38450   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:18:39,023-Speed 3348.01 samples/sec   Loss 4.5851   LearningRate 0.0783   Epoch: 2   Global Step: 38460   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:18:42,081-Speed 3349.59 samples/sec   Loss 4.4907   LearningRate 0.0783   Epoch: 2   Global Step: 38470   Fp16 Grad Scale: 262144   Required: 31 hours
Training: 2022-04-11 03:18:45,134-Speed 3354.72 samples/sec   Loss 4.5989   LearningRate 0.0783   Epoch: 2   Global Step: 38480   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:18:48,212-Speed 3327.36 samples/sec   Loss 4.5131   LearningRate 0.0783   Epoch: 2   Global Step: 38490   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:18:51,276-Speed 3343.78 samples/sec   Loss 4.4488   LearningRate 0.0783   Epoch: 2   Global Step: 38500   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:18:54,332-Speed 3351.26 samples/sec   Loss 4.4484   LearningRate 0.0783   Epoch: 2   Global Step: 38510   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:18:57,398-Speed 3340.49 samples/sec   Loss 4.4926   LearningRate 0.0783   Epoch: 2   Global Step: 38520   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:19:00,461-Speed 3344.51 samples/sec   Loss 4.5009   LearningRate 0.0782   Epoch: 2   Global Step: 38530   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:19:03,519-Speed 3348.53 samples/sec   Loss 4.5578   LearningRate 0.0782   Epoch: 2   Global Step: 38540   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:19:06,586-Speed 3339.87 samples/sec   Loss 4.4813   LearningRate 0.0782   Epoch: 2   Global Step: 38550   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:19:09,648-Speed 3344.73 samples/sec   Loss 4.5397   LearningRate 0.0782   Epoch: 2   Global Step: 38560   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:19:12,718-Speed 3337.15 samples/sec   Loss 4.4554   LearningRate 0.0782   Epoch: 2   Global Step: 38570   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:19:15,773-Speed 3351.98 samples/sec   Loss 4.4273   LearningRate 0.0782   Epoch: 2   Global Step: 38580   Fp16 Grad Scale: 262144   Required: 31 hours
Training: 2022-04-11 03:19:18,824-Speed 3357.25 samples/sec   Loss 4.4635   LearningRate 0.0782   Epoch: 2   Global Step: 38590   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:19:21,885-Speed 3345.81 samples/sec   Loss 4.4909   LearningRate 0.0782   Epoch: 2   Global Step: 38600   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:19:24,945-Speed 3348.23 samples/sec   Loss 4.4403   LearningRate 0.0782   Epoch: 2   Global Step: 38610   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:19:27,991-Speed 3362.06 samples/sec   Loss 4.4892   LearningRate 0.0782   Epoch: 2   Global Step: 38620   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-11 03:19:31,050-Speed 3348.44 samples/sec   Loss 4.4851   LearningRate 0.0782   Epoch: 2   Global Step: 38630   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-11 03:19:34,133-Speed 3322.19 samples/sec   Loss 4.5588   LearningRate 0.0782   Epoch: 2   Global Step: 38640   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-11 03:19:37,217-Speed 3320.54 samples/sec   Loss 4.5512   LearningRate 0.0782   Epoch: 2   Global Step: 38650   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-11 03:19:40,303-Speed 3319.30 samples/sec   Loss 4.6355   LearningRate 0.0782   Epoch: 2   Global Step: 38660   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-11 03:19:43,366-Speed 3344.21 samples/sec   Loss 4.5036   LearningRate 0.0782   Epoch: 2   Global Step: 38670   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-11 03:19:46,436-Speed 3336.32 samples/sec   Loss 4.5166   LearningRate 0.0782   Epoch: 2   Global Step: 38680   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-11 03:19:49,495-Speed 3347.89 samples/sec   Loss 4.6222   LearningRate 0.0782   Epoch: 2   Global Step: 38690   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-11 03:19:52,554-Speed 3348.62 samples/sec   Loss 4.5032   LearningRate 0.0782   Epoch: 2   Global Step: 38700   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-11 03:19:55,642-Speed 3316.97 samples/sec   Loss 4.5876   LearningRate 0.0782   Epoch: 2   Global Step: 38710   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-11 03:19:58,719-Speed 3328.32 samples/sec   Loss 4.5235   LearningRate 0.0781   Epoch: 2   Global Step: 38720   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:20:01,781-Speed 3345.25 samples/sec   Loss 4.4867   LearningRate 0.0781   Epoch: 2   Global Step: 38730   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:20:04,841-Speed 3347.00 samples/sec   Loss 4.5278   LearningRate 0.0781   Epoch: 2   Global Step: 38740   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:20:07,909-Speed 3338.43 samples/sec   Loss 4.5423   LearningRate 0.0781   Epoch: 2   Global Step: 38750   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:20:10,976-Speed 3339.86 samples/sec   Loss 4.4836   LearningRate 0.0781   Epoch: 2   Global Step: 38760   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:20:14,871-Speed 2629.64 samples/sec   Loss 4.5010   LearningRate 0.0781   Epoch: 2   Global Step: 38770   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-11 03:20:17,937-Speed 3340.99 samples/sec   Loss 4.5427   LearningRate 0.0781   Epoch: 2   Global Step: 38780   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-11 03:20:21,040-Speed 3301.48 samples/sec   Loss 4.5314   LearningRate 0.0781   Epoch: 2   Global Step: 38790   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-11 03:20:24,122-Speed 3323.02 samples/sec   Loss 4.4873   LearningRate 0.0781   Epoch: 2   Global Step: 38800   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-11 03:20:27,187-Speed 3341.18 samples/sec   Loss 4.5021   LearningRate 0.0781   Epoch: 2   Global Step: 38810   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-11 03:20:30,253-Speed 3340.52 samples/sec   Loss 4.4888   LearningRate 0.0781   Epoch: 2   Global Step: 38820   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-11 03:20:33,318-Speed 3342.13 samples/sec   Loss 4.5725   LearningRate 0.0781   Epoch: 2   Global Step: 38830   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-11 03:20:36,386-Speed 3338.52 samples/sec   Loss 4.4205   LearningRate 0.0781   Epoch: 2   Global Step: 38840   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-11 03:20:39,461-Speed 3330.60 samples/sec   Loss 4.5755   LearningRate 0.0781   Epoch: 2   Global Step: 38850   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-11 03:20:42,536-Speed 3330.98 samples/sec   Loss 4.4103   LearningRate 0.0781   Epoch: 2   Global Step: 38860   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-11 03:20:45,605-Speed 3337.21 samples/sec   Loss 4.4598   LearningRate 0.0781   Epoch: 2   Global Step: 38870   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:20:48,671-Speed 3340.94 samples/sec   Loss 4.5487   LearningRate 0.0781   Epoch: 2   Global Step: 38880   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:20:51,769-Speed 3306.59 samples/sec   Loss 4.4936   LearningRate 0.0781   Epoch: 2   Global Step: 38890   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:20:54,860-Speed 3313.97 samples/sec   Loss 4.4277   LearningRate 0.0781   Epoch: 2   Global Step: 38900   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:20:57,957-Speed 3306.38 samples/sec   Loss 4.4328   LearningRate 0.0780   Epoch: 2   Global Step: 38910   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:21:01,016-Speed 3348.57 samples/sec   Loss 4.4482   LearningRate 0.0780   Epoch: 2   Global Step: 38920   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-11 03:21:04,080-Speed 3343.50 samples/sec   Loss 4.4897   LearningRate 0.0780   Epoch: 2   Global Step: 38930   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-11 03:21:07,143-Speed 3343.51 samples/sec   Loss 4.4920   LearningRate 0.0780   Epoch: 2   Global Step: 38940   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-11 03:21:10,211-Speed 3338.61 samples/sec   Loss 4.5270   LearningRate 0.0780   Epoch: 2   Global Step: 38950   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-11 03:21:13,284-Speed 3332.93 samples/sec   Loss 4.5010   LearningRate 0.0780   Epoch: 2   Global Step: 38960   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-11 03:21:16,358-Speed 3331.79 samples/sec   Loss 4.5079   LearningRate 0.0780   Epoch: 2   Global Step: 38970   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-11 03:21:19,420-Speed 3345.18 samples/sec   Loss 4.4900   LearningRate 0.0780   Epoch: 2   Global Step: 38980   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-11 03:21:22,482-Speed 3345.41 samples/sec   Loss 4.5571   LearningRate 0.0780   Epoch: 2   Global Step: 38990   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-11 03:21:25,540-Speed 3349.24 samples/sec   Loss 4.5493   LearningRate 0.0780   Epoch: 2   Global Step: 39000   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-11 03:21:28,598-Speed 3349.22 samples/sec   Loss 4.4822   LearningRate 0.0780   Epoch: 2   Global Step: 39010   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-11 03:21:31,667-Speed 3336.92 samples/sec   Loss 4.5193   LearningRate 0.0780   Epoch: 2   Global Step: 39020   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:21:34,731-Speed 3342.79 samples/sec   Loss 4.5462   LearningRate 0.0780   Epoch: 2   Global Step: 39030   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:21:37,804-Speed 3333.39 samples/sec   Loss 4.5491   LearningRate 0.0780   Epoch: 2   Global Step: 39040   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:21:40,868-Speed 3342.46 samples/sec   Loss 4.5077   LearningRate 0.0780   Epoch: 2   Global Step: 39050   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:21:43,930-Speed 3345.22 samples/sec   Loss 4.5697   LearningRate 0.0780   Epoch: 2   Global Step: 39060   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-11 03:21:46,993-Speed 3343.98 samples/sec   Loss 4.5407   LearningRate 0.0780   Epoch: 2   Global Step: 39070   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-11 03:21:50,058-Speed 3342.69 samples/sec   Loss 4.5032   LearningRate 0.0780   Epoch: 2   Global Step: 39080   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-11 03:21:53,117-Speed 3347.71 samples/sec   Loss 4.4623   LearningRate 0.0780   Epoch: 2   Global Step: 39090   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-11 03:22:00,633-Speed 1362.55 samples/sec   Loss 4.5287   LearningRate 0.0779   Epoch: 2   Global Step: 39100   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-11 03:22:03,691-Speed 3349.31 samples/sec   Loss 4.5666   LearningRate 0.0779   Epoch: 2   Global Step: 39110   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-11 03:22:06,761-Speed 3337.00 samples/sec   Loss 4.4952   LearningRate 0.0779   Epoch: 2   Global Step: 39120   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-11 03:22:09,824-Speed 3343.16 samples/sec   Loss 4.4685   LearningRate 0.0779   Epoch: 2   Global Step: 39130   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-11 03:22:12,920-Speed 3309.14 samples/sec   Loss 4.3911   LearningRate 0.0779   Epoch: 2   Global Step: 39140   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-11 03:22:15,981-Speed 3346.09 samples/sec   Loss 4.3897   LearningRate 0.0779   Epoch: 2   Global Step: 39150   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-11 03:22:19,044-Speed 3343.75 samples/sec   Loss 4.5509   LearningRate 0.0779   Epoch: 2   Global Step: 39160   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:22:22,126-Speed 3323.28 samples/sec   Loss 4.5361   LearningRate 0.0779   Epoch: 2   Global Step: 39170   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:22:25,190-Speed 3342.90 samples/sec   Loss 4.4894   LearningRate 0.0779   Epoch: 2   Global Step: 39180   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:22:28,262-Speed 3333.69 samples/sec   Loss 4.5229   LearningRate 0.0779   Epoch: 2   Global Step: 39190   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:22:31,325-Speed 3343.44 samples/sec   Loss 4.4645   LearningRate 0.0779   Epoch: 2   Global Step: 39200   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:22:34,459-Speed 3268.31 samples/sec   Loss 4.4799   LearningRate 0.0779   Epoch: 2   Global Step: 39210   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:22:37,548-Speed 3316.40 samples/sec   Loss 4.5252   LearningRate 0.0779   Epoch: 2   Global Step: 39220   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:22:40,604-Speed 3351.30 samples/sec   Loss 4.5216   LearningRate 0.0779   Epoch: 2   Global Step: 39230   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:22:43,664-Speed 3347.02 samples/sec   Loss 4.5873   LearningRate 0.0779   Epoch: 2   Global Step: 39240   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:22:46,760-Speed 3308.39 samples/sec   Loss 4.5134   LearningRate 0.0779   Epoch: 2   Global Step: 39250   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:22:49,907-Speed 3256.78 samples/sec   Loss 4.4954   LearningRate 0.0779   Epoch: 2   Global Step: 39260   Fp16 Grad Scale: 262144   Required: 31 hours
Training: 2022-04-11 03:22:52,970-Speed 3344.83 samples/sec   Loss 4.4555   LearningRate 0.0779   Epoch: 2   Global Step: 39270   Fp16 Grad Scale: 262144   Required: 31 hours
Training: 2022-04-11 03:22:56,017-Speed 3361.48 samples/sec   Loss 4.6040   LearningRate 0.0779   Epoch: 2   Global Step: 39280   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:22:59,106-Speed 3315.09 samples/sec   Loss 4.5443   LearningRate 0.0778   Epoch: 2   Global Step: 39290   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-11 03:23:02,208-Speed 3301.99 samples/sec   Loss 4.5111   LearningRate 0.0778   Epoch: 2   Global Step: 39300   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-11 03:23:05,272-Speed 3344.52 samples/sec   Loss 4.4097   LearningRate 0.0778   Epoch: 2   Global Step: 39310   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-11 03:23:08,328-Speed 3351.13 samples/sec   Loss 4.4958   LearningRate 0.0778   Epoch: 2   Global Step: 39320   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-11 03:23:11,385-Speed 3351.04 samples/sec   Loss 4.5128   LearningRate 0.0778   Epoch: 2   Global Step: 39330   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-11 03:23:14,509-Speed 3278.87 samples/sec   Loss 4.5344   LearningRate 0.0778   Epoch: 2   Global Step: 39340   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-11 03:23:17,574-Speed 3340.80 samples/sec   Loss 4.5431   LearningRate 0.0778   Epoch: 2   Global Step: 39350   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-11 03:23:20,656-Speed 3323.01 samples/sec   Loss 4.4735   LearningRate 0.0778   Epoch: 2   Global Step: 39360   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-11 03:23:23,735-Speed 3327.39 samples/sec   Loss 4.4524   LearningRate 0.0778   Epoch: 2   Global Step: 39370   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-11 03:23:26,801-Speed 3340.59 samples/sec   Loss 4.4851   LearningRate 0.0778   Epoch: 2   Global Step: 39380   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-11 03:23:29,875-Speed 3331.62 samples/sec   Loss 4.4651   LearningRate 0.0778   Epoch: 2   Global Step: 39390   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:23:32,954-Speed 3326.23 samples/sec   Loss 4.4996   LearningRate 0.0778   Epoch: 2   Global Step: 39400   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:23:36,017-Speed 3343.96 samples/sec   Loss 4.4807   LearningRate 0.0778   Epoch: 2   Global Step: 39410   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:23:39,080-Speed 3344.05 samples/sec   Loss 4.4594   LearningRate 0.0778   Epoch: 2   Global Step: 39420   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:23:42,146-Speed 3341.31 samples/sec   Loss 4.4940   LearningRate 0.0778   Epoch: 2   Global Step: 39430   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:23:45,283-Speed 3264.72 samples/sec   Loss 4.5693   LearningRate 0.0778   Epoch: 2   Global Step: 39440   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:23:48,370-Speed 3318.05 samples/sec   Loss 4.5793   LearningRate 0.0778   Epoch: 2   Global Step: 39450   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:23:51,515-Speed 3256.33 samples/sec   Loss 4.5034   LearningRate 0.0778   Epoch: 2   Global Step: 39460   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:23:54,599-Speed 3321.10 samples/sec   Loss 4.5394   LearningRate 0.0778   Epoch: 2   Global Step: 39470   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:23:57,737-Speed 3263.70 samples/sec   Loss 4.4735   LearningRate 0.0777   Epoch: 2   Global Step: 39480   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:24:00,818-Speed 3324.70 samples/sec   Loss 4.4595   LearningRate 0.0777   Epoch: 2   Global Step: 39490   Fp16 Grad Scale: 262144   Required: 31 hours
Training: 2022-04-11 03:24:03,962-Speed 3258.12 samples/sec   Loss 4.3939   LearningRate 0.0777   Epoch: 2   Global Step: 39500   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:24:07,126-Speed 3236.94 samples/sec   Loss 4.5030   LearningRate 0.0777   Epoch: 2   Global Step: 39510   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:24:10,197-Speed 3335.79 samples/sec   Loss 4.4101   LearningRate 0.0777   Epoch: 2   Global Step: 39520   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:24:13,291-Speed 3309.96 samples/sec   Loss 4.4786   LearningRate 0.0777   Epoch: 2   Global Step: 39530   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:24:16,392-Speed 3302.33 samples/sec   Loss 4.4360   LearningRate 0.0777   Epoch: 2   Global Step: 39540   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:24:19,475-Speed 3322.73 samples/sec   Loss 4.5007   LearningRate 0.0777   Epoch: 2   Global Step: 39550   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:24:22,548-Speed 3333.29 samples/sec   Loss 4.5001   LearningRate 0.0777   Epoch: 2   Global Step: 39560   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:24:25,621-Speed 3332.95 samples/sec   Loss 4.5141   LearningRate 0.0777   Epoch: 2   Global Step: 39570   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:24:28,696-Speed 3330.78 samples/sec   Loss 4.4485   LearningRate 0.0777   Epoch: 2   Global Step: 39580   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:24:31,758-Speed 3344.78 samples/sec   Loss 4.4653   LearningRate 0.0777   Epoch: 2   Global Step: 39590   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:24:34,810-Speed 3356.73 samples/sec   Loss 4.4799   LearningRate 0.0777   Epoch: 2   Global Step: 39600   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:24:37,876-Speed 3339.92 samples/sec   Loss 4.3777   LearningRate 0.0777   Epoch: 2   Global Step: 39610   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:24:40,942-Speed 3341.00 samples/sec   Loss 4.5792   LearningRate 0.0777   Epoch: 2   Global Step: 39620   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:24:44,037-Speed 3309.54 samples/sec   Loss 4.5371   LearningRate 0.0777   Epoch: 2   Global Step: 39630   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:24:47,126-Speed 3315.25 samples/sec   Loss 4.4448   LearningRate 0.0777   Epoch: 2   Global Step: 39640   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:24:50,205-Speed 3327.09 samples/sec   Loss 4.3971   LearningRate 0.0777   Epoch: 2   Global Step: 39650   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:24:53,275-Speed 3336.87 samples/sec   Loss 4.4361   LearningRate 0.0777   Epoch: 2   Global Step: 39660   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:24:56,432-Speed 3244.42 samples/sec   Loss 4.4053   LearningRate 0.0776   Epoch: 2   Global Step: 39670   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:24:59,508-Speed 3329.23 samples/sec   Loss 4.4041   LearningRate 0.0776   Epoch: 2   Global Step: 39680   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:25:02,578-Speed 3336.00 samples/sec   Loss 4.4467   LearningRate 0.0776   Epoch: 2   Global Step: 39690   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:25:05,663-Speed 3320.76 samples/sec   Loss 4.3879   LearningRate 0.0776   Epoch: 2   Global Step: 39700   Fp16 Grad Scale: 262144   Required: 31 hours
Training: 2022-04-11 03:25:08,742-Speed 3326.71 samples/sec   Loss 4.4543   LearningRate 0.0776   Epoch: 2   Global Step: 39710   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:25:11,874-Speed 3269.99 samples/sec   Loss 4.4659   LearningRate 0.0776   Epoch: 2   Global Step: 39720   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:25:14,955-Speed 3324.15 samples/sec   Loss 4.4605   LearningRate 0.0776   Epoch: 2   Global Step: 39730   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:25:18,027-Speed 3333.96 samples/sec   Loss 4.4183   LearningRate 0.0776   Epoch: 2   Global Step: 39740   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:25:21,088-Speed 3346.68 samples/sec   Loss 4.4844   LearningRate 0.0776   Epoch: 2   Global Step: 39750   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:25:24,227-Speed 3263.28 samples/sec   Loss 4.4406   LearningRate 0.0776   Epoch: 2   Global Step: 39760   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-11 03:25:27,341-Speed 3288.28 samples/sec   Loss 4.3857   LearningRate 0.0776   Epoch: 2   Global Step: 39770   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-11 03:25:30,488-Speed 3255.52 samples/sec   Loss 4.4932   LearningRate 0.0776   Epoch: 2   Global Step: 39780   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-11 03:25:33,551-Speed 3342.99 samples/sec   Loss 4.3876   LearningRate 0.0776   Epoch: 2   Global Step: 39790   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-11 03:25:36,611-Speed 3347.89 samples/sec   Loss 4.5084   LearningRate 0.0776   Epoch: 2   Global Step: 39800   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-11 03:25:39,682-Speed 3335.39 samples/sec   Loss 4.4208   LearningRate 0.0776   Epoch: 2   Global Step: 39810   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-11 03:25:42,749-Speed 3338.72 samples/sec   Loss 4.5714   LearningRate 0.0776   Epoch: 2   Global Step: 39820   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-11 03:25:45,816-Speed 3339.59 samples/sec   Loss 4.4523   LearningRate 0.0776   Epoch: 2   Global Step: 39830   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-11 03:25:48,882-Speed 3341.29 samples/sec   Loss 4.4883   LearningRate 0.0776   Epoch: 2   Global Step: 39840   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-11 03:25:51,947-Speed 3341.92 samples/sec   Loss 4.4710   LearningRate 0.0775   Epoch: 2   Global Step: 39850   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-11 03:25:55,009-Speed 3345.04 samples/sec   Loss 4.4951   LearningRate 0.0775   Epoch: 2   Global Step: 39860   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-11 03:25:58,097-Speed 3317.14 samples/sec   Loss 4.5182   LearningRate 0.0775   Epoch: 2   Global Step: 39870   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-11 03:26:01,175-Speed 3327.15 samples/sec   Loss 4.5003   LearningRate 0.0775   Epoch: 2   Global Step: 39880   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-11 03:26:04,241-Speed 3341.04 samples/sec   Loss 4.3912   LearningRate 0.0775   Epoch: 2   Global Step: 39890   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-11 03:26:07,308-Speed 3339.44 samples/sec   Loss 4.5040   LearningRate 0.0775   Epoch: 2   Global Step: 39900   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-11 03:26:10,395-Speed 3317.25 samples/sec   Loss 4.5566   LearningRate 0.0775   Epoch: 2   Global Step: 39910   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-11 03:26:13,540-Speed 3257.33 samples/sec   Loss 4.5294   LearningRate 0.0775   Epoch: 2   Global Step: 39920   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-11 03:26:16,603-Speed 3343.66 samples/sec   Loss 4.4720   LearningRate 0.0775   Epoch: 2   Global Step: 39930   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-11 03:26:19,666-Speed 3344.15 samples/sec   Loss 4.3726   LearningRate 0.0775   Epoch: 2   Global Step: 39940   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-11 03:26:22,746-Speed 3325.76 samples/sec   Loss 4.4109   LearningRate 0.0775   Epoch: 2   Global Step: 39950   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-11 03:26:25,810-Speed 3343.22 samples/sec   Loss 4.4032   LearningRate 0.0775   Epoch: 2   Global Step: 39960   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:26:28,879-Speed 3336.55 samples/sec   Loss 4.5710   LearningRate 0.0775   Epoch: 2   Global Step: 39970   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:26:31,970-Speed 3313.39 samples/sec   Loss 4.4481   LearningRate 0.0775   Epoch: 2   Global Step: 39980   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:26:35,043-Speed 3334.17 samples/sec   Loss 4.3815   LearningRate 0.0775   Epoch: 2   Global Step: 39990   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:26:38,110-Speed 3338.72 samples/sec   Loss 4.4757   LearningRate 0.0775   Epoch: 2   Global Step: 40000   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:27:21,662-[lfw][40000]XNorm: 23.173412
Training: 2022-04-11 03:27:21,662-[lfw][40000]Accuracy-Flip: 0.99700+-0.00287
Training: 2022-04-11 03:27:21,663-[lfw][40000]Accuracy-Highest: 0.99767
Training: 2022-04-11 03:28:12,313-[cfp_fp][40000]XNorm: 21.302325
Training: 2022-04-11 03:28:12,314-[cfp_fp][40000]Accuracy-Flip: 0.97786+-0.00630
Training: 2022-04-11 03:28:12,314-[cfp_fp][40000]Accuracy-Highest: 0.98143
Training: 2022-04-11 03:28:55,818-[agedb_30][40000]XNorm: 23.249844
Training: 2022-04-11 03:28:55,818-[agedb_30][40000]Accuracy-Flip: 0.97833+-0.00833
Training: 2022-04-11 03:28:55,819-[agedb_30][40000]Accuracy-Highest: 0.97833
Training: 2022-04-11 03:28:58,870-Speed 72.75 samples/sec   Loss 4.4106   LearningRate 0.0775   Epoch: 2   Global Step: 40010   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:29:01,922-Speed 3355.35 samples/sec   Loss 4.4637   LearningRate 0.0775   Epoch: 2   Global Step: 40020   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:29:04,977-Speed 3353.03 samples/sec   Loss 4.4836   LearningRate 0.0775   Epoch: 2   Global Step: 40030   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:29:08,094-Speed 3285.64 samples/sec   Loss 4.4536   LearningRate 0.0774   Epoch: 2   Global Step: 40040   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:29:11,306-Speed 3188.50 samples/sec   Loss 4.4928   LearningRate 0.0774   Epoch: 2   Global Step: 40050   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:29:14,373-Speed 3340.73 samples/sec   Loss 4.5099   LearningRate 0.0774   Epoch: 2   Global Step: 40060   Fp16 Grad Scale: 262144   Required: 31 hours
Training: 2022-04-11 03:29:17,417-Speed 3364.43 samples/sec   Loss 4.5623   LearningRate 0.0774   Epoch: 2   Global Step: 40070   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:29:20,506-Speed 3316.22 samples/sec   Loss 4.4857   LearningRate 0.0774   Epoch: 2   Global Step: 40080   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:29:23,566-Speed 3346.45 samples/sec   Loss 4.4667   LearningRate 0.0774   Epoch: 2   Global Step: 40090   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:29:26,635-Speed 3337.69 samples/sec   Loss 4.5350   LearningRate 0.0774   Epoch: 2   Global Step: 40100   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:29:29,697-Speed 3345.18 samples/sec   Loss 4.4906   LearningRate 0.0774   Epoch: 2   Global Step: 40110   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:29:32,747-Speed 3357.95 samples/sec   Loss 4.4040   LearningRate 0.0774   Epoch: 2   Global Step: 40120   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:29:35,806-Speed 3348.01 samples/sec   Loss 4.5165   LearningRate 0.0774   Epoch: 2   Global Step: 40130   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:29:38,870-Speed 3343.17 samples/sec   Loss 4.4315   LearningRate 0.0774   Epoch: 2   Global Step: 40140   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:29:41,936-Speed 3340.82 samples/sec   Loss 4.4815   LearningRate 0.0774   Epoch: 2   Global Step: 40150   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:29:45,046-Speed 3292.80 samples/sec   Loss 4.4453   LearningRate 0.0774   Epoch: 2   Global Step: 40160   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:29:48,092-Speed 3362.57 samples/sec   Loss 4.4333   LearningRate 0.0774   Epoch: 2   Global Step: 40170   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:29:51,207-Speed 3288.08 samples/sec   Loss 4.4127   LearningRate 0.0774   Epoch: 2   Global Step: 40180   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:29:54,286-Speed 3326.97 samples/sec   Loss 4.4265   LearningRate 0.0774   Epoch: 2   Global Step: 40190   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:29:57,361-Speed 3330.80 samples/sec   Loss 4.4588   LearningRate 0.0774   Epoch: 2   Global Step: 40200   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:30:00,440-Speed 3326.28 samples/sec   Loss 4.3924   LearningRate 0.0774   Epoch: 2   Global Step: 40210   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:30:03,557-Speed 3288.50 samples/sec   Loss 4.4975   LearningRate 0.0774   Epoch: 2   Global Step: 40220   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:30:06,656-Speed 3304.77 samples/sec   Loss 4.4939   LearningRate 0.0773   Epoch: 2   Global Step: 40230   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:30:09,716-Speed 3347.72 samples/sec   Loss 4.4200   LearningRate 0.0773   Epoch: 2   Global Step: 40240   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:30:12,776-Speed 3346.77 samples/sec   Loss 4.4932   LearningRate 0.0773   Epoch: 2   Global Step: 40250   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:30:15,838-Speed 3345.76 samples/sec   Loss 4.4725   LearningRate 0.0773   Epoch: 2   Global Step: 40260   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:30:18,894-Speed 3350.51 samples/sec   Loss 4.3996   LearningRate 0.0773   Epoch: 2   Global Step: 40270   Fp16 Grad Scale: 262144   Required: 31 hours
Training: 2022-04-11 03:30:21,940-Speed 3363.40 samples/sec   Loss 4.4849   LearningRate 0.0773   Epoch: 2   Global Step: 40280   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:30:24,997-Speed 3349.96 samples/sec   Loss 4.3843   LearningRate 0.0773   Epoch: 2   Global Step: 40290   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:30:28,080-Speed 3322.52 samples/sec   Loss 4.3715   LearningRate 0.0773   Epoch: 2   Global Step: 40300   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:30:31,141-Speed 3345.96 samples/sec   Loss 4.4611   LearningRate 0.0773   Epoch: 2   Global Step: 40310   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:30:34,195-Speed 3353.74 samples/sec   Loss 4.4405   LearningRate 0.0773   Epoch: 2   Global Step: 40320   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:30:37,254-Speed 3348.27 samples/sec   Loss 4.4736   LearningRate 0.0773   Epoch: 2   Global Step: 40330   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:30:40,309-Speed 3352.50 samples/sec   Loss 4.4477   LearningRate 0.0773   Epoch: 2   Global Step: 40340   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:30:43,381-Speed 3334.56 samples/sec   Loss 4.5081   LearningRate 0.0773   Epoch: 2   Global Step: 40350   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:30:46,437-Speed 3351.14 samples/sec   Loss 4.4731   LearningRate 0.0773   Epoch: 2   Global Step: 40360   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:30:49,495-Speed 3349.53 samples/sec   Loss 4.4838   LearningRate 0.0773   Epoch: 2   Global Step: 40370   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:30:52,540-Speed 3363.70 samples/sec   Loss 4.4878   LearningRate 0.0773   Epoch: 2   Global Step: 40380   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:30:55,607-Speed 3339.09 samples/sec   Loss 4.4948   LearningRate 0.0773   Epoch: 2   Global Step: 40390   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:30:58,687-Speed 3325.39 samples/sec   Loss 4.4068   LearningRate 0.0773   Epoch: 2   Global Step: 40400   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:31:01,759-Speed 3334.67 samples/sec   Loss 4.5036   LearningRate 0.0773   Epoch: 2   Global Step: 40410   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:31:04,817-Speed 3349.99 samples/sec   Loss 4.4927   LearningRate 0.0772   Epoch: 2   Global Step: 40420   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:31:07,866-Speed 3359.25 samples/sec   Loss 4.5036   LearningRate 0.0772   Epoch: 2   Global Step: 40430   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-11 03:31:10,918-Speed 3355.46 samples/sec   Loss 4.4000   LearningRate 0.0772   Epoch: 2   Global Step: 40440   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-11 03:31:14,016-Speed 3305.82 samples/sec   Loss 4.3766   LearningRate 0.0772   Epoch: 2   Global Step: 40450   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-11 03:31:17,102-Speed 3318.90 samples/sec   Loss 4.4384   LearningRate 0.0772   Epoch: 2   Global Step: 40460   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-11 03:31:20,165-Speed 3344.11 samples/sec   Loss 4.4556   LearningRate 0.0772   Epoch: 2   Global Step: 40470   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-11 03:31:23,225-Speed 3346.80 samples/sec   Loss 4.5278   LearningRate 0.0772   Epoch: 2   Global Step: 40480   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-11 03:31:26,294-Speed 3337.80 samples/sec   Loss 4.5104   LearningRate 0.0772   Epoch: 2   Global Step: 40490   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-11 03:31:29,357-Speed 3343.86 samples/sec   Loss 4.4366   LearningRate 0.0772   Epoch: 2   Global Step: 40500   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-11 03:31:32,501-Speed 3257.88 samples/sec   Loss 4.4498   LearningRate 0.0772   Epoch: 2   Global Step: 40510   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-11 03:31:35,589-Speed 3316.91 samples/sec   Loss 4.4595   LearningRate 0.0772   Epoch: 2   Global Step: 40520   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-11 03:31:38,653-Speed 3343.59 samples/sec   Loss 4.3949   LearningRate 0.0772   Epoch: 2   Global Step: 40530   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:31:41,711-Speed 3349.34 samples/sec   Loss 4.4975   LearningRate 0.0772   Epoch: 2   Global Step: 40540   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:31:44,777-Speed 3339.66 samples/sec   Loss 4.4577   LearningRate 0.0772   Epoch: 2   Global Step: 40550   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:31:47,853-Speed 3330.54 samples/sec   Loss 4.5015   LearningRate 0.0772   Epoch: 2   Global Step: 40560   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:31:50,941-Speed 3316.47 samples/sec   Loss 4.4533   LearningRate 0.0772   Epoch: 2   Global Step: 40570   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:31:54,002-Speed 3346.09 samples/sec   Loss 4.4252   LearningRate 0.0772   Epoch: 2   Global Step: 40580   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:31:57,066-Speed 3343.37 samples/sec   Loss 4.4579   LearningRate 0.0772   Epoch: 2   Global Step: 40590   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:32:00,137-Speed 3334.86 samples/sec   Loss 4.4653   LearningRate 0.0772   Epoch: 2   Global Step: 40600   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:32:03,201-Speed 3342.44 samples/sec   Loss 4.3805   LearningRate 0.0771   Epoch: 2   Global Step: 40610   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:32:06,255-Speed 3353.53 samples/sec   Loss 4.4727   LearningRate 0.0771   Epoch: 2   Global Step: 40620   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-11 03:32:09,315-Speed 3347.52 samples/sec   Loss 4.5409   LearningRate 0.0771   Epoch: 2   Global Step: 40630   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-11 03:32:12,379-Speed 3343.19 samples/sec   Loss 4.5094   LearningRate 0.0771   Epoch: 2   Global Step: 40640   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-11 03:32:15,445-Speed 3340.08 samples/sec   Loss 4.4477   LearningRate 0.0771   Epoch: 2   Global Step: 40650   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-11 03:32:18,511-Speed 3340.60 samples/sec   Loss 4.4042   LearningRate 0.0771   Epoch: 2   Global Step: 40660   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-11 03:32:21,590-Speed 3327.15 samples/sec   Loss 4.4918   LearningRate 0.0771   Epoch: 2   Global Step: 40670   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-11 03:32:24,692-Speed 3302.31 samples/sec   Loss 4.5334   LearningRate 0.0771   Epoch: 2   Global Step: 40680   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-11 03:32:27,760-Speed 3337.43 samples/sec   Loss 4.4126   LearningRate 0.0771   Epoch: 2   Global Step: 40690   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-11 03:32:30,837-Speed 3328.71 samples/sec   Loss 4.4045   LearningRate 0.0771   Epoch: 2   Global Step: 40700   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-11 03:32:33,917-Speed 3326.65 samples/sec   Loss 4.4951   LearningRate 0.0771   Epoch: 2   Global Step: 40710   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-11 03:32:36,978-Speed 3345.75 samples/sec   Loss 4.4868   LearningRate 0.0771   Epoch: 2   Global Step: 40720   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:32:40,093-Speed 3288.75 samples/sec   Loss 4.4633   LearningRate 0.0771   Epoch: 2   Global Step: 40730   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:32:43,153-Speed 3346.44 samples/sec   Loss 4.5178   LearningRate 0.0771   Epoch: 2   Global Step: 40740   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:32:46,222-Speed 3337.27 samples/sec   Loss 4.3923   LearningRate 0.0771   Epoch: 2   Global Step: 40750   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:32:49,307-Speed 3320.26 samples/sec   Loss 4.5005   LearningRate 0.0771   Epoch: 2   Global Step: 40760   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-11 03:32:52,379-Speed 3333.96 samples/sec   Loss 4.3775   LearningRate 0.0771   Epoch: 2   Global Step: 40770   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-11 03:32:55,442-Speed 3344.83 samples/sec   Loss 4.4557   LearningRate 0.0771   Epoch: 2   Global Step: 40780   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-11 03:32:58,515-Speed 3332.42 samples/sec   Loss 4.4437   LearningRate 0.0771   Epoch: 2   Global Step: 40790   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-11 03:33:01,578-Speed 3343.29 samples/sec   Loss 4.3591   LearningRate 0.0770   Epoch: 2   Global Step: 40800   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-11 03:33:04,644-Speed 3341.10 samples/sec   Loss 4.4589   LearningRate 0.0770   Epoch: 2   Global Step: 40810   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-11 03:33:07,714-Speed 3336.20 samples/sec   Loss 4.4866   LearningRate 0.0770   Epoch: 2   Global Step: 40820   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-11 03:33:10,871-Speed 3244.74 samples/sec   Loss 4.4603   LearningRate 0.0770   Epoch: 2   Global Step: 40830   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-11 03:33:13,998-Speed 3274.75 samples/sec   Loss 4.4847   LearningRate 0.0770   Epoch: 2   Global Step: 40840   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-11 03:33:17,065-Speed 3340.27 samples/sec   Loss 4.5120   LearningRate 0.0770   Epoch: 2   Global Step: 40850   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-11 03:33:20,128-Speed 3343.53 samples/sec   Loss 4.4624   LearningRate 0.0770   Epoch: 2   Global Step: 40860   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:33:23,191-Speed 3344.18 samples/sec   Loss 4.4163   LearningRate 0.0770   Epoch: 2   Global Step: 40870   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:33:26,259-Speed 3338.99 samples/sec   Loss 4.4312   LearningRate 0.0770   Epoch: 2   Global Step: 40880   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:33:29,325-Speed 3339.81 samples/sec   Loss 4.4600   LearningRate 0.0770   Epoch: 2   Global Step: 40890   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-11 03:33:32,428-Speed 3301.15 samples/sec   Loss 4.4467   LearningRate 0.0770   Epoch: 2   Global Step: 40900   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-11 03:33:35,538-Speed 3293.23 samples/sec   Loss 4.5490   LearningRate 0.0770   Epoch: 2   Global Step: 40910   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-11 03:33:38,606-Speed 3338.72 samples/sec   Loss 4.4950   LearningRate 0.0770   Epoch: 2   Global Step: 40920   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-11 03:33:41,667-Speed 3346.09 samples/sec   Loss 4.4949   LearningRate 0.0770   Epoch: 2   Global Step: 40930   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-11 03:33:44,734-Speed 3339.11 samples/sec   Loss 4.4641   LearningRate 0.0770   Epoch: 2   Global Step: 40940   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-11 03:33:47,872-Speed 3264.75 samples/sec   Loss 4.3179   LearningRate 0.0770   Epoch: 2   Global Step: 40950   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-11 03:33:50,938-Speed 3340.26 samples/sec   Loss 4.4339   LearningRate 0.0770   Epoch: 2   Global Step: 40960   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-11 03:33:54,005-Speed 3339.93 samples/sec   Loss 4.4398   LearningRate 0.0770   Epoch: 2   Global Step: 40970   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-11 03:33:57,076-Speed 3335.01 samples/sec   Loss 4.3096   LearningRate 0.0770   Epoch: 2   Global Step: 40980   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-11 03:34:00,247-Speed 3229.61 samples/sec   Loss 4.4708   LearningRate 0.0769   Epoch: 2   Global Step: 40990   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:34:03,332-Speed 3320.24 samples/sec   Loss 4.4286   LearningRate 0.0769   Epoch: 2   Global Step: 41000   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:34:06,433-Speed 3303.32 samples/sec   Loss 4.3531   LearningRate 0.0769   Epoch: 2   Global Step: 41010   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:34:09,589-Speed 3244.37 samples/sec   Loss 4.4403   LearningRate 0.0769   Epoch: 2   Global Step: 41020   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:34:12,738-Speed 3253.48 samples/sec   Loss 4.4927   LearningRate 0.0769   Epoch: 2   Global Step: 41030   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:34:15,838-Speed 3304.33 samples/sec   Loss 4.4250   LearningRate 0.0769   Epoch: 2   Global Step: 41040   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:34:18,909-Speed 3335.01 samples/sec   Loss 4.3391   LearningRate 0.0769   Epoch: 2   Global Step: 41050   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:34:21,974-Speed 3341.25 samples/sec   Loss 4.3517   LearningRate 0.0769   Epoch: 2   Global Step: 41060   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-11 03:34:25,151-Speed 3224.70 samples/sec   Loss 4.4473   LearningRate 0.0769   Epoch: 2   Global Step: 41070   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-11 03:34:28,253-Speed 3302.69 samples/sec   Loss 4.5132   LearningRate 0.0769   Epoch: 2   Global Step: 41080   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-11 03:34:31,481-Speed 3172.38 samples/sec   Loss 4.4481   LearningRate 0.0769   Epoch: 2   Global Step: 41090   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-11 03:34:34,580-Speed 3304.51 samples/sec   Loss 4.3843   LearningRate 0.0769   Epoch: 2   Global Step: 41100   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-11 03:34:37,643-Speed 3344.24 samples/sec   Loss 4.4553   LearningRate 0.0769   Epoch: 2   Global Step: 41110   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-11 03:34:40,714-Speed 3335.33 samples/sec   Loss 4.4242   LearningRate 0.0769   Epoch: 2   Global Step: 41120   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-11 03:34:43,787-Speed 3332.96 samples/sec   Loss 4.3905   LearningRate 0.0769   Epoch: 2   Global Step: 41130   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-11 03:34:46,863-Speed 3330.37 samples/sec   Loss 4.3524   LearningRate 0.0769   Epoch: 2   Global Step: 41140   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-11 03:34:49,971-Speed 3295.62 samples/sec   Loss 4.3817   LearningRate 0.0769   Epoch: 2   Global Step: 41150   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-11 03:34:53,040-Speed 3337.73 samples/sec   Loss 4.4462   LearningRate 0.0769   Epoch: 2   Global Step: 41160   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:34:56,137-Speed 3307.05 samples/sec   Loss 4.3969   LearningRate 0.0769   Epoch: 2   Global Step: 41170   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:34:59,209-Speed 3333.75 samples/sec   Loss 4.3943   LearningRate 0.0768   Epoch: 2   Global Step: 41180   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-11 03:35:02,288-Speed 3326.91 samples/sec   Loss 4.4740   LearningRate 0.0768   Epoch: 2   Global Step: 41190   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-11 03:35:05,497-Speed 3191.32 samples/sec   Loss 4.3852   LearningRate 0.0768   Epoch: 2   Global Step: 41200   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-11 03:35:08,622-Speed 3278.11 samples/sec   Loss 4.5356   LearningRate 0.0768   Epoch: 2   Global Step: 41210   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-11 03:35:11,693-Speed 3335.10 samples/sec   Loss 4.4394   LearningRate 0.0768   Epoch: 2   Global Step: 41220   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-11 03:35:14,765-Speed 3334.96 samples/sec   Loss 4.4494   LearningRate 0.0768   Epoch: 2   Global Step: 41230   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-11 03:35:17,830-Speed 3340.92 samples/sec   Loss 4.4821   LearningRate 0.0768   Epoch: 2   Global Step: 41240   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-11 03:35:20,897-Speed 3339.81 samples/sec   Loss 4.3126   LearningRate 0.0768   Epoch: 2   Global Step: 41250   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-11 03:35:23,997-Speed 3303.70 samples/sec   Loss 4.3943   LearningRate 0.0768   Epoch: 2   Global Step: 41260   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-11 03:35:27,074-Speed 3329.18 samples/sec   Loss 4.4265   LearningRate 0.0768   Epoch: 2   Global Step: 41270   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-11 03:35:30,154-Speed 3324.63 samples/sec   Loss 4.4669   LearningRate 0.0768   Epoch: 2   Global Step: 41280   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:35:33,263-Speed 3294.83 samples/sec   Loss 4.4949   LearningRate 0.0768   Epoch: 2   Global Step: 41290   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:35:36,348-Speed 3320.92 samples/sec   Loss 4.4551   LearningRate 0.0768   Epoch: 2   Global Step: 41300   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:35:39,402-Speed 3353.23 samples/sec   Loss 4.4227   LearningRate 0.0768   Epoch: 2   Global Step: 41310   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-11 03:35:42,485-Speed 3322.59 samples/sec   Loss 4.3484   LearningRate 0.0768   Epoch: 2   Global Step: 41320   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-11 03:35:45,550-Speed 3340.87 samples/sec   Loss 4.4072   LearningRate 0.0768   Epoch: 2   Global Step: 41330   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-11 03:35:48,650-Speed 3305.12 samples/sec   Loss 4.4269   LearningRate 0.0768   Epoch: 2   Global Step: 41340   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-11 03:35:51,747-Speed 3306.74 samples/sec   Loss 4.5014   LearningRate 0.0768   Epoch: 2   Global Step: 41350   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-11 03:35:54,813-Speed 3340.29 samples/sec   Loss 4.4399   LearningRate 0.0768   Epoch: 2   Global Step: 41360   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-11 03:35:57,884-Speed 3335.46 samples/sec   Loss 4.4012   LearningRate 0.0768   Epoch: 2   Global Step: 41370   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-11 03:36:00,948-Speed 3342.68 samples/sec   Loss 4.4205   LearningRate 0.0767   Epoch: 2   Global Step: 41380   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-11 03:36:04,031-Speed 3322.11 samples/sec   Loss 4.4717   LearningRate 0.0767   Epoch: 2   Global Step: 41390   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-11 03:36:07,095-Speed 3343.25 samples/sec   Loss 4.4398   LearningRate 0.0767   Epoch: 2   Global Step: 41400   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-11 03:36:10,159-Speed 3342.29 samples/sec   Loss 4.4723   LearningRate 0.0767   Epoch: 2   Global Step: 41410   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:36:13,224-Speed 3342.56 samples/sec   Loss 4.3720   LearningRate 0.0767   Epoch: 2   Global Step: 41420   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-11 03:36:16,317-Speed 3311.12 samples/sec   Loss 4.3810   LearningRate 0.0767   Epoch: 2   Global Step: 41430   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-11 03:36:19,394-Speed 3328.86 samples/sec   Loss 4.4354   LearningRate 0.0767   Epoch: 2   Global Step: 41440   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-11 03:36:22,494-Speed 3303.04 samples/sec   Loss 4.3839   LearningRate 0.0767   Epoch: 2   Global Step: 41450   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-11 03:36:25,559-Speed 3342.88 samples/sec   Loss 4.3839   LearningRate 0.0767   Epoch: 2   Global Step: 41460   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-11 03:36:28,644-Speed 3319.28 samples/sec   Loss 4.4637   LearningRate 0.0767   Epoch: 2   Global Step: 41470   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-11 03:36:31,709-Speed 3342.17 samples/sec   Loss 4.4086   LearningRate 0.0767   Epoch: 2   Global Step: 41480   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-11 03:36:34,781-Speed 3333.95 samples/sec   Loss 4.4320   LearningRate 0.0767   Epoch: 2   Global Step: 41490   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-11 03:36:37,843-Speed 3344.85 samples/sec   Loss 4.4370   LearningRate 0.0767   Epoch: 2   Global Step: 41500   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-11 03:36:40,914-Speed 3335.16 samples/sec   Loss 4.3881   LearningRate 0.0767   Epoch: 2   Global Step: 41510   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-11 03:36:44,052-Speed 3264.62 samples/sec   Loss 4.4641   LearningRate 0.0767   Epoch: 2   Global Step: 41520   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:36:47,123-Speed 3335.15 samples/sec   Loss 4.4656   LearningRate 0.0767   Epoch: 2   Global Step: 41530   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:36:50,196-Speed 3333.34 samples/sec   Loss 4.4290   LearningRate 0.0767   Epoch: 2   Global Step: 41540   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:36:53,259-Speed 3343.16 samples/sec   Loss 4.3776   LearningRate 0.0767   Epoch: 2   Global Step: 41550   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:36:56,340-Speed 3324.97 samples/sec   Loss 4.4374   LearningRate 0.0767   Epoch: 2   Global Step: 41560   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:36:59,408-Speed 3338.75 samples/sec   Loss 4.4775   LearningRate 0.0766   Epoch: 2   Global Step: 41570   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:37:02,482-Speed 3332.13 samples/sec   Loss 4.3715   LearningRate 0.0766   Epoch: 2   Global Step: 41580   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:37:05,545-Speed 3343.41 samples/sec   Loss 4.4740   LearningRate 0.0766   Epoch: 2   Global Step: 41590   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:37:08,618-Speed 3333.68 samples/sec   Loss 4.4302   LearningRate 0.0766   Epoch: 2   Global Step: 41600   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:37:11,691-Speed 3332.03 samples/sec   Loss 4.4908   LearningRate 0.0766   Epoch: 2   Global Step: 41610   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:37:14,757-Speed 3340.73 samples/sec   Loss 4.4953   LearningRate 0.0766   Epoch: 2   Global Step: 41620   Fp16 Grad Scale: 262144   Required: 31 hours
Training: 2022-04-11 03:37:17,820-Speed 3343.99 samples/sec   Loss 4.3112   LearningRate 0.0766   Epoch: 2   Global Step: 41630   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:37:20,859-Speed 3370.14 samples/sec   Loss 4.3241   LearningRate 0.0766   Epoch: 2   Global Step: 41640   Fp16 Grad Scale: 32768   Required: 31 hours
Training: 2022-04-11 03:37:23,929-Speed 3337.15 samples/sec   Loss 4.3704   LearningRate 0.0766   Epoch: 2   Global Step: 41650   Fp16 Grad Scale: 32768   Required: 31 hours
Training: 2022-04-11 03:37:26,992-Speed 3343.89 samples/sec   Loss 4.3164   LearningRate 0.0766   Epoch: 2   Global Step: 41660   Fp16 Grad Scale: 32768   Required: 31 hours
Training: 2022-04-11 03:37:30,058-Speed 3340.95 samples/sec   Loss 4.5964   LearningRate 0.0766   Epoch: 2   Global Step: 41670   Fp16 Grad Scale: 32768   Required: 31 hours
Training: 2022-04-11 03:37:33,118-Speed 3346.54 samples/sec   Loss 4.4300   LearningRate 0.0766   Epoch: 2   Global Step: 41680   Fp16 Grad Scale: 32768   Required: 31 hours
Training: 2022-04-11 03:37:36,206-Speed 3317.76 samples/sec   Loss 4.4203   LearningRate 0.0766   Epoch: 2   Global Step: 41690   Fp16 Grad Scale: 32768   Required: 31 hours
Training: 2022-04-11 03:37:39,281-Speed 3330.44 samples/sec   Loss 4.4637   LearningRate 0.0766   Epoch: 2   Global Step: 41700   Fp16 Grad Scale: 32768   Required: 31 hours
Training: 2022-04-11 03:37:42,353-Speed 3333.78 samples/sec   Loss 4.4055   LearningRate 0.0766   Epoch: 2   Global Step: 41710   Fp16 Grad Scale: 32768   Required: 31 hours
Training: 2022-04-11 03:37:45,421-Speed 3338.44 samples/sec   Loss 4.4117   LearningRate 0.0766   Epoch: 2   Global Step: 41720   Fp16 Grad Scale: 32768   Required: 31 hours
Training: 2022-04-11 03:37:48,508-Speed 3318.58 samples/sec   Loss 4.4224   LearningRate 0.0766   Epoch: 2   Global Step: 41730   Fp16 Grad Scale: 32768   Required: 31 hours
Training: 2022-04-11 03:37:51,573-Speed 3340.91 samples/sec   Loss 4.4618   LearningRate 0.0766   Epoch: 2   Global Step: 41740   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-11 03:37:54,635-Speed 3345.93 samples/sec   Loss 4.3394   LearningRate 0.0766   Epoch: 2   Global Step: 41750   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-11 03:37:57,701-Speed 3340.57 samples/sec   Loss 4.3720   LearningRate 0.0765   Epoch: 2   Global Step: 41760   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-11 03:38:00,774-Speed 3332.77 samples/sec   Loss 4.4232   LearningRate 0.0765   Epoch: 2   Global Step: 41770   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-11 03:38:03,845-Speed 3335.44 samples/sec   Loss 4.3474   LearningRate 0.0765   Epoch: 2   Global Step: 41780   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-11 03:38:06,924-Speed 3326.54 samples/sec   Loss 4.4366   LearningRate 0.0765   Epoch: 2   Global Step: 41790   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-11 03:38:09,995-Speed 3334.67 samples/sec   Loss 4.3297   LearningRate 0.0765   Epoch: 2   Global Step: 41800   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-11 03:38:13,071-Speed 3330.25 samples/sec   Loss 4.3896   LearningRate 0.0765   Epoch: 2   Global Step: 41810   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-11 03:38:16,155-Speed 3320.83 samples/sec   Loss 4.4025   LearningRate 0.0765   Epoch: 2   Global Step: 41820   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-11 03:38:19,334-Speed 3221.86 samples/sec   Loss 4.3076   LearningRate 0.0765   Epoch: 2   Global Step: 41830   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-11 03:38:22,414-Speed 3325.25 samples/sec   Loss 4.4182   LearningRate 0.0765   Epoch: 2   Global Step: 41840   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:38:25,484-Speed 3336.99 samples/sec   Loss 4.4111   LearningRate 0.0765   Epoch: 2   Global Step: 41850   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:38:28,542-Speed 3349.53 samples/sec   Loss 4.4509   LearningRate 0.0765   Epoch: 2   Global Step: 41860   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-11 03:38:31,668-Speed 3276.10 samples/sec   Loss 4.4337   LearningRate 0.0765   Epoch: 2   Global Step: 41870   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-11 03:38:34,773-Speed 3299.36 samples/sec   Loss 4.3603   LearningRate 0.0765   Epoch: 2   Global Step: 41880   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-11 03:38:37,919-Speed 3255.06 samples/sec   Loss 4.4157   LearningRate 0.0765   Epoch: 2   Global Step: 41890   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-11 03:38:41,013-Speed 3310.49 samples/sec   Loss 4.4325   LearningRate 0.0765   Epoch: 2   Global Step: 41900   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-11 03:38:44,097-Speed 3321.26 samples/sec   Loss 4.3377   LearningRate 0.0765   Epoch: 2   Global Step: 41910   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-11 03:38:47,199-Speed 3301.83 samples/sec   Loss 4.4012   LearningRate 0.0765   Epoch: 2   Global Step: 41920   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-11 03:38:50,308-Speed 3295.02 samples/sec   Loss 4.5327   LearningRate 0.0765   Epoch: 2   Global Step: 41930   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-11 03:38:53,390-Speed 3323.32 samples/sec   Loss 4.4076   LearningRate 0.0765   Epoch: 2   Global Step: 41940   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-11 03:38:56,455-Speed 3340.58 samples/sec   Loss 4.3933   LearningRate 0.0764   Epoch: 2   Global Step: 41950   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-11 03:38:59,585-Speed 3273.03 samples/sec   Loss 4.3725   LearningRate 0.0764   Epoch: 2   Global Step: 41960   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:39:02,653-Speed 3338.59 samples/sec   Loss 4.2900   LearningRate 0.0764   Epoch: 2   Global Step: 41970   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:39:05,753-Speed 3303.60 samples/sec   Loss 4.4182   LearningRate 0.0764   Epoch: 2   Global Step: 41980   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:39:08,837-Speed 3320.91 samples/sec   Loss 4.4766   LearningRate 0.0764   Epoch: 2   Global Step: 41990   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:39:11,948-Speed 3292.98 samples/sec   Loss 4.3442   LearningRate 0.0764   Epoch: 2   Global Step: 42000   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:39:55,858-[lfw][42000]XNorm: 21.622965
Training: 2022-04-11 03:39:55,859-[lfw][42000]Accuracy-Flip: 0.99717+-0.00269
Training: 2022-04-11 03:39:55,859-[lfw][42000]Accuracy-Highest: 0.99767
Training: 2022-04-11 03:40:46,668-[cfp_fp][42000]XNorm: 19.723359
Training: 2022-04-11 03:40:46,668-[cfp_fp][42000]Accuracy-Flip: 0.97971+-0.00758
Training: 2022-04-11 03:40:46,669-[cfp_fp][42000]Accuracy-Highest: 0.98143
Training: 2022-04-11 03:41:30,286-[agedb_30][42000]XNorm: 21.682372
Training: 2022-04-11 03:41:30,287-[agedb_30][42000]Accuracy-Flip: 0.97917+-0.00768
Training: 2022-04-11 03:41:30,287-[agedb_30][42000]Accuracy-Highest: 0.97917
Training: 2022-04-11 03:41:33,345-Speed 72.42 samples/sec   Loss 4.4143   LearningRate 0.0764   Epoch: 2   Global Step: 42010   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:41:36,403-Speed 3348.97 samples/sec   Loss 4.4399   LearningRate 0.0764   Epoch: 2   Global Step: 42020   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:41:39,460-Speed 3351.43 samples/sec   Loss 4.3516   LearningRate 0.0764   Epoch: 2   Global Step: 42030   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:41:42,514-Speed 3353.12 samples/sec   Loss 4.3069   LearningRate 0.0764   Epoch: 2   Global Step: 42040   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:41:45,632-Speed 3285.28 samples/sec   Loss 4.4370   LearningRate 0.0764   Epoch: 2   Global Step: 42050   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:41:48,683-Speed 3356.79 samples/sec   Loss 4.4823   LearningRate 0.0764   Epoch: 2   Global Step: 42060   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:41:51,751-Speed 3338.32 samples/sec   Loss 4.4038   LearningRate 0.0764   Epoch: 2   Global Step: 42070   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:41:54,807-Speed 3352.13 samples/sec   Loss 4.4075   LearningRate 0.0764   Epoch: 2   Global Step: 42080   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:41:57,869-Speed 3344.17 samples/sec   Loss 4.5011   LearningRate 0.0764   Epoch: 2   Global Step: 42090   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:42:00,928-Speed 3348.42 samples/sec   Loss 4.3958   LearningRate 0.0764   Epoch: 2   Global Step: 42100   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:42:04,069-Speed 3260.60 samples/sec   Loss 4.4815   LearningRate 0.0764   Epoch: 2   Global Step: 42110   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:42:07,137-Speed 3338.80 samples/sec   Loss 4.3173   LearningRate 0.0764   Epoch: 2   Global Step: 42120   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:42:10,198-Speed 3346.48 samples/sec   Loss 4.4582   LearningRate 0.0764   Epoch: 2   Global Step: 42130   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:42:13,418-Speed 3181.25 samples/sec   Loss 4.4555   LearningRate 0.0763   Epoch: 2   Global Step: 42140   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:42:16,510-Speed 3312.37 samples/sec   Loss 4.4607   LearningRate 0.0763   Epoch: 2   Global Step: 42150   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:42:19,572-Speed 3344.46 samples/sec   Loss 4.3368   LearningRate 0.0763   Epoch: 2   Global Step: 42160   Fp16 Grad Scale: 262144   Required: 31 hours
Training: 2022-04-11 03:42:22,635-Speed 3344.60 samples/sec   Loss 4.4470   LearningRate 0.0763   Epoch: 2   Global Step: 42170   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:42:25,743-Speed 3294.68 samples/sec   Loss 4.3764   LearningRate 0.0763   Epoch: 2   Global Step: 42180   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-11 03:42:28,803-Speed 3347.83 samples/sec   Loss 4.3884   LearningRate 0.0763   Epoch: 2   Global Step: 42190   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-11 03:42:31,933-Speed 3271.95 samples/sec   Loss 4.3923   LearningRate 0.0763   Epoch: 2   Global Step: 42200   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-11 03:42:34,997-Speed 3342.28 samples/sec   Loss 4.3461   LearningRate 0.0763   Epoch: 2   Global Step: 42210   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-11 03:42:38,096-Speed 3305.87 samples/sec   Loss 4.3646   LearningRate 0.0763   Epoch: 2   Global Step: 42220   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-11 03:42:41,181-Speed 3320.32 samples/sec   Loss 4.4066   LearningRate 0.0763   Epoch: 2   Global Step: 42230   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-11 03:42:44,251-Speed 3336.26 samples/sec   Loss 4.4092   LearningRate 0.0763   Epoch: 2   Global Step: 42240   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-11 03:42:47,358-Speed 3296.22 samples/sec   Loss 4.3062   LearningRate 0.0763   Epoch: 2   Global Step: 42250   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-11 03:42:50,435-Speed 3328.38 samples/sec   Loss 4.3984   LearningRate 0.0763   Epoch: 2   Global Step: 42260   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-11 03:42:53,507-Speed 3333.78 samples/sec   Loss 4.3793   LearningRate 0.0763   Epoch: 2   Global Step: 42270   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-11 03:42:56,589-Speed 3323.64 samples/sec   Loss 4.3115   LearningRate 0.0763   Epoch: 2   Global Step: 42280   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:42:59,712-Speed 3279.50 samples/sec   Loss 4.3735   LearningRate 0.0763   Epoch: 2   Global Step: 42290   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:43:02,794-Speed 3323.58 samples/sec   Loss 4.4932   LearningRate 0.0763   Epoch: 2   Global Step: 42300   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:43:05,860-Speed 3340.49 samples/sec   Loss 4.3001   LearningRate 0.0763   Epoch: 2   Global Step: 42310   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:43:08,942-Speed 3323.74 samples/sec   Loss 4.3595   LearningRate 0.0763   Epoch: 2   Global Step: 42320   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:43:12,011-Speed 3337.44 samples/sec   Loss 4.2965   LearningRate 0.0762   Epoch: 2   Global Step: 42330   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:43:15,074-Speed 3343.26 samples/sec   Loss 4.3717   LearningRate 0.0762   Epoch: 2   Global Step: 42340   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:43:18,134-Speed 3347.19 samples/sec   Loss 4.3404   LearningRate 0.0762   Epoch: 2   Global Step: 42350   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:43:21,198-Speed 3342.60 samples/sec   Loss 4.4114   LearningRate 0.0762   Epoch: 2   Global Step: 42360   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:43:24,262-Speed 3343.54 samples/sec   Loss 4.4790   LearningRate 0.0762   Epoch: 2   Global Step: 42370   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:43:27,400-Speed 3264.20 samples/sec   Loss 4.3479   LearningRate 0.0762   Epoch: 2   Global Step: 42380   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:43:30,518-Speed 3284.18 samples/sec   Loss 4.4923   LearningRate 0.0762   Epoch: 2   Global Step: 42390   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:43:33,688-Speed 3231.17 samples/sec   Loss 4.4102   LearningRate 0.0762   Epoch: 2   Global Step: 42400   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:43:36,762-Speed 3332.83 samples/sec   Loss 4.4113   LearningRate 0.0762   Epoch: 2   Global Step: 42410   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:43:39,837-Speed 3330.05 samples/sec   Loss 4.2828   LearningRate 0.0762   Epoch: 2   Global Step: 42420   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:43:42,895-Speed 3349.26 samples/sec   Loss 4.3998   LearningRate 0.0762   Epoch: 2   Global Step: 42430   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:43:45,969-Speed 3332.44 samples/sec   Loss 4.3876   LearningRate 0.0762   Epoch: 2   Global Step: 42440   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:43:49,039-Speed 3335.58 samples/sec   Loss 4.3734   LearningRate 0.0762   Epoch: 2   Global Step: 42450   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:43:52,101-Speed 3345.77 samples/sec   Loss 4.3127   LearningRate 0.0762   Epoch: 2   Global Step: 42460   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:43:55,240-Speed 3262.27 samples/sec   Loss 4.4518   LearningRate 0.0762   Epoch: 2   Global Step: 42470   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:43:58,308-Speed 3339.00 samples/sec   Loss 4.2940   LearningRate 0.0762   Epoch: 2   Global Step: 42480   Fp16 Grad Scale: 262144   Required: 31 hours
Training: 2022-04-11 03:44:01,457-Speed 3252.34 samples/sec   Loss 4.2686   LearningRate 0.0762   Epoch: 2   Global Step: 42490   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:44:04,523-Speed 3340.23 samples/sec   Loss 4.3837   LearningRate 0.0762   Epoch: 2   Global Step: 42500   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:44:07,602-Speed 3327.05 samples/sec   Loss 4.3352   LearningRate 0.0762   Epoch: 2   Global Step: 42510   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:44:10,694-Speed 3312.22 samples/sec   Loss 4.4152   LearningRate 0.0761   Epoch: 2   Global Step: 42520   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:44:13,755-Speed 3347.23 samples/sec   Loss 4.4045   LearningRate 0.0761   Epoch: 2   Global Step: 42530   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:44:16,822-Speed 3338.98 samples/sec   Loss 4.3385   LearningRate 0.0761   Epoch: 2   Global Step: 42540   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:44:19,873-Speed 3356.80 samples/sec   Loss 4.3158   LearningRate 0.0761   Epoch: 2   Global Step: 42550   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-11 03:44:22,943-Speed 3336.67 samples/sec   Loss 4.3662   LearningRate 0.0761   Epoch: 2   Global Step: 42560   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-11 03:44:26,027-Speed 3321.92 samples/sec   Loss 4.3324   LearningRate 0.0761   Epoch: 2   Global Step: 42570   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-11 03:44:29,103-Speed 3330.10 samples/sec   Loss 4.4189   LearningRate 0.0761   Epoch: 2   Global Step: 42580   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-11 03:44:32,228-Speed 3277.62 samples/sec   Loss 4.4278   LearningRate 0.0761   Epoch: 2   Global Step: 42590   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-11 03:44:35,298-Speed 3336.23 samples/sec   Loss 4.2955   LearningRate 0.0761   Epoch: 2   Global Step: 42600   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-11 03:44:38,422-Speed 3278.17 samples/sec   Loss 4.3824   LearningRate 0.0761   Epoch: 2   Global Step: 42610   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-11 03:44:41,541-Speed 3283.53 samples/sec   Loss 4.4504   LearningRate 0.0761   Epoch: 2   Global Step: 42620   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-11 03:44:44,625-Speed 3322.16 samples/sec   Loss 4.3820   LearningRate 0.0761   Epoch: 2   Global Step: 42630   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-11 03:44:47,697-Speed 3342.61 samples/sec   Loss 4.3948   LearningRate 0.0761   Epoch: 2   Global Step: 42640   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-11 03:44:50,780-Speed 3321.63 samples/sec   Loss 4.3851   LearningRate 0.0761   Epoch: 2   Global Step: 42650   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:44:54,003-Speed 3178.40 samples/sec   Loss 4.3613   LearningRate 0.0761   Epoch: 2   Global Step: 42660   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:44:57,081-Speed 3327.72 samples/sec   Loss 4.3848   LearningRate 0.0761   Epoch: 2   Global Step: 42670   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:45:00,183-Speed 3301.95 samples/sec   Loss 4.3628   LearningRate 0.0761   Epoch: 2   Global Step: 42680   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:45:03,310-Speed 3275.45 samples/sec   Loss 4.3734   LearningRate 0.0761   Epoch: 2   Global Step: 42690   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:45:06,378-Speed 3338.88 samples/sec   Loss 4.3260   LearningRate 0.0761   Epoch: 2   Global Step: 42700   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:45:09,463-Speed 3319.36 samples/sec   Loss 4.2725   LearningRate 0.0760   Epoch: 2   Global Step: 42710   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:45:12,533-Speed 3336.29 samples/sec   Loss 4.3890   LearningRate 0.0760   Epoch: 2   Global Step: 42720   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:45:15,600-Speed 3340.44 samples/sec   Loss 4.3731   LearningRate 0.0760   Epoch: 2   Global Step: 42730   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:45:18,685-Speed 3319.43 samples/sec   Loss 4.3182   LearningRate 0.0760   Epoch: 2   Global Step: 42740   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:45:21,757-Speed 3334.05 samples/sec   Loss 4.3453   LearningRate 0.0760   Epoch: 2   Global Step: 42750   Fp16 Grad Scale: 262144   Required: 31 hours
Training: 2022-04-11 03:45:24,824-Speed 3339.91 samples/sec   Loss 4.2985   LearningRate 0.0760   Epoch: 2   Global Step: 42760   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:45:27,896-Speed 3333.81 samples/sec   Loss 4.3556   LearningRate 0.0760   Epoch: 2   Global Step: 42770   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:45:30,965-Speed 3338.03 samples/sec   Loss 4.4024   LearningRate 0.0760   Epoch: 2   Global Step: 42780   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:45:34,030-Speed 3341.52 samples/sec   Loss 4.3257   LearningRate 0.0760   Epoch: 2   Global Step: 42790   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:45:37,136-Speed 3297.38 samples/sec   Loss 4.2420   LearningRate 0.0760   Epoch: 2   Global Step: 42800   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:45:40,222-Speed 3319.10 samples/sec   Loss 4.3575   LearningRate 0.0760   Epoch: 2   Global Step: 42810   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:45:43,290-Speed 3338.70 samples/sec   Loss 4.3224   LearningRate 0.0760   Epoch: 2   Global Step: 42820   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:45:46,385-Speed 3309.15 samples/sec   Loss 4.3771   LearningRate 0.0760   Epoch: 2   Global Step: 42830   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:45:49,481-Speed 3307.97 samples/sec   Loss 4.3017   LearningRate 0.0760   Epoch: 2   Global Step: 42840   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:45:52,566-Speed 3320.35 samples/sec   Loss 4.3434   LearningRate 0.0760   Epoch: 2   Global Step: 42850   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:45:55,628-Speed 3345.49 samples/sec   Loss 4.2942   LearningRate 0.0760   Epoch: 2   Global Step: 42860   Fp16 Grad Scale: 262144   Required: 31 hours
Training: 2022-04-11 03:45:58,699-Speed 3334.81 samples/sec   Loss 4.2802   LearningRate 0.0760   Epoch: 2   Global Step: 42870   Fp16 Grad Scale: 262144   Required: 31 hours
Training: 2022-04-11 03:46:01,790-Speed 3313.49 samples/sec   Loss 4.4405   LearningRate 0.0760   Epoch: 2   Global Step: 42880   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:46:04,852-Speed 3344.74 samples/sec   Loss 4.3386   LearningRate 0.0760   Epoch: 2   Global Step: 42890   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:46:07,929-Speed 3329.24 samples/sec   Loss 4.3461   LearningRate 0.0759   Epoch: 2   Global Step: 42900   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-11 03:46:11,001-Speed 3333.24 samples/sec   Loss 4.4044   LearningRate 0.0759   Epoch: 2   Global Step: 42910   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-11 03:46:14,145-Speed 3257.98 samples/sec   Loss 4.3365   LearningRate 0.0759   Epoch: 2   Global Step: 42920   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-11 03:46:17,230-Speed 3319.38 samples/sec   Loss 4.4276   LearningRate 0.0759   Epoch: 2   Global Step: 42930   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-11 03:46:20,375-Speed 3257.11 samples/sec   Loss 4.2999   LearningRate 0.0759   Epoch: 2   Global Step: 42940   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-11 03:46:23,519-Speed 3257.82 samples/sec   Loss 4.3685   LearningRate 0.0759   Epoch: 2   Global Step: 42950   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-11 03:46:26,638-Speed 3284.62 samples/sec   Loss 4.3026   LearningRate 0.0759   Epoch: 2   Global Step: 42960   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-11 03:46:29,721-Speed 3322.14 samples/sec   Loss 4.4297   LearningRate 0.0759   Epoch: 2   Global Step: 42970   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-11 03:46:32,793-Speed 3334.18 samples/sec   Loss 4.3908   LearningRate 0.0759   Epoch: 2   Global Step: 42980   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-11 03:46:35,883-Speed 3313.88 samples/sec   Loss 4.3236   LearningRate 0.0759   Epoch: 2   Global Step: 42990   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-11 03:46:38,961-Speed 3328.15 samples/sec   Loss 4.3366   LearningRate 0.0759   Epoch: 2   Global Step: 43000   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:46:42,044-Speed 3321.96 samples/sec   Loss 4.3297   LearningRate 0.0759   Epoch: 2   Global Step: 43010   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:46:45,235-Speed 3210.04 samples/sec   Loss 4.4267   LearningRate 0.0759   Epoch: 2   Global Step: 43020   Fp16 Grad Scale: 32768   Required: 31 hours
Training: 2022-04-11 03:46:48,315-Speed 3326.11 samples/sec   Loss 4.4260   LearningRate 0.0759   Epoch: 2   Global Step: 43030   Fp16 Grad Scale: 32768   Required: 31 hours
Training: 2022-04-11 03:46:51,385-Speed 3335.74 samples/sec   Loss 4.3779   LearningRate 0.0759   Epoch: 2   Global Step: 43040   Fp16 Grad Scale: 32768   Required: 31 hours
Training: 2022-04-11 03:46:54,451-Speed 3341.04 samples/sec   Loss 4.4058   LearningRate 0.0759   Epoch: 2   Global Step: 43050   Fp16 Grad Scale: 32768   Required: 31 hours
Training: 2022-04-11 03:46:57,516-Speed 3341.29 samples/sec   Loss 4.3594   LearningRate 0.0759   Epoch: 2   Global Step: 43060   Fp16 Grad Scale: 32768   Required: 31 hours
Training: 2022-04-11 03:47:00,581-Speed 3341.30 samples/sec   Loss 4.3578   LearningRate 0.0759   Epoch: 2   Global Step: 43070   Fp16 Grad Scale: 32768   Required: 31 hours
Training: 2022-04-11 03:47:03,651-Speed 3336.56 samples/sec   Loss 4.3257   LearningRate 0.0759   Epoch: 2   Global Step: 43080   Fp16 Grad Scale: 32768   Required: 31 hours
Training: 2022-04-11 03:47:06,728-Speed 3328.96 samples/sec   Loss 4.3815   LearningRate 0.0758   Epoch: 2   Global Step: 43090   Fp16 Grad Scale: 32768   Required: 31 hours
Training: 2022-04-11 03:47:09,789-Speed 3345.55 samples/sec   Loss 4.2801   LearningRate 0.0758   Epoch: 2   Global Step: 43100   Fp16 Grad Scale: 32768   Required: 31 hours
Training: 2022-04-11 03:47:12,863-Speed 3332.13 samples/sec   Loss 4.4047   LearningRate 0.0758   Epoch: 2   Global Step: 43110   Fp16 Grad Scale: 32768   Required: 31 hours
Training: 2022-04-11 03:47:15,934-Speed 3335.25 samples/sec   Loss 4.2918   LearningRate 0.0758   Epoch: 2   Global Step: 43120   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-11 03:47:19,055-Speed 3282.15 samples/sec   Loss 4.1983   LearningRate 0.0758   Epoch: 2   Global Step: 43130   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-11 03:47:22,160-Speed 3299.12 samples/sec   Loss 4.3135   LearningRate 0.0758   Epoch: 2   Global Step: 43140   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-11 03:47:25,226-Speed 3339.43 samples/sec   Loss 4.4061   LearningRate 0.0758   Epoch: 2   Global Step: 43150   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-11 03:47:28,297-Speed 3335.26 samples/sec   Loss 4.3222   LearningRate 0.0758   Epoch: 2   Global Step: 43160   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-11 03:47:31,401-Speed 3299.79 samples/sec   Loss 4.3145   LearningRate 0.0758   Epoch: 2   Global Step: 43170   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-11 03:47:34,477-Speed 3329.56 samples/sec   Loss 4.2991   LearningRate 0.0758   Epoch: 2   Global Step: 43180   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-11 03:47:37,680-Speed 3198.17 samples/sec   Loss 4.4084   LearningRate 0.0758   Epoch: 2   Global Step: 43190   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-11 03:47:40,761-Speed 3323.72 samples/sec   Loss 4.3860   LearningRate 0.0758   Epoch: 2   Global Step: 43200   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-11 03:47:43,849-Speed 3317.52 samples/sec   Loss 4.4324   LearningRate 0.0758   Epoch: 2   Global Step: 43210   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-11 03:47:46,919-Speed 3336.18 samples/sec   Loss 4.3888   LearningRate 0.0758   Epoch: 2   Global Step: 43220   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:47:50,002-Speed 3322.43 samples/sec   Loss 4.4752   LearningRate 0.0758   Epoch: 2   Global Step: 43230   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:47:53,072-Speed 3336.51 samples/sec   Loss 4.2489   LearningRate 0.0758   Epoch: 2   Global Step: 43240   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:47:56,142-Speed 3336.08 samples/sec   Loss 4.3895   LearningRate 0.0758   Epoch: 2   Global Step: 43250   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:47:59,213-Speed 3335.21 samples/sec   Loss 4.2845   LearningRate 0.0758   Epoch: 2   Global Step: 43260   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:48:02,286-Speed 3332.40 samples/sec   Loss 4.3100   LearningRate 0.0758   Epoch: 2   Global Step: 43270   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:48:05,360-Speed 3332.29 samples/sec   Loss 4.3746   LearningRate 0.0758   Epoch: 2   Global Step: 43280   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:48:08,432-Speed 3333.44 samples/sec   Loss 4.3102   LearningRate 0.0757   Epoch: 2   Global Step: 43290   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:48:11,501-Speed 3338.00 samples/sec   Loss 4.2710   LearningRate 0.0757   Epoch: 2   Global Step: 43300   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:48:14,566-Speed 3342.15 samples/sec   Loss 4.4019   LearningRate 0.0757   Epoch: 2   Global Step: 43310   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:48:17,630-Speed 3342.22 samples/sec   Loss 4.2314   LearningRate 0.0757   Epoch: 2   Global Step: 43320   Fp16 Grad Scale: 262144   Required: 31 hours
Training: 2022-04-11 03:48:20,693-Speed 3343.96 samples/sec   Loss 4.4237   LearningRate 0.0757   Epoch: 2   Global Step: 43330   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:48:23,762-Speed 3337.24 samples/sec   Loss 4.2674   LearningRate 0.0757   Epoch: 2   Global Step: 43340   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:48:26,830-Speed 3339.13 samples/sec   Loss 4.4547   LearningRate 0.0757   Epoch: 2   Global Step: 43350   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:48:29,898-Speed 3337.84 samples/sec   Loss 4.4349   LearningRate 0.0757   Epoch: 2   Global Step: 43360   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:48:32,972-Speed 3331.66 samples/sec   Loss 4.3860   LearningRate 0.0757   Epoch: 2   Global Step: 43370   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-11 03:48:36,045-Speed 3333.91 samples/sec   Loss 4.3116   LearningRate 0.0757   Epoch: 2   Global Step: 43380   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-11 03:48:39,175-Speed 3272.06 samples/sec   Loss 4.3403   LearningRate 0.0757   Epoch: 2   Global Step: 43390   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-11 03:48:42,278-Speed 3300.94 samples/sec   Loss 4.4207   LearningRate 0.0757   Epoch: 2   Global Step: 43400   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-11 03:48:45,347-Speed 3337.59 samples/sec   Loss 4.3728   LearningRate 0.0757   Epoch: 2   Global Step: 43410   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-11 03:48:48,481-Speed 3268.05 samples/sec   Loss 4.3603   LearningRate 0.0757   Epoch: 2   Global Step: 43420   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-11 03:48:51,552-Speed 3334.74 samples/sec   Loss 4.3005   LearningRate 0.0757   Epoch: 2   Global Step: 43430   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-11 03:48:54,659-Speed 3296.52 samples/sec   Loss 4.3284   LearningRate 0.0757   Epoch: 2   Global Step: 43440   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-11 03:48:57,728-Speed 3337.04 samples/sec   Loss 4.3203   LearningRate 0.0757   Epoch: 2   Global Step: 43450   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-11 03:49:00,833-Speed 3299.27 samples/sec   Loss 4.2733   LearningRate 0.0757   Epoch: 2   Global Step: 43460   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-11 03:49:03,904-Speed 3335.16 samples/sec   Loss 4.3017   LearningRate 0.0757   Epoch: 2   Global Step: 43470   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 03:49:06,983-Speed 3326.61 samples/sec   Loss 4.3894   LearningRate 0.0756   Epoch: 2   Global Step: 43480   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 03:49:10,071-Speed 3317.26 samples/sec   Loss 4.3160   LearningRate 0.0756   Epoch: 2   Global Step: 43490   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 03:49:13,134-Speed 3343.57 samples/sec   Loss 4.3332   LearningRate 0.0756   Epoch: 2   Global Step: 43500   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 03:49:16,216-Speed 3323.26 samples/sec   Loss 4.3498   LearningRate 0.0756   Epoch: 2   Global Step: 43510   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 03:49:19,289-Speed 3333.10 samples/sec   Loss 4.3131   LearningRate 0.0756   Epoch: 2   Global Step: 43520   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 03:49:22,376-Speed 3317.95 samples/sec   Loss 4.4293   LearningRate 0.0756   Epoch: 2   Global Step: 43530   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 03:49:25,456-Speed 3325.24 samples/sec   Loss 4.3599   LearningRate 0.0756   Epoch: 2   Global Step: 43540   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 03:49:28,532-Speed 3329.96 samples/sec   Loss 4.2509   LearningRate 0.0756   Epoch: 2   Global Step: 43550   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 03:49:31,599-Speed 3339.57 samples/sec   Loss 4.2686   LearningRate 0.0756   Epoch: 2   Global Step: 43560   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 03:49:34,663-Speed 3343.33 samples/sec   Loss 4.3229   LearningRate 0.0756   Epoch: 2   Global Step: 43570   Fp16 Grad Scale: 262144   Required: 30 hours
Training: 2022-04-11 03:49:37,741-Speed 3326.90 samples/sec   Loss 4.2965   LearningRate 0.0756   Epoch: 2   Global Step: 43580   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 03:49:40,816-Speed 3331.24 samples/sec   Loss 4.4105   LearningRate 0.0756   Epoch: 2   Global Step: 43590   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 03:49:43,886-Speed 3336.02 samples/sec   Loss 4.3471   LearningRate 0.0756   Epoch: 2   Global Step: 43600   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 03:49:46,963-Speed 3329.08 samples/sec   Loss 4.3083   LearningRate 0.0756   Epoch: 2   Global Step: 43610   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 03:49:50,043-Speed 3325.35 samples/sec   Loss 4.3695   LearningRate 0.0756   Epoch: 2   Global Step: 43620   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 03:49:53,116-Speed 3332.38 samples/sec   Loss 4.4092   LearningRate 0.0756   Epoch: 2   Global Step: 43630   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 03:49:56,197-Speed 3325.11 samples/sec   Loss 4.4178   LearningRate 0.0756   Epoch: 2   Global Step: 43640   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 03:49:59,278-Speed 3323.91 samples/sec   Loss 4.2953   LearningRate 0.0756   Epoch: 2   Global Step: 43650   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 03:50:02,351-Speed 3333.09 samples/sec   Loss 4.2805   LearningRate 0.0756   Epoch: 2   Global Step: 43660   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 03:50:05,423-Speed 3334.32 samples/sec   Loss 4.5366   LearningRate 0.0755   Epoch: 2   Global Step: 43670   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 03:50:08,512-Speed 3316.02 samples/sec   Loss 4.4378   LearningRate 0.0755   Epoch: 2   Global Step: 43680   Fp16 Grad Scale: 262144   Required: 30 hours
Training: 2022-04-11 03:50:11,595-Speed 3322.03 samples/sec   Loss 4.2546   LearningRate 0.0755   Epoch: 2   Global Step: 43690   Fp16 Grad Scale: 262144   Required: 30 hours
Training: 2022-04-11 03:50:14,651-Speed 3351.80 samples/sec   Loss 4.4082   LearningRate 0.0755   Epoch: 2   Global Step: 43700   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 03:50:17,741-Speed 3314.95 samples/sec   Loss 4.2801   LearningRate 0.0755   Epoch: 2   Global Step: 43710   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 03:50:20,805-Speed 3342.26 samples/sec   Loss 4.3064   LearningRate 0.0755   Epoch: 2   Global Step: 43720   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 03:50:23,880-Speed 3330.81 samples/sec   Loss 4.2710   LearningRate 0.0755   Epoch: 2   Global Step: 43730   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 03:50:26,953-Speed 3333.57 samples/sec   Loss 4.3671   LearningRate 0.0755   Epoch: 2   Global Step: 43740   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 03:50:30,012-Speed 3348.70 samples/sec   Loss 4.2194   LearningRate 0.0755   Epoch: 2   Global Step: 43750   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-11 03:50:33,100-Speed 3316.29 samples/sec   Loss 4.3005   LearningRate 0.0755   Epoch: 2   Global Step: 43760   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-11 03:50:36,182-Speed 3323.28 samples/sec   Loss 4.3063   LearningRate 0.0755   Epoch: 2   Global Step: 43770   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-11 03:50:39,340-Speed 3243.46 samples/sec   Loss 4.2944   LearningRate 0.0755   Epoch: 2   Global Step: 43780   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-11 03:50:42,411-Speed 3335.55 samples/sec   Loss 4.3694   LearningRate 0.0755   Epoch: 2   Global Step: 43790   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-11 03:50:45,488-Speed 3328.27 samples/sec   Loss 4.2642   LearningRate 0.0755   Epoch: 2   Global Step: 43800   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-11 03:50:48,563-Speed 3330.56 samples/sec   Loss 4.3956   LearningRate 0.0755   Epoch: 2   Global Step: 43810   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-11 03:50:51,632-Speed 3336.96 samples/sec   Loss 4.3353   LearningRate 0.0755   Epoch: 2   Global Step: 43820   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-11 03:50:54,702-Speed 3337.18 samples/sec   Loss 4.3557   LearningRate 0.0755   Epoch: 2   Global Step: 43830   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-11 03:50:57,770-Speed 3337.87 samples/sec   Loss 4.2862   LearningRate 0.0755   Epoch: 2   Global Step: 43840   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-11 03:51:00,865-Speed 3309.78 samples/sec   Loss 4.2602   LearningRate 0.0755   Epoch: 2   Global Step: 43850   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 03:51:04,060-Speed 3206.10 samples/sec   Loss 4.2737   LearningRate 0.0754   Epoch: 2   Global Step: 43860   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 03:51:07,188-Speed 3273.48 samples/sec   Loss 4.2759   LearningRate 0.0754   Epoch: 2   Global Step: 43870   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 03:51:10,311-Speed 3280.08 samples/sec   Loss 4.2950   LearningRate 0.0754   Epoch: 2   Global Step: 43880   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 03:51:13,436-Speed 3277.50 samples/sec   Loss 4.4062   LearningRate 0.0754   Epoch: 2   Global Step: 43890   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 03:51:16,507-Speed 3334.37 samples/sec   Loss 4.3576   LearningRate 0.0754   Epoch: 2   Global Step: 43900   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 03:51:19,607-Speed 3304.34 samples/sec   Loss 4.2875   LearningRate 0.0754   Epoch: 2   Global Step: 43910   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 03:51:22,681-Speed 3332.84 samples/sec   Loss 4.3028   LearningRate 0.0754   Epoch: 2   Global Step: 43920   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 03:51:25,769-Speed 3316.02 samples/sec   Loss 4.2577   LearningRate 0.0754   Epoch: 2   Global Step: 43930   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 03:51:28,870-Speed 3304.13 samples/sec   Loss 4.3119   LearningRate 0.0754   Epoch: 2   Global Step: 43940   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 03:51:31,938-Speed 3337.97 samples/sec   Loss 4.3092   LearningRate 0.0754   Epoch: 2   Global Step: 43950   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-11 03:51:35,005-Speed 3339.64 samples/sec   Loss 4.2346   LearningRate 0.0754   Epoch: 2   Global Step: 43960   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-11 03:51:38,073-Speed 3338.56 samples/sec   Loss 4.3366   LearningRate 0.0754   Epoch: 2   Global Step: 43970   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-11 03:51:41,142-Speed 3337.82 samples/sec   Loss 4.3046   LearningRate 0.0754   Epoch: 2   Global Step: 43980   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-11 03:51:44,226-Speed 3320.70 samples/sec   Loss 4.3073   LearningRate 0.0754   Epoch: 2   Global Step: 43990   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-11 03:51:47,292-Speed 3340.26 samples/sec   Loss 4.2609   LearningRate 0.0754   Epoch: 2   Global Step: 44000   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-11 03:52:30,626-[lfw][44000]XNorm: 23.401007
Training: 2022-04-11 03:52:30,627-[lfw][44000]Accuracy-Flip: 0.99783+-0.00289
Training: 2022-04-11 03:52:30,628-[lfw][44000]Accuracy-Highest: 0.99783
Training: 2022-04-11 03:53:21,195-[cfp_fp][44000]XNorm: 21.589601
Training: 2022-04-11 03:53:21,196-[cfp_fp][44000]Accuracy-Flip: 0.98214+-0.00583
Training: 2022-04-11 03:53:21,196-[cfp_fp][44000]Accuracy-Highest: 0.98214
Training: 2022-04-11 03:54:04,802-[agedb_30][44000]XNorm: 23.120066
Training: 2022-04-11 03:54:04,803-[agedb_30][44000]Accuracy-Flip: 0.97867+-0.00884
Training: 2022-04-11 03:54:04,803-[agedb_30][44000]Accuracy-Highest: 0.97917
Training: 2022-04-11 03:54:07,873-Speed 72.84 samples/sec   Loss 4.3561   LearningRate 0.0754   Epoch: 2   Global Step: 44010   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-11 03:54:10,932-Speed 3348.34 samples/sec   Loss 4.2751   LearningRate 0.0754   Epoch: 2   Global Step: 44020   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-11 03:54:14,037-Speed 3298.03 samples/sec   Loss 4.3476   LearningRate 0.0754   Epoch: 2   Global Step: 44030   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-11 03:54:17,086-Speed 3359.73 samples/sec   Loss 4.3164   LearningRate 0.0754   Epoch: 2   Global Step: 44040   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-11 03:54:20,158-Speed 3334.02 samples/sec   Loss 4.2764   LearningRate 0.0753   Epoch: 2   Global Step: 44050   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:54:23,246-Speed 3316.28 samples/sec   Loss 4.2827   LearningRate 0.0753   Epoch: 2   Global Step: 44060   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-11 03:54:26,295-Speed 3360.51 samples/sec   Loss 4.3651   LearningRate 0.0753   Epoch: 2   Global Step: 44070   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-11 03:54:29,354-Speed 3348.58 samples/sec   Loss 4.3378   LearningRate 0.0753   Epoch: 2   Global Step: 44080   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-11 03:54:32,409-Speed 3352.48 samples/sec   Loss 4.2901   LearningRate 0.0753   Epoch: 2   Global Step: 44090   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-11 03:54:35,485-Speed 3329.71 samples/sec   Loss 4.3767   LearningRate 0.0753   Epoch: 2   Global Step: 44100   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-11 03:54:38,542-Speed 3350.11 samples/sec   Loss 4.3747   LearningRate 0.0753   Epoch: 2   Global Step: 44110   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-11 03:54:41,668-Speed 3276.88 samples/sec   Loss 4.2467   LearningRate 0.0753   Epoch: 2   Global Step: 44120   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-11 03:54:44,785-Speed 3285.74 samples/sec   Loss 4.3345   LearningRate 0.0753   Epoch: 2   Global Step: 44130   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-11 03:54:47,861-Speed 3329.79 samples/sec   Loss 4.3757   LearningRate 0.0753   Epoch: 2   Global Step: 44140   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-11 03:54:50,935-Speed 3331.36 samples/sec   Loss 4.3143   LearningRate 0.0753   Epoch: 2   Global Step: 44150   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-04-11 03:54:54,007-Speed 3334.21 samples/sec   Loss 4.2755   LearningRate 0.0753   Epoch: 2   Global Step: 44160   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:54:57,074-Speed 3339.57 samples/sec   Loss 4.3792   LearningRate 0.0753   Epoch: 2   Global Step: 44170   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:55:00,140-Speed 3341.00 samples/sec   Loss 4.2488   LearningRate 0.0753   Epoch: 2   Global Step: 44180   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:55:03,213-Speed 3333.04 samples/sec   Loss 4.2304   LearningRate 0.0753   Epoch: 2   Global Step: 44190   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:55:06,278-Speed 3341.12 samples/sec   Loss 4.3027   LearningRate 0.0753   Epoch: 2   Global Step: 44200   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:55:09,359-Speed 3324.78 samples/sec   Loss 4.2709   LearningRate 0.0753   Epoch: 2   Global Step: 44210   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:55:12,524-Speed 3236.56 samples/sec   Loss 4.3041   LearningRate 0.0753   Epoch: 2   Global Step: 44220   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:55:15,658-Speed 3267.47 samples/sec   Loss 4.2637   LearningRate 0.0753   Epoch: 2   Global Step: 44230   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:55:18,729-Speed 3335.99 samples/sec   Loss 4.2861   LearningRate 0.0753   Epoch: 2   Global Step: 44240   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:55:21,798-Speed 3336.95 samples/sec   Loss 4.2894   LearningRate 0.0752   Epoch: 2   Global Step: 44250   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:55:24,863-Speed 3341.28 samples/sec   Loss 4.2582   LearningRate 0.0752   Epoch: 2   Global Step: 44260   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:55:27,925-Speed 3344.72 samples/sec   Loss 4.3480   LearningRate 0.0752   Epoch: 2   Global Step: 44270   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:55:30,994-Speed 3338.16 samples/sec   Loss 4.3338   LearningRate 0.0752   Epoch: 2   Global Step: 44280   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:55:34,064-Speed 3336.11 samples/sec   Loss 4.3803   LearningRate 0.0752   Epoch: 2   Global Step: 44290   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:55:37,125-Speed 3345.55 samples/sec   Loss 4.3029   LearningRate 0.0752   Epoch: 2   Global Step: 44300   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:55:40,198-Speed 3332.96 samples/sec   Loss 4.2661   LearningRate 0.0752   Epoch: 2   Global Step: 44310   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:55:43,259-Speed 3346.24 samples/sec   Loss 4.3757   LearningRate 0.0752   Epoch: 2   Global Step: 44320   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:55:46,342-Speed 3322.52 samples/sec   Loss 4.4130   LearningRate 0.0752   Epoch: 2   Global Step: 44330   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:55:49,430-Speed 3317.04 samples/sec   Loss 4.3478   LearningRate 0.0752   Epoch: 2   Global Step: 44340   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:55:52,498-Speed 3338.11 samples/sec   Loss 4.3237   LearningRate 0.0752   Epoch: 2   Global Step: 44350   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:55:55,570-Speed 3334.84 samples/sec   Loss 4.3459   LearningRate 0.0752   Epoch: 2   Global Step: 44360   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:55:58,635-Speed 3341.70 samples/sec   Loss 4.3097   LearningRate 0.0752   Epoch: 2   Global Step: 44370   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:56:01,704-Speed 3336.70 samples/sec   Loss 4.2962   LearningRate 0.0752   Epoch: 2   Global Step: 44380   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:56:04,768-Speed 3343.09 samples/sec   Loss 4.2880   LearningRate 0.0752   Epoch: 2   Global Step: 44390   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:56:07,835-Speed 3339.04 samples/sec   Loss 4.3522   LearningRate 0.0752   Epoch: 2   Global Step: 44400   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:56:10,933-Speed 3306.60 samples/sec   Loss 4.3865   LearningRate 0.0752   Epoch: 2   Global Step: 44410   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:56:13,993-Speed 3347.44 samples/sec   Loss 4.2932   LearningRate 0.0752   Epoch: 2   Global Step: 44420   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:56:17,214-Speed 3179.94 samples/sec   Loss 4.3614   LearningRate 0.0752   Epoch: 2   Global Step: 44430   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:56:20,335-Speed 3282.34 samples/sec   Loss 4.2394   LearningRate 0.0751   Epoch: 2   Global Step: 44440   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:56:23,409-Speed 3332.02 samples/sec   Loss 4.3319   LearningRate 0.0751   Epoch: 2   Global Step: 44450   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-04-11 03:56:26,527-Speed 3284.33 samples/sec   Loss 4.3218   LearningRate 0.0751   Epoch: 2   Global Step: 44460   Fp16 Grad Scale: 262144   Required: 30 hours
Training: 2022-04-11 03:56:29,624-Speed 3307.00 samples/sec   Loss 4.3044   LearningRate 0.0751   Epoch: 2   Global Step: 44470   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 03:56:32,850-Speed 3175.72 samples/sec   Loss 4.2530   LearningRate 0.0751   Epoch: 2   Global Step: 44480   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 03:56:36,059-Speed 3191.68 samples/sec   Loss 4.2158   LearningRate 0.0751   Epoch: 2   Global Step: 44490   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 03:56:39,127-Speed 3338.40 samples/sec   Loss 4.2386   LearningRate 0.0751   Epoch: 2   Global Step: 44500   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 03:56:42,202-Speed 3330.99 samples/sec   Loss 4.3441   LearningRate 0.0751   Epoch: 2   Global Step: 44510   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 03:56:45,263-Speed 3346.45 samples/sec   Loss 4.2369   LearningRate 0.0751   Epoch: 2   Global Step: 44520   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 03:56:48,328-Speed 3341.96 samples/sec   Loss 4.3376   LearningRate 0.0751   Epoch: 2   Global Step: 44530   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 03:56:51,398-Speed 3336.31 samples/sec   Loss 4.2792   LearningRate 0.0751   Epoch: 2   Global Step: 44540   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 03:56:54,466-Speed 3338.03 samples/sec   Loss 4.2477   LearningRate 0.0751   Epoch: 2   Global Step: 44550   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 03:56:57,584-Speed 3284.12 samples/sec   Loss 4.2873   LearningRate 0.0751   Epoch: 2   Global Step: 44560   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 03:57:00,675-Speed 3314.65 samples/sec   Loss 4.1497   LearningRate 0.0751   Epoch: 2   Global Step: 44570   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 03:57:03,746-Speed 3334.24 samples/sec   Loss 4.2917   LearningRate 0.0751   Epoch: 2   Global Step: 44580   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 03:57:06,878-Speed 3270.76 samples/sec   Loss 4.2545   LearningRate 0.0751   Epoch: 2   Global Step: 44590   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 03:57:09,930-Speed 3355.48 samples/sec   Loss 4.4209   LearningRate 0.0751   Epoch: 2   Global Step: 44600   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-11 03:57:12,996-Speed 3340.73 samples/sec   Loss 4.2456   LearningRate 0.0751   Epoch: 2   Global Step: 44610   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-11 03:57:16,056-Speed 3347.80 samples/sec   Loss 4.2466   LearningRate 0.0751   Epoch: 2   Global Step: 44620   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-11 03:57:19,127-Speed 3335.42 samples/sec   Loss 4.2929   LearningRate 0.0750   Epoch: 2   Global Step: 44630   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-11 03:57:22,195-Speed 3338.03 samples/sec   Loss 4.2958   LearningRate 0.0750   Epoch: 2   Global Step: 44640   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-11 03:57:25,276-Speed 3324.60 samples/sec   Loss 4.2791   LearningRate 0.0750   Epoch: 2   Global Step: 44650   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-11 03:57:28,365-Speed 3315.40 samples/sec   Loss 4.3174   LearningRate 0.0750   Epoch: 2   Global Step: 44660   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-11 03:57:31,463-Speed 3305.54 samples/sec   Loss 4.2399   LearningRate 0.0750   Epoch: 2   Global Step: 44670   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-11 03:57:34,542-Speed 3326.69 samples/sec   Loss 4.3014   LearningRate 0.0750   Epoch: 2   Global Step: 44680   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-11 03:57:37,608-Speed 3340.92 samples/sec   Loss 4.4074   LearningRate 0.0750   Epoch: 2   Global Step: 44690   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-11 03:57:40,671-Speed 3343.38 samples/sec   Loss 4.3418   LearningRate 0.0750   Epoch: 2   Global Step: 44700   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 03:57:43,746-Speed 3331.95 samples/sec   Loss 4.2519   LearningRate 0.0750   Epoch: 2   Global Step: 44710   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 03:57:46,813-Speed 3339.14 samples/sec   Loss 4.2708   LearningRate 0.0750   Epoch: 2   Global Step: 44720   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 03:57:49,877-Speed 3342.71 samples/sec   Loss 4.2811   LearningRate 0.0750   Epoch: 2   Global Step: 44730   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 03:57:52,929-Speed 3355.80 samples/sec   Loss 4.2707   LearningRate 0.0750   Epoch: 2   Global Step: 44740   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-11 03:57:55,996-Speed 3339.08 samples/sec   Loss 4.2844   LearningRate 0.0750   Epoch: 2   Global Step: 44750   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-11 03:57:59,061-Speed 3342.33 samples/sec   Loss 4.1622   LearningRate 0.0750   Epoch: 2   Global Step: 44760   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-11 03:58:02,128-Speed 3338.94 samples/sec   Loss 4.2927   LearningRate 0.0750   Epoch: 2   Global Step: 44770   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-11 03:58:05,196-Speed 3339.52 samples/sec   Loss 4.2388   LearningRate 0.0750   Epoch: 2   Global Step: 44780   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-11 03:58:08,260-Speed 3342.57 samples/sec   Loss 4.3065   LearningRate 0.0750   Epoch: 2   Global Step: 44790   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-11 03:58:11,327-Speed 3340.22 samples/sec   Loss 4.2027   LearningRate 0.0750   Epoch: 2   Global Step: 44800   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-11 03:58:14,405-Speed 3327.38 samples/sec   Loss 4.3031   LearningRate 0.0750   Epoch: 2   Global Step: 44810   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-11 03:58:17,470-Speed 3342.19 samples/sec   Loss 4.3058   LearningRate 0.0749   Epoch: 2   Global Step: 44820   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-11 03:58:20,532-Speed 3344.81 samples/sec   Loss 4.3398   LearningRate 0.0749   Epoch: 2   Global Step: 44830   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-11 03:58:23,601-Speed 3337.27 samples/sec   Loss 4.2249   LearningRate 0.0749   Epoch: 2   Global Step: 44840   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 03:58:26,675-Speed 3331.94 samples/sec   Loss 4.3424   LearningRate 0.0749   Epoch: 2   Global Step: 44850   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 03:58:29,736-Speed 3345.77 samples/sec   Loss 4.2034   LearningRate 0.0749   Epoch: 2   Global Step: 44860   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-11 03:58:32,801-Speed 3342.29 samples/sec   Loss 4.3263   LearningRate 0.0749   Epoch: 2   Global Step: 44870   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-11 03:58:35,865-Speed 3342.68 samples/sec   Loss 4.1932   LearningRate 0.0749   Epoch: 2   Global Step: 44880   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-11 03:58:38,930-Speed 3340.96 samples/sec   Loss 4.2788   LearningRate 0.0749   Epoch: 2   Global Step: 44890   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-11 03:58:42,024-Speed 3310.27 samples/sec   Loss 4.2362   LearningRate 0.0749   Epoch: 2   Global Step: 44900   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-11 03:58:45,089-Speed 3341.67 samples/sec   Loss 4.1581   LearningRate 0.0749   Epoch: 2   Global Step: 44910   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-11 03:58:48,150-Speed 3346.06 samples/sec   Loss 4.1976   LearningRate 0.0749   Epoch: 2   Global Step: 44920   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-11 03:58:51,219-Speed 3338.00 samples/sec   Loss 4.3069   LearningRate 0.0749   Epoch: 2   Global Step: 44930   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-11 03:58:54,308-Speed 3315.61 samples/sec   Loss 4.2623   LearningRate 0.0749   Epoch: 2   Global Step: 44940   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-11 03:58:57,393-Speed 3320.75 samples/sec   Loss 4.2483   LearningRate 0.0749   Epoch: 2   Global Step: 44950   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-11 03:59:00,462-Speed 3337.05 samples/sec   Loss 4.2983   LearningRate 0.0749   Epoch: 2   Global Step: 44960   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 03:59:03,525-Speed 3344.10 samples/sec   Loss 4.3199   LearningRate 0.0749   Epoch: 2   Global Step: 44970   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 03:59:06,610-Speed 3320.11 samples/sec   Loss 4.3347   LearningRate 0.0749   Epoch: 2   Global Step: 44980   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 03:59:09,673-Speed 3343.92 samples/sec   Loss 4.2033   LearningRate 0.0749   Epoch: 2   Global Step: 44990   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 03:59:12,743-Speed 3335.94 samples/sec   Loss 4.2524   LearningRate 0.0749   Epoch: 2   Global Step: 45000   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 03:59:15,824-Speed 3324.27 samples/sec   Loss 4.2873   LearningRate 0.0749   Epoch: 2   Global Step: 45010   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 03:59:18,894-Speed 3336.66 samples/sec   Loss 4.2007   LearningRate 0.0748   Epoch: 2   Global Step: 45020   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 03:59:21,968-Speed 3330.91 samples/sec   Loss 4.2636   LearningRate 0.0748   Epoch: 2   Global Step: 45030   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 03:59:25,192-Speed 3177.69 samples/sec   Loss 4.2614   LearningRate 0.0748   Epoch: 2   Global Step: 45040   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 03:59:28,254-Speed 3345.45 samples/sec   Loss 4.2171   LearningRate 0.0748   Epoch: 2   Global Step: 45050   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 03:59:31,316-Speed 3344.92 samples/sec   Loss 4.2401   LearningRate 0.0748   Epoch: 2   Global Step: 45060   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 03:59:34,411-Speed 3308.27 samples/sec   Loss 4.2393   LearningRate 0.0748   Epoch: 2   Global Step: 45070   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 03:59:37,496-Speed 3320.93 samples/sec   Loss 4.2325   LearningRate 0.0748   Epoch: 2   Global Step: 45080   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 03:59:40,603-Speed 3295.86 samples/sec   Loss 4.2343   LearningRate 0.0748   Epoch: 2   Global Step: 45090   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-11 03:59:43,669-Speed 3340.35 samples/sec   Loss 4.2694   LearningRate 0.0748   Epoch: 2   Global Step: 45100   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-11 03:59:46,757-Speed 3317.65 samples/sec   Loss 4.3171   LearningRate 0.0748   Epoch: 2   Global Step: 45110   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-11 03:59:49,832-Speed 3330.43 samples/sec   Loss 4.1909   LearningRate 0.0748   Epoch: 2   Global Step: 45120   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-11 03:59:52,909-Speed 3328.69 samples/sec   Loss 4.2631   LearningRate 0.0748   Epoch: 2   Global Step: 45130   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-11 03:59:55,975-Speed 3340.73 samples/sec   Loss 4.1883   LearningRate 0.0748   Epoch: 2   Global Step: 45140   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-11 03:59:59,049-Speed 3331.92 samples/sec   Loss 4.2479   LearningRate 0.0748   Epoch: 2   Global Step: 45150   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-11 04:00:02,180-Speed 3271.15 samples/sec   Loss 4.1634   LearningRate 0.0748   Epoch: 2   Global Step: 45160   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-11 04:00:05,245-Speed 3342.33 samples/sec   Loss 4.3050   LearningRate 0.0748   Epoch: 2   Global Step: 45170   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-11 04:00:08,312-Speed 3339.54 samples/sec   Loss 4.2885   LearningRate 0.0748   Epoch: 2   Global Step: 45180   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-11 04:00:11,391-Speed 3325.75 samples/sec   Loss 4.2679   LearningRate 0.0748   Epoch: 2   Global Step: 45190   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:00:14,492-Speed 3302.90 samples/sec   Loss 4.2716   LearningRate 0.0748   Epoch: 2   Global Step: 45200   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:00:17,570-Speed 3328.14 samples/sec   Loss 4.3101   LearningRate 0.0747   Epoch: 2   Global Step: 45210   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:00:20,636-Speed 3341.20 samples/sec   Loss 4.3342   LearningRate 0.0747   Epoch: 2   Global Step: 45220   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:00:23,738-Speed 3301.56 samples/sec   Loss 4.2075   LearningRate 0.0747   Epoch: 2   Global Step: 45230   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:00:26,799-Speed 3345.92 samples/sec   Loss 4.2910   LearningRate 0.0747   Epoch: 2   Global Step: 45240   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:00:29,852-Speed 3354.87 samples/sec   Loss 4.2458   LearningRate 0.0747   Epoch: 2   Global Step: 45250   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-11 04:00:32,972-Speed 3282.96 samples/sec   Loss 4.1999   LearningRate 0.0747   Epoch: 2   Global Step: 45260   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-11 04:00:36,035-Speed 3344.30 samples/sec   Loss 4.2182   LearningRate 0.0747   Epoch: 2   Global Step: 45270   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-11 04:00:39,164-Speed 3273.14 samples/sec   Loss 4.2820   LearningRate 0.0747   Epoch: 2   Global Step: 45280   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-11 04:00:42,229-Speed 3341.65 samples/sec   Loss 4.2453   LearningRate 0.0747   Epoch: 2   Global Step: 45290   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-11 04:00:45,311-Speed 3323.34 samples/sec   Loss 4.2006   LearningRate 0.0747   Epoch: 2   Global Step: 45300   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-11 04:00:48,413-Speed 3302.23 samples/sec   Loss 4.2904   LearningRate 0.0747   Epoch: 2   Global Step: 45310   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-11 04:00:51,475-Speed 3344.09 samples/sec   Loss 4.1818   LearningRate 0.0747   Epoch: 2   Global Step: 45320   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-11 04:00:54,548-Speed 3334.18 samples/sec   Loss 4.2934   LearningRate 0.0747   Epoch: 2   Global Step: 45330   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-11 04:00:57,614-Speed 3339.76 samples/sec   Loss 4.1460   LearningRate 0.0747   Epoch: 2   Global Step: 45340   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-04-11 04:01:00,684-Speed 3336.83 samples/sec   Loss 4.3090   LearningRate 0.0747   Epoch: 2   Global Step: 45350   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-04-11 04:01:03,776-Speed 3312.33 samples/sec   Loss 4.2830   LearningRate 0.0747   Epoch: 2   Global Step: 45360   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-04-11 04:01:06,881-Speed 3298.74 samples/sec   Loss 4.2663   LearningRate 0.0747   Epoch: 2   Global Step: 45370   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-04-11 04:01:09,943-Speed 3344.44 samples/sec   Loss 4.2118   LearningRate 0.0747   Epoch: 2   Global Step: 45380   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-04-11 04:01:13,035-Speed 3312.70 samples/sec   Loss 4.2619   LearningRate 0.0747   Epoch: 2   Global Step: 45390   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-04-11 04:01:16,228-Speed 3207.91 samples/sec   Loss 4.2865   LearningRate 0.0746   Epoch: 2   Global Step: 45400   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-04-11 04:01:19,413-Speed 3215.43 samples/sec   Loss 4.3005   LearningRate 0.0746   Epoch: 2   Global Step: 45410   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-04-11 04:01:22,491-Speed 3328.28 samples/sec   Loss 4.1324   LearningRate 0.0746   Epoch: 2   Global Step: 45420   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-04-11 04:01:25,601-Speed 3293.26 samples/sec   Loss 4.2324   LearningRate 0.0746   Epoch: 2   Global Step: 45430   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-04-11 04:01:28,675-Speed 3331.18 samples/sec   Loss 4.2144   LearningRate 0.0746   Epoch: 2   Global Step: 45440   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-11 04:01:31,751-Speed 3329.74 samples/sec   Loss 4.2619   LearningRate 0.0746   Epoch: 2   Global Step: 45450   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-11 04:01:34,843-Speed 3312.57 samples/sec   Loss 4.2820   LearningRate 0.0746   Epoch: 2   Global Step: 45460   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-11 04:01:37,945-Speed 3302.18 samples/sec   Loss 4.2786   LearningRate 0.0746   Epoch: 2   Global Step: 45470   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-11 04:01:41,024-Speed 3326.60 samples/sec   Loss 4.2210   LearningRate 0.0746   Epoch: 2   Global Step: 45480   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-11 04:01:44,089-Speed 3341.37 samples/sec   Loss 4.1984   LearningRate 0.0746   Epoch: 2   Global Step: 45490   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-11 04:01:47,177-Speed 3317.34 samples/sec   Loss 4.1152   LearningRate 0.0746   Epoch: 2   Global Step: 45500   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-11 04:01:50,281-Speed 3299.81 samples/sec   Loss 4.3596   LearningRate 0.0746   Epoch: 2   Global Step: 45510   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-11 04:01:53,343-Speed 3345.10 samples/sec   Loss 4.2394   LearningRate 0.0746   Epoch: 2   Global Step: 45520   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-11 04:01:56,406-Speed 3344.35 samples/sec   Loss 4.1801   LearningRate 0.0746   Epoch: 2   Global Step: 45530   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-11 04:01:59,473-Speed 3339.02 samples/sec   Loss 4.3049   LearningRate 0.0746   Epoch: 2   Global Step: 45540   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:02:02,558-Speed 3320.08 samples/sec   Loss 4.3058   LearningRate 0.0746   Epoch: 2   Global Step: 45550   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:02:05,625-Speed 3339.04 samples/sec   Loss 4.2857   LearningRate 0.0746   Epoch: 2   Global Step: 45560   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:02:08,692-Speed 3339.60 samples/sec   Loss 4.2226   LearningRate 0.0746   Epoch: 2   Global Step: 45570   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:02:11,770-Speed 3327.82 samples/sec   Loss 4.3351   LearningRate 0.0746   Epoch: 2   Global Step: 45580   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:02:14,849-Speed 3327.23 samples/sec   Loss 4.2527   LearningRate 0.0746   Epoch: 2   Global Step: 45590   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:02:17,912-Speed 3343.23 samples/sec   Loss 4.2034   LearningRate 0.0745   Epoch: 2   Global Step: 45600   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:02:20,990-Speed 3328.06 samples/sec   Loss 4.2139   LearningRate 0.0745   Epoch: 2   Global Step: 45610   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:02:24,061-Speed 3335.39 samples/sec   Loss 4.2844   LearningRate 0.0745   Epoch: 2   Global Step: 45620   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:02:27,129-Speed 3338.38 samples/sec   Loss 4.2149   LearningRate 0.0745   Epoch: 2   Global Step: 45630   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:02:30,197-Speed 3337.90 samples/sec   Loss 4.2360   LearningRate 0.0745   Epoch: 2   Global Step: 45640   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-11 04:02:33,300-Speed 3301.10 samples/sec   Loss 4.1512   LearningRate 0.0745   Epoch: 2   Global Step: 45650   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-11 04:02:36,487-Speed 3214.02 samples/sec   Loss 4.2299   LearningRate 0.0745   Epoch: 2   Global Step: 45660   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-11 04:02:39,571-Speed 3320.46 samples/sec   Loss 4.2600   LearningRate 0.0745   Epoch: 2   Global Step: 45670   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-11 04:02:42,662-Speed 3313.63 samples/sec   Loss 4.2246   LearningRate 0.0745   Epoch: 2   Global Step: 45680   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-11 04:02:45,761-Speed 3305.02 samples/sec   Loss 4.3200   LearningRate 0.0745   Epoch: 2   Global Step: 45690   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-11 04:02:48,826-Speed 3341.69 samples/sec   Loss 4.3054   LearningRate 0.0745   Epoch: 2   Global Step: 45700   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-11 04:02:51,904-Speed 3327.74 samples/sec   Loss 4.2604   LearningRate 0.0745   Epoch: 2   Global Step: 45710   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-11 04:02:55,026-Speed 3281.39 samples/sec   Loss 4.0870   LearningRate 0.0745   Epoch: 2   Global Step: 45720   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-11 04:02:58,105-Speed 3325.85 samples/sec   Loss 4.1435   LearningRate 0.0745   Epoch: 2   Global Step: 45730   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-11 04:03:01,168-Speed 3344.73 samples/sec   Loss 4.2498   LearningRate 0.0745   Epoch: 2   Global Step: 45740   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:03:04,245-Speed 3327.57 samples/sec   Loss 4.2757   LearningRate 0.0745   Epoch: 2   Global Step: 45750   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-11 04:03:07,318-Speed 3333.69 samples/sec   Loss 4.2008   LearningRate 0.0745   Epoch: 2   Global Step: 45760   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-11 04:03:10,399-Speed 3324.76 samples/sec   Loss 4.3002   LearningRate 0.0745   Epoch: 2   Global Step: 45770   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-11 04:03:13,540-Speed 3260.57 samples/sec   Loss 4.2286   LearningRate 0.0745   Epoch: 2   Global Step: 45780   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-11 04:03:16,609-Speed 3337.16 samples/sec   Loss 4.1514   LearningRate 0.0744   Epoch: 2   Global Step: 45790   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-11 04:03:19,729-Speed 3283.19 samples/sec   Loss 4.2270   LearningRate 0.0744   Epoch: 2   Global Step: 45800   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-11 04:03:22,818-Speed 3315.50 samples/sec   Loss 4.2773   LearningRate 0.0744   Epoch: 2   Global Step: 45810   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-11 04:03:25,889-Speed 3335.06 samples/sec   Loss 4.2436   LearningRate 0.0744   Epoch: 2   Global Step: 45820   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-11 04:03:28,954-Speed 3341.77 samples/sec   Loss 4.2082   LearningRate 0.0744   Epoch: 2   Global Step: 45830   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-11 04:03:32,027-Speed 3333.46 samples/sec   Loss 4.1682   LearningRate 0.0744   Epoch: 2   Global Step: 45840   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-11 04:03:35,104-Speed 3330.14 samples/sec   Loss 4.2514   LearningRate 0.0744   Epoch: 2   Global Step: 45850   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:03:38,204-Speed 3303.82 samples/sec   Loss 4.1816   LearningRate 0.0744   Epoch: 2   Global Step: 45860   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:03:41,278-Speed 3332.07 samples/sec   Loss 4.2174   LearningRate 0.0744   Epoch: 2   Global Step: 45870   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:03:44,348-Speed 3335.93 samples/sec   Loss 4.2053   LearningRate 0.0744   Epoch: 2   Global Step: 45880   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:03:47,413-Speed 3342.49 samples/sec   Loss 4.2835   LearningRate 0.0744   Epoch: 2   Global Step: 45890   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:03:50,478-Speed 3341.38 samples/sec   Loss 4.2252   LearningRate 0.0744   Epoch: 2   Global Step: 45900   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:03:53,556-Speed 3328.39 samples/sec   Loss 4.2493   LearningRate 0.0744   Epoch: 2   Global Step: 45910   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:03:56,675-Speed 3282.94 samples/sec   Loss 4.3073   LearningRate 0.0744   Epoch: 2   Global Step: 45920   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:03:59,773-Speed 3306.66 samples/sec   Loss 4.2021   LearningRate 0.0744   Epoch: 2   Global Step: 45930   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:04:02,855-Speed 3323.79 samples/sec   Loss 4.2752   LearningRate 0.0744   Epoch: 2   Global Step: 45940   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:04:05,988-Speed 3269.60 samples/sec   Loss 4.3070   LearningRate 0.0744   Epoch: 2   Global Step: 45950   Fp16 Grad Scale: 262144   Required: 30 hours
Training: 2022-04-11 04:04:09,081-Speed 3310.47 samples/sec   Loss 4.2206   LearningRate 0.0744   Epoch: 2   Global Step: 45960   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:04:12,182-Speed 3303.64 samples/sec   Loss 4.2420   LearningRate 0.0744   Epoch: 2   Global Step: 45970   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:04:15,286-Speed 3298.90 samples/sec   Loss 4.2146   LearningRate 0.0743   Epoch: 2   Global Step: 45980   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:04:18,405-Speed 3284.10 samples/sec   Loss 4.3018   LearningRate 0.0743   Epoch: 2   Global Step: 45990   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:04:21,628-Speed 3178.29 samples/sec   Loss 4.2709   LearningRate 0.0743   Epoch: 2   Global Step: 46000   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:05:05,713-[lfw][46000]XNorm: 22.211487
Training: 2022-04-11 04:05:05,713-[lfw][46000]Accuracy-Flip: 0.99767+-0.00300
Training: 2022-04-11 04:05:05,714-[lfw][46000]Accuracy-Highest: 0.99783
Training: 2022-04-11 04:05:56,911-[cfp_fp][46000]XNorm: 20.651301
Training: 2022-04-11 04:05:56,911-[cfp_fp][46000]Accuracy-Flip: 0.98186+-0.00610
Training: 2022-04-11 04:05:56,912-[cfp_fp][46000]Accuracy-Highest: 0.98214
Training: 2022-04-11 04:06:40,917-[agedb_30][46000]XNorm: 22.577251
Training: 2022-04-11 04:06:40,918-[agedb_30][46000]Accuracy-Flip: 0.97750+-0.00659
Training: 2022-04-11 04:06:40,918-[agedb_30][46000]Accuracy-Highest: 0.97917
Training: 2022-04-11 04:06:44,048-Speed 71.90 samples/sec   Loss 4.1724   LearningRate 0.0743   Epoch: 2   Global Step: 46010   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:06:47,227-Speed 3221.38 samples/sec   Loss 4.2892   LearningRate 0.0743   Epoch: 2   Global Step: 46020   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:06:50,290-Speed 3344.30 samples/sec   Loss 4.1121   LearningRate 0.0743   Epoch: 2   Global Step: 46030   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:06:53,350-Speed 3346.56 samples/sec   Loss 4.1786   LearningRate 0.0743   Epoch: 2   Global Step: 46040   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:06:56,409-Speed 3348.55 samples/sec   Loss 4.1910   LearningRate 0.0743   Epoch: 2   Global Step: 46050   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:06:59,444-Speed 3375.27 samples/sec   Loss 4.1887   LearningRate 0.0743   Epoch: 2   Global Step: 46060   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:07:02,503-Speed 3347.46 samples/sec   Loss 4.2590   LearningRate 0.0743   Epoch: 2   Global Step: 46070   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:07:05,552-Speed 3359.88 samples/sec   Loss 4.2275   LearningRate 0.0743   Epoch: 2   Global Step: 46080   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:07:08,604-Speed 3356.48 samples/sec   Loss 4.2967   LearningRate 0.0743   Epoch: 2   Global Step: 46090   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:07:11,659-Speed 3352.46 samples/sec   Loss 4.1107   LearningRate 0.0743   Epoch: 2   Global Step: 46100   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:07:14,730-Speed 3335.61 samples/sec   Loss 4.1794   LearningRate 0.0743   Epoch: 2   Global Step: 46110   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:07:17,802-Speed 3332.95 samples/sec   Loss 4.2242   LearningRate 0.0743   Epoch: 2   Global Step: 46120   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:07:20,858-Speed 3352.35 samples/sec   Loss 4.1585   LearningRate 0.0743   Epoch: 2   Global Step: 46130   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:07:23,913-Speed 3351.93 samples/sec   Loss 4.2444   LearningRate 0.0743   Epoch: 2   Global Step: 46140   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:07:26,981-Speed 3339.18 samples/sec   Loss 4.1812   LearningRate 0.0743   Epoch: 2   Global Step: 46150   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:07:30,033-Speed 3356.28 samples/sec   Loss 4.1917   LearningRate 0.0743   Epoch: 2   Global Step: 46160   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:07:33,098-Speed 3341.88 samples/sec   Loss 4.1990   LearningRate 0.0743   Epoch: 2   Global Step: 46170   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:07:36,153-Speed 3352.66 samples/sec   Loss 4.2487   LearningRate 0.0742   Epoch: 2   Global Step: 46180   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:07:39,213-Speed 3346.93 samples/sec   Loss 4.2326   LearningRate 0.0742   Epoch: 2   Global Step: 46190   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:07:42,326-Speed 3290.70 samples/sec   Loss 4.2563   LearningRate 0.0742   Epoch: 2   Global Step: 46200   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:07:45,413-Speed 3317.92 samples/sec   Loss 4.1960   LearningRate 0.0742   Epoch: 2   Global Step: 46210   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:07:48,480-Speed 3339.24 samples/sec   Loss 4.2021   LearningRate 0.0742   Epoch: 2   Global Step: 46220   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:07:51,539-Speed 3348.20 samples/sec   Loss 4.2362   LearningRate 0.0742   Epoch: 2   Global Step: 46230   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:07:54,641-Speed 3301.31 samples/sec   Loss 4.1630   LearningRate 0.0742   Epoch: 2   Global Step: 46240   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:07:57,751-Speed 3294.15 samples/sec   Loss 4.2343   LearningRate 0.0742   Epoch: 2   Global Step: 46250   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:08:00,799-Speed 3360.31 samples/sec   Loss 4.1639   LearningRate 0.0742   Epoch: 2   Global Step: 46260   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:08:03,862-Speed 3344.01 samples/sec   Loss 4.2234   LearningRate 0.0742   Epoch: 2   Global Step: 46270   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:08:06,911-Speed 3359.32 samples/sec   Loss 4.2338   LearningRate 0.0742   Epoch: 2   Global Step: 46280   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-11 04:08:09,969-Speed 3349.05 samples/sec   Loss 4.2465   LearningRate 0.0742   Epoch: 2   Global Step: 46290   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-11 04:08:13,029-Speed 3347.96 samples/sec   Loss 4.2512   LearningRate 0.0742   Epoch: 2   Global Step: 46300   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-11 04:08:16,090-Speed 3345.26 samples/sec   Loss 4.2054   LearningRate 0.0742   Epoch: 2   Global Step: 46310   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-11 04:08:19,188-Speed 3306.03 samples/sec   Loss 4.1004   LearningRate 0.0742   Epoch: 2   Global Step: 46320   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-11 04:08:22,287-Speed 3305.81 samples/sec   Loss 4.2447   LearningRate 0.0742   Epoch: 2   Global Step: 46330   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-11 04:08:25,367-Speed 3325.65 samples/sec   Loss 4.2242   LearningRate 0.0742   Epoch: 2   Global Step: 46340   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-11 04:08:28,440-Speed 3332.77 samples/sec   Loss 4.0953   LearningRate 0.0742   Epoch: 2   Global Step: 46350   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-11 04:08:31,564-Speed 3278.12 samples/sec   Loss 4.0984   LearningRate 0.0742   Epoch: 2   Global Step: 46360   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-11 04:08:34,716-Speed 3249.92 samples/sec   Loss 4.1189   LearningRate 0.0741   Epoch: 2   Global Step: 46370   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-11 04:08:37,866-Speed 3251.73 samples/sec   Loss 4.2030   LearningRate 0.0741   Epoch: 2   Global Step: 46380   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:08:41,050-Speed 3216.77 samples/sec   Loss 4.1848   LearningRate 0.0741   Epoch: 2   Global Step: 46390   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:08:44,156-Speed 3297.16 samples/sec   Loss 4.2094   LearningRate 0.0741   Epoch: 2   Global Step: 46400   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:08:47,252-Speed 3307.87 samples/sec   Loss 4.2942   LearningRate 0.0741   Epoch: 2   Global Step: 46410   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:08:50,315-Speed 3344.43 samples/sec   Loss 4.1893   LearningRate 0.0741   Epoch: 2   Global Step: 46420   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:08:53,399-Speed 3322.22 samples/sec   Loss 4.0897   LearningRate 0.0741   Epoch: 2   Global Step: 46430   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:08:56,461-Speed 3344.44 samples/sec   Loss 4.2111   LearningRate 0.0741   Epoch: 2   Global Step: 46440   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:08:59,519-Speed 3349.15 samples/sec   Loss 4.2791   LearningRate 0.0741   Epoch: 2   Global Step: 46450   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:09:02,578-Speed 3348.44 samples/sec   Loss 4.1306   LearningRate 0.0741   Epoch: 2   Global Step: 46460   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:09:05,642-Speed 3342.99 samples/sec   Loss 4.2373   LearningRate 0.0741   Epoch: 2   Global Step: 46470   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:09:08,713-Speed 3336.23 samples/sec   Loss 4.2396   LearningRate 0.0741   Epoch: 2   Global Step: 46480   Fp16 Grad Scale: 262144   Required: 30 hours
Training: 2022-04-11 04:09:11,799-Speed 3318.66 samples/sec   Loss 4.1519   LearningRate 0.0741   Epoch: 2   Global Step: 46490   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:09:14,963-Speed 3237.87 samples/sec   Loss 4.2157   LearningRate 0.0741   Epoch: 2   Global Step: 46500   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:09:18,147-Speed 3216.98 samples/sec   Loss 4.2064   LearningRate 0.0741   Epoch: 2   Global Step: 46510   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:09:21,238-Speed 3312.74 samples/sec   Loss 4.2262   LearningRate 0.0741   Epoch: 2   Global Step: 46520   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:09:24,357-Speed 3284.04 samples/sec   Loss 4.1991   LearningRate 0.0741   Epoch: 2   Global Step: 46530   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:09:27,485-Speed 3274.07 samples/sec   Loss 4.1989   LearningRate 0.0741   Epoch: 2   Global Step: 46540   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:09:30,549-Speed 3342.99 samples/sec   Loss 4.2985   LearningRate 0.0741   Epoch: 2   Global Step: 46550   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:09:33,609-Speed 3347.62 samples/sec   Loss 4.1863   LearningRate 0.0741   Epoch: 2   Global Step: 46560   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:09:36,664-Speed 3352.15 samples/sec   Loss 4.2829   LearningRate 0.0740   Epoch: 2   Global Step: 46570   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:09:39,746-Speed 3324.09 samples/sec   Loss 4.1843   LearningRate 0.0740   Epoch: 2   Global Step: 46580   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:09:42,788-Speed 3366.68 samples/sec   Loss 4.1368   LearningRate 0.0740   Epoch: 2   Global Step: 46590   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:09:45,846-Speed 3350.23 samples/sec   Loss 4.1418   LearningRate 0.0740   Epoch: 2   Global Step: 46600   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:09:48,902-Speed 3351.07 samples/sec   Loss 4.1677   LearningRate 0.0740   Epoch: 2   Global Step: 46610   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:09:51,954-Speed 3356.19 samples/sec   Loss 4.2554   LearningRate 0.0740   Epoch: 2   Global Step: 46620   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-11 04:09:55,012-Speed 3349.08 samples/sec   Loss 4.2089   LearningRate 0.0740   Epoch: 2   Global Step: 46630   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-11 04:09:58,096-Speed 3320.68 samples/sec   Loss 4.1620   LearningRate 0.0740   Epoch: 2   Global Step: 46640   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-11 04:10:01,207-Speed 3292.49 samples/sec   Loss 4.2664   LearningRate 0.0740   Epoch: 2   Global Step: 46650   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-11 04:10:04,301-Speed 3310.38 samples/sec   Loss 4.0568   LearningRate 0.0740   Epoch: 2   Global Step: 46660   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-11 04:10:07,371-Speed 3336.56 samples/sec   Loss 4.1599   LearningRate 0.0740   Epoch: 2   Global Step: 46670   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-11 04:10:10,430-Speed 3348.71 samples/sec   Loss 4.2094   LearningRate 0.0740   Epoch: 2   Global Step: 46680   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-11 04:10:13,502-Speed 3333.92 samples/sec   Loss 4.2432   LearningRate 0.0740   Epoch: 2   Global Step: 46690   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-11 04:10:16,629-Speed 3275.94 samples/sec   Loss 4.2818   LearningRate 0.0740   Epoch: 2   Global Step: 46700   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-11 04:10:19,756-Speed 3274.95 samples/sec   Loss 4.2334   LearningRate 0.0740   Epoch: 2   Global Step: 46710   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-11 04:10:22,847-Speed 3313.25 samples/sec   Loss 4.2290   LearningRate 0.0740   Epoch: 2   Global Step: 46720   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:10:25,914-Speed 3339.22 samples/sec   Loss 4.0877   LearningRate 0.0740   Epoch: 2   Global Step: 46730   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:10:29,068-Speed 3248.39 samples/sec   Loss 4.1728   LearningRate 0.0740   Epoch: 2   Global Step: 46740   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:10:32,128-Speed 3346.87 samples/sec   Loss 4.1815   LearningRate 0.0740   Epoch: 2   Global Step: 46750   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:10:35,187-Speed 3348.54 samples/sec   Loss 4.2474   LearningRate 0.0739   Epoch: 2   Global Step: 46760   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:10:38,244-Speed 3349.88 samples/sec   Loss 4.2230   LearningRate 0.0739   Epoch: 2   Global Step: 46770   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:10:41,307-Speed 3345.11 samples/sec   Loss 4.2179   LearningRate 0.0739   Epoch: 2   Global Step: 46780   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-11 04:10:44,411-Speed 3299.59 samples/sec   Loss 4.2277   LearningRate 0.0739   Epoch: 2   Global Step: 46790   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-11 04:10:47,471-Speed 3347.23 samples/sec   Loss 4.2278   LearningRate 0.0739   Epoch: 2   Global Step: 46800   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-11 04:10:50,594-Speed 3279.24 samples/sec   Loss 4.1382   LearningRate 0.0739   Epoch: 2   Global Step: 46810   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-11 04:10:53,677-Speed 3322.39 samples/sec   Loss 4.2391   LearningRate 0.0739   Epoch: 2   Global Step: 46820   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-11 04:10:56,767-Speed 3313.89 samples/sec   Loss 4.2300   LearningRate 0.0739   Epoch: 2   Global Step: 46830   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-11 04:10:59,848-Speed 3324.59 samples/sec   Loss 4.1365   LearningRate 0.0739   Epoch: 2   Global Step: 46840   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-11 04:11:02,978-Speed 3273.05 samples/sec   Loss 4.2210   LearningRate 0.0739   Epoch: 2   Global Step: 46850   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-11 04:11:06,040-Speed 3344.94 samples/sec   Loss 4.0441   LearningRate 0.0739   Epoch: 2   Global Step: 46860   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-11 04:11:09,201-Speed 3239.87 samples/sec   Loss 4.1436   LearningRate 0.0739   Epoch: 2   Global Step: 46870   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-11 04:11:12,258-Speed 3351.05 samples/sec   Loss 4.1851   LearningRate 0.0739   Epoch: 2   Global Step: 46880   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:11:15,304-Speed 3361.92 samples/sec   Loss 4.1299   LearningRate 0.0739   Epoch: 2   Global Step: 46890   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-11 04:11:18,367-Speed 3343.74 samples/sec   Loss 4.2413   LearningRate 0.0739   Epoch: 2   Global Step: 46900   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-11 04:11:21,421-Speed 3354.69 samples/sec   Loss 4.2454   LearningRate 0.0739   Epoch: 2   Global Step: 46910   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-11 04:11:24,491-Speed 3335.20 samples/sec   Loss 4.1776   LearningRate 0.0739   Epoch: 2   Global Step: 46920   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-11 04:11:27,552-Speed 3346.80 samples/sec   Loss 4.1497   LearningRate 0.0739   Epoch: 2   Global Step: 46930   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-11 04:11:30,627-Speed 3330.93 samples/sec   Loss 4.2256   LearningRate 0.0739   Epoch: 2   Global Step: 46940   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-11 04:11:33,694-Speed 3339.28 samples/sec   Loss 4.1570   LearningRate 0.0738   Epoch: 2   Global Step: 46950   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-11 04:11:36,798-Speed 3300.43 samples/sec   Loss 4.1624   LearningRate 0.0738   Epoch: 2   Global Step: 46960   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-11 04:11:39,966-Speed 3232.74 samples/sec   Loss 4.1474   LearningRate 0.0738   Epoch: 2   Global Step: 46970   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-11 04:11:43,158-Speed 3208.26 samples/sec   Loss 4.1580   LearningRate 0.0738   Epoch: 2   Global Step: 46980   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-11 04:11:46,290-Speed 3271.36 samples/sec   Loss 4.2376   LearningRate 0.0738   Epoch: 2   Global Step: 46990   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:11:49,408-Speed 3284.06 samples/sec   Loss 4.1857   LearningRate 0.0738   Epoch: 2   Global Step: 47000   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:11:52,603-Speed 3205.90 samples/sec   Loss 4.2018   LearningRate 0.0738   Epoch: 2   Global Step: 47010   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:11:55,667-Speed 3343.16 samples/sec   Loss 4.2480   LearningRate 0.0738   Epoch: 2   Global Step: 47020   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:11:58,749-Speed 3323.78 samples/sec   Loss 4.1187   LearningRate 0.0738   Epoch: 2   Global Step: 47030   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:12:01,798-Speed 3359.22 samples/sec   Loss 4.2093   LearningRate 0.0738   Epoch: 2   Global Step: 47040   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-11 04:12:04,859-Speed 3345.42 samples/sec   Loss 4.2088   LearningRate 0.0738   Epoch: 2   Global Step: 47050   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-11 04:12:07,919-Speed 3347.40 samples/sec   Loss 4.1612   LearningRate 0.0738   Epoch: 2   Global Step: 47060   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-11 04:12:10,990-Speed 3334.85 samples/sec   Loss 4.1389   LearningRate 0.0738   Epoch: 2   Global Step: 47070   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-11 04:12:14,058-Speed 3338.71 samples/sec   Loss 4.1598   LearningRate 0.0738   Epoch: 2   Global Step: 47080   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-11 04:12:17,126-Speed 3338.82 samples/sec   Loss 4.2399   LearningRate 0.0738   Epoch: 2   Global Step: 47090   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-11 04:12:20,188-Speed 3344.80 samples/sec   Loss 4.1647   LearningRate 0.0738   Epoch: 2   Global Step: 47100   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-11 04:12:23,250-Speed 3345.56 samples/sec   Loss 4.2450   LearningRate 0.0738   Epoch: 2   Global Step: 47110   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-11 04:12:26,387-Speed 3264.24 samples/sec   Loss 4.1924   LearningRate 0.0738   Epoch: 2   Global Step: 47120   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-11 04:12:29,538-Speed 3251.14 samples/sec   Loss 4.1353   LearningRate 0.0738   Epoch: 2   Global Step: 47130   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-11 04:12:32,632-Speed 3309.92 samples/sec   Loss 4.2330   LearningRate 0.0738   Epoch: 2   Global Step: 47140   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:12:35,727-Speed 3309.10 samples/sec   Loss 4.1697   LearningRate 0.0737   Epoch: 2   Global Step: 47150   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:12:38,806-Speed 3326.67 samples/sec   Loss 4.1901   LearningRate 0.0737   Epoch: 2   Global Step: 47160   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:12:41,891-Speed 3320.25 samples/sec   Loss 4.1715   LearningRate 0.0737   Epoch: 2   Global Step: 47170   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:12:44,963-Speed 3333.91 samples/sec   Loss 4.0998   LearningRate 0.0737   Epoch: 2   Global Step: 47180   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:12:48,025-Speed 3346.07 samples/sec   Loss 4.2279   LearningRate 0.0737   Epoch: 2   Global Step: 47190   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:12:51,088-Speed 3343.24 samples/sec   Loss 4.1075   LearningRate 0.0737   Epoch: 2   Global Step: 47200   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:12:54,137-Speed 3359.78 samples/sec   Loss 4.1709   LearningRate 0.0737   Epoch: 2   Global Step: 47210   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-11 04:12:57,199-Speed 3344.57 samples/sec   Loss 4.2172   LearningRate 0.0737   Epoch: 2   Global Step: 47220   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-11 04:13:00,310-Speed 3292.47 samples/sec   Loss 4.1994   LearningRate 0.0737   Epoch: 2   Global Step: 47230   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-11 04:13:03,375-Speed 3342.17 samples/sec   Loss 4.2635   LearningRate 0.0737   Epoch: 2   Global Step: 47240   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-11 04:13:06,435-Speed 3346.57 samples/sec   Loss 4.1606   LearningRate 0.0737   Epoch: 2   Global Step: 47250   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-11 04:13:09,538-Speed 3300.58 samples/sec   Loss 4.2590   LearningRate 0.0737   Epoch: 2   Global Step: 47260   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-11 04:13:12,608-Speed 3336.76 samples/sec   Loss 4.2374   LearningRate 0.0737   Epoch: 2   Global Step: 47270   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-11 04:13:15,697-Speed 3315.83 samples/sec   Loss 4.1405   LearningRate 0.0737   Epoch: 2   Global Step: 47280   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-11 04:13:18,772-Speed 3331.43 samples/sec   Loss 4.2691   LearningRate 0.0737   Epoch: 2   Global Step: 47290   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-11 04:13:21,842-Speed 3335.91 samples/sec   Loss 4.0984   LearningRate 0.0737   Epoch: 2   Global Step: 47300   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-11 04:13:24,932-Speed 3314.89 samples/sec   Loss 4.2336   LearningRate 0.0737   Epoch: 2   Global Step: 47310   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:13:28,010-Speed 3327.85 samples/sec   Loss 4.2386   LearningRate 0.0737   Epoch: 2   Global Step: 47320   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:13:31,204-Speed 3206.18 samples/sec   Loss 4.2090   LearningRate 0.0737   Epoch: 2   Global Step: 47330   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:13:34,286-Speed 3322.88 samples/sec   Loss 4.2636   LearningRate 0.0736   Epoch: 2   Global Step: 47340   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:13:37,358-Speed 3334.47 samples/sec   Loss 4.2401   LearningRate 0.0736   Epoch: 2   Global Step: 47350   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:13:40,476-Speed 3285.32 samples/sec   Loss 4.1867   LearningRate 0.0736   Epoch: 2   Global Step: 47360   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:13:43,628-Speed 3249.63 samples/sec   Loss 4.1001   LearningRate 0.0736   Epoch: 2   Global Step: 47370   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:13:46,728-Speed 3304.04 samples/sec   Loss 4.2187   LearningRate 0.0736   Epoch: 2   Global Step: 47380   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:13:49,802-Speed 3331.89 samples/sec   Loss 4.2157   LearningRate 0.0736   Epoch: 2   Global Step: 47390   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:13:52,861-Speed 3348.32 samples/sec   Loss 4.2358   LearningRate 0.0736   Epoch: 2   Global Step: 47400   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:13:55,946-Speed 3319.66 samples/sec   Loss 4.1400   LearningRate 0.0736   Epoch: 2   Global Step: 47410   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:13:59,064-Speed 3285.07 samples/sec   Loss 4.1285   LearningRate 0.0736   Epoch: 2   Global Step: 47420   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:14:02,146-Speed 3323.63 samples/sec   Loss 4.1458   LearningRate 0.0736   Epoch: 2   Global Step: 47430   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:14:05,229-Speed 3321.44 samples/sec   Loss 4.1026   LearningRate 0.0736   Epoch: 2   Global Step: 47440   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:14:08,305-Speed 3329.98 samples/sec   Loss 4.2648   LearningRate 0.0736   Epoch: 2   Global Step: 47450   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:14:11,356-Speed 3357.36 samples/sec   Loss 4.1505   LearningRate 0.0736   Epoch: 2   Global Step: 47460   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-11 04:14:14,429-Speed 3333.44 samples/sec   Loss 4.2000   LearningRate 0.0736   Epoch: 2   Global Step: 47470   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-11 04:14:17,487-Speed 3349.06 samples/sec   Loss 4.1466   LearningRate 0.0736   Epoch: 2   Global Step: 47480   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-11 04:14:20,554-Speed 3339.67 samples/sec   Loss 4.1704   LearningRate 0.0736   Epoch: 2   Global Step: 47490   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-11 04:14:23,630-Speed 3329.65 samples/sec   Loss 4.1318   LearningRate 0.0736   Epoch: 2   Global Step: 47500   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-11 04:14:26,689-Speed 3347.71 samples/sec   Loss 4.2013   LearningRate 0.0736   Epoch: 2   Global Step: 47510   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-11 04:14:29,756-Speed 3339.73 samples/sec   Loss 4.1926   LearningRate 0.0736   Epoch: 2   Global Step: 47520   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-11 04:14:32,830-Speed 3332.01 samples/sec   Loss 4.1593   LearningRate 0.0736   Epoch: 2   Global Step: 47530   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-11 04:14:35,905-Speed 3331.39 samples/sec   Loss 4.1846   LearningRate 0.0735   Epoch: 2   Global Step: 47540   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-11 04:14:38,965-Speed 3347.17 samples/sec   Loss 4.0912   LearningRate 0.0735   Epoch: 2   Global Step: 47550   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-11 04:14:42,026-Speed 3346.28 samples/sec   Loss 4.1936   LearningRate 0.0735   Epoch: 2   Global Step: 47560   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:14:45,092-Speed 3340.76 samples/sec   Loss 4.0923   LearningRate 0.0735   Epoch: 2   Global Step: 47570   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:14:48,184-Speed 3312.17 samples/sec   Loss 4.1729   LearningRate 0.0735   Epoch: 2   Global Step: 47580   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:14:51,319-Speed 3267.31 samples/sec   Loss 4.1367   LearningRate 0.0735   Epoch: 2   Global Step: 47590   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:14:54,390-Speed 3335.34 samples/sec   Loss 4.2617   LearningRate 0.0735   Epoch: 2   Global Step: 47600   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:14:57,494-Speed 3299.81 samples/sec   Loss 4.1665   LearningRate 0.0735   Epoch: 2   Global Step: 47610   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:15:00,575-Speed 3323.39 samples/sec   Loss 4.1560   LearningRate 0.0735   Epoch: 2   Global Step: 47620   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:15:03,642-Speed 3340.36 samples/sec   Loss 4.0902   LearningRate 0.0735   Epoch: 2   Global Step: 47630   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:15:06,707-Speed 3341.25 samples/sec   Loss 4.2279   LearningRate 0.0735   Epoch: 2   Global Step: 47640   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:15:09,822-Speed 3288.15 samples/sec   Loss 4.2678   LearningRate 0.0735   Epoch: 2   Global Step: 47650   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:15:12,886-Speed 3343.22 samples/sec   Loss 4.1473   LearningRate 0.0735   Epoch: 2   Global Step: 47660   Fp16 Grad Scale: 262144   Required: 30 hours
Training: 2022-04-11 04:15:15,936-Speed 3358.75 samples/sec   Loss 4.0970   LearningRate 0.0735   Epoch: 2   Global Step: 47670   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:15:18,998-Speed 3344.61 samples/sec   Loss 4.2013   LearningRate 0.0735   Epoch: 2   Global Step: 47680   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:15:22,075-Speed 3328.40 samples/sec   Loss 4.1601   LearningRate 0.0735   Epoch: 2   Global Step: 47690   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:15:25,143-Speed 3338.05 samples/sec   Loss 4.1905   LearningRate 0.0735   Epoch: 2   Global Step: 47700   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:15:28,222-Speed 3327.17 samples/sec   Loss 4.0832   LearningRate 0.0735   Epoch: 2   Global Step: 47710   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:15:31,284-Speed 3345.54 samples/sec   Loss 4.1883   LearningRate 0.0735   Epoch: 2   Global Step: 47720   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:15:34,365-Speed 3324.48 samples/sec   Loss 4.2381   LearningRate 0.0734   Epoch: 2   Global Step: 47730   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:15:37,464-Speed 3304.43 samples/sec   Loss 4.1713   LearningRate 0.0734   Epoch: 2   Global Step: 47740   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:15:40,564-Speed 3303.81 samples/sec   Loss 4.1189   LearningRate 0.0734   Epoch: 2   Global Step: 47750   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:15:43,626-Speed 3345.03 samples/sec   Loss 4.2037   LearningRate 0.0734   Epoch: 2   Global Step: 47760   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:15:46,685-Speed 3348.65 samples/sec   Loss 4.2159   LearningRate 0.0734   Epoch: 2   Global Step: 47770   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:15:49,783-Speed 3306.66 samples/sec   Loss 4.2591   LearningRate 0.0734   Epoch: 2   Global Step: 47780   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:15:52,849-Speed 3340.20 samples/sec   Loss 4.1471   LearningRate 0.0734   Epoch: 2   Global Step: 47790   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:15:55,922-Speed 3332.87 samples/sec   Loss 4.1973   LearningRate 0.0734   Epoch: 2   Global Step: 47800   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:15:59,050-Speed 3274.38 samples/sec   Loss 4.1797   LearningRate 0.0734   Epoch: 2   Global Step: 47810   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:16:02,185-Speed 3267.14 samples/sec   Loss 4.1106   LearningRate 0.0734   Epoch: 2   Global Step: 47820   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:16:05,288-Speed 3300.84 samples/sec   Loss 4.1582   LearningRate 0.0734   Epoch: 2   Global Step: 47830   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:16:08,352-Speed 3342.17 samples/sec   Loss 4.2442   LearningRate 0.0734   Epoch: 2   Global Step: 47840   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:16:11,416-Speed 3343.65 samples/sec   Loss 4.1810   LearningRate 0.0734   Epoch: 2   Global Step: 47850   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:16:14,481-Speed 3341.32 samples/sec   Loss 4.1169   LearningRate 0.0734   Epoch: 2   Global Step: 47860   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:16:17,562-Speed 3324.20 samples/sec   Loss 4.1429   LearningRate 0.0734   Epoch: 2   Global Step: 47870   Fp16 Grad Scale: 262144   Required: 30 hours
Training: 2022-04-11 04:16:20,616-Speed 3354.08 samples/sec   Loss 4.2606   LearningRate 0.0734   Epoch: 2   Global Step: 47880   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:16:23,689-Speed 3332.61 samples/sec   Loss 4.0918   LearningRate 0.0734   Epoch: 2   Global Step: 47890   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:16:26,750-Speed 3347.04 samples/sec   Loss 4.1365   LearningRate 0.0734   Epoch: 2   Global Step: 47900   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:16:29,812-Speed 3344.89 samples/sec   Loss 4.1377   LearningRate 0.0734   Epoch: 2   Global Step: 47910   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:16:32,872-Speed 3346.61 samples/sec   Loss 4.1553   LearningRate 0.0734   Epoch: 2   Global Step: 47920   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:16:35,936-Speed 3343.38 samples/sec   Loss 4.0714   LearningRate 0.0733   Epoch: 2   Global Step: 47930   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:16:39,006-Speed 3336.36 samples/sec   Loss 4.2756   LearningRate 0.0733   Epoch: 2   Global Step: 47940   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:16:42,076-Speed 3336.30 samples/sec   Loss 4.1944   LearningRate 0.0733   Epoch: 2   Global Step: 47950   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:16:45,163-Speed 3317.72 samples/sec   Loss 4.0414   LearningRate 0.0733   Epoch: 2   Global Step: 47960   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:16:48,347-Speed 3216.84 samples/sec   Loss 4.1119   LearningRate 0.0733   Epoch: 2   Global Step: 47970   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:16:51,413-Speed 3341.03 samples/sec   Loss 4.0196   LearningRate 0.0733   Epoch: 2   Global Step: 47980   Fp16 Grad Scale: 262144   Required: 30 hours
Training: 2022-04-11 04:16:54,468-Speed 3353.66 samples/sec   Loss 4.1354   LearningRate 0.0733   Epoch: 2   Global Step: 47990   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:16:57,537-Speed 3337.42 samples/sec   Loss 4.1321   LearningRate 0.0733   Epoch: 2   Global Step: 48000   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:17:41,671-[lfw][48000]XNorm: 20.443697
Training: 2022-04-11 04:17:41,672-[lfw][48000]Accuracy-Flip: 0.99750+-0.00291
Training: 2022-04-11 04:17:41,672-[lfw][48000]Accuracy-Highest: 0.99783
Training: 2022-04-11 04:18:32,905-[cfp_fp][48000]XNorm: 18.593204
Training: 2022-04-11 04:18:32,906-[cfp_fp][48000]Accuracy-Flip: 0.98029+-0.00595
Training: 2022-04-11 04:18:32,906-[cfp_fp][48000]Accuracy-Highest: 0.98214
Training: 2022-04-11 04:19:16,982-[agedb_30][48000]XNorm: 20.881032
Training: 2022-04-11 04:19:16,983-[agedb_30][48000]Accuracy-Flip: 0.97783+-0.00882
Training: 2022-04-11 04:19:16,984-[agedb_30][48000]Accuracy-Highest: 0.97917
Training: 2022-04-11 04:19:20,068-Speed 71.84 samples/sec   Loss 4.0472   LearningRate 0.0733   Epoch: 2   Global Step: 48010   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:19:23,142-Speed 3331.41 samples/sec   Loss 4.1594   LearningRate 0.0733   Epoch: 2   Global Step: 48020   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-11 04:19:26,237-Speed 3309.12 samples/sec   Loss 4.0694   LearningRate 0.0733   Epoch: 2   Global Step: 48030   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-11 04:19:29,360-Speed 3279.59 samples/sec   Loss 4.1240   LearningRate 0.0733   Epoch: 2   Global Step: 48040   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-11 04:19:32,434-Speed 3332.41 samples/sec   Loss 4.1125   LearningRate 0.0733   Epoch: 2   Global Step: 48050   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-11 04:19:35,488-Speed 3353.25 samples/sec   Loss 4.1475   LearningRate 0.0733   Epoch: 2   Global Step: 48060   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-11 04:19:38,546-Speed 3349.21 samples/sec   Loss 4.1132   LearningRate 0.0733   Epoch: 2   Global Step: 48070   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-11 04:19:41,600-Speed 3354.46 samples/sec   Loss 4.1039   LearningRate 0.0733   Epoch: 2   Global Step: 48080   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-11 04:19:44,664-Speed 3342.28 samples/sec   Loss 4.0294   LearningRate 0.0733   Epoch: 2   Global Step: 48090   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-11 04:19:47,721-Speed 3351.08 samples/sec   Loss 4.1562   LearningRate 0.0733   Epoch: 2   Global Step: 48100   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-11 04:19:50,787-Speed 3340.13 samples/sec   Loss 4.1061   LearningRate 0.0733   Epoch: 2   Global Step: 48110   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-11 04:19:53,842-Speed 3352.89 samples/sec   Loss 4.1189   LearningRate 0.0732   Epoch: 2   Global Step: 48120   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:19:56,916-Speed 3332.44 samples/sec   Loss 4.1685   LearningRate 0.0732   Epoch: 2   Global Step: 48130   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:20:00,008-Speed 3312.31 samples/sec   Loss 4.0545   LearningRate 0.0732   Epoch: 2   Global Step: 48140   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:20:03,088-Speed 3325.22 samples/sec   Loss 4.1236   LearningRate 0.0732   Epoch: 2   Global Step: 48150   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:20:06,161-Speed 3332.72 samples/sec   Loss 4.1858   LearningRate 0.0732   Epoch: 2   Global Step: 48160   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:20:09,228-Speed 3340.39 samples/sec   Loss 4.2080   LearningRate 0.0732   Epoch: 2   Global Step: 48170   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:20:12,411-Speed 3217.06 samples/sec   Loss 4.1093   LearningRate 0.0732   Epoch: 2   Global Step: 48180   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:20:15,471-Speed 3348.00 samples/sec   Loss 4.1124   LearningRate 0.0732   Epoch: 2   Global Step: 48190   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:20:18,537-Speed 3340.16 samples/sec   Loss 4.2050   LearningRate 0.0732   Epoch: 2   Global Step: 48200   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:20:21,606-Speed 3337.24 samples/sec   Loss 4.0501   LearningRate 0.0732   Epoch: 2   Global Step: 48210   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:20:24,661-Speed 3352.95 samples/sec   Loss 4.1443   LearningRate 0.0732   Epoch: 2   Global Step: 48220   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:20:27,734-Speed 3333.48 samples/sec   Loss 4.1463   LearningRate 0.0732   Epoch: 2   Global Step: 48230   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:20:30,803-Speed 3337.49 samples/sec   Loss 4.1168   LearningRate 0.0732   Epoch: 2   Global Step: 48240   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:20:33,879-Speed 3329.08 samples/sec   Loss 4.0670   LearningRate 0.0732   Epoch: 2   Global Step: 48250   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:20:36,958-Speed 3326.88 samples/sec   Loss 4.1902   LearningRate 0.0732   Epoch: 2   Global Step: 48260   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:20:40,027-Speed 3337.44 samples/sec   Loss 4.1488   LearningRate 0.0732   Epoch: 2   Global Step: 48270   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:20:43,097-Speed 3336.89 samples/sec   Loss 4.1310   LearningRate 0.0732   Epoch: 2   Global Step: 48280   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:20:46,179-Speed 3323.39 samples/sec   Loss 4.1522   LearningRate 0.0732   Epoch: 2   Global Step: 48290   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:20:49,245-Speed 3340.39 samples/sec   Loss 4.0819   LearningRate 0.0732   Epoch: 2   Global Step: 48300   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:20:52,320-Speed 3330.66 samples/sec   Loss 4.1492   LearningRate 0.0732   Epoch: 2   Global Step: 48310   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:20:55,390-Speed 3336.65 samples/sec   Loss 4.1308   LearningRate 0.0731   Epoch: 2   Global Step: 48320   Fp16 Grad Scale: 262144   Required: 30 hours
Training: 2022-04-11 04:20:58,448-Speed 3348.58 samples/sec   Loss 4.1793   LearningRate 0.0731   Epoch: 2   Global Step: 48330   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:21:01,528-Speed 3326.23 samples/sec   Loss 4.0181   LearningRate 0.0731   Epoch: 2   Global Step: 48340   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:21:04,612-Speed 3321.09 samples/sec   Loss 4.1137   LearningRate 0.0731   Epoch: 2   Global Step: 48350   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:21:07,715-Speed 3300.30 samples/sec   Loss 4.1884   LearningRate 0.0731   Epoch: 2   Global Step: 48360   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:21:10,792-Speed 3328.57 samples/sec   Loss 4.1288   LearningRate 0.0731   Epoch: 2   Global Step: 48370   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:21:13,869-Speed 3329.44 samples/sec   Loss 3.9360   LearningRate 0.0731   Epoch: 2   Global Step: 48380   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:21:16,938-Speed 3337.39 samples/sec   Loss 4.0380   LearningRate 0.0731   Epoch: 2   Global Step: 48390   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:21:19,996-Speed 3349.50 samples/sec   Loss 4.1584   LearningRate 0.0731   Epoch: 2   Global Step: 48400   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-11 04:21:23,079-Speed 3321.40 samples/sec   Loss 4.0954   LearningRate 0.0731   Epoch: 2   Global Step: 48410   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-11 04:21:26,173-Speed 3310.42 samples/sec   Loss 4.1588   LearningRate 0.0731   Epoch: 2   Global Step: 48420   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-11 04:21:29,265-Speed 3313.37 samples/sec   Loss 4.0473   LearningRate 0.0731   Epoch: 2   Global Step: 48430   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-11 04:21:32,323-Speed 3349.46 samples/sec   Loss 4.0693   LearningRate 0.0731   Epoch: 2   Global Step: 48440   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-04-11 04:21:35,391-Speed 3338.74 samples/sec   Loss 4.1313   LearningRate 0.0731   Epoch: 2   Global Step: 48450   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-04-11 04:21:38,469-Speed 3327.90 samples/sec   Loss 4.1054   LearningRate 0.0731   Epoch: 2   Global Step: 48460   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-04-11 04:21:41,540-Speed 3335.32 samples/sec   Loss 4.0372   LearningRate 0.0731   Epoch: 2   Global Step: 48470   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-04-11 04:21:44,634-Speed 3310.11 samples/sec   Loss 4.1064   LearningRate 0.0731   Epoch: 2   Global Step: 48480   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-04-11 04:21:47,707-Speed 3333.54 samples/sec   Loss 4.1443   LearningRate 0.0731   Epoch: 2   Global Step: 48490   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-04-11 04:21:50,780-Speed 3332.96 samples/sec   Loss 4.1816   LearningRate 0.0731   Epoch: 2   Global Step: 48500   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-04-11 04:21:53,859-Speed 3326.18 samples/sec   Loss 4.1213   LearningRate 0.0730   Epoch: 2   Global Step: 48510   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-04-11 04:21:56,969-Speed 3293.68 samples/sec   Loss 4.1002   LearningRate 0.0730   Epoch: 2   Global Step: 48520   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-04-11 04:22:00,045-Speed 3329.79 samples/sec   Loss 4.0143   LearningRate 0.0730   Epoch: 2   Global Step: 48530   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-04-11 04:22:03,118-Speed 3333.44 samples/sec   Loss 4.1340   LearningRate 0.0730   Epoch: 2   Global Step: 48540   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-11 04:22:06,208-Speed 3314.72 samples/sec   Loss 4.1174   LearningRate 0.0730   Epoch: 2   Global Step: 48550   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-11 04:22:09,406-Speed 3202.33 samples/sec   Loss 4.1639   LearningRate 0.0730   Epoch: 2   Global Step: 48560   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-11 04:22:12,498-Speed 3312.65 samples/sec   Loss 4.1325   LearningRate 0.0730   Epoch: 2   Global Step: 48570   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-11 04:22:15,567-Speed 3336.71 samples/sec   Loss 4.0923   LearningRate 0.0730   Epoch: 2   Global Step: 48580   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-11 04:22:18,635-Speed 3338.30 samples/sec   Loss 4.0700   LearningRate 0.0730   Epoch: 2   Global Step: 48590   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-11 04:22:21,706-Speed 3335.21 samples/sec   Loss 4.1134   LearningRate 0.0730   Epoch: 2   Global Step: 48600   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-11 04:22:24,799-Speed 3311.70 samples/sec   Loss 4.1573   LearningRate 0.0730   Epoch: 2   Global Step: 48610   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-11 04:22:27,870-Speed 3335.98 samples/sec   Loss 4.2322   LearningRate 0.0730   Epoch: 2   Global Step: 48620   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-11 04:22:30,937-Speed 3339.49 samples/sec   Loss 4.1102   LearningRate 0.0730   Epoch: 2   Global Step: 48630   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-11 04:22:34,062-Speed 3278.11 samples/sec   Loss 4.1391   LearningRate 0.0730   Epoch: 2   Global Step: 48640   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:22:37,146-Speed 3320.24 samples/sec   Loss 4.1667   LearningRate 0.0730   Epoch: 2   Global Step: 48650   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:22:40,219-Speed 3333.62 samples/sec   Loss 4.0836   LearningRate 0.0730   Epoch: 2   Global Step: 48660   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:22:43,292-Speed 3333.16 samples/sec   Loss 4.0971   LearningRate 0.0730   Epoch: 2   Global Step: 48670   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:22:46,368-Speed 3329.65 samples/sec   Loss 4.1118   LearningRate 0.0730   Epoch: 2   Global Step: 48680   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:22:49,449-Speed 3323.86 samples/sec   Loss 3.9824   LearningRate 0.0730   Epoch: 2   Global Step: 48690   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:22:52,532-Speed 3322.38 samples/sec   Loss 4.1251   LearningRate 0.0730   Epoch: 2   Global Step: 48700   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:22:55,606-Speed 3333.01 samples/sec   Loss 4.1812   LearningRate 0.0729   Epoch: 2   Global Step: 48710   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:22:58,696-Speed 3314.15 samples/sec   Loss 4.0700   LearningRate 0.0729   Epoch: 2   Global Step: 48720   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:23:01,772-Speed 3329.93 samples/sec   Loss 4.0793   LearningRate 0.0729   Epoch: 2   Global Step: 48730   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:23:04,846-Speed 3331.41 samples/sec   Loss 4.0587   LearningRate 0.0729   Epoch: 2   Global Step: 48740   Fp16 Grad Scale: 262144   Required: 30 hours
Training: 2022-04-11 04:23:07,906-Speed 3347.44 samples/sec   Loss 4.1660   LearningRate 0.0729   Epoch: 2   Global Step: 48750   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:23:10,977-Speed 3334.54 samples/sec   Loss 4.0290   LearningRate 0.0729   Epoch: 2   Global Step: 48760   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:23:14,045-Speed 3338.44 samples/sec   Loss 4.0507   LearningRate 0.0729   Epoch: 2   Global Step: 48770   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:23:17,129-Speed 3321.97 samples/sec   Loss 4.1227   LearningRate 0.0729   Epoch: 2   Global Step: 48780   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:23:20,270-Speed 3260.76 samples/sec   Loss 4.0165   LearningRate 0.0729   Epoch: 2   Global Step: 48790   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:23:23,357-Speed 3317.26 samples/sec   Loss 4.1291   LearningRate 0.0729   Epoch: 2   Global Step: 48800   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:23:26,411-Speed 3353.62 samples/sec   Loss 4.1505   LearningRate 0.0729   Epoch: 2   Global Step: 48810   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-11 04:23:29,478-Speed 3340.19 samples/sec   Loss 4.0968   LearningRate 0.0729   Epoch: 2   Global Step: 48820   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-11 04:23:32,566-Speed 3316.93 samples/sec   Loss 4.1297   LearningRate 0.0729   Epoch: 2   Global Step: 48830   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-11 04:23:35,632-Speed 3340.56 samples/sec   Loss 3.9992   LearningRate 0.0729   Epoch: 2   Global Step: 48840   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-11 04:23:38,745-Speed 3290.39 samples/sec   Loss 4.1123   LearningRate 0.0729   Epoch: 2   Global Step: 48850   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-11 04:23:41,813-Speed 3338.11 samples/sec   Loss 4.0236   LearningRate 0.0729   Epoch: 2   Global Step: 48860   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-11 04:23:44,877-Speed 3342.48 samples/sec   Loss 4.0824   LearningRate 0.0729   Epoch: 2   Global Step: 48870   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-11 04:23:47,949-Speed 3334.82 samples/sec   Loss 4.1345   LearningRate 0.0729   Epoch: 2   Global Step: 48880   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-11 04:23:51,014-Speed 3341.43 samples/sec   Loss 4.1159   LearningRate 0.0729   Epoch: 2   Global Step: 48890   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-11 04:23:54,073-Speed 3348.31 samples/sec   Loss 4.1076   LearningRate 0.0728   Epoch: 2   Global Step: 48900   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-11 04:23:57,126-Speed 3354.70 samples/sec   Loss 4.0799   LearningRate 0.0728   Epoch: 2   Global Step: 48910   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-11 04:24:00,188-Speed 3345.17 samples/sec   Loss 4.0540   LearningRate 0.0728   Epoch: 2   Global Step: 48920   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-11 04:24:03,253-Speed 3341.58 samples/sec   Loss 4.0336   LearningRate 0.0728   Epoch: 2   Global Step: 48930   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-11 04:24:06,329-Speed 3329.71 samples/sec   Loss 4.1261   LearningRate 0.0728   Epoch: 2   Global Step: 48940   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-11 04:24:09,403-Speed 3331.95 samples/sec   Loss 4.1380   LearningRate 0.0728   Epoch: 2   Global Step: 48950   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-11 04:24:12,473-Speed 3336.17 samples/sec   Loss 4.0631   LearningRate 0.0728   Epoch: 2   Global Step: 48960   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-11 04:24:15,549-Speed 3329.50 samples/sec   Loss 4.0245   LearningRate 0.0728   Epoch: 2   Global Step: 48970   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-11 04:24:18,642-Speed 3312.38 samples/sec   Loss 4.1431   LearningRate 0.0728   Epoch: 2   Global Step: 48980   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-11 04:24:21,741-Speed 3304.51 samples/sec   Loss 4.1114   LearningRate 0.0728   Epoch: 2   Global Step: 48990   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-11 04:24:24,844-Speed 3301.18 samples/sec   Loss 4.1127   LearningRate 0.0728   Epoch: 2   Global Step: 49000   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-11 04:24:27,916-Speed 3333.98 samples/sec   Loss 4.1209   LearningRate 0.0728   Epoch: 2   Global Step: 49010   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:24:30,994-Speed 3326.72 samples/sec   Loss 4.0757   LearningRate 0.0728   Epoch: 2   Global Step: 49020   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:24:34,060-Speed 3340.74 samples/sec   Loss 4.0189   LearningRate 0.0728   Epoch: 2   Global Step: 49030   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:24:37,137-Speed 3329.63 samples/sec   Loss 4.0508   LearningRate 0.0728   Epoch: 2   Global Step: 49040   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:24:40,209-Speed 3333.99 samples/sec   Loss 4.1155   LearningRate 0.0728   Epoch: 2   Global Step: 49050   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:24:43,304-Speed 3309.33 samples/sec   Loss 4.0900   LearningRate 0.0728   Epoch: 2   Global Step: 49060   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:24:46,428-Speed 3279.43 samples/sec   Loss 4.0880   LearningRate 0.0728   Epoch: 2   Global Step: 49070   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:24:49,499-Speed 3335.29 samples/sec   Loss 4.0405   LearningRate 0.0728   Epoch: 2   Global Step: 49080   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:24:52,573-Speed 3331.26 samples/sec   Loss 4.0421   LearningRate 0.0728   Epoch: 2   Global Step: 49090   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:24:55,657-Speed 3321.52 samples/sec   Loss 4.1536   LearningRate 0.0727   Epoch: 2   Global Step: 49100   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:24:58,738-Speed 3324.73 samples/sec   Loss 4.0894   LearningRate 0.0727   Epoch: 2   Global Step: 49110   Fp16 Grad Scale: 262144   Required: 30 hours
Training: 2022-04-11 04:25:01,811-Speed 3332.66 samples/sec   Loss 4.1176   LearningRate 0.0727   Epoch: 2   Global Step: 49120   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:25:04,897-Speed 3319.81 samples/sec   Loss 4.2075   LearningRate 0.0727   Epoch: 2   Global Step: 49130   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:25:07,967-Speed 3336.71 samples/sec   Loss 4.1233   LearningRate 0.0727   Epoch: 2   Global Step: 49140   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:25:11,031-Speed 3342.01 samples/sec   Loss 4.0821   LearningRate 0.0727   Epoch: 2   Global Step: 49150   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:25:14,103-Speed 3334.41 samples/sec   Loss 4.0744   LearningRate 0.0727   Epoch: 2   Global Step: 49160   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-11 04:25:17,175-Speed 3334.30 samples/sec   Loss 4.1071   LearningRate 0.0727   Epoch: 2   Global Step: 49170   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-11 04:25:20,256-Speed 3324.08 samples/sec   Loss 4.1089   LearningRate 0.0727   Epoch: 2   Global Step: 49180   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-11 04:25:23,410-Speed 3247.37 samples/sec   Loss 4.1632   LearningRate 0.0727   Epoch: 2   Global Step: 49190   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-11 04:25:26,558-Speed 3253.09 samples/sec   Loss 4.0474   LearningRate 0.0727   Epoch: 2   Global Step: 49200   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-11 04:25:29,626-Speed 3339.06 samples/sec   Loss 4.1331   LearningRate 0.0727   Epoch: 2   Global Step: 49210   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-11 04:25:32,703-Speed 3328.25 samples/sec   Loss 3.9988   LearningRate 0.0727   Epoch: 2   Global Step: 49220   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-11 04:25:35,768-Speed 3342.44 samples/sec   Loss 4.0795   LearningRate 0.0727   Epoch: 2   Global Step: 49230   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-11 04:25:38,903-Speed 3266.92 samples/sec   Loss 4.0735   LearningRate 0.0727   Epoch: 2   Global Step: 49240   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-11 04:25:42,033-Speed 3272.27 samples/sec   Loss 4.0045   LearningRate 0.0727   Epoch: 2   Global Step: 49250   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-11 04:25:45,102-Speed 3336.80 samples/sec   Loss 4.0635   LearningRate 0.0727   Epoch: 2   Global Step: 49260   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-11 04:25:48,177-Speed 3331.47 samples/sec   Loss 4.1971   LearningRate 0.0727   Epoch: 2   Global Step: 49270   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-11 04:25:51,256-Speed 3326.05 samples/sec   Loss 4.0933   LearningRate 0.0727   Epoch: 2   Global Step: 49280   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-11 04:25:54,393-Speed 3264.81 samples/sec   Loss 4.1336   LearningRate 0.0726   Epoch: 2   Global Step: 49290   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-11 04:25:57,491-Speed 3306.78 samples/sec   Loss 4.1019   LearningRate 0.0726   Epoch: 2   Global Step: 49300   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-11 04:26:00,562-Speed 3335.21 samples/sec   Loss 4.0502   LearningRate 0.0726   Epoch: 2   Global Step: 49310   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-11 04:26:03,682-Speed 3283.33 samples/sec   Loss 4.0999   LearningRate 0.0726   Epoch: 2   Global Step: 49320   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-11 04:26:06,773-Speed 3313.05 samples/sec   Loss 4.0793   LearningRate 0.0726   Epoch: 2   Global Step: 49330   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-11 04:26:09,863-Speed 3314.85 samples/sec   Loss 4.0951   LearningRate 0.0726   Epoch: 2   Global Step: 49340   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-11 04:26:13,006-Speed 3259.04 samples/sec   Loss 4.1253   LearningRate 0.0726   Epoch: 2   Global Step: 49350   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-11 04:26:16,137-Speed 3270.66 samples/sec   Loss 4.0959   LearningRate 0.0726   Epoch: 2   Global Step: 49360   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:26:19,221-Speed 3321.09 samples/sec   Loss 4.0407   LearningRate 0.0726   Epoch: 2   Global Step: 49370   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:26:22,298-Speed 3328.70 samples/sec   Loss 4.1258   LearningRate 0.0726   Epoch: 2   Global Step: 49380   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:26:25,442-Speed 3257.75 samples/sec   Loss 4.1280   LearningRate 0.0726   Epoch: 2   Global Step: 49390   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:26:28,534-Speed 3312.61 samples/sec   Loss 4.1534   LearningRate 0.0726   Epoch: 2   Global Step: 49400   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:26:31,660-Speed 3276.99 samples/sec   Loss 4.1098   LearningRate 0.0726   Epoch: 2   Global Step: 49410   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-11 04:26:34,735-Speed 3330.76 samples/sec   Loss 4.1585   LearningRate 0.0726   Epoch: 2   Global Step: 49420   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-11 04:26:37,803-Speed 3337.94 samples/sec   Loss 4.0553   LearningRate 0.0726   Epoch: 2   Global Step: 49430   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-11 04:26:40,927-Speed 3279.23 samples/sec   Loss 4.1272   LearningRate 0.0726   Epoch: 2   Global Step: 49440   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-11 04:26:44,012-Speed 3319.92 samples/sec   Loss 4.0737   LearningRate 0.0726   Epoch: 2   Global Step: 49450   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-11 04:26:47,095-Speed 3321.27 samples/sec   Loss 4.0992   LearningRate 0.0726   Epoch: 2   Global Step: 49460   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-11 04:26:50,163-Speed 3339.38 samples/sec   Loss 3.9576   LearningRate 0.0726   Epoch: 2   Global Step: 49470   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-11 04:26:53,304-Speed 3261.15 samples/sec   Loss 4.0864   LearningRate 0.0726   Epoch: 2   Global Step: 49480   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-11 04:26:56,371-Speed 3338.89 samples/sec   Loss 4.0888   LearningRate 0.0725   Epoch: 2   Global Step: 49490   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-11 04:26:59,436-Speed 3341.70 samples/sec   Loss 4.1065   LearningRate 0.0725   Epoch: 2   Global Step: 49500   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-11 04:27:02,527-Speed 3314.28 samples/sec   Loss 4.0472   LearningRate 0.0725   Epoch: 2   Global Step: 49510   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:27:05,592-Speed 3341.69 samples/sec   Loss 4.0767   LearningRate 0.0725   Epoch: 2   Global Step: 49520   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:27:08,680-Speed 3316.99 samples/sec   Loss 4.1656   LearningRate 0.0725   Epoch: 2   Global Step: 49530   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:27:11,769-Speed 3314.83 samples/sec   Loss 4.0412   LearningRate 0.0725   Epoch: 2   Global Step: 49540   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:27:14,845-Speed 3329.93 samples/sec   Loss 4.1331   LearningRate 0.0725   Epoch: 2   Global Step: 49550   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:27:17,935-Speed 3314.42 samples/sec   Loss 4.0980   LearningRate 0.0725   Epoch: 2   Global Step: 49560   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:27:21,007-Speed 3334.94 samples/sec   Loss 4.1047   LearningRate 0.0725   Epoch: 2   Global Step: 49570   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:27:24,107-Speed 3303.90 samples/sec   Loss 4.0416   LearningRate 0.0725   Epoch: 2   Global Step: 49580   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:27:27,193-Speed 3318.31 samples/sec   Loss 4.0809   LearningRate 0.0725   Epoch: 2   Global Step: 49590   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:27:30,282-Speed 3316.45 samples/sec   Loss 4.0842   LearningRate 0.0725   Epoch: 2   Global Step: 49600   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:27:33,432-Speed 3251.14 samples/sec   Loss 4.1587   LearningRate 0.0725   Epoch: 2   Global Step: 49610   Fp16 Grad Scale: 262144   Required: 30 hours
Training: 2022-04-11 04:27:36,496-Speed 3343.33 samples/sec   Loss 4.0912   LearningRate 0.0725   Epoch: 2   Global Step: 49620   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:27:39,566-Speed 3336.23 samples/sec   Loss 4.0708   LearningRate 0.0725   Epoch: 2   Global Step: 49630   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:27:42,646-Speed 3325.06 samples/sec   Loss 4.0251   LearningRate 0.0725   Epoch: 2   Global Step: 49640   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:27:45,728-Speed 3323.09 samples/sec   Loss 4.1365   LearningRate 0.0725   Epoch: 2   Global Step: 49650   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:27:48,804-Speed 3330.58 samples/sec   Loss 4.0319   LearningRate 0.0725   Epoch: 2   Global Step: 49660   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:27:52,011-Speed 3193.07 samples/sec   Loss 4.0847   LearningRate 0.0725   Epoch: 2   Global Step: 49670   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:27:55,142-Speed 3271.56 samples/sec   Loss 4.0547   LearningRate 0.0725   Epoch: 2   Global Step: 49680   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:27:58,321-Speed 3221.70 samples/sec   Loss 4.1507   LearningRate 0.0724   Epoch: 2   Global Step: 49690   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:28:01,567-Speed 3155.48 samples/sec   Loss 4.0984   LearningRate 0.0724   Epoch: 2   Global Step: 49700   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:28:04,635-Speed 3338.11 samples/sec   Loss 4.1213   LearningRate 0.0724   Epoch: 2   Global Step: 49710   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:28:07,706-Speed 3335.86 samples/sec   Loss 4.1164   LearningRate 0.0724   Epoch: 2   Global Step: 49720   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:28:10,780-Speed 3331.27 samples/sec   Loss 4.1120   LearningRate 0.0724   Epoch: 2   Global Step: 49730   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:28:13,864-Speed 3321.86 samples/sec   Loss 4.1639   LearningRate 0.0724   Epoch: 2   Global Step: 49740   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:28:16,955-Speed 3313.81 samples/sec   Loss 4.0425   LearningRate 0.0724   Epoch: 2   Global Step: 49750   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:28:20,009-Speed 3354.34 samples/sec   Loss 4.1032   LearningRate 0.0724   Epoch: 2   Global Step: 49760   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-11 04:28:23,105-Speed 3308.33 samples/sec   Loss 4.0868   LearningRate 0.0724   Epoch: 2   Global Step: 49770   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-11 04:28:26,176-Speed 3334.67 samples/sec   Loss 4.1205   LearningRate 0.0724   Epoch: 2   Global Step: 49780   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-11 04:28:29,240-Speed 3342.72 samples/sec   Loss 4.1270   LearningRate 0.0724   Epoch: 2   Global Step: 49790   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-11 04:28:32,305-Speed 3342.27 samples/sec   Loss 4.0376   LearningRate 0.0724   Epoch: 2   Global Step: 49800   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-11 04:28:35,376-Speed 3335.03 samples/sec   Loss 4.0540   LearningRate 0.0724   Epoch: 2   Global Step: 49810   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-11 04:28:38,446-Speed 3336.01 samples/sec   Loss 4.0818   LearningRate 0.0724   Epoch: 2   Global Step: 49820   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-11 04:28:41,512-Speed 3341.45 samples/sec   Loss 4.0900   LearningRate 0.0724   Epoch: 2   Global Step: 49830   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-11 04:28:44,588-Speed 3329.82 samples/sec   Loss 4.1085   LearningRate 0.0724   Epoch: 2   Global Step: 49840   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-11 04:28:47,656-Speed 3337.89 samples/sec   Loss 3.9773   LearningRate 0.0724   Epoch: 2   Global Step: 49850   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-11 04:28:50,764-Speed 3295.90 samples/sec   Loss 4.0642   LearningRate 0.0724   Epoch: 2   Global Step: 49860   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:28:53,893-Speed 3273.09 samples/sec   Loss 3.9938   LearningRate 0.0724   Epoch: 2   Global Step: 49870   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:28:57,012-Speed 3283.22 samples/sec   Loss 4.0242   LearningRate 0.0723   Epoch: 2   Global Step: 49880   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:29:00,077-Speed 3342.08 samples/sec   Loss 4.0303   LearningRate 0.0723   Epoch: 2   Global Step: 49890   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:29:03,153-Speed 3330.38 samples/sec   Loss 4.0025   LearningRate 0.0723   Epoch: 2   Global Step: 49900   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:29:06,270-Speed 3285.28 samples/sec   Loss 4.0495   LearningRate 0.0723   Epoch: 2   Global Step: 49910   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:29:09,358-Speed 3317.55 samples/sec   Loss 4.0216   LearningRate 0.0723   Epoch: 2   Global Step: 49920   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:29:12,467-Speed 3294.09 samples/sec   Loss 4.0478   LearningRate 0.0723   Epoch: 2   Global Step: 49930   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-11 04:29:15,566-Speed 3305.07 samples/sec   Loss 4.0442   LearningRate 0.0723   Epoch: 2   Global Step: 49940   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-11 04:29:18,635-Speed 3337.13 samples/sec   Loss 4.1036   LearningRate 0.0723   Epoch: 2   Global Step: 49950   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-11 04:29:21,708-Speed 3332.87 samples/sec   Loss 4.0775   LearningRate 0.0723   Epoch: 2   Global Step: 49960   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-11 04:29:24,792-Speed 3321.40 samples/sec   Loss 4.0741   LearningRate 0.0723   Epoch: 2   Global Step: 49970   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-11 04:29:27,909-Speed 3285.96 samples/sec   Loss 4.0474   LearningRate 0.0723   Epoch: 2   Global Step: 49980   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-11 04:29:30,986-Speed 3328.27 samples/sec   Loss 4.0825   LearningRate 0.0723   Epoch: 2   Global Step: 49990   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-11 04:29:34,073-Speed 3317.97 samples/sec   Loss 4.0659   LearningRate 0.0723   Epoch: 2   Global Step: 50000   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-11 04:30:17,987-[lfw][50000]XNorm: 22.063981
Training: 2022-04-11 04:30:17,987-[lfw][50000]Accuracy-Flip: 0.99733+-0.00309
Training: 2022-04-11 04:30:17,988-[lfw][50000]Accuracy-Highest: 0.99783
Training: 2022-04-11 04:31:09,051-[cfp_fp][50000]XNorm: 20.598621
Training: 2022-04-11 04:31:09,052-[cfp_fp][50000]Accuracy-Flip: 0.98300+-0.00627
Training: 2022-04-11 04:31:09,052-[cfp_fp][50000]Accuracy-Highest: 0.98300
Training: 2022-04-11 04:31:52,664-[agedb_30][50000]XNorm: 22.280248
Training: 2022-04-11 04:31:52,665-[agedb_30][50000]Accuracy-Flip: 0.97917+-0.00672
Training: 2022-04-11 04:31:52,665-[agedb_30][50000]Accuracy-Highest: 0.97917
Training: 2022-04-11 04:31:55,745-Speed 72.28 samples/sec   Loss 4.1295   LearningRate 0.0723   Epoch: 2   Global Step: 50010   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-11 04:31:58,812-Speed 3339.54 samples/sec   Loss 4.0550   LearningRate 0.0723   Epoch: 2   Global Step: 50020   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-11 04:32:01,893-Speed 3324.00 samples/sec   Loss 4.1668   LearningRate 0.0723   Epoch: 2   Global Step: 50030   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:32:04,961-Speed 3338.44 samples/sec   Loss 4.0456   LearningRate 0.0723   Epoch: 2   Global Step: 50040   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:32:08,039-Speed 3327.68 samples/sec   Loss 4.1150   LearningRate 0.0723   Epoch: 2   Global Step: 50050   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:32:11,098-Speed 3349.11 samples/sec   Loss 3.9746   LearningRate 0.0723   Epoch: 2   Global Step: 50060   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:32:14,460-Speed 3046.29 samples/sec   Loss 4.0028   LearningRate 0.0723   Epoch: 2   Global Step: 50070   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:33:01,883-Speed 215.93 samples/sec   Loss 3.6089   LearningRate 0.0722   Epoch: 3   Global Step: 50080   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:33:05,102-Speed 3182.04 samples/sec   Loss 3.4633   LearningRate 0.0722   Epoch: 3   Global Step: 50090   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:33:08,241-Speed 3262.85 samples/sec   Loss 3.4877   LearningRate 0.0722   Epoch: 3   Global Step: 50100   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:33:11,346-Speed 3299.13 samples/sec   Loss 3.3707   LearningRate 0.0722   Epoch: 3   Global Step: 50110   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:33:14,447-Speed 3302.98 samples/sec   Loss 3.4372   LearningRate 0.0722   Epoch: 3   Global Step: 50120   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:33:17,505-Speed 3349.11 samples/sec   Loss 3.4474   LearningRate 0.0722   Epoch: 3   Global Step: 50130   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:33:20,562-Speed 3351.07 samples/sec   Loss 3.3820   LearningRate 0.0722   Epoch: 3   Global Step: 50140   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:33:23,614-Speed 3356.35 samples/sec   Loss 3.4653   LearningRate 0.0722   Epoch: 3   Global Step: 50150   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:33:26,681-Speed 3339.12 samples/sec   Loss 3.3989   LearningRate 0.0722   Epoch: 3   Global Step: 50160   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:33:29,740-Speed 3347.82 samples/sec   Loss 3.5042   LearningRate 0.0722   Epoch: 3   Global Step: 50170   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:33:32,849-Speed 3294.89 samples/sec   Loss 3.4525   LearningRate 0.0722   Epoch: 3   Global Step: 50180   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:33:36,097-Speed 3153.36 samples/sec   Loss 3.3823   LearningRate 0.0722   Epoch: 3   Global Step: 50190   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:33:39,610-Speed 2915.21 samples/sec   Loss 3.4399   LearningRate 0.0722   Epoch: 3   Global Step: 50200   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:33:42,673-Speed 3344.25 samples/sec   Loss 3.3842   LearningRate 0.0722   Epoch: 3   Global Step: 50210   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:33:45,741-Speed 3338.86 samples/sec   Loss 3.4522   LearningRate 0.0722   Epoch: 3   Global Step: 50220   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:33:48,823-Speed 3323.78 samples/sec   Loss 3.4336   LearningRate 0.0722   Epoch: 3   Global Step: 50230   Fp16 Grad Scale: 262144   Required: 30 hours
Training: 2022-04-11 04:33:51,906-Speed 3322.45 samples/sec   Loss 3.4357   LearningRate 0.0722   Epoch: 3   Global Step: 50240   Fp16 Grad Scale: 262144   Required: 30 hours
Training: 2022-04-11 04:33:54,958-Speed 3355.65 samples/sec   Loss 3.4380   LearningRate 0.0722   Epoch: 3   Global Step: 50250   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:33:58,035-Speed 3329.00 samples/sec   Loss 3.4712   LearningRate 0.0722   Epoch: 3   Global Step: 50260   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:34:01,103-Speed 3338.04 samples/sec   Loss 3.4125   LearningRate 0.0721   Epoch: 3   Global Step: 50270   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:34:04,172-Speed 3337.20 samples/sec   Loss 3.4293   LearningRate 0.0721   Epoch: 3   Global Step: 50280   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:34:07,384-Speed 3188.99 samples/sec   Loss 3.4237   LearningRate 0.0721   Epoch: 3   Global Step: 50290   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:34:10,466-Speed 3322.97 samples/sec   Loss 3.4524   LearningRate 0.0721   Epoch: 3   Global Step: 50300   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:34:13,563-Speed 3307.24 samples/sec   Loss 3.4428   LearningRate 0.0721   Epoch: 3   Global Step: 50310   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:34:16,640-Speed 3330.25 samples/sec   Loss 3.4002   LearningRate 0.0721   Epoch: 3   Global Step: 50320   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:34:19,722-Speed 3323.51 samples/sec   Loss 3.4830   LearningRate 0.0721   Epoch: 3   Global Step: 50330   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:34:22,799-Speed 3328.11 samples/sec   Loss 3.4682   LearningRate 0.0721   Epoch: 3   Global Step: 50340   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:34:25,924-Speed 3278.00 samples/sec   Loss 3.3946   LearningRate 0.0721   Epoch: 3   Global Step: 50350   Fp16 Grad Scale: 262144   Required: 30 hours
Training: 2022-04-11 04:34:28,989-Speed 3341.51 samples/sec   Loss 3.5278   LearningRate 0.0721   Epoch: 3   Global Step: 50360   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:34:32,051-Speed 3344.44 samples/sec   Loss 3.5222   LearningRate 0.0721   Epoch: 3   Global Step: 50370   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:34:35,123-Speed 3334.41 samples/sec   Loss 3.4794   LearningRate 0.0721   Epoch: 3   Global Step: 50380   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:34:38,205-Speed 3322.66 samples/sec   Loss 3.4669   LearningRate 0.0721   Epoch: 3   Global Step: 50390   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:34:41,271-Speed 3341.14 samples/sec   Loss 3.4205   LearningRate 0.0721   Epoch: 3   Global Step: 50400   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:34:44,335-Speed 3342.70 samples/sec   Loss 3.4930   LearningRate 0.0721   Epoch: 3   Global Step: 50410   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:34:47,404-Speed 3338.45 samples/sec   Loss 3.4938   LearningRate 0.0721   Epoch: 3   Global Step: 50420   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:34:50,504-Speed 3302.92 samples/sec   Loss 3.4217   LearningRate 0.0721   Epoch: 3   Global Step: 50430   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:34:53,678-Speed 3226.88 samples/sec   Loss 3.4859   LearningRate 0.0721   Epoch: 3   Global Step: 50440   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:34:56,915-Speed 3164.85 samples/sec   Loss 3.4159   LearningRate 0.0721   Epoch: 3   Global Step: 50450   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:35:00,076-Speed 3240.19 samples/sec   Loss 3.3943   LearningRate 0.0721   Epoch: 3   Global Step: 50460   Fp16 Grad Scale: 262144   Required: 30 hours
Training: 2022-04-11 04:35:03,206-Speed 3272.16 samples/sec   Loss 3.4529   LearningRate 0.0720   Epoch: 3   Global Step: 50470   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:35:06,270-Speed 3342.47 samples/sec   Loss 3.4278   LearningRate 0.0720   Epoch: 3   Global Step: 50480   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:35:09,361-Speed 3313.92 samples/sec   Loss 3.4376   LearningRate 0.0720   Epoch: 3   Global Step: 50490   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:35:12,431-Speed 3336.20 samples/sec   Loss 3.5312   LearningRate 0.0720   Epoch: 3   Global Step: 50500   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:35:15,533-Speed 3302.53 samples/sec   Loss 3.4155   LearningRate 0.0720   Epoch: 3   Global Step: 50510   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:35:18,619-Speed 3318.79 samples/sec   Loss 3.5653   LearningRate 0.0720   Epoch: 3   Global Step: 50520   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:35:21,720-Speed 3302.18 samples/sec   Loss 3.5333   LearningRate 0.0720   Epoch: 3   Global Step: 50530   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:35:24,862-Speed 3260.22 samples/sec   Loss 3.4932   LearningRate 0.0720   Epoch: 3   Global Step: 50540   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:35:27,921-Speed 3348.43 samples/sec   Loss 3.4952   LearningRate 0.0720   Epoch: 3   Global Step: 50550   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-11 04:35:31,041-Speed 3282.58 samples/sec   Loss 3.4786   LearningRate 0.0720   Epoch: 3   Global Step: 50560   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-11 04:35:34,179-Speed 3263.62 samples/sec   Loss 3.5039   LearningRate 0.0720   Epoch: 3   Global Step: 50570   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-11 04:35:37,320-Speed 3261.60 samples/sec   Loss 3.5002   LearningRate 0.0720   Epoch: 3   Global Step: 50580   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-11 04:35:40,465-Speed 3256.60 samples/sec   Loss 3.5037   LearningRate 0.0720   Epoch: 3   Global Step: 50590   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-11 04:35:43,554-Speed 3315.37 samples/sec   Loss 3.4872   LearningRate 0.0720   Epoch: 3   Global Step: 50600   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-11 04:35:46,631-Speed 3329.23 samples/sec   Loss 3.4447   LearningRate 0.0720   Epoch: 3   Global Step: 50610   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-11 04:35:49,701-Speed 3336.48 samples/sec   Loss 3.5338   LearningRate 0.0720   Epoch: 3   Global Step: 50620   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-11 04:35:52,773-Speed 3333.99 samples/sec   Loss 3.5726   LearningRate 0.0720   Epoch: 3   Global Step: 50630   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-11 04:35:55,935-Speed 3238.40 samples/sec   Loss 3.4363   LearningRate 0.0720   Epoch: 3   Global Step: 50640   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-11 04:35:59,014-Speed 3326.90 samples/sec   Loss 3.5106   LearningRate 0.0720   Epoch: 3   Global Step: 50650   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:36:02,094-Speed 3325.78 samples/sec   Loss 3.4470   LearningRate 0.0720   Epoch: 3   Global Step: 50660   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:36:05,205-Speed 3292.83 samples/sec   Loss 3.4357   LearningRate 0.0719   Epoch: 3   Global Step: 50670   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:36:08,306-Speed 3302.49 samples/sec   Loss 3.4564   LearningRate 0.0719   Epoch: 3   Global Step: 50680   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:36:11,431-Speed 3277.32 samples/sec   Loss 3.4240   LearningRate 0.0719   Epoch: 3   Global Step: 50690   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:36:14,522-Speed 3314.12 samples/sec   Loss 3.4735   LearningRate 0.0719   Epoch: 3   Global Step: 50700   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:36:17,590-Speed 3338.01 samples/sec   Loss 3.4428   LearningRate 0.0719   Epoch: 3   Global Step: 50710   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:36:20,658-Speed 3338.80 samples/sec   Loss 3.4837   LearningRate 0.0719   Epoch: 3   Global Step: 50720   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-11 04:36:23,735-Speed 3328.70 samples/sec   Loss 3.6028   LearningRate 0.0719   Epoch: 3   Global Step: 50730   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-11 04:36:26,810-Speed 3331.02 samples/sec   Loss 3.5539   LearningRate 0.0719   Epoch: 3   Global Step: 50740   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-11 04:36:29,877-Speed 3338.87 samples/sec   Loss 3.5164   LearningRate 0.0719   Epoch: 3   Global Step: 50750   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-11 04:36:32,970-Speed 3311.69 samples/sec   Loss 3.5330   LearningRate 0.0719   Epoch: 3   Global Step: 50760   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-11 04:36:36,039-Speed 3337.97 samples/sec   Loss 3.4718   LearningRate 0.0719   Epoch: 3   Global Step: 50770   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-11 04:36:39,108-Speed 3336.95 samples/sec   Loss 3.4024   LearningRate 0.0719   Epoch: 3   Global Step: 50780   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-11 04:36:42,179-Speed 3334.76 samples/sec   Loss 3.4893   LearningRate 0.0719   Epoch: 3   Global Step: 50790   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-11 04:36:45,246-Speed 3339.65 samples/sec   Loss 3.5465   LearningRate 0.0719   Epoch: 3   Global Step: 50800   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-11 04:36:48,312-Speed 3341.02 samples/sec   Loss 3.5710   LearningRate 0.0719   Epoch: 3   Global Step: 50810   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-11 04:36:51,379-Speed 3339.51 samples/sec   Loss 3.5584   LearningRate 0.0719   Epoch: 3   Global Step: 50820   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:36:54,459-Speed 3325.02 samples/sec   Loss 3.5076   LearningRate 0.0719   Epoch: 3   Global Step: 50830   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:36:57,527-Speed 3339.54 samples/sec   Loss 3.5575   LearningRate 0.0719   Epoch: 3   Global Step: 50840   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:37:00,617-Speed 3315.37 samples/sec   Loss 3.5178   LearningRate 0.0719   Epoch: 3   Global Step: 50850   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:37:03,696-Speed 3326.36 samples/sec   Loss 3.4888   LearningRate 0.0718   Epoch: 3   Global Step: 50860   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:37:06,755-Speed 3348.51 samples/sec   Loss 3.5847   LearningRate 0.0718   Epoch: 3   Global Step: 50870   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-11 04:37:09,822-Speed 3339.41 samples/sec   Loss 3.4905   LearningRate 0.0718   Epoch: 3   Global Step: 50880   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-11 04:37:12,910-Speed 3317.01 samples/sec   Loss 3.5133   LearningRate 0.0718   Epoch: 3   Global Step: 50890   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-11 04:37:16,145-Speed 3166.67 samples/sec   Loss 3.5927   LearningRate 0.0718   Epoch: 3   Global Step: 50900   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-11 04:37:19,231-Speed 3319.03 samples/sec   Loss 3.5279   LearningRate 0.0718   Epoch: 3   Global Step: 50910   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-11 04:37:22,302-Speed 3335.74 samples/sec   Loss 3.5379   LearningRate 0.0718   Epoch: 3   Global Step: 50920   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-11 04:37:25,379-Speed 3329.07 samples/sec   Loss 3.5023   LearningRate 0.0718   Epoch: 3   Global Step: 50930   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-11 04:37:28,446-Speed 3339.62 samples/sec   Loss 3.5186   LearningRate 0.0718   Epoch: 3   Global Step: 50940   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-11 04:37:31,531-Speed 3319.17 samples/sec   Loss 3.4773   LearningRate 0.0718   Epoch: 3   Global Step: 50950   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-11 04:37:34,596-Speed 3341.85 samples/sec   Loss 3.6429   LearningRate 0.0718   Epoch: 3   Global Step: 50960   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-11 04:37:37,661-Speed 3342.14 samples/sec   Loss 3.6029   LearningRate 0.0718   Epoch: 3   Global Step: 50970   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:37:40,729-Speed 3337.50 samples/sec   Loss 3.5516   LearningRate 0.0718   Epoch: 3   Global Step: 50980   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:37:43,797-Speed 3338.69 samples/sec   Loss 3.5416   LearningRate 0.0718   Epoch: 3   Global Step: 50990   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:37:46,874-Speed 3329.47 samples/sec   Loss 3.5419   LearningRate 0.0718   Epoch: 3   Global Step: 51000   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:37:49,973-Speed 3304.91 samples/sec   Loss 3.5889   LearningRate 0.0718   Epoch: 3   Global Step: 51010   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:37:53,159-Speed 3215.10 samples/sec   Loss 3.5430   LearningRate 0.0718   Epoch: 3   Global Step: 51020   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:37:56,236-Speed 3327.85 samples/sec   Loss 3.5455   LearningRate 0.0718   Epoch: 3   Global Step: 51030   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:37:59,304-Speed 3339.24 samples/sec   Loss 3.5146   LearningRate 0.0718   Epoch: 3   Global Step: 51040   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:38:02,413-Speed 3293.75 samples/sec   Loss 3.5058   LearningRate 0.0718   Epoch: 3   Global Step: 51050   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:38:05,484-Speed 3335.62 samples/sec   Loss 3.4775   LearningRate 0.0717   Epoch: 3   Global Step: 51060   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:38:08,563-Speed 3326.40 samples/sec   Loss 3.4801   LearningRate 0.0717   Epoch: 3   Global Step: 51070   Fp16 Grad Scale: 262144   Required: 30 hours
Training: 2022-04-11 04:38:11,616-Speed 3354.12 samples/sec   Loss 3.5375   LearningRate 0.0717   Epoch: 3   Global Step: 51080   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:38:14,693-Speed 3329.43 samples/sec   Loss 3.5902   LearningRate 0.0717   Epoch: 3   Global Step: 51090   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:38:17,771-Speed 3327.51 samples/sec   Loss 3.5842   LearningRate 0.0717   Epoch: 3   Global Step: 51100   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-11 04:38:20,838-Speed 3339.42 samples/sec   Loss 3.5918   LearningRate 0.0717   Epoch: 3   Global Step: 51110   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-11 04:38:23,927-Speed 3316.48 samples/sec   Loss 3.5664   LearningRate 0.0717   Epoch: 3   Global Step: 51120   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-11 04:38:27,081-Speed 3246.89 samples/sec   Loss 3.5737   LearningRate 0.0717   Epoch: 3   Global Step: 51130   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-11 04:38:30,152-Speed 3335.39 samples/sec   Loss 3.6169   LearningRate 0.0717   Epoch: 3   Global Step: 51140   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-11 04:38:33,228-Speed 3329.92 samples/sec   Loss 3.5191   LearningRate 0.0717   Epoch: 3   Global Step: 51150   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-11 04:38:36,297-Speed 3337.60 samples/sec   Loss 3.5532   LearningRate 0.0717   Epoch: 3   Global Step: 51160   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-11 04:38:39,367-Speed 3335.08 samples/sec   Loss 3.6543   LearningRate 0.0717   Epoch: 3   Global Step: 51170   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-11 04:38:42,450-Speed 3322.34 samples/sec   Loss 3.6213   LearningRate 0.0717   Epoch: 3   Global Step: 51180   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-11 04:38:45,527-Speed 3329.03 samples/sec   Loss 3.5145   LearningRate 0.0717   Epoch: 3   Global Step: 51190   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-11 04:38:48,595-Speed 3338.90 samples/sec   Loss 3.5439   LearningRate 0.0717   Epoch: 3   Global Step: 51200   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:38:51,665-Speed 3336.69 samples/sec   Loss 3.5846   LearningRate 0.0717   Epoch: 3   Global Step: 51210   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:38:54,750-Speed 3320.09 samples/sec   Loss 3.6142   LearningRate 0.0717   Epoch: 3   Global Step: 51220   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:38:57,818-Speed 3337.61 samples/sec   Loss 3.6012   LearningRate 0.0717   Epoch: 3   Global Step: 51230   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:39:00,885-Speed 3339.98 samples/sec   Loss 3.4804   LearningRate 0.0717   Epoch: 3   Global Step: 51240   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:39:03,944-Speed 3347.86 samples/sec   Loss 3.6088   LearningRate 0.0717   Epoch: 3   Global Step: 51250   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-11 04:39:07,063-Speed 3284.49 samples/sec   Loss 3.6194   LearningRate 0.0716   Epoch: 3   Global Step: 51260   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-11 04:39:10,142-Speed 3325.55 samples/sec   Loss 3.5695   LearningRate 0.0716   Epoch: 3   Global Step: 51270   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-11 04:39:13,229-Speed 3318.42 samples/sec   Loss 3.5221   LearningRate 0.0716   Epoch: 3   Global Step: 51280   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-11 04:39:16,298-Speed 3338.19 samples/sec   Loss 3.5615   LearningRate 0.0716   Epoch: 3   Global Step: 51290   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-11 04:39:19,373-Speed 3330.37 samples/sec   Loss 3.5165   LearningRate 0.0716   Epoch: 3   Global Step: 51300   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-11 04:39:22,459-Speed 3319.74 samples/sec   Loss 3.5957   LearningRate 0.0716   Epoch: 3   Global Step: 51310   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-11 04:39:25,537-Speed 3326.91 samples/sec   Loss 3.5766   LearningRate 0.0716   Epoch: 3   Global Step: 51320   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-11 04:39:28,624-Speed 3318.25 samples/sec   Loss 3.4954   LearningRate 0.0716   Epoch: 3   Global Step: 51330   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-11 04:39:31,692-Speed 3337.90 samples/sec   Loss 3.6695   LearningRate 0.0716   Epoch: 3   Global Step: 51340   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-11 04:39:34,780-Speed 3317.14 samples/sec   Loss 3.5456   LearningRate 0.0716   Epoch: 3   Global Step: 51350   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:39:37,939-Speed 3242.36 samples/sec   Loss 3.6525   LearningRate 0.0716   Epoch: 3   Global Step: 51360   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:39:41,042-Speed 3301.13 samples/sec   Loss 3.6831   LearningRate 0.0716   Epoch: 3   Global Step: 51370   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:39:44,112-Speed 3336.61 samples/sec   Loss 3.5228   LearningRate 0.0716   Epoch: 3   Global Step: 51380   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:39:47,190-Speed 3326.99 samples/sec   Loss 3.5569   LearningRate 0.0716   Epoch: 3   Global Step: 51390   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:39:50,268-Speed 3327.62 samples/sec   Loss 3.6439   LearningRate 0.0716   Epoch: 3   Global Step: 51400   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:39:53,354-Speed 3319.48 samples/sec   Loss 3.6542   LearningRate 0.0716   Epoch: 3   Global Step: 51410   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:39:56,425-Speed 3335.06 samples/sec   Loss 3.5824   LearningRate 0.0716   Epoch: 3   Global Step: 51420   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:39:59,522-Speed 3307.20 samples/sec   Loss 3.5962   LearningRate 0.0716   Epoch: 3   Global Step: 51430   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:40:02,649-Speed 3275.84 samples/sec   Loss 3.5974   LearningRate 0.0716   Epoch: 3   Global Step: 51440   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:40:05,734-Speed 3319.09 samples/sec   Loss 3.6216   LearningRate 0.0716   Epoch: 3   Global Step: 51450   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:40:08,847-Speed 3290.45 samples/sec   Loss 3.6943   LearningRate 0.0715   Epoch: 3   Global Step: 51460   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-11 04:40:11,924-Speed 3329.68 samples/sec   Loss 3.5709   LearningRate 0.0715   Epoch: 3   Global Step: 51470   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-11 04:40:15,017-Speed 3311.44 samples/sec   Loss 3.6126   LearningRate 0.0715   Epoch: 3   Global Step: 51480   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-11 04:40:18,241-Speed 3176.26 samples/sec   Loss 3.6734   LearningRate 0.0715   Epoch: 3   Global Step: 51490   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-11 04:40:21,307-Speed 3340.62 samples/sec   Loss 3.6740   LearningRate 0.0715   Epoch: 3   Global Step: 51500   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-11 04:40:24,391-Speed 3321.04 samples/sec   Loss 3.6225   LearningRate 0.0715   Epoch: 3   Global Step: 51510   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-11 04:40:27,457-Speed 3341.23 samples/sec   Loss 3.6108   LearningRate 0.0715   Epoch: 3   Global Step: 51520   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-11 04:40:30,526-Speed 3337.03 samples/sec   Loss 3.6588   LearningRate 0.0715   Epoch: 3   Global Step: 51530   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-11 04:40:33,604-Speed 3327.84 samples/sec   Loss 3.6002   LearningRate 0.0715   Epoch: 3   Global Step: 51540   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-11 04:40:36,686-Speed 3323.64 samples/sec   Loss 3.5844   LearningRate 0.0715   Epoch: 3   Global Step: 51550   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-11 04:40:39,758-Speed 3333.65 samples/sec   Loss 3.6326   LearningRate 0.0715   Epoch: 3   Global Step: 51560   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:40:42,949-Speed 3210.33 samples/sec   Loss 3.6460   LearningRate 0.0715   Epoch: 3   Global Step: 51570   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:40:46,084-Speed 3267.09 samples/sec   Loss 3.6362   LearningRate 0.0715   Epoch: 3   Global Step: 51580   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:40:49,202-Speed 3285.02 samples/sec   Loss 3.6760   LearningRate 0.0715   Epoch: 3   Global Step: 51590   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:40:52,276-Speed 3331.03 samples/sec   Loss 3.6248   LearningRate 0.0715   Epoch: 3   Global Step: 51600   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:40:55,351-Speed 3331.67 samples/sec   Loss 3.6056   LearningRate 0.0715   Epoch: 3   Global Step: 51610   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:40:58,447-Speed 3308.15 samples/sec   Loss 3.6352   LearningRate 0.0715   Epoch: 3   Global Step: 51620   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:41:01,565-Speed 3284.12 samples/sec   Loss 3.5946   LearningRate 0.0715   Epoch: 3   Global Step: 51630   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:41:04,664-Speed 3306.36 samples/sec   Loss 3.7002   LearningRate 0.0715   Epoch: 3   Global Step: 51640   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:41:07,750-Speed 3318.47 samples/sec   Loss 3.5813   LearningRate 0.0714   Epoch: 3   Global Step: 51650   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:41:10,809-Speed 3347.79 samples/sec   Loss 3.7094   LearningRate 0.0714   Epoch: 3   Global Step: 51660   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:41:13,880-Speed 3336.17 samples/sec   Loss 3.5410   LearningRate 0.0714   Epoch: 3   Global Step: 51670   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:41:16,963-Speed 3321.69 samples/sec   Loss 3.6002   LearningRate 0.0714   Epoch: 3   Global Step: 51680   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:41:20,031-Speed 3338.93 samples/sec   Loss 3.6185   LearningRate 0.0714   Epoch: 3   Global Step: 51690   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:41:23,118-Speed 3317.44 samples/sec   Loss 3.6065   LearningRate 0.0714   Epoch: 3   Global Step: 51700   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:41:26,192-Speed 3332.68 samples/sec   Loss 3.6334   LearningRate 0.0714   Epoch: 3   Global Step: 51710   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:41:29,308-Speed 3287.15 samples/sec   Loss 3.6231   LearningRate 0.0714   Epoch: 3   Global Step: 51720   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:41:32,366-Speed 3349.52 samples/sec   Loss 3.6427   LearningRate 0.0714   Epoch: 3   Global Step: 51730   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-11 04:41:35,433-Speed 3339.06 samples/sec   Loss 3.6426   LearningRate 0.0714   Epoch: 3   Global Step: 51740   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-11 04:41:38,504-Speed 3335.13 samples/sec   Loss 3.6302   LearningRate 0.0714   Epoch: 3   Global Step: 51750   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-11 04:41:41,578-Speed 3332.21 samples/sec   Loss 3.6471   LearningRate 0.0714   Epoch: 3   Global Step: 51760   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-11 04:41:44,651-Speed 3332.54 samples/sec   Loss 3.6105   LearningRate 0.0714   Epoch: 3   Global Step: 51770   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-11 04:41:47,726-Speed 3331.43 samples/sec   Loss 3.5895   LearningRate 0.0714   Epoch: 3   Global Step: 51780   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-11 04:41:50,803-Speed 3327.69 samples/sec   Loss 3.6354   LearningRate 0.0714   Epoch: 3   Global Step: 51790   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-11 04:41:53,926-Speed 3280.14 samples/sec   Loss 3.6020   LearningRate 0.0714   Epoch: 3   Global Step: 51800   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-11 04:41:57,062-Speed 3266.46 samples/sec   Loss 3.6764   LearningRate 0.0714   Epoch: 3   Global Step: 51810   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-11 04:42:00,145-Speed 3322.62 samples/sec   Loss 3.6409   LearningRate 0.0714   Epoch: 3   Global Step: 51820   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-11 04:42:03,220-Speed 3330.48 samples/sec   Loss 3.6020   LearningRate 0.0714   Epoch: 3   Global Step: 51830   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:42:06,290-Speed 3336.60 samples/sec   Loss 3.6996   LearningRate 0.0714   Epoch: 3   Global Step: 51840   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:42:09,358-Speed 3337.51 samples/sec   Loss 3.6338   LearningRate 0.0713   Epoch: 3   Global Step: 51850   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:42:12,441-Speed 3323.22 samples/sec   Loss 3.6921   LearningRate 0.0713   Epoch: 3   Global Step: 51860   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:42:15,601-Speed 3241.39 samples/sec   Loss 3.5986   LearningRate 0.0713   Epoch: 3   Global Step: 51870   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:42:18,687-Speed 3318.48 samples/sec   Loss 3.6385   LearningRate 0.0713   Epoch: 3   Global Step: 51880   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:42:21,770-Speed 3322.03 samples/sec   Loss 3.7049   LearningRate 0.0713   Epoch: 3   Global Step: 51890   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:42:24,890-Speed 3284.02 samples/sec   Loss 3.6730   LearningRate 0.0713   Epoch: 3   Global Step: 51900   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:42:27,975-Speed 3319.92 samples/sec   Loss 3.6704   LearningRate 0.0713   Epoch: 3   Global Step: 51910   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:42:31,043-Speed 3338.59 samples/sec   Loss 3.6237   LearningRate 0.0713   Epoch: 3   Global Step: 51920   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:42:34,112-Speed 3336.90 samples/sec   Loss 3.7006   LearningRate 0.0713   Epoch: 3   Global Step: 51930   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:42:37,253-Speed 3261.32 samples/sec   Loss 3.6616   LearningRate 0.0713   Epoch: 3   Global Step: 51940   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:42:40,329-Speed 3328.87 samples/sec   Loss 3.7043   LearningRate 0.0713   Epoch: 3   Global Step: 51950   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:42:43,415-Speed 3318.96 samples/sec   Loss 3.6148   LearningRate 0.0713   Epoch: 3   Global Step: 51960   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:42:46,525-Speed 3294.03 samples/sec   Loss 3.5966   LearningRate 0.0713   Epoch: 3   Global Step: 51970   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:42:49,658-Speed 3269.00 samples/sec   Loss 3.6588   LearningRate 0.0713   Epoch: 3   Global Step: 51980   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:42:52,817-Speed 3242.21 samples/sec   Loss 3.6198   LearningRate 0.0713   Epoch: 3   Global Step: 51990   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:42:55,908-Speed 3314.15 samples/sec   Loss 3.6506   LearningRate 0.0713   Epoch: 3   Global Step: 52000   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:43:39,507-[lfw][52000]XNorm: 21.897672
Training: 2022-04-11 04:43:39,507-[lfw][52000]Accuracy-Flip: 0.99750+-0.00291
Training: 2022-04-11 04:43:39,508-[lfw][52000]Accuracy-Highest: 0.99783
Training: 2022-04-11 04:44:30,170-[cfp_fp][52000]XNorm: 20.287661
Training: 2022-04-11 04:44:30,171-[cfp_fp][52000]Accuracy-Flip: 0.98271+-0.00773
Training: 2022-04-11 04:44:30,171-[cfp_fp][52000]Accuracy-Highest: 0.98300
Training: 2022-04-11 04:45:13,777-[agedb_30][52000]XNorm: 21.862695
Training: 2022-04-11 04:45:13,778-[agedb_30][52000]Accuracy-Flip: 0.97883+-0.00719
Training: 2022-04-11 04:45:13,778-[agedb_30][52000]Accuracy-Highest: 0.97917
Training: 2022-04-11 04:45:16,860-Speed 72.65 samples/sec   Loss 3.7165   LearningRate 0.0713   Epoch: 3   Global Step: 52010   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:45:19,910-Speed 3358.50 samples/sec   Loss 3.6511   LearningRate 0.0713   Epoch: 3   Global Step: 52020   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:45:22,965-Speed 3352.86 samples/sec   Loss 3.5870   LearningRate 0.0713   Epoch: 3   Global Step: 52030   Fp16 Grad Scale: 262144   Required: 30 hours
Training: 2022-04-11 04:45:26,005-Speed 3368.98 samples/sec   Loss 3.6845   LearningRate 0.0713   Epoch: 3   Global Step: 52040   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:45:29,057-Speed 3355.69 samples/sec   Loss 3.6388   LearningRate 0.0712   Epoch: 3   Global Step: 52050   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:45:32,127-Speed 3336.06 samples/sec   Loss 3.6195   LearningRate 0.0712   Epoch: 3   Global Step: 52060   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:45:35,190-Speed 3343.46 samples/sec   Loss 3.7111   LearningRate 0.0712   Epoch: 3   Global Step: 52070   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:45:38,239-Speed 3359.65 samples/sec   Loss 3.7278   LearningRate 0.0712   Epoch: 3   Global Step: 52080   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:45:41,293-Speed 3352.98 samples/sec   Loss 3.5948   LearningRate 0.0712   Epoch: 3   Global Step: 52090   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:45:44,350-Speed 3351.09 samples/sec   Loss 3.6747   LearningRate 0.0712   Epoch: 3   Global Step: 52100   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:45:47,403-Speed 3355.14 samples/sec   Loss 3.5822   LearningRate 0.0712   Epoch: 3   Global Step: 52110   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:45:50,508-Speed 3298.18 samples/sec   Loss 3.6161   LearningRate 0.0712   Epoch: 3   Global Step: 52120   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:45:53,562-Speed 3353.71 samples/sec   Loss 3.6440   LearningRate 0.0712   Epoch: 3   Global Step: 52130   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:45:56,612-Speed 3359.21 samples/sec   Loss 3.6628   LearningRate 0.0712   Epoch: 3   Global Step: 52140   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:45:59,693-Speed 3323.57 samples/sec   Loss 3.5844   LearningRate 0.0712   Epoch: 3   Global Step: 52150   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:46:02,810-Speed 3285.62 samples/sec   Loss 3.6755   LearningRate 0.0712   Epoch: 3   Global Step: 52160   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-11 04:46:05,939-Speed 3274.10 samples/sec   Loss 3.7221   LearningRate 0.0712   Epoch: 3   Global Step: 52170   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-11 04:46:09,009-Speed 3336.32 samples/sec   Loss 3.6309   LearningRate 0.0712   Epoch: 3   Global Step: 52180   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-11 04:46:12,062-Speed 3354.75 samples/sec   Loss 3.6424   LearningRate 0.0712   Epoch: 3   Global Step: 52190   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-11 04:46:15,133-Speed 3335.91 samples/sec   Loss 3.6540   LearningRate 0.0712   Epoch: 3   Global Step: 52200   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-11 04:46:18,186-Speed 3354.41 samples/sec   Loss 3.6402   LearningRate 0.0712   Epoch: 3   Global Step: 52210   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-11 04:46:21,250-Speed 3342.97 samples/sec   Loss 3.6567   LearningRate 0.0712   Epoch: 3   Global Step: 52220   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-11 04:46:24,312-Speed 3344.33 samples/sec   Loss 3.7025   LearningRate 0.0712   Epoch: 3   Global Step: 52230   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-11 04:46:27,369-Speed 3351.46 samples/sec   Loss 3.6201   LearningRate 0.0712   Epoch: 3   Global Step: 52240   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-11 04:46:30,431-Speed 3344.10 samples/sec   Loss 3.6344   LearningRate 0.0711   Epoch: 3   Global Step: 52250   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-11 04:46:33,499-Speed 3338.18 samples/sec   Loss 3.6324   LearningRate 0.0711   Epoch: 3   Global Step: 52260   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:46:36,561-Speed 3346.20 samples/sec   Loss 3.6592   LearningRate 0.0711   Epoch: 3   Global Step: 52270   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:46:39,627-Speed 3339.88 samples/sec   Loss 3.7167   LearningRate 0.0711   Epoch: 3   Global Step: 52280   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:46:42,741-Speed 3290.17 samples/sec   Loss 3.6304   LearningRate 0.0711   Epoch: 3   Global Step: 52290   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:46:45,821-Speed 3325.22 samples/sec   Loss 3.7176   LearningRate 0.0711   Epoch: 3   Global Step: 52300   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:46:48,911-Speed 3314.70 samples/sec   Loss 3.6572   LearningRate 0.0711   Epoch: 3   Global Step: 52310   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-11 04:46:51,972-Speed 3345.09 samples/sec   Loss 3.6882   LearningRate 0.0711   Epoch: 3   Global Step: 52320   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-11 04:46:55,035-Speed 3344.43 samples/sec   Loss 3.6768   LearningRate 0.0711   Epoch: 3   Global Step: 52330   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-11 04:46:58,094-Speed 3348.55 samples/sec   Loss 3.6752   LearningRate 0.0711   Epoch: 3   Global Step: 52340   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-11 04:47:01,162-Speed 3338.33 samples/sec   Loss 3.7261   LearningRate 0.0711   Epoch: 3   Global Step: 52350   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-11 04:47:04,219-Speed 3350.62 samples/sec   Loss 3.6345   LearningRate 0.0711   Epoch: 3   Global Step: 52360   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-11 04:47:07,284-Speed 3341.24 samples/sec   Loss 3.6662   LearningRate 0.0711   Epoch: 3   Global Step: 52370   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-11 04:47:10,462-Speed 3223.41 samples/sec   Loss 3.6787   LearningRate 0.0711   Epoch: 3   Global Step: 52380   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-11 04:47:13,516-Speed 3353.44 samples/sec   Loss 3.6752   LearningRate 0.0711   Epoch: 3   Global Step: 52390   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-11 04:47:16,576-Speed 3347.13 samples/sec   Loss 3.7156   LearningRate 0.0711   Epoch: 3   Global Step: 52400   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-11 04:47:19,653-Speed 3328.72 samples/sec   Loss 3.5895   LearningRate 0.0711   Epoch: 3   Global Step: 52410   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:47:22,739-Speed 3319.11 samples/sec   Loss 3.6834   LearningRate 0.0711   Epoch: 3   Global Step: 52420   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:47:25,922-Speed 3217.75 samples/sec   Loss 3.5970   LearningRate 0.0711   Epoch: 3   Global Step: 52430   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:47:29,032-Speed 3293.08 samples/sec   Loss 3.6848   LearningRate 0.0710   Epoch: 3   Global Step: 52440   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:47:32,114-Speed 3324.07 samples/sec   Loss 3.7039   LearningRate 0.0710   Epoch: 3   Global Step: 52450   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:47:35,172-Speed 3348.77 samples/sec   Loss 3.5999   LearningRate 0.0710   Epoch: 3   Global Step: 52460   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:47:38,259-Speed 3317.99 samples/sec   Loss 3.6615   LearningRate 0.0710   Epoch: 3   Global Step: 52470   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:47:41,326-Speed 3339.71 samples/sec   Loss 3.6885   LearningRate 0.0710   Epoch: 3   Global Step: 52480   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:47:44,384-Speed 3349.07 samples/sec   Loss 3.7191   LearningRate 0.0710   Epoch: 3   Global Step: 52490   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:47:47,504-Speed 3282.70 samples/sec   Loss 3.6372   LearningRate 0.0710   Epoch: 3   Global Step: 52500   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:47:50,633-Speed 3274.15 samples/sec   Loss 3.7738   LearningRate 0.0710   Epoch: 3   Global Step: 52510   Fp16 Grad Scale: 262144   Required: 30 hours
Training: 2022-04-11 04:47:53,680-Speed 3360.53 samples/sec   Loss 3.6983   LearningRate 0.0710   Epoch: 3   Global Step: 52520   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:47:56,747-Speed 3340.16 samples/sec   Loss 3.6367   LearningRate 0.0710   Epoch: 3   Global Step: 52530   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:47:59,800-Speed 3354.72 samples/sec   Loss 3.6425   LearningRate 0.0710   Epoch: 3   Global Step: 52540   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:48:02,858-Speed 3349.82 samples/sec   Loss 3.6361   LearningRate 0.0710   Epoch: 3   Global Step: 52550   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:48:05,929-Speed 3334.88 samples/sec   Loss 3.6624   LearningRate 0.0710   Epoch: 3   Global Step: 52560   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:48:08,976-Speed 3361.82 samples/sec   Loss 3.6561   LearningRate 0.0710   Epoch: 3   Global Step: 52570   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-11 04:48:12,034-Speed 3349.22 samples/sec   Loss 3.6798   LearningRate 0.0710   Epoch: 3   Global Step: 52580   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-11 04:48:15,090-Speed 3350.98 samples/sec   Loss 3.5880   LearningRate 0.0710   Epoch: 3   Global Step: 52590   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-11 04:48:18,153-Speed 3344.57 samples/sec   Loss 3.6896   LearningRate 0.0710   Epoch: 3   Global Step: 52600   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-11 04:48:21,216-Speed 3343.77 samples/sec   Loss 3.7104   LearningRate 0.0710   Epoch: 3   Global Step: 52610   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-11 04:48:24,294-Speed 3327.39 samples/sec   Loss 3.6990   LearningRate 0.0710   Epoch: 3   Global Step: 52620   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-11 04:48:27,354-Speed 3347.32 samples/sec   Loss 3.6916   LearningRate 0.0710   Epoch: 3   Global Step: 52630   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-11 04:48:30,459-Speed 3299.22 samples/sec   Loss 3.6650   LearningRate 0.0709   Epoch: 3   Global Step: 52640   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-11 04:48:33,529-Speed 3335.54 samples/sec   Loss 3.7877   LearningRate 0.0709   Epoch: 3   Global Step: 52650   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-11 04:48:36,614-Speed 3320.12 samples/sec   Loss 3.7180   LearningRate 0.0709   Epoch: 3   Global Step: 52660   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-11 04:48:39,677-Speed 3343.77 samples/sec   Loss 3.6791   LearningRate 0.0709   Epoch: 3   Global Step: 52670   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:48:42,780-Speed 3300.82 samples/sec   Loss 3.7397   LearningRate 0.0709   Epoch: 3   Global Step: 52680   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:48:45,856-Speed 3330.39 samples/sec   Loss 3.6743   LearningRate 0.0709   Epoch: 3   Global Step: 52690   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:48:48,914-Speed 3349.03 samples/sec   Loss 3.7302   LearningRate 0.0709   Epoch: 3   Global Step: 52700   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:48:51,975-Speed 3346.38 samples/sec   Loss 3.6687   LearningRate 0.0709   Epoch: 3   Global Step: 52710   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:48:55,035-Speed 3347.37 samples/sec   Loss 3.7223   LearningRate 0.0709   Epoch: 3   Global Step: 52720   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:48:58,100-Speed 3341.39 samples/sec   Loss 3.6688   LearningRate 0.0709   Epoch: 3   Global Step: 52730   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:49:01,187-Speed 3318.14 samples/sec   Loss 3.7080   LearningRate 0.0709   Epoch: 3   Global Step: 52740   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:49:04,271-Speed 3320.70 samples/sec   Loss 3.6759   LearningRate 0.0709   Epoch: 3   Global Step: 52750   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:49:07,437-Speed 3235.22 samples/sec   Loss 3.6736   LearningRate 0.0709   Epoch: 3   Global Step: 52760   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:49:10,488-Speed 3357.45 samples/sec   Loss 3.7220   LearningRate 0.0709   Epoch: 3   Global Step: 52770   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:49:13,566-Speed 3327.67 samples/sec   Loss 3.6965   LearningRate 0.0709   Epoch: 3   Global Step: 52780   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:49:16,643-Speed 3327.85 samples/sec   Loss 3.6921   LearningRate 0.0709   Epoch: 3   Global Step: 52790   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:49:19,747-Speed 3300.26 samples/sec   Loss 3.6865   LearningRate 0.0709   Epoch: 3   Global Step: 52800   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:49:22,826-Speed 3327.20 samples/sec   Loss 3.7472   LearningRate 0.0709   Epoch: 3   Global Step: 52810   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:49:25,908-Speed 3323.13 samples/sec   Loss 3.6782   LearningRate 0.0709   Epoch: 3   Global Step: 52820   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:49:28,982-Speed 3332.05 samples/sec   Loss 3.7599   LearningRate 0.0709   Epoch: 3   Global Step: 52830   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:49:32,198-Speed 3184.72 samples/sec   Loss 3.8019   LearningRate 0.0708   Epoch: 3   Global Step: 52840   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:49:35,312-Speed 3288.40 samples/sec   Loss 3.7052   LearningRate 0.0708   Epoch: 3   Global Step: 52850   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:49:38,394-Speed 3323.47 samples/sec   Loss 3.7015   LearningRate 0.0708   Epoch: 3   Global Step: 52860   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:49:41,458-Speed 3342.51 samples/sec   Loss 3.7934   LearningRate 0.0708   Epoch: 3   Global Step: 52870   Fp16 Grad Scale: 262144   Required: 30 hours
Training: 2022-04-11 04:49:44,514-Speed 3352.47 samples/sec   Loss 3.6268   LearningRate 0.0708   Epoch: 3   Global Step: 52880   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:49:47,575-Speed 3346.37 samples/sec   Loss 3.7678   LearningRate 0.0708   Epoch: 3   Global Step: 52890   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:49:50,671-Speed 3308.53 samples/sec   Loss 3.6844   LearningRate 0.0708   Epoch: 3   Global Step: 52900   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:49:53,742-Speed 3335.08 samples/sec   Loss 3.7271   LearningRate 0.0708   Epoch: 3   Global Step: 52910   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:49:56,842-Speed 3303.43 samples/sec   Loss 3.7390   LearningRate 0.0708   Epoch: 3   Global Step: 52920   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:49:59,918-Speed 3329.26 samples/sec   Loss 3.7268   LearningRate 0.0708   Epoch: 3   Global Step: 52930   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:50:02,977-Speed 3348.49 samples/sec   Loss 3.7231   LearningRate 0.0708   Epoch: 3   Global Step: 52940   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:50:06,048-Speed 3335.75 samples/sec   Loss 3.6679   LearningRate 0.0708   Epoch: 3   Global Step: 52950   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:50:09,127-Speed 3326.16 samples/sec   Loss 3.6512   LearningRate 0.0708   Epoch: 3   Global Step: 52960   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:50:12,205-Speed 3328.11 samples/sec   Loss 3.6489   LearningRate 0.0708   Epoch: 3   Global Step: 52970   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:50:15,304-Speed 3304.58 samples/sec   Loss 3.7060   LearningRate 0.0708   Epoch: 3   Global Step: 52980   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:50:18,377-Speed 3333.60 samples/sec   Loss 3.6929   LearningRate 0.0708   Epoch: 3   Global Step: 52990   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:50:21,438-Speed 3346.07 samples/sec   Loss 3.6933   LearningRate 0.0708   Epoch: 3   Global Step: 53000   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:50:24,505-Speed 3339.28 samples/sec   Loss 3.7562   LearningRate 0.0708   Epoch: 3   Global Step: 53010   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:50:27,573-Speed 3338.50 samples/sec   Loss 3.7044   LearningRate 0.0708   Epoch: 3   Global Step: 53020   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:50:30,636-Speed 3344.38 samples/sec   Loss 3.7485   LearningRate 0.0708   Epoch: 3   Global Step: 53030   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-11 04:50:33,748-Speed 3290.44 samples/sec   Loss 3.7467   LearningRate 0.0707   Epoch: 3   Global Step: 53040   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-11 04:50:36,826-Speed 3328.01 samples/sec   Loss 3.6694   LearningRate 0.0707   Epoch: 3   Global Step: 53050   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-11 04:50:39,893-Speed 3338.98 samples/sec   Loss 3.6803   LearningRate 0.0707   Epoch: 3   Global Step: 53060   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-11 04:50:42,996-Speed 3301.34 samples/sec   Loss 3.6251   LearningRate 0.0707   Epoch: 3   Global Step: 53070   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-11 04:50:46,052-Speed 3351.55 samples/sec   Loss 3.7380   LearningRate 0.0707   Epoch: 3   Global Step: 53080   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-11 04:50:49,129-Speed 3328.60 samples/sec   Loss 3.6865   LearningRate 0.0707   Epoch: 3   Global Step: 53090   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-11 04:50:52,297-Speed 3233.12 samples/sec   Loss 3.7134   LearningRate 0.0707   Epoch: 3   Global Step: 53100   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-11 04:50:55,409-Speed 3291.60 samples/sec   Loss 3.7344   LearningRate 0.0707   Epoch: 3   Global Step: 53110   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-11 04:50:58,487-Speed 3327.47 samples/sec   Loss 3.6807   LearningRate 0.0707   Epoch: 3   Global Step: 53120   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-11 04:51:01,576-Speed 3315.44 samples/sec   Loss 3.7260   LearningRate 0.0707   Epoch: 3   Global Step: 53130   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:51:04,638-Speed 3345.22 samples/sec   Loss 3.7416   LearningRate 0.0707   Epoch: 3   Global Step: 53140   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:51:07,714-Speed 3330.50 samples/sec   Loss 3.7555   LearningRate 0.0707   Epoch: 3   Global Step: 53150   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:51:10,809-Speed 3309.12 samples/sec   Loss 3.7609   LearningRate 0.0707   Epoch: 3   Global Step: 53160   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:51:13,868-Speed 3348.90 samples/sec   Loss 3.7060   LearningRate 0.0707   Epoch: 3   Global Step: 53170   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:51:16,932-Speed 3342.53 samples/sec   Loss 3.7084   LearningRate 0.0707   Epoch: 3   Global Step: 53180   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:51:19,979-Speed 3361.33 samples/sec   Loss 3.7862   LearningRate 0.0707   Epoch: 3   Global Step: 53190   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-11 04:51:23,038-Speed 3348.49 samples/sec   Loss 3.6906   LearningRate 0.0707   Epoch: 3   Global Step: 53200   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-11 04:51:26,095-Speed 3350.05 samples/sec   Loss 3.8156   LearningRate 0.0707   Epoch: 3   Global Step: 53210   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-11 04:51:29,164-Speed 3337.63 samples/sec   Loss 3.6014   LearningRate 0.0707   Epoch: 3   Global Step: 53220   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-11 04:51:32,286-Speed 3280.46 samples/sec   Loss 3.7444   LearningRate 0.0707   Epoch: 3   Global Step: 53230   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-11 04:51:35,372-Speed 3319.32 samples/sec   Loss 3.7580   LearningRate 0.0706   Epoch: 3   Global Step: 53240   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-11 04:51:38,458-Speed 3319.12 samples/sec   Loss 3.6984   LearningRate 0.0706   Epoch: 3   Global Step: 53250   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-11 04:51:41,531-Speed 3332.34 samples/sec   Loss 3.7463   LearningRate 0.0706   Epoch: 3   Global Step: 53260   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-11 04:51:44,611-Speed 3325.76 samples/sec   Loss 3.6985   LearningRate 0.0706   Epoch: 3   Global Step: 53270   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-11 04:51:47,773-Speed 3239.86 samples/sec   Loss 3.7124   LearningRate 0.0706   Epoch: 3   Global Step: 53280   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-11 04:51:50,855-Speed 3322.58 samples/sec   Loss 3.8081   LearningRate 0.0706   Epoch: 3   Global Step: 53290   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:51:53,949-Speed 3311.25 samples/sec   Loss 3.7352   LearningRate 0.0706   Epoch: 3   Global Step: 53300   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:51:57,109-Speed 3241.37 samples/sec   Loss 3.8121   LearningRate 0.0706   Epoch: 3   Global Step: 53310   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:52:00,197-Speed 3317.03 samples/sec   Loss 3.7470   LearningRate 0.0706   Epoch: 3   Global Step: 53320   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:52:03,304-Speed 3296.00 samples/sec   Loss 3.8010   LearningRate 0.0706   Epoch: 3   Global Step: 53330   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:52:06,439-Speed 3268.12 samples/sec   Loss 3.7389   LearningRate 0.0706   Epoch: 3   Global Step: 53340   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:52:09,513-Speed 3331.95 samples/sec   Loss 3.7183   LearningRate 0.0706   Epoch: 3   Global Step: 53350   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:52:12,610-Speed 3306.58 samples/sec   Loss 3.8007   LearningRate 0.0706   Epoch: 3   Global Step: 53360   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:52:15,682-Speed 3333.79 samples/sec   Loss 3.7290   LearningRate 0.0706   Epoch: 3   Global Step: 53370   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:52:18,819-Speed 3264.83 samples/sec   Loss 3.8193   LearningRate 0.0706   Epoch: 3   Global Step: 53380   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:52:21,883-Speed 3343.03 samples/sec   Loss 3.7652   LearningRate 0.0706   Epoch: 3   Global Step: 53390   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:52:24,972-Speed 3316.55 samples/sec   Loss 3.7400   LearningRate 0.0706   Epoch: 3   Global Step: 53400   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:52:28,032-Speed 3346.63 samples/sec   Loss 3.8116   LearningRate 0.0706   Epoch: 3   Global Step: 53410   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:52:31,101-Speed 3337.57 samples/sec   Loss 3.6964   LearningRate 0.0706   Epoch: 3   Global Step: 53420   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:52:34,184-Speed 3321.98 samples/sec   Loss 3.7984   LearningRate 0.0706   Epoch: 3   Global Step: 53430   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:52:37,294-Speed 3293.95 samples/sec   Loss 3.7919   LearningRate 0.0705   Epoch: 3   Global Step: 53440   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 04:52:40,349-Speed 3352.33 samples/sec   Loss 3.7447   LearningRate 0.0705   Epoch: 3   Global Step: 53450   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-11 04:52:43,421-Speed 3334.39 samples/sec   Loss 3.8122   LearningRate 0.0705   Epoch: 3   Global Step: 53460   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-11 04:52:46,535-Speed 3288.59 samples/sec   Loss 3.7431   LearningRate 0.0705   Epoch: 3   Global Step: 53470   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-11 04:52:49,671-Speed 3266.27 samples/sec   Loss 3.6750   LearningRate 0.0705   Epoch: 3   Global Step: 53480   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-11 04:52:52,752-Speed 3324.77 samples/sec   Loss 3.7600   LearningRate 0.0705   Epoch: 3   Global Step: 53490   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-11 04:52:55,896-Speed 3258.32 samples/sec   Loss 3.8120   LearningRate 0.0705   Epoch: 3   Global Step: 53500   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-11 04:52:58,974-Speed 3327.90 samples/sec   Loss 3.7851   LearningRate 0.0705   Epoch: 3   Global Step: 53510   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-11 04:53:02,080-Speed 3296.86 samples/sec   Loss 3.7940   LearningRate 0.0705   Epoch: 3   Global Step: 53520   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-11 04:53:05,157-Speed 3328.59 samples/sec   Loss 3.7681   LearningRate 0.0705   Epoch: 3   Global Step: 53530   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-11 04:53:08,230-Speed 3332.70 samples/sec   Loss 3.6976   LearningRate 0.0705   Epoch: 3   Global Step: 53540   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-11 04:53:11,297-Speed 3339.94 samples/sec   Loss 3.7629   LearningRate 0.0705   Epoch: 3   Global Step: 53550   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 04:53:14,375-Speed 3328.11 samples/sec   Loss 3.6640   LearningRate 0.0705   Epoch: 3   Global Step: 53560   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 04:53:17,447-Speed 3333.93 samples/sec   Loss 3.7180   LearningRate 0.0705   Epoch: 3   Global Step: 53570   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 04:53:20,520-Speed 3332.62 samples/sec   Loss 3.7252   LearningRate 0.0705   Epoch: 3   Global Step: 53580   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 04:53:23,603-Speed 3322.47 samples/sec   Loss 3.7372   LearningRate 0.0705   Epoch: 3   Global Step: 53590   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 04:53:26,670-Speed 3340.22 samples/sec   Loss 3.8005   LearningRate 0.0705   Epoch: 3   Global Step: 53600   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-11 04:53:29,740-Speed 3336.26 samples/sec   Loss 3.8147   LearningRate 0.0705   Epoch: 3   Global Step: 53610   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-11 04:53:32,826-Speed 3318.40 samples/sec   Loss 3.7146   LearningRate 0.0705   Epoch: 3   Global Step: 53620   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-11 04:53:35,905-Speed 3327.18 samples/sec   Loss 3.7936   LearningRate 0.0704   Epoch: 3   Global Step: 53630   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-11 04:53:38,984-Speed 3325.74 samples/sec   Loss 3.7175   LearningRate 0.0704   Epoch: 3   Global Step: 53640   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-11 04:53:42,063-Speed 3326.40 samples/sec   Loss 3.8277   LearningRate 0.0704   Epoch: 3   Global Step: 53650   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-11 04:53:45,126-Speed 3344.48 samples/sec   Loss 3.7164   LearningRate 0.0704   Epoch: 3   Global Step: 53660   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-11 04:53:48,236-Speed 3293.52 samples/sec   Loss 3.6908   LearningRate 0.0704   Epoch: 3   Global Step: 53670   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-11 04:53:51,308-Speed 3333.36 samples/sec   Loss 3.7907   LearningRate 0.0704   Epoch: 3   Global Step: 53680   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-11 04:53:54,397-Speed 3315.85 samples/sec   Loss 3.7282   LearningRate 0.0704   Epoch: 3   Global Step: 53690   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-11 04:53:57,486-Speed 3315.88 samples/sec   Loss 3.7359   LearningRate 0.0704   Epoch: 3   Global Step: 53700   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 04:54:00,579-Speed 3312.23 samples/sec   Loss 3.8622   LearningRate 0.0704   Epoch: 3   Global Step: 53710   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 04:54:03,653-Speed 3331.20 samples/sec   Loss 3.7775   LearningRate 0.0704   Epoch: 3   Global Step: 53720   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 04:54:06,729-Speed 3330.01 samples/sec   Loss 3.7319   LearningRate 0.0704   Epoch: 3   Global Step: 53730   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 04:54:09,879-Speed 3251.70 samples/sec   Loss 3.6812   LearningRate 0.0704   Epoch: 3   Global Step: 53740   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 04:54:13,043-Speed 3237.78 samples/sec   Loss 3.7131   LearningRate 0.0704   Epoch: 3   Global Step: 53750   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 04:54:16,105-Speed 3344.04 samples/sec   Loss 3.8080   LearningRate 0.0704   Epoch: 3   Global Step: 53760   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-11 04:54:19,192-Speed 3317.83 samples/sec   Loss 3.6791   LearningRate 0.0704   Epoch: 3   Global Step: 53770   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-11 04:54:22,309-Speed 3285.97 samples/sec   Loss 3.6633   LearningRate 0.0704   Epoch: 3   Global Step: 53780   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-11 04:54:25,413-Speed 3300.08 samples/sec   Loss 3.6666   LearningRate 0.0704   Epoch: 3   Global Step: 53790   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-11 04:54:28,480-Speed 3340.08 samples/sec   Loss 3.7290   LearningRate 0.0704   Epoch: 3   Global Step: 53800   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-11 04:54:31,551-Speed 3334.56 samples/sec   Loss 3.7064   LearningRate 0.0704   Epoch: 3   Global Step: 53810   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-11 04:54:34,636-Speed 3320.46 samples/sec   Loss 3.7196   LearningRate 0.0704   Epoch: 3   Global Step: 53820   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-11 04:54:37,699-Speed 3344.02 samples/sec   Loss 3.7269   LearningRate 0.0703   Epoch: 3   Global Step: 53830   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-11 04:54:40,762-Speed 3343.70 samples/sec   Loss 3.7042   LearningRate 0.0703   Epoch: 3   Global Step: 53840   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-11 04:54:43,844-Speed 3323.58 samples/sec   Loss 3.8114   LearningRate 0.0703   Epoch: 3   Global Step: 53850   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-11 04:54:46,915-Speed 3334.87 samples/sec   Loss 3.6972   LearningRate 0.0703   Epoch: 3   Global Step: 53860   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 04:54:50,049-Speed 3267.61 samples/sec   Loss 3.6697   LearningRate 0.0703   Epoch: 3   Global Step: 53870   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 04:54:53,116-Speed 3340.31 samples/sec   Loss 3.7270   LearningRate 0.0703   Epoch: 3   Global Step: 53880   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 04:54:56,189-Speed 3332.40 samples/sec   Loss 3.7339   LearningRate 0.0703   Epoch: 3   Global Step: 53890   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 04:54:59,255-Speed 3340.88 samples/sec   Loss 3.7186   LearningRate 0.0703   Epoch: 3   Global Step: 53900   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 04:55:02,322-Speed 3339.57 samples/sec   Loss 3.7262   LearningRate 0.0703   Epoch: 3   Global Step: 53910   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 04:55:05,372-Speed 3358.74 samples/sec   Loss 3.8048   LearningRate 0.0703   Epoch: 3   Global Step: 53920   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-11 04:55:08,438-Speed 3340.28 samples/sec   Loss 3.7715   LearningRate 0.0703   Epoch: 3   Global Step: 53930   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-11 04:55:11,502-Speed 3342.69 samples/sec   Loss 3.7479   LearningRate 0.0703   Epoch: 3   Global Step: 53940   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-11 04:55:14,606-Speed 3300.04 samples/sec   Loss 3.7202   LearningRate 0.0703   Epoch: 3   Global Step: 53950   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-11 04:55:17,762-Speed 3245.39 samples/sec   Loss 3.6710   LearningRate 0.0703   Epoch: 3   Global Step: 53960   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-11 04:55:20,833-Speed 3334.70 samples/sec   Loss 3.7380   LearningRate 0.0703   Epoch: 3   Global Step: 53970   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-11 04:55:23,901-Speed 3338.50 samples/sec   Loss 3.7279   LearningRate 0.0703   Epoch: 3   Global Step: 53980   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-11 04:55:26,969-Speed 3338.80 samples/sec   Loss 3.7528   LearningRate 0.0703   Epoch: 3   Global Step: 53990   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-11 04:55:30,048-Speed 3327.06 samples/sec   Loss 3.7194   LearningRate 0.0703   Epoch: 3   Global Step: 54000   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-11 04:56:13,867-[lfw][54000]XNorm: 23.121639
Training: 2022-04-11 04:56:13,868-[lfw][54000]Accuracy-Flip: 0.99800+-0.00287
Training: 2022-04-11 04:56:13,868-[lfw][54000]Accuracy-Highest: 0.99800
Training: 2022-04-11 04:57:04,545-[cfp_fp][54000]XNorm: 21.460285
Training: 2022-04-11 04:57:04,546-[cfp_fp][54000]Accuracy-Flip: 0.98300+-0.00493
Training: 2022-04-11 04:57:04,546-[cfp_fp][54000]Accuracy-Highest: 0.98300
Training: 2022-04-11 04:57:48,086-[agedb_30][54000]XNorm: 23.088607
Training: 2022-04-11 04:57:48,087-[agedb_30][54000]Accuracy-Flip: 0.98000+-0.00730
Training: 2022-04-11 04:57:48,087-[agedb_30][54000]Accuracy-Highest: 0.98000
Training: 2022-04-11 04:57:51,140-Speed 72.58 samples/sec   Loss 3.7092   LearningRate 0.0703   Epoch: 3   Global Step: 54010   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-11 04:57:54,207-Speed 3340.05 samples/sec   Loss 3.7944   LearningRate 0.0703   Epoch: 3   Global Step: 54020   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:57:57,245-Speed 3371.70 samples/sec   Loss 3.7014   LearningRate 0.0702   Epoch: 3   Global Step: 54030   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-11 04:58:00,353-Speed 3294.70 samples/sec   Loss 3.7725   LearningRate 0.0702   Epoch: 3   Global Step: 54040   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-11 04:58:03,522-Speed 3232.75 samples/sec   Loss 3.7230   LearningRate 0.0702   Epoch: 3   Global Step: 54050   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-11 04:58:06,602-Speed 3325.55 samples/sec   Loss 3.7677   LearningRate 0.0702   Epoch: 3   Global Step: 54060   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-11 04:58:09,691-Speed 3315.94 samples/sec   Loss 3.6222   LearningRate 0.0702   Epoch: 3   Global Step: 54070   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-11 04:58:12,741-Speed 3357.28 samples/sec   Loss 3.7517   LearningRate 0.0702   Epoch: 3   Global Step: 54080   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-11 04:58:15,908-Speed 3234.30 samples/sec   Loss 3.7627   LearningRate 0.0702   Epoch: 3   Global Step: 54090   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-11 04:58:18,958-Speed 3358.17 samples/sec   Loss 3.7732   LearningRate 0.0702   Epoch: 3   Global Step: 54100   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-11 04:58:22,013-Speed 3353.23 samples/sec   Loss 3.8222   LearningRate 0.0702   Epoch: 3   Global Step: 54110   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-11 04:58:25,065-Speed 3354.87 samples/sec   Loss 3.7175   LearningRate 0.0702   Epoch: 3   Global Step: 54120   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-11 04:58:28,130-Speed 3341.65 samples/sec   Loss 3.8017   LearningRate 0.0702   Epoch: 3   Global Step: 54130   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:58:31,226-Speed 3309.56 samples/sec   Loss 3.8244   LearningRate 0.0702   Epoch: 3   Global Step: 54140   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:58:35,074-Speed 2661.66 samples/sec   Loss 3.7864   LearningRate 0.0702   Epoch: 3   Global Step: 54150   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:58:38,144-Speed 3335.90 samples/sec   Loss 3.7564   LearningRate 0.0702   Epoch: 3   Global Step: 54160   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:58:41,205-Speed 3345.54 samples/sec   Loss 3.7502   LearningRate 0.0702   Epoch: 3   Global Step: 54170   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:58:44,265-Speed 3347.45 samples/sec   Loss 3.7682   LearningRate 0.0702   Epoch: 3   Global Step: 54180   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:58:47,327-Speed 3345.18 samples/sec   Loss 3.7244   LearningRate 0.0702   Epoch: 3   Global Step: 54190   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:58:50,387-Speed 3347.13 samples/sec   Loss 3.7195   LearningRate 0.0702   Epoch: 3   Global Step: 54200   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:58:53,451-Speed 3342.32 samples/sec   Loss 3.7695   LearningRate 0.0702   Epoch: 3   Global Step: 54210   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:58:56,527-Speed 3329.52 samples/sec   Loss 3.8131   LearningRate 0.0702   Epoch: 3   Global Step: 54220   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:58:59,601-Speed 3333.08 samples/sec   Loss 3.8118   LearningRate 0.0701   Epoch: 3   Global Step: 54230   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:59:02,677-Speed 3329.19 samples/sec   Loss 3.7104   LearningRate 0.0701   Epoch: 3   Global Step: 54240   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:59:05,753-Speed 3330.39 samples/sec   Loss 3.7407   LearningRate 0.0701   Epoch: 3   Global Step: 54250   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:59:08,834-Speed 3324.09 samples/sec   Loss 3.8006   LearningRate 0.0701   Epoch: 3   Global Step: 54260   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:59:11,911-Speed 3329.48 samples/sec   Loss 3.7485   LearningRate 0.0701   Epoch: 3   Global Step: 54270   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:59:14,989-Speed 3327.31 samples/sec   Loss 3.7464   LearningRate 0.0701   Epoch: 3   Global Step: 54280   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:59:18,047-Speed 3349.31 samples/sec   Loss 3.6455   LearningRate 0.0701   Epoch: 3   Global Step: 54290   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:59:21,106-Speed 3348.04 samples/sec   Loss 3.8088   LearningRate 0.0701   Epoch: 3   Global Step: 54300   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-04-11 04:59:24,154-Speed 3359.92 samples/sec   Loss 3.7882   LearningRate 0.0701   Epoch: 3   Global Step: 54310   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-11 04:59:27,242-Speed 3317.95 samples/sec   Loss 3.7800   LearningRate 0.0701   Epoch: 3   Global Step: 54320   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-11 04:59:30,302-Speed 3346.68 samples/sec   Loss 3.7214   LearningRate 0.0701   Epoch: 3   Global Step: 54330   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-11 04:59:33,361-Speed 3349.27 samples/sec   Loss 3.7819   LearningRate 0.0701   Epoch: 3   Global Step: 54340   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-11 04:59:36,428-Speed 3339.37 samples/sec   Loss 3.8224   LearningRate 0.0701   Epoch: 3   Global Step: 54350   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-11 04:59:39,553-Speed 3277.96 samples/sec   Loss 3.6979   LearningRate 0.0701   Epoch: 3   Global Step: 54360   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-11 04:59:42,653-Speed 3304.06 samples/sec   Loss 3.8041   LearningRate 0.0701   Epoch: 3   Global Step: 54370   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-11 04:59:45,716-Speed 3343.05 samples/sec   Loss 3.7223   LearningRate 0.0701   Epoch: 3   Global Step: 54380   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-11 04:59:48,772-Speed 3352.09 samples/sec   Loss 3.7485   LearningRate 0.0701   Epoch: 3   Global Step: 54390   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-04-11 04:59:51,927-Speed 3246.58 samples/sec   Loss 3.7133   LearningRate 0.0701   Epoch: 3   Global Step: 54400   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-11 04:59:55,022-Speed 3308.33 samples/sec   Loss 3.7488   LearningRate 0.0701   Epoch: 3   Global Step: 54410   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 04:59:58,080-Speed 3350.57 samples/sec   Loss 3.8146   LearningRate 0.0701   Epoch: 3   Global Step: 54420   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:00:01,143-Speed 3343.56 samples/sec   Loss 3.7286   LearningRate 0.0700   Epoch: 3   Global Step: 54430   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:00:04,223-Speed 3325.52 samples/sec   Loss 3.6732   LearningRate 0.0700   Epoch: 3   Global Step: 54440   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:00:07,290-Speed 3339.37 samples/sec   Loss 3.7556   LearningRate 0.0700   Epoch: 3   Global Step: 54450   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:00:10,383-Speed 3311.48 samples/sec   Loss 3.7151   LearningRate 0.0700   Epoch: 3   Global Step: 54460   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:00:13,481-Speed 3306.91 samples/sec   Loss 3.8058   LearningRate 0.0700   Epoch: 3   Global Step: 54470   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:00:16,548-Speed 3339.58 samples/sec   Loss 3.7329   LearningRate 0.0700   Epoch: 3   Global Step: 54480   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:00:19,612-Speed 3342.43 samples/sec   Loss 3.6959   LearningRate 0.0700   Epoch: 3   Global Step: 54490   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:00:22,669-Speed 3350.19 samples/sec   Loss 3.7057   LearningRate 0.0700   Epoch: 3   Global Step: 54500   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:00:25,732-Speed 3344.00 samples/sec   Loss 3.7991   LearningRate 0.0700   Epoch: 3   Global Step: 54510   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:00:28,793-Speed 3346.42 samples/sec   Loss 3.7579   LearningRate 0.0700   Epoch: 3   Global Step: 54520   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:00:31,863-Speed 3337.30 samples/sec   Loss 3.7969   LearningRate 0.0700   Epoch: 3   Global Step: 54530   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:00:34,918-Speed 3351.70 samples/sec   Loss 3.6978   LearningRate 0.0700   Epoch: 3   Global Step: 54540   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:00:37,987-Speed 3337.99 samples/sec   Loss 3.7517   LearningRate 0.0700   Epoch: 3   Global Step: 54550   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:00:41,047-Speed 3346.89 samples/sec   Loss 3.8271   LearningRate 0.0700   Epoch: 3   Global Step: 54560   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:00:44,149-Speed 3302.13 samples/sec   Loss 3.7835   LearningRate 0.0700   Epoch: 3   Global Step: 54570   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:00:47,227-Speed 3327.96 samples/sec   Loss 3.7654   LearningRate 0.0700   Epoch: 3   Global Step: 54580   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:00:50,302-Speed 3331.04 samples/sec   Loss 3.6975   LearningRate 0.0700   Epoch: 3   Global Step: 54590   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:00:53,381-Speed 3326.33 samples/sec   Loss 3.7833   LearningRate 0.0700   Epoch: 3   Global Step: 54600   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:00:56,441-Speed 3347.55 samples/sec   Loss 3.8004   LearningRate 0.0700   Epoch: 3   Global Step: 54610   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-11 05:00:59,495-Speed 3353.56 samples/sec   Loss 3.7505   LearningRate 0.0700   Epoch: 3   Global Step: 54620   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-11 05:01:02,571-Speed 3329.94 samples/sec   Loss 3.7777   LearningRate 0.0699   Epoch: 3   Global Step: 54630   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-11 05:01:05,691-Speed 3282.71 samples/sec   Loss 3.7856   LearningRate 0.0699   Epoch: 3   Global Step: 54640   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-11 05:01:08,750-Speed 3347.54 samples/sec   Loss 3.6983   LearningRate 0.0699   Epoch: 3   Global Step: 54650   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-11 05:01:11,819-Speed 3337.30 samples/sec   Loss 3.6839   LearningRate 0.0699   Epoch: 3   Global Step: 54660   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-11 05:01:14,902-Speed 3322.64 samples/sec   Loss 3.7193   LearningRate 0.0699   Epoch: 3   Global Step: 54670   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-11 05:01:18,002-Speed 3304.50 samples/sec   Loss 3.7590   LearningRate 0.0699   Epoch: 3   Global Step: 54680   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-11 05:01:21,070-Speed 3338.65 samples/sec   Loss 3.8431   LearningRate 0.0699   Epoch: 3   Global Step: 54690   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-11 05:01:24,162-Speed 3312.40 samples/sec   Loss 3.7318   LearningRate 0.0699   Epoch: 3   Global Step: 54700   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-11 05:01:27,355-Speed 3207.37 samples/sec   Loss 3.7306   LearningRate 0.0699   Epoch: 3   Global Step: 54710   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:01:30,550-Speed 3206.22 samples/sec   Loss 3.7267   LearningRate 0.0699   Epoch: 3   Global Step: 54720   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:01:33,621-Speed 3335.18 samples/sec   Loss 3.7988   LearningRate 0.0699   Epoch: 3   Global Step: 54730   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:01:36,681-Speed 3346.21 samples/sec   Loss 3.8164   LearningRate 0.0699   Epoch: 3   Global Step: 54740   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:01:39,754-Speed 3333.94 samples/sec   Loss 3.7086   LearningRate 0.0699   Epoch: 3   Global Step: 54750   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:01:42,817-Speed 3343.73 samples/sec   Loss 3.7562   LearningRate 0.0699   Epoch: 3   Global Step: 54760   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:01:45,887-Speed 3335.49 samples/sec   Loss 3.7537   LearningRate 0.0699   Epoch: 3   Global Step: 54770   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:01:48,957-Speed 3337.41 samples/sec   Loss 3.7354   LearningRate 0.0699   Epoch: 3   Global Step: 54780   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:01:52,126-Speed 3232.29 samples/sec   Loss 3.7024   LearningRate 0.0699   Epoch: 3   Global Step: 54790   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:01:55,293-Speed 3233.55 samples/sec   Loss 3.7901   LearningRate 0.0699   Epoch: 3   Global Step: 54800   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:01:58,365-Speed 3334.27 samples/sec   Loss 3.7117   LearningRate 0.0699   Epoch: 3   Global Step: 54810   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:02:01,491-Speed 3276.61 samples/sec   Loss 3.8362   LearningRate 0.0699   Epoch: 3   Global Step: 54820   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:02:04,554-Speed 3343.82 samples/sec   Loss 3.7094   LearningRate 0.0698   Epoch: 3   Global Step: 54830   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-11 05:02:07,617-Speed 3343.74 samples/sec   Loss 3.7467   LearningRate 0.0698   Epoch: 3   Global Step: 54840   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-11 05:02:10,685-Speed 3338.37 samples/sec   Loss 3.7729   LearningRate 0.0698   Epoch: 3   Global Step: 54850   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-11 05:02:13,743-Speed 3349.75 samples/sec   Loss 3.7451   LearningRate 0.0698   Epoch: 3   Global Step: 54860   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-11 05:02:16,808-Speed 3341.29 samples/sec   Loss 3.7341   LearningRate 0.0698   Epoch: 3   Global Step: 54870   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-11 05:02:19,865-Speed 3351.29 samples/sec   Loss 3.8206   LearningRate 0.0698   Epoch: 3   Global Step: 54880   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-11 05:02:22,932-Speed 3339.37 samples/sec   Loss 3.7057   LearningRate 0.0698   Epoch: 3   Global Step: 54890   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-11 05:02:25,993-Speed 3346.24 samples/sec   Loss 3.6970   LearningRate 0.0698   Epoch: 3   Global Step: 54900   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-11 05:02:29,061-Speed 3338.39 samples/sec   Loss 3.7345   LearningRate 0.0698   Epoch: 3   Global Step: 54910   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-11 05:02:32,127-Speed 3340.70 samples/sec   Loss 3.7350   LearningRate 0.0698   Epoch: 3   Global Step: 54920   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-11 05:02:35,203-Speed 3329.56 samples/sec   Loss 3.7288   LearningRate 0.0698   Epoch: 3   Global Step: 54930   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:02:38,261-Speed 3349.18 samples/sec   Loss 3.7335   LearningRate 0.0698   Epoch: 3   Global Step: 54940   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:02:41,331-Speed 3336.71 samples/sec   Loss 3.7782   LearningRate 0.0698   Epoch: 3   Global Step: 54950   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:02:44,389-Speed 3349.43 samples/sec   Loss 3.8003   LearningRate 0.0698   Epoch: 3   Global Step: 54960   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:02:47,450-Speed 3346.76 samples/sec   Loss 3.8291   LearningRate 0.0698   Epoch: 3   Global Step: 54970   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:02:50,528-Speed 3327.60 samples/sec   Loss 3.7649   LearningRate 0.0698   Epoch: 3   Global Step: 54980   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-11 05:02:53,658-Speed 3271.66 samples/sec   Loss 3.8244   LearningRate 0.0698   Epoch: 3   Global Step: 54990   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-11 05:02:56,715-Speed 3351.10 samples/sec   Loss 3.7134   LearningRate 0.0698   Epoch: 3   Global Step: 55000   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-11 05:02:59,775-Speed 3347.09 samples/sec   Loss 3.7798   LearningRate 0.0698   Epoch: 3   Global Step: 55010   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-11 05:03:02,835-Speed 3347.51 samples/sec   Loss 3.6912   LearningRate 0.0698   Epoch: 3   Global Step: 55020   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-11 05:03:05,894-Speed 3347.85 samples/sec   Loss 3.7708   LearningRate 0.0697   Epoch: 3   Global Step: 55030   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-11 05:03:08,968-Speed 3331.17 samples/sec   Loss 3.7110   LearningRate 0.0697   Epoch: 3   Global Step: 55040   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-11 05:03:12,029-Speed 3348.17 samples/sec   Loss 3.6821   LearningRate 0.0697   Epoch: 3   Global Step: 55050   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-11 05:03:15,085-Speed 3351.46 samples/sec   Loss 3.7528   LearningRate 0.0697   Epoch: 3   Global Step: 55060   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-11 05:03:18,154-Speed 3337.91 samples/sec   Loss 3.7274   LearningRate 0.0697   Epoch: 3   Global Step: 55070   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-11 05:03:21,211-Speed 3349.50 samples/sec   Loss 3.8311   LearningRate 0.0697   Epoch: 3   Global Step: 55080   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:03:24,276-Speed 3341.83 samples/sec   Loss 3.8218   LearningRate 0.0697   Epoch: 3   Global Step: 55090   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:03:27,347-Speed 3334.88 samples/sec   Loss 3.8737   LearningRate 0.0697   Epoch: 3   Global Step: 55100   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:03:30,412-Speed 3342.04 samples/sec   Loss 3.8134   LearningRate 0.0697   Epoch: 3   Global Step: 55110   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:03:33,487-Speed 3331.16 samples/sec   Loss 3.7899   LearningRate 0.0697   Epoch: 3   Global Step: 55120   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:03:36,557-Speed 3336.50 samples/sec   Loss 3.8614   LearningRate 0.0697   Epoch: 3   Global Step: 55130   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:03:39,646-Speed 3315.29 samples/sec   Loss 3.7388   LearningRate 0.0697   Epoch: 3   Global Step: 55140   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:03:42,706-Speed 3348.20 samples/sec   Loss 3.7119   LearningRate 0.0697   Epoch: 3   Global Step: 55150   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:03:45,804-Speed 3305.53 samples/sec   Loss 3.7996   LearningRate 0.0697   Epoch: 3   Global Step: 55160   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:03:48,896-Speed 3313.04 samples/sec   Loss 3.7921   LearningRate 0.0697   Epoch: 3   Global Step: 55170   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:03:51,967-Speed 3335.06 samples/sec   Loss 3.7328   LearningRate 0.0697   Epoch: 3   Global Step: 55180   Fp16 Grad Scale: 262144   Required: 29 hours
Training: 2022-04-11 05:03:55,031-Speed 3343.31 samples/sec   Loss 3.7257   LearningRate 0.0697   Epoch: 3   Global Step: 55190   Fp16 Grad Scale: 262144   Required: 29 hours
Training: 2022-04-11 05:03:58,084-Speed 3354.83 samples/sec   Loss 3.7687   LearningRate 0.0697   Epoch: 3   Global Step: 55200   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:04:01,193-Speed 3293.52 samples/sec   Loss 3.8070   LearningRate 0.0697   Epoch: 3   Global Step: 55210   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:04:04,343-Speed 3251.87 samples/sec   Loss 3.7762   LearningRate 0.0697   Epoch: 3   Global Step: 55220   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:04:07,441-Speed 3306.56 samples/sec   Loss 3.7188   LearningRate 0.0696   Epoch: 3   Global Step: 55230   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:04:10,509-Speed 3338.54 samples/sec   Loss 3.7823   LearningRate 0.0696   Epoch: 3   Global Step: 55240   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:04:13,571-Speed 3344.87 samples/sec   Loss 3.7916   LearningRate 0.0696   Epoch: 3   Global Step: 55250   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:04:16,630-Speed 3349.01 samples/sec   Loss 3.7563   LearningRate 0.0696   Epoch: 3   Global Step: 55260   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:04:19,724-Speed 3310.76 samples/sec   Loss 3.7954   LearningRate 0.0696   Epoch: 3   Global Step: 55270   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:04:22,842-Speed 3285.12 samples/sec   Loss 3.7295   LearningRate 0.0696   Epoch: 3   Global Step: 55280   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:04:25,923-Speed 3324.31 samples/sec   Loss 3.8189   LearningRate 0.0696   Epoch: 3   Global Step: 55290   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:04:28,955-Speed 3378.40 samples/sec   Loss 3.8304   LearningRate 0.0696   Epoch: 3   Global Step: 55300   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-04-11 05:04:32,026-Speed 3335.44 samples/sec   Loss 3.8046   LearningRate 0.0696   Epoch: 3   Global Step: 55310   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-04-11 05:04:35,086-Speed 3346.92 samples/sec   Loss 3.7903   LearningRate 0.0696   Epoch: 3   Global Step: 55320   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-04-11 05:04:38,244-Speed 3243.59 samples/sec   Loss 3.8490   LearningRate 0.0696   Epoch: 3   Global Step: 55330   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-04-11 05:04:41,313-Speed 3337.24 samples/sec   Loss 3.8365   LearningRate 0.0696   Epoch: 3   Global Step: 55340   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-04-11 05:04:44,403-Speed 3314.81 samples/sec   Loss 3.8010   LearningRate 0.0696   Epoch: 3   Global Step: 55350   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-04-11 05:04:47,523-Speed 3282.45 samples/sec   Loss 3.8029   LearningRate 0.0696   Epoch: 3   Global Step: 55360   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-04-11 05:04:50,611-Speed 3316.13 samples/sec   Loss 3.7893   LearningRate 0.0696   Epoch: 3   Global Step: 55370   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-04-11 05:04:53,674-Speed 3344.18 samples/sec   Loss 3.8180   LearningRate 0.0696   Epoch: 3   Global Step: 55380   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-04-11 05:04:56,747-Speed 3333.57 samples/sec   Loss 3.7810   LearningRate 0.0696   Epoch: 3   Global Step: 55390   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-04-11 05:04:59,827-Speed 3325.97 samples/sec   Loss 3.6477   LearningRate 0.0696   Epoch: 3   Global Step: 55400   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-11 05:05:02,885-Speed 3349.08 samples/sec   Loss 3.7935   LearningRate 0.0696   Epoch: 3   Global Step: 55410   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-11 05:05:05,947-Speed 3344.74 samples/sec   Loss 3.7040   LearningRate 0.0696   Epoch: 3   Global Step: 55420   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-11 05:05:09,019-Speed 3334.03 samples/sec   Loss 3.8020   LearningRate 0.0695   Epoch: 3   Global Step: 55430   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-11 05:05:13,661-Speed 2206.39 samples/sec   Loss 3.7932   LearningRate 0.0695   Epoch: 3   Global Step: 55440   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-11 05:05:19,089-Speed 1886.90 samples/sec   Loss 3.8036   LearningRate 0.0695   Epoch: 3   Global Step: 55450   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-11 05:05:22,242-Speed 3247.68 samples/sec   Loss 3.8131   LearningRate 0.0695   Epoch: 3   Global Step: 55460   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-11 05:05:25,304-Speed 3346.16 samples/sec   Loss 3.8104   LearningRate 0.0695   Epoch: 3   Global Step: 55470   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-11 05:05:28,372-Speed 3337.92 samples/sec   Loss 3.7052   LearningRate 0.0695   Epoch: 3   Global Step: 55480   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-11 05:05:31,435-Speed 3343.77 samples/sec   Loss 3.8238   LearningRate 0.0695   Epoch: 3   Global Step: 55490   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-11 05:05:34,505-Speed 3336.15 samples/sec   Loss 3.7726   LearningRate 0.0695   Epoch: 3   Global Step: 55500   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:05:37,588-Speed 3322.32 samples/sec   Loss 3.8049   LearningRate 0.0695   Epoch: 3   Global Step: 55510   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:05:40,655-Speed 3340.11 samples/sec   Loss 3.6991   LearningRate 0.0695   Epoch: 3   Global Step: 55520   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:05:43,791-Speed 3265.63 samples/sec   Loss 3.7454   LearningRate 0.0695   Epoch: 3   Global Step: 55530   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:05:46,878-Speed 3317.91 samples/sec   Loss 3.7200   LearningRate 0.0695   Epoch: 3   Global Step: 55540   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:05:49,939-Speed 3345.98 samples/sec   Loss 3.8591   LearningRate 0.0695   Epoch: 3   Global Step: 55550   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-11 05:05:53,000-Speed 3347.02 samples/sec   Loss 3.7969   LearningRate 0.0695   Epoch: 3   Global Step: 55560   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-11 05:05:56,064-Speed 3342.05 samples/sec   Loss 3.8071   LearningRate 0.0695   Epoch: 3   Global Step: 55570   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-11 05:05:59,129-Speed 3342.72 samples/sec   Loss 3.7456   LearningRate 0.0695   Epoch: 3   Global Step: 55580   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-11 05:06:02,207-Speed 3326.45 samples/sec   Loss 3.7860   LearningRate 0.0695   Epoch: 3   Global Step: 55590   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-11 05:06:05,272-Speed 3341.68 samples/sec   Loss 3.7657   LearningRate 0.0695   Epoch: 3   Global Step: 55600   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-11 05:06:08,343-Speed 3335.03 samples/sec   Loss 3.8146   LearningRate 0.0695   Epoch: 3   Global Step: 55610   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-11 05:06:11,408-Speed 3342.64 samples/sec   Loss 3.6786   LearningRate 0.0695   Epoch: 3   Global Step: 55620   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-11 05:06:14,496-Speed 3316.64 samples/sec   Loss 3.7723   LearningRate 0.0694   Epoch: 3   Global Step: 55630   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-11 05:06:17,571-Speed 3331.02 samples/sec   Loss 3.8524   LearningRate 0.0694   Epoch: 3   Global Step: 55640   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-11 05:06:20,641-Speed 3336.70 samples/sec   Loss 3.8011   LearningRate 0.0694   Epoch: 3   Global Step: 55650   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:06:23,707-Speed 3339.95 samples/sec   Loss 3.6917   LearningRate 0.0694   Epoch: 3   Global Step: 55660   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:06:26,785-Speed 3328.52 samples/sec   Loss 3.7429   LearningRate 0.0694   Epoch: 3   Global Step: 55670   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:06:29,896-Speed 3292.00 samples/sec   Loss 3.7366   LearningRate 0.0694   Epoch: 3   Global Step: 55680   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:06:32,968-Speed 3334.25 samples/sec   Loss 3.7571   LearningRate 0.0694   Epoch: 3   Global Step: 55690   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:06:36,044-Speed 3328.98 samples/sec   Loss 3.7457   LearningRate 0.0694   Epoch: 3   Global Step: 55700   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:06:39,126-Speed 3323.11 samples/sec   Loss 3.7784   LearningRate 0.0694   Epoch: 3   Global Step: 55710   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:06:42,203-Speed 3329.35 samples/sec   Loss 3.8188   LearningRate 0.0694   Epoch: 3   Global Step: 55720   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:06:45,289-Speed 3318.59 samples/sec   Loss 3.7760   LearningRate 0.0694   Epoch: 3   Global Step: 55730   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:06:48,348-Speed 3348.84 samples/sec   Loss 3.7719   LearningRate 0.0694   Epoch: 3   Global Step: 55740   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-11 05:06:51,412-Speed 3342.99 samples/sec   Loss 3.7426   LearningRate 0.0694   Epoch: 3   Global Step: 55750   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-11 05:06:54,473-Speed 3345.25 samples/sec   Loss 3.7155   LearningRate 0.0694   Epoch: 3   Global Step: 55760   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-11 05:06:57,555-Speed 3323.99 samples/sec   Loss 3.7775   LearningRate 0.0694   Epoch: 3   Global Step: 55770   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-11 05:07:00,628-Speed 3333.27 samples/sec   Loss 3.7432   LearningRate 0.0694   Epoch: 3   Global Step: 55780   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-11 05:07:03,699-Speed 3335.03 samples/sec   Loss 3.8160   LearningRate 0.0694   Epoch: 3   Global Step: 55790   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-11 05:07:06,766-Speed 3338.90 samples/sec   Loss 3.7785   LearningRate 0.0694   Epoch: 3   Global Step: 55800   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-11 05:07:09,877-Speed 3292.51 samples/sec   Loss 3.7559   LearningRate 0.0694   Epoch: 3   Global Step: 55810   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-11 05:07:13,006-Speed 3274.04 samples/sec   Loss 3.7169   LearningRate 0.0694   Epoch: 3   Global Step: 55820   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-11 05:07:16,066-Speed 3347.11 samples/sec   Loss 3.7517   LearningRate 0.0693   Epoch: 3   Global Step: 55830   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-11 05:07:19,141-Speed 3330.98 samples/sec   Loss 3.7906   LearningRate 0.0693   Epoch: 3   Global Step: 55840   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:07:22,219-Speed 3327.07 samples/sec   Loss 3.7589   LearningRate 0.0693   Epoch: 3   Global Step: 55850   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:07:25,325-Speed 3298.14 samples/sec   Loss 3.7836   LearningRate 0.0693   Epoch: 3   Global Step: 55860   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:07:28,406-Speed 3324.01 samples/sec   Loss 3.8299   LearningRate 0.0693   Epoch: 3   Global Step: 55870   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:07:31,471-Speed 3341.92 samples/sec   Loss 3.7912   LearningRate 0.0693   Epoch: 3   Global Step: 55880   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:07:34,543-Speed 3334.23 samples/sec   Loss 3.7256   LearningRate 0.0693   Epoch: 3   Global Step: 55890   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:07:37,637-Speed 3310.00 samples/sec   Loss 3.7480   LearningRate 0.0693   Epoch: 3   Global Step: 55900   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:07:40,768-Speed 3271.51 samples/sec   Loss 3.7721   LearningRate 0.0693   Epoch: 3   Global Step: 55910   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:07:43,839-Speed 3335.14 samples/sec   Loss 3.7539   LearningRate 0.0693   Epoch: 3   Global Step: 55920   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:07:46,925-Speed 3319.49 samples/sec   Loss 3.7474   LearningRate 0.0693   Epoch: 3   Global Step: 55930   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:07:50,013-Speed 3316.51 samples/sec   Loss 3.7529   LearningRate 0.0693   Epoch: 3   Global Step: 55940   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:07:53,109-Speed 3308.53 samples/sec   Loss 3.7896   LearningRate 0.0693   Epoch: 3   Global Step: 55950   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:07:56,207-Speed 3305.35 samples/sec   Loss 3.8163   LearningRate 0.0693   Epoch: 3   Global Step: 55960   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:07:59,283-Speed 3330.38 samples/sec   Loss 3.7072   LearningRate 0.0693   Epoch: 3   Global Step: 55970   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:08:02,502-Speed 3182.12 samples/sec   Loss 3.8168   LearningRate 0.0693   Epoch: 3   Global Step: 55980   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:08:05,583-Speed 3323.39 samples/sec   Loss 3.7982   LearningRate 0.0693   Epoch: 3   Global Step: 55990   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:08:08,668-Speed 3320.32 samples/sec   Loss 3.7425   LearningRate 0.0693   Epoch: 3   Global Step: 56000   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-11 05:08:52,936-[lfw][56000]XNorm: 22.399913
Training: 2022-04-11 05:08:52,936-[lfw][56000]Accuracy-Flip: 0.99767+-0.00249
Training: 2022-04-11 05:08:52,937-[lfw][56000]Accuracy-Highest: 0.99800
Training: 2022-04-11 05:09:44,387-[cfp_fp][56000]XNorm: 21.294058
Training: 2022-04-11 05:09:44,388-[cfp_fp][56000]Accuracy-Flip: 0.98200+-0.00630
Training: 2022-04-11 05:09:44,388-[cfp_fp][56000]Accuracy-Highest: 0.98300
Training: 2022-04-11 05:10:28,641-[agedb_30][56000]XNorm: 22.735270
Training: 2022-04-11 05:10:28,642-[agedb_30][56000]Accuracy-Flip: 0.98100+-0.00712
Training: 2022-04-11 05:10:28,643-[agedb_30][56000]Accuracy-Highest: 0.98100
Training: 2022-04-11 05:10:31,716-Speed 71.59 samples/sec   Loss 3.7038   LearningRate 0.0693   Epoch: 3   Global Step: 56010   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-11 05:10:34,815-Speed 3305.61 samples/sec   Loss 3.8192   LearningRate 0.0693   Epoch: 3   Global Step: 56020   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-11 05:10:37,874-Speed 3348.20 samples/sec   Loss 3.7104   LearningRate 0.0692   Epoch: 3   Global Step: 56030   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-11 05:10:40,936-Speed 3344.64 samples/sec   Loss 3.7659   LearningRate 0.0692   Epoch: 3   Global Step: 56040   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-11 05:10:43,999-Speed 3343.52 samples/sec   Loss 3.8427   LearningRate 0.0692   Epoch: 3   Global Step: 56050   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-11 05:10:47,057-Speed 3349.58 samples/sec   Loss 3.6847   LearningRate 0.0692   Epoch: 3   Global Step: 56060   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-11 05:10:50,112-Speed 3352.59 samples/sec   Loss 3.7312   LearningRate 0.0692   Epoch: 3   Global Step: 56070   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-11 05:10:53,164-Speed 3355.50 samples/sec   Loss 3.7659   LearningRate 0.0692   Epoch: 3   Global Step: 56080   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-11 05:10:56,221-Speed 3350.85 samples/sec   Loss 3.8656   LearningRate 0.0692   Epoch: 3   Global Step: 56090   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-11 05:10:59,290-Speed 3337.09 samples/sec   Loss 3.8681   LearningRate 0.0692   Epoch: 3   Global Step: 56100   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:11:02,356-Speed 3340.59 samples/sec   Loss 3.7876   LearningRate 0.0692   Epoch: 3   Global Step: 56110   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:11:05,423-Speed 3340.80 samples/sec   Loss 3.7682   LearningRate 0.0692   Epoch: 3   Global Step: 56120   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:11:08,478-Speed 3352.55 samples/sec   Loss 3.7857   LearningRate 0.0692   Epoch: 3   Global Step: 56130   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:11:11,536-Speed 3348.74 samples/sec   Loss 3.7562   LearningRate 0.0692   Epoch: 3   Global Step: 56140   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:11:14,595-Speed 3347.88 samples/sec   Loss 3.8165   LearningRate 0.0692   Epoch: 3   Global Step: 56150   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:11:17,661-Speed 3341.37 samples/sec   Loss 3.7390   LearningRate 0.0692   Epoch: 3   Global Step: 56160   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:11:20,723-Speed 3344.14 samples/sec   Loss 3.6858   LearningRate 0.0692   Epoch: 3   Global Step: 56170   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:11:23,802-Speed 3326.82 samples/sec   Loss 3.7756   LearningRate 0.0692   Epoch: 3   Global Step: 56180   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:11:26,870-Speed 3338.74 samples/sec   Loss 3.7562   LearningRate 0.0692   Epoch: 3   Global Step: 56190   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:11:29,948-Speed 3327.57 samples/sec   Loss 3.7553   LearningRate 0.0692   Epoch: 3   Global Step: 56200   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-11 05:11:33,057-Speed 3294.13 samples/sec   Loss 3.8069   LearningRate 0.0692   Epoch: 3   Global Step: 56210   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-11 05:11:36,135-Speed 3328.86 samples/sec   Loss 3.7605   LearningRate 0.0692   Epoch: 3   Global Step: 56220   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-11 05:11:39,259-Speed 3277.68 samples/sec   Loss 3.7359   LearningRate 0.0691   Epoch: 3   Global Step: 56230   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-11 05:11:42,324-Speed 3342.54 samples/sec   Loss 3.8212   LearningRate 0.0691   Epoch: 3   Global Step: 56240   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-11 05:11:45,395-Speed 3334.66 samples/sec   Loss 3.6933   LearningRate 0.0691   Epoch: 3   Global Step: 56250   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-11 05:11:48,463-Speed 3338.56 samples/sec   Loss 3.7253   LearningRate 0.0691   Epoch: 3   Global Step: 56260   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-11 05:11:51,536-Speed 3332.33 samples/sec   Loss 3.6678   LearningRate 0.0691   Epoch: 3   Global Step: 56270   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-11 05:11:54,639-Speed 3300.94 samples/sec   Loss 3.7786   LearningRate 0.0691   Epoch: 3   Global Step: 56280   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-11 05:11:57,764-Speed 3278.02 samples/sec   Loss 3.7109   LearningRate 0.0691   Epoch: 3   Global Step: 56290   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-11 05:12:00,845-Speed 3324.91 samples/sec   Loss 3.7297   LearningRate 0.0691   Epoch: 3   Global Step: 56300   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:12:03,907-Speed 3344.06 samples/sec   Loss 3.7054   LearningRate 0.0691   Epoch: 3   Global Step: 56310   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:12:06,993-Speed 3319.21 samples/sec   Loss 3.8225   LearningRate 0.0691   Epoch: 3   Global Step: 56320   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:12:10,054-Speed 3346.22 samples/sec   Loss 3.7629   LearningRate 0.0691   Epoch: 3   Global Step: 56330   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:12:13,150-Speed 3308.24 samples/sec   Loss 3.7622   LearningRate 0.0691   Epoch: 3   Global Step: 56340   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:12:16,281-Speed 3271.75 samples/sec   Loss 3.8005   LearningRate 0.0691   Epoch: 3   Global Step: 56350   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:12:19,462-Speed 3219.36 samples/sec   Loss 3.7103   LearningRate 0.0691   Epoch: 3   Global Step: 56360   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:12:22,580-Speed 3285.62 samples/sec   Loss 3.7813   LearningRate 0.0691   Epoch: 3   Global Step: 56370   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:12:25,641-Speed 3346.24 samples/sec   Loss 3.7040   LearningRate 0.0691   Epoch: 3   Global Step: 56380   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:12:28,750-Speed 3293.79 samples/sec   Loss 3.8012   LearningRate 0.0691   Epoch: 3   Global Step: 56390   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:12:31,794-Speed 3364.74 samples/sec   Loss 3.6821   LearningRate 0.0691   Epoch: 3   Global Step: 56400   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-11 05:12:34,854-Speed 3347.19 samples/sec   Loss 3.8431   LearningRate 0.0691   Epoch: 3   Global Step: 56410   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-11 05:12:37,910-Speed 3351.56 samples/sec   Loss 3.7456   LearningRate 0.0691   Epoch: 3   Global Step: 56420   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-11 05:12:40,966-Speed 3351.12 samples/sec   Loss 3.8422   LearningRate 0.0690   Epoch: 3   Global Step: 56430   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-11 05:12:44,073-Speed 3296.91 samples/sec   Loss 3.7063   LearningRate 0.0690   Epoch: 3   Global Step: 56440   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-11 05:12:47,166-Speed 3312.11 samples/sec   Loss 3.7506   LearningRate 0.0690   Epoch: 3   Global Step: 56450   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-11 05:12:50,268-Speed 3302.21 samples/sec   Loss 3.8327   LearningRate 0.0690   Epoch: 3   Global Step: 56460   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-11 05:12:53,324-Speed 3350.74 samples/sec   Loss 3.7097   LearningRate 0.0690   Epoch: 3   Global Step: 56470   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-11 05:12:56,385-Speed 3347.00 samples/sec   Loss 3.7207   LearningRate 0.0690   Epoch: 3   Global Step: 56480   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-11 05:12:59,440-Speed 3352.62 samples/sec   Loss 3.8648   LearningRate 0.0690   Epoch: 3   Global Step: 56490   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-11 05:13:02,502-Speed 3344.49 samples/sec   Loss 3.7003   LearningRate 0.0690   Epoch: 3   Global Step: 56500   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:13:05,547-Speed 3364.03 samples/sec   Loss 3.6845   LearningRate 0.0690   Epoch: 3   Global Step: 56510   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-11 05:13:08,606-Speed 3347.39 samples/sec   Loss 3.7502   LearningRate 0.0690   Epoch: 3   Global Step: 56520   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-11 05:13:11,665-Speed 3349.29 samples/sec   Loss 3.7263   LearningRate 0.0690   Epoch: 3   Global Step: 56530   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-11 05:13:14,799-Speed 3267.72 samples/sec   Loss 3.7602   LearningRate 0.0690   Epoch: 3   Global Step: 56540   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-11 05:13:17,921-Speed 3280.32 samples/sec   Loss 3.8720   LearningRate 0.0690   Epoch: 3   Global Step: 56550   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-11 05:13:21,108-Speed 3213.82 samples/sec   Loss 3.7961   LearningRate 0.0690   Epoch: 3   Global Step: 56560   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-11 05:13:24,235-Speed 3276.18 samples/sec   Loss 3.8048   LearningRate 0.0690   Epoch: 3   Global Step: 56570   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-11 05:13:27,316-Speed 3324.65 samples/sec   Loss 3.7424   LearningRate 0.0690   Epoch: 3   Global Step: 56580   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-11 05:13:30,406-Speed 3313.62 samples/sec   Loss 3.7440   LearningRate 0.0690   Epoch: 3   Global Step: 56590   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-11 05:13:33,462-Speed 3351.56 samples/sec   Loss 3.7679   LearningRate 0.0690   Epoch: 3   Global Step: 56600   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-11 05:13:36,523-Speed 3346.82 samples/sec   Loss 3.6973   LearningRate 0.0690   Epoch: 3   Global Step: 56610   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:13:39,594-Speed 3335.49 samples/sec   Loss 3.7272   LearningRate 0.0690   Epoch: 3   Global Step: 56620   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:13:42,673-Speed 3325.79 samples/sec   Loss 3.7349   LearningRate 0.0689   Epoch: 3   Global Step: 56630   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:13:45,733-Speed 3347.46 samples/sec   Loss 3.7435   LearningRate 0.0689   Epoch: 3   Global Step: 56640   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:13:48,790-Speed 3351.40 samples/sec   Loss 3.8901   LearningRate 0.0689   Epoch: 3   Global Step: 56650   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:13:51,879-Speed 3315.03 samples/sec   Loss 3.8000   LearningRate 0.0689   Epoch: 3   Global Step: 56660   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:13:54,972-Speed 3311.48 samples/sec   Loss 3.6889   LearningRate 0.0689   Epoch: 3   Global Step: 56670   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:13:58,042-Speed 3336.46 samples/sec   Loss 3.7385   LearningRate 0.0689   Epoch: 3   Global Step: 56680   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:14:01,105-Speed 3344.11 samples/sec   Loss 3.8253   LearningRate 0.0689   Epoch: 3   Global Step: 56690   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:14:04,162-Speed 3349.61 samples/sec   Loss 3.8195   LearningRate 0.0689   Epoch: 3   Global Step: 56700   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:14:07,223-Speed 3346.90 samples/sec   Loss 3.8087   LearningRate 0.0689   Epoch: 3   Global Step: 56710   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:14:10,286-Speed 3343.43 samples/sec   Loss 3.8357   LearningRate 0.0689   Epoch: 3   Global Step: 56720   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:14:13,345-Speed 3348.41 samples/sec   Loss 3.6825   LearningRate 0.0689   Epoch: 3   Global Step: 56730   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:14:16,407-Speed 3344.89 samples/sec   Loss 3.7271   LearningRate 0.0689   Epoch: 3   Global Step: 56740   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:14:19,467-Speed 3346.92 samples/sec   Loss 3.7714   LearningRate 0.0689   Epoch: 3   Global Step: 56750   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:14:22,542-Speed 3330.84 samples/sec   Loss 3.8383   LearningRate 0.0689   Epoch: 3   Global Step: 56760   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:14:25,643-Speed 3304.23 samples/sec   Loss 3.7695   LearningRate 0.0689   Epoch: 3   Global Step: 56770   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:14:28,741-Speed 3305.85 samples/sec   Loss 3.7119   LearningRate 0.0689   Epoch: 3   Global Step: 56780   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-11 05:14:31,849-Speed 3295.13 samples/sec   Loss 3.7558   LearningRate 0.0689   Epoch: 3   Global Step: 56790   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-11 05:14:34,919-Speed 3336.36 samples/sec   Loss 3.7895   LearningRate 0.0689   Epoch: 3   Global Step: 56800   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-11 05:14:37,979-Speed 3346.95 samples/sec   Loss 3.7561   LearningRate 0.0689   Epoch: 3   Global Step: 56810   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-11 05:14:41,070-Speed 3313.70 samples/sec   Loss 3.7636   LearningRate 0.0689   Epoch: 3   Global Step: 56820   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-11 05:14:44,148-Speed 3327.97 samples/sec   Loss 3.8553   LearningRate 0.0688   Epoch: 3   Global Step: 56830   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-11 05:14:47,249-Speed 3303.08 samples/sec   Loss 3.7086   LearningRate 0.0688   Epoch: 3   Global Step: 56840   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-11 05:14:50,317-Speed 3337.92 samples/sec   Loss 3.7455   LearningRate 0.0688   Epoch: 3   Global Step: 56850   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-11 05:14:53,377-Speed 3346.99 samples/sec   Loss 3.8677   LearningRate 0.0688   Epoch: 3   Global Step: 56860   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-11 05:14:56,518-Speed 3261.01 samples/sec   Loss 3.8109   LearningRate 0.0688   Epoch: 3   Global Step: 56870   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-11 05:14:59,653-Speed 3267.05 samples/sec   Loss 3.7437   LearningRate 0.0688   Epoch: 3   Global Step: 56880   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:15:02,714-Speed 3346.57 samples/sec   Loss 3.7278   LearningRate 0.0688   Epoch: 3   Global Step: 56890   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:15:05,918-Speed 3196.23 samples/sec   Loss 3.8023   LearningRate 0.0688   Epoch: 3   Global Step: 56900   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:15:08,977-Speed 3348.33 samples/sec   Loss 3.7350   LearningRate 0.0688   Epoch: 3   Global Step: 56910   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:15:12,041-Speed 3342.83 samples/sec   Loss 3.8119   LearningRate 0.0688   Epoch: 3   Global Step: 56920   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-11 05:15:15,099-Speed 3349.30 samples/sec   Loss 3.7184   LearningRate 0.0688   Epoch: 3   Global Step: 56930   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-11 05:15:18,158-Speed 3348.72 samples/sec   Loss 3.7180   LearningRate 0.0688   Epoch: 3   Global Step: 56940   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-11 05:15:21,221-Speed 3344.33 samples/sec   Loss 3.7880   LearningRate 0.0688   Epoch: 3   Global Step: 56950   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-11 05:15:24,287-Speed 3340.28 samples/sec   Loss 3.8279   LearningRate 0.0688   Epoch: 3   Global Step: 56960   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-11 05:15:27,349-Speed 3344.95 samples/sec   Loss 3.7175   LearningRate 0.0688   Epoch: 3   Global Step: 56970   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-11 05:15:30,424-Speed 3331.32 samples/sec   Loss 3.7064   LearningRate 0.0688   Epoch: 3   Global Step: 56980   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-11 05:15:33,494-Speed 3336.92 samples/sec   Loss 3.7720   LearningRate 0.0688   Epoch: 3   Global Step: 56990   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-11 05:15:36,560-Speed 3340.43 samples/sec   Loss 3.7178   LearningRate 0.0688   Epoch: 3   Global Step: 57000   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-11 05:15:39,637-Speed 3328.46 samples/sec   Loss 3.7338   LearningRate 0.0688   Epoch: 3   Global Step: 57010   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-11 05:15:42,714-Speed 3328.81 samples/sec   Loss 3.7028   LearningRate 0.0688   Epoch: 3   Global Step: 57020   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:15:45,774-Speed 3347.31 samples/sec   Loss 3.7726   LearningRate 0.0688   Epoch: 3   Global Step: 57030   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:15:48,850-Speed 3329.87 samples/sec   Loss 3.7701   LearningRate 0.0687   Epoch: 3   Global Step: 57040   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:15:51,922-Speed 3333.91 samples/sec   Loss 3.7596   LearningRate 0.0687   Epoch: 3   Global Step: 57050   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:15:54,990-Speed 3338.59 samples/sec   Loss 3.7462   LearningRate 0.0687   Epoch: 3   Global Step: 57060   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:15:58,077-Speed 3318.08 samples/sec   Loss 3.8308   LearningRate 0.0687   Epoch: 3   Global Step: 57070   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:16:01,146-Speed 3337.35 samples/sec   Loss 3.7849   LearningRate 0.0687   Epoch: 3   Global Step: 57080   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:16:04,275-Speed 3273.75 samples/sec   Loss 3.7747   LearningRate 0.0687   Epoch: 3   Global Step: 57090   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:16:07,337-Speed 3345.31 samples/sec   Loss 3.7781   LearningRate 0.0687   Epoch: 3   Global Step: 57100   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:16:10,405-Speed 3338.21 samples/sec   Loss 3.7798   LearningRate 0.0687   Epoch: 3   Global Step: 57110   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:16:13,457-Speed 3355.05 samples/sec   Loss 3.7840   LearningRate 0.0687   Epoch: 3   Global Step: 57120   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:16:16,531-Speed 3332.64 samples/sec   Loss 3.7774   LearningRate 0.0687   Epoch: 3   Global Step: 57130   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:16:19,598-Speed 3338.83 samples/sec   Loss 3.7739   LearningRate 0.0687   Epoch: 3   Global Step: 57140   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:16:22,662-Speed 3343.93 samples/sec   Loss 3.7864   LearningRate 0.0687   Epoch: 3   Global Step: 57150   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:16:25,729-Speed 3338.97 samples/sec   Loss 3.7007   LearningRate 0.0687   Epoch: 3   Global Step: 57160   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:16:28,796-Speed 3339.62 samples/sec   Loss 3.7894   LearningRate 0.0687   Epoch: 3   Global Step: 57170   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:16:31,873-Speed 3328.76 samples/sec   Loss 3.7531   LearningRate 0.0687   Epoch: 3   Global Step: 57180   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:16:34,952-Speed 3326.99 samples/sec   Loss 3.8062   LearningRate 0.0687   Epoch: 3   Global Step: 57190   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:16:38,013-Speed 3346.15 samples/sec   Loss 3.6617   LearningRate 0.0687   Epoch: 3   Global Step: 57200   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:16:41,091-Speed 3326.83 samples/sec   Loss 3.7617   LearningRate 0.0687   Epoch: 3   Global Step: 57210   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:16:44,184-Speed 3311.25 samples/sec   Loss 3.8237   LearningRate 0.0687   Epoch: 3   Global Step: 57220   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:16:47,268-Speed 3321.34 samples/sec   Loss 3.7443   LearningRate 0.0687   Epoch: 3   Global Step: 57230   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-04-11 05:16:50,377-Speed 3294.70 samples/sec   Loss 3.7207   LearningRate 0.0686   Epoch: 3   Global Step: 57240   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-04-11 05:16:53,517-Speed 3262.11 samples/sec   Loss 3.6570   LearningRate 0.0686   Epoch: 3   Global Step: 57250   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-04-11 05:16:56,601-Speed 3321.37 samples/sec   Loss 3.6903   LearningRate 0.0686   Epoch: 3   Global Step: 57260   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-04-11 05:16:59,662-Speed 3346.36 samples/sec   Loss 3.7085   LearningRate 0.0686   Epoch: 3   Global Step: 57270   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-04-11 05:17:02,729-Speed 3339.07 samples/sec   Loss 3.8074   LearningRate 0.0686   Epoch: 3   Global Step: 57280   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-04-11 05:17:05,790-Speed 3345.83 samples/sec   Loss 3.7706   LearningRate 0.0686   Epoch: 3   Global Step: 57290   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-04-11 05:17:08,857-Speed 3339.65 samples/sec   Loss 3.7229   LearningRate 0.0686   Epoch: 3   Global Step: 57300   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-04-11 05:17:11,924-Speed 3339.32 samples/sec   Loss 3.7534   LearningRate 0.0686   Epoch: 3   Global Step: 57310   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-04-11 05:17:15,015-Speed 3313.28 samples/sec   Loss 3.8229   LearningRate 0.0686   Epoch: 3   Global Step: 57320   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-04-11 05:17:18,096-Speed 3324.56 samples/sec   Loss 3.7920   LearningRate 0.0686   Epoch: 3   Global Step: 57330   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-11 05:17:21,182-Speed 3319.70 samples/sec   Loss 3.7996   LearningRate 0.0686   Epoch: 3   Global Step: 57340   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-11 05:17:24,257-Speed 3330.99 samples/sec   Loss 3.7707   LearningRate 0.0686   Epoch: 3   Global Step: 57350   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-11 05:17:27,351-Speed 3310.12 samples/sec   Loss 3.7039   LearningRate 0.0686   Epoch: 3   Global Step: 57360   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-11 05:17:30,543-Speed 3209.15 samples/sec   Loss 3.7970   LearningRate 0.0686   Epoch: 3   Global Step: 57370   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-11 05:17:33,618-Speed 3330.52 samples/sec   Loss 3.7295   LearningRate 0.0686   Epoch: 3   Global Step: 57380   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-11 05:17:36,688-Speed 3335.65 samples/sec   Loss 3.7560   LearningRate 0.0686   Epoch: 3   Global Step: 57390   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-11 05:17:39,756-Speed 3338.95 samples/sec   Loss 3.7259   LearningRate 0.0686   Epoch: 3   Global Step: 57400   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-11 05:17:42,828-Speed 3334.26 samples/sec   Loss 3.7562   LearningRate 0.0686   Epoch: 3   Global Step: 57410   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-11 05:17:45,894-Speed 3340.92 samples/sec   Loss 3.7137   LearningRate 0.0686   Epoch: 3   Global Step: 57420   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-11 05:17:49,026-Speed 3270.06 samples/sec   Loss 3.7151   LearningRate 0.0686   Epoch: 3   Global Step: 57430   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:17:52,156-Speed 3273.44 samples/sec   Loss 3.8194   LearningRate 0.0685   Epoch: 3   Global Step: 57440   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:17:55,258-Speed 3301.80 samples/sec   Loss 3.6567   LearningRate 0.0685   Epoch: 3   Global Step: 57450   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:17:58,364-Speed 3296.92 samples/sec   Loss 3.8540   LearningRate 0.0685   Epoch: 3   Global Step: 57460   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:18:01,433-Speed 3337.80 samples/sec   Loss 3.7431   LearningRate 0.0685   Epoch: 3   Global Step: 57470   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:18:04,495-Speed 3345.34 samples/sec   Loss 3.7448   LearningRate 0.0685   Epoch: 3   Global Step: 57480   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:18:07,572-Speed 3327.73 samples/sec   Loss 3.7602   LearningRate 0.0685   Epoch: 3   Global Step: 57490   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-11 05:18:10,637-Speed 3342.41 samples/sec   Loss 3.7167   LearningRate 0.0685   Epoch: 3   Global Step: 57500   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-11 05:18:13,718-Speed 3323.98 samples/sec   Loss 3.6826   LearningRate 0.0685   Epoch: 3   Global Step: 57510   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-11 05:18:16,789-Speed 3335.80 samples/sec   Loss 3.7365   LearningRate 0.0685   Epoch: 3   Global Step: 57520   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-11 05:18:19,866-Speed 3328.65 samples/sec   Loss 3.7068   LearningRate 0.0685   Epoch: 3   Global Step: 57530   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-11 05:18:22,985-Speed 3283.98 samples/sec   Loss 3.7481   LearningRate 0.0685   Epoch: 3   Global Step: 57540   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-11 05:18:26,070-Speed 3320.40 samples/sec   Loss 3.7472   LearningRate 0.0685   Epoch: 3   Global Step: 57550   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-11 05:18:29,134-Speed 3342.57 samples/sec   Loss 3.8098   LearningRate 0.0685   Epoch: 3   Global Step: 57560   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-11 05:18:32,200-Speed 3340.41 samples/sec   Loss 3.7961   LearningRate 0.0685   Epoch: 3   Global Step: 57570   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-11 05:18:35,264-Speed 3342.86 samples/sec   Loss 3.7181   LearningRate 0.0685   Epoch: 3   Global Step: 57580   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-11 05:18:38,335-Speed 3335.31 samples/sec   Loss 3.8208   LearningRate 0.0685   Epoch: 3   Global Step: 57590   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:18:41,396-Speed 3346.26 samples/sec   Loss 3.7492   LearningRate 0.0685   Epoch: 3   Global Step: 57600   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-11 05:18:44,462-Speed 3340.38 samples/sec   Loss 3.6706   LearningRate 0.0685   Epoch: 3   Global Step: 57610   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-11 05:18:47,535-Speed 3333.69 samples/sec   Loss 3.7070   LearningRate 0.0685   Epoch: 3   Global Step: 57620   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-11 05:18:50,598-Speed 3343.09 samples/sec   Loss 3.7997   LearningRate 0.0685   Epoch: 3   Global Step: 57630   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-11 05:18:53,686-Speed 3317.36 samples/sec   Loss 3.8231   LearningRate 0.0684   Epoch: 3   Global Step: 57640   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-11 05:18:56,749-Speed 3344.15 samples/sec   Loss 3.7548   LearningRate 0.0684   Epoch: 3   Global Step: 57650   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-11 05:18:59,853-Speed 3298.71 samples/sec   Loss 3.7330   LearningRate 0.0684   Epoch: 3   Global Step: 57660   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-11 05:19:02,917-Speed 3343.11 samples/sec   Loss 3.7639   LearningRate 0.0684   Epoch: 3   Global Step: 57670   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-11 05:19:05,996-Speed 3326.33 samples/sec   Loss 3.7451   LearningRate 0.0684   Epoch: 3   Global Step: 57680   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-11 05:19:09,060-Speed 3343.34 samples/sec   Loss 3.7397   LearningRate 0.0684   Epoch: 3   Global Step: 57690   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-11 05:19:12,212-Speed 3249.23 samples/sec   Loss 3.7658   LearningRate 0.0684   Epoch: 3   Global Step: 57700   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:19:15,283-Speed 3335.25 samples/sec   Loss 3.7407   LearningRate 0.0684   Epoch: 3   Global Step: 57710   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:19:18,347-Speed 3342.65 samples/sec   Loss 3.7221   LearningRate 0.0684   Epoch: 3   Global Step: 57720   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:19:21,426-Speed 3327.36 samples/sec   Loss 3.8497   LearningRate 0.0684   Epoch: 3   Global Step: 57730   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:19:24,509-Speed 3321.44 samples/sec   Loss 3.8356   LearningRate 0.0684   Epoch: 3   Global Step: 57740   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:19:27,609-Speed 3304.83 samples/sec   Loss 3.7577   LearningRate 0.0684   Epoch: 3   Global Step: 57750   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:19:30,782-Speed 3227.54 samples/sec   Loss 3.7318   LearningRate 0.0684   Epoch: 3   Global Step: 57760   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:19:33,857-Speed 3330.16 samples/sec   Loss 3.7744   LearningRate 0.0684   Epoch: 3   Global Step: 57770   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:19:36,933-Speed 3329.59 samples/sec   Loss 3.7663   LearningRate 0.0684   Epoch: 3   Global Step: 57780   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:19:40,020-Speed 3318.40 samples/sec   Loss 3.7580   LearningRate 0.0684   Epoch: 3   Global Step: 57790   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:19:43,081-Speed 3346.07 samples/sec   Loss 3.6226   LearningRate 0.0684   Epoch: 3   Global Step: 57800   Fp16 Grad Scale: 262144   Required: 29 hours
Training: 2022-04-11 05:19:46,145-Speed 3342.66 samples/sec   Loss 3.6823   LearningRate 0.0684   Epoch: 3   Global Step: 57810   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:19:49,249-Speed 3300.05 samples/sec   Loss 3.7192   LearningRate 0.0684   Epoch: 3   Global Step: 57820   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:19:52,338-Speed 3315.54 samples/sec   Loss 3.7742   LearningRate 0.0684   Epoch: 3   Global Step: 57830   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:19:55,403-Speed 3341.79 samples/sec   Loss 3.8244   LearningRate 0.0683   Epoch: 3   Global Step: 57840   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:19:58,485-Speed 3323.13 samples/sec   Loss 3.7380   LearningRate 0.0683   Epoch: 3   Global Step: 57850   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:20:01,550-Speed 3341.57 samples/sec   Loss 3.7793   LearningRate 0.0683   Epoch: 3   Global Step: 57860   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:20:04,626-Speed 3331.25 samples/sec   Loss 3.7287   LearningRate 0.0683   Epoch: 3   Global Step: 57870   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:20:07,690-Speed 3342.70 samples/sec   Loss 3.7161   LearningRate 0.0683   Epoch: 3   Global Step: 57880   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:20:10,754-Speed 3342.51 samples/sec   Loss 3.7669   LearningRate 0.0683   Epoch: 3   Global Step: 57890   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:20:13,815-Speed 3346.50 samples/sec   Loss 3.7513   LearningRate 0.0683   Epoch: 3   Global Step: 57900   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:20:16,877-Speed 3344.89 samples/sec   Loss 3.7355   LearningRate 0.0683   Epoch: 3   Global Step: 57910   Fp16 Grad Scale: 262144   Required: 29 hours
Training: 2022-04-11 05:20:19,931-Speed 3353.17 samples/sec   Loss 3.7229   LearningRate 0.0683   Epoch: 3   Global Step: 57920   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:20:23,002-Speed 3335.65 samples/sec   Loss 3.7340   LearningRate 0.0683   Epoch: 3   Global Step: 57930   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:20:26,051-Speed 3358.77 samples/sec   Loss 3.7220   LearningRate 0.0683   Epoch: 3   Global Step: 57940   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-11 05:20:29,113-Speed 3345.25 samples/sec   Loss 3.6749   LearningRate 0.0683   Epoch: 3   Global Step: 57950   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-11 05:20:32,197-Speed 3321.97 samples/sec   Loss 3.6875   LearningRate 0.0683   Epoch: 3   Global Step: 57960   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-11 05:20:35,262-Speed 3341.55 samples/sec   Loss 3.6724   LearningRate 0.0683   Epoch: 3   Global Step: 57970   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-11 05:20:38,369-Speed 3296.56 samples/sec   Loss 3.7518   LearningRate 0.0683   Epoch: 3   Global Step: 57980   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-11 05:20:41,434-Speed 3341.27 samples/sec   Loss 3.6814   LearningRate 0.0683   Epoch: 3   Global Step: 57990   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-11 05:20:44,516-Speed 3323.82 samples/sec   Loss 3.6528   LearningRate 0.0683   Epoch: 3   Global Step: 58000   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-11 05:21:28,204-[lfw][58000]XNorm: 20.401003
Training: 2022-04-11 05:21:28,205-[lfw][58000]Accuracy-Flip: 0.99750+-0.00291
Training: 2022-04-11 05:21:28,205-[lfw][58000]Accuracy-Highest: 0.99800
Training: 2022-04-11 05:22:18,861-[cfp_fp][58000]XNorm: 18.769944
Training: 2022-04-11 05:22:18,861-[cfp_fp][58000]Accuracy-Flip: 0.98271+-0.00634
Training: 2022-04-11 05:22:18,862-[cfp_fp][58000]Accuracy-Highest: 0.98300
Training: 2022-04-11 05:23:02,496-[agedb_30][58000]XNorm: 20.396007
Training: 2022-04-11 05:23:02,497-[agedb_30][58000]Accuracy-Flip: 0.97967+-0.00888
Training: 2022-04-11 05:23:02,497-[agedb_30][58000]Accuracy-Highest: 0.98100
Training: 2022-04-11 05:23:05,586-Speed 72.59 samples/sec   Loss 3.6963   LearningRate 0.0683   Epoch: 3   Global Step: 58010   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-11 05:23:08,634-Speed 3359.96 samples/sec   Loss 3.7708   LearningRate 0.0683   Epoch: 3   Global Step: 58020   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-11 05:23:11,701-Speed 3340.14 samples/sec   Loss 3.7612   LearningRate 0.0683   Epoch: 3   Global Step: 58030   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-11 05:23:14,751-Speed 3358.14 samples/sec   Loss 3.7288   LearningRate 0.0682   Epoch: 3   Global Step: 58040   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:23:17,822-Speed 3335.35 samples/sec   Loss 3.7038   LearningRate 0.0682   Epoch: 3   Global Step: 58050   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:23:20,904-Speed 3322.59 samples/sec   Loss 3.7473   LearningRate 0.0682   Epoch: 3   Global Step: 58060   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:23:23,967-Speed 3344.12 samples/sec   Loss 3.8115   LearningRate 0.0682   Epoch: 3   Global Step: 58070   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:23:27,027-Speed 3346.81 samples/sec   Loss 3.8280   LearningRate 0.0682   Epoch: 3   Global Step: 58080   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:23:30,078-Speed 3357.05 samples/sec   Loss 3.7709   LearningRate 0.0682   Epoch: 3   Global Step: 58090   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:23:33,131-Speed 3355.17 samples/sec   Loss 3.8605   LearningRate 0.0682   Epoch: 3   Global Step: 58100   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:23:36,184-Speed 3354.79 samples/sec   Loss 3.8018   LearningRate 0.0682   Epoch: 3   Global Step: 58110   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:23:39,240-Speed 3351.68 samples/sec   Loss 3.7608   LearningRate 0.0682   Epoch: 3   Global Step: 58120   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:23:42,314-Speed 3331.77 samples/sec   Loss 3.7054   LearningRate 0.0682   Epoch: 3   Global Step: 58130   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:23:45,364-Speed 3358.23 samples/sec   Loss 3.7492   LearningRate 0.0682   Epoch: 3   Global Step: 58140   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:23:48,476-Speed 3291.78 samples/sec   Loss 3.6888   LearningRate 0.0682   Epoch: 3   Global Step: 58150   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:23:51,535-Speed 3347.69 samples/sec   Loss 3.7731   LearningRate 0.0682   Epoch: 3   Global Step: 58160   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:23:54,607-Speed 3334.46 samples/sec   Loss 3.6665   LearningRate 0.0682   Epoch: 3   Global Step: 58170   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:23:57,667-Speed 3347.69 samples/sec   Loss 3.7344   LearningRate 0.0682   Epoch: 3   Global Step: 58180   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:24:00,773-Speed 3297.39 samples/sec   Loss 3.7491   LearningRate 0.0682   Epoch: 3   Global Step: 58190   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:24:03,832-Speed 3347.88 samples/sec   Loss 3.8331   LearningRate 0.0682   Epoch: 3   Global Step: 58200   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:24:06,897-Speed 3341.71 samples/sec   Loss 3.6821   LearningRate 0.0682   Epoch: 3   Global Step: 58210   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:24:09,958-Speed 3345.97 samples/sec   Loss 3.7957   LearningRate 0.0682   Epoch: 3   Global Step: 58220   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:24:13,037-Speed 3326.38 samples/sec   Loss 3.8136   LearningRate 0.0682   Epoch: 3   Global Step: 58230   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:24:16,113-Speed 3329.85 samples/sec   Loss 3.7289   LearningRate 0.0682   Epoch: 3   Global Step: 58240   Fp16 Grad Scale: 262144   Required: 29 hours
Training: 2022-04-11 05:24:19,168-Speed 3352.71 samples/sec   Loss 3.6913   LearningRate 0.0681   Epoch: 3   Global Step: 58250   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:24:22,229-Speed 3345.86 samples/sec   Loss 3.7213   LearningRate 0.0681   Epoch: 3   Global Step: 58260   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:24:25,307-Speed 3327.33 samples/sec   Loss 3.7133   LearningRate 0.0681   Epoch: 3   Global Step: 58270   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:24:28,372-Speed 3342.47 samples/sec   Loss 3.7316   LearningRate 0.0681   Epoch: 3   Global Step: 58280   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:24:31,451-Speed 3326.09 samples/sec   Loss 3.7421   LearningRate 0.0681   Epoch: 3   Global Step: 58290   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:24:34,514-Speed 3344.60 samples/sec   Loss 3.7518   LearningRate 0.0681   Epoch: 3   Global Step: 58300   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-11 05:24:37,592-Speed 3327.41 samples/sec   Loss 3.8320   LearningRate 0.0681   Epoch: 3   Global Step: 58310   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-11 05:24:40,651-Speed 3348.19 samples/sec   Loss 3.7672   LearningRate 0.0681   Epoch: 3   Global Step: 58320   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-11 05:24:43,709-Speed 3348.97 samples/sec   Loss 3.7288   LearningRate 0.0681   Epoch: 3   Global Step: 58330   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-11 05:24:46,782-Speed 3332.87 samples/sec   Loss 3.8251   LearningRate 0.0681   Epoch: 3   Global Step: 58340   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-11 05:24:49,853-Speed 3335.11 samples/sec   Loss 3.7552   LearningRate 0.0681   Epoch: 3   Global Step: 58350   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-11 05:24:53,036-Speed 3218.33 samples/sec   Loss 3.7227   LearningRate 0.0681   Epoch: 3   Global Step: 58360   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-11 05:24:56,108-Speed 3333.54 samples/sec   Loss 3.6832   LearningRate 0.0681   Epoch: 3   Global Step: 58370   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-11 05:24:59,162-Speed 3355.42 samples/sec   Loss 3.7881   LearningRate 0.0681   Epoch: 3   Global Step: 58380   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-11 05:25:02,229-Speed 3338.65 samples/sec   Loss 3.7815   LearningRate 0.0681   Epoch: 3   Global Step: 58390   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-11 05:25:05,342-Speed 3290.62 samples/sec   Loss 3.7120   LearningRate 0.0681   Epoch: 3   Global Step: 58400   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:25:08,434-Speed 3312.60 samples/sec   Loss 3.7412   LearningRate 0.0681   Epoch: 3   Global Step: 58410   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:25:11,487-Speed 3354.84 samples/sec   Loss 3.7947   LearningRate 0.0681   Epoch: 3   Global Step: 58420   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:25:14,541-Speed 3353.91 samples/sec   Loss 3.8735   LearningRate 0.0681   Epoch: 3   Global Step: 58430   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:25:17,600-Speed 3349.51 samples/sec   Loss 3.7400   LearningRate 0.0681   Epoch: 3   Global Step: 58440   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:25:20,658-Speed 3348.57 samples/sec   Loss 3.8062   LearningRate 0.0680   Epoch: 3   Global Step: 58450   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:25:23,718-Speed 3347.95 samples/sec   Loss 3.6681   LearningRate 0.0680   Epoch: 3   Global Step: 58460   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:25:26,763-Speed 3363.44 samples/sec   Loss 3.7055   LearningRate 0.0680   Epoch: 3   Global Step: 58470   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-11 05:25:29,830-Speed 3340.41 samples/sec   Loss 3.7151   LearningRate 0.0680   Epoch: 3   Global Step: 58480   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-11 05:25:32,944-Speed 3288.77 samples/sec   Loss 3.6877   LearningRate 0.0680   Epoch: 3   Global Step: 58490   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-11 05:25:36,027-Speed 3321.33 samples/sec   Loss 3.7045   LearningRate 0.0680   Epoch: 3   Global Step: 58500   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-11 05:25:39,093-Speed 3341.02 samples/sec   Loss 3.7034   LearningRate 0.0680   Epoch: 3   Global Step: 58510   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-11 05:25:42,152-Speed 3348.08 samples/sec   Loss 3.6576   LearningRate 0.0680   Epoch: 3   Global Step: 58520   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-11 05:25:45,235-Speed 3322.56 samples/sec   Loss 3.7820   LearningRate 0.0680   Epoch: 3   Global Step: 58530   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-11 05:25:48,378-Speed 3259.01 samples/sec   Loss 3.7309   LearningRate 0.0680   Epoch: 3   Global Step: 58540   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-11 05:25:51,493-Speed 3288.41 samples/sec   Loss 3.6864   LearningRate 0.0680   Epoch: 3   Global Step: 58550   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-11 05:25:54,555-Speed 3344.20 samples/sec   Loss 3.7361   LearningRate 0.0680   Epoch: 3   Global Step: 58560   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-11 05:25:57,615-Speed 3347.47 samples/sec   Loss 3.7208   LearningRate 0.0680   Epoch: 3   Global Step: 58570   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:26:00,747-Speed 3270.70 samples/sec   Loss 3.7426   LearningRate 0.0680   Epoch: 3   Global Step: 58580   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:26:03,865-Speed 3284.69 samples/sec   Loss 3.7831   LearningRate 0.0680   Epoch: 3   Global Step: 58590   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:26:06,933-Speed 3337.93 samples/sec   Loss 3.7419   LearningRate 0.0680   Epoch: 3   Global Step: 58600   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:26:10,069-Speed 3265.89 samples/sec   Loss 3.7688   LearningRate 0.0680   Epoch: 3   Global Step: 58610   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:26:13,111-Speed 3367.33 samples/sec   Loss 3.6808   LearningRate 0.0680   Epoch: 3   Global Step: 58620   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-04-11 05:26:16,209-Speed 3306.85 samples/sec   Loss 3.6751   LearningRate 0.0680   Epoch: 3   Global Step: 58630   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-04-11 05:26:19,317-Speed 3295.25 samples/sec   Loss 3.7604   LearningRate 0.0680   Epoch: 3   Global Step: 58640   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-04-11 05:26:22,399-Speed 3323.60 samples/sec   Loss 3.8193   LearningRate 0.0679   Epoch: 3   Global Step: 58650   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-04-11 05:26:25,495-Speed 3307.44 samples/sec   Loss 3.7164   LearningRate 0.0679   Epoch: 3   Global Step: 58660   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-04-11 05:26:28,565-Speed 3336.83 samples/sec   Loss 3.7064   LearningRate 0.0679   Epoch: 3   Global Step: 58670   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-04-11 05:26:31,702-Speed 3264.38 samples/sec   Loss 3.8239   LearningRate 0.0679   Epoch: 3   Global Step: 58680   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-04-11 05:26:34,801-Speed 3305.97 samples/sec   Loss 3.7732   LearningRate 0.0679   Epoch: 3   Global Step: 58690   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-04-11 05:26:37,909-Speed 3294.86 samples/sec   Loss 3.7232   LearningRate 0.0679   Epoch: 3   Global Step: 58700   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-04-11 05:26:41,030-Speed 3282.10 samples/sec   Loss 3.8173   LearningRate 0.0679   Epoch: 3   Global Step: 58710   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-04-11 05:26:44,114-Speed 3321.07 samples/sec   Loss 3.7170   LearningRate 0.0679   Epoch: 3   Global Step: 58720   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-11 05:26:47,202-Speed 3316.99 samples/sec   Loss 3.7048   LearningRate 0.0679   Epoch: 3   Global Step: 58730   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-11 05:26:50,311-Speed 3294.95 samples/sec   Loss 3.6468   LearningRate 0.0679   Epoch: 3   Global Step: 58740   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-11 05:26:53,479-Speed 3232.32 samples/sec   Loss 3.7556   LearningRate 0.0679   Epoch: 3   Global Step: 58750   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-11 05:26:56,612-Speed 3268.86 samples/sec   Loss 3.7268   LearningRate 0.0679   Epoch: 3   Global Step: 58760   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-11 05:26:59,672-Speed 3347.37 samples/sec   Loss 3.8198   LearningRate 0.0679   Epoch: 3   Global Step: 58770   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-11 05:27:02,744-Speed 3334.51 samples/sec   Loss 3.6985   LearningRate 0.0679   Epoch: 3   Global Step: 58780   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-11 05:27:05,830-Speed 3318.46 samples/sec   Loss 3.6799   LearningRate 0.0679   Epoch: 3   Global Step: 58790   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-11 05:27:08,907-Speed 3328.92 samples/sec   Loss 3.8097   LearningRate 0.0679   Epoch: 3   Global Step: 58800   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-11 05:27:11,978-Speed 3335.38 samples/sec   Loss 3.7607   LearningRate 0.0679   Epoch: 3   Global Step: 58810   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-11 05:27:15,052-Speed 3332.43 samples/sec   Loss 3.7284   LearningRate 0.0679   Epoch: 3   Global Step: 58820   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:27:18,127-Speed 3330.06 samples/sec   Loss 3.7172   LearningRate 0.0679   Epoch: 3   Global Step: 58830   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:27:21,216-Speed 3316.41 samples/sec   Loss 3.6552   LearningRate 0.0679   Epoch: 3   Global Step: 58840   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:27:24,363-Speed 3254.26 samples/sec   Loss 3.7050   LearningRate 0.0678   Epoch: 3   Global Step: 58850   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:27:27,479-Speed 3286.96 samples/sec   Loss 3.7390   LearningRate 0.0678   Epoch: 3   Global Step: 58860   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:27:30,588-Speed 3294.01 samples/sec   Loss 3.6990   LearningRate 0.0678   Epoch: 3   Global Step: 58870   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:27:33,657-Speed 3337.01 samples/sec   Loss 3.6639   LearningRate 0.0678   Epoch: 3   Global Step: 58880   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:27:36,724-Speed 3340.20 samples/sec   Loss 3.7449   LearningRate 0.0678   Epoch: 3   Global Step: 58890   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-11 05:27:39,823-Speed 3305.26 samples/sec   Loss 3.6425   LearningRate 0.0678   Epoch: 3   Global Step: 58900   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-11 05:27:42,894-Speed 3335.42 samples/sec   Loss 3.6179   LearningRate 0.0678   Epoch: 3   Global Step: 58910   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-11 05:27:45,962-Speed 3338.00 samples/sec   Loss 3.8045   LearningRate 0.0678   Epoch: 3   Global Step: 58920   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-11 05:27:49,028-Speed 3341.38 samples/sec   Loss 3.7123   LearningRate 0.0678   Epoch: 3   Global Step: 58930   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-11 05:27:52,102-Speed 3331.36 samples/sec   Loss 3.7749   LearningRate 0.0678   Epoch: 3   Global Step: 58940   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-11 05:27:55,220-Speed 3284.97 samples/sec   Loss 3.7185   LearningRate 0.0678   Epoch: 3   Global Step: 58950   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-11 05:27:58,293-Speed 3333.36 samples/sec   Loss 3.6937   LearningRate 0.0678   Epoch: 3   Global Step: 58960   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-11 05:28:01,366-Speed 3332.63 samples/sec   Loss 3.6681   LearningRate 0.0678   Epoch: 3   Global Step: 58970   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-11 05:28:04,433-Speed 3339.45 samples/sec   Loss 3.6558   LearningRate 0.0678   Epoch: 3   Global Step: 58980   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-11 05:28:07,502-Speed 3337.66 samples/sec   Loss 3.7643   LearningRate 0.0678   Epoch: 3   Global Step: 58990   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:28:10,578-Speed 3330.19 samples/sec   Loss 3.7302   LearningRate 0.0678   Epoch: 3   Global Step: 59000   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:28:13,660-Speed 3323.38 samples/sec   Loss 3.7698   LearningRate 0.0678   Epoch: 3   Global Step: 59010   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:28:16,723-Speed 3342.92 samples/sec   Loss 3.7372   LearningRate 0.0678   Epoch: 3   Global Step: 59020   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:28:19,787-Speed 3343.78 samples/sec   Loss 3.7026   LearningRate 0.0678   Epoch: 3   Global Step: 59030   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:28:22,919-Speed 3269.56 samples/sec   Loss 3.8337   LearningRate 0.0678   Epoch: 3   Global Step: 59040   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:28:26,045-Speed 3276.94 samples/sec   Loss 3.6774   LearningRate 0.0678   Epoch: 3   Global Step: 59050   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:28:29,109-Speed 3343.52 samples/sec   Loss 3.8297   LearningRate 0.0677   Epoch: 3   Global Step: 59060   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:28:32,186-Speed 3328.71 samples/sec   Loss 3.7215   LearningRate 0.0677   Epoch: 3   Global Step: 59070   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:28:35,251-Speed 3341.53 samples/sec   Loss 3.6908   LearningRate 0.0677   Epoch: 3   Global Step: 59080   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:28:38,323-Speed 3335.07 samples/sec   Loss 3.8356   LearningRate 0.0677   Epoch: 3   Global Step: 59090   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:28:41,469-Speed 3255.60 samples/sec   Loss 3.6865   LearningRate 0.0677   Epoch: 3   Global Step: 59100   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:28:44,591-Speed 3280.35 samples/sec   Loss 3.7057   LearningRate 0.0677   Epoch: 3   Global Step: 59110   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:28:47,663-Speed 3334.22 samples/sec   Loss 3.7261   LearningRate 0.0677   Epoch: 3   Global Step: 59120   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:28:50,730-Speed 3339.59 samples/sec   Loss 3.7734   LearningRate 0.0677   Epoch: 3   Global Step: 59130   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:28:53,836-Speed 3296.85 samples/sec   Loss 3.6540   LearningRate 0.0677   Epoch: 3   Global Step: 59140   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:28:56,931-Speed 3309.75 samples/sec   Loss 3.7254   LearningRate 0.0677   Epoch: 3   Global Step: 59150   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:29:00,059-Speed 3274.47 samples/sec   Loss 3.7057   LearningRate 0.0677   Epoch: 3   Global Step: 59160   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:29:03,129-Speed 3336.68 samples/sec   Loss 3.7443   LearningRate 0.0677   Epoch: 3   Global Step: 59170   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:29:06,203-Speed 3331.27 samples/sec   Loss 3.7205   LearningRate 0.0677   Epoch: 3   Global Step: 59180   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:29:09,266-Speed 3344.19 samples/sec   Loss 3.7897   LearningRate 0.0677   Epoch: 3   Global Step: 59190   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:29:12,329-Speed 3344.25 samples/sec   Loss 3.7320   LearningRate 0.0677   Epoch: 3   Global Step: 59200   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:29:15,425-Speed 3308.13 samples/sec   Loss 3.7997   LearningRate 0.0677   Epoch: 3   Global Step: 59210   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:29:18,501-Speed 3329.72 samples/sec   Loss 3.7220   LearningRate 0.0677   Epoch: 3   Global Step: 59220   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:29:21,566-Speed 3342.19 samples/sec   Loss 3.7288   LearningRate 0.0677   Epoch: 3   Global Step: 59230   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:29:24,634-Speed 3338.75 samples/sec   Loss 3.7126   LearningRate 0.0677   Epoch: 3   Global Step: 59240   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:29:27,696-Speed 3344.53 samples/sec   Loss 3.7253   LearningRate 0.0677   Epoch: 3   Global Step: 59250   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:29:30,765-Speed 3337.61 samples/sec   Loss 3.6995   LearningRate 0.0676   Epoch: 3   Global Step: 59260   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:29:33,830-Speed 3341.07 samples/sec   Loss 3.8025   LearningRate 0.0676   Epoch: 3   Global Step: 59270   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:29:36,905-Speed 3331.51 samples/sec   Loss 3.7914   LearningRate 0.0676   Epoch: 3   Global Step: 59280   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:29:39,995-Speed 3314.34 samples/sec   Loss 3.6980   LearningRate 0.0676   Epoch: 3   Global Step: 59290   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:29:43,066-Speed 3335.29 samples/sec   Loss 3.6705   LearningRate 0.0676   Epoch: 3   Global Step: 59300   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:29:46,145-Speed 3326.79 samples/sec   Loss 3.7156   LearningRate 0.0676   Epoch: 3   Global Step: 59310   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:29:49,295-Speed 3251.46 samples/sec   Loss 3.7117   LearningRate 0.0676   Epoch: 3   Global Step: 59320   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:29:52,373-Speed 3327.16 samples/sec   Loss 3.7559   LearningRate 0.0676   Epoch: 3   Global Step: 59330   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:29:55,444-Speed 3336.13 samples/sec   Loss 3.7365   LearningRate 0.0676   Epoch: 3   Global Step: 59340   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:29:58,510-Speed 3339.91 samples/sec   Loss 3.7199   LearningRate 0.0676   Epoch: 3   Global Step: 59350   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:30:01,577-Speed 3339.72 samples/sec   Loss 3.7117   LearningRate 0.0676   Epoch: 3   Global Step: 59360   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:30:04,643-Speed 3340.95 samples/sec   Loss 3.6452   LearningRate 0.0676   Epoch: 3   Global Step: 59370   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:30:07,705-Speed 3344.65 samples/sec   Loss 3.7024   LearningRate 0.0676   Epoch: 3   Global Step: 59380   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:30:10,779-Speed 3331.71 samples/sec   Loss 3.7436   LearningRate 0.0676   Epoch: 3   Global Step: 59390   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:30:14,009-Speed 3171.53 samples/sec   Loss 3.7216   LearningRate 0.0676   Epoch: 3   Global Step: 59400   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:30:17,286-Speed 3125.09 samples/sec   Loss 3.7095   LearningRate 0.0676   Epoch: 3   Global Step: 59410   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:30:20,368-Speed 3323.84 samples/sec   Loss 3.6599   LearningRate 0.0676   Epoch: 3   Global Step: 59420   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-11 05:30:23,482-Speed 3289.51 samples/sec   Loss 3.7397   LearningRate 0.0676   Epoch: 3   Global Step: 59430   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-11 05:30:26,550-Speed 3338.18 samples/sec   Loss 3.7769   LearningRate 0.0676   Epoch: 3   Global Step: 59440   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-11 05:30:29,641-Speed 3314.09 samples/sec   Loss 3.7839   LearningRate 0.0676   Epoch: 3   Global Step: 59450   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-11 05:30:32,704-Speed 3342.81 samples/sec   Loss 3.6729   LearningRate 0.0675   Epoch: 3   Global Step: 59460   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-11 05:30:35,773-Speed 3337.29 samples/sec   Loss 3.7150   LearningRate 0.0675   Epoch: 3   Global Step: 59470   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-11 05:30:38,848-Speed 3331.79 samples/sec   Loss 3.6403   LearningRate 0.0675   Epoch: 3   Global Step: 59480   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-11 05:30:41,929-Speed 3324.14 samples/sec   Loss 3.6920   LearningRate 0.0675   Epoch: 3   Global Step: 59490   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-11 05:30:45,006-Speed 3329.04 samples/sec   Loss 3.8008   LearningRate 0.0675   Epoch: 3   Global Step: 59500   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-11 05:30:48,109-Speed 3300.56 samples/sec   Loss 3.8344   LearningRate 0.0675   Epoch: 3   Global Step: 59510   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-11 05:30:51,294-Speed 3215.88 samples/sec   Loss 3.6614   LearningRate 0.0675   Epoch: 3   Global Step: 59520   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:30:54,374-Speed 3325.38 samples/sec   Loss 3.7437   LearningRate 0.0675   Epoch: 3   Global Step: 59530   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:30:57,472-Speed 3306.04 samples/sec   Loss 3.6836   LearningRate 0.0675   Epoch: 3   Global Step: 59540   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:31:00,544-Speed 3334.19 samples/sec   Loss 3.7578   LearningRate 0.0675   Epoch: 3   Global Step: 59550   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:31:03,615-Speed 3334.87 samples/sec   Loss 3.6958   LearningRate 0.0675   Epoch: 3   Global Step: 59560   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:31:06,758-Speed 3258.96 samples/sec   Loss 3.7017   LearningRate 0.0675   Epoch: 3   Global Step: 59570   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:31:09,843-Speed 3320.09 samples/sec   Loss 3.7780   LearningRate 0.0675   Epoch: 3   Global Step: 59580   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:31:12,980-Speed 3265.68 samples/sec   Loss 3.6862   LearningRate 0.0675   Epoch: 3   Global Step: 59590   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:31:16,124-Speed 3257.79 samples/sec   Loss 3.7628   LearningRate 0.0675   Epoch: 3   Global Step: 59600   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:31:19,222-Speed 3305.20 samples/sec   Loss 3.7603   LearningRate 0.0675   Epoch: 3   Global Step: 59610   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:31:22,361-Speed 3263.73 samples/sec   Loss 3.6138   LearningRate 0.0675   Epoch: 3   Global Step: 59620   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:31:25,445-Speed 3321.57 samples/sec   Loss 3.7245   LearningRate 0.0675   Epoch: 3   Global Step: 59630   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-11 05:31:28,508-Speed 3343.55 samples/sec   Loss 3.6461   LearningRate 0.0675   Epoch: 3   Global Step: 59640   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-11 05:31:31,596-Speed 3316.89 samples/sec   Loss 3.7891   LearningRate 0.0675   Epoch: 3   Global Step: 59650   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-11 05:31:34,662-Speed 3340.87 samples/sec   Loss 3.7072   LearningRate 0.0675   Epoch: 3   Global Step: 59660   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-11 05:31:37,744-Speed 3323.02 samples/sec   Loss 3.6670   LearningRate 0.0674   Epoch: 3   Global Step: 59670   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-11 05:31:40,807-Speed 3343.81 samples/sec   Loss 3.7735   LearningRate 0.0674   Epoch: 3   Global Step: 59680   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-11 05:31:43,876-Speed 3337.92 samples/sec   Loss 3.7806   LearningRate 0.0674   Epoch: 3   Global Step: 59690   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-11 05:31:46,946-Speed 3335.52 samples/sec   Loss 3.7143   LearningRate 0.0674   Epoch: 3   Global Step: 59700   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-11 05:31:50,015-Speed 3337.45 samples/sec   Loss 3.7141   LearningRate 0.0674   Epoch: 3   Global Step: 59710   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-11 05:31:53,087-Speed 3334.15 samples/sec   Loss 3.6905   LearningRate 0.0674   Epoch: 3   Global Step: 59720   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-11 05:31:56,157-Speed 3336.85 samples/sec   Loss 3.6083   LearningRate 0.0674   Epoch: 3   Global Step: 59730   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:31:59,219-Speed 3344.18 samples/sec   Loss 3.6342   LearningRate 0.0674   Epoch: 3   Global Step: 59740   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:32:02,286-Speed 3339.79 samples/sec   Loss 3.7256   LearningRate 0.0674   Epoch: 3   Global Step: 59750   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:32:05,355-Speed 3337.66 samples/sec   Loss 3.7104   LearningRate 0.0674   Epoch: 3   Global Step: 59760   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:32:08,417-Speed 3345.12 samples/sec   Loss 3.7396   LearningRate 0.0674   Epoch: 3   Global Step: 59770   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:32:11,519-Speed 3302.24 samples/sec   Loss 3.7124   LearningRate 0.0674   Epoch: 3   Global Step: 59780   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:32:14,570-Speed 3356.75 samples/sec   Loss 3.7775   LearningRate 0.0674   Epoch: 3   Global Step: 59790   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-11 05:32:17,638-Speed 3338.80 samples/sec   Loss 3.6736   LearningRate 0.0674   Epoch: 3   Global Step: 59800   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-11 05:32:20,717-Speed 3326.69 samples/sec   Loss 3.6777   LearningRate 0.0674   Epoch: 3   Global Step: 59810   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-11 05:32:23,802-Speed 3319.22 samples/sec   Loss 3.7188   LearningRate 0.0674   Epoch: 3   Global Step: 59820   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-11 05:32:26,867-Speed 3342.10 samples/sec   Loss 3.7145   LearningRate 0.0674   Epoch: 3   Global Step: 59830   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-11 05:32:29,994-Speed 3275.64 samples/sec   Loss 3.6568   LearningRate 0.0674   Epoch: 3   Global Step: 59840   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-11 05:32:33,136-Speed 3259.62 samples/sec   Loss 3.7759   LearningRate 0.0674   Epoch: 3   Global Step: 59850   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-11 05:32:36,203-Speed 3340.28 samples/sec   Loss 3.7418   LearningRate 0.0674   Epoch: 3   Global Step: 59860   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-11 05:32:39,271-Speed 3337.75 samples/sec   Loss 3.7072   LearningRate 0.0673   Epoch: 3   Global Step: 59870   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-11 05:32:42,335-Speed 3342.70 samples/sec   Loss 3.6922   LearningRate 0.0673   Epoch: 3   Global Step: 59880   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-11 05:32:45,413-Speed 3328.06 samples/sec   Loss 3.6907   LearningRate 0.0673   Epoch: 3   Global Step: 59890   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:32:48,476-Speed 3343.68 samples/sec   Loss 3.6583   LearningRate 0.0673   Epoch: 3   Global Step: 59900   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:32:51,542-Speed 3340.94 samples/sec   Loss 3.7307   LearningRate 0.0673   Epoch: 3   Global Step: 59910   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:32:54,614-Speed 3333.70 samples/sec   Loss 3.7244   LearningRate 0.0673   Epoch: 3   Global Step: 59920   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:32:57,683-Speed 3338.09 samples/sec   Loss 3.7207   LearningRate 0.0673   Epoch: 3   Global Step: 59930   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:33:00,757-Speed 3332.13 samples/sec   Loss 3.7164   LearningRate 0.0673   Epoch: 3   Global Step: 59940   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:33:03,890-Speed 3269.13 samples/sec   Loss 3.7647   LearningRate 0.0673   Epoch: 3   Global Step: 59950   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:33:07,107-Speed 3184.47 samples/sec   Loss 3.7373   LearningRate 0.0673   Epoch: 3   Global Step: 59960   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:33:10,200-Speed 3310.68 samples/sec   Loss 3.8015   LearningRate 0.0673   Epoch: 3   Global Step: 59970   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:33:13,279-Speed 3327.32 samples/sec   Loss 3.7342   LearningRate 0.0673   Epoch: 3   Global Step: 59980   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:33:16,336-Speed 3350.01 samples/sec   Loss 3.7340   LearningRate 0.0673   Epoch: 3   Global Step: 59990   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:33:19,401-Speed 3341.89 samples/sec   Loss 3.6236   LearningRate 0.0673   Epoch: 3   Global Step: 60000   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:34:03,939-[lfw][60000]XNorm: 22.256462
Training: 2022-04-11 05:34:03,940-[lfw][60000]Accuracy-Flip: 0.99767+-0.00300
Training: 2022-04-11 05:34:03,940-[lfw][60000]Accuracy-Highest: 0.99800
Training: 2022-04-11 05:34:55,921-[cfp_fp][60000]XNorm: 20.752711
Training: 2022-04-11 05:34:55,922-[cfp_fp][60000]Accuracy-Flip: 0.98414+-0.00621
Training: 2022-04-11 05:34:55,922-[cfp_fp][60000]Accuracy-Highest: 0.98414
Training: 2022-04-11 05:35:40,176-[agedb_30][60000]XNorm: 22.088436
Training: 2022-04-11 05:35:40,177-[agedb_30][60000]Accuracy-Flip: 0.98017+-0.00867
Training: 2022-04-11 05:35:40,177-[agedb_30][60000]Accuracy-Highest: 0.98100
Training: 2022-04-11 05:35:43,243-Speed 71.19 samples/sec   Loss 3.6796   LearningRate 0.0673   Epoch: 3   Global Step: 60010   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:35:46,288-Speed 3362.74 samples/sec   Loss 3.6490   LearningRate 0.0673   Epoch: 3   Global Step: 60020   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:35:49,351-Speed 3344.12 samples/sec   Loss 3.6974   LearningRate 0.0673   Epoch: 3   Global Step: 60030   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:35:52,404-Speed 3355.07 samples/sec   Loss 3.7389   LearningRate 0.0673   Epoch: 3   Global Step: 60040   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:35:55,445-Speed 3368.48 samples/sec   Loss 3.6447   LearningRate 0.0673   Epoch: 3   Global Step: 60050   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-11 05:35:58,503-Speed 3349.61 samples/sec   Loss 3.7179   LearningRate 0.0673   Epoch: 3   Global Step: 60060   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-11 05:36:01,565-Speed 3345.15 samples/sec   Loss 3.8119   LearningRate 0.0672   Epoch: 3   Global Step: 60070   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-11 05:36:04,622-Speed 3350.56 samples/sec   Loss 3.5844   LearningRate 0.0672   Epoch: 3   Global Step: 60080   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-11 05:36:07,679-Speed 3349.69 samples/sec   Loss 3.7505   LearningRate 0.0672   Epoch: 3   Global Step: 60090   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-11 05:36:10,734-Speed 3352.73 samples/sec   Loss 3.6022   LearningRate 0.0672   Epoch: 3   Global Step: 60100   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-11 05:36:13,790-Speed 3351.82 samples/sec   Loss 3.7557   LearningRate 0.0672   Epoch: 3   Global Step: 60110   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-11 05:36:16,972-Speed 3218.25 samples/sec   Loss 3.5645   LearningRate 0.0672   Epoch: 3   Global Step: 60120   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-11 05:36:20,053-Speed 3325.04 samples/sec   Loss 3.6423   LearningRate 0.0672   Epoch: 3   Global Step: 60130   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-11 05:36:23,117-Speed 3342.25 samples/sec   Loss 3.6796   LearningRate 0.0672   Epoch: 3   Global Step: 60140   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-11 05:36:26,180-Speed 3344.48 samples/sec   Loss 3.6032   LearningRate 0.0672   Epoch: 3   Global Step: 60150   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:36:29,264-Speed 3321.42 samples/sec   Loss 3.6596   LearningRate 0.0672   Epoch: 3   Global Step: 60160   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:36:32,332-Speed 3338.28 samples/sec   Loss 3.6766   LearningRate 0.0672   Epoch: 3   Global Step: 60170   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:36:35,402-Speed 3336.77 samples/sec   Loss 3.7372   LearningRate 0.0672   Epoch: 3   Global Step: 60180   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:36:38,518-Speed 3286.72 samples/sec   Loss 3.7038   LearningRate 0.0672   Epoch: 3   Global Step: 60190   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:36:41,714-Speed 3204.85 samples/sec   Loss 3.6882   LearningRate 0.0672   Epoch: 3   Global Step: 60200   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:36:44,824-Speed 3292.89 samples/sec   Loss 3.7025   LearningRate 0.0672   Epoch: 3   Global Step: 60210   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:36:47,896-Speed 3334.73 samples/sec   Loss 3.6521   LearningRate 0.0672   Epoch: 3   Global Step: 60220   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:36:50,965-Speed 3337.54 samples/sec   Loss 3.6782   LearningRate 0.0672   Epoch: 3   Global Step: 60230   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:36:54,039-Speed 3332.72 samples/sec   Loss 3.7352   LearningRate 0.0672   Epoch: 3   Global Step: 60240   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:36:57,091-Speed 3356.27 samples/sec   Loss 3.7120   LearningRate 0.0672   Epoch: 3   Global Step: 60250   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-11 05:37:00,160-Speed 3336.91 samples/sec   Loss 3.6738   LearningRate 0.0672   Epoch: 3   Global Step: 60260   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-11 05:37:03,230-Speed 3335.97 samples/sec   Loss 3.6976   LearningRate 0.0672   Epoch: 3   Global Step: 60270   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-11 05:37:06,323-Speed 3311.66 samples/sec   Loss 3.7018   LearningRate 0.0671   Epoch: 3   Global Step: 60280   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-11 05:37:09,398-Speed 3330.80 samples/sec   Loss 3.6665   LearningRate 0.0671   Epoch: 3   Global Step: 60290   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-11 05:37:12,481-Speed 3321.83 samples/sec   Loss 3.6165   LearningRate 0.0671   Epoch: 3   Global Step: 60300   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-11 05:37:15,543-Speed 3345.62 samples/sec   Loss 3.6391   LearningRate 0.0671   Epoch: 3   Global Step: 60310   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-11 05:37:18,613-Speed 3337.00 samples/sec   Loss 3.6111   LearningRate 0.0671   Epoch: 3   Global Step: 60320   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-11 05:37:21,684-Speed 3335.08 samples/sec   Loss 3.6602   LearningRate 0.0671   Epoch: 3   Global Step: 60330   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-11 05:37:24,745-Speed 3346.83 samples/sec   Loss 3.6458   LearningRate 0.0671   Epoch: 3   Global Step: 60340   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-04-11 05:37:27,855-Speed 3292.88 samples/sec   Loss 3.7948   LearningRate 0.0671   Epoch: 3   Global Step: 60350   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-04-11 05:37:30,986-Speed 3271.35 samples/sec   Loss 3.6349   LearningRate 0.0671   Epoch: 3   Global Step: 60360   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-04-11 05:37:34,047-Speed 3346.10 samples/sec   Loss 3.7292   LearningRate 0.0671   Epoch: 3   Global Step: 60370   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-04-11 05:37:37,150-Speed 3301.33 samples/sec   Loss 3.5960   LearningRate 0.0671   Epoch: 3   Global Step: 60380   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-04-11 05:37:40,220-Speed 3335.88 samples/sec   Loss 3.6977   LearningRate 0.0671   Epoch: 3   Global Step: 60390   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-04-11 05:37:43,280-Speed 3347.17 samples/sec   Loss 3.6997   LearningRate 0.0671   Epoch: 3   Global Step: 60400   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-04-11 05:37:46,350-Speed 3335.74 samples/sec   Loss 3.7150   LearningRate 0.0671   Epoch: 3   Global Step: 60410   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-04-11 05:37:49,444-Speed 3310.68 samples/sec   Loss 3.6845   LearningRate 0.0671   Epoch: 3   Global Step: 60420   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-04-11 05:37:52,508-Speed 3343.43 samples/sec   Loss 3.7386   LearningRate 0.0671   Epoch: 3   Global Step: 60430   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-04-11 05:37:55,602-Speed 3310.75 samples/sec   Loss 3.6451   LearningRate 0.0671   Epoch: 3   Global Step: 60440   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-11 05:37:58,661-Speed 3347.94 samples/sec   Loss 3.6321   LearningRate 0.0671   Epoch: 3   Global Step: 60450   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-11 05:38:01,718-Speed 3350.94 samples/sec   Loss 3.7279   LearningRate 0.0671   Epoch: 3   Global Step: 60460   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-11 05:38:04,786-Speed 3338.38 samples/sec   Loss 3.6814   LearningRate 0.0671   Epoch: 3   Global Step: 60470   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-11 05:38:07,865-Speed 3326.88 samples/sec   Loss 3.6337   LearningRate 0.0670   Epoch: 3   Global Step: 60480   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-11 05:38:10,924-Speed 3347.52 samples/sec   Loss 3.7210   LearningRate 0.0670   Epoch: 3   Global Step: 60490   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-11 05:38:14,013-Speed 3316.09 samples/sec   Loss 3.6797   LearningRate 0.0670   Epoch: 3   Global Step: 60500   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-11 05:38:17,093-Speed 3325.35 samples/sec   Loss 3.6670   LearningRate 0.0670   Epoch: 3   Global Step: 60510   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-11 05:38:20,153-Speed 3347.31 samples/sec   Loss 3.6844   LearningRate 0.0670   Epoch: 3   Global Step: 60520   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-11 05:38:23,218-Speed 3342.03 samples/sec   Loss 3.6887   LearningRate 0.0670   Epoch: 3   Global Step: 60530   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-11 05:38:26,285-Speed 3339.72 samples/sec   Loss 3.6491   LearningRate 0.0670   Epoch: 3   Global Step: 60540   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:38:29,362-Speed 3328.52 samples/sec   Loss 3.6868   LearningRate 0.0670   Epoch: 3   Global Step: 60550   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:38:32,441-Speed 3325.86 samples/sec   Loss 3.7117   LearningRate 0.0670   Epoch: 3   Global Step: 60560   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:38:35,503-Speed 3345.92 samples/sec   Loss 3.6414   LearningRate 0.0670   Epoch: 3   Global Step: 60570   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:38:38,563-Speed 3346.38 samples/sec   Loss 3.6889   LearningRate 0.0670   Epoch: 3   Global Step: 60580   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:38:41,628-Speed 3342.32 samples/sec   Loss 3.6940   LearningRate 0.0670   Epoch: 3   Global Step: 60590   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:38:44,690-Speed 3345.41 samples/sec   Loss 3.7111   LearningRate 0.0670   Epoch: 3   Global Step: 60600   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:38:47,804-Speed 3289.31 samples/sec   Loss 3.6963   LearningRate 0.0670   Epoch: 3   Global Step: 60610   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:38:50,893-Speed 3315.39 samples/sec   Loss 3.6424   LearningRate 0.0670   Epoch: 3   Global Step: 60620   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:38:54,042-Speed 3252.35 samples/sec   Loss 3.6604   LearningRate 0.0670   Epoch: 3   Global Step: 60630   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:38:57,170-Speed 3275.08 samples/sec   Loss 3.7653   LearningRate 0.0670   Epoch: 3   Global Step: 60640   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:39:00,228-Speed 3349.03 samples/sec   Loss 3.6005   LearningRate 0.0670   Epoch: 3   Global Step: 60650   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:39:03,289-Speed 3345.65 samples/sec   Loss 3.6657   LearningRate 0.0670   Epoch: 3   Global Step: 60660   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:39:06,365-Speed 3329.51 samples/sec   Loss 3.6752   LearningRate 0.0670   Epoch: 3   Global Step: 60670   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:39:09,459-Speed 3311.07 samples/sec   Loss 3.6794   LearningRate 0.0669   Epoch: 3   Global Step: 60680   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-11 05:39:12,667-Speed 3192.33 samples/sec   Loss 3.6303   LearningRate 0.0669   Epoch: 3   Global Step: 60690   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-11 05:39:15,775-Speed 3296.18 samples/sec   Loss 3.6484   LearningRate 0.0669   Epoch: 3   Global Step: 60700   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-11 05:39:18,843-Speed 3338.25 samples/sec   Loss 3.7981   LearningRate 0.0669   Epoch: 3   Global Step: 60710   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-11 05:39:21,912-Speed 3337.77 samples/sec   Loss 3.7064   LearningRate 0.0669   Epoch: 3   Global Step: 60720   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-11 05:39:24,980-Speed 3338.63 samples/sec   Loss 3.6146   LearningRate 0.0669   Epoch: 3   Global Step: 60730   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-11 05:39:28,045-Speed 3341.80 samples/sec   Loss 3.6894   LearningRate 0.0669   Epoch: 3   Global Step: 60740   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-11 05:39:31,109-Speed 3342.67 samples/sec   Loss 3.6282   LearningRate 0.0669   Epoch: 3   Global Step: 60750   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-11 05:39:34,257-Speed 3253.72 samples/sec   Loss 3.7161   LearningRate 0.0669   Epoch: 3   Global Step: 60760   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-11 05:39:37,346-Speed 3315.65 samples/sec   Loss 3.6751   LearningRate 0.0669   Epoch: 3   Global Step: 60770   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-11 05:39:40,412-Speed 3340.17 samples/sec   Loss 3.6104   LearningRate 0.0669   Epoch: 3   Global Step: 60780   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:39:43,479-Speed 3340.50 samples/sec   Loss 3.6679   LearningRate 0.0669   Epoch: 3   Global Step: 60790   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:39:46,547-Speed 3338.15 samples/sec   Loss 3.7021   LearningRate 0.0669   Epoch: 3   Global Step: 60800   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:39:49,610-Speed 3343.25 samples/sec   Loss 3.7368   LearningRate 0.0669   Epoch: 3   Global Step: 60810   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:39:52,671-Speed 3346.91 samples/sec   Loss 3.6354   LearningRate 0.0669   Epoch: 3   Global Step: 60820   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:39:55,732-Speed 3345.38 samples/sec   Loss 3.6361   LearningRate 0.0669   Epoch: 3   Global Step: 60830   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:39:58,795-Speed 3344.28 samples/sec   Loss 3.7294   LearningRate 0.0669   Epoch: 3   Global Step: 60840   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:40:01,856-Speed 3346.08 samples/sec   Loss 3.7390   LearningRate 0.0669   Epoch: 3   Global Step: 60850   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:40:04,924-Speed 3338.18 samples/sec   Loss 3.6373   LearningRate 0.0669   Epoch: 3   Global Step: 60860   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:40:08,103-Speed 3222.16 samples/sec   Loss 3.7053   LearningRate 0.0669   Epoch: 3   Global Step: 60870   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:40:11,210-Speed 3297.16 samples/sec   Loss 3.7424   LearningRate 0.0669   Epoch: 3   Global Step: 60880   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:40:14,347-Speed 3264.85 samples/sec   Loss 3.6472   LearningRate 0.0668   Epoch: 3   Global Step: 60890   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:40:17,502-Speed 3245.78 samples/sec   Loss 3.5876   LearningRate 0.0668   Epoch: 3   Global Step: 60900   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:40:20,585-Speed 3321.76 samples/sec   Loss 3.6477   LearningRate 0.0668   Epoch: 3   Global Step: 60910   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:40:23,637-Speed 3356.04 samples/sec   Loss 3.6449   LearningRate 0.0668   Epoch: 3   Global Step: 60920   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-11 05:40:26,704-Speed 3339.55 samples/sec   Loss 3.6615   LearningRate 0.0668   Epoch: 3   Global Step: 60930   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-11 05:40:29,780-Speed 3329.92 samples/sec   Loss 3.6399   LearningRate 0.0668   Epoch: 3   Global Step: 60940   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-11 05:40:32,869-Speed 3316.42 samples/sec   Loss 3.6823   LearningRate 0.0668   Epoch: 3   Global Step: 60950   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-11 05:40:35,935-Speed 3340.75 samples/sec   Loss 3.6752   LearningRate 0.0668   Epoch: 3   Global Step: 60960   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-11 05:40:38,995-Speed 3346.66 samples/sec   Loss 3.6421   LearningRate 0.0668   Epoch: 3   Global Step: 60970   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-11 05:40:42,060-Speed 3342.09 samples/sec   Loss 3.8467   LearningRate 0.0668   Epoch: 3   Global Step: 60980   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-11 05:40:45,127-Speed 3340.11 samples/sec   Loss 3.6750   LearningRate 0.0668   Epoch: 3   Global Step: 60990   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-11 05:40:48,195-Speed 3337.66 samples/sec   Loss 3.6488   LearningRate 0.0668   Epoch: 3   Global Step: 61000   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-11 05:40:51,276-Speed 3324.44 samples/sec   Loss 3.7135   LearningRate 0.0668   Epoch: 3   Global Step: 61010   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-11 05:40:54,345-Speed 3337.25 samples/sec   Loss 3.7264   LearningRate 0.0668   Epoch: 3   Global Step: 61020   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:40:57,416-Speed 3335.78 samples/sec   Loss 3.6500   LearningRate 0.0668   Epoch: 3   Global Step: 61030   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:41:00,480-Speed 3342.54 samples/sec   Loss 3.7290   LearningRate 0.0668   Epoch: 3   Global Step: 61040   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:41:03,549-Speed 3338.29 samples/sec   Loss 3.7892   LearningRate 0.0668   Epoch: 3   Global Step: 61050   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:41:06,613-Speed 3342.18 samples/sec   Loss 3.6597   LearningRate 0.0668   Epoch: 3   Global Step: 61060   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:41:09,674-Speed 3346.67 samples/sec   Loss 3.6493   LearningRate 0.0668   Epoch: 3   Global Step: 61070   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:41:12,737-Speed 3343.42 samples/sec   Loss 3.7092   LearningRate 0.0668   Epoch: 3   Global Step: 61080   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:41:15,805-Speed 3338.92 samples/sec   Loss 3.6995   LearningRate 0.0667   Epoch: 3   Global Step: 61090   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:41:18,888-Speed 3321.45 samples/sec   Loss 3.7251   LearningRate 0.0667   Epoch: 3   Global Step: 61100   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:41:21,952-Speed 3343.71 samples/sec   Loss 3.6543   LearningRate 0.0667   Epoch: 3   Global Step: 61110   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:41:25,011-Speed 3347.56 samples/sec   Loss 3.5901   LearningRate 0.0667   Epoch: 3   Global Step: 61120   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:41:28,139-Speed 3275.30 samples/sec   Loss 3.6793   LearningRate 0.0667   Epoch: 3   Global Step: 61130   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:41:31,199-Speed 3347.83 samples/sec   Loss 3.6511   LearningRate 0.0667   Epoch: 3   Global Step: 61140   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:41:34,274-Speed 3330.99 samples/sec   Loss 3.7292   LearningRate 0.0667   Epoch: 3   Global Step: 61150   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:41:37,334-Speed 3346.69 samples/sec   Loss 3.6606   LearningRate 0.0667   Epoch: 3   Global Step: 61160   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-11 05:41:40,402-Speed 3338.63 samples/sec   Loss 3.7236   LearningRate 0.0667   Epoch: 3   Global Step: 61170   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-11 05:41:43,466-Speed 3342.19 samples/sec   Loss 3.5873   LearningRate 0.0667   Epoch: 3   Global Step: 61180   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-11 05:41:46,540-Speed 3332.60 samples/sec   Loss 3.8106   LearningRate 0.0667   Epoch: 3   Global Step: 61190   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-11 05:41:49,610-Speed 3336.22 samples/sec   Loss 3.6056   LearningRate 0.0667   Epoch: 3   Global Step: 61200   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-11 05:41:52,693-Speed 3322.67 samples/sec   Loss 3.5932   LearningRate 0.0667   Epoch: 3   Global Step: 61210   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-11 05:41:55,781-Speed 3317.06 samples/sec   Loss 3.5998   LearningRate 0.0667   Epoch: 3   Global Step: 61220   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-11 05:41:58,857-Speed 3329.23 samples/sec   Loss 3.6978   LearningRate 0.0667   Epoch: 3   Global Step: 61230   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-11 05:42:01,921-Speed 3343.51 samples/sec   Loss 3.7140   LearningRate 0.0667   Epoch: 3   Global Step: 61240   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-11 05:42:04,983-Speed 3344.89 samples/sec   Loss 3.5942   LearningRate 0.0667   Epoch: 3   Global Step: 61250   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-11 05:42:08,089-Speed 3297.23 samples/sec   Loss 3.6643   LearningRate 0.0667   Epoch: 3   Global Step: 61260   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:42:11,179-Speed 3315.17 samples/sec   Loss 3.6831   LearningRate 0.0667   Epoch: 3   Global Step: 61270   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:42:14,245-Speed 3339.94 samples/sec   Loss 3.6434   LearningRate 0.0667   Epoch: 3   Global Step: 61280   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:42:17,321-Speed 3330.32 samples/sec   Loss 3.5690   LearningRate 0.0667   Epoch: 3   Global Step: 61290   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:42:20,409-Speed 3316.51 samples/sec   Loss 3.5817   LearningRate 0.0666   Epoch: 3   Global Step: 61300   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:42:23,498-Speed 3316.56 samples/sec   Loss 3.6747   LearningRate 0.0666   Epoch: 3   Global Step: 61310   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:42:26,676-Speed 3223.03 samples/sec   Loss 3.7387   LearningRate 0.0666   Epoch: 3   Global Step: 61320   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:42:29,804-Speed 3274.01 samples/sec   Loss 3.7207   LearningRate 0.0666   Epoch: 3   Global Step: 61330   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:42:33,007-Speed 3197.27 samples/sec   Loss 3.6136   LearningRate 0.0666   Epoch: 3   Global Step: 61340   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:42:36,075-Speed 3338.33 samples/sec   Loss 3.5959   LearningRate 0.0666   Epoch: 3   Global Step: 61350   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:42:39,182-Speed 3296.81 samples/sec   Loss 3.6819   LearningRate 0.0666   Epoch: 3   Global Step: 61360   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:42:42,293-Speed 3292.23 samples/sec   Loss 3.7300   LearningRate 0.0666   Epoch: 3   Global Step: 61370   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:42:45,432-Speed 3263.47 samples/sec   Loss 3.6144   LearningRate 0.0666   Epoch: 3   Global Step: 61380   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:42:48,541-Speed 3294.16 samples/sec   Loss 3.6218   LearningRate 0.0666   Epoch: 3   Global Step: 61390   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:42:51,606-Speed 3342.18 samples/sec   Loss 3.6227   LearningRate 0.0666   Epoch: 3   Global Step: 61400   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:42:54,778-Speed 3228.92 samples/sec   Loss 3.6254   LearningRate 0.0666   Epoch: 3   Global Step: 61410   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:42:57,923-Speed 3256.57 samples/sec   Loss 3.7181   LearningRate 0.0666   Epoch: 3   Global Step: 61420   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:43:00,987-Speed 3342.97 samples/sec   Loss 3.6144   LearningRate 0.0666   Epoch: 3   Global Step: 61430   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:43:04,068-Speed 3324.48 samples/sec   Loss 3.7477   LearningRate 0.0666   Epoch: 3   Global Step: 61440   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:43:07,130-Speed 3344.40 samples/sec   Loss 3.6394   LearningRate 0.0666   Epoch: 3   Global Step: 61450   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:43:10,230-Speed 3304.55 samples/sec   Loss 3.5811   LearningRate 0.0666   Epoch: 3   Global Step: 61460   Fp16 Grad Scale: 262144   Required: 29 hours
Training: 2022-04-11 05:43:13,313-Speed 3322.29 samples/sec   Loss 3.6685   LearningRate 0.0666   Epoch: 3   Global Step: 61470   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:43:16,424-Speed 3292.95 samples/sec   Loss 3.6375   LearningRate 0.0666   Epoch: 3   Global Step: 61480   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:43:19,542-Speed 3285.14 samples/sec   Loss 3.6418   LearningRate 0.0666   Epoch: 3   Global Step: 61490   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:43:22,642-Speed 3303.98 samples/sec   Loss 3.6052   LearningRate 0.0665   Epoch: 3   Global Step: 61500   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:43:25,718-Speed 3329.05 samples/sec   Loss 3.6592   LearningRate 0.0665   Epoch: 3   Global Step: 61510   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:43:28,789-Speed 3335.61 samples/sec   Loss 3.6715   LearningRate 0.0665   Epoch: 3   Global Step: 61520   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:43:31,870-Speed 3324.32 samples/sec   Loss 3.6087   LearningRate 0.0665   Epoch: 3   Global Step: 61530   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:43:34,939-Speed 3336.99 samples/sec   Loss 3.6479   LearningRate 0.0665   Epoch: 3   Global Step: 61540   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:43:38,021-Speed 3323.34 samples/sec   Loss 3.6998   LearningRate 0.0665   Epoch: 3   Global Step: 61550   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:43:41,084-Speed 3343.90 samples/sec   Loss 3.6328   LearningRate 0.0665   Epoch: 3   Global Step: 61560   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:43:44,151-Speed 3339.54 samples/sec   Loss 3.5960   LearningRate 0.0665   Epoch: 3   Global Step: 61570   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:43:47,271-Speed 3283.10 samples/sec   Loss 3.7296   LearningRate 0.0665   Epoch: 3   Global Step: 61580   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:43:50,443-Speed 3229.70 samples/sec   Loss 3.6233   LearningRate 0.0665   Epoch: 3   Global Step: 61590   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:43:53,617-Speed 3226.88 samples/sec   Loss 3.6716   LearningRate 0.0665   Epoch: 3   Global Step: 61600   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:43:56,698-Speed 3323.89 samples/sec   Loss 3.5836   LearningRate 0.0665   Epoch: 3   Global Step: 61610   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:43:59,786-Speed 3317.30 samples/sec   Loss 3.6850   LearningRate 0.0665   Epoch: 3   Global Step: 61620   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:44:02,887-Speed 3302.40 samples/sec   Loss 3.6824   LearningRate 0.0665   Epoch: 3   Global Step: 61630   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:44:05,991-Speed 3300.12 samples/sec   Loss 3.6978   LearningRate 0.0665   Epoch: 3   Global Step: 61640   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:44:09,102-Speed 3291.67 samples/sec   Loss 3.6902   LearningRate 0.0665   Epoch: 3   Global Step: 61650   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:44:12,240-Speed 3264.27 samples/sec   Loss 3.6602   LearningRate 0.0665   Epoch: 3   Global Step: 61660   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:44:15,355-Speed 3288.93 samples/sec   Loss 3.7627   LearningRate 0.0665   Epoch: 3   Global Step: 61670   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:44:18,438-Speed 3322.45 samples/sec   Loss 3.6922   LearningRate 0.0665   Epoch: 3   Global Step: 61680   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:44:21,501-Speed 3343.69 samples/sec   Loss 3.6484   LearningRate 0.0665   Epoch: 3   Global Step: 61690   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:44:24,570-Speed 3337.26 samples/sec   Loss 3.5762   LearningRate 0.0665   Epoch: 3   Global Step: 61700   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:44:27,633-Speed 3343.65 samples/sec   Loss 3.6171   LearningRate 0.0664   Epoch: 3   Global Step: 61710   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:44:30,710-Speed 3328.70 samples/sec   Loss 3.7015   LearningRate 0.0664   Epoch: 3   Global Step: 61720   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:44:33,773-Speed 3344.00 samples/sec   Loss 3.6665   LearningRate 0.0664   Epoch: 3   Global Step: 61730   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:44:36,843-Speed 3335.95 samples/sec   Loss 3.5447   LearningRate 0.0664   Epoch: 3   Global Step: 61740   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:44:39,917-Speed 3332.14 samples/sec   Loss 3.6852   LearningRate 0.0664   Epoch: 3   Global Step: 61750   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:44:43,067-Speed 3252.24 samples/sec   Loss 3.6616   LearningRate 0.0664   Epoch: 3   Global Step: 61760   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:44:46,122-Speed 3352.13 samples/sec   Loss 3.6777   LearningRate 0.0664   Epoch: 3   Global Step: 61770   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:44:49,188-Speed 3340.13 samples/sec   Loss 3.6991   LearningRate 0.0664   Epoch: 3   Global Step: 61780   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:44:52,250-Speed 3345.19 samples/sec   Loss 3.7202   LearningRate 0.0664   Epoch: 3   Global Step: 61790   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:44:55,324-Speed 3332.38 samples/sec   Loss 3.6352   LearningRate 0.0664   Epoch: 3   Global Step: 61800   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:44:58,408-Speed 3321.26 samples/sec   Loss 3.6512   LearningRate 0.0664   Epoch: 3   Global Step: 61810   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:45:01,475-Speed 3339.39 samples/sec   Loss 3.6313   LearningRate 0.0664   Epoch: 3   Global Step: 61820   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:45:04,531-Speed 3351.69 samples/sec   Loss 3.7427   LearningRate 0.0664   Epoch: 3   Global Step: 61830   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-11 05:45:07,593-Speed 3345.59 samples/sec   Loss 3.6300   LearningRate 0.0664   Epoch: 3   Global Step: 61840   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-11 05:45:10,658-Speed 3341.23 samples/sec   Loss 3.5901   LearningRate 0.0664   Epoch: 3   Global Step: 61850   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-11 05:45:13,730-Speed 3334.24 samples/sec   Loss 3.6867   LearningRate 0.0664   Epoch: 3   Global Step: 61860   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-11 05:45:16,795-Speed 3341.96 samples/sec   Loss 3.6763   LearningRate 0.0664   Epoch: 3   Global Step: 61870   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-11 05:45:19,862-Speed 3339.30 samples/sec   Loss 3.6200   LearningRate 0.0664   Epoch: 3   Global Step: 61880   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-11 05:45:22,967-Speed 3298.27 samples/sec   Loss 3.7325   LearningRate 0.0664   Epoch: 3   Global Step: 61890   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-11 05:45:26,053-Speed 3318.74 samples/sec   Loss 3.6941   LearningRate 0.0664   Epoch: 3   Global Step: 61900   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-11 05:45:29,143-Speed 3315.23 samples/sec   Loss 3.6941   LearningRate 0.0663   Epoch: 3   Global Step: 61910   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-11 05:45:32,258-Speed 3288.07 samples/sec   Loss 3.5985   LearningRate 0.0663   Epoch: 3   Global Step: 61920   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-11 05:45:35,348-Speed 3315.38 samples/sec   Loss 3.6185   LearningRate 0.0663   Epoch: 3   Global Step: 61930   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:45:38,470-Speed 3279.98 samples/sec   Loss 3.6965   LearningRate 0.0663   Epoch: 3   Global Step: 61940   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:45:41,592-Speed 3280.94 samples/sec   Loss 3.6159   LearningRate 0.0663   Epoch: 3   Global Step: 61950   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:45:44,694-Speed 3301.96 samples/sec   Loss 3.5352   LearningRate 0.0663   Epoch: 3   Global Step: 61960   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:45:47,775-Speed 3324.07 samples/sec   Loss 3.5899   LearningRate 0.0663   Epoch: 3   Global Step: 61970   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-11 05:45:50,841-Speed 3340.93 samples/sec   Loss 3.6301   LearningRate 0.0663   Epoch: 3   Global Step: 61980   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-11 05:45:53,953-Speed 3291.20 samples/sec   Loss 3.5838   LearningRate 0.0663   Epoch: 3   Global Step: 61990   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-11 05:45:57,035-Speed 3322.66 samples/sec   Loss 3.7276   LearningRate 0.0663   Epoch: 3   Global Step: 62000   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-11 05:46:40,842-[lfw][62000]XNorm: 21.550207
Training: 2022-04-11 05:46:40,843-[lfw][62000]Accuracy-Flip: 0.99800+-0.00277
Training: 2022-04-11 05:46:40,843-[lfw][62000]Accuracy-Highest: 0.99800
Training: 2022-04-11 05:47:32,149-[cfp_fp][62000]XNorm: 20.141830
Training: 2022-04-11 05:47:32,150-[cfp_fp][62000]Accuracy-Flip: 0.98400+-0.00530
Training: 2022-04-11 05:47:32,150-[cfp_fp][62000]Accuracy-Highest: 0.98414
Training: 2022-04-11 05:48:16,368-[agedb_30][62000]XNorm: 21.769997
Training: 2022-04-11 05:48:16,369-[agedb_30][62000]Accuracy-Flip: 0.97867+-0.00829
Training: 2022-04-11 05:48:16,369-[agedb_30][62000]Accuracy-Highest: 0.98100
Training: 2022-04-11 05:48:19,441-Speed 71.91 samples/sec   Loss 3.6311   LearningRate 0.0663   Epoch: 3   Global Step: 62010   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-11 05:48:22,535-Speed 3309.95 samples/sec   Loss 3.6173   LearningRate 0.0663   Epoch: 3   Global Step: 62020   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-11 05:48:25,611-Speed 3330.63 samples/sec   Loss 3.5644   LearningRate 0.0663   Epoch: 3   Global Step: 62030   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-11 05:48:28,769-Speed 3243.32 samples/sec   Loss 3.7045   LearningRate 0.0663   Epoch: 3   Global Step: 62040   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-11 05:48:31,920-Speed 3249.58 samples/sec   Loss 3.6142   LearningRate 0.0663   Epoch: 3   Global Step: 62050   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-11 05:48:34,971-Speed 3357.78 samples/sec   Loss 3.6931   LearningRate 0.0663   Epoch: 3   Global Step: 62060   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-11 05:48:38,024-Speed 3355.23 samples/sec   Loss 3.6457   LearningRate 0.0663   Epoch: 3   Global Step: 62070   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:48:41,064-Speed 3369.18 samples/sec   Loss 3.5961   LearningRate 0.0663   Epoch: 3   Global Step: 62080   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-11 05:48:44,151-Speed 3317.45 samples/sec   Loss 3.5938   LearningRate 0.0663   Epoch: 3   Global Step: 62090   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-11 05:48:47,293-Speed 3260.05 samples/sec   Loss 3.6858   LearningRate 0.0663   Epoch: 3   Global Step: 62100   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-11 05:48:50,384-Speed 3313.20 samples/sec   Loss 3.6050   LearningRate 0.0663   Epoch: 3   Global Step: 62110   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-11 05:48:53,450-Speed 3341.51 samples/sec   Loss 3.6209   LearningRate 0.0662   Epoch: 3   Global Step: 62120   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-11 05:48:56,505-Speed 3352.78 samples/sec   Loss 3.6470   LearningRate 0.0662   Epoch: 3   Global Step: 62130   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-11 05:48:59,585-Speed 3325.00 samples/sec   Loss 3.5963   LearningRate 0.0662   Epoch: 3   Global Step: 62140   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-11 05:49:02,657-Speed 3334.23 samples/sec   Loss 3.7328   LearningRate 0.0662   Epoch: 3   Global Step: 62150   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-11 05:49:05,736-Speed 3326.37 samples/sec   Loss 3.6712   LearningRate 0.0662   Epoch: 3   Global Step: 62160   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-11 05:49:08,811-Speed 3330.70 samples/sec   Loss 3.5553   LearningRate 0.0662   Epoch: 3   Global Step: 62170   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-11 05:49:11,876-Speed 3342.62 samples/sec   Loss 3.6900   LearningRate 0.0662   Epoch: 3   Global Step: 62180   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:49:14,947-Speed 3334.95 samples/sec   Loss 3.6159   LearningRate 0.0662   Epoch: 3   Global Step: 62190   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:49:18,019-Speed 3333.62 samples/sec   Loss 3.6133   LearningRate 0.0662   Epoch: 3   Global Step: 62200   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:49:21,099-Speed 3325.31 samples/sec   Loss 3.6478   LearningRate 0.0662   Epoch: 3   Global Step: 62210   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:49:24,167-Speed 3339.51 samples/sec   Loss 3.6627   LearningRate 0.0662   Epoch: 3   Global Step: 62220   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:49:27,239-Speed 3333.82 samples/sec   Loss 3.6176   LearningRate 0.0662   Epoch: 3   Global Step: 62230   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:49:30,308-Speed 3337.99 samples/sec   Loss 3.6873   LearningRate 0.0662   Epoch: 3   Global Step: 62240   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:49:33,388-Speed 3324.69 samples/sec   Loss 3.7329   LearningRate 0.0662   Epoch: 3   Global Step: 62250   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:49:36,501-Speed 3290.20 samples/sec   Loss 3.7538   LearningRate 0.0662   Epoch: 3   Global Step: 62260   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:49:39,583-Speed 3323.02 samples/sec   Loss 3.7306   LearningRate 0.0662   Epoch: 3   Global Step: 62270   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:49:42,661-Speed 3327.96 samples/sec   Loss 3.6705   LearningRate 0.0662   Epoch: 3   Global Step: 62280   Fp16 Grad Scale: 262144   Required: 29 hours
Training: 2022-04-11 05:49:45,717-Speed 3351.76 samples/sec   Loss 3.6023   LearningRate 0.0662   Epoch: 3   Global Step: 62290   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:49:48,780-Speed 3343.42 samples/sec   Loss 3.6889   LearningRate 0.0662   Epoch: 3   Global Step: 62300   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:49:51,856-Speed 3330.10 samples/sec   Loss 3.7121   LearningRate 0.0662   Epoch: 3   Global Step: 62310   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:49:54,934-Speed 3327.61 samples/sec   Loss 3.6354   LearningRate 0.0661   Epoch: 3   Global Step: 62320   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:49:58,014-Speed 3326.33 samples/sec   Loss 3.6073   LearningRate 0.0661   Epoch: 3   Global Step: 62330   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:50:01,088-Speed 3331.47 samples/sec   Loss 3.7270   LearningRate 0.0661   Epoch: 3   Global Step: 62340   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:50:04,225-Speed 3264.65 samples/sec   Loss 3.6251   LearningRate 0.0661   Epoch: 3   Global Step: 62350   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:50:07,293-Speed 3338.55 samples/sec   Loss 3.7109   LearningRate 0.0661   Epoch: 3   Global Step: 62360   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:50:10,355-Speed 3344.74 samples/sec   Loss 3.6658   LearningRate 0.0661   Epoch: 3   Global Step: 62370   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:50:13,419-Speed 3342.66 samples/sec   Loss 3.6488   LearningRate 0.0661   Epoch: 3   Global Step: 62380   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:50:16,481-Speed 3345.08 samples/sec   Loss 3.7270   LearningRate 0.0661   Epoch: 3   Global Step: 62390   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:50:19,550-Speed 3338.03 samples/sec   Loss 3.6667   LearningRate 0.0661   Epoch: 3   Global Step: 62400   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:50:22,606-Speed 3351.01 samples/sec   Loss 3.6571   LearningRate 0.0661   Epoch: 3   Global Step: 62410   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:50:25,662-Speed 3352.51 samples/sec   Loss 3.6798   LearningRate 0.0661   Epoch: 3   Global Step: 62420   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:50:28,718-Speed 3351.41 samples/sec   Loss 3.5724   LearningRate 0.0661   Epoch: 3   Global Step: 62430   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:50:31,776-Speed 3349.85 samples/sec   Loss 3.5877   LearningRate 0.0661   Epoch: 3   Global Step: 62440   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:50:34,833-Speed 3350.57 samples/sec   Loss 3.6123   LearningRate 0.0661   Epoch: 3   Global Step: 62450   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:50:38,046-Speed 3187.36 samples/sec   Loss 3.6429   LearningRate 0.0661   Epoch: 3   Global Step: 62460   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:50:41,294-Speed 3153.67 samples/sec   Loss 3.6140   LearningRate 0.0661   Epoch: 3   Global Step: 62470   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:50:44,355-Speed 3346.10 samples/sec   Loss 3.6523   LearningRate 0.0661   Epoch: 3   Global Step: 62480   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-04-11 05:50:47,430-Speed 3330.42 samples/sec   Loss 3.6865   LearningRate 0.0661   Epoch: 3   Global Step: 62490   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-04-11 05:50:50,495-Speed 3342.02 samples/sec   Loss 3.6222   LearningRate 0.0661   Epoch: 3   Global Step: 62500   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-04-11 05:50:53,556-Speed 3346.45 samples/sec   Loss 3.6533   LearningRate 0.0661   Epoch: 3   Global Step: 62510   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-04-11 05:50:56,611-Speed 3351.73 samples/sec   Loss 3.6452   LearningRate 0.0661   Epoch: 3   Global Step: 62520   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-04-11 05:50:59,669-Speed 3349.91 samples/sec   Loss 3.5985   LearningRate 0.0660   Epoch: 3   Global Step: 62530   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-04-11 05:51:02,734-Speed 3341.93 samples/sec   Loss 3.6373   LearningRate 0.0660   Epoch: 3   Global Step: 62540   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-04-11 05:51:05,865-Speed 3271.31 samples/sec   Loss 3.5881   LearningRate 0.0660   Epoch: 3   Global Step: 62550   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-04-11 05:51:08,926-Speed 3346.05 samples/sec   Loss 3.6594   LearningRate 0.0660   Epoch: 3   Global Step: 62560   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-04-11 05:51:12,019-Speed 3311.48 samples/sec   Loss 3.6725   LearningRate 0.0660   Epoch: 3   Global Step: 62570   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-04-11 05:51:15,123-Speed 3299.55 samples/sec   Loss 3.5794   LearningRate 0.0660   Epoch: 3   Global Step: 62580   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-11 05:51:18,236-Speed 3290.07 samples/sec   Loss 3.5343   LearningRate 0.0660   Epoch: 3   Global Step: 62590   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-11 05:51:21,303-Speed 3340.01 samples/sec   Loss 3.5549   LearningRate 0.0660   Epoch: 3   Global Step: 62600   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-11 05:51:24,404-Speed 3302.72 samples/sec   Loss 3.6068   LearningRate 0.0660   Epoch: 3   Global Step: 62610   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-11 05:51:27,471-Speed 3339.97 samples/sec   Loss 3.6934   LearningRate 0.0660   Epoch: 3   Global Step: 62620   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-11 05:51:30,537-Speed 3339.93 samples/sec   Loss 3.6424   LearningRate 0.0660   Epoch: 3   Global Step: 62630   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-11 05:51:33,603-Speed 3341.24 samples/sec   Loss 3.5958   LearningRate 0.0660   Epoch: 3   Global Step: 62640   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-11 05:51:36,682-Speed 3326.01 samples/sec   Loss 3.6404   LearningRate 0.0660   Epoch: 3   Global Step: 62650   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-11 05:51:39,766-Speed 3321.20 samples/sec   Loss 3.6650   LearningRate 0.0660   Epoch: 3   Global Step: 62660   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-11 05:51:42,854-Speed 3316.16 samples/sec   Loss 3.6363   LearningRate 0.0660   Epoch: 3   Global Step: 62670   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-11 05:51:45,939-Speed 3320.58 samples/sec   Loss 3.6462   LearningRate 0.0660   Epoch: 3   Global Step: 62680   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:51:49,001-Speed 3344.69 samples/sec   Loss 3.6245   LearningRate 0.0660   Epoch: 3   Global Step: 62690   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:51:52,067-Speed 3341.36 samples/sec   Loss 3.6822   LearningRate 0.0660   Epoch: 3   Global Step: 62700   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-11 05:51:55,130-Speed 3344.02 samples/sec   Loss 3.6320   LearningRate 0.0660   Epoch: 3   Global Step: 62710   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-11 05:51:58,203-Speed 3332.44 samples/sec   Loss 3.6192   LearningRate 0.0660   Epoch: 3   Global Step: 62720   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-11 05:52:01,346-Speed 3258.82 samples/sec   Loss 3.7028   LearningRate 0.0659   Epoch: 3   Global Step: 62730   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-11 05:52:04,412-Speed 3340.87 samples/sec   Loss 3.6419   LearningRate 0.0659   Epoch: 3   Global Step: 62740   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-11 05:52:07,494-Speed 3323.55 samples/sec   Loss 3.5849   LearningRate 0.0659   Epoch: 3   Global Step: 62750   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-11 05:52:10,568-Speed 3331.49 samples/sec   Loss 3.6627   LearningRate 0.0659   Epoch: 3   Global Step: 62760   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-11 05:52:13,642-Speed 3332.21 samples/sec   Loss 3.6474   LearningRate 0.0659   Epoch: 3   Global Step: 62770   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-11 05:52:16,702-Speed 3347.15 samples/sec   Loss 3.6397   LearningRate 0.0659   Epoch: 3   Global Step: 62780   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-11 05:52:19,768-Speed 3340.72 samples/sec   Loss 3.7475   LearningRate 0.0659   Epoch: 3   Global Step: 62790   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-11 05:52:22,949-Speed 3220.03 samples/sec   Loss 3.6886   LearningRate 0.0659   Epoch: 3   Global Step: 62800   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:52:26,012-Speed 3344.12 samples/sec   Loss 3.6852   LearningRate 0.0659   Epoch: 3   Global Step: 62810   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:52:29,075-Speed 3343.41 samples/sec   Loss 3.6625   LearningRate 0.0659   Epoch: 3   Global Step: 62820   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:52:32,129-Speed 3354.38 samples/sec   Loss 3.6626   LearningRate 0.0659   Epoch: 3   Global Step: 62830   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-11 05:52:35,231-Speed 3301.65 samples/sec   Loss 3.6330   LearningRate 0.0659   Epoch: 3   Global Step: 62840   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-11 05:52:38,304-Speed 3332.35 samples/sec   Loss 3.6809   LearningRate 0.0659   Epoch: 3   Global Step: 62850   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-11 05:52:41,380-Speed 3330.30 samples/sec   Loss 3.6490   LearningRate 0.0659   Epoch: 3   Global Step: 62860   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-11 05:52:44,445-Speed 3341.54 samples/sec   Loss 3.6306   LearningRate 0.0659   Epoch: 3   Global Step: 62870   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-11 05:52:47,603-Speed 3243.38 samples/sec   Loss 3.6513   LearningRate 0.0659   Epoch: 3   Global Step: 62880   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-11 05:52:50,671-Speed 3338.97 samples/sec   Loss 3.6264   LearningRate 0.0659   Epoch: 3   Global Step: 62890   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-11 05:52:53,742-Speed 3334.41 samples/sec   Loss 3.5855   LearningRate 0.0659   Epoch: 3   Global Step: 62900   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-11 05:52:56,821-Speed 3326.85 samples/sec   Loss 3.6203   LearningRate 0.0659   Epoch: 3   Global Step: 62910   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-11 05:52:59,893-Speed 3334.19 samples/sec   Loss 3.5826   LearningRate 0.0659   Epoch: 3   Global Step: 62920   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-11 05:53:02,968-Speed 3330.76 samples/sec   Loss 3.5214   LearningRate 0.0659   Epoch: 3   Global Step: 62930   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:53:06,033-Speed 3342.10 samples/sec   Loss 3.6037   LearningRate 0.0658   Epoch: 3   Global Step: 62940   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:53:09,115-Speed 3322.21 samples/sec   Loss 3.5564   LearningRate 0.0658   Epoch: 3   Global Step: 62950   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:53:12,185-Speed 3337.13 samples/sec   Loss 3.5404   LearningRate 0.0658   Epoch: 3   Global Step: 62960   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:53:15,253-Speed 3338.71 samples/sec   Loss 3.6732   LearningRate 0.0658   Epoch: 3   Global Step: 62970   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:53:18,360-Speed 3296.61 samples/sec   Loss 3.6640   LearningRate 0.0658   Epoch: 3   Global Step: 62980   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:53:21,439-Speed 3325.97 samples/sec   Loss 3.5665   LearningRate 0.0658   Epoch: 3   Global Step: 62990   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:53:24,509-Speed 3335.78 samples/sec   Loss 3.6231   LearningRate 0.0658   Epoch: 3   Global Step: 63000   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:53:27,591-Speed 3323.11 samples/sec   Loss 3.6425   LearningRate 0.0658   Epoch: 3   Global Step: 63010   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:53:30,687-Speed 3309.24 samples/sec   Loss 3.5601   LearningRate 0.0658   Epoch: 3   Global Step: 63020   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:53:33,749-Speed 3344.88 samples/sec   Loss 3.6587   LearningRate 0.0658   Epoch: 3   Global Step: 63030   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:53:36,896-Speed 3254.24 samples/sec   Loss 3.5961   LearningRate 0.0658   Epoch: 3   Global Step: 63040   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:53:39,972-Speed 3329.82 samples/sec   Loss 3.5945   LearningRate 0.0658   Epoch: 3   Global Step: 63050   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-04-11 05:53:43,051-Speed 3327.10 samples/sec   Loss 3.5652   LearningRate 0.0658   Epoch: 3   Global Step: 63060   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-11 05:53:46,119-Speed 3337.96 samples/sec   Loss 3.5263   LearningRate 0.0658   Epoch: 3   Global Step: 63070   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-11 05:53:49,191-Speed 3334.41 samples/sec   Loss 3.5591   LearningRate 0.0658   Epoch: 3   Global Step: 63080   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-11 05:53:52,306-Speed 3288.19 samples/sec   Loss 3.6356   LearningRate 0.0658   Epoch: 3   Global Step: 63090   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-11 05:53:55,384-Speed 3327.36 samples/sec   Loss 3.6332   LearningRate 0.0658   Epoch: 3   Global Step: 63100   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-11 05:53:58,465-Speed 3324.36 samples/sec   Loss 3.5784   LearningRate 0.0658   Epoch: 3   Global Step: 63110   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-11 05:54:01,542-Speed 3328.59 samples/sec   Loss 3.6184   LearningRate 0.0658   Epoch: 3   Global Step: 63120   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-11 05:54:04,613-Speed 3335.58 samples/sec   Loss 3.7035   LearningRate 0.0658   Epoch: 3   Global Step: 63130   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-11 05:54:07,693-Speed 3325.95 samples/sec   Loss 3.6044   LearningRate 0.0657   Epoch: 3   Global Step: 63140   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-11 05:54:10,761-Speed 3337.83 samples/sec   Loss 3.6728   LearningRate 0.0657   Epoch: 3   Global Step: 63150   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-11 05:54:13,888-Speed 3275.37 samples/sec   Loss 3.5747   LearningRate 0.0657   Epoch: 3   Global Step: 63160   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 05:54:16,962-Speed 3332.28 samples/sec   Loss 3.6601   LearningRate 0.0657   Epoch: 3   Global Step: 63170   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 05:54:20,044-Speed 3323.48 samples/sec   Loss 3.6154   LearningRate 0.0657   Epoch: 3   Global Step: 63180   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 05:54:23,123-Speed 3326.11 samples/sec   Loss 3.5771   LearningRate 0.0657   Epoch: 3   Global Step: 63190   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 05:54:26,192-Speed 3337.21 samples/sec   Loss 3.5841   LearningRate 0.0657   Epoch: 3   Global Step: 63200   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 05:54:29,274-Speed 3322.72 samples/sec   Loss 3.5487   LearningRate 0.0657   Epoch: 3   Global Step: 63210   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 05:54:32,342-Speed 3339.53 samples/sec   Loss 3.6161   LearningRate 0.0657   Epoch: 3   Global Step: 63220   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 05:54:35,409-Speed 3339.69 samples/sec   Loss 3.6345   LearningRate 0.0657   Epoch: 3   Global Step: 63230   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 05:54:38,490-Speed 3324.43 samples/sec   Loss 3.5401   LearningRate 0.0657   Epoch: 3   Global Step: 63240   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 05:54:41,589-Speed 3304.50 samples/sec   Loss 3.6426   LearningRate 0.0657   Epoch: 3   Global Step: 63250   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 05:54:44,666-Speed 3329.08 samples/sec   Loss 3.5811   LearningRate 0.0657   Epoch: 3   Global Step: 63260   Fp16 Grad Scale: 262144   Required: 28 hours
Training: 2022-04-11 05:54:47,722-Speed 3351.26 samples/sec   Loss 3.4982   LearningRate 0.0657   Epoch: 3   Global Step: 63270   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 05:54:50,808-Speed 3318.64 samples/sec   Loss 3.6487   LearningRate 0.0657   Epoch: 3   Global Step: 63280   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 05:54:53,894-Speed 3319.24 samples/sec   Loss 3.5123   LearningRate 0.0657   Epoch: 3   Global Step: 63290   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 05:54:56,970-Speed 3329.43 samples/sec   Loss 3.6079   LearningRate 0.0657   Epoch: 3   Global Step: 63300   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 05:55:00,136-Speed 3235.44 samples/sec   Loss 3.6048   LearningRate 0.0657   Epoch: 3   Global Step: 63310   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 05:55:03,366-Speed 3170.62 samples/sec   Loss 3.6343   LearningRate 0.0657   Epoch: 3   Global Step: 63320   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 05:55:06,457-Speed 3314.21 samples/sec   Loss 3.6844   LearningRate 0.0657   Epoch: 3   Global Step: 63330   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-11 05:55:09,532-Speed 3330.67 samples/sec   Loss 3.6480   LearningRate 0.0657   Epoch: 3   Global Step: 63340   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-11 05:55:12,601-Speed 3337.02 samples/sec   Loss 3.6890   LearningRate 0.0656   Epoch: 3   Global Step: 63350   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-11 05:55:15,767-Speed 3235.57 samples/sec   Loss 3.6181   LearningRate 0.0656   Epoch: 3   Global Step: 63360   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-11 05:55:18,881-Speed 3289.00 samples/sec   Loss 3.5582   LearningRate 0.0656   Epoch: 3   Global Step: 63370   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-11 05:55:22,115-Speed 3166.94 samples/sec   Loss 3.7360   LearningRate 0.0656   Epoch: 3   Global Step: 63380   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-11 05:55:25,220-Speed 3298.68 samples/sec   Loss 3.6069   LearningRate 0.0656   Epoch: 3   Global Step: 63390   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-11 05:55:28,290-Speed 3336.15 samples/sec   Loss 3.6361   LearningRate 0.0656   Epoch: 3   Global Step: 63400   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-11 05:55:31,364-Speed 3332.08 samples/sec   Loss 3.6297   LearningRate 0.0656   Epoch: 3   Global Step: 63410   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-11 05:55:34,438-Speed 3332.66 samples/sec   Loss 3.7029   LearningRate 0.0656   Epoch: 3   Global Step: 63420   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-11 05:55:37,522-Speed 3321.08 samples/sec   Loss 3.6025   LearningRate 0.0656   Epoch: 3   Global Step: 63430   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 05:55:40,607-Speed 3319.59 samples/sec   Loss 3.6540   LearningRate 0.0656   Epoch: 3   Global Step: 63440   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 05:55:43,797-Speed 3210.58 samples/sec   Loss 3.5685   LearningRate 0.0656   Epoch: 3   Global Step: 63450   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 05:55:46,875-Speed 3327.43 samples/sec   Loss 3.5113   LearningRate 0.0656   Epoch: 3   Global Step: 63460   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 05:55:49,962-Speed 3318.45 samples/sec   Loss 3.5771   LearningRate 0.0656   Epoch: 3   Global Step: 63470   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 05:55:53,032-Speed 3336.08 samples/sec   Loss 3.6245   LearningRate 0.0656   Epoch: 3   Global Step: 63480   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 05:55:56,125-Speed 3311.76 samples/sec   Loss 3.5683   LearningRate 0.0656   Epoch: 3   Global Step: 63490   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 05:55:59,215-Speed 3316.27 samples/sec   Loss 3.6035   LearningRate 0.0656   Epoch: 3   Global Step: 63500   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 05:56:02,287-Speed 3334.23 samples/sec   Loss 3.6035   LearningRate 0.0656   Epoch: 3   Global Step: 63510   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 05:56:05,360-Speed 3333.18 samples/sec   Loss 3.5928   LearningRate 0.0656   Epoch: 3   Global Step: 63520   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 05:56:08,587-Speed 3173.54 samples/sec   Loss 3.6339   LearningRate 0.0656   Epoch: 3   Global Step: 63530   Fp16 Grad Scale: 262144   Required: 28 hours
Training: 2022-04-11 05:56:11,651-Speed 3342.99 samples/sec   Loss 3.6260   LearningRate 0.0656   Epoch: 3   Global Step: 63540   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-11 05:56:14,788-Speed 3265.18 samples/sec   Loss 3.5888   LearningRate 0.0655   Epoch: 3   Global Step: 63550   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-11 05:56:17,860-Speed 3333.62 samples/sec   Loss 3.6463   LearningRate 0.0655   Epoch: 3   Global Step: 63560   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-11 05:56:20,945-Speed 3319.81 samples/sec   Loss 3.6983   LearningRate 0.0655   Epoch: 3   Global Step: 63570   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-11 05:56:24,025-Speed 3326.26 samples/sec   Loss 3.6270   LearningRate 0.0655   Epoch: 3   Global Step: 63580   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-11 05:56:27,128-Speed 3300.71 samples/sec   Loss 3.6571   LearningRate 0.0655   Epoch: 3   Global Step: 63590   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-11 05:56:30,224-Speed 3308.77 samples/sec   Loss 3.6161   LearningRate 0.0655   Epoch: 3   Global Step: 63600   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-11 05:56:33,404-Speed 3220.35 samples/sec   Loss 3.6848   LearningRate 0.0655   Epoch: 3   Global Step: 63610   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-11 05:56:36,642-Speed 3163.72 samples/sec   Loss 3.6399   LearningRate 0.0655   Epoch: 3   Global Step: 63620   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-11 05:56:39,883-Speed 3159.51 samples/sec   Loss 3.6371   LearningRate 0.0655   Epoch: 3   Global Step: 63630   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-11 05:56:43,062-Speed 3221.96 samples/sec   Loss 3.6307   LearningRate 0.0655   Epoch: 3   Global Step: 63640   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 05:56:46,200-Speed 3263.47 samples/sec   Loss 3.6093   LearningRate 0.0655   Epoch: 3   Global Step: 63650   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 05:56:49,308-Speed 3296.63 samples/sec   Loss 3.6280   LearningRate 0.0655   Epoch: 3   Global Step: 63660   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 05:56:52,378-Speed 3335.85 samples/sec   Loss 3.5745   LearningRate 0.0655   Epoch: 3   Global Step: 63670   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 05:56:55,480-Speed 3301.85 samples/sec   Loss 3.6717   LearningRate 0.0655   Epoch: 3   Global Step: 63680   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 05:56:58,565-Speed 3319.77 samples/sec   Loss 3.5505   LearningRate 0.0655   Epoch: 3   Global Step: 63690   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 05:57:01,667-Speed 3301.94 samples/sec   Loss 3.6381   LearningRate 0.0655   Epoch: 3   Global Step: 63700   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 05:57:04,815-Speed 3254.43 samples/sec   Loss 3.6054   LearningRate 0.0655   Epoch: 3   Global Step: 63710   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 05:57:07,888-Speed 3331.98 samples/sec   Loss 3.6916   LearningRate 0.0655   Epoch: 3   Global Step: 63720   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 05:57:10,957-Speed 3338.00 samples/sec   Loss 3.5961   LearningRate 0.0655   Epoch: 3   Global Step: 63730   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 05:57:14,019-Speed 3344.08 samples/sec   Loss 3.5956   LearningRate 0.0655   Epoch: 3   Global Step: 63740   Fp16 Grad Scale: 262144   Required: 28 hours
Training: 2022-04-11 05:57:17,082-Speed 3345.41 samples/sec   Loss 3.5952   LearningRate 0.0655   Epoch: 3   Global Step: 63750   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 05:57:20,160-Speed 3327.47 samples/sec   Loss 3.6674   LearningRate 0.0654   Epoch: 3   Global Step: 63760   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 05:57:23,242-Speed 3322.56 samples/sec   Loss 3.5683   LearningRate 0.0654   Epoch: 3   Global Step: 63770   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 05:57:26,331-Speed 3315.36 samples/sec   Loss 3.7050   LearningRate 0.0654   Epoch: 3   Global Step: 63780   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 05:57:29,396-Speed 3342.24 samples/sec   Loss 3.6570   LearningRate 0.0654   Epoch: 3   Global Step: 63790   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 05:57:32,501-Speed 3298.11 samples/sec   Loss 3.5592   LearningRate 0.0654   Epoch: 3   Global Step: 63800   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 05:57:35,604-Speed 3301.43 samples/sec   Loss 3.6264   LearningRate 0.0654   Epoch: 3   Global Step: 63810   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 05:57:38,676-Speed 3334.34 samples/sec   Loss 3.6564   LearningRate 0.0654   Epoch: 3   Global Step: 63820   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 05:57:41,759-Speed 3322.41 samples/sec   Loss 3.5611   LearningRate 0.0654   Epoch: 3   Global Step: 63830   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 05:57:44,839-Speed 3325.39 samples/sec   Loss 3.5651   LearningRate 0.0654   Epoch: 3   Global Step: 63840   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 05:57:47,905-Speed 3339.99 samples/sec   Loss 3.6433   LearningRate 0.0654   Epoch: 3   Global Step: 63850   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 05:57:50,974-Speed 3337.77 samples/sec   Loss 3.5782   LearningRate 0.0654   Epoch: 3   Global Step: 63860   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 05:57:54,040-Speed 3340.35 samples/sec   Loss 3.5618   LearningRate 0.0654   Epoch: 3   Global Step: 63870   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 05:57:57,157-Speed 3286.19 samples/sec   Loss 3.6714   LearningRate 0.0654   Epoch: 3   Global Step: 63880   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 05:58:00,220-Speed 3343.76 samples/sec   Loss 3.6003   LearningRate 0.0654   Epoch: 3   Global Step: 63890   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 05:58:03,301-Speed 3323.69 samples/sec   Loss 3.6037   LearningRate 0.0654   Epoch: 3   Global Step: 63900   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 05:58:06,390-Speed 3316.02 samples/sec   Loss 3.5857   LearningRate 0.0654   Epoch: 3   Global Step: 63910   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 05:58:09,456-Speed 3340.79 samples/sec   Loss 3.5020   LearningRate 0.0654   Epoch: 3   Global Step: 63920   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 05:58:12,525-Speed 3338.24 samples/sec   Loss 3.6088   LearningRate 0.0654   Epoch: 3   Global Step: 63930   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 05:58:15,598-Speed 3332.22 samples/sec   Loss 3.4951   LearningRate 0.0654   Epoch: 3   Global Step: 63940   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 05:58:18,660-Speed 3345.24 samples/sec   Loss 3.5744   LearningRate 0.0654   Epoch: 3   Global Step: 63950   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 05:58:21,746-Speed 3318.80 samples/sec   Loss 3.5484   LearningRate 0.0654   Epoch: 3   Global Step: 63960   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 05:58:24,820-Speed 3331.64 samples/sec   Loss 3.5188   LearningRate 0.0653   Epoch: 3   Global Step: 63970   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 05:58:27,893-Speed 3332.92 samples/sec   Loss 3.5906   LearningRate 0.0653   Epoch: 3   Global Step: 63980   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 05:58:30,967-Speed 3333.15 samples/sec   Loss 3.5345   LearningRate 0.0653   Epoch: 3   Global Step: 63990   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 05:58:34,056-Speed 3315.15 samples/sec   Loss 3.5482   LearningRate 0.0653   Epoch: 3   Global Step: 64000   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 05:59:17,995-[lfw][64000]XNorm: 23.310858
Training: 2022-04-11 05:59:17,996-[lfw][64000]Accuracy-Flip: 0.99767+-0.00291
Training: 2022-04-11 05:59:17,996-[lfw][64000]Accuracy-Highest: 0.99800
Training: 2022-04-11 06:00:09,055-[cfp_fp][64000]XNorm: 21.344082
Training: 2022-04-11 06:00:09,055-[cfp_fp][64000]Accuracy-Flip: 0.98414+-0.00435
Training: 2022-04-11 06:00:09,056-[cfp_fp][64000]Accuracy-Highest: 0.98414
Training: 2022-04-11 06:00:52,557-[agedb_30][64000]XNorm: 22.841764
Training: 2022-04-11 06:00:52,558-[agedb_30][64000]Accuracy-Flip: 0.97783+-0.00742
Training: 2022-04-11 06:00:52,558-[agedb_30][64000]Accuracy-Highest: 0.98100
Training: 2022-04-11 06:00:55,624-Speed 72.33 samples/sec   Loss 3.5970   LearningRate 0.0653   Epoch: 3   Global Step: 64010   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:00:58,673-Speed 3359.49 samples/sec   Loss 3.5553   LearningRate 0.0653   Epoch: 3   Global Step: 64020   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:01:01,735-Speed 3345.36 samples/sec   Loss 3.5605   LearningRate 0.0653   Epoch: 3   Global Step: 64030   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:01:04,805-Speed 3336.49 samples/sec   Loss 3.5396   LearningRate 0.0653   Epoch: 3   Global Step: 64040   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:01:07,845-Speed 3368.54 samples/sec   Loss 3.6195   LearningRate 0.0653   Epoch: 3   Global Step: 64050   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:01:10,898-Speed 3354.38 samples/sec   Loss 3.5142   LearningRate 0.0653   Epoch: 3   Global Step: 64060   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:01:13,950-Speed 3355.72 samples/sec   Loss 3.5671   LearningRate 0.0653   Epoch: 3   Global Step: 64070   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:01:17,012-Speed 3345.92 samples/sec   Loss 3.6239   LearningRate 0.0653   Epoch: 3   Global Step: 64080   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:01:20,066-Speed 3353.32 samples/sec   Loss 3.5015   LearningRate 0.0653   Epoch: 3   Global Step: 64090   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:01:23,145-Speed 3326.95 samples/sec   Loss 3.6123   LearningRate 0.0653   Epoch: 3   Global Step: 64100   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:01:26,211-Speed 3340.14 samples/sec   Loss 3.6008   LearningRate 0.0653   Epoch: 3   Global Step: 64110   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:01:29,278-Speed 3340.18 samples/sec   Loss 3.5325   LearningRate 0.0653   Epoch: 3   Global Step: 64120   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:01:32,366-Speed 3316.88 samples/sec   Loss 3.5664   LearningRate 0.0653   Epoch: 3   Global Step: 64130   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:01:35,422-Speed 3351.81 samples/sec   Loss 3.6310   LearningRate 0.0653   Epoch: 3   Global Step: 64140   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:01:38,468-Speed 3361.65 samples/sec   Loss 3.6664   LearningRate 0.0653   Epoch: 3   Global Step: 64150   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:01:41,576-Speed 3296.22 samples/sec   Loss 3.5314   LearningRate 0.0653   Epoch: 3   Global Step: 64160   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:01:44,641-Speed 3341.39 samples/sec   Loss 3.5146   LearningRate 0.0652   Epoch: 3   Global Step: 64170   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:01:47,696-Speed 3352.89 samples/sec   Loss 3.5964   LearningRate 0.0652   Epoch: 3   Global Step: 64180   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:01:50,757-Speed 3345.48 samples/sec   Loss 3.5797   LearningRate 0.0652   Epoch: 3   Global Step: 64190   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:01:53,830-Speed 3333.38 samples/sec   Loss 3.5065   LearningRate 0.0652   Epoch: 3   Global Step: 64200   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:01:56,893-Speed 3343.90 samples/sec   Loss 3.4800   LearningRate 0.0652   Epoch: 3   Global Step: 64210   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:01:59,952-Speed 3348.94 samples/sec   Loss 3.6443   LearningRate 0.0652   Epoch: 3   Global Step: 64220   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:02:03,011-Speed 3347.60 samples/sec   Loss 3.4577   LearningRate 0.0652   Epoch: 3   Global Step: 64230   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:02:06,075-Speed 3342.40 samples/sec   Loss 3.5811   LearningRate 0.0652   Epoch: 3   Global Step: 64240   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:02:09,122-Speed 3361.65 samples/sec   Loss 3.5379   LearningRate 0.0652   Epoch: 3   Global Step: 64250   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:02:12,182-Speed 3347.41 samples/sec   Loss 3.6269   LearningRate 0.0652   Epoch: 3   Global Step: 64260   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:02:15,245-Speed 3343.18 samples/sec   Loss 3.5812   LearningRate 0.0652   Epoch: 3   Global Step: 64270   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:02:18,305-Speed 3347.29 samples/sec   Loss 3.5241   LearningRate 0.0652   Epoch: 3   Global Step: 64280   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:02:21,382-Speed 3329.03 samples/sec   Loss 3.5837   LearningRate 0.0652   Epoch: 3   Global Step: 64290   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-11 06:02:24,527-Speed 3256.73 samples/sec   Loss 3.5629   LearningRate 0.0652   Epoch: 3   Global Step: 64300   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-11 06:02:27,583-Speed 3351.55 samples/sec   Loss 3.5918   LearningRate 0.0652   Epoch: 3   Global Step: 64310   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-11 06:02:30,647-Speed 3342.81 samples/sec   Loss 3.4376   LearningRate 0.0652   Epoch: 3   Global Step: 64320   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-11 06:02:33,710-Speed 3343.63 samples/sec   Loss 3.6180   LearningRate 0.0652   Epoch: 3   Global Step: 64330   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-11 06:02:36,776-Speed 3340.71 samples/sec   Loss 3.5872   LearningRate 0.0652   Epoch: 3   Global Step: 64340   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-11 06:02:39,834-Speed 3349.83 samples/sec   Loss 3.4944   LearningRate 0.0652   Epoch: 3   Global Step: 64350   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-11 06:02:42,912-Speed 3327.54 samples/sec   Loss 3.5806   LearningRate 0.0652   Epoch: 3   Global Step: 64360   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-11 06:02:45,972-Speed 3346.88 samples/sec   Loss 3.5601   LearningRate 0.0652   Epoch: 3   Global Step: 64370   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-11 06:02:49,040-Speed 3338.61 samples/sec   Loss 3.5971   LearningRate 0.0651   Epoch: 3   Global Step: 64380   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-11 06:02:52,099-Speed 3348.96 samples/sec   Loss 3.5780   LearningRate 0.0651   Epoch: 3   Global Step: 64390   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:02:55,156-Speed 3350.18 samples/sec   Loss 3.6409   LearningRate 0.0651   Epoch: 3   Global Step: 64400   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:02:58,220-Speed 3342.95 samples/sec   Loss 3.5566   LearningRate 0.0651   Epoch: 3   Global Step: 64410   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:03:01,280-Speed 3347.08 samples/sec   Loss 3.6651   LearningRate 0.0651   Epoch: 3   Global Step: 64420   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:03:04,401-Speed 3281.95 samples/sec   Loss 3.5926   LearningRate 0.0651   Epoch: 3   Global Step: 64430   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:03:07,531-Speed 3271.24 samples/sec   Loss 3.5597   LearningRate 0.0651   Epoch: 3   Global Step: 64440   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:03:10,636-Speed 3299.55 samples/sec   Loss 3.5862   LearningRate 0.0651   Epoch: 3   Global Step: 64450   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:03:13,694-Speed 3348.64 samples/sec   Loss 3.5872   LearningRate 0.0651   Epoch: 3   Global Step: 64460   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-11 06:03:16,755-Speed 3346.79 samples/sec   Loss 3.6259   LearningRate 0.0651   Epoch: 3   Global Step: 64470   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-11 06:03:19,814-Speed 3348.07 samples/sec   Loss 3.6576   LearningRate 0.0651   Epoch: 3   Global Step: 64480   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-11 06:03:22,890-Speed 3329.48 samples/sec   Loss 3.6133   LearningRate 0.0651   Epoch: 3   Global Step: 64490   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-11 06:03:25,971-Speed 3324.86 samples/sec   Loss 3.5267   LearningRate 0.0651   Epoch: 3   Global Step: 64500   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-11 06:03:29,029-Speed 3348.81 samples/sec   Loss 3.5911   LearningRate 0.0651   Epoch: 3   Global Step: 64510   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-11 06:03:32,108-Speed 3326.76 samples/sec   Loss 3.6217   LearningRate 0.0651   Epoch: 3   Global Step: 64520   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-11 06:03:35,166-Speed 3349.84 samples/sec   Loss 3.6156   LearningRate 0.0651   Epoch: 3   Global Step: 64530   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-11 06:03:38,241-Speed 3330.08 samples/sec   Loss 3.6078   LearningRate 0.0651   Epoch: 3   Global Step: 64540   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-11 06:03:41,298-Speed 3350.94 samples/sec   Loss 3.5414   LearningRate 0.0651   Epoch: 3   Global Step: 64550   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-11 06:03:44,357-Speed 3348.17 samples/sec   Loss 3.5653   LearningRate 0.0651   Epoch: 3   Global Step: 64560   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:03:47,421-Speed 3343.63 samples/sec   Loss 3.5374   LearningRate 0.0651   Epoch: 3   Global Step: 64570   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:03:50,483-Speed 3344.44 samples/sec   Loss 3.5901   LearningRate 0.0651   Epoch: 3   Global Step: 64580   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:03:53,541-Speed 3349.19 samples/sec   Loss 3.5130   LearningRate 0.0650   Epoch: 3   Global Step: 64590   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:03:56,598-Speed 3350.75 samples/sec   Loss 3.6522   LearningRate 0.0650   Epoch: 3   Global Step: 64600   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:03:59,658-Speed 3346.63 samples/sec   Loss 3.6291   LearningRate 0.0650   Epoch: 3   Global Step: 64610   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:04:02,728-Speed 3336.64 samples/sec   Loss 3.5463   LearningRate 0.0650   Epoch: 3   Global Step: 64620   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:04:05,800-Speed 3333.80 samples/sec   Loss 3.5354   LearningRate 0.0650   Epoch: 3   Global Step: 64630   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:04:08,882-Speed 3324.13 samples/sec   Loss 3.6217   LearningRate 0.0650   Epoch: 3   Global Step: 64640   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:04:11,956-Speed 3331.73 samples/sec   Loss 3.4901   LearningRate 0.0650   Epoch: 3   Global Step: 64650   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:04:15,036-Speed 3325.27 samples/sec   Loss 3.5617   LearningRate 0.0650   Epoch: 3   Global Step: 64660   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:04:18,113-Speed 3328.49 samples/sec   Loss 3.5480   LearningRate 0.0650   Epoch: 3   Global Step: 64670   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:04:21,174-Speed 3347.37 samples/sec   Loss 3.5769   LearningRate 0.0650   Epoch: 3   Global Step: 64680   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:04:24,269-Speed 3309.36 samples/sec   Loss 3.6805   LearningRate 0.0650   Epoch: 3   Global Step: 64690   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:04:27,328-Speed 3348.14 samples/sec   Loss 3.6181   LearningRate 0.0650   Epoch: 3   Global Step: 64700   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:04:30,393-Speed 3341.33 samples/sec   Loss 3.6157   LearningRate 0.0650   Epoch: 3   Global Step: 64710   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:04:33,450-Speed 3350.57 samples/sec   Loss 3.5639   LearningRate 0.0650   Epoch: 3   Global Step: 64720   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:04:36,516-Speed 3340.97 samples/sec   Loss 3.5219   LearningRate 0.0650   Epoch: 3   Global Step: 64730   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:04:39,574-Speed 3348.83 samples/sec   Loss 3.6667   LearningRate 0.0650   Epoch: 3   Global Step: 64740   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:04:42,633-Speed 3348.12 samples/sec   Loss 3.6052   LearningRate 0.0650   Epoch: 3   Global Step: 64750   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:04:45,724-Speed 3313.84 samples/sec   Loss 3.5524   LearningRate 0.0650   Epoch: 3   Global Step: 64760   Fp16 Grad Scale: 262144   Required: 28 hours
Training: 2022-04-11 06:04:48,774-Speed 3357.56 samples/sec   Loss 3.5754   LearningRate 0.0650   Epoch: 3   Global Step: 64770   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:04:51,848-Speed 3332.24 samples/sec   Loss 3.5498   LearningRate 0.0650   Epoch: 3   Global Step: 64780   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:04:54,912-Speed 3343.08 samples/sec   Loss 3.6203   LearningRate 0.0649   Epoch: 3   Global Step: 64790   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:04:57,984-Speed 3333.94 samples/sec   Loss 3.6132   LearningRate 0.0649   Epoch: 3   Global Step: 64800   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:05:01,054-Speed 3336.46 samples/sec   Loss 3.5827   LearningRate 0.0649   Epoch: 3   Global Step: 64810   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:05:04,144-Speed 3315.05 samples/sec   Loss 3.6306   LearningRate 0.0649   Epoch: 3   Global Step: 64820   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:05:07,214-Speed 3336.93 samples/sec   Loss 3.5461   LearningRate 0.0649   Epoch: 3   Global Step: 64830   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:05:10,278-Speed 3342.67 samples/sec   Loss 3.5487   LearningRate 0.0649   Epoch: 3   Global Step: 64840   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:05:13,341-Speed 3343.93 samples/sec   Loss 3.6309   LearningRate 0.0649   Epoch: 3   Global Step: 64850   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:05:16,400-Speed 3347.61 samples/sec   Loss 3.5849   LearningRate 0.0649   Epoch: 3   Global Step: 64860   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:05:19,466-Speed 3340.62 samples/sec   Loss 3.5633   LearningRate 0.0649   Epoch: 3   Global Step: 64870   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:05:22,545-Speed 3326.23 samples/sec   Loss 3.5398   LearningRate 0.0649   Epoch: 3   Global Step: 64880   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:05:25,615-Speed 3337.43 samples/sec   Loss 3.5447   LearningRate 0.0649   Epoch: 3   Global Step: 64890   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:05:28,686-Speed 3336.40 samples/sec   Loss 3.5161   LearningRate 0.0649   Epoch: 3   Global Step: 64900   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:05:31,746-Speed 3347.37 samples/sec   Loss 3.5964   LearningRate 0.0649   Epoch: 3   Global Step: 64910   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-11 06:05:34,816-Speed 3336.03 samples/sec   Loss 3.5978   LearningRate 0.0649   Epoch: 3   Global Step: 64920   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-11 06:05:37,884-Speed 3338.50 samples/sec   Loss 3.5043   LearningRate 0.0649   Epoch: 3   Global Step: 64930   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-11 06:05:40,958-Speed 3331.13 samples/sec   Loss 3.5566   LearningRate 0.0649   Epoch: 3   Global Step: 64940   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-11 06:05:44,020-Speed 3345.51 samples/sec   Loss 3.5388   LearningRate 0.0649   Epoch: 3   Global Step: 64950   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-11 06:05:47,183-Speed 3237.66 samples/sec   Loss 3.6122   LearningRate 0.0649   Epoch: 3   Global Step: 64960   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-11 06:05:50,272-Speed 3316.63 samples/sec   Loss 3.6111   LearningRate 0.0649   Epoch: 3   Global Step: 64970   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-11 06:05:53,363-Speed 3313.77 samples/sec   Loss 3.5431   LearningRate 0.0649   Epoch: 3   Global Step: 64980   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-11 06:05:56,464-Speed 3303.40 samples/sec   Loss 3.5500   LearningRate 0.0649   Epoch: 3   Global Step: 64990   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-11 06:05:59,542-Speed 3327.42 samples/sec   Loss 3.5172   LearningRate 0.0648   Epoch: 3   Global Step: 65000   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-11 06:06:02,609-Speed 3339.84 samples/sec   Loss 3.5078   LearningRate 0.0648   Epoch: 3   Global Step: 65010   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:06:05,671-Speed 3344.66 samples/sec   Loss 3.6001   LearningRate 0.0648   Epoch: 3   Global Step: 65020   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:06:08,744-Speed 3332.41 samples/sec   Loss 3.5735   LearningRate 0.0648   Epoch: 3   Global Step: 65030   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:06:11,813-Speed 3338.06 samples/sec   Loss 3.5536   LearningRate 0.0648   Epoch: 3   Global Step: 65040   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:06:14,889-Speed 3329.86 samples/sec   Loss 3.6641   LearningRate 0.0648   Epoch: 3   Global Step: 65050   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:06:17,951-Speed 3345.27 samples/sec   Loss 3.5782   LearningRate 0.0648   Epoch: 3   Global Step: 65060   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:06:21,015-Speed 3342.42 samples/sec   Loss 3.5112   LearningRate 0.0648   Epoch: 3   Global Step: 65070   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:06:24,081-Speed 3340.35 samples/sec   Loss 3.4432   LearningRate 0.0648   Epoch: 3   Global Step: 65080   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:06:27,144-Speed 3344.06 samples/sec   Loss 3.6003   LearningRate 0.0648   Epoch: 3   Global Step: 65090   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:06:30,211-Speed 3340.09 samples/sec   Loss 3.5174   LearningRate 0.0648   Epoch: 3   Global Step: 65100   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:06:33,282-Speed 3334.79 samples/sec   Loss 3.5496   LearningRate 0.0648   Epoch: 3   Global Step: 65110   Fp16 Grad Scale: 262144   Required: 28 hours
Training: 2022-04-11 06:06:36,342-Speed 3347.06 samples/sec   Loss 3.5028   LearningRate 0.0648   Epoch: 3   Global Step: 65120   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-11 06:06:39,404-Speed 3345.12 samples/sec   Loss 3.4994   LearningRate 0.0648   Epoch: 3   Global Step: 65130   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-11 06:06:42,483-Speed 3326.32 samples/sec   Loss 3.5380   LearningRate 0.0648   Epoch: 3   Global Step: 65140   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-11 06:06:45,558-Speed 3331.56 samples/sec   Loss 3.5984   LearningRate 0.0648   Epoch: 3   Global Step: 65150   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-11 06:06:48,626-Speed 3338.17 samples/sec   Loss 3.5210   LearningRate 0.0648   Epoch: 3   Global Step: 65160   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-11 06:06:51,705-Speed 3326.76 samples/sec   Loss 3.6106   LearningRate 0.0648   Epoch: 3   Global Step: 65170   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-11 06:06:54,776-Speed 3334.85 samples/sec   Loss 3.5276   LearningRate 0.0648   Epoch: 3   Global Step: 65180   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-11 06:06:57,844-Speed 3339.02 samples/sec   Loss 3.5506   LearningRate 0.0648   Epoch: 3   Global Step: 65190   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-11 06:07:00,964-Speed 3282.88 samples/sec   Loss 3.5627   LearningRate 0.0648   Epoch: 3   Global Step: 65200   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-11 06:07:04,108-Speed 3257.76 samples/sec   Loss 3.5781   LearningRate 0.0647   Epoch: 3   Global Step: 65210   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-11 06:07:07,175-Speed 3339.41 samples/sec   Loss 3.4670   LearningRate 0.0647   Epoch: 3   Global Step: 65220   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:07:10,241-Speed 3340.95 samples/sec   Loss 3.5564   LearningRate 0.0647   Epoch: 3   Global Step: 65230   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:07:13,314-Speed 3333.46 samples/sec   Loss 3.6221   LearningRate 0.0647   Epoch: 3   Global Step: 65240   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:07:16,385-Speed 3334.76 samples/sec   Loss 3.5729   LearningRate 0.0647   Epoch: 3   Global Step: 65250   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:07:19,437-Speed 3355.58 samples/sec   Loss 3.5606   LearningRate 0.0647   Epoch: 3   Global Step: 65260   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-11 06:07:22,521-Speed 3321.21 samples/sec   Loss 3.5790   LearningRate 0.0647   Epoch: 3   Global Step: 65270   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-11 06:07:25,585-Speed 3342.40 samples/sec   Loss 3.5344   LearningRate 0.0647   Epoch: 3   Global Step: 65280   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-11 06:07:28,652-Speed 3340.33 samples/sec   Loss 3.6070   LearningRate 0.0647   Epoch: 3   Global Step: 65290   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-11 06:07:31,735-Speed 3321.38 samples/sec   Loss 3.5904   LearningRate 0.0647   Epoch: 3   Global Step: 65300   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-11 06:07:34,808-Speed 3333.68 samples/sec   Loss 3.4997   LearningRate 0.0647   Epoch: 3   Global Step: 65310   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-11 06:07:37,892-Speed 3321.23 samples/sec   Loss 3.5360   LearningRate 0.0647   Epoch: 3   Global Step: 65320   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-11 06:07:40,979-Speed 3317.90 samples/sec   Loss 3.5430   LearningRate 0.0647   Epoch: 3   Global Step: 65330   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-11 06:07:44,053-Speed 3331.82 samples/sec   Loss 3.6203   LearningRate 0.0647   Epoch: 3   Global Step: 65340   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-11 06:07:47,134-Speed 3324.78 samples/sec   Loss 3.6097   LearningRate 0.0647   Epoch: 3   Global Step: 65350   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-11 06:07:50,262-Speed 3274.60 samples/sec   Loss 3.4713   LearningRate 0.0647   Epoch: 3   Global Step: 65360   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:07:53,329-Speed 3339.03 samples/sec   Loss 3.5095   LearningRate 0.0647   Epoch: 3   Global Step: 65370   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:07:56,424-Speed 3309.13 samples/sec   Loss 3.4966   LearningRate 0.0647   Epoch: 3   Global Step: 65380   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:07:59,504-Speed 3325.28 samples/sec   Loss 3.6118   LearningRate 0.0647   Epoch: 3   Global Step: 65390   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:08:02,696-Speed 3209.41 samples/sec   Loss 3.5542   LearningRate 0.0647   Epoch: 3   Global Step: 65400   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:08:05,803-Speed 3296.25 samples/sec   Loss 3.5388   LearningRate 0.0647   Epoch: 3   Global Step: 65410   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:08:08,856-Speed 3355.40 samples/sec   Loss 3.6043   LearningRate 0.0646   Epoch: 3   Global Step: 65420   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-11 06:08:11,946-Speed 3314.37 samples/sec   Loss 3.6323   LearningRate 0.0646   Epoch: 3   Global Step: 65430   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-11 06:08:15,026-Speed 3325.39 samples/sec   Loss 3.5807   LearningRate 0.0646   Epoch: 3   Global Step: 65440   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-11 06:08:18,106-Speed 3326.03 samples/sec   Loss 3.5370   LearningRate 0.0646   Epoch: 3   Global Step: 65450   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-11 06:08:21,188-Speed 3323.44 samples/sec   Loss 3.6002   LearningRate 0.0646   Epoch: 3   Global Step: 65460   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-11 06:08:24,256-Speed 3338.28 samples/sec   Loss 3.5875   LearningRate 0.0646   Epoch: 3   Global Step: 65470   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-11 06:08:27,354-Speed 3305.98 samples/sec   Loss 3.6101   LearningRate 0.0646   Epoch: 3   Global Step: 65480   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-11 06:08:30,422-Speed 3338.45 samples/sec   Loss 3.5693   LearningRate 0.0646   Epoch: 3   Global Step: 65490   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-11 06:08:33,491-Speed 3337.40 samples/sec   Loss 3.4478   LearningRate 0.0646   Epoch: 3   Global Step: 65500   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-11 06:08:36,565-Speed 3331.41 samples/sec   Loss 3.5042   LearningRate 0.0646   Epoch: 3   Global Step: 65510   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-11 06:08:39,632-Speed 3340.09 samples/sec   Loss 3.4975   LearningRate 0.0646   Epoch: 3   Global Step: 65520   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:08:42,705-Speed 3333.21 samples/sec   Loss 3.5736   LearningRate 0.0646   Epoch: 3   Global Step: 65530   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:08:45,780-Speed 3330.69 samples/sec   Loss 3.5880   LearningRate 0.0646   Epoch: 3   Global Step: 65540   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-11 06:08:48,856-Speed 3329.10 samples/sec   Loss 3.5567   LearningRate 0.0646   Epoch: 3   Global Step: 65550   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-11 06:08:51,934-Speed 3327.64 samples/sec   Loss 3.5216   LearningRate 0.0646   Epoch: 3   Global Step: 65560   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-11 06:08:55,006-Speed 3333.82 samples/sec   Loss 3.5048   LearningRate 0.0646   Epoch: 3   Global Step: 65570   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-11 06:08:58,084-Speed 3328.33 samples/sec   Loss 3.5657   LearningRate 0.0646   Epoch: 3   Global Step: 65580   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-11 06:09:01,147-Speed 3343.99 samples/sec   Loss 3.5730   LearningRate 0.0646   Epoch: 3   Global Step: 65590   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-11 06:09:04,217-Speed 3335.88 samples/sec   Loss 3.5753   LearningRate 0.0646   Epoch: 3   Global Step: 65600   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-11 06:09:07,330-Speed 3291.13 samples/sec   Loss 3.5244   LearningRate 0.0646   Epoch: 3   Global Step: 65610   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-11 06:09:10,407-Speed 3328.35 samples/sec   Loss 3.5235   LearningRate 0.0645   Epoch: 3   Global Step: 65620   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-11 06:09:13,491-Speed 3321.32 samples/sec   Loss 3.6095   LearningRate 0.0645   Epoch: 3   Global Step: 65630   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-11 06:09:16,563-Speed 3333.43 samples/sec   Loss 3.5369   LearningRate 0.0645   Epoch: 3   Global Step: 65640   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:09:19,632-Speed 3337.25 samples/sec   Loss 3.6321   LearningRate 0.0645   Epoch: 3   Global Step: 65650   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:09:22,702-Speed 3336.94 samples/sec   Loss 3.5993   LearningRate 0.0645   Epoch: 3   Global Step: 65660   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:09:25,765-Speed 3343.81 samples/sec   Loss 3.6147   LearningRate 0.0645   Epoch: 3   Global Step: 65670   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:09:28,841-Speed 3329.48 samples/sec   Loss 3.5462   LearningRate 0.0645   Epoch: 3   Global Step: 65680   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:09:31,913-Speed 3333.97 samples/sec   Loss 3.6193   LearningRate 0.0645   Epoch: 3   Global Step: 65690   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:09:34,979-Speed 3341.44 samples/sec   Loss 3.5788   LearningRate 0.0645   Epoch: 3   Global Step: 65700   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:09:38,097-Speed 3284.60 samples/sec   Loss 3.5963   LearningRate 0.0645   Epoch: 3   Global Step: 65710   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:09:41,178-Speed 3324.44 samples/sec   Loss 3.6394   LearningRate 0.0645   Epoch: 3   Global Step: 65720   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:09:44,272-Speed 3309.66 samples/sec   Loss 3.6014   LearningRate 0.0645   Epoch: 3   Global Step: 65730   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:09:47,378-Speed 3298.05 samples/sec   Loss 3.6373   LearningRate 0.0645   Epoch: 3   Global Step: 65740   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:09:50,452-Speed 3332.00 samples/sec   Loss 3.5522   LearningRate 0.0645   Epoch: 3   Global Step: 65750   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:09:53,522-Speed 3336.70 samples/sec   Loss 3.5439   LearningRate 0.0645   Epoch: 3   Global Step: 65760   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:09:56,579-Speed 3350.64 samples/sec   Loss 3.5193   LearningRate 0.0645   Epoch: 3   Global Step: 65770   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-11 06:09:59,781-Speed 3198.79 samples/sec   Loss 3.5181   LearningRate 0.0645   Epoch: 3   Global Step: 65780   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-11 06:10:02,852-Speed 3334.23 samples/sec   Loss 3.5374   LearningRate 0.0645   Epoch: 3   Global Step: 65790   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-11 06:10:05,984-Speed 3270.73 samples/sec   Loss 3.5111   LearningRate 0.0645   Epoch: 3   Global Step: 65800   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-11 06:10:09,188-Speed 3196.94 samples/sec   Loss 3.5388   LearningRate 0.0645   Epoch: 3   Global Step: 65810   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-11 06:10:12,345-Speed 3243.77 samples/sec   Loss 3.5462   LearningRate 0.0645   Epoch: 3   Global Step: 65820   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-11 06:10:15,502-Speed 3244.38 samples/sec   Loss 3.5076   LearningRate 0.0644   Epoch: 3   Global Step: 65830   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-11 06:10:18,688-Speed 3214.95 samples/sec   Loss 3.6081   LearningRate 0.0644   Epoch: 3   Global Step: 65840   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-11 06:10:21,856-Speed 3234.29 samples/sec   Loss 3.6180   LearningRate 0.0644   Epoch: 3   Global Step: 65850   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-11 06:10:24,954-Speed 3305.60 samples/sec   Loss 3.6297   LearningRate 0.0644   Epoch: 3   Global Step: 65860   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-11 06:10:28,044-Speed 3314.07 samples/sec   Loss 3.5602   LearningRate 0.0644   Epoch: 3   Global Step: 65870   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:10:31,118-Speed 3332.68 samples/sec   Loss 3.5002   LearningRate 0.0644   Epoch: 3   Global Step: 65880   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:10:34,195-Speed 3327.98 samples/sec   Loss 3.5776   LearningRate 0.0644   Epoch: 3   Global Step: 65890   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:10:37,262-Speed 3340.42 samples/sec   Loss 3.5475   LearningRate 0.0644   Epoch: 3   Global Step: 65900   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:10:40,352-Speed 3314.32 samples/sec   Loss 3.5582   LearningRate 0.0644   Epoch: 3   Global Step: 65910   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:10:43,428-Speed 3329.96 samples/sec   Loss 3.5129   LearningRate 0.0644   Epoch: 3   Global Step: 65920   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:10:46,531-Speed 3300.59 samples/sec   Loss 3.5623   LearningRate 0.0644   Epoch: 3   Global Step: 65930   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:10:49,609-Speed 3327.33 samples/sec   Loss 3.4554   LearningRate 0.0644   Epoch: 3   Global Step: 65940   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-11 06:10:52,698-Speed 3316.32 samples/sec   Loss 3.5704   LearningRate 0.0644   Epoch: 3   Global Step: 65950   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-11 06:10:55,826-Speed 3274.04 samples/sec   Loss 3.4977   LearningRate 0.0644   Epoch: 3   Global Step: 65960   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-11 06:10:58,891-Speed 3341.93 samples/sec   Loss 3.5807   LearningRate 0.0644   Epoch: 3   Global Step: 65970   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-11 06:11:01,974-Speed 3322.54 samples/sec   Loss 3.5928   LearningRate 0.0644   Epoch: 3   Global Step: 65980   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-11 06:11:05,123-Speed 3252.55 samples/sec   Loss 3.5873   LearningRate 0.0644   Epoch: 3   Global Step: 65990   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-11 06:11:08,228-Speed 3298.63 samples/sec   Loss 3.5045   LearningRate 0.0644   Epoch: 3   Global Step: 66000   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-11 06:11:52,306-[lfw][66000]XNorm: 22.518547
Training: 2022-04-11 06:11:52,307-[lfw][66000]Accuracy-Flip: 0.99817+-0.00273
Training: 2022-04-11 06:11:52,307-[lfw][66000]Accuracy-Highest: 0.99817
Training: 2022-04-11 06:12:43,796-[cfp_fp][66000]XNorm: 20.936121
Training: 2022-04-11 06:12:43,796-[cfp_fp][66000]Accuracy-Flip: 0.98043+-0.00679
Training: 2022-04-11 06:12:43,797-[cfp_fp][66000]Accuracy-Highest: 0.98414
Training: 2022-04-11 06:13:28,014-[agedb_30][66000]XNorm: 22.619924
Training: 2022-04-11 06:13:28,014-[agedb_30][66000]Accuracy-Flip: 0.98033+-0.00752
Training: 2022-04-11 06:13:28,015-[agedb_30][66000]Accuracy-Highest: 0.98100
Training: 2022-04-11 06:13:31,096-Speed 71.67 samples/sec   Loss 3.5843   LearningRate 0.0644   Epoch: 3   Global Step: 66010   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-11 06:13:34,212-Speed 3286.89 samples/sec   Loss 3.5550   LearningRate 0.0644   Epoch: 3   Global Step: 66020   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-11 06:13:37,294-Speed 3323.07 samples/sec   Loss 3.5281   LearningRate 0.0644   Epoch: 3   Global Step: 66030   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-11 06:13:40,373-Speed 3326.08 samples/sec   Loss 3.4940   LearningRate 0.0643   Epoch: 3   Global Step: 66040   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:13:43,445-Speed 3333.93 samples/sec   Loss 3.4853   LearningRate 0.0643   Epoch: 3   Global Step: 66050   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:13:46,607-Speed 3239.71 samples/sec   Loss 3.5866   LearningRate 0.0643   Epoch: 3   Global Step: 66060   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:13:49,823-Speed 3184.85 samples/sec   Loss 3.5508   LearningRate 0.0643   Epoch: 3   Global Step: 66070   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:13:52,899-Speed 3330.33 samples/sec   Loss 3.5522   LearningRate 0.0643   Epoch: 3   Global Step: 66080   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:13:55,963-Speed 3341.76 samples/sec   Loss 3.4717   LearningRate 0.0643   Epoch: 3   Global Step: 66090   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:13:59,024-Speed 3346.76 samples/sec   Loss 3.6382   LearningRate 0.0643   Epoch: 3   Global Step: 66100   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:14:02,082-Speed 3349.88 samples/sec   Loss 3.5295   LearningRate 0.0643   Epoch: 3   Global Step: 66110   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:14:05,149-Speed 3338.63 samples/sec   Loss 3.5347   LearningRate 0.0643   Epoch: 3   Global Step: 66120   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:14:08,211-Speed 3344.68 samples/sec   Loss 3.5745   LearningRate 0.0643   Epoch: 3   Global Step: 66130   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:14:11,414-Speed 3198.04 samples/sec   Loss 3.5314   LearningRate 0.0643   Epoch: 3   Global Step: 66140   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:14:14,621-Speed 3194.35 samples/sec   Loss 3.5771   LearningRate 0.0643   Epoch: 3   Global Step: 66150   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:14:17,738-Speed 3285.56 samples/sec   Loss 3.4333   LearningRate 0.0643   Epoch: 3   Global Step: 66160   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:14:20,833-Speed 3309.94 samples/sec   Loss 3.5185   LearningRate 0.0643   Epoch: 3   Global Step: 66170   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:14:23,896-Speed 3343.39 samples/sec   Loss 3.5917   LearningRate 0.0643   Epoch: 3   Global Step: 66180   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:14:26,961-Speed 3341.64 samples/sec   Loss 3.5265   LearningRate 0.0643   Epoch: 3   Global Step: 66190   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:14:30,022-Speed 3346.43 samples/sec   Loss 3.4529   LearningRate 0.0643   Epoch: 3   Global Step: 66200   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:14:33,101-Speed 3326.75 samples/sec   Loss 3.4691   LearningRate 0.0643   Epoch: 3   Global Step: 66210   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:14:36,178-Speed 3327.79 samples/sec   Loss 3.5424   LearningRate 0.0643   Epoch: 3   Global Step: 66220   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:14:39,269-Speed 3314.64 samples/sec   Loss 3.5865   LearningRate 0.0643   Epoch: 3   Global Step: 66230   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:14:42,353-Speed 3321.09 samples/sec   Loss 3.5911   LearningRate 0.0643   Epoch: 3   Global Step: 66240   Fp16 Grad Scale: 262144   Required: 28 hours
Training: 2022-04-11 06:14:45,416-Speed 3344.12 samples/sec   Loss 3.4704   LearningRate 0.0642   Epoch: 3   Global Step: 66250   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:14:48,478-Speed 3344.59 samples/sec   Loss 3.4568   LearningRate 0.0642   Epoch: 3   Global Step: 66260   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:14:51,563-Speed 3320.75 samples/sec   Loss 3.4702   LearningRate 0.0642   Epoch: 3   Global Step: 66270   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:14:54,670-Speed 3295.81 samples/sec   Loss 3.5284   LearningRate 0.0642   Epoch: 3   Global Step: 66280   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:14:57,757-Speed 3318.42 samples/sec   Loss 3.5712   LearningRate 0.0642   Epoch: 3   Global Step: 66290   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:15:00,823-Speed 3340.75 samples/sec   Loss 3.5225   LearningRate 0.0642   Epoch: 3   Global Step: 66300   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:15:03,902-Speed 3326.28 samples/sec   Loss 3.5272   LearningRate 0.0642   Epoch: 3   Global Step: 66310   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:15:07,006-Speed 3299.52 samples/sec   Loss 3.5685   LearningRate 0.0642   Epoch: 3   Global Step: 66320   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:15:10,071-Speed 3342.78 samples/sec   Loss 3.5108   LearningRate 0.0642   Epoch: 3   Global Step: 66330   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:15:13,135-Speed 3342.81 samples/sec   Loss 3.4416   LearningRate 0.0642   Epoch: 3   Global Step: 66340   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:15:16,183-Speed 3359.54 samples/sec   Loss 3.4922   LearningRate 0.0642   Epoch: 3   Global Step: 66350   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-11 06:15:19,300-Speed 3286.49 samples/sec   Loss 3.5490   LearningRate 0.0642   Epoch: 3   Global Step: 66360   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-11 06:15:22,368-Speed 3337.92 samples/sec   Loss 3.4851   LearningRate 0.0642   Epoch: 3   Global Step: 66370   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-11 06:15:25,435-Speed 3339.28 samples/sec   Loss 3.5463   LearningRate 0.0642   Epoch: 3   Global Step: 66380   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-11 06:15:28,510-Speed 3331.75 samples/sec   Loss 3.5220   LearningRate 0.0642   Epoch: 3   Global Step: 66390   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-11 06:15:31,573-Speed 3343.18 samples/sec   Loss 3.4501   LearningRate 0.0642   Epoch: 3   Global Step: 66400   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-11 06:15:34,653-Speed 3326.00 samples/sec   Loss 3.5353   LearningRate 0.0642   Epoch: 3   Global Step: 66410   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-11 06:15:37,741-Speed 3316.32 samples/sec   Loss 3.4938   LearningRate 0.0642   Epoch: 3   Global Step: 66420   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-11 06:15:40,880-Speed 3263.88 samples/sec   Loss 3.5411   LearningRate 0.0642   Epoch: 3   Global Step: 66430   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-11 06:15:43,946-Speed 3339.95 samples/sec   Loss 3.5112   LearningRate 0.0642   Epoch: 3   Global Step: 66440   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-11 06:15:47,022-Speed 3329.66 samples/sec   Loss 3.5632   LearningRate 0.0642   Epoch: 3   Global Step: 66450   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:15:50,090-Speed 3337.97 samples/sec   Loss 3.5676   LearningRate 0.0641   Epoch: 3   Global Step: 66460   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:15:53,169-Speed 3327.07 samples/sec   Loss 3.5173   LearningRate 0.0641   Epoch: 3   Global Step: 66470   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:15:56,249-Speed 3325.92 samples/sec   Loss 3.5692   LearningRate 0.0641   Epoch: 3   Global Step: 66480   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:15:59,319-Speed 3336.03 samples/sec   Loss 3.4937   LearningRate 0.0641   Epoch: 3   Global Step: 66490   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:16:02,493-Speed 3227.76 samples/sec   Loss 3.4872   LearningRate 0.0641   Epoch: 3   Global Step: 66500   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:16:05,568-Speed 3330.05 samples/sec   Loss 3.5389   LearningRate 0.0641   Epoch: 3   Global Step: 66510   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:16:08,642-Speed 3331.81 samples/sec   Loss 3.5044   LearningRate 0.0641   Epoch: 3   Global Step: 66520   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:16:11,719-Speed 3328.85 samples/sec   Loss 3.5394   LearningRate 0.0641   Epoch: 3   Global Step: 66530   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:16:14,793-Speed 3331.94 samples/sec   Loss 3.5546   LearningRate 0.0641   Epoch: 3   Global Step: 66540   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:16:17,869-Speed 3329.53 samples/sec   Loss 3.5277   LearningRate 0.0641   Epoch: 3   Global Step: 66550   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:16:20,946-Speed 3329.06 samples/sec   Loss 3.4846   LearningRate 0.0641   Epoch: 3   Global Step: 66560   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:16:24,012-Speed 3341.26 samples/sec   Loss 3.4938   LearningRate 0.0641   Epoch: 3   Global Step: 66570   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:16:27,083-Speed 3334.68 samples/sec   Loss 3.5752   LearningRate 0.0641   Epoch: 3   Global Step: 66580   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:16:30,171-Speed 3317.52 samples/sec   Loss 3.5109   LearningRate 0.0641   Epoch: 3   Global Step: 66590   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:16:33,275-Speed 3299.29 samples/sec   Loss 3.5924   LearningRate 0.0641   Epoch: 3   Global Step: 66600   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:16:36,351-Speed 3329.45 samples/sec   Loss 3.5013   LearningRate 0.0641   Epoch: 3   Global Step: 66610   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:16:39,420-Speed 3338.08 samples/sec   Loss 3.5488   LearningRate 0.0641   Epoch: 3   Global Step: 66620   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:16:42,506-Speed 3319.01 samples/sec   Loss 3.5260   LearningRate 0.0641   Epoch: 3   Global Step: 66630   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:16:45,574-Speed 3337.88 samples/sec   Loss 3.5485   LearningRate 0.0641   Epoch: 3   Global Step: 66640   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:16:48,639-Speed 3341.82 samples/sec   Loss 3.4616   LearningRate 0.0641   Epoch: 3   Global Step: 66650   Fp16 Grad Scale: 262144   Required: 28 hours
Training: 2022-04-11 06:16:51,701-Speed 3345.25 samples/sec   Loss 3.5297   LearningRate 0.0640   Epoch: 3   Global Step: 66660   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:16:54,850-Speed 3252.26 samples/sec   Loss 3.3839   LearningRate 0.0640   Epoch: 3   Global Step: 66670   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:16:57,937-Speed 3318.77 samples/sec   Loss 3.5513   LearningRate 0.0640   Epoch: 3   Global Step: 66680   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:17:01,012-Speed 3330.72 samples/sec   Loss 3.4890   LearningRate 0.0640   Epoch: 3   Global Step: 66690   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:17:04,096-Speed 3321.41 samples/sec   Loss 3.5071   LearningRate 0.0640   Epoch: 3   Global Step: 66700   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:17:07,229-Speed 3268.45 samples/sec   Loss 3.5539   LearningRate 0.0640   Epoch: 3   Global Step: 66710   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:17:10,294-Speed 3341.66 samples/sec   Loss 3.4737   LearningRate 0.0640   Epoch: 3   Global Step: 66720   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:17:13,357-Speed 3344.07 samples/sec   Loss 3.5552   LearningRate 0.0640   Epoch: 3   Global Step: 66730   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:17:16,433-Speed 3329.32 samples/sec   Loss 3.6038   LearningRate 0.0640   Epoch: 3   Global Step: 66740   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:17:19,500-Speed 3340.13 samples/sec   Loss 3.5221   LearningRate 0.0640   Epoch: 3   Global Step: 66750   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:17:22,792-Speed 3111.27 samples/sec   Loss 3.6277   LearningRate 0.0640   Epoch: 3   Global Step: 66760   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:17:52,718-Speed 342.19 samples/sec   Loss 3.1324   LearningRate 0.0640   Epoch: 4   Global Step: 66770   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:17:56,290-Speed 2867.83 samples/sec   Loss 2.8918   LearningRate 0.0640   Epoch: 4   Global Step: 66780   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:17:59,376-Speed 3319.13 samples/sec   Loss 3.0033   LearningRate 0.0640   Epoch: 4   Global Step: 66790   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:18:02,450-Speed 3332.56 samples/sec   Loss 2.9091   LearningRate 0.0640   Epoch: 4   Global Step: 66800   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:18:05,515-Speed 3341.71 samples/sec   Loss 2.9788   LearningRate 0.0640   Epoch: 4   Global Step: 66810   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:18:08,588-Speed 3333.46 samples/sec   Loss 2.8611   LearningRate 0.0640   Epoch: 4   Global Step: 66820   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:18:11,659-Speed 3335.09 samples/sec   Loss 2.8906   LearningRate 0.0640   Epoch: 4   Global Step: 66830   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:18:14,736-Speed 3329.38 samples/sec   Loss 2.9637   LearningRate 0.0640   Epoch: 4   Global Step: 66840   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:18:17,898-Speed 3239.55 samples/sec   Loss 2.8853   LearningRate 0.0640   Epoch: 4   Global Step: 66850   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:18:20,970-Speed 3334.25 samples/sec   Loss 2.9431   LearningRate 0.0640   Epoch: 4   Global Step: 66860   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:18:24,062-Speed 3313.01 samples/sec   Loss 2.9405   LearningRate 0.0639   Epoch: 4   Global Step: 66870   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:18:27,120-Speed 3349.42 samples/sec   Loss 2.9000   LearningRate 0.0639   Epoch: 4   Global Step: 66880   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:18:30,682-Speed 2876.01 samples/sec   Loss 2.9038   LearningRate 0.0639   Epoch: 4   Global Step: 66890   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:18:33,767-Speed 3320.45 samples/sec   Loss 2.9035   LearningRate 0.0639   Epoch: 4   Global Step: 66900   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-11 06:18:36,866-Speed 3304.79 samples/sec   Loss 3.0019   LearningRate 0.0639   Epoch: 4   Global Step: 66910   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-11 06:18:39,931-Speed 3342.74 samples/sec   Loss 2.8769   LearningRate 0.0639   Epoch: 4   Global Step: 66920   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-11 06:18:43,010-Speed 3326.44 samples/sec   Loss 2.8030   LearningRate 0.0639   Epoch: 4   Global Step: 66930   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-11 06:18:46,097-Speed 3318.15 samples/sec   Loss 2.9310   LearningRate 0.0639   Epoch: 4   Global Step: 66940   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-11 06:18:49,166-Speed 3337.36 samples/sec   Loss 2.9847   LearningRate 0.0639   Epoch: 4   Global Step: 66950   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-11 06:18:52,238-Speed 3334.24 samples/sec   Loss 2.9684   LearningRate 0.0639   Epoch: 4   Global Step: 66960   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-11 06:18:55,307-Speed 3337.01 samples/sec   Loss 2.9000   LearningRate 0.0639   Epoch: 4   Global Step: 66970   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-11 06:18:58,377-Speed 3337.35 samples/sec   Loss 2.9619   LearningRate 0.0639   Epoch: 4   Global Step: 66980   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-11 06:19:01,516-Speed 3262.79 samples/sec   Loss 2.9059   LearningRate 0.0639   Epoch: 4   Global Step: 66990   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-11 06:19:04,680-Speed 3237.00 samples/sec   Loss 2.9479   LearningRate 0.0639   Epoch: 4   Global Step: 67000   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:19:07,773-Speed 3311.26 samples/sec   Loss 2.8748   LearningRate 0.0639   Epoch: 4   Global Step: 67010   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:19:10,846-Speed 3333.75 samples/sec   Loss 2.9472   LearningRate 0.0639   Epoch: 4   Global Step: 67020   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:19:13,951-Speed 3298.15 samples/sec   Loss 2.9845   LearningRate 0.0639   Epoch: 4   Global Step: 67030   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:19:17,061-Speed 3293.96 samples/sec   Loss 2.9054   LearningRate 0.0639   Epoch: 4   Global Step: 67040   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:19:20,164-Speed 3299.86 samples/sec   Loss 2.9380   LearningRate 0.0639   Epoch: 4   Global Step: 67050   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:19:23,237-Speed 3334.51 samples/sec   Loss 2.8997   LearningRate 0.0639   Epoch: 4   Global Step: 67060   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:19:26,409-Speed 3228.09 samples/sec   Loss 2.8895   LearningRate 0.0639   Epoch: 4   Global Step: 67070   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:19:29,479-Speed 3336.69 samples/sec   Loss 2.9813   LearningRate 0.0638   Epoch: 4   Global Step: 67080   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:19:32,551-Speed 3334.18 samples/sec   Loss 2.9744   LearningRate 0.0638   Epoch: 4   Global Step: 67090   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:19:35,775-Speed 3177.04 samples/sec   Loss 2.9053   LearningRate 0.0638   Epoch: 4   Global Step: 67100   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:19:38,853-Speed 3327.22 samples/sec   Loss 2.9698   LearningRate 0.0638   Epoch: 4   Global Step: 67110   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:19:41,948-Speed 3309.09 samples/sec   Loss 2.9347   LearningRate 0.0638   Epoch: 4   Global Step: 67120   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:19:45,032-Speed 3321.84 samples/sec   Loss 2.8888   LearningRate 0.0638   Epoch: 4   Global Step: 67130   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:19:48,144-Speed 3291.07 samples/sec   Loss 2.9090   LearningRate 0.0638   Epoch: 4   Global Step: 67140   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:19:51,224-Speed 3325.83 samples/sec   Loss 2.9569   LearningRate 0.0638   Epoch: 4   Global Step: 67150   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:19:54,324-Speed 3303.68 samples/sec   Loss 3.0405   LearningRate 0.0638   Epoch: 4   Global Step: 67160   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:19:57,518-Speed 3207.90 samples/sec   Loss 3.0425   LearningRate 0.0638   Epoch: 4   Global Step: 67170   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:20:00,624-Speed 3296.80 samples/sec   Loss 2.9405   LearningRate 0.0638   Epoch: 4   Global Step: 67180   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:20:03,726-Speed 3302.20 samples/sec   Loss 2.9235   LearningRate 0.0638   Epoch: 4   Global Step: 67190   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:20:06,783-Speed 3350.75 samples/sec   Loss 2.9191   LearningRate 0.0638   Epoch: 4   Global Step: 67200   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:20:09,868-Speed 3320.14 samples/sec   Loss 2.9348   LearningRate 0.0638   Epoch: 4   Global Step: 67210   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:20:13,006-Speed 3264.29 samples/sec   Loss 2.9751   LearningRate 0.0638   Epoch: 4   Global Step: 67220   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-11 06:20:16,111-Speed 3298.44 samples/sec   Loss 3.0089   LearningRate 0.0638   Epoch: 4   Global Step: 67230   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-11 06:20:19,213-Speed 3302.51 samples/sec   Loss 2.9558   LearningRate 0.0638   Epoch: 4   Global Step: 67240   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-11 06:20:22,291-Speed 3328.55 samples/sec   Loss 2.9625   LearningRate 0.0638   Epoch: 4   Global Step: 67250   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-11 06:20:25,372-Speed 3323.91 samples/sec   Loss 2.9221   LearningRate 0.0638   Epoch: 4   Global Step: 67260   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-11 06:20:28,448-Speed 3330.28 samples/sec   Loss 2.9384   LearningRate 0.0638   Epoch: 4   Global Step: 67270   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-11 06:20:31,529-Speed 3323.40 samples/sec   Loss 2.9845   LearningRate 0.0638   Epoch: 4   Global Step: 67280   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-11 06:20:34,642-Speed 3290.62 samples/sec   Loss 2.9667   LearningRate 0.0637   Epoch: 4   Global Step: 67290   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-11 06:20:37,721-Speed 3327.14 samples/sec   Loss 2.8861   LearningRate 0.0637   Epoch: 4   Global Step: 67300   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-11 06:20:40,826-Speed 3298.10 samples/sec   Loss 2.9845   LearningRate 0.0637   Epoch: 4   Global Step: 67310   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-11 06:20:43,958-Speed 3270.61 samples/sec   Loss 2.9855   LearningRate 0.0637   Epoch: 4   Global Step: 67320   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:20:47,065-Speed 3295.96 samples/sec   Loss 2.9635   LearningRate 0.0637   Epoch: 4   Global Step: 67330   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:20:50,144-Speed 3326.77 samples/sec   Loss 3.0150   LearningRate 0.0637   Epoch: 4   Global Step: 67340   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:20:53,219-Speed 3331.21 samples/sec   Loss 2.9661   LearningRate 0.0637   Epoch: 4   Global Step: 67350   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:20:56,286-Speed 3339.92 samples/sec   Loss 3.0485   LearningRate 0.0637   Epoch: 4   Global Step: 67360   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:20:59,383-Speed 3307.33 samples/sec   Loss 2.8834   LearningRate 0.0637   Epoch: 4   Global Step: 67370   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:21:02,448-Speed 3341.85 samples/sec   Loss 2.9045   LearningRate 0.0637   Epoch: 4   Global Step: 67380   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:21:05,526-Speed 3327.01 samples/sec   Loss 2.9539   LearningRate 0.0637   Epoch: 4   Global Step: 67390   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:21:08,624-Speed 3306.18 samples/sec   Loss 2.9268   LearningRate 0.0637   Epoch: 4   Global Step: 67400   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:21:11,694-Speed 3336.57 samples/sec   Loss 2.9542   LearningRate 0.0637   Epoch: 4   Global Step: 67410   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:21:14,771-Speed 3328.33 samples/sec   Loss 2.9557   LearningRate 0.0637   Epoch: 4   Global Step: 67420   Fp16 Grad Scale: 262144   Required: 28 hours
Training: 2022-04-11 06:21:17,831-Speed 3347.28 samples/sec   Loss 2.9583   LearningRate 0.0637   Epoch: 4   Global Step: 67430   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:21:20,904-Speed 3333.26 samples/sec   Loss 3.0059   LearningRate 0.0637   Epoch: 4   Global Step: 67440   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:21:24,001-Speed 3307.37 samples/sec   Loss 3.0562   LearningRate 0.0637   Epoch: 4   Global Step: 67450   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:21:27,069-Speed 3338.04 samples/sec   Loss 3.0048   LearningRate 0.0637   Epoch: 4   Global Step: 67460   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:21:30,154-Speed 3320.49 samples/sec   Loss 2.9106   LearningRate 0.0637   Epoch: 4   Global Step: 67470   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:21:33,255-Speed 3303.44 samples/sec   Loss 2.9763   LearningRate 0.0637   Epoch: 4   Global Step: 67480   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:21:36,324-Speed 3336.92 samples/sec   Loss 2.9751   LearningRate 0.0637   Epoch: 4   Global Step: 67490   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:21:39,401-Speed 3329.47 samples/sec   Loss 2.9816   LearningRate 0.0636   Epoch: 4   Global Step: 67500   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:21:42,490-Speed 3315.66 samples/sec   Loss 3.0361   LearningRate 0.0636   Epoch: 4   Global Step: 67510   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:21:45,597-Speed 3295.73 samples/sec   Loss 2.9798   LearningRate 0.0636   Epoch: 4   Global Step: 67520   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:21:48,716-Speed 3284.33 samples/sec   Loss 3.0545   LearningRate 0.0636   Epoch: 4   Global Step: 67530   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-11 06:21:51,806-Speed 3315.27 samples/sec   Loss 2.9190   LearningRate 0.0636   Epoch: 4   Global Step: 67540   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-11 06:21:54,952-Speed 3255.60 samples/sec   Loss 2.9449   LearningRate 0.0636   Epoch: 4   Global Step: 67550   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-11 06:21:58,074-Speed 3280.78 samples/sec   Loss 2.9841   LearningRate 0.0636   Epoch: 4   Global Step: 67560   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-11 06:22:01,149-Speed 3330.33 samples/sec   Loss 2.9764   LearningRate 0.0636   Epoch: 4   Global Step: 67570   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-11 06:22:04,218-Speed 3337.93 samples/sec   Loss 2.9868   LearningRate 0.0636   Epoch: 4   Global Step: 67580   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-11 06:22:07,290-Speed 3334.47 samples/sec   Loss 3.0047   LearningRate 0.0636   Epoch: 4   Global Step: 67590   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-11 06:22:10,379-Speed 3315.15 samples/sec   Loss 3.0198   LearningRate 0.0636   Epoch: 4   Global Step: 67600   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-11 06:22:13,466-Speed 3318.36 samples/sec   Loss 2.9887   LearningRate 0.0636   Epoch: 4   Global Step: 67610   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-11 06:22:16,551-Speed 3319.54 samples/sec   Loss 3.0419   LearningRate 0.0636   Epoch: 4   Global Step: 67620   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-11 06:22:19,671-Speed 3282.82 samples/sec   Loss 3.0326   LearningRate 0.0636   Epoch: 4   Global Step: 67630   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:22:22,766-Speed 3309.38 samples/sec   Loss 3.0556   LearningRate 0.0636   Epoch: 4   Global Step: 67640   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:22:25,855-Speed 3315.83 samples/sec   Loss 2.9760   LearningRate 0.0636   Epoch: 4   Global Step: 67650   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:22:28,940-Speed 3321.00 samples/sec   Loss 3.0118   LearningRate 0.0636   Epoch: 4   Global Step: 67660   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:22:32,083-Speed 3257.87 samples/sec   Loss 2.9564   LearningRate 0.0636   Epoch: 4   Global Step: 67670   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:22:35,215-Speed 3270.18 samples/sec   Loss 2.9266   LearningRate 0.0636   Epoch: 4   Global Step: 67680   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:22:38,286-Speed 3335.84 samples/sec   Loss 3.0104   LearningRate 0.0636   Epoch: 4   Global Step: 67690   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:22:41,361-Speed 3330.97 samples/sec   Loss 2.9681   LearningRate 0.0636   Epoch: 4   Global Step: 67700   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-11 06:22:44,432-Speed 3335.11 samples/sec   Loss 2.9691   LearningRate 0.0635   Epoch: 4   Global Step: 67710   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-11 06:22:47,524-Speed 3312.00 samples/sec   Loss 3.0110   LearningRate 0.0635   Epoch: 4   Global Step: 67720   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-11 06:22:50,684-Speed 3241.43 samples/sec   Loss 3.0525   LearningRate 0.0635   Epoch: 4   Global Step: 67730   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-11 06:22:53,756-Speed 3334.77 samples/sec   Loss 3.0199   LearningRate 0.0635   Epoch: 4   Global Step: 67740   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-11 06:22:56,827-Speed 3335.30 samples/sec   Loss 3.0516   LearningRate 0.0635   Epoch: 4   Global Step: 67750   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-11 06:22:59,896-Speed 3336.98 samples/sec   Loss 3.0219   LearningRate 0.0635   Epoch: 4   Global Step: 67760   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-11 06:23:02,964-Speed 3338.11 samples/sec   Loss 2.9742   LearningRate 0.0635   Epoch: 4   Global Step: 67770   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-11 06:23:06,051-Speed 3318.44 samples/sec   Loss 2.9767   LearningRate 0.0635   Epoch: 4   Global Step: 67780   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-11 06:23:09,121-Speed 3335.81 samples/sec   Loss 3.0347   LearningRate 0.0635   Epoch: 4   Global Step: 67790   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-11 06:23:12,197-Speed 3330.00 samples/sec   Loss 3.0208   LearningRate 0.0635   Epoch: 4   Global Step: 67800   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:23:15,272-Speed 3330.84 samples/sec   Loss 3.0297   LearningRate 0.0635   Epoch: 4   Global Step: 67810   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-11 06:23:18,338-Speed 3341.04 samples/sec   Loss 3.0072   LearningRate 0.0635   Epoch: 4   Global Step: 67820   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-11 06:23:21,417-Speed 3326.28 samples/sec   Loss 3.0399   LearningRate 0.0635   Epoch: 4   Global Step: 67830   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-11 06:23:24,500-Speed 3322.45 samples/sec   Loss 2.9428   LearningRate 0.0635   Epoch: 4   Global Step: 67840   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-11 06:23:27,605-Speed 3298.16 samples/sec   Loss 3.0311   LearningRate 0.0635   Epoch: 4   Global Step: 67850   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-11 06:23:30,691-Speed 3319.75 samples/sec   Loss 3.0711   LearningRate 0.0635   Epoch: 4   Global Step: 67860   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-11 06:23:33,841-Speed 3250.92 samples/sec   Loss 3.0847   LearningRate 0.0635   Epoch: 4   Global Step: 67870   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-11 06:23:36,931-Speed 3314.57 samples/sec   Loss 3.0217   LearningRate 0.0635   Epoch: 4   Global Step: 67880   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-11 06:23:40,064-Speed 3269.07 samples/sec   Loss 3.0546   LearningRate 0.0635   Epoch: 4   Global Step: 67890   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-11 06:23:43,192-Speed 3274.62 samples/sec   Loss 3.0168   LearningRate 0.0635   Epoch: 4   Global Step: 67900   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-11 06:23:46,265-Speed 3333.75 samples/sec   Loss 3.0369   LearningRate 0.0635   Epoch: 4   Global Step: 67910   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:23:49,393-Speed 3273.77 samples/sec   Loss 3.0606   LearningRate 0.0634   Epoch: 4   Global Step: 67920   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:23:52,512-Speed 3284.67 samples/sec   Loss 3.0359   LearningRate 0.0634   Epoch: 4   Global Step: 67930   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:23:55,587-Speed 3331.04 samples/sec   Loss 2.9975   LearningRate 0.0634   Epoch: 4   Global Step: 67940   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-11 06:23:58,656-Speed 3336.93 samples/sec   Loss 3.0141   LearningRate 0.0634   Epoch: 4   Global Step: 67950   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-11 06:24:01,800-Speed 3257.66 samples/sec   Loss 3.0376   LearningRate 0.0634   Epoch: 4   Global Step: 67960   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-11 06:24:04,877-Speed 3329.28 samples/sec   Loss 3.0514   LearningRate 0.0634   Epoch: 4   Global Step: 67970   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-11 06:24:07,945-Speed 3337.55 samples/sec   Loss 3.0244   LearningRate 0.0634   Epoch: 4   Global Step: 67980   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-11 06:24:11,022-Speed 3329.10 samples/sec   Loss 3.0991   LearningRate 0.0634   Epoch: 4   Global Step: 67990   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-11 06:24:14,105-Speed 3322.26 samples/sec   Loss 3.0925   LearningRate 0.0634   Epoch: 4   Global Step: 68000   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-11 06:24:57,830-[lfw][68000]XNorm: 24.379198
Training: 2022-04-11 06:24:57,831-[lfw][68000]Accuracy-Flip: 0.99733+-0.00291
Training: 2022-04-11 06:24:57,831-[lfw][68000]Accuracy-Highest: 0.99817
Training: 2022-04-11 06:25:48,793-[cfp_fp][68000]XNorm: 22.487743
Training: 2022-04-11 06:25:48,793-[cfp_fp][68000]Accuracy-Flip: 0.98114+-0.00556
Training: 2022-04-11 06:25:48,794-[cfp_fp][68000]Accuracy-Highest: 0.98414
Training: 2022-04-11 06:26:32,580-[agedb_30][68000]XNorm: 24.320105
Training: 2022-04-11 06:26:32,581-[agedb_30][68000]Accuracy-Flip: 0.97967+-0.00767
Training: 2022-04-11 06:26:32,581-[agedb_30][68000]Accuracy-Highest: 0.98100
Training: 2022-04-11 06:26:35,640-Speed 72.35 samples/sec   Loss 3.0221   LearningRate 0.0634   Epoch: 4   Global Step: 68010   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-11 06:26:38,711-Speed 3335.87 samples/sec   Loss 3.0463   LearningRate 0.0634   Epoch: 4   Global Step: 68020   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-11 06:26:41,780-Speed 3336.90 samples/sec   Loss 3.0849   LearningRate 0.0634   Epoch: 4   Global Step: 68030   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-11 06:26:44,833-Speed 3355.51 samples/sec   Loss 3.0565   LearningRate 0.0634   Epoch: 4   Global Step: 68040   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:26:47,916-Speed 3321.39 samples/sec   Loss 3.0952   LearningRate 0.0634   Epoch: 4   Global Step: 68050   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:26:50,983-Speed 3340.04 samples/sec   Loss 3.1152   LearningRate 0.0634   Epoch: 4   Global Step: 68060   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:26:54,061-Speed 3327.34 samples/sec   Loss 3.0588   LearningRate 0.0634   Epoch: 4   Global Step: 68070   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:26:57,221-Speed 3241.80 samples/sec   Loss 3.0579   LearningRate 0.0634   Epoch: 4   Global Step: 68080   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:27:00,364-Speed 3257.91 samples/sec   Loss 3.1140   LearningRate 0.0634   Epoch: 4   Global Step: 68090   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:27:03,436-Speed 3334.75 samples/sec   Loss 3.1131   LearningRate 0.0634   Epoch: 4   Global Step: 68100   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:27:06,522-Speed 3319.84 samples/sec   Loss 3.0697   LearningRate 0.0634   Epoch: 4   Global Step: 68110   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:27:09,639-Speed 3285.81 samples/sec   Loss 3.0364   LearningRate 0.0634   Epoch: 4   Global Step: 68120   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:27:12,715-Speed 3329.85 samples/sec   Loss 3.0872   LearningRate 0.0633   Epoch: 4   Global Step: 68130   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:27:15,784-Speed 3339.12 samples/sec   Loss 3.0864   LearningRate 0.0633   Epoch: 4   Global Step: 68140   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:27:18,872-Speed 3316.04 samples/sec   Loss 3.1316   LearningRate 0.0633   Epoch: 4   Global Step: 68150   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:27:21,926-Speed 3354.32 samples/sec   Loss 3.0241   LearningRate 0.0633   Epoch: 4   Global Step: 68160   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-11 06:27:25,033-Speed 3296.27 samples/sec   Loss 3.0340   LearningRate 0.0633   Epoch: 4   Global Step: 68170   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-11 06:27:28,134-Speed 3302.57 samples/sec   Loss 3.0828   LearningRate 0.0633   Epoch: 4   Global Step: 68180   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-11 06:27:31,275-Speed 3261.17 samples/sec   Loss 3.0768   LearningRate 0.0633   Epoch: 4   Global Step: 68190   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-11 06:27:34,419-Speed 3257.73 samples/sec   Loss 3.0497   LearningRate 0.0633   Epoch: 4   Global Step: 68200   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-11 06:27:37,559-Speed 3262.49 samples/sec   Loss 3.0762   LearningRate 0.0633   Epoch: 4   Global Step: 68210   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-11 06:27:40,631-Speed 3333.56 samples/sec   Loss 3.1012   LearningRate 0.0633   Epoch: 4   Global Step: 68220   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-11 06:27:43,712-Speed 3324.33 samples/sec   Loss 3.0828   LearningRate 0.0633   Epoch: 4   Global Step: 68230   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-11 06:27:46,784-Speed 3334.41 samples/sec   Loss 3.1568   LearningRate 0.0633   Epoch: 4   Global Step: 68240   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-11 06:27:49,872-Speed 3316.42 samples/sec   Loss 3.0539   LearningRate 0.0633   Epoch: 4   Global Step: 68250   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-11 06:27:52,986-Speed 3289.54 samples/sec   Loss 3.0309   LearningRate 0.0633   Epoch: 4   Global Step: 68260   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:27:56,064-Speed 3327.45 samples/sec   Loss 3.0368   LearningRate 0.0633   Epoch: 4   Global Step: 68270   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-11 06:27:59,133-Speed 3338.11 samples/sec   Loss 3.0939   LearningRate 0.0633   Epoch: 4   Global Step: 68280   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-11 06:28:02,192-Speed 3348.47 samples/sec   Loss 3.0589   LearningRate 0.0633   Epoch: 4   Global Step: 68290   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-11 06:28:05,254-Speed 3344.06 samples/sec   Loss 3.0704   LearningRate 0.0633   Epoch: 4   Global Step: 68300   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-11 06:28:08,432-Speed 3223.15 samples/sec   Loss 2.9532   LearningRate 0.0633   Epoch: 4   Global Step: 68310   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-11 06:28:11,499-Speed 3341.10 samples/sec   Loss 3.1568   LearningRate 0.0633   Epoch: 4   Global Step: 68320   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-11 06:28:14,558-Speed 3347.73 samples/sec   Loss 3.0465   LearningRate 0.0633   Epoch: 4   Global Step: 68330   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-11 06:28:17,624-Speed 3340.40 samples/sec   Loss 3.0713   LearningRate 0.0632   Epoch: 4   Global Step: 68340   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-11 06:28:20,695-Speed 3335.84 samples/sec   Loss 3.0395   LearningRate 0.0632   Epoch: 4   Global Step: 68350   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-11 06:28:23,751-Speed 3351.48 samples/sec   Loss 3.0875   LearningRate 0.0632   Epoch: 4   Global Step: 68360   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-11 06:28:26,809-Speed 3349.13 samples/sec   Loss 3.1704   LearningRate 0.0632   Epoch: 4   Global Step: 68370   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:28:29,868-Speed 3349.12 samples/sec   Loss 3.0777   LearningRate 0.0632   Epoch: 4   Global Step: 68380   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:28:32,937-Speed 3336.94 samples/sec   Loss 3.1655   LearningRate 0.0632   Epoch: 4   Global Step: 68390   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:28:35,995-Speed 3349.01 samples/sec   Loss 3.1432   LearningRate 0.0632   Epoch: 4   Global Step: 68400   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:28:39,060-Speed 3342.55 samples/sec   Loss 3.0319   LearningRate 0.0632   Epoch: 4   Global Step: 68410   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:28:42,150-Speed 3314.08 samples/sec   Loss 3.0283   LearningRate 0.0632   Epoch: 4   Global Step: 68420   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:28:45,213-Speed 3344.49 samples/sec   Loss 3.1139   LearningRate 0.0632   Epoch: 4   Global Step: 68430   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:28:48,281-Speed 3338.46 samples/sec   Loss 3.0866   LearningRate 0.0632   Epoch: 4   Global Step: 68440   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-11 06:28:51,384-Speed 3300.28 samples/sec   Loss 3.0748   LearningRate 0.0632   Epoch: 4   Global Step: 68450   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-11 06:28:54,482-Speed 3306.30 samples/sec   Loss 3.1215   LearningRate 0.0632   Epoch: 4   Global Step: 68460   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-11 06:28:57,668-Speed 3214.42 samples/sec   Loss 3.1579   LearningRate 0.0632   Epoch: 4   Global Step: 68470   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-11 06:29:00,761-Speed 3311.96 samples/sec   Loss 3.1416   LearningRate 0.0632   Epoch: 4   Global Step: 68480   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-11 06:29:03,822-Speed 3345.96 samples/sec   Loss 3.1650   LearningRate 0.0632   Epoch: 4   Global Step: 68490   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-11 06:29:06,883-Speed 3345.58 samples/sec   Loss 3.0990   LearningRate 0.0632   Epoch: 4   Global Step: 68500   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-11 06:29:09,979-Speed 3309.62 samples/sec   Loss 3.1081   LearningRate 0.0632   Epoch: 4   Global Step: 68510   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-11 06:29:13,822-Speed 2664.82 samples/sec   Loss 3.0788   LearningRate 0.0632   Epoch: 4   Global Step: 68520   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-11 06:29:16,915-Speed 3311.73 samples/sec   Loss 3.1286   LearningRate 0.0632   Epoch: 4   Global Step: 68530   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-11 06:29:19,987-Speed 3333.82 samples/sec   Loss 3.0605   LearningRate 0.0632   Epoch: 4   Global Step: 68540   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:29:23,066-Speed 3326.24 samples/sec   Loss 3.0615   LearningRate 0.0631   Epoch: 4   Global Step: 68550   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:29:26,127-Speed 3346.72 samples/sec   Loss 3.0676   LearningRate 0.0631   Epoch: 4   Global Step: 68560   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:29:29,194-Speed 3340.21 samples/sec   Loss 3.0321   LearningRate 0.0631   Epoch: 4   Global Step: 68570   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:29:32,415-Speed 3179.69 samples/sec   Loss 3.0547   LearningRate 0.0631   Epoch: 4   Global Step: 68580   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:29:35,486-Speed 3334.66 samples/sec   Loss 3.1488   LearningRate 0.0631   Epoch: 4   Global Step: 68590   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:29:38,548-Speed 3345.08 samples/sec   Loss 3.1710   LearningRate 0.0631   Epoch: 4   Global Step: 68600   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:29:41,631-Speed 3322.96 samples/sec   Loss 3.1074   LearningRate 0.0631   Epoch: 4   Global Step: 68610   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-11 06:29:44,746-Speed 3287.85 samples/sec   Loss 3.0484   LearningRate 0.0631   Epoch: 4   Global Step: 68620   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-11 06:29:47,819-Speed 3333.07 samples/sec   Loss 3.1388   LearningRate 0.0631   Epoch: 4   Global Step: 68630   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-11 06:29:50,992-Speed 3227.46 samples/sec   Loss 3.1019   LearningRate 0.0631   Epoch: 4   Global Step: 68640   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-11 06:29:54,067-Speed 3331.13 samples/sec   Loss 3.1339   LearningRate 0.0631   Epoch: 4   Global Step: 68650   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-11 06:29:57,159-Speed 3312.49 samples/sec   Loss 3.1845   LearningRate 0.0631   Epoch: 4   Global Step: 68660   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-11 06:30:00,220-Speed 3346.78 samples/sec   Loss 3.0834   LearningRate 0.0631   Epoch: 4   Global Step: 68670   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-11 06:30:03,325-Speed 3298.71 samples/sec   Loss 3.1570   LearningRate 0.0631   Epoch: 4   Global Step: 68680   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-11 06:30:06,387-Speed 3344.87 samples/sec   Loss 3.1389   LearningRate 0.0631   Epoch: 4   Global Step: 68690   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-11 06:30:09,447-Speed 3346.79 samples/sec   Loss 3.1409   LearningRate 0.0631   Epoch: 4   Global Step: 68700   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-11 06:30:12,517-Speed 3336.96 samples/sec   Loss 3.0875   LearningRate 0.0631   Epoch: 4   Global Step: 68710   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:30:15,603-Speed 3318.84 samples/sec   Loss 3.0530   LearningRate 0.0631   Epoch: 4   Global Step: 68720   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:30:18,665-Speed 3344.65 samples/sec   Loss 3.0638   LearningRate 0.0631   Epoch: 4   Global Step: 68730   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:30:21,746-Speed 3324.69 samples/sec   Loss 3.1601   LearningRate 0.0631   Epoch: 4   Global Step: 68740   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:30:24,984-Speed 3163.20 samples/sec   Loss 3.0873   LearningRate 0.0631   Epoch: 4   Global Step: 68750   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:30:28,053-Speed 3337.32 samples/sec   Loss 3.1990   LearningRate 0.0630   Epoch: 4   Global Step: 68760   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:30:31,117-Speed 3342.80 samples/sec   Loss 3.1545   LearningRate 0.0630   Epoch: 4   Global Step: 68770   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:30:34,179-Speed 3344.79 samples/sec   Loss 3.0862   LearningRate 0.0630   Epoch: 4   Global Step: 68780   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-11 06:30:37,255-Speed 3329.92 samples/sec   Loss 3.1328   LearningRate 0.0630   Epoch: 4   Global Step: 68790   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-11 06:30:40,322-Speed 3339.90 samples/sec   Loss 3.1346   LearningRate 0.0630   Epoch: 4   Global Step: 68800   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-11 06:30:43,386-Speed 3342.81 samples/sec   Loss 3.1189   LearningRate 0.0630   Epoch: 4   Global Step: 68810   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-11 06:30:46,459-Speed 3332.78 samples/sec   Loss 3.1295   LearningRate 0.0630   Epoch: 4   Global Step: 68820   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-11 06:30:49,706-Speed 3154.70 samples/sec   Loss 3.1618   LearningRate 0.0630   Epoch: 4   Global Step: 68830   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-11 06:30:52,846-Speed 3261.78 samples/sec   Loss 3.1565   LearningRate 0.0630   Epoch: 4   Global Step: 68840   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-11 06:30:55,918-Speed 3334.57 samples/sec   Loss 3.1787   LearningRate 0.0630   Epoch: 4   Global Step: 68850   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-11 06:30:58,983-Speed 3341.39 samples/sec   Loss 2.9902   LearningRate 0.0630   Epoch: 4   Global Step: 68860   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-11 06:31:02,045-Speed 3344.96 samples/sec   Loss 3.1266   LearningRate 0.0630   Epoch: 4   Global Step: 68870   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-11 06:31:05,106-Speed 3345.86 samples/sec   Loss 3.1522   LearningRate 0.0630   Epoch: 4   Global Step: 68880   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:31:08,169-Speed 3343.74 samples/sec   Loss 3.1044   LearningRate 0.0630   Epoch: 4   Global Step: 68890   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-11 06:31:11,237-Speed 3338.89 samples/sec   Loss 3.1064   LearningRate 0.0630   Epoch: 4   Global Step: 68900   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-11 06:31:14,332-Speed 3309.12 samples/sec   Loss 3.1767   LearningRate 0.0630   Epoch: 4   Global Step: 68910   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-11 06:31:17,529-Speed 3203.97 samples/sec   Loss 3.2131   LearningRate 0.0630   Epoch: 4   Global Step: 68920   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-11 06:31:20,734-Speed 3196.48 samples/sec   Loss 3.1549   LearningRate 0.0630   Epoch: 4   Global Step: 68930   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-11 06:31:23,827-Speed 3311.57 samples/sec   Loss 3.1222   LearningRate 0.0630   Epoch: 4   Global Step: 68940   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-11 06:31:26,892-Speed 3341.80 samples/sec   Loss 3.2162   LearningRate 0.0630   Epoch: 4   Global Step: 68950   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-11 06:31:29,952-Speed 3347.07 samples/sec   Loss 3.1019   LearningRate 0.0630   Epoch: 4   Global Step: 68960   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-11 06:31:33,043-Speed 3313.08 samples/sec   Loss 3.1382   LearningRate 0.0629   Epoch: 4   Global Step: 68970   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-11 06:31:36,156-Speed 3290.73 samples/sec   Loss 3.1998   LearningRate 0.0629   Epoch: 4   Global Step: 68980   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-11 06:31:39,262-Speed 3297.25 samples/sec   Loss 3.2507   LearningRate 0.0629   Epoch: 4   Global Step: 68990   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:31:42,368-Speed 3297.18 samples/sec   Loss 3.1100   LearningRate 0.0629   Epoch: 4   Global Step: 69000   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:31:45,445-Speed 3328.43 samples/sec   Loss 3.2229   LearningRate 0.0629   Epoch: 4   Global Step: 69010   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:31:48,513-Speed 3338.30 samples/sec   Loss 3.1340   LearningRate 0.0629   Epoch: 4   Global Step: 69020   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:31:51,694-Speed 3221.08 samples/sec   Loss 3.1109   LearningRate 0.0629   Epoch: 4   Global Step: 69030   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:31:54,791-Speed 3306.91 samples/sec   Loss 3.1562   LearningRate 0.0629   Epoch: 4   Global Step: 69040   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:31:57,875-Speed 3320.64 samples/sec   Loss 3.0973   LearningRate 0.0629   Epoch: 4   Global Step: 69050   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:32:00,961-Speed 3318.79 samples/sec   Loss 3.2146   LearningRate 0.0629   Epoch: 4   Global Step: 69060   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:32:04,085-Speed 3279.55 samples/sec   Loss 3.1699   LearningRate 0.0629   Epoch: 4   Global Step: 69070   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:32:07,171-Speed 3318.10 samples/sec   Loss 3.1130   LearningRate 0.0629   Epoch: 4   Global Step: 69080   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:32:10,242-Speed 3335.25 samples/sec   Loss 3.1506   LearningRate 0.0629   Epoch: 4   Global Step: 69090   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:32:13,355-Speed 3290.43 samples/sec   Loss 3.1117   LearningRate 0.0629   Epoch: 4   Global Step: 69100   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:32:16,431-Speed 3330.24 samples/sec   Loss 3.1448   LearningRate 0.0629   Epoch: 4   Global Step: 69110   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:32:19,498-Speed 3339.34 samples/sec   Loss 3.1395   LearningRate 0.0629   Epoch: 4   Global Step: 69120   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:32:22,565-Speed 3339.73 samples/sec   Loss 3.0918   LearningRate 0.0629   Epoch: 4   Global Step: 69130   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:32:25,638-Speed 3333.08 samples/sec   Loss 3.1353   LearningRate 0.0629   Epoch: 4   Global Step: 69140   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:32:28,708-Speed 3336.32 samples/sec   Loss 3.2095   LearningRate 0.0629   Epoch: 4   Global Step: 69150   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:32:31,772-Speed 3342.66 samples/sec   Loss 3.1439   LearningRate 0.0629   Epoch: 4   Global Step: 69160   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:32:34,843-Speed 3335.94 samples/sec   Loss 3.1455   LearningRate 0.0629   Epoch: 4   Global Step: 69170   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:32:37,912-Speed 3337.23 samples/sec   Loss 3.1440   LearningRate 0.0628   Epoch: 4   Global Step: 69180   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:32:40,973-Speed 3345.76 samples/sec   Loss 3.1585   LearningRate 0.0628   Epoch: 4   Global Step: 69190   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:32:44,064-Speed 3312.99 samples/sec   Loss 3.0695   LearningRate 0.0628   Epoch: 4   Global Step: 69200   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:32:47,141-Speed 3329.29 samples/sec   Loss 3.1018   LearningRate 0.0628   Epoch: 4   Global Step: 69210   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:32:50,258-Speed 3286.40 samples/sec   Loss 3.1079   LearningRate 0.0628   Epoch: 4   Global Step: 69220   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:32:53,353-Speed 3310.33 samples/sec   Loss 3.1352   LearningRate 0.0628   Epoch: 4   Global Step: 69230   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:32:56,419-Speed 3340.16 samples/sec   Loss 3.1612   LearningRate 0.0628   Epoch: 4   Global Step: 69240   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:32:59,520-Speed 3302.69 samples/sec   Loss 3.1302   LearningRate 0.0628   Epoch: 4   Global Step: 69250   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:33:02,587-Speed 3339.91 samples/sec   Loss 3.1925   LearningRate 0.0628   Epoch: 4   Global Step: 69260   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:33:05,651-Speed 3342.77 samples/sec   Loss 3.2046   LearningRate 0.0628   Epoch: 4   Global Step: 69270   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:33:08,715-Speed 3342.25 samples/sec   Loss 3.2120   LearningRate 0.0628   Epoch: 4   Global Step: 69280   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:33:11,776-Speed 3346.40 samples/sec   Loss 3.2140   LearningRate 0.0628   Epoch: 4   Global Step: 69290   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:33:14,839-Speed 3343.73 samples/sec   Loss 3.1704   LearningRate 0.0628   Epoch: 4   Global Step: 69300   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:33:17,903-Speed 3342.86 samples/sec   Loss 3.2241   LearningRate 0.0628   Epoch: 4   Global Step: 69310   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:33:20,964-Speed 3346.62 samples/sec   Loss 3.2555   LearningRate 0.0628   Epoch: 4   Global Step: 69320   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:33:24,032-Speed 3338.41 samples/sec   Loss 3.1981   LearningRate 0.0628   Epoch: 4   Global Step: 69330   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:33:27,094-Speed 3344.86 samples/sec   Loss 3.2073   LearningRate 0.0628   Epoch: 4   Global Step: 69340   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:33:30,199-Speed 3298.31 samples/sec   Loss 3.1854   LearningRate 0.0628   Epoch: 4   Global Step: 69350   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:33:33,263-Speed 3342.66 samples/sec   Loss 3.2751   LearningRate 0.0628   Epoch: 4   Global Step: 69360   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:33:36,366-Speed 3301.38 samples/sec   Loss 3.1514   LearningRate 0.0628   Epoch: 4   Global Step: 69370   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:33:39,426-Speed 3347.21 samples/sec   Loss 3.1859   LearningRate 0.0628   Epoch: 4   Global Step: 69380   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-11 06:33:42,506-Speed 3324.45 samples/sec   Loss 3.1198   LearningRate 0.0627   Epoch: 4   Global Step: 69390   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-11 06:33:45,581-Speed 3331.33 samples/sec   Loss 3.1788   LearningRate 0.0627   Epoch: 4   Global Step: 69400   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-11 06:33:48,658-Speed 3328.60 samples/sec   Loss 3.1166   LearningRate 0.0627   Epoch: 4   Global Step: 69410   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-11 06:33:51,723-Speed 3342.84 samples/sec   Loss 3.1468   LearningRate 0.0627   Epoch: 4   Global Step: 69420   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-11 06:33:54,784-Speed 3345.15 samples/sec   Loss 3.1722   LearningRate 0.0627   Epoch: 4   Global Step: 69430   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-11 06:33:57,869-Speed 3321.00 samples/sec   Loss 3.1964   LearningRate 0.0627   Epoch: 4   Global Step: 69440   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-11 06:34:00,935-Speed 3339.96 samples/sec   Loss 3.2238   LearningRate 0.0627   Epoch: 4   Global Step: 69450   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-11 06:34:04,000-Speed 3341.90 samples/sec   Loss 3.1520   LearningRate 0.0627   Epoch: 4   Global Step: 69460   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-11 06:34:07,069-Speed 3337.62 samples/sec   Loss 3.1723   LearningRate 0.0627   Epoch: 4   Global Step: 69470   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-11 06:34:10,219-Speed 3250.94 samples/sec   Loss 3.1847   LearningRate 0.0627   Epoch: 4   Global Step: 69480   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:34:13,319-Speed 3303.67 samples/sec   Loss 3.1691   LearningRate 0.0627   Epoch: 4   Global Step: 69490   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:34:16,376-Speed 3350.70 samples/sec   Loss 3.1748   LearningRate 0.0627   Epoch: 4   Global Step: 69500   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-11 06:34:19,471-Speed 3309.78 samples/sec   Loss 3.1702   LearningRate 0.0627   Epoch: 4   Global Step: 69510   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-11 06:34:22,625-Speed 3247.89 samples/sec   Loss 3.2068   LearningRate 0.0627   Epoch: 4   Global Step: 69520   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-11 06:34:25,692-Speed 3339.19 samples/sec   Loss 3.1999   LearningRate 0.0627   Epoch: 4   Global Step: 69530   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-11 06:34:28,753-Speed 3345.52 samples/sec   Loss 3.1225   LearningRate 0.0627   Epoch: 4   Global Step: 69540   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-11 06:34:31,824-Speed 3335.16 samples/sec   Loss 3.2371   LearningRate 0.0627   Epoch: 4   Global Step: 69550   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-11 06:34:34,908-Speed 3320.98 samples/sec   Loss 3.2200   LearningRate 0.0627   Epoch: 4   Global Step: 69560   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-11 06:34:38,031-Speed 3280.42 samples/sec   Loss 3.1202   LearningRate 0.0627   Epoch: 4   Global Step: 69570   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-11 06:34:41,109-Speed 3327.42 samples/sec   Loss 3.1378   LearningRate 0.0627   Epoch: 4   Global Step: 69580   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-11 06:34:44,224-Speed 3287.51 samples/sec   Loss 3.1863   LearningRate 0.0627   Epoch: 4   Global Step: 69590   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-11 06:34:47,365-Speed 3261.34 samples/sec   Loss 3.2096   LearningRate 0.0626   Epoch: 4   Global Step: 69600   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:34:50,440-Speed 3330.82 samples/sec   Loss 3.2380   LearningRate 0.0626   Epoch: 4   Global Step: 69610   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:34:53,507-Speed 3340.36 samples/sec   Loss 3.2267   LearningRate 0.0626   Epoch: 4   Global Step: 69620   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:34:56,575-Speed 3337.34 samples/sec   Loss 3.1370   LearningRate 0.0626   Epoch: 4   Global Step: 69630   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:34:59,666-Speed 3314.35 samples/sec   Loss 3.1762   LearningRate 0.0626   Epoch: 4   Global Step: 69640   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:35:02,751-Speed 3319.10 samples/sec   Loss 3.2262   LearningRate 0.0626   Epoch: 4   Global Step: 69650   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:35:05,813-Speed 3345.78 samples/sec   Loss 3.2244   LearningRate 0.0626   Epoch: 4   Global Step: 69660   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:35:08,875-Speed 3344.44 samples/sec   Loss 3.1464   LearningRate 0.0626   Epoch: 4   Global Step: 69670   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:35:11,935-Speed 3347.56 samples/sec   Loss 3.2611   LearningRate 0.0626   Epoch: 4   Global Step: 69680   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:35:15,004-Speed 3336.79 samples/sec   Loss 3.1509   LearningRate 0.0626   Epoch: 4   Global Step: 69690   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:35:18,108-Speed 3300.28 samples/sec   Loss 3.2039   LearningRate 0.0626   Epoch: 4   Global Step: 69700   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:35:21,180-Speed 3334.77 samples/sec   Loss 3.2407   LearningRate 0.0626   Epoch: 4   Global Step: 69710   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:35:24,251-Speed 3334.89 samples/sec   Loss 3.1895   LearningRate 0.0626   Epoch: 4   Global Step: 69720   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:35:27,331-Speed 3325.40 samples/sec   Loss 3.2622   LearningRate 0.0626   Epoch: 4   Global Step: 69730   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:35:30,427-Speed 3307.61 samples/sec   Loss 3.1903   LearningRate 0.0626   Epoch: 4   Global Step: 69740   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:35:33,574-Speed 3254.56 samples/sec   Loss 3.2033   LearningRate 0.0626   Epoch: 4   Global Step: 69750   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:35:36,687-Speed 3290.09 samples/sec   Loss 3.1956   LearningRate 0.0626   Epoch: 4   Global Step: 69760   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:35:39,836-Speed 3253.27 samples/sec   Loss 3.2224   LearningRate 0.0626   Epoch: 4   Global Step: 69770   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:35:42,922-Speed 3318.46 samples/sec   Loss 3.2534   LearningRate 0.0626   Epoch: 4   Global Step: 69780   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:35:45,988-Speed 3341.05 samples/sec   Loss 3.2740   LearningRate 0.0626   Epoch: 4   Global Step: 69790   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:35:49,097-Speed 3294.14 samples/sec   Loss 3.2115   LearningRate 0.0626   Epoch: 4   Global Step: 69800   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:35:52,168-Speed 3335.30 samples/sec   Loss 3.2031   LearningRate 0.0625   Epoch: 4   Global Step: 69810   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:35:55,241-Speed 3333.50 samples/sec   Loss 3.2098   LearningRate 0.0625   Epoch: 4   Global Step: 69820   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:35:58,307-Speed 3339.60 samples/sec   Loss 3.2104   LearningRate 0.0625   Epoch: 4   Global Step: 69830   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:36:01,400-Speed 3312.17 samples/sec   Loss 3.2425   LearningRate 0.0625   Epoch: 4   Global Step: 69840   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:36:04,588-Speed 3212.73 samples/sec   Loss 3.2467   LearningRate 0.0625   Epoch: 4   Global Step: 69850   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:36:07,662-Speed 3331.60 samples/sec   Loss 3.1744   LearningRate 0.0625   Epoch: 4   Global Step: 69860   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:36:10,812-Speed 3251.40 samples/sec   Loss 3.2123   LearningRate 0.0625   Epoch: 4   Global Step: 69870   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:36:13,998-Speed 3215.07 samples/sec   Loss 3.2336   LearningRate 0.0625   Epoch: 4   Global Step: 69880   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:36:17,203-Speed 3195.78 samples/sec   Loss 3.2012   LearningRate 0.0625   Epoch: 4   Global Step: 69890   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-11 06:36:20,432-Speed 3172.14 samples/sec   Loss 3.2175   LearningRate 0.0625   Epoch: 4   Global Step: 69900   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-11 06:36:23,495-Speed 3343.97 samples/sec   Loss 3.1730   LearningRate 0.0625   Epoch: 4   Global Step: 69910   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-11 06:36:26,569-Speed 3331.24 samples/sec   Loss 3.2280   LearningRate 0.0625   Epoch: 4   Global Step: 69920   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-11 06:36:29,653-Speed 3321.41 samples/sec   Loss 3.2093   LearningRate 0.0625   Epoch: 4   Global Step: 69930   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-11 06:36:32,767-Speed 3289.54 samples/sec   Loss 3.1117   LearningRate 0.0625   Epoch: 4   Global Step: 69940   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-11 06:36:35,886-Speed 3284.74 samples/sec   Loss 3.2627   LearningRate 0.0625   Epoch: 4   Global Step: 69950   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-11 06:36:38,964-Speed 3327.43 samples/sec   Loss 3.2425   LearningRate 0.0625   Epoch: 4   Global Step: 69960   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-11 06:36:42,060-Speed 3308.29 samples/sec   Loss 3.1536   LearningRate 0.0625   Epoch: 4   Global Step: 69970   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-11 06:36:45,210-Speed 3251.40 samples/sec   Loss 3.1582   LearningRate 0.0625   Epoch: 4   Global Step: 69980   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-11 06:36:48,283-Speed 3332.79 samples/sec   Loss 3.1951   LearningRate 0.0625   Epoch: 4   Global Step: 69990   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:36:51,414-Speed 3272.22 samples/sec   Loss 3.1686   LearningRate 0.0625   Epoch: 4   Global Step: 70000   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:37:35,336-[lfw][70000]XNorm: 22.920241
Training: 2022-04-11 06:37:35,337-[lfw][70000]Accuracy-Flip: 0.99800+-0.00296
Training: 2022-04-11 06:37:35,337-[lfw][70000]Accuracy-Highest: 0.99817
Training: 2022-04-11 06:38:26,004-[cfp_fp][70000]XNorm: 21.592222
Training: 2022-04-11 06:38:26,004-[cfp_fp][70000]Accuracy-Flip: 0.98457+-0.00432
Training: 2022-04-11 06:38:26,005-[cfp_fp][70000]Accuracy-Highest: 0.98457
Training: 2022-04-11 06:39:09,629-[agedb_30][70000]XNorm: 23.004139
Training: 2022-04-11 06:39:09,630-[agedb_30][70000]Accuracy-Flip: 0.98017+-0.00701
Training: 2022-04-11 06:39:09,630-[agedb_30][70000]Accuracy-Highest: 0.98100
Training: 2022-04-11 06:39:12,832-Speed 72.41 samples/sec   Loss 3.2165   LearningRate 0.0625   Epoch: 4   Global Step: 70010   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:39:16,038-Speed 3193.68 samples/sec   Loss 3.2182   LearningRate 0.0624   Epoch: 4   Global Step: 70020   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:39:19,136-Speed 3306.67 samples/sec   Loss 3.1936   LearningRate 0.0624   Epoch: 4   Global Step: 70030   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:39:22,187-Speed 3357.35 samples/sec   Loss 3.1785   LearningRate 0.0624   Epoch: 4   Global Step: 70040   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:39:25,235-Speed 3359.83 samples/sec   Loss 3.2123   LearningRate 0.0624   Epoch: 4   Global Step: 70050   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:39:28,357-Speed 3280.18 samples/sec   Loss 3.1381   LearningRate 0.0624   Epoch: 4   Global Step: 70060   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-11 06:39:31,408-Speed 3358.02 samples/sec   Loss 3.2236   LearningRate 0.0624   Epoch: 4   Global Step: 70070   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-11 06:39:34,581-Speed 3227.61 samples/sec   Loss 3.2412   LearningRate 0.0624   Epoch: 4   Global Step: 70080   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-11 06:39:37,636-Speed 3352.51 samples/sec   Loss 3.2471   LearningRate 0.0624   Epoch: 4   Global Step: 70090   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-11 06:39:40,740-Speed 3300.11 samples/sec   Loss 3.2635   LearningRate 0.0624   Epoch: 4   Global Step: 70100   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-11 06:39:43,795-Speed 3352.51 samples/sec   Loss 3.2295   LearningRate 0.0624   Epoch: 4   Global Step: 70110   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-11 06:39:46,887-Speed 3313.13 samples/sec   Loss 3.1986   LearningRate 0.0624   Epoch: 4   Global Step: 70120   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-11 06:39:50,032-Speed 3256.55 samples/sec   Loss 3.2097   LearningRate 0.0624   Epoch: 4   Global Step: 70130   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-11 06:39:53,113-Speed 3325.14 samples/sec   Loss 3.1912   LearningRate 0.0624   Epoch: 4   Global Step: 70140   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-11 06:39:56,223-Speed 3293.38 samples/sec   Loss 3.1963   LearningRate 0.0624   Epoch: 4   Global Step: 70150   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-11 06:39:59,289-Speed 3339.84 samples/sec   Loss 3.2087   LearningRate 0.0624   Epoch: 4   Global Step: 70160   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:40:02,372-Speed 3322.79 samples/sec   Loss 3.3845   LearningRate 0.0624   Epoch: 4   Global Step: 70170   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:40:05,438-Speed 3340.67 samples/sec   Loss 3.1597   LearningRate 0.0624   Epoch: 4   Global Step: 70180   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:40:08,500-Speed 3345.15 samples/sec   Loss 3.1570   LearningRate 0.0624   Epoch: 4   Global Step: 70190   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:40:11,574-Speed 3331.28 samples/sec   Loss 3.2213   LearningRate 0.0624   Epoch: 4   Global Step: 70200   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:40:14,637-Speed 3344.12 samples/sec   Loss 3.3123   LearningRate 0.0624   Epoch: 4   Global Step: 70210   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:40:17,707-Speed 3336.64 samples/sec   Loss 3.2808   LearningRate 0.0624   Epoch: 4   Global Step: 70220   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:40:20,769-Speed 3345.04 samples/sec   Loss 3.2714   LearningRate 0.0623   Epoch: 4   Global Step: 70230   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:40:23,832-Speed 3344.13 samples/sec   Loss 3.2339   LearningRate 0.0623   Epoch: 4   Global Step: 70240   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:40:26,907-Speed 3330.78 samples/sec   Loss 3.2596   LearningRate 0.0623   Epoch: 4   Global Step: 70250   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:40:29,961-Speed 3353.48 samples/sec   Loss 3.1562   LearningRate 0.0623   Epoch: 4   Global Step: 70260   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:40:33,026-Speed 3341.57 samples/sec   Loss 3.1464   LearningRate 0.0623   Epoch: 4   Global Step: 70270   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:40:36,091-Speed 3342.37 samples/sec   Loss 3.3098   LearningRate 0.0623   Epoch: 4   Global Step: 70280   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:40:39,166-Speed 3330.53 samples/sec   Loss 3.2295   LearningRate 0.0623   Epoch: 4   Global Step: 70290   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:40:42,245-Speed 3326.51 samples/sec   Loss 3.2289   LearningRate 0.0623   Epoch: 4   Global Step: 70300   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:40:45,359-Speed 3288.67 samples/sec   Loss 3.1847   LearningRate 0.0623   Epoch: 4   Global Step: 70310   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:40:48,438-Speed 3326.85 samples/sec   Loss 3.2516   LearningRate 0.0623   Epoch: 4   Global Step: 70320   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:40:51,530-Speed 3312.80 samples/sec   Loss 3.2147   LearningRate 0.0623   Epoch: 4   Global Step: 70330   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:40:54,623-Speed 3311.79 samples/sec   Loss 3.2787   LearningRate 0.0623   Epoch: 4   Global Step: 70340   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:40:57,696-Speed 3332.93 samples/sec   Loss 3.2373   LearningRate 0.0623   Epoch: 4   Global Step: 70350   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:41:00,826-Speed 3271.25 samples/sec   Loss 3.3014   LearningRate 0.0623   Epoch: 4   Global Step: 70360   Fp16 Grad Scale: 262144   Required: 28 hours
Training: 2022-04-11 06:41:03,903-Speed 3329.38 samples/sec   Loss 3.2624   LearningRate 0.0623   Epoch: 4   Global Step: 70370   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:41:06,967-Speed 3343.48 samples/sec   Loss 3.1749   LearningRate 0.0623   Epoch: 4   Global Step: 70380   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:41:10,032-Speed 3341.74 samples/sec   Loss 3.2361   LearningRate 0.0623   Epoch: 4   Global Step: 70390   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:41:13,129-Speed 3306.72 samples/sec   Loss 3.2571   LearningRate 0.0623   Epoch: 4   Global Step: 70400   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:41:16,210-Speed 3325.02 samples/sec   Loss 3.1929   LearningRate 0.0623   Epoch: 4   Global Step: 70410   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:41:19,281-Speed 3334.68 samples/sec   Loss 3.1812   LearningRate 0.0623   Epoch: 4   Global Step: 70420   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:41:22,363-Speed 3323.84 samples/sec   Loss 3.1401   LearningRate 0.0623   Epoch: 4   Global Step: 70430   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:41:25,544-Speed 3219.11 samples/sec   Loss 3.2198   LearningRate 0.0623   Epoch: 4   Global Step: 70440   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:41:28,660-Speed 3286.79 samples/sec   Loss 3.2472   LearningRate 0.0622   Epoch: 4   Global Step: 70450   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:41:31,719-Speed 3348.99 samples/sec   Loss 3.2345   LearningRate 0.0622   Epoch: 4   Global Step: 70460   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:41:34,765-Speed 3363.35 samples/sec   Loss 3.2079   LearningRate 0.0622   Epoch: 4   Global Step: 70470   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:41:37,849-Speed 3321.37 samples/sec   Loss 3.3005   LearningRate 0.0622   Epoch: 4   Global Step: 70480   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:41:40,951-Speed 3301.28 samples/sec   Loss 3.2190   LearningRate 0.0622   Epoch: 4   Global Step: 70490   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:41:44,008-Speed 3350.47 samples/sec   Loss 3.2411   LearningRate 0.0622   Epoch: 4   Global Step: 70500   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:41:47,070-Speed 3345.67 samples/sec   Loss 3.1505   LearningRate 0.0622   Epoch: 4   Global Step: 70510   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:41:50,134-Speed 3342.07 samples/sec   Loss 3.2289   LearningRate 0.0622   Epoch: 4   Global Step: 70520   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:41:53,332-Speed 3203.39 samples/sec   Loss 3.1672   LearningRate 0.0622   Epoch: 4   Global Step: 70530   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:41:56,447-Speed 3287.44 samples/sec   Loss 3.2390   LearningRate 0.0622   Epoch: 4   Global Step: 70540   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:41:59,513-Speed 3341.49 samples/sec   Loss 3.2480   LearningRate 0.0622   Epoch: 4   Global Step: 70550   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:42:02,580-Speed 3339.23 samples/sec   Loss 3.2538   LearningRate 0.0622   Epoch: 4   Global Step: 70560   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:42:05,630-Speed 3357.94 samples/sec   Loss 3.2032   LearningRate 0.0622   Epoch: 4   Global Step: 70570   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:42:08,692-Speed 3345.84 samples/sec   Loss 3.1926   LearningRate 0.0622   Epoch: 4   Global Step: 70580   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:42:11,761-Speed 3336.74 samples/sec   Loss 3.2815   LearningRate 0.0622   Epoch: 4   Global Step: 70590   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:42:14,885-Speed 3278.46 samples/sec   Loss 3.2384   LearningRate 0.0622   Epoch: 4   Global Step: 70600   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:42:17,999-Speed 3289.10 samples/sec   Loss 3.2605   LearningRate 0.0622   Epoch: 4   Global Step: 70610   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:42:21,066-Speed 3339.47 samples/sec   Loss 3.1290   LearningRate 0.0622   Epoch: 4   Global Step: 70620   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:42:24,126-Speed 3347.45 samples/sec   Loss 3.1848   LearningRate 0.0622   Epoch: 4   Global Step: 70630   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:42:27,205-Speed 3326.71 samples/sec   Loss 3.2023   LearningRate 0.0622   Epoch: 4   Global Step: 70640   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:42:30,329-Speed 3279.31 samples/sec   Loss 3.2974   LearningRate 0.0622   Epoch: 4   Global Step: 70650   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:42:33,486-Speed 3244.28 samples/sec   Loss 3.2973   LearningRate 0.0621   Epoch: 4   Global Step: 70660   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:42:36,584-Speed 3305.62 samples/sec   Loss 3.2549   LearningRate 0.0621   Epoch: 4   Global Step: 70670   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:42:39,647-Speed 3343.50 samples/sec   Loss 3.1142   LearningRate 0.0621   Epoch: 4   Global Step: 70680   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:42:42,715-Speed 3339.01 samples/sec   Loss 3.2915   LearningRate 0.0621   Epoch: 4   Global Step: 70690   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:42:45,776-Speed 3345.77 samples/sec   Loss 3.2252   LearningRate 0.0621   Epoch: 4   Global Step: 70700   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:42:48,843-Speed 3339.43 samples/sec   Loss 3.2538   LearningRate 0.0621   Epoch: 4   Global Step: 70710   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:42:51,921-Speed 3327.63 samples/sec   Loss 3.1849   LearningRate 0.0621   Epoch: 4   Global Step: 70720   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:42:54,992-Speed 3335.88 samples/sec   Loss 3.2485   LearningRate 0.0621   Epoch: 4   Global Step: 70730   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:42:58,056-Speed 3343.01 samples/sec   Loss 3.2483   LearningRate 0.0621   Epoch: 4   Global Step: 70740   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:43:01,121-Speed 3341.57 samples/sec   Loss 3.2985   LearningRate 0.0621   Epoch: 4   Global Step: 70750   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:43:04,196-Speed 3330.11 samples/sec   Loss 3.2547   LearningRate 0.0621   Epoch: 4   Global Step: 70760   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-11 06:43:07,310-Speed 3289.59 samples/sec   Loss 3.2646   LearningRate 0.0621   Epoch: 4   Global Step: 70770   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-11 06:43:10,375-Speed 3342.01 samples/sec   Loss 3.2093   LearningRate 0.0621   Epoch: 4   Global Step: 70780   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-11 06:43:13,548-Speed 3227.87 samples/sec   Loss 3.2933   LearningRate 0.0621   Epoch: 4   Global Step: 70790   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-11 06:43:16,662-Speed 3289.74 samples/sec   Loss 3.2105   LearningRate 0.0621   Epoch: 4   Global Step: 70800   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-11 06:43:19,740-Speed 3327.22 samples/sec   Loss 3.1448   LearningRate 0.0621   Epoch: 4   Global Step: 70810   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-11 06:43:22,835-Speed 3309.40 samples/sec   Loss 3.2373   LearningRate 0.0621   Epoch: 4   Global Step: 70820   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-11 06:43:25,903-Speed 3338.64 samples/sec   Loss 3.3092   LearningRate 0.0621   Epoch: 4   Global Step: 70830   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-11 06:43:29,034-Speed 3271.52 samples/sec   Loss 3.2788   LearningRate 0.0621   Epoch: 4   Global Step: 70840   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-11 06:43:32,100-Speed 3340.59 samples/sec   Loss 3.3032   LearningRate 0.0621   Epoch: 4   Global Step: 70850   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-11 06:43:35,230-Speed 3272.25 samples/sec   Loss 3.2441   LearningRate 0.0621   Epoch: 4   Global Step: 70860   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:43:38,295-Speed 3341.73 samples/sec   Loss 3.3181   LearningRate 0.0620   Epoch: 4   Global Step: 70870   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:43:41,369-Speed 3332.01 samples/sec   Loss 3.2393   LearningRate 0.0620   Epoch: 4   Global Step: 70880   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:43:44,461-Speed 3311.81 samples/sec   Loss 3.2548   LearningRate 0.0620   Epoch: 4   Global Step: 70890   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:43:47,538-Speed 3328.91 samples/sec   Loss 3.2547   LearningRate 0.0620   Epoch: 4   Global Step: 70900   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:43:50,623-Speed 3320.65 samples/sec   Loss 3.2612   LearningRate 0.0620   Epoch: 4   Global Step: 70910   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:43:53,765-Speed 3259.79 samples/sec   Loss 3.2702   LearningRate 0.0620   Epoch: 4   Global Step: 70920   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:43:56,855-Speed 3314.94 samples/sec   Loss 3.2435   LearningRate 0.0620   Epoch: 4   Global Step: 70930   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:43:59,921-Speed 3339.72 samples/sec   Loss 3.2413   LearningRate 0.0620   Epoch: 4   Global Step: 70940   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:44:03,000-Speed 3326.44 samples/sec   Loss 3.2761   LearningRate 0.0620   Epoch: 4   Global Step: 70950   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:44:06,108-Speed 3296.06 samples/sec   Loss 3.2616   LearningRate 0.0620   Epoch: 4   Global Step: 70960   Fp16 Grad Scale: 262144   Required: 28 hours
Training: 2022-04-11 06:44:09,159-Speed 3356.58 samples/sec   Loss 3.2471   LearningRate 0.0620   Epoch: 4   Global Step: 70970   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:44:12,252-Speed 3311.95 samples/sec   Loss 3.1997   LearningRate 0.0620   Epoch: 4   Global Step: 70980   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:44:15,322-Speed 3336.64 samples/sec   Loss 3.2322   LearningRate 0.0620   Epoch: 4   Global Step: 70990   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:44:18,394-Speed 3333.56 samples/sec   Loss 3.2287   LearningRate 0.0620   Epoch: 4   Global Step: 71000   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:44:21,467-Speed 3332.99 samples/sec   Loss 3.2506   LearningRate 0.0620   Epoch: 4   Global Step: 71010   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:44:24,536-Speed 3337.88 samples/sec   Loss 3.2806   LearningRate 0.0620   Epoch: 4   Global Step: 71020   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:44:27,622-Speed 3319.14 samples/sec   Loss 3.2679   LearningRate 0.0620   Epoch: 4   Global Step: 71030   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:44:30,688-Speed 3339.85 samples/sec   Loss 3.2416   LearningRate 0.0620   Epoch: 4   Global Step: 71040   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:44:33,769-Speed 3326.10 samples/sec   Loss 3.1810   LearningRate 0.0620   Epoch: 4   Global Step: 71050   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:44:36,834-Speed 3341.51 samples/sec   Loss 3.2397   LearningRate 0.0620   Epoch: 4   Global Step: 71060   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:44:39,902-Speed 3337.62 samples/sec   Loss 3.2354   LearningRate 0.0620   Epoch: 4   Global Step: 71070   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:44:42,969-Speed 3340.23 samples/sec   Loss 3.2119   LearningRate 0.0619   Epoch: 4   Global Step: 71080   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:44:46,058-Speed 3315.64 samples/sec   Loss 3.2358   LearningRate 0.0619   Epoch: 4   Global Step: 71090   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:44:49,143-Speed 3320.17 samples/sec   Loss 3.2490   LearningRate 0.0619   Epoch: 4   Global Step: 71100   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:44:52,245-Speed 3301.64 samples/sec   Loss 3.1999   LearningRate 0.0619   Epoch: 4   Global Step: 71110   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:44:55,340-Speed 3309.01 samples/sec   Loss 3.2013   LearningRate 0.0619   Epoch: 4   Global Step: 71120   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:44:58,415-Speed 3331.64 samples/sec   Loss 3.2216   LearningRate 0.0619   Epoch: 4   Global Step: 71130   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:45:01,508-Speed 3310.76 samples/sec   Loss 3.2621   LearningRate 0.0619   Epoch: 4   Global Step: 71140   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:45:04,595-Speed 3318.58 samples/sec   Loss 3.2382   LearningRate 0.0619   Epoch: 4   Global Step: 71150   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:45:07,774-Speed 3221.64 samples/sec   Loss 3.2528   LearningRate 0.0619   Epoch: 4   Global Step: 71160   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:45:10,874-Speed 3304.45 samples/sec   Loss 3.2586   LearningRate 0.0619   Epoch: 4   Global Step: 71170   Fp16 Grad Scale: 262144   Required: 28 hours
Training: 2022-04-11 06:45:13,948-Speed 3331.46 samples/sec   Loss 3.1806   LearningRate 0.0619   Epoch: 4   Global Step: 71180   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:45:17,020-Speed 3335.02 samples/sec   Loss 3.2163   LearningRate 0.0619   Epoch: 4   Global Step: 71190   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:45:20,096-Speed 3329.03 samples/sec   Loss 3.2939   LearningRate 0.0619   Epoch: 4   Global Step: 71200   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:45:23,167-Speed 3335.62 samples/sec   Loss 3.3093   LearningRate 0.0619   Epoch: 4   Global Step: 71210   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:45:26,241-Speed 3332.11 samples/sec   Loss 3.2095   LearningRate 0.0619   Epoch: 4   Global Step: 71220   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:45:29,345-Speed 3299.62 samples/sec   Loss 3.1629   LearningRate 0.0619   Epoch: 4   Global Step: 71230   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:45:32,458-Speed 3290.18 samples/sec   Loss 3.2777   LearningRate 0.0619   Epoch: 4   Global Step: 71240   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:45:35,588-Speed 3273.07 samples/sec   Loss 3.2570   LearningRate 0.0619   Epoch: 4   Global Step: 71250   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:45:38,785-Speed 3203.52 samples/sec   Loss 3.2109   LearningRate 0.0619   Epoch: 4   Global Step: 71260   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:45:41,854-Speed 3337.47 samples/sec   Loss 3.2477   LearningRate 0.0619   Epoch: 4   Global Step: 71270   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:45:44,949-Speed 3309.03 samples/sec   Loss 3.2628   LearningRate 0.0619   Epoch: 4   Global Step: 71280   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:45:48,096-Speed 3255.92 samples/sec   Loss 3.1960   LearningRate 0.0618   Epoch: 4   Global Step: 71290   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:45:51,266-Speed 3229.98 samples/sec   Loss 3.2237   LearningRate 0.0618   Epoch: 4   Global Step: 71300   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:45:54,393-Speed 3275.72 samples/sec   Loss 3.2101   LearningRate 0.0618   Epoch: 4   Global Step: 71310   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:45:57,476-Speed 3322.46 samples/sec   Loss 3.1716   LearningRate 0.0618   Epoch: 4   Global Step: 71320   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:46:00,566-Speed 3315.01 samples/sec   Loss 3.2549   LearningRate 0.0618   Epoch: 4   Global Step: 71330   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-11 06:46:03,637-Speed 3334.95 samples/sec   Loss 3.2225   LearningRate 0.0618   Epoch: 4   Global Step: 71340   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-11 06:46:06,712-Speed 3331.28 samples/sec   Loss 3.3070   LearningRate 0.0618   Epoch: 4   Global Step: 71350   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-11 06:46:09,840-Speed 3273.92 samples/sec   Loss 3.2174   LearningRate 0.0618   Epoch: 4   Global Step: 71360   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-11 06:46:12,905-Speed 3342.48 samples/sec   Loss 3.2325   LearningRate 0.0618   Epoch: 4   Global Step: 71370   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-11 06:46:15,972-Speed 3339.16 samples/sec   Loss 3.2804   LearningRate 0.0618   Epoch: 4   Global Step: 71380   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-11 06:46:19,060-Speed 3316.30 samples/sec   Loss 3.2351   LearningRate 0.0618   Epoch: 4   Global Step: 71390   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-11 06:46:22,132-Speed 3334.40 samples/sec   Loss 3.2847   LearningRate 0.0618   Epoch: 4   Global Step: 71400   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-11 06:46:25,210-Speed 3327.05 samples/sec   Loss 3.2322   LearningRate 0.0618   Epoch: 4   Global Step: 71410   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-11 06:46:28,281-Speed 3335.50 samples/sec   Loss 3.2704   LearningRate 0.0618   Epoch: 4   Global Step: 71420   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-11 06:46:31,352-Speed 3336.00 samples/sec   Loss 3.2460   LearningRate 0.0618   Epoch: 4   Global Step: 71430   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:46:34,418-Speed 3340.68 samples/sec   Loss 3.2869   LearningRate 0.0618   Epoch: 4   Global Step: 71440   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:46:37,552-Speed 3268.60 samples/sec   Loss 3.2946   LearningRate 0.0618   Epoch: 4   Global Step: 71450   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:46:40,619-Speed 3338.80 samples/sec   Loss 3.2026   LearningRate 0.0618   Epoch: 4   Global Step: 71460   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:46:43,700-Speed 3324.38 samples/sec   Loss 3.1779   LearningRate 0.0618   Epoch: 4   Global Step: 71470   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:46:46,773-Speed 3333.47 samples/sec   Loss 3.2470   LearningRate 0.0618   Epoch: 4   Global Step: 71480   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:46:49,853-Speed 3325.72 samples/sec   Loss 3.2758   LearningRate 0.0618   Epoch: 4   Global Step: 71490   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:46:52,924-Speed 3334.81 samples/sec   Loss 3.2623   LearningRate 0.0618   Epoch: 4   Global Step: 71500   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:46:55,989-Speed 3340.98 samples/sec   Loss 3.2924   LearningRate 0.0617   Epoch: 4   Global Step: 71510   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:46:59,056-Speed 3340.19 samples/sec   Loss 3.2220   LearningRate 0.0617   Epoch: 4   Global Step: 71520   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:47:02,115-Speed 3348.51 samples/sec   Loss 3.2197   LearningRate 0.0617   Epoch: 4   Global Step: 71530   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-11 06:47:05,193-Speed 3327.18 samples/sec   Loss 3.3396   LearningRate 0.0617   Epoch: 4   Global Step: 71540   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-11 06:47:08,260-Speed 3339.81 samples/sec   Loss 3.2057   LearningRate 0.0617   Epoch: 4   Global Step: 71550   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-11 06:47:11,323-Speed 3343.65 samples/sec   Loss 3.3046   LearningRate 0.0617   Epoch: 4   Global Step: 71560   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-11 06:47:14,391-Speed 3338.58 samples/sec   Loss 3.2514   LearningRate 0.0617   Epoch: 4   Global Step: 71570   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-11 06:47:17,469-Speed 3327.77 samples/sec   Loss 3.2689   LearningRate 0.0617   Epoch: 4   Global Step: 71580   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-11 06:47:20,557-Speed 3317.24 samples/sec   Loss 3.2064   LearningRate 0.0617   Epoch: 4   Global Step: 71590   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-11 06:47:23,632-Speed 3330.48 samples/sec   Loss 3.2309   LearningRate 0.0617   Epoch: 4   Global Step: 71600   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-11 06:47:26,705-Speed 3333.26 samples/sec   Loss 3.2548   LearningRate 0.0617   Epoch: 4   Global Step: 71610   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-11 06:47:29,782-Speed 3328.90 samples/sec   Loss 3.3519   LearningRate 0.0617   Epoch: 4   Global Step: 71620   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-11 06:47:32,856-Speed 3331.79 samples/sec   Loss 3.1819   LearningRate 0.0617   Epoch: 4   Global Step: 71630   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:47:35,920-Speed 3342.72 samples/sec   Loss 3.2764   LearningRate 0.0617   Epoch: 4   Global Step: 71640   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:47:38,993-Speed 3332.75 samples/sec   Loss 3.2784   LearningRate 0.0617   Epoch: 4   Global Step: 71650   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:47:42,077-Speed 3321.50 samples/sec   Loss 3.2881   LearningRate 0.0617   Epoch: 4   Global Step: 71660   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:47:45,154-Speed 3328.43 samples/sec   Loss 3.3231   LearningRate 0.0617   Epoch: 4   Global Step: 71670   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:47:48,240-Speed 3319.17 samples/sec   Loss 3.2994   LearningRate 0.0617   Epoch: 4   Global Step: 71680   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:47:51,307-Speed 3339.91 samples/sec   Loss 3.2579   LearningRate 0.0617   Epoch: 4   Global Step: 71690   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:47:54,376-Speed 3337.66 samples/sec   Loss 3.2570   LearningRate 0.0617   Epoch: 4   Global Step: 71700   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:47:57,446-Speed 3336.05 samples/sec   Loss 3.2880   LearningRate 0.0617   Epoch: 4   Global Step: 71710   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:48:00,515-Speed 3337.18 samples/sec   Loss 3.2763   LearningRate 0.0616   Epoch: 4   Global Step: 71720   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:48:03,570-Speed 3353.46 samples/sec   Loss 3.1848   LearningRate 0.0616   Epoch: 4   Global Step: 71730   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:48:06,678-Speed 3295.19 samples/sec   Loss 3.2076   LearningRate 0.0616   Epoch: 4   Global Step: 71740   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:48:09,775-Speed 3306.56 samples/sec   Loss 3.2944   LearningRate 0.0616   Epoch: 4   Global Step: 71750   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:48:12,856-Speed 3325.10 samples/sec   Loss 3.3303   LearningRate 0.0616   Epoch: 4   Global Step: 71760   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:48:15,933-Speed 3328.56 samples/sec   Loss 3.2231   LearningRate 0.0616   Epoch: 4   Global Step: 71770   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:48:19,010-Speed 3328.52 samples/sec   Loss 3.2235   LearningRate 0.0616   Epoch: 4   Global Step: 71780   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:48:24,660-Speed 1812.81 samples/sec   Loss 3.3102   LearningRate 0.0616   Epoch: 4   Global Step: 71790   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:48:27,734-Speed 3331.91 samples/sec   Loss 3.2074   LearningRate 0.0616   Epoch: 4   Global Step: 71800   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:48:30,797-Speed 3343.11 samples/sec   Loss 3.3010   LearningRate 0.0616   Epoch: 4   Global Step: 71810   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:48:33,910-Speed 3290.10 samples/sec   Loss 3.2574   LearningRate 0.0616   Epoch: 4   Global Step: 71820   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:48:36,974-Speed 3342.70 samples/sec   Loss 3.3028   LearningRate 0.0616   Epoch: 4   Global Step: 71830   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:48:40,053-Speed 3326.92 samples/sec   Loss 3.3208   LearningRate 0.0616   Epoch: 4   Global Step: 71840   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:48:43,127-Speed 3331.35 samples/sec   Loss 3.1691   LearningRate 0.0616   Epoch: 4   Global Step: 71850   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:48:46,191-Speed 3343.84 samples/sec   Loss 3.2649   LearningRate 0.0616   Epoch: 4   Global Step: 71860   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:48:49,254-Speed 3343.42 samples/sec   Loss 3.2773   LearningRate 0.0616   Epoch: 4   Global Step: 71870   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:48:52,331-Speed 3328.35 samples/sec   Loss 3.2366   LearningRate 0.0616   Epoch: 4   Global Step: 71880   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:48:55,409-Speed 3327.98 samples/sec   Loss 3.2837   LearningRate 0.0616   Epoch: 4   Global Step: 71890   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:48:58,484-Speed 3330.34 samples/sec   Loss 3.2039   LearningRate 0.0616   Epoch: 4   Global Step: 71900   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:49:01,536-Speed 3356.09 samples/sec   Loss 3.3770   LearningRate 0.0616   Epoch: 4   Global Step: 71910   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-11 06:49:04,648-Speed 3290.97 samples/sec   Loss 3.2699   LearningRate 0.0616   Epoch: 4   Global Step: 71920   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-11 06:49:07,722-Speed 3333.04 samples/sec   Loss 3.2744   LearningRate 0.0615   Epoch: 4   Global Step: 71930   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-11 06:49:10,815-Speed 3311.03 samples/sec   Loss 3.2408   LearningRate 0.0615   Epoch: 4   Global Step: 71940   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-11 06:49:13,885-Speed 3336.74 samples/sec   Loss 3.2991   LearningRate 0.0615   Epoch: 4   Global Step: 71950   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-11 06:49:16,959-Speed 3332.07 samples/sec   Loss 3.2841   LearningRate 0.0615   Epoch: 4   Global Step: 71960   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-11 06:49:20,024-Speed 3341.57 samples/sec   Loss 3.2788   LearningRate 0.0615   Epoch: 4   Global Step: 71970   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-11 06:49:23,097-Speed 3332.18 samples/sec   Loss 3.2671   LearningRate 0.0615   Epoch: 4   Global Step: 71980   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-11 06:49:26,169-Speed 3334.47 samples/sec   Loss 3.2560   LearningRate 0.0615   Epoch: 4   Global Step: 71990   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-11 06:49:29,233-Speed 3342.76 samples/sec   Loss 3.3084   LearningRate 0.0615   Epoch: 4   Global Step: 72000   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-11 06:50:13,151-[lfw][72000]XNorm: 23.607859
Training: 2022-04-11 06:50:13,151-[lfw][72000]Accuracy-Flip: 0.99800+-0.00267
Training: 2022-04-11 06:50:13,152-[lfw][72000]Accuracy-Highest: 0.99817
Training: 2022-04-11 06:51:04,182-[cfp_fp][72000]XNorm: 22.231208
Training: 2022-04-11 06:51:04,183-[cfp_fp][72000]Accuracy-Flip: 0.98243+-0.00503
Training: 2022-04-11 06:51:04,183-[cfp_fp][72000]Accuracy-Highest: 0.98457
Training: 2022-04-11 06:51:48,340-[agedb_30][72000]XNorm: 23.745790
Training: 2022-04-11 06:51:48,340-[agedb_30][72000]Accuracy-Flip: 0.98000+-0.00726
Training: 2022-04-11 06:51:48,341-[agedb_30][72000]Accuracy-Highest: 0.98100
Training: 2022-04-11 06:51:51,418-Speed 72.02 samples/sec   Loss 3.2606   LearningRate 0.0615   Epoch: 4   Global Step: 72010   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:51:54,465-Speed 3360.92 samples/sec   Loss 3.3155   LearningRate 0.0615   Epoch: 4   Global Step: 72020   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:51:57,555-Speed 3315.39 samples/sec   Loss 3.2692   LearningRate 0.0615   Epoch: 4   Global Step: 72030   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:52:00,605-Speed 3357.89 samples/sec   Loss 3.3160   LearningRate 0.0615   Epoch: 4   Global Step: 72040   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:52:03,732-Speed 3275.47 samples/sec   Loss 3.2724   LearningRate 0.0615   Epoch: 4   Global Step: 72050   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:52:06,914-Speed 3218.42 samples/sec   Loss 3.3715   LearningRate 0.0615   Epoch: 4   Global Step: 72060   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:52:09,977-Speed 3344.60 samples/sec   Loss 3.2308   LearningRate 0.0615   Epoch: 4   Global Step: 72070   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:52:13,035-Speed 3348.83 samples/sec   Loss 3.2593   LearningRate 0.0615   Epoch: 4   Global Step: 72080   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:52:16,093-Speed 3349.88 samples/sec   Loss 3.2862   LearningRate 0.0615   Epoch: 4   Global Step: 72090   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:52:19,166-Speed 3332.21 samples/sec   Loss 3.2763   LearningRate 0.0615   Epoch: 4   Global Step: 72100   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:52:22,231-Speed 3341.71 samples/sec   Loss 3.2517   LearningRate 0.0615   Epoch: 4   Global Step: 72110   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:52:25,310-Speed 3327.55 samples/sec   Loss 3.2698   LearningRate 0.0615   Epoch: 4   Global Step: 72120   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:52:28,371-Speed 3345.65 samples/sec   Loss 3.3238   LearningRate 0.0615   Epoch: 4   Global Step: 72130   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:52:31,448-Speed 3328.53 samples/sec   Loss 3.2854   LearningRate 0.0614   Epoch: 4   Global Step: 72140   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:52:34,571-Speed 3279.82 samples/sec   Loss 3.3329   LearningRate 0.0614   Epoch: 4   Global Step: 72150   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:52:37,674-Speed 3300.46 samples/sec   Loss 3.2689   LearningRate 0.0614   Epoch: 4   Global Step: 72160   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:52:40,787-Speed 3290.23 samples/sec   Loss 3.3273   LearningRate 0.0614   Epoch: 4   Global Step: 72170   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:52:43,942-Speed 3246.28 samples/sec   Loss 3.3818   LearningRate 0.0614   Epoch: 4   Global Step: 72180   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:52:47,062-Speed 3283.05 samples/sec   Loss 3.3097   LearningRate 0.0614   Epoch: 4   Global Step: 72190   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:52:50,148-Speed 3318.64 samples/sec   Loss 3.2704   LearningRate 0.0614   Epoch: 4   Global Step: 72200   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:52:53,269-Speed 3282.48 samples/sec   Loss 3.2513   LearningRate 0.0614   Epoch: 4   Global Step: 72210   Fp16 Grad Scale: 262144   Required: 28 hours
Training: 2022-04-11 06:52:56,394-Speed 3278.23 samples/sec   Loss 3.2906   LearningRate 0.0614   Epoch: 4   Global Step: 72220   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:52:59,503-Speed 3293.70 samples/sec   Loss 3.2412   LearningRate 0.0614   Epoch: 4   Global Step: 72230   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:53:02,645-Speed 3260.20 samples/sec   Loss 3.2821   LearningRate 0.0614   Epoch: 4   Global Step: 72240   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:53:05,706-Speed 3346.77 samples/sec   Loss 3.2812   LearningRate 0.0614   Epoch: 4   Global Step: 72250   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:53:08,784-Speed 3327.28 samples/sec   Loss 3.2579   LearningRate 0.0614   Epoch: 4   Global Step: 72260   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:53:11,850-Speed 3340.24 samples/sec   Loss 3.3295   LearningRate 0.0614   Epoch: 4   Global Step: 72270   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:53:14,909-Speed 3348.33 samples/sec   Loss 3.2845   LearningRate 0.0614   Epoch: 4   Global Step: 72280   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:53:17,981-Speed 3334.58 samples/sec   Loss 3.3933   LearningRate 0.0614   Epoch: 4   Global Step: 72290   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:53:21,048-Speed 3339.48 samples/sec   Loss 3.1764   LearningRate 0.0614   Epoch: 4   Global Step: 72300   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:53:24,104-Speed 3351.24 samples/sec   Loss 3.1704   LearningRate 0.0614   Epoch: 4   Global Step: 72310   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:53:27,176-Speed 3333.68 samples/sec   Loss 3.2644   LearningRate 0.0614   Epoch: 4   Global Step: 72320   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:53:30,264-Speed 3316.55 samples/sec   Loss 3.2341   LearningRate 0.0614   Epoch: 4   Global Step: 72330   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:53:33,323-Speed 3349.57 samples/sec   Loss 3.2965   LearningRate 0.0614   Epoch: 4   Global Step: 72340   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:53:36,390-Speed 3340.46 samples/sec   Loss 3.2312   LearningRate 0.0614   Epoch: 4   Global Step: 72350   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:53:39,463-Speed 3332.48 samples/sec   Loss 3.2015   LearningRate 0.0613   Epoch: 4   Global Step: 72360   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:53:42,521-Speed 3349.03 samples/sec   Loss 3.3600   LearningRate 0.0613   Epoch: 4   Global Step: 72370   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:53:45,580-Speed 3348.98 samples/sec   Loss 3.2735   LearningRate 0.0613   Epoch: 4   Global Step: 72380   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:53:48,646-Speed 3339.96 samples/sec   Loss 3.3143   LearningRate 0.0613   Epoch: 4   Global Step: 72390   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:53:51,725-Speed 3326.87 samples/sec   Loss 3.2676   LearningRate 0.0613   Epoch: 4   Global Step: 72400   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:53:54,780-Speed 3352.33 samples/sec   Loss 3.2307   LearningRate 0.0613   Epoch: 4   Global Step: 72410   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:53:57,833-Speed 3354.47 samples/sec   Loss 3.2186   LearningRate 0.0613   Epoch: 4   Global Step: 72420   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:54:01,033-Speed 3201.43 samples/sec   Loss 3.3514   LearningRate 0.0613   Epoch: 4   Global Step: 72430   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:54:04,127-Speed 3310.68 samples/sec   Loss 3.2126   LearningRate 0.0613   Epoch: 4   Global Step: 72440   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:54:07,187-Speed 3346.60 samples/sec   Loss 3.2621   LearningRate 0.0613   Epoch: 4   Global Step: 72450   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:54:10,254-Speed 3340.10 samples/sec   Loss 3.2271   LearningRate 0.0613   Epoch: 4   Global Step: 72460   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:54:13,322-Speed 3338.37 samples/sec   Loss 3.2513   LearningRate 0.0613   Epoch: 4   Global Step: 72470   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:54:16,395-Speed 3333.15 samples/sec   Loss 3.2995   LearningRate 0.0613   Epoch: 4   Global Step: 72480   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:54:19,466-Speed 3334.13 samples/sec   Loss 3.2695   LearningRate 0.0613   Epoch: 4   Global Step: 72490   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:54:22,554-Speed 3317.10 samples/sec   Loss 3.2778   LearningRate 0.0613   Epoch: 4   Global Step: 72500   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:54:25,748-Speed 3207.23 samples/sec   Loss 3.2651   LearningRate 0.0613   Epoch: 4   Global Step: 72510   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:54:28,806-Speed 3349.54 samples/sec   Loss 3.2365   LearningRate 0.0613   Epoch: 4   Global Step: 72520   Fp16 Grad Scale: 262144   Required: 28 hours
Training: 2022-04-11 06:54:31,852-Speed 3362.43 samples/sec   Loss 3.2985   LearningRate 0.0613   Epoch: 4   Global Step: 72530   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:54:34,921-Speed 3337.72 samples/sec   Loss 3.2854   LearningRate 0.0613   Epoch: 4   Global Step: 72540   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:54:37,999-Speed 3327.16 samples/sec   Loss 3.1838   LearningRate 0.0613   Epoch: 4   Global Step: 72550   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:54:41,068-Speed 3337.13 samples/sec   Loss 3.2946   LearningRate 0.0613   Epoch: 4   Global Step: 72560   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:54:44,200-Speed 3270.77 samples/sec   Loss 3.2835   LearningRate 0.0612   Epoch: 4   Global Step: 72570   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:54:47,326-Speed 3276.19 samples/sec   Loss 3.2206   LearningRate 0.0612   Epoch: 4   Global Step: 72580   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:54:50,400-Speed 3331.80 samples/sec   Loss 3.2704   LearningRate 0.0612   Epoch: 4   Global Step: 72590   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:54:53,465-Speed 3342.19 samples/sec   Loss 3.3272   LearningRate 0.0612   Epoch: 4   Global Step: 72600   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:54:56,529-Speed 3343.24 samples/sec   Loss 3.2634   LearningRate 0.0612   Epoch: 4   Global Step: 72610   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:54:59,601-Speed 3334.22 samples/sec   Loss 3.2497   LearningRate 0.0612   Epoch: 4   Global Step: 72620   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:55:02,667-Speed 3340.65 samples/sec   Loss 3.2847   LearningRate 0.0612   Epoch: 4   Global Step: 72630   Fp16 Grad Scale: 262144   Required: 28 hours
Training: 2022-04-11 06:55:05,721-Speed 3352.78 samples/sec   Loss 3.3394   LearningRate 0.0612   Epoch: 4   Global Step: 72640   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:55:08,821-Speed 3304.69 samples/sec   Loss 3.2691   LearningRate 0.0612   Epoch: 4   Global Step: 72650   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:55:11,886-Speed 3341.74 samples/sec   Loss 3.2795   LearningRate 0.0612   Epoch: 4   Global Step: 72660   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:55:14,965-Speed 3325.87 samples/sec   Loss 3.2936   LearningRate 0.0612   Epoch: 4   Global Step: 72670   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:55:18,036-Speed 3335.13 samples/sec   Loss 3.2575   LearningRate 0.0612   Epoch: 4   Global Step: 72680   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:55:21,130-Speed 3310.21 samples/sec   Loss 3.2167   LearningRate 0.0612   Epoch: 4   Global Step: 72690   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:55:24,255-Speed 3278.34 samples/sec   Loss 3.2717   LearningRate 0.0612   Epoch: 4   Global Step: 72700   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:55:27,391-Speed 3266.03 samples/sec   Loss 3.3251   LearningRate 0.0612   Epoch: 4   Global Step: 72710   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:55:30,480-Speed 3315.39 samples/sec   Loss 3.2940   LearningRate 0.0612   Epoch: 4   Global Step: 72720   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:55:33,544-Speed 3342.78 samples/sec   Loss 3.2639   LearningRate 0.0612   Epoch: 4   Global Step: 72730   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:55:36,614-Speed 3336.23 samples/sec   Loss 3.3075   LearningRate 0.0612   Epoch: 4   Global Step: 72740   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:55:39,688-Speed 3332.17 samples/sec   Loss 3.3484   LearningRate 0.0612   Epoch: 4   Global Step: 72750   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:55:42,760-Speed 3334.41 samples/sec   Loss 3.3030   LearningRate 0.0612   Epoch: 4   Global Step: 72760   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:55:45,865-Speed 3298.42 samples/sec   Loss 3.2198   LearningRate 0.0612   Epoch: 4   Global Step: 72770   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:55:48,990-Speed 3277.59 samples/sec   Loss 3.2213   LearningRate 0.0611   Epoch: 4   Global Step: 72780   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:55:52,057-Speed 3339.33 samples/sec   Loss 3.2339   LearningRate 0.0611   Epoch: 4   Global Step: 72790   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-11 06:55:55,133-Speed 3329.72 samples/sec   Loss 3.1969   LearningRate 0.0611   Epoch: 4   Global Step: 72800   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-11 06:55:58,198-Speed 3342.12 samples/sec   Loss 3.3121   LearningRate 0.0611   Epoch: 4   Global Step: 72810   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-11 06:56:01,280-Speed 3323.03 samples/sec   Loss 3.2924   LearningRate 0.0611   Epoch: 4   Global Step: 72820   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-11 06:56:04,378-Speed 3306.32 samples/sec   Loss 3.2555   LearningRate 0.0611   Epoch: 4   Global Step: 72830   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-11 06:56:07,454-Speed 3330.32 samples/sec   Loss 3.2444   LearningRate 0.0611   Epoch: 4   Global Step: 72840   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-11 06:56:10,546-Speed 3311.73 samples/sec   Loss 3.3329   LearningRate 0.0611   Epoch: 4   Global Step: 72850   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-11 06:56:13,629-Speed 3322.84 samples/sec   Loss 3.2750   LearningRate 0.0611   Epoch: 4   Global Step: 72860   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-11 06:56:16,759-Speed 3272.03 samples/sec   Loss 3.2922   LearningRate 0.0611   Epoch: 4   Global Step: 72870   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-11 06:56:19,857-Speed 3306.16 samples/sec   Loss 3.2503   LearningRate 0.0611   Epoch: 4   Global Step: 72880   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-11 06:56:22,923-Speed 3341.01 samples/sec   Loss 3.2736   LearningRate 0.0611   Epoch: 4   Global Step: 72890   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:56:25,992-Speed 3337.65 samples/sec   Loss 3.2796   LearningRate 0.0611   Epoch: 4   Global Step: 72900   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:56:29,056-Speed 3342.35 samples/sec   Loss 3.3274   LearningRate 0.0611   Epoch: 4   Global Step: 72910   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:56:32,220-Speed 3236.88 samples/sec   Loss 3.3047   LearningRate 0.0611   Epoch: 4   Global Step: 72920   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:56:35,367-Speed 3255.08 samples/sec   Loss 3.3223   LearningRate 0.0611   Epoch: 4   Global Step: 72930   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:56:38,444-Speed 3328.69 samples/sec   Loss 3.2539   LearningRate 0.0611   Epoch: 4   Global Step: 72940   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:56:41,507-Speed 3343.77 samples/sec   Loss 3.2208   LearningRate 0.0611   Epoch: 4   Global Step: 72950   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-04-11 06:56:44,667-Speed 3241.25 samples/sec   Loss 3.3184   LearningRate 0.0611   Epoch: 4   Global Step: 72960   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 06:56:47,750-Speed 3322.85 samples/sec   Loss 3.2109   LearningRate 0.0611   Epoch: 4   Global Step: 72970   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 06:56:50,852-Speed 3301.26 samples/sec   Loss 3.2050   LearningRate 0.0611   Epoch: 4   Global Step: 72980   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 06:56:53,918-Speed 3340.84 samples/sec   Loss 3.4275   LearningRate 0.0611   Epoch: 4   Global Step: 72990   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-11 06:56:56,984-Speed 3340.05 samples/sec   Loss 3.3455   LearningRate 0.0610   Epoch: 4   Global Step: 73000   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-11 06:57:00,058-Speed 3332.12 samples/sec   Loss 3.3381   LearningRate 0.0610   Epoch: 4   Global Step: 73010   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-11 06:57:03,145-Speed 3318.17 samples/sec   Loss 3.2682   LearningRate 0.0610   Epoch: 4   Global Step: 73020   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-11 06:57:06,231-Speed 3319.39 samples/sec   Loss 3.1365   LearningRate 0.0610   Epoch: 4   Global Step: 73030   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-11 06:57:09,301-Speed 3335.75 samples/sec   Loss 3.2263   LearningRate 0.0610   Epoch: 4   Global Step: 73040   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-11 06:57:12,366-Speed 3342.04 samples/sec   Loss 3.2625   LearningRate 0.0610   Epoch: 4   Global Step: 73050   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-11 06:57:15,436-Speed 3336.92 samples/sec   Loss 3.2978   LearningRate 0.0610   Epoch: 4   Global Step: 73060   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-11 06:57:18,507-Speed 3334.22 samples/sec   Loss 3.3320   LearningRate 0.0610   Epoch: 4   Global Step: 73070   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-11 06:57:21,582-Speed 3331.42 samples/sec   Loss 3.1304   LearningRate 0.0610   Epoch: 4   Global Step: 73080   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-11 06:57:24,668-Speed 3318.93 samples/sec   Loss 3.2253   LearningRate 0.0610   Epoch: 4   Global Step: 73090   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 06:57:27,850-Speed 3219.08 samples/sec   Loss 3.2071   LearningRate 0.0610   Epoch: 4   Global Step: 73100   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 06:57:30,965-Speed 3287.61 samples/sec   Loss 3.2304   LearningRate 0.0610   Epoch: 4   Global Step: 73110   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 06:57:34,160-Speed 3205.93 samples/sec   Loss 3.2963   LearningRate 0.0610   Epoch: 4   Global Step: 73120   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 06:57:37,259-Speed 3305.58 samples/sec   Loss 3.2988   LearningRate 0.0610   Epoch: 4   Global Step: 73130   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 06:57:40,402-Speed 3258.23 samples/sec   Loss 3.2272   LearningRate 0.0610   Epoch: 4   Global Step: 73140   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 06:57:43,457-Speed 3352.68 samples/sec   Loss 3.2313   LearningRate 0.0610   Epoch: 4   Global Step: 73150   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-04-11 06:57:46,518-Speed 3346.88 samples/sec   Loss 3.2200   LearningRate 0.0610   Epoch: 4   Global Step: 73160   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-04-11 06:57:49,580-Speed 3345.14 samples/sec   Loss 3.2887   LearningRate 0.0610   Epoch: 4   Global Step: 73170   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-04-11 06:57:52,646-Speed 3339.89 samples/sec   Loss 3.3349   LearningRate 0.0610   Epoch: 4   Global Step: 73180   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-04-11 06:57:55,707-Speed 3346.64 samples/sec   Loss 3.3295   LearningRate 0.0610   Epoch: 4   Global Step: 73190   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-04-11 06:57:58,828-Speed 3280.99 samples/sec   Loss 3.2892   LearningRate 0.0610   Epoch: 4   Global Step: 73200   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-04-11 06:58:01,950-Speed 3280.87 samples/sec   Loss 3.2211   LearningRate 0.0609   Epoch: 4   Global Step: 73210   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-04-11 06:58:05,017-Speed 3340.68 samples/sec   Loss 3.2697   LearningRate 0.0609   Epoch: 4   Global Step: 73220   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-04-11 06:58:08,117-Speed 3303.49 samples/sec   Loss 3.2974   LearningRate 0.0609   Epoch: 4   Global Step: 73230   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-04-11 06:58:11,179-Speed 3345.07 samples/sec   Loss 3.2527   LearningRate 0.0609   Epoch: 4   Global Step: 73240   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-04-11 06:58:14,243-Speed 3343.89 samples/sec   Loss 3.3318   LearningRate 0.0609   Epoch: 4   Global Step: 73250   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-11 06:58:17,306-Speed 3343.58 samples/sec   Loss 3.1924   LearningRate 0.0609   Epoch: 4   Global Step: 73260   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-11 06:58:20,375-Speed 3336.74 samples/sec   Loss 3.2834   LearningRate 0.0609   Epoch: 4   Global Step: 73270   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-11 06:58:23,451-Speed 3330.08 samples/sec   Loss 3.2099   LearningRate 0.0609   Epoch: 4   Global Step: 73280   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-11 06:58:26,561-Speed 3293.80 samples/sec   Loss 3.1975   LearningRate 0.0609   Epoch: 4   Global Step: 73290   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-11 06:58:29,625-Speed 3343.39 samples/sec   Loss 3.3448   LearningRate 0.0609   Epoch: 4   Global Step: 73300   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-11 06:58:32,711-Speed 3319.71 samples/sec   Loss 3.2316   LearningRate 0.0609   Epoch: 4   Global Step: 73310   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-11 06:58:35,794-Speed 3321.88 samples/sec   Loss 3.3465   LearningRate 0.0609   Epoch: 4   Global Step: 73320   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-11 06:58:38,883-Speed 3315.37 samples/sec   Loss 3.2662   LearningRate 0.0609   Epoch: 4   Global Step: 73330   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-11 06:58:41,946-Speed 3343.98 samples/sec   Loss 3.4023   LearningRate 0.0609   Epoch: 4   Global Step: 73340   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-11 06:58:45,014-Speed 3338.58 samples/sec   Loss 3.2091   LearningRate 0.0609   Epoch: 4   Global Step: 73350   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 06:58:48,078-Speed 3342.77 samples/sec   Loss 3.3827   LearningRate 0.0609   Epoch: 4   Global Step: 73360   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 06:58:51,157-Speed 3326.58 samples/sec   Loss 3.2954   LearningRate 0.0609   Epoch: 4   Global Step: 73370   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 06:58:54,241-Speed 3320.51 samples/sec   Loss 3.3059   LearningRate 0.0609   Epoch: 4   Global Step: 73380   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 06:58:57,304-Speed 3344.84 samples/sec   Loss 3.1977   LearningRate 0.0609   Epoch: 4   Global Step: 73390   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 06:59:00,377-Speed 3332.80 samples/sec   Loss 3.2161   LearningRate 0.0609   Epoch: 4   Global Step: 73400   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 06:59:03,449-Speed 3334.61 samples/sec   Loss 3.2565   LearningRate 0.0609   Epoch: 4   Global Step: 73410   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 06:59:06,529-Speed 3325.45 samples/sec   Loss 3.2473   LearningRate 0.0608   Epoch: 4   Global Step: 73420   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 06:59:09,610-Speed 3323.88 samples/sec   Loss 3.2685   LearningRate 0.0608   Epoch: 4   Global Step: 73430   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 06:59:12,691-Speed 3324.13 samples/sec   Loss 3.2147   LearningRate 0.0608   Epoch: 4   Global Step: 73440   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 06:59:15,745-Speed 3353.54 samples/sec   Loss 3.2493   LearningRate 0.0608   Epoch: 4   Global Step: 73450   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 06:59:18,812-Speed 3339.35 samples/sec   Loss 3.2453   LearningRate 0.0608   Epoch: 4   Global Step: 73460   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 06:59:21,903-Speed 3314.32 samples/sec   Loss 3.2602   LearningRate 0.0608   Epoch: 4   Global Step: 73470   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 06:59:24,969-Speed 3340.27 samples/sec   Loss 3.2714   LearningRate 0.0608   Epoch: 4   Global Step: 73480   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 06:59:28,032-Speed 3344.58 samples/sec   Loss 3.2216   LearningRate 0.0608   Epoch: 4   Global Step: 73490   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 06:59:31,098-Speed 3340.62 samples/sec   Loss 3.3555   LearningRate 0.0608   Epoch: 4   Global Step: 73500   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 06:59:34,161-Speed 3343.87 samples/sec   Loss 3.2844   LearningRate 0.0608   Epoch: 4   Global Step: 73510   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 06:59:37,244-Speed 3322.14 samples/sec   Loss 3.2660   LearningRate 0.0608   Epoch: 4   Global Step: 73520   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 06:59:40,444-Speed 3200.96 samples/sec   Loss 3.2888   LearningRate 0.0608   Epoch: 4   Global Step: 73530   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 06:59:43,515-Speed 3334.49 samples/sec   Loss 3.2934   LearningRate 0.0608   Epoch: 4   Global Step: 73540   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 06:59:46,570-Speed 3352.23 samples/sec   Loss 3.1926   LearningRate 0.0608   Epoch: 4   Global Step: 73550   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-11 06:59:49,729-Speed 3243.43 samples/sec   Loss 3.2594   LearningRate 0.0608   Epoch: 4   Global Step: 73560   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-11 06:59:52,807-Speed 3327.39 samples/sec   Loss 3.2536   LearningRate 0.0608   Epoch: 4   Global Step: 73570   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-11 06:59:55,877-Speed 3336.68 samples/sec   Loss 3.2746   LearningRate 0.0608   Epoch: 4   Global Step: 73580   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-11 06:59:58,945-Speed 3338.44 samples/sec   Loss 3.2867   LearningRate 0.0608   Epoch: 4   Global Step: 73590   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-11 07:00:02,060-Speed 3287.43 samples/sec   Loss 3.2450   LearningRate 0.0608   Epoch: 4   Global Step: 73600   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-11 07:00:05,166-Speed 3298.02 samples/sec   Loss 3.3146   LearningRate 0.0608   Epoch: 4   Global Step: 73610   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-11 07:00:08,307-Speed 3260.71 samples/sec   Loss 3.2962   LearningRate 0.0608   Epoch: 4   Global Step: 73620   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-11 07:00:11,427-Speed 3282.82 samples/sec   Loss 3.2409   LearningRate 0.0608   Epoch: 4   Global Step: 73630   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-11 07:00:14,504-Speed 3328.65 samples/sec   Loss 3.3592   LearningRate 0.0607   Epoch: 4   Global Step: 73640   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-11 07:00:17,583-Speed 3326.77 samples/sec   Loss 3.2593   LearningRate 0.0607   Epoch: 4   Global Step: 73650   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:00:20,721-Speed 3264.31 samples/sec   Loss 3.3741   LearningRate 0.0607   Epoch: 4   Global Step: 73660   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:00:23,969-Speed 3153.32 samples/sec   Loss 3.2629   LearningRate 0.0607   Epoch: 4   Global Step: 73670   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:00:27,083-Speed 3288.67 samples/sec   Loss 3.2594   LearningRate 0.0607   Epoch: 4   Global Step: 73680   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:00:30,162-Speed 3327.22 samples/sec   Loss 3.2407   LearningRate 0.0607   Epoch: 4   Global Step: 73690   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:00:33,307-Speed 3256.74 samples/sec   Loss 3.1604   LearningRate 0.0607   Epoch: 4   Global Step: 73700   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:00:36,374-Speed 3339.46 samples/sec   Loss 3.2890   LearningRate 0.0607   Epoch: 4   Global Step: 73710   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:00:39,495-Speed 3281.15 samples/sec   Loss 3.2815   LearningRate 0.0607   Epoch: 4   Global Step: 73720   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:00:42,597-Speed 3302.28 samples/sec   Loss 3.2682   LearningRate 0.0607   Epoch: 4   Global Step: 73730   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:00:45,667-Speed 3336.90 samples/sec   Loss 3.2574   LearningRate 0.0607   Epoch: 4   Global Step: 73740   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:00:48,721-Speed 3353.48 samples/sec   Loss 3.2906   LearningRate 0.0607   Epoch: 4   Global Step: 73750   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:00:51,794-Speed 3332.49 samples/sec   Loss 3.3202   LearningRate 0.0607   Epoch: 4   Global Step: 73760   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:00:54,858-Speed 3343.16 samples/sec   Loss 3.2644   LearningRate 0.0607   Epoch: 4   Global Step: 73770   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:00:57,934-Speed 3330.09 samples/sec   Loss 3.3077   LearningRate 0.0607   Epoch: 4   Global Step: 73780   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:01:01,005-Speed 3334.94 samples/sec   Loss 3.2330   LearningRate 0.0607   Epoch: 4   Global Step: 73790   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:01:04,120-Speed 3287.80 samples/sec   Loss 3.3213   LearningRate 0.0607   Epoch: 4   Global Step: 73800   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:01:07,234-Speed 3289.10 samples/sec   Loss 3.2481   LearningRate 0.0607   Epoch: 4   Global Step: 73810   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:01:10,402-Speed 3234.03 samples/sec   Loss 3.2691   LearningRate 0.0607   Epoch: 4   Global Step: 73820   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:01:13,495-Speed 3311.50 samples/sec   Loss 3.2628   LearningRate 0.0607   Epoch: 4   Global Step: 73830   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:01:16,602-Speed 3296.61 samples/sec   Loss 3.2946   LearningRate 0.0607   Epoch: 4   Global Step: 73840   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:01:19,728-Speed 3276.55 samples/sec   Loss 3.1709   LearningRate 0.0606   Epoch: 4   Global Step: 73850   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:01:22,876-Speed 3254.03 samples/sec   Loss 3.2457   LearningRate 0.0606   Epoch: 4   Global Step: 73860   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:01:26,015-Speed 3262.53 samples/sec   Loss 3.2866   LearningRate 0.0606   Epoch: 4   Global Step: 73870   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:01:29,078-Speed 3343.71 samples/sec   Loss 3.2609   LearningRate 0.0606   Epoch: 4   Global Step: 73880   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:01:32,213-Speed 3267.19 samples/sec   Loss 3.2737   LearningRate 0.0606   Epoch: 4   Global Step: 73890   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:01:35,279-Speed 3341.04 samples/sec   Loss 3.2871   LearningRate 0.0606   Epoch: 4   Global Step: 73900   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:01:38,398-Speed 3283.97 samples/sec   Loss 3.2917   LearningRate 0.0606   Epoch: 4   Global Step: 73910   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:01:41,525-Speed 3276.11 samples/sec   Loss 3.2818   LearningRate 0.0606   Epoch: 4   Global Step: 73920   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:01:44,599-Speed 3331.88 samples/sec   Loss 3.2532   LearningRate 0.0606   Epoch: 4   Global Step: 73930   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:01:47,684-Speed 3318.94 samples/sec   Loss 3.2950   LearningRate 0.0606   Epoch: 4   Global Step: 73940   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:01:50,751-Speed 3340.03 samples/sec   Loss 3.3047   LearningRate 0.0606   Epoch: 4   Global Step: 73950   Fp16 Grad Scale: 262144   Required: 27 hours
Training: 2022-04-11 07:01:53,813-Speed 3344.53 samples/sec   Loss 3.3148   LearningRate 0.0606   Epoch: 4   Global Step: 73960   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:01:56,930-Speed 3286.07 samples/sec   Loss 3.2022   LearningRate 0.0606   Epoch: 4   Global Step: 73970   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:02:00,067-Speed 3265.55 samples/sec   Loss 3.2865   LearningRate 0.0606   Epoch: 4   Global Step: 73980   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:02:03,202-Speed 3267.57 samples/sec   Loss 3.2662   LearningRate 0.0606   Epoch: 4   Global Step: 73990   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:02:06,280-Speed 3327.66 samples/sec   Loss 3.3125   LearningRate 0.0606   Epoch: 4   Global Step: 74000   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:02:50,854-[lfw][74000]XNorm: 21.204751
Training: 2022-04-11 07:02:50,855-[lfw][74000]Accuracy-Flip: 0.99783+-0.00289
Training: 2022-04-11 07:02:50,855-[lfw][74000]Accuracy-Highest: 0.99817
Training: 2022-04-11 07:03:42,841-[cfp_fp][74000]XNorm: 19.676139
Training: 2022-04-11 07:03:42,842-[cfp_fp][74000]Accuracy-Flip: 0.98100+-0.00550
Training: 2022-04-11 07:03:42,842-[cfp_fp][74000]Accuracy-Highest: 0.98457
Training: 2022-04-11 07:04:27,344-[agedb_30][74000]XNorm: 21.375119
Training: 2022-04-11 07:04:27,344-[agedb_30][74000]Accuracy-Flip: 0.98100+-0.00814
Training: 2022-04-11 07:04:27,345-[agedb_30][74000]Accuracy-Highest: 0.98100
Training: 2022-04-11 07:04:30,408-Speed 71.05 samples/sec   Loss 3.2544   LearningRate 0.0606   Epoch: 4   Global Step: 74010   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:04:33,456-Speed 3359.44 samples/sec   Loss 3.2331   LearningRate 0.0606   Epoch: 4   Global Step: 74020   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:04:36,513-Speed 3350.80 samples/sec   Loss 3.2165   LearningRate 0.0606   Epoch: 4   Global Step: 74030   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:04:39,564-Speed 3356.80 samples/sec   Loss 3.2193   LearningRate 0.0606   Epoch: 4   Global Step: 74040   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:04:42,630-Speed 3340.68 samples/sec   Loss 3.3562   LearningRate 0.0606   Epoch: 4   Global Step: 74050   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:04:45,678-Speed 3360.36 samples/sec   Loss 3.1965   LearningRate 0.0606   Epoch: 4   Global Step: 74060   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:04:48,763-Speed 3319.68 samples/sec   Loss 3.2202   LearningRate 0.0605   Epoch: 4   Global Step: 74070   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:04:51,868-Speed 3299.21 samples/sec   Loss 3.2486   LearningRate 0.0605   Epoch: 4   Global Step: 74080   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:04:55,051-Speed 3217.88 samples/sec   Loss 3.2862   LearningRate 0.0605   Epoch: 4   Global Step: 74090   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:04:58,175-Speed 3278.69 samples/sec   Loss 3.2775   LearningRate 0.0605   Epoch: 4   Global Step: 74100   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:05:01,265-Speed 3314.53 samples/sec   Loss 3.2902   LearningRate 0.0605   Epoch: 4   Global Step: 74110   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:05:04,329-Speed 3343.44 samples/sec   Loss 3.3621   LearningRate 0.0605   Epoch: 4   Global Step: 74120   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:05:07,405-Speed 3329.27 samples/sec   Loss 3.2728   LearningRate 0.0605   Epoch: 4   Global Step: 74130   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:05:10,492-Speed 3318.09 samples/sec   Loss 3.3245   LearningRate 0.0605   Epoch: 4   Global Step: 74140   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:05:13,554-Speed 3344.37 samples/sec   Loss 3.2977   LearningRate 0.0605   Epoch: 4   Global Step: 74150   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:05:16,623-Speed 3337.30 samples/sec   Loss 3.2735   LearningRate 0.0605   Epoch: 4   Global Step: 74160   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:05:19,739-Speed 3287.42 samples/sec   Loss 3.2439   LearningRate 0.0605   Epoch: 4   Global Step: 74170   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:05:22,848-Speed 3294.77 samples/sec   Loss 3.3009   LearningRate 0.0605   Epoch: 4   Global Step: 74180   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:05:25,937-Speed 3316.23 samples/sec   Loss 3.2929   LearningRate 0.0605   Epoch: 4   Global Step: 74190   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:05:28,997-Speed 3347.49 samples/sec   Loss 3.3151   LearningRate 0.0605   Epoch: 4   Global Step: 74200   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:05:32,060-Speed 3342.97 samples/sec   Loss 3.3489   LearningRate 0.0605   Epoch: 4   Global Step: 74210   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:05:35,120-Speed 3347.66 samples/sec   Loss 3.2882   LearningRate 0.0605   Epoch: 4   Global Step: 74220   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:05:38,181-Speed 3345.70 samples/sec   Loss 3.3094   LearningRate 0.0605   Epoch: 4   Global Step: 74230   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:05:41,244-Speed 3343.61 samples/sec   Loss 3.4182   LearningRate 0.0605   Epoch: 4   Global Step: 74240   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:05:44,325-Speed 3324.78 samples/sec   Loss 3.2703   LearningRate 0.0605   Epoch: 4   Global Step: 74250   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:05:47,374-Speed 3358.64 samples/sec   Loss 3.2748   LearningRate 0.0605   Epoch: 4   Global Step: 74260   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:05:50,443-Speed 3337.96 samples/sec   Loss 3.2130   LearningRate 0.0605   Epoch: 4   Global Step: 74270   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:05:53,506-Speed 3344.31 samples/sec   Loss 3.3252   LearningRate 0.0604   Epoch: 4   Global Step: 74280   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:05:56,572-Speed 3343.15 samples/sec   Loss 3.2135   LearningRate 0.0604   Epoch: 4   Global Step: 74290   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:05:59,659-Speed 3318.03 samples/sec   Loss 3.2272   LearningRate 0.0604   Epoch: 4   Global Step: 74300   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:06:02,811-Speed 3249.06 samples/sec   Loss 3.2699   LearningRate 0.0604   Epoch: 4   Global Step: 74310   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-11 07:06:05,972-Speed 3239.98 samples/sec   Loss 3.2946   LearningRate 0.0604   Epoch: 4   Global Step: 74320   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-11 07:06:09,046-Speed 3333.00 samples/sec   Loss 3.3461   LearningRate 0.0604   Epoch: 4   Global Step: 74330   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-11 07:06:12,127-Speed 3323.47 samples/sec   Loss 3.2170   LearningRate 0.0604   Epoch: 4   Global Step: 74340   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-11 07:06:15,185-Speed 3349.76 samples/sec   Loss 3.2633   LearningRate 0.0604   Epoch: 4   Global Step: 74350   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-11 07:06:18,299-Speed 3289.66 samples/sec   Loss 3.3337   LearningRate 0.0604   Epoch: 4   Global Step: 74360   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-11 07:06:21,363-Speed 3342.63 samples/sec   Loss 3.3547   LearningRate 0.0604   Epoch: 4   Global Step: 74370   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-11 07:06:24,511-Speed 3253.52 samples/sec   Loss 3.3686   LearningRate 0.0604   Epoch: 4   Global Step: 74380   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-11 07:06:27,648-Speed 3265.79 samples/sec   Loss 3.2416   LearningRate 0.0604   Epoch: 4   Global Step: 74390   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-11 07:06:30,740-Speed 3311.63 samples/sec   Loss 3.2669   LearningRate 0.0604   Epoch: 4   Global Step: 74400   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-11 07:06:33,869-Speed 3274.24 samples/sec   Loss 3.3663   LearningRate 0.0604   Epoch: 4   Global Step: 74410   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:06:36,948-Speed 3326.39 samples/sec   Loss 3.2449   LearningRate 0.0604   Epoch: 4   Global Step: 74420   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:06:40,013-Speed 3341.65 samples/sec   Loss 3.2839   LearningRate 0.0604   Epoch: 4   Global Step: 74430   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:06:43,075-Speed 3344.31 samples/sec   Loss 3.2251   LearningRate 0.0604   Epoch: 4   Global Step: 74440   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:06:46,149-Speed 3332.25 samples/sec   Loss 3.3148   LearningRate 0.0604   Epoch: 4   Global Step: 74450   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:06:49,224-Speed 3331.47 samples/sec   Loss 3.3347   LearningRate 0.0604   Epoch: 4   Global Step: 74460   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:06:52,311-Speed 3317.75 samples/sec   Loss 3.2927   LearningRate 0.0604   Epoch: 4   Global Step: 74470   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:06:55,373-Speed 3345.25 samples/sec   Loss 3.3119   LearningRate 0.0604   Epoch: 4   Global Step: 74480   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:06:58,451-Speed 3327.51 samples/sec   Loss 3.3044   LearningRate 0.0604   Epoch: 4   Global Step: 74490   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:07:01,509-Speed 3349.25 samples/sec   Loss 3.3518   LearningRate 0.0603   Epoch: 4   Global Step: 74500   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:07:04,612-Speed 3300.23 samples/sec   Loss 3.2364   LearningRate 0.0603   Epoch: 4   Global Step: 74510   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:07:07,698-Speed 3319.58 samples/sec   Loss 3.3477   LearningRate 0.0603   Epoch: 4   Global Step: 74520   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-11 07:07:10,803-Speed 3297.95 samples/sec   Loss 3.2420   LearningRate 0.0603   Epoch: 4   Global Step: 74530   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-11 07:07:13,903-Speed 3304.87 samples/sec   Loss 3.2691   LearningRate 0.0603   Epoch: 4   Global Step: 74540   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-11 07:07:16,962-Speed 3347.65 samples/sec   Loss 3.2207   LearningRate 0.0603   Epoch: 4   Global Step: 74550   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-11 07:07:20,019-Speed 3351.56 samples/sec   Loss 3.2006   LearningRate 0.0603   Epoch: 4   Global Step: 74560   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-11 07:07:23,075-Speed 3350.84 samples/sec   Loss 3.3755   LearningRate 0.0603   Epoch: 4   Global Step: 74570   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-11 07:07:26,132-Speed 3350.25 samples/sec   Loss 3.2878   LearningRate 0.0603   Epoch: 4   Global Step: 74580   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-11 07:07:29,195-Speed 3344.65 samples/sec   Loss 3.3057   LearningRate 0.0603   Epoch: 4   Global Step: 74590   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-11 07:07:32,254-Speed 3347.28 samples/sec   Loss 3.2576   LearningRate 0.0603   Epoch: 4   Global Step: 74600   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-11 07:07:35,348-Speed 3310.85 samples/sec   Loss 3.2444   LearningRate 0.0603   Epoch: 4   Global Step: 74610   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-11 07:07:38,420-Speed 3334.63 samples/sec   Loss 3.3012   LearningRate 0.0603   Epoch: 4   Global Step: 74620   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:07:41,476-Speed 3351.43 samples/sec   Loss 3.2565   LearningRate 0.0603   Epoch: 4   Global Step: 74630   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:07:44,542-Speed 3340.41 samples/sec   Loss 3.2386   LearningRate 0.0603   Epoch: 4   Global Step: 74640   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:07:47,638-Speed 3308.66 samples/sec   Loss 3.2842   LearningRate 0.0603   Epoch: 4   Global Step: 74650   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-11 07:07:50,717-Speed 3327.00 samples/sec   Loss 3.3011   LearningRate 0.0603   Epoch: 4   Global Step: 74660   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-11 07:07:53,781-Speed 3342.12 samples/sec   Loss 3.2978   LearningRate 0.0603   Epoch: 4   Global Step: 74670   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-11 07:07:56,839-Speed 3350.13 samples/sec   Loss 3.2592   LearningRate 0.0603   Epoch: 4   Global Step: 74680   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-11 07:07:59,911-Speed 3333.45 samples/sec   Loss 3.3552   LearningRate 0.0603   Epoch: 4   Global Step: 74690   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-11 07:08:02,969-Speed 3349.58 samples/sec   Loss 3.2640   LearningRate 0.0603   Epoch: 4   Global Step: 74700   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-11 07:08:06,045-Speed 3329.52 samples/sec   Loss 3.2511   LearningRate 0.0602   Epoch: 4   Global Step: 74710   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-11 07:08:09,102-Speed 3350.52 samples/sec   Loss 3.2150   LearningRate 0.0602   Epoch: 4   Global Step: 74720   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-11 07:08:12,162-Speed 3347.36 samples/sec   Loss 3.2541   LearningRate 0.0602   Epoch: 4   Global Step: 74730   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-11 07:08:15,232-Speed 3336.45 samples/sec   Loss 3.1573   LearningRate 0.0602   Epoch: 4   Global Step: 74740   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-11 07:08:18,290-Speed 3348.79 samples/sec   Loss 3.2967   LearningRate 0.0602   Epoch: 4   Global Step: 74750   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:08:21,348-Speed 3349.75 samples/sec   Loss 3.3165   LearningRate 0.0602   Epoch: 4   Global Step: 74760   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:08:24,407-Speed 3348.32 samples/sec   Loss 3.2914   LearningRate 0.0602   Epoch: 4   Global Step: 74770   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:08:27,491-Speed 3320.78 samples/sec   Loss 3.2641   LearningRate 0.0602   Epoch: 4   Global Step: 74780   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:08:30,557-Speed 3340.89 samples/sec   Loss 3.2566   LearningRate 0.0602   Epoch: 4   Global Step: 74790   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:08:33,618-Speed 3346.63 samples/sec   Loss 3.2268   LearningRate 0.0602   Epoch: 4   Global Step: 74800   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:08:36,695-Speed 3328.41 samples/sec   Loss 3.2361   LearningRate 0.0602   Epoch: 4   Global Step: 74810   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:08:39,760-Speed 3341.45 samples/sec   Loss 3.3143   LearningRate 0.0602   Epoch: 4   Global Step: 74820   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:08:42,824-Speed 3343.36 samples/sec   Loss 3.2778   LearningRate 0.0602   Epoch: 4   Global Step: 74830   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:08:45,888-Speed 3343.19 samples/sec   Loss 3.3189   LearningRate 0.0602   Epoch: 4   Global Step: 74840   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:08:48,951-Speed 3343.43 samples/sec   Loss 3.2673   LearningRate 0.0602   Epoch: 4   Global Step: 74850   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:08:52,029-Speed 3327.54 samples/sec   Loss 3.2543   LearningRate 0.0602   Epoch: 4   Global Step: 74860   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:08:55,107-Speed 3327.24 samples/sec   Loss 3.2872   LearningRate 0.0602   Epoch: 4   Global Step: 74870   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:08:58,172-Speed 3341.99 samples/sec   Loss 3.3460   LearningRate 0.0602   Epoch: 4   Global Step: 74880   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:09:01,237-Speed 3341.37 samples/sec   Loss 3.2592   LearningRate 0.0602   Epoch: 4   Global Step: 74890   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:09:04,314-Speed 3328.91 samples/sec   Loss 3.3180   LearningRate 0.0602   Epoch: 4   Global Step: 74900   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:09:07,376-Speed 3345.33 samples/sec   Loss 3.2942   LearningRate 0.0602   Epoch: 4   Global Step: 74910   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:09:10,470-Speed 3310.97 samples/sec   Loss 3.3436   LearningRate 0.0602   Epoch: 4   Global Step: 74920   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:09:13,531-Speed 3346.23 samples/sec   Loss 3.2998   LearningRate 0.0601   Epoch: 4   Global Step: 74930   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:09:16,615-Speed 3321.09 samples/sec   Loss 3.2837   LearningRate 0.0601   Epoch: 4   Global Step: 74940   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:09:19,674-Speed 3348.93 samples/sec   Loss 3.2527   LearningRate 0.0601   Epoch: 4   Global Step: 74950   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:09:22,746-Speed 3333.99 samples/sec   Loss 3.2406   LearningRate 0.0601   Epoch: 4   Global Step: 74960   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:09:25,818-Speed 3334.18 samples/sec   Loss 3.3165   LearningRate 0.0601   Epoch: 4   Global Step: 74970   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:09:28,877-Speed 3348.47 samples/sec   Loss 3.2494   LearningRate 0.0601   Epoch: 4   Global Step: 74980   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:09:31,940-Speed 3344.22 samples/sec   Loss 3.3055   LearningRate 0.0601   Epoch: 4   Global Step: 74990   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:09:35,004-Speed 3342.36 samples/sec   Loss 3.3178   LearningRate 0.0601   Epoch: 4   Global Step: 75000   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:09:38,085-Speed 3324.16 samples/sec   Loss 3.3992   LearningRate 0.0601   Epoch: 4   Global Step: 75010   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:09:41,162-Speed 3328.60 samples/sec   Loss 3.3505   LearningRate 0.0601   Epoch: 4   Global Step: 75020   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:09:44,228-Speed 3341.05 samples/sec   Loss 3.2675   LearningRate 0.0601   Epoch: 4   Global Step: 75030   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:09:47,294-Speed 3340.43 samples/sec   Loss 3.3086   LearningRate 0.0601   Epoch: 4   Global Step: 75040   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:09:50,375-Speed 3324.44 samples/sec   Loss 3.2563   LearningRate 0.0601   Epoch: 4   Global Step: 75050   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:09:53,442-Speed 3339.04 samples/sec   Loss 3.2645   LearningRate 0.0601   Epoch: 4   Global Step: 75060   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:09:56,513-Speed 3335.49 samples/sec   Loss 3.2068   LearningRate 0.0601   Epoch: 4   Global Step: 75070   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:09:59,559-Speed 3363.66 samples/sec   Loss 3.2382   LearningRate 0.0601   Epoch: 4   Global Step: 75080   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-11 07:10:02,622-Speed 3343.06 samples/sec   Loss 3.2927   LearningRate 0.0601   Epoch: 4   Global Step: 75090   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-11 07:10:05,703-Speed 3324.97 samples/sec   Loss 3.2617   LearningRate 0.0601   Epoch: 4   Global Step: 75100   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-11 07:10:08,763-Speed 3347.07 samples/sec   Loss 3.3105   LearningRate 0.0601   Epoch: 4   Global Step: 75110   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-11 07:10:11,828-Speed 3341.97 samples/sec   Loss 3.2568   LearningRate 0.0601   Epoch: 4   Global Step: 75120   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-11 07:10:14,889-Speed 3345.24 samples/sec   Loss 3.2946   LearningRate 0.0601   Epoch: 4   Global Step: 75130   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-11 07:10:17,955-Speed 3341.17 samples/sec   Loss 3.3146   LearningRate 0.0600   Epoch: 4   Global Step: 75140   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-11 07:10:21,022-Speed 3339.15 samples/sec   Loss 3.3301   LearningRate 0.0600   Epoch: 4   Global Step: 75150   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-11 07:10:24,083-Speed 3346.58 samples/sec   Loss 3.2893   LearningRate 0.0600   Epoch: 4   Global Step: 75160   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-11 07:10:27,152-Speed 3338.19 samples/sec   Loss 3.2613   LearningRate 0.0600   Epoch: 4   Global Step: 75170   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-11 07:10:30,229-Speed 3327.79 samples/sec   Loss 3.2513   LearningRate 0.0600   Epoch: 4   Global Step: 75180   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:10:33,330-Speed 3303.96 samples/sec   Loss 3.2450   LearningRate 0.0600   Epoch: 4   Global Step: 75190   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:10:36,393-Speed 3343.85 samples/sec   Loss 3.2950   LearningRate 0.0600   Epoch: 4   Global Step: 75200   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:10:39,460-Speed 3338.99 samples/sec   Loss 3.2796   LearningRate 0.0600   Epoch: 4   Global Step: 75210   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:10:42,554-Speed 3310.39 samples/sec   Loss 3.3347   LearningRate 0.0600   Epoch: 4   Global Step: 75220   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:10:45,679-Speed 3278.25 samples/sec   Loss 3.2745   LearningRate 0.0600   Epoch: 4   Global Step: 75230   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:10:48,813-Speed 3267.78 samples/sec   Loss 3.3048   LearningRate 0.0600   Epoch: 4   Global Step: 75240   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:10:51,874-Speed 3346.79 samples/sec   Loss 3.2249   LearningRate 0.0600   Epoch: 4   Global Step: 75250   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:10:54,938-Speed 3343.14 samples/sec   Loss 3.2074   LearningRate 0.0600   Epoch: 4   Global Step: 75260   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:10:58,002-Speed 3342.62 samples/sec   Loss 3.2587   LearningRate 0.0600   Epoch: 4   Global Step: 75270   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:11:01,059-Speed 3350.16 samples/sec   Loss 3.2645   LearningRate 0.0600   Epoch: 4   Global Step: 75280   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:11:04,121-Speed 3345.63 samples/sec   Loss 3.2966   LearningRate 0.0600   Epoch: 4   Global Step: 75290   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:11:07,179-Speed 3349.66 samples/sec   Loss 3.2630   LearningRate 0.0600   Epoch: 4   Global Step: 75300   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-11 07:11:10,247-Speed 3338.28 samples/sec   Loss 3.3139   LearningRate 0.0600   Epoch: 4   Global Step: 75310   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-11 07:11:13,325-Speed 3327.47 samples/sec   Loss 3.3060   LearningRate 0.0600   Epoch: 4   Global Step: 75320   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-11 07:11:16,389-Speed 3342.91 samples/sec   Loss 3.2732   LearningRate 0.0600   Epoch: 4   Global Step: 75330   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-11 07:11:19,516-Speed 3275.59 samples/sec   Loss 3.2644   LearningRate 0.0600   Epoch: 4   Global Step: 75340   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-11 07:11:22,582-Speed 3339.87 samples/sec   Loss 3.2716   LearningRate 0.0600   Epoch: 4   Global Step: 75350   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-11 07:11:25,704-Speed 3281.17 samples/sec   Loss 3.2580   LearningRate 0.0599   Epoch: 4   Global Step: 75360   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-11 07:11:28,799-Speed 3309.07 samples/sec   Loss 3.2681   LearningRate 0.0599   Epoch: 4   Global Step: 75370   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-11 07:11:31,926-Speed 3275.61 samples/sec   Loss 3.2077   LearningRate 0.0599   Epoch: 4   Global Step: 75380   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-11 07:11:35,095-Speed 3231.93 samples/sec   Loss 3.3112   LearningRate 0.0599   Epoch: 4   Global Step: 75390   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-11 07:11:38,182-Speed 3318.80 samples/sec   Loss 3.2524   LearningRate 0.0599   Epoch: 4   Global Step: 75400   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:11:41,247-Speed 3341.59 samples/sec   Loss 3.3394   LearningRate 0.0599   Epoch: 4   Global Step: 75410   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:11:44,310-Speed 3344.14 samples/sec   Loss 3.2422   LearningRate 0.0599   Epoch: 4   Global Step: 75420   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:11:47,463-Speed 3248.48 samples/sec   Loss 3.2939   LearningRate 0.0599   Epoch: 4   Global Step: 75430   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:11:50,544-Speed 3324.57 samples/sec   Loss 3.3756   LearningRate 0.0599   Epoch: 4   Global Step: 75440   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:11:53,615-Speed 3334.98 samples/sec   Loss 3.3338   LearningRate 0.0599   Epoch: 4   Global Step: 75450   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:11:56,686-Speed 3334.37 samples/sec   Loss 3.3105   LearningRate 0.0599   Epoch: 4   Global Step: 75460   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:11:59,764-Speed 3328.34 samples/sec   Loss 3.2991   LearningRate 0.0599   Epoch: 4   Global Step: 75470   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:12:02,841-Speed 3328.12 samples/sec   Loss 3.2620   LearningRate 0.0599   Epoch: 4   Global Step: 75480   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:12:05,916-Speed 3330.85 samples/sec   Loss 3.2387   LearningRate 0.0599   Epoch: 4   Global Step: 75490   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:12:08,987-Speed 3335.33 samples/sec   Loss 3.2869   LearningRate 0.0599   Epoch: 4   Global Step: 75500   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:12:12,075-Speed 3316.86 samples/sec   Loss 3.2685   LearningRate 0.0599   Epoch: 4   Global Step: 75510   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:12:15,166-Speed 3313.46 samples/sec   Loss 3.2896   LearningRate 0.0599   Epoch: 4   Global Step: 75520   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:12:18,235-Speed 3337.91 samples/sec   Loss 3.2659   LearningRate 0.0599   Epoch: 4   Global Step: 75530   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:12:21,287-Speed 3355.31 samples/sec   Loss 3.2407   LearningRate 0.0599   Epoch: 4   Global Step: 75540   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-11 07:12:24,359-Speed 3334.95 samples/sec   Loss 3.2772   LearningRate 0.0599   Epoch: 4   Global Step: 75550   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-11 07:12:27,430-Speed 3334.49 samples/sec   Loss 3.3474   LearningRate 0.0599   Epoch: 4   Global Step: 75560   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-11 07:12:30,580-Speed 3251.52 samples/sec   Loss 3.2219   LearningRate 0.0598   Epoch: 4   Global Step: 75570   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-11 07:12:33,654-Speed 3332.80 samples/sec   Loss 3.2014   LearningRate 0.0598   Epoch: 4   Global Step: 75580   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-11 07:12:36,717-Speed 3343.56 samples/sec   Loss 3.3363   LearningRate 0.0598   Epoch: 4   Global Step: 75590   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-11 07:12:39,778-Speed 3346.82 samples/sec   Loss 3.2969   LearningRate 0.0598   Epoch: 4   Global Step: 75600   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-11 07:12:42,845-Speed 3338.54 samples/sec   Loss 3.1868   LearningRate 0.0598   Epoch: 4   Global Step: 75610   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-11 07:12:45,933-Speed 3316.66 samples/sec   Loss 3.2601   LearningRate 0.0598   Epoch: 4   Global Step: 75620   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-11 07:12:48,998-Speed 3341.98 samples/sec   Loss 3.2089   LearningRate 0.0598   Epoch: 4   Global Step: 75630   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-11 07:12:52,095-Speed 3306.96 samples/sec   Loss 3.2971   LearningRate 0.0598   Epoch: 4   Global Step: 75640   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:12:55,161-Speed 3340.64 samples/sec   Loss 3.2461   LearningRate 0.0598   Epoch: 4   Global Step: 75650   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:12:58,238-Speed 3328.85 samples/sec   Loss 3.2924   LearningRate 0.0598   Epoch: 4   Global Step: 75660   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:13:01,323-Speed 3321.14 samples/sec   Loss 3.2569   LearningRate 0.0598   Epoch: 4   Global Step: 75670   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:13:04,397-Speed 3331.24 samples/sec   Loss 3.2759   LearningRate 0.0598   Epoch: 4   Global Step: 75680   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:13:07,479-Speed 3323.84 samples/sec   Loss 3.2279   LearningRate 0.0598   Epoch: 4   Global Step: 75690   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:13:10,553-Speed 3332.39 samples/sec   Loss 3.3100   LearningRate 0.0598   Epoch: 4   Global Step: 75700   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:13:13,628-Speed 3330.45 samples/sec   Loss 3.2774   LearningRate 0.0598   Epoch: 4   Global Step: 75710   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:13:16,696-Speed 3338.52 samples/sec   Loss 3.2473   LearningRate 0.0598   Epoch: 4   Global Step: 75720   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:13:19,759-Speed 3343.41 samples/sec   Loss 3.3029   LearningRate 0.0598   Epoch: 4   Global Step: 75730   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:13:22,871-Speed 3291.46 samples/sec   Loss 3.2292   LearningRate 0.0598   Epoch: 4   Global Step: 75740   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:13:25,954-Speed 3322.41 samples/sec   Loss 3.1526   LearningRate 0.0598   Epoch: 4   Global Step: 75750   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:13:29,024-Speed 3336.23 samples/sec   Loss 3.2433   LearningRate 0.0598   Epoch: 4   Global Step: 75760   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:13:32,088-Speed 3342.83 samples/sec   Loss 3.2825   LearningRate 0.0598   Epoch: 4   Global Step: 75770   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:13:35,214-Speed 3276.68 samples/sec   Loss 3.3334   LearningRate 0.0598   Epoch: 4   Global Step: 75780   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:13:38,296-Speed 3323.43 samples/sec   Loss 3.2691   LearningRate 0.0597   Epoch: 4   Global Step: 75790   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:13:41,362-Speed 3341.07 samples/sec   Loss 3.2129   LearningRate 0.0597   Epoch: 4   Global Step: 75800   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:13:44,431-Speed 3337.45 samples/sec   Loss 3.2149   LearningRate 0.0597   Epoch: 4   Global Step: 75810   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:13:47,629-Speed 3202.22 samples/sec   Loss 3.1838   LearningRate 0.0597   Epoch: 4   Global Step: 75820   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:13:50,722-Speed 3312.16 samples/sec   Loss 3.2559   LearningRate 0.0597   Epoch: 4   Global Step: 75830   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:13:53,801-Speed 3325.79 samples/sec   Loss 3.2281   LearningRate 0.0597   Epoch: 4   Global Step: 75840   Fp16 Grad Scale: 262144   Required: 27 hours
Training: 2022-04-11 07:13:56,852-Speed 3357.47 samples/sec   Loss 3.3639   LearningRate 0.0597   Epoch: 4   Global Step: 75850   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:13:59,965-Speed 3290.94 samples/sec   Loss 3.2617   LearningRate 0.0597   Epoch: 4   Global Step: 75860   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:14:03,080-Speed 3287.38 samples/sec   Loss 3.2163   LearningRate 0.0597   Epoch: 4   Global Step: 75870   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:14:06,157-Speed 3328.78 samples/sec   Loss 3.2631   LearningRate 0.0597   Epoch: 4   Global Step: 75880   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:14:09,259-Speed 3302.11 samples/sec   Loss 3.3439   LearningRate 0.0597   Epoch: 4   Global Step: 75890   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:14:12,355-Speed 3307.89 samples/sec   Loss 3.2555   LearningRate 0.0597   Epoch: 4   Global Step: 75900   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:14:15,459-Speed 3299.60 samples/sec   Loss 3.3017   LearningRate 0.0597   Epoch: 4   Global Step: 75910   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:14:18,547-Speed 3316.77 samples/sec   Loss 3.3044   LearningRate 0.0597   Epoch: 4   Global Step: 75920   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:14:21,641-Speed 3310.54 samples/sec   Loss 3.2009   LearningRate 0.0597   Epoch: 4   Global Step: 75930   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:14:24,744-Speed 3301.85 samples/sec   Loss 3.2023   LearningRate 0.0597   Epoch: 4   Global Step: 75940   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:14:27,811-Speed 3339.16 samples/sec   Loss 3.1838   LearningRate 0.0597   Epoch: 4   Global Step: 75950   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:14:30,883-Speed 3334.68 samples/sec   Loss 3.2853   LearningRate 0.0597   Epoch: 4   Global Step: 75960   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:14:33,952-Speed 3336.31 samples/sec   Loss 3.2354   LearningRate 0.0597   Epoch: 4   Global Step: 75970   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:14:37,020-Speed 3339.03 samples/sec   Loss 3.2267   LearningRate 0.0597   Epoch: 4   Global Step: 75980   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:14:40,117-Speed 3306.95 samples/sec   Loss 3.2938   LearningRate 0.0597   Epoch: 4   Global Step: 75990   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:14:43,197-Speed 3325.99 samples/sec   Loss 3.2206   LearningRate 0.0596   Epoch: 4   Global Step: 76000   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:15:26,919-[lfw][76000]XNorm: 21.599380
Training: 2022-04-11 07:15:26,920-[lfw][76000]Accuracy-Flip: 0.99783+-0.00269
Training: 2022-04-11 07:15:26,920-[lfw][76000]Accuracy-Highest: 0.99817
Training: 2022-04-11 07:16:17,949-[cfp_fp][76000]XNorm: 20.631499
Training: 2022-04-11 07:16:17,950-[cfp_fp][76000]Accuracy-Flip: 0.98543+-0.00418
Training: 2022-04-11 07:16:17,951-[cfp_fp][76000]Accuracy-Highest: 0.98543
Training: 2022-04-11 07:17:01,867-[agedb_30][76000]XNorm: 21.886734
Training: 2022-04-11 07:17:01,867-[agedb_30][76000]Accuracy-Flip: 0.97950+-0.00792
Training: 2022-04-11 07:17:01,868-[agedb_30][76000]Accuracy-Highest: 0.98100
Training: 2022-04-11 07:17:04,944-Speed 72.24 samples/sec   Loss 3.2302   LearningRate 0.0596   Epoch: 4   Global Step: 76010   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:17:07,988-Speed 3364.83 samples/sec   Loss 3.2520   LearningRate 0.0596   Epoch: 4   Global Step: 76020   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:17:11,043-Speed 3352.56 samples/sec   Loss 3.2776   LearningRate 0.0596   Epoch: 4   Global Step: 76030   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:17:14,091-Speed 3360.85 samples/sec   Loss 3.2712   LearningRate 0.0596   Epoch: 4   Global Step: 76040   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:17:17,133-Speed 3366.82 samples/sec   Loss 3.3156   LearningRate 0.0596   Epoch: 4   Global Step: 76050   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:17:20,262-Speed 3273.14 samples/sec   Loss 3.2481   LearningRate 0.0596   Epoch: 4   Global Step: 76060   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:17:23,315-Speed 3355.95 samples/sec   Loss 3.2907   LearningRate 0.0596   Epoch: 4   Global Step: 76070   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:17:26,377-Speed 3344.79 samples/sec   Loss 3.2349   LearningRate 0.0596   Epoch: 4   Global Step: 76080   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:17:29,468-Speed 3313.19 samples/sec   Loss 3.3225   LearningRate 0.0596   Epoch: 4   Global Step: 76090   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:17:32,535-Speed 3339.65 samples/sec   Loss 3.2715   LearningRate 0.0596   Epoch: 4   Global Step: 76100   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:17:35,580-Speed 3363.97 samples/sec   Loss 3.3005   LearningRate 0.0596   Epoch: 4   Global Step: 76110   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-11 07:17:38,644-Speed 3342.82 samples/sec   Loss 3.2881   LearningRate 0.0596   Epoch: 4   Global Step: 76120   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-11 07:17:41,882-Speed 3162.49 samples/sec   Loss 3.3278   LearningRate 0.0596   Epoch: 4   Global Step: 76130   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-11 07:17:45,137-Speed 3147.01 samples/sec   Loss 3.2344   LearningRate 0.0596   Epoch: 4   Global Step: 76140   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-11 07:17:48,248-Speed 3292.73 samples/sec   Loss 3.3049   LearningRate 0.0596   Epoch: 4   Global Step: 76150   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-11 07:17:51,405-Speed 3244.31 samples/sec   Loss 3.2394   LearningRate 0.0596   Epoch: 4   Global Step: 76160   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-11 07:17:54,599-Speed 3206.90 samples/sec   Loss 3.2946   LearningRate 0.0596   Epoch: 4   Global Step: 76170   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-11 07:17:57,667-Speed 3338.42 samples/sec   Loss 3.2772   LearningRate 0.0596   Epoch: 4   Global Step: 76180   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-11 07:18:00,764-Speed 3306.73 samples/sec   Loss 3.3607   LearningRate 0.0596   Epoch: 4   Global Step: 76190   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-11 07:18:03,825-Speed 3345.86 samples/sec   Loss 3.2530   LearningRate 0.0596   Epoch: 4   Global Step: 76200   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-11 07:18:06,926-Speed 3303.07 samples/sec   Loss 3.2153   LearningRate 0.0596   Epoch: 4   Global Step: 76210   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:18:09,989-Speed 3344.43 samples/sec   Loss 3.2608   LearningRate 0.0595   Epoch: 4   Global Step: 76220   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:18:13,095-Speed 3296.66 samples/sec   Loss 3.2842   LearningRate 0.0595   Epoch: 4   Global Step: 76230   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:18:16,173-Speed 3329.06 samples/sec   Loss 3.2396   LearningRate 0.0595   Epoch: 4   Global Step: 76240   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:18:19,235-Speed 3344.61 samples/sec   Loss 3.2232   LearningRate 0.0595   Epoch: 4   Global Step: 76250   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:18:22,302-Speed 3339.02 samples/sec   Loss 3.2526   LearningRate 0.0595   Epoch: 4   Global Step: 76260   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:18:25,373-Speed 3336.07 samples/sec   Loss 3.2850   LearningRate 0.0595   Epoch: 4   Global Step: 76270   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:18:28,438-Speed 3341.10 samples/sec   Loss 3.2547   LearningRate 0.0595   Epoch: 4   Global Step: 76280   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:18:31,503-Speed 3341.50 samples/sec   Loss 3.1981   LearningRate 0.0595   Epoch: 4   Global Step: 76290   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:18:34,577-Speed 3331.63 samples/sec   Loss 3.2366   LearningRate 0.0595   Epoch: 4   Global Step: 76300   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:18:37,676-Speed 3305.65 samples/sec   Loss 3.1764   LearningRate 0.0595   Epoch: 4   Global Step: 76310   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:18:40,756-Speed 3324.87 samples/sec   Loss 3.2355   LearningRate 0.0595   Epoch: 4   Global Step: 76320   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:18:43,872-Speed 3287.95 samples/sec   Loss 3.2149   LearningRate 0.0595   Epoch: 4   Global Step: 76330   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:18:47,073-Speed 3200.08 samples/sec   Loss 3.2840   LearningRate 0.0595   Epoch: 4   Global Step: 76340   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:18:50,156-Speed 3322.17 samples/sec   Loss 3.2327   LearningRate 0.0595   Epoch: 4   Global Step: 76350   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:18:53,250-Speed 3310.01 samples/sec   Loss 3.1939   LearningRate 0.0595   Epoch: 4   Global Step: 76360   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:18:56,313-Speed 3343.13 samples/sec   Loss 3.2262   LearningRate 0.0595   Epoch: 4   Global Step: 76370   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-11 07:18:59,377-Speed 3343.51 samples/sec   Loss 3.2317   LearningRate 0.0595   Epoch: 4   Global Step: 76380   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-11 07:19:02,443-Speed 3340.91 samples/sec   Loss 3.1917   LearningRate 0.0595   Epoch: 4   Global Step: 76390   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-11 07:19:05,593-Speed 3250.63 samples/sec   Loss 3.2348   LearningRate 0.0595   Epoch: 4   Global Step: 76400   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-11 07:19:08,649-Speed 3351.51 samples/sec   Loss 3.1871   LearningRate 0.0595   Epoch: 4   Global Step: 76410   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-11 07:19:11,708-Speed 3349.32 samples/sec   Loss 3.1822   LearningRate 0.0595   Epoch: 4   Global Step: 76420   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-11 07:19:14,770-Speed 3345.51 samples/sec   Loss 3.2592   LearningRate 0.0595   Epoch: 4   Global Step: 76430   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-11 07:19:17,829-Speed 3348.25 samples/sec   Loss 3.2685   LearningRate 0.0594   Epoch: 4   Global Step: 76440   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-11 07:19:20,897-Speed 3338.40 samples/sec   Loss 3.2121   LearningRate 0.0594   Epoch: 4   Global Step: 76450   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-11 07:19:23,984-Speed 3317.45 samples/sec   Loss 3.2773   LearningRate 0.0594   Epoch: 4   Global Step: 76460   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-11 07:19:27,072-Speed 3317.19 samples/sec   Loss 3.2674   LearningRate 0.0594   Epoch: 4   Global Step: 76470   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:19:30,256-Speed 3216.99 samples/sec   Loss 3.3160   LearningRate 0.0594   Epoch: 4   Global Step: 76480   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:19:33,367-Speed 3291.87 samples/sec   Loss 3.2970   LearningRate 0.0594   Epoch: 4   Global Step: 76490   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:19:36,496-Speed 3273.17 samples/sec   Loss 3.2645   LearningRate 0.0594   Epoch: 4   Global Step: 76500   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:19:39,569-Speed 3333.62 samples/sec   Loss 3.2549   LearningRate 0.0594   Epoch: 4   Global Step: 76510   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:19:42,678-Speed 3294.59 samples/sec   Loss 3.2600   LearningRate 0.0594   Epoch: 4   Global Step: 76520   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:19:45,772-Speed 3310.19 samples/sec   Loss 3.2634   LearningRate 0.0594   Epoch: 4   Global Step: 76530   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:19:48,849-Speed 3329.03 samples/sec   Loss 3.2014   LearningRate 0.0594   Epoch: 4   Global Step: 76540   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:19:51,911-Speed 3344.49 samples/sec   Loss 3.2403   LearningRate 0.0594   Epoch: 4   Global Step: 76550   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:19:55,030-Speed 3283.76 samples/sec   Loss 3.2481   LearningRate 0.0594   Epoch: 4   Global Step: 76560   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:19:58,098-Speed 3338.99 samples/sec   Loss 3.2899   LearningRate 0.0594   Epoch: 4   Global Step: 76570   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:20:01,162-Speed 3343.15 samples/sec   Loss 3.2258   LearningRate 0.0594   Epoch: 4   Global Step: 76580   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:20:04,249-Speed 3316.94 samples/sec   Loss 3.2708   LearningRate 0.0594   Epoch: 4   Global Step: 76590   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:20:07,325-Speed 3330.00 samples/sec   Loss 3.2338   LearningRate 0.0594   Epoch: 4   Global Step: 76600   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:20:10,384-Speed 3349.32 samples/sec   Loss 3.3041   LearningRate 0.0594   Epoch: 4   Global Step: 76610   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-11 07:20:13,457-Speed 3332.19 samples/sec   Loss 3.2804   LearningRate 0.0594   Epoch: 4   Global Step: 76620   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-11 07:20:16,543-Speed 3319.80 samples/sec   Loss 3.1816   LearningRate 0.0594   Epoch: 4   Global Step: 76630   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-11 07:20:19,614-Speed 3334.38 samples/sec   Loss 3.2905   LearningRate 0.0594   Epoch: 4   Global Step: 76640   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-11 07:20:22,673-Speed 3348.88 samples/sec   Loss 3.1620   LearningRate 0.0593   Epoch: 4   Global Step: 76650   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-11 07:20:25,731-Speed 3349.23 samples/sec   Loss 3.1964   LearningRate 0.0593   Epoch: 4   Global Step: 76660   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-11 07:20:28,793-Speed 3344.25 samples/sec   Loss 3.2503   LearningRate 0.0593   Epoch: 4   Global Step: 76670   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-11 07:20:31,866-Speed 3333.84 samples/sec   Loss 3.2632   LearningRate 0.0593   Epoch: 4   Global Step: 76680   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-11 07:20:34,931-Speed 3341.34 samples/sec   Loss 3.2171   LearningRate 0.0593   Epoch: 4   Global Step: 76690   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-11 07:20:38,003-Speed 3334.71 samples/sec   Loss 3.1809   LearningRate 0.0593   Epoch: 4   Global Step: 76700   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-11 07:20:41,064-Speed 3345.51 samples/sec   Loss 3.2604   LearningRate 0.0593   Epoch: 4   Global Step: 76710   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:20:44,140-Speed 3330.15 samples/sec   Loss 3.2016   LearningRate 0.0593   Epoch: 4   Global Step: 76720   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:20:47,216-Speed 3329.47 samples/sec   Loss 3.2005   LearningRate 0.0593   Epoch: 4   Global Step: 76730   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:20:50,339-Speed 3279.24 samples/sec   Loss 3.2611   LearningRate 0.0593   Epoch: 4   Global Step: 76740   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:20:53,407-Speed 3338.47 samples/sec   Loss 3.2479   LearningRate 0.0593   Epoch: 4   Global Step: 76750   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:20:56,484-Speed 3329.44 samples/sec   Loss 3.2236   LearningRate 0.0593   Epoch: 4   Global Step: 76760   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:20:59,555-Speed 3335.08 samples/sec   Loss 3.2317   LearningRate 0.0593   Epoch: 4   Global Step: 76770   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:21:02,622-Speed 3339.86 samples/sec   Loss 3.1842   LearningRate 0.0593   Epoch: 4   Global Step: 76780   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:21:05,694-Speed 3334.07 samples/sec   Loss 3.3470   LearningRate 0.0593   Epoch: 4   Global Step: 76790   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:21:08,791-Speed 3306.83 samples/sec   Loss 3.2366   LearningRate 0.0593   Epoch: 4   Global Step: 76800   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:21:11,871-Speed 3325.70 samples/sec   Loss 3.3097   LearningRate 0.0593   Epoch: 4   Global Step: 76810   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:21:14,943-Speed 3334.14 samples/sec   Loss 3.3166   LearningRate 0.0593   Epoch: 4   Global Step: 76820   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:21:18,042-Speed 3304.89 samples/sec   Loss 3.1501   LearningRate 0.0593   Epoch: 4   Global Step: 76830   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:21:21,120-Speed 3328.78 samples/sec   Loss 3.3270   LearningRate 0.0593   Epoch: 4   Global Step: 76840   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:21:24,188-Speed 3337.82 samples/sec   Loss 3.1989   LearningRate 0.0593   Epoch: 4   Global Step: 76850   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:21:27,336-Speed 3254.03 samples/sec   Loss 3.2937   LearningRate 0.0593   Epoch: 4   Global Step: 76860   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:21:30,436-Speed 3303.84 samples/sec   Loss 3.1993   LearningRate 0.0592   Epoch: 4   Global Step: 76870   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:21:33,649-Speed 3188.59 samples/sec   Loss 3.3335   LearningRate 0.0592   Epoch: 4   Global Step: 76880   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:21:36,722-Speed 3332.11 samples/sec   Loss 3.2970   LearningRate 0.0592   Epoch: 4   Global Step: 76890   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:21:39,787-Speed 3342.34 samples/sec   Loss 3.3309   LearningRate 0.0592   Epoch: 4   Global Step: 76900   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:21:42,877-Speed 3313.77 samples/sec   Loss 3.2125   LearningRate 0.0592   Epoch: 4   Global Step: 76910   Fp16 Grad Scale: 262144   Required: 27 hours
Training: 2022-04-11 07:21:45,939-Speed 3344.77 samples/sec   Loss 3.2289   LearningRate 0.0592   Epoch: 4   Global Step: 76920   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:21:49,013-Speed 3333.34 samples/sec   Loss 3.2459   LearningRate 0.0592   Epoch: 4   Global Step: 76930   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:21:52,106-Speed 3310.64 samples/sec   Loss 3.2890   LearningRate 0.0592   Epoch: 4   Global Step: 76940   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:21:55,167-Speed 3346.35 samples/sec   Loss 3.2073   LearningRate 0.0592   Epoch: 4   Global Step: 76950   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:21:58,235-Speed 3339.51 samples/sec   Loss 3.2350   LearningRate 0.0592   Epoch: 4   Global Step: 76960   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:22:01,307-Speed 3333.57 samples/sec   Loss 3.2564   LearningRate 0.0592   Epoch: 4   Global Step: 76970   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:22:04,382-Speed 3330.66 samples/sec   Loss 3.2878   LearningRate 0.0592   Epoch: 4   Global Step: 76980   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:22:07,464-Speed 3324.00 samples/sec   Loss 3.3160   LearningRate 0.0592   Epoch: 4   Global Step: 76990   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:22:10,562-Speed 3306.27 samples/sec   Loss 3.2531   LearningRate 0.0592   Epoch: 4   Global Step: 77000   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:22:13,629-Speed 3338.87 samples/sec   Loss 3.1809   LearningRate 0.0592   Epoch: 4   Global Step: 77010   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:22:16,699-Speed 3336.19 samples/sec   Loss 3.2464   LearningRate 0.0592   Epoch: 4   Global Step: 77020   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:22:19,865-Speed 3235.38 samples/sec   Loss 3.2955   LearningRate 0.0592   Epoch: 4   Global Step: 77030   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:22:22,942-Speed 3329.35 samples/sec   Loss 3.2657   LearningRate 0.0592   Epoch: 4   Global Step: 77040   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:22:26,014-Speed 3333.37 samples/sec   Loss 3.2726   LearningRate 0.0592   Epoch: 4   Global Step: 77050   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:22:29,094-Speed 3326.10 samples/sec   Loss 3.2137   LearningRate 0.0592   Epoch: 4   Global Step: 77060   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:22:32,162-Speed 3338.32 samples/sec   Loss 3.2162   LearningRate 0.0592   Epoch: 4   Global Step: 77070   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:22:35,226-Speed 3343.23 samples/sec   Loss 3.2043   LearningRate 0.0592   Epoch: 4   Global Step: 77080   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:22:38,289-Speed 3343.86 samples/sec   Loss 3.2977   LearningRate 0.0591   Epoch: 4   Global Step: 77090   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:22:41,388-Speed 3304.84 samples/sec   Loss 3.3095   LearningRate 0.0591   Epoch: 4   Global Step: 77100   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:22:44,488-Speed 3303.80 samples/sec   Loss 3.1657   LearningRate 0.0591   Epoch: 4   Global Step: 77110   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:22:47,631-Speed 3259.66 samples/sec   Loss 3.2400   LearningRate 0.0591   Epoch: 4   Global Step: 77120   Fp16 Grad Scale: 262144   Required: 27 hours
Training: 2022-04-11 07:22:50,689-Speed 3349.20 samples/sec   Loss 3.2007   LearningRate 0.0591   Epoch: 4   Global Step: 77130   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:22:53,762-Speed 3332.95 samples/sec   Loss 3.2696   LearningRate 0.0591   Epoch: 4   Global Step: 77140   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:22:56,916-Speed 3247.64 samples/sec   Loss 3.2719   LearningRate 0.0591   Epoch: 4   Global Step: 77150   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:23:00,049-Speed 3269.53 samples/sec   Loss 3.1957   LearningRate 0.0591   Epoch: 4   Global Step: 77160   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:23:03,126-Speed 3328.39 samples/sec   Loss 3.2713   LearningRate 0.0591   Epoch: 4   Global Step: 77170   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:23:06,190-Speed 3342.20 samples/sec   Loss 3.2871   LearningRate 0.0591   Epoch: 4   Global Step: 77180   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:23:09,261-Speed 3335.55 samples/sec   Loss 3.2858   LearningRate 0.0591   Epoch: 4   Global Step: 77190   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:23:12,339-Speed 3327.71 samples/sec   Loss 3.2782   LearningRate 0.0591   Epoch: 4   Global Step: 77200   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:23:15,447-Speed 3295.38 samples/sec   Loss 3.2311   LearningRate 0.0591   Epoch: 4   Global Step: 77210   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:23:18,529-Speed 3323.55 samples/sec   Loss 3.2782   LearningRate 0.0591   Epoch: 4   Global Step: 77220   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:23:21,613-Speed 3320.95 samples/sec   Loss 3.2901   LearningRate 0.0591   Epoch: 4   Global Step: 77230   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:23:24,710-Speed 3307.82 samples/sec   Loss 3.2758   LearningRate 0.0591   Epoch: 4   Global Step: 77240   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:23:27,787-Speed 3328.28 samples/sec   Loss 3.2258   LearningRate 0.0591   Epoch: 4   Global Step: 77250   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:23:30,875-Speed 3317.12 samples/sec   Loss 3.3231   LearningRate 0.0591   Epoch: 4   Global Step: 77260   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-11 07:23:33,979-Speed 3299.48 samples/sec   Loss 3.2122   LearningRate 0.0591   Epoch: 4   Global Step: 77270   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-11 07:23:37,078-Speed 3305.54 samples/sec   Loss 3.2647   LearningRate 0.0591   Epoch: 4   Global Step: 77280   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-11 07:23:40,252-Speed 3226.93 samples/sec   Loss 3.1911   LearningRate 0.0591   Epoch: 4   Global Step: 77290   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-11 07:23:43,342-Speed 3314.89 samples/sec   Loss 3.2655   LearningRate 0.0590   Epoch: 4   Global Step: 77300   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-11 07:23:46,438-Speed 3307.69 samples/sec   Loss 3.2373   LearningRate 0.0590   Epoch: 4   Global Step: 77310   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-11 07:23:49,525-Speed 3318.37 samples/sec   Loss 3.3342   LearningRate 0.0590   Epoch: 4   Global Step: 77320   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-11 07:23:52,651-Speed 3276.58 samples/sec   Loss 3.2882   LearningRate 0.0590   Epoch: 4   Global Step: 77330   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-11 07:23:55,742-Speed 3313.50 samples/sec   Loss 3.2236   LearningRate 0.0590   Epoch: 4   Global Step: 77340   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-11 07:23:58,806-Speed 3343.45 samples/sec   Loss 3.2733   LearningRate 0.0590   Epoch: 4   Global Step: 77350   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-11 07:24:01,883-Speed 3328.95 samples/sec   Loss 3.2432   LearningRate 0.0590   Epoch: 4   Global Step: 77360   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:24:04,948-Speed 3341.39 samples/sec   Loss 3.2339   LearningRate 0.0590   Epoch: 4   Global Step: 77370   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:24:08,013-Speed 3342.23 samples/sec   Loss 3.1190   LearningRate 0.0590   Epoch: 4   Global Step: 77380   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:24:11,128-Speed 3287.43 samples/sec   Loss 3.2257   LearningRate 0.0590   Epoch: 4   Global Step: 77390   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:24:14,196-Speed 3339.19 samples/sec   Loss 3.2423   LearningRate 0.0590   Epoch: 4   Global Step: 77400   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:24:17,268-Speed 3333.35 samples/sec   Loss 3.1888   LearningRate 0.0590   Epoch: 4   Global Step: 77410   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:24:20,343-Speed 3331.71 samples/sec   Loss 3.1720   LearningRate 0.0590   Epoch: 4   Global Step: 77420   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:24:23,415-Speed 3333.82 samples/sec   Loss 3.2283   LearningRate 0.0590   Epoch: 4   Global Step: 77430   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:24:26,481-Speed 3341.26 samples/sec   Loss 3.2210   LearningRate 0.0590   Epoch: 4   Global Step: 77440   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:24:29,558-Speed 3328.40 samples/sec   Loss 3.2658   LearningRate 0.0590   Epoch: 4   Global Step: 77450   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:24:32,621-Speed 3343.35 samples/sec   Loss 3.2975   LearningRate 0.0590   Epoch: 4   Global Step: 77460   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-11 07:24:35,692-Speed 3335.29 samples/sec   Loss 3.3051   LearningRate 0.0590   Epoch: 4   Global Step: 77470   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-11 07:24:38,758-Speed 3341.02 samples/sec   Loss 3.2475   LearningRate 0.0590   Epoch: 4   Global Step: 77480   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-11 07:24:41,837-Speed 3326.46 samples/sec   Loss 3.2177   LearningRate 0.0590   Epoch: 4   Global Step: 77490   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-11 07:24:44,902-Speed 3341.80 samples/sec   Loss 3.1943   LearningRate 0.0590   Epoch: 4   Global Step: 77500   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-11 07:24:48,017-Speed 3287.82 samples/sec   Loss 3.2486   LearningRate 0.0590   Epoch: 4   Global Step: 77510   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-11 07:24:51,079-Speed 3345.40 samples/sec   Loss 3.2842   LearningRate 0.0589   Epoch: 4   Global Step: 77520   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-11 07:24:54,139-Speed 3347.32 samples/sec   Loss 3.2901   LearningRate 0.0589   Epoch: 4   Global Step: 77530   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-11 07:24:57,245-Speed 3297.76 samples/sec   Loss 3.3368   LearningRate 0.0589   Epoch: 4   Global Step: 77540   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-11 07:25:00,309-Speed 3342.88 samples/sec   Loss 3.3630   LearningRate 0.0589   Epoch: 4   Global Step: 77550   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-11 07:25:03,376-Speed 3339.48 samples/sec   Loss 3.2465   LearningRate 0.0589   Epoch: 4   Global Step: 77560   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:25:06,448-Speed 3333.97 samples/sec   Loss 3.1896   LearningRate 0.0589   Epoch: 4   Global Step: 77570   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:25:09,515-Speed 3340.15 samples/sec   Loss 3.2200   LearningRate 0.0589   Epoch: 4   Global Step: 77580   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:25:12,575-Speed 3346.28 samples/sec   Loss 3.2701   LearningRate 0.0589   Epoch: 4   Global Step: 77590   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:25:15,647-Speed 3334.97 samples/sec   Loss 3.2180   LearningRate 0.0589   Epoch: 4   Global Step: 77600   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:25:18,711-Speed 3342.42 samples/sec   Loss 3.2500   LearningRate 0.0589   Epoch: 4   Global Step: 77610   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:25:21,778-Speed 3339.61 samples/sec   Loss 3.3152   LearningRate 0.0589   Epoch: 4   Global Step: 77620   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:25:24,848-Speed 3336.77 samples/sec   Loss 3.2376   LearningRate 0.0589   Epoch: 4   Global Step: 77630   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:25:27,911-Speed 3344.25 samples/sec   Loss 3.2157   LearningRate 0.0589   Epoch: 4   Global Step: 77640   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:25:30,978-Speed 3338.98 samples/sec   Loss 3.2051   LearningRate 0.0589   Epoch: 4   Global Step: 77650   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:25:34,056-Speed 3326.87 samples/sec   Loss 3.2417   LearningRate 0.0589   Epoch: 4   Global Step: 77660   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:25:37,127-Speed 3336.09 samples/sec   Loss 3.2158   LearningRate 0.0589   Epoch: 4   Global Step: 77670   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:25:40,204-Speed 3328.49 samples/sec   Loss 3.2678   LearningRate 0.0589   Epoch: 4   Global Step: 77680   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:25:43,283-Speed 3326.00 samples/sec   Loss 3.1993   LearningRate 0.0589   Epoch: 4   Global Step: 77690   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:25:46,382-Speed 3306.11 samples/sec   Loss 3.2003   LearningRate 0.0589   Epoch: 4   Global Step: 77700   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-04-11 07:25:49,521-Speed 3262.42 samples/sec   Loss 3.1605   LearningRate 0.0589   Epoch: 4   Global Step: 77710   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-04-11 07:25:52,627-Speed 3298.04 samples/sec   Loss 3.1907   LearningRate 0.0589   Epoch: 4   Global Step: 77720   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-04-11 07:25:55,688-Speed 3346.46 samples/sec   Loss 3.2113   LearningRate 0.0589   Epoch: 4   Global Step: 77730   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-04-11 07:25:58,753-Speed 3341.00 samples/sec   Loss 3.1818   LearningRate 0.0588   Epoch: 4   Global Step: 77740   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-04-11 07:26:01,851-Speed 3306.60 samples/sec   Loss 3.2340   LearningRate 0.0588   Epoch: 4   Global Step: 77750   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-04-11 07:26:05,081-Speed 3170.33 samples/sec   Loss 3.2820   LearningRate 0.0588   Epoch: 4   Global Step: 77760   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-04-11 07:26:08,157-Speed 3329.70 samples/sec   Loss 3.2422   LearningRate 0.0588   Epoch: 4   Global Step: 77770   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-04-11 07:26:11,218-Speed 3346.85 samples/sec   Loss 3.2596   LearningRate 0.0588   Epoch: 4   Global Step: 77780   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-04-11 07:26:14,286-Speed 3339.03 samples/sec   Loss 3.2392   LearningRate 0.0588   Epoch: 4   Global Step: 77790   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-04-11 07:26:17,354-Speed 3337.67 samples/sec   Loss 3.2027   LearningRate 0.0588   Epoch: 4   Global Step: 77800   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-11 07:26:20,425-Speed 3335.89 samples/sec   Loss 3.2520   LearningRate 0.0588   Epoch: 4   Global Step: 77810   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-11 07:26:23,499-Speed 3331.55 samples/sec   Loss 3.2256   LearningRate 0.0588   Epoch: 4   Global Step: 77820   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-11 07:26:26,652-Speed 3248.94 samples/sec   Loss 3.2117   LearningRate 0.0588   Epoch: 4   Global Step: 77830   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-11 07:26:29,731-Speed 3326.19 samples/sec   Loss 3.2585   LearningRate 0.0588   Epoch: 4   Global Step: 77840   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-11 07:26:32,816-Speed 3319.83 samples/sec   Loss 3.2440   LearningRate 0.0588   Epoch: 4   Global Step: 77850   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-11 07:26:35,905-Speed 3315.92 samples/sec   Loss 3.2281   LearningRate 0.0588   Epoch: 4   Global Step: 77860   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-11 07:26:38,966-Speed 3346.48 samples/sec   Loss 3.1888   LearningRate 0.0588   Epoch: 4   Global Step: 77870   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-11 07:26:42,032-Speed 3340.97 samples/sec   Loss 3.1877   LearningRate 0.0588   Epoch: 4   Global Step: 77880   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-11 07:26:45,096-Speed 3342.40 samples/sec   Loss 3.2299   LearningRate 0.0588   Epoch: 4   Global Step: 77890   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-11 07:26:48,178-Speed 3323.28 samples/sec   Loss 3.2132   LearningRate 0.0588   Epoch: 4   Global Step: 77900   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:26:51,247-Speed 3337.24 samples/sec   Loss 3.2258   LearningRate 0.0588   Epoch: 4   Global Step: 77910   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:26:54,315-Speed 3338.42 samples/sec   Loss 3.2736   LearningRate 0.0588   Epoch: 4   Global Step: 77920   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:26:57,394-Speed 3327.22 samples/sec   Loss 3.2450   LearningRate 0.0588   Epoch: 4   Global Step: 77930   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:27:00,492-Speed 3305.95 samples/sec   Loss 3.2222   LearningRate 0.0588   Epoch: 4   Global Step: 77940   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:27:03,571-Speed 3326.77 samples/sec   Loss 3.2590   LearningRate 0.0588   Epoch: 4   Global Step: 77950   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:27:06,672-Speed 3302.49 samples/sec   Loss 3.2192   LearningRate 0.0587   Epoch: 4   Global Step: 77960   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:27:09,739-Speed 3339.52 samples/sec   Loss 3.2743   LearningRate 0.0587   Epoch: 4   Global Step: 77970   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:27:12,860-Speed 3282.08 samples/sec   Loss 3.2159   LearningRate 0.0587   Epoch: 4   Global Step: 77980   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:27:16,049-Speed 3211.47 samples/sec   Loss 3.2232   LearningRate 0.0587   Epoch: 4   Global Step: 77990   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:27:19,159-Speed 3293.56 samples/sec   Loss 3.2361   LearningRate 0.0587   Epoch: 4   Global Step: 78000   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:28:03,856-[lfw][78000]XNorm: 21.648035
Training: 2022-04-11 07:28:03,857-[lfw][78000]Accuracy-Flip: 0.99767+-0.00281
Training: 2022-04-11 07:28:03,857-[lfw][78000]Accuracy-Highest: 0.99817
Training: 2022-04-11 07:28:55,469-[cfp_fp][78000]XNorm: 20.130088
Training: 2022-04-11 07:28:55,470-[cfp_fp][78000]Accuracy-Flip: 0.98529+-0.00438
Training: 2022-04-11 07:28:55,470-[cfp_fp][78000]Accuracy-Highest: 0.98543
Training: 2022-04-11 07:29:39,793-[agedb_30][78000]XNorm: 21.563187
Training: 2022-04-11 07:29:39,794-[agedb_30][78000]Accuracy-Flip: 0.98117+-0.00654
Training: 2022-04-11 07:29:39,795-[agedb_30][78000]Accuracy-Highest: 0.98117
Training: 2022-04-11 07:29:42,925-Speed 71.23 samples/sec   Loss 3.2044   LearningRate 0.0587   Epoch: 4   Global Step: 78010   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:29:45,990-Speed 3341.52 samples/sec   Loss 3.2842   LearningRate 0.0587   Epoch: 4   Global Step: 78020   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:29:49,041-Speed 3356.50 samples/sec   Loss 3.1918   LearningRate 0.0587   Epoch: 4   Global Step: 78030   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:29:52,105-Speed 3343.66 samples/sec   Loss 3.2404   LearningRate 0.0587   Epoch: 4   Global Step: 78040   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:29:55,173-Speed 3338.42 samples/sec   Loss 3.2616   LearningRate 0.0587   Epoch: 4   Global Step: 78050   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:29:58,245-Speed 3333.68 samples/sec   Loss 3.2635   LearningRate 0.0587   Epoch: 4   Global Step: 78060   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:30:01,340-Speed 3309.73 samples/sec   Loss 3.1651   LearningRate 0.0587   Epoch: 4   Global Step: 78070   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:30:04,414-Speed 3331.51 samples/sec   Loss 3.1832   LearningRate 0.0587   Epoch: 4   Global Step: 78080   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:30:07,492-Speed 3327.51 samples/sec   Loss 3.1293   LearningRate 0.0587   Epoch: 4   Global Step: 78090   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:30:10,594-Speed 3302.96 samples/sec   Loss 3.1738   LearningRate 0.0587   Epoch: 4   Global Step: 78100   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:30:13,709-Speed 3287.63 samples/sec   Loss 3.4124   LearningRate 0.0587   Epoch: 4   Global Step: 78110   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:30:16,875-Speed 3235.35 samples/sec   Loss 3.2528   LearningRate 0.0587   Epoch: 4   Global Step: 78120   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:30:19,945-Speed 3336.54 samples/sec   Loss 3.3625   LearningRate 0.0587   Epoch: 4   Global Step: 78130   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:30:23,011-Speed 3340.63 samples/sec   Loss 3.2329   LearningRate 0.0587   Epoch: 4   Global Step: 78140   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:30:26,099-Speed 3316.98 samples/sec   Loss 3.2676   LearningRate 0.0587   Epoch: 4   Global Step: 78150   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:30:29,161-Speed 3344.44 samples/sec   Loss 3.2618   LearningRate 0.0587   Epoch: 4   Global Step: 78160   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:30:32,232-Speed 3335.27 samples/sec   Loss 3.2519   LearningRate 0.0586   Epoch: 4   Global Step: 78170   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:30:35,297-Speed 3341.11 samples/sec   Loss 3.2689   LearningRate 0.0586   Epoch: 4   Global Step: 78180   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:30:38,418-Speed 3282.38 samples/sec   Loss 3.1999   LearningRate 0.0586   Epoch: 4   Global Step: 78190   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:30:41,489-Speed 3335.13 samples/sec   Loss 3.2740   LearningRate 0.0586   Epoch: 4   Global Step: 78200   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:30:44,623-Speed 3268.09 samples/sec   Loss 3.2171   LearningRate 0.0586   Epoch: 4   Global Step: 78210   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:30:47,746-Speed 3279.30 samples/sec   Loss 3.2293   LearningRate 0.0586   Epoch: 4   Global Step: 78220   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:30:50,956-Speed 3191.76 samples/sec   Loss 3.2310   LearningRate 0.0586   Epoch: 4   Global Step: 78230   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:30:54,095-Speed 3262.04 samples/sec   Loss 3.2160   LearningRate 0.0586   Epoch: 4   Global Step: 78240   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:30:57,177-Speed 3323.93 samples/sec   Loss 3.2631   LearningRate 0.0586   Epoch: 4   Global Step: 78250   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:31:00,307-Speed 3271.94 samples/sec   Loss 3.2625   LearningRate 0.0586   Epoch: 4   Global Step: 78260   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:31:03,365-Speed 3348.92 samples/sec   Loss 3.2052   LearningRate 0.0586   Epoch: 4   Global Step: 78270   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-11 07:31:06,433-Speed 3338.45 samples/sec   Loss 3.1683   LearningRate 0.0586   Epoch: 4   Global Step: 78280   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-11 07:31:09,494-Speed 3346.35 samples/sec   Loss 3.2052   LearningRate 0.0586   Epoch: 4   Global Step: 78290   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-11 07:31:12,593-Speed 3304.80 samples/sec   Loss 3.2976   LearningRate 0.0586   Epoch: 4   Global Step: 78300   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-11 07:31:15,760-Speed 3234.72 samples/sec   Loss 3.2555   LearningRate 0.0586   Epoch: 4   Global Step: 78310   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-11 07:31:18,895-Speed 3267.56 samples/sec   Loss 3.2089   LearningRate 0.0586   Epoch: 4   Global Step: 78320   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-11 07:31:21,986-Speed 3313.00 samples/sec   Loss 3.2191   LearningRate 0.0586   Epoch: 4   Global Step: 78330   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-11 07:31:25,151-Speed 3236.43 samples/sec   Loss 3.1576   LearningRate 0.0586   Epoch: 4   Global Step: 78340   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-11 07:31:28,227-Speed 3329.38 samples/sec   Loss 3.2464   LearningRate 0.0586   Epoch: 4   Global Step: 78350   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-11 07:31:31,426-Speed 3201.68 samples/sec   Loss 3.2149   LearningRate 0.0586   Epoch: 4   Global Step: 78360   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-11 07:31:34,529-Speed 3301.35 samples/sec   Loss 3.2220   LearningRate 0.0586   Epoch: 4   Global Step: 78370   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:31:37,685-Speed 3244.52 samples/sec   Loss 3.3087   LearningRate 0.0586   Epoch: 4   Global Step: 78380   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:31:40,841-Speed 3245.70 samples/sec   Loss 3.2651   LearningRate 0.0585   Epoch: 4   Global Step: 78390   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:31:43,942-Speed 3303.74 samples/sec   Loss 3.1623   LearningRate 0.0585   Epoch: 4   Global Step: 78400   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:31:47,046-Speed 3299.81 samples/sec   Loss 3.0930   LearningRate 0.0585   Epoch: 4   Global Step: 78410   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:31:50,175-Speed 3272.51 samples/sec   Loss 3.2163   LearningRate 0.0585   Epoch: 4   Global Step: 78420   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:31:53,239-Speed 3343.29 samples/sec   Loss 3.2195   LearningRate 0.0585   Epoch: 4   Global Step: 78430   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:31:56,346-Speed 3296.08 samples/sec   Loss 3.2460   LearningRate 0.0585   Epoch: 4   Global Step: 78440   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:31:59,426-Speed 3325.90 samples/sec   Loss 3.2281   LearningRate 0.0585   Epoch: 4   Global Step: 78450   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:32:02,493-Speed 3338.98 samples/sec   Loss 3.3063   LearningRate 0.0585   Epoch: 4   Global Step: 78460   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:32:05,601-Speed 3295.65 samples/sec   Loss 3.1394   LearningRate 0.0585   Epoch: 4   Global Step: 78470   Fp16 Grad Scale: 262144   Required: 27 hours
Training: 2022-04-11 07:32:08,675-Speed 3331.93 samples/sec   Loss 3.1767   LearningRate 0.0585   Epoch: 4   Global Step: 78480   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:32:11,739-Speed 3343.77 samples/sec   Loss 3.2641   LearningRate 0.0585   Epoch: 4   Global Step: 78490   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:32:14,803-Speed 3342.54 samples/sec   Loss 3.3086   LearningRate 0.0585   Epoch: 4   Global Step: 78500   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:32:17,865-Speed 3345.05 samples/sec   Loss 3.2576   LearningRate 0.0585   Epoch: 4   Global Step: 78510   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:32:20,948-Speed 3321.95 samples/sec   Loss 3.2202   LearningRate 0.0585   Epoch: 4   Global Step: 78520   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:32:24,029-Speed 3324.12 samples/sec   Loss 3.3220   LearningRate 0.0585   Epoch: 4   Global Step: 78530   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:32:27,091-Speed 3344.83 samples/sec   Loss 3.1959   LearningRate 0.0585   Epoch: 4   Global Step: 78540   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:32:30,238-Speed 3254.65 samples/sec   Loss 3.2139   LearningRate 0.0585   Epoch: 4   Global Step: 78550   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:32:33,308-Speed 3336.70 samples/sec   Loss 3.1586   LearningRate 0.0585   Epoch: 4   Global Step: 78560   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:32:36,411-Speed 3300.80 samples/sec   Loss 3.2539   LearningRate 0.0585   Epoch: 4   Global Step: 78570   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:32:39,531-Speed 3283.05 samples/sec   Loss 3.3388   LearningRate 0.0585   Epoch: 4   Global Step: 78580   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:32:42,593-Speed 3345.50 samples/sec   Loss 3.2899   LearningRate 0.0585   Epoch: 4   Global Step: 78590   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:32:45,657-Speed 3341.94 samples/sec   Loss 3.2167   LearningRate 0.0585   Epoch: 4   Global Step: 78600   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:32:48,724-Speed 3340.43 samples/sec   Loss 3.1979   LearningRate 0.0584   Epoch: 4   Global Step: 78610   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:32:51,786-Speed 3343.89 samples/sec   Loss 3.1842   LearningRate 0.0584   Epoch: 4   Global Step: 78620   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:32:54,867-Speed 3324.58 samples/sec   Loss 3.2602   LearningRate 0.0584   Epoch: 4   Global Step: 78630   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:32:57,972-Speed 3299.05 samples/sec   Loss 3.2187   LearningRate 0.0584   Epoch: 4   Global Step: 78640   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:33:01,081-Speed 3294.68 samples/sec   Loss 3.3325   LearningRate 0.0584   Epoch: 4   Global Step: 78650   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:33:04,167-Speed 3318.35 samples/sec   Loss 3.2329   LearningRate 0.0584   Epoch: 4   Global Step: 78660   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:33:07,308-Speed 3261.24 samples/sec   Loss 3.2261   LearningRate 0.0584   Epoch: 4   Global Step: 78670   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:33:10,396-Speed 3316.92 samples/sec   Loss 3.1613   LearningRate 0.0584   Epoch: 4   Global Step: 78680   Fp16 Grad Scale: 262144   Required: 27 hours
Training: 2022-04-11 07:33:13,540-Speed 3258.68 samples/sec   Loss 3.2216   LearningRate 0.0584   Epoch: 4   Global Step: 78690   Fp16 Grad Scale: 262144   Required: 27 hours
Training: 2022-04-11 07:33:16,617-Speed 3328.46 samples/sec   Loss 3.2093   LearningRate 0.0584   Epoch: 4   Global Step: 78700   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:33:19,685-Speed 3337.60 samples/sec   Loss 3.1420   LearningRate 0.0584   Epoch: 4   Global Step: 78710   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-11 07:33:22,756-Speed 3335.00 samples/sec   Loss 3.2433   LearningRate 0.0584   Epoch: 4   Global Step: 78720   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-11 07:33:25,829-Speed 3333.52 samples/sec   Loss 3.3027   LearningRate 0.0584   Epoch: 4   Global Step: 78730   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-11 07:33:28,897-Speed 3338.04 samples/sec   Loss 3.2371   LearningRate 0.0584   Epoch: 4   Global Step: 78740   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-11 07:33:31,969-Speed 3333.97 samples/sec   Loss 3.1920   LearningRate 0.0584   Epoch: 4   Global Step: 78750   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-11 07:33:35,071-Speed 3302.21 samples/sec   Loss 3.1809   LearningRate 0.0584   Epoch: 4   Global Step: 78760   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-11 07:33:38,207-Speed 3266.11 samples/sec   Loss 3.2304   LearningRate 0.0584   Epoch: 4   Global Step: 78770   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-11 07:33:41,294-Speed 3317.78 samples/sec   Loss 3.2651   LearningRate 0.0584   Epoch: 4   Global Step: 78780   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-11 07:33:44,387-Speed 3312.26 samples/sec   Loss 3.1908   LearningRate 0.0584   Epoch: 4   Global Step: 78790   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-11 07:33:47,453-Speed 3340.22 samples/sec   Loss 3.1756   LearningRate 0.0584   Epoch: 4   Global Step: 78800   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-11 07:33:50,533-Speed 3326.28 samples/sec   Loss 3.2469   LearningRate 0.0584   Epoch: 4   Global Step: 78810   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:33:53,633-Speed 3302.94 samples/sec   Loss 3.2583   LearningRate 0.0584   Epoch: 4   Global Step: 78820   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:33:56,708-Speed 3331.20 samples/sec   Loss 3.1724   LearningRate 0.0583   Epoch: 4   Global Step: 78830   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:33:59,784-Speed 3330.31 samples/sec   Loss 3.1183   LearningRate 0.0583   Epoch: 4   Global Step: 78840   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:34:02,870-Speed 3319.39 samples/sec   Loss 3.2767   LearningRate 0.0583   Epoch: 4   Global Step: 78850   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:34:05,928-Speed 3348.99 samples/sec   Loss 3.1773   LearningRate 0.0583   Epoch: 4   Global Step: 78860   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-11 07:34:08,995-Speed 3339.42 samples/sec   Loss 3.2781   LearningRate 0.0583   Epoch: 4   Global Step: 78870   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-11 07:34:12,082-Speed 3317.58 samples/sec   Loss 3.3435   LearningRate 0.0583   Epoch: 4   Global Step: 78880   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-11 07:34:15,167-Speed 3320.44 samples/sec   Loss 3.1898   LearningRate 0.0583   Epoch: 4   Global Step: 78890   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-11 07:34:18,240-Speed 3333.42 samples/sec   Loss 3.1609   LearningRate 0.0583   Epoch: 4   Global Step: 78900   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-11 07:34:21,313-Speed 3332.61 samples/sec   Loss 3.3138   LearningRate 0.0583   Epoch: 4   Global Step: 78910   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-11 07:34:24,395-Speed 3323.40 samples/sec   Loss 3.2600   LearningRate 0.0583   Epoch: 4   Global Step: 78920   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-11 07:34:27,466-Speed 3335.37 samples/sec   Loss 3.2764   LearningRate 0.0583   Epoch: 4   Global Step: 78930   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-11 07:34:30,528-Speed 3345.03 samples/sec   Loss 3.1514   LearningRate 0.0583   Epoch: 4   Global Step: 78940   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-11 07:34:33,604-Speed 3330.13 samples/sec   Loss 3.1826   LearningRate 0.0583   Epoch: 4   Global Step: 78950   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-11 07:34:36,670-Speed 3339.86 samples/sec   Loss 3.2236   LearningRate 0.0583   Epoch: 4   Global Step: 78960   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:34:39,746-Speed 3329.99 samples/sec   Loss 3.1490   LearningRate 0.0583   Epoch: 4   Global Step: 78970   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:34:42,822-Speed 3329.28 samples/sec   Loss 3.3395   LearningRate 0.0583   Epoch: 4   Global Step: 78980   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:34:45,907-Speed 3320.78 samples/sec   Loss 3.2414   LearningRate 0.0583   Epoch: 4   Global Step: 78990   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:34:48,980-Speed 3333.27 samples/sec   Loss 3.2421   LearningRate 0.0583   Epoch: 4   Global Step: 79000   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:34:52,044-Speed 3343.01 samples/sec   Loss 3.1868   LearningRate 0.0583   Epoch: 4   Global Step: 79010   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:34:55,111-Speed 3338.99 samples/sec   Loss 3.1879   LearningRate 0.0583   Epoch: 4   Global Step: 79020   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:34:58,186-Speed 3331.42 samples/sec   Loss 3.2348   LearningRate 0.0583   Epoch: 4   Global Step: 79030   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:35:01,266-Speed 3324.90 samples/sec   Loss 3.2527   LearningRate 0.0583   Epoch: 4   Global Step: 79040   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:35:04,343-Speed 3329.07 samples/sec   Loss 3.1556   LearningRate 0.0582   Epoch: 4   Global Step: 79050   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:35:07,486-Speed 3259.22 samples/sec   Loss 3.1988   LearningRate 0.0582   Epoch: 4   Global Step: 79060   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:35:10,572-Speed 3318.67 samples/sec   Loss 3.2478   LearningRate 0.0582   Epoch: 4   Global Step: 79070   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:35:13,648-Speed 3330.06 samples/sec   Loss 3.2574   LearningRate 0.0582   Epoch: 4   Global Step: 79080   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:35:16,724-Speed 3330.47 samples/sec   Loss 3.2013   LearningRate 0.0582   Epoch: 4   Global Step: 79090   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:35:19,796-Speed 3333.83 samples/sec   Loss 3.2320   LearningRate 0.0582   Epoch: 4   Global Step: 79100   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:35:22,919-Speed 3279.32 samples/sec   Loss 3.2067   LearningRate 0.0582   Epoch: 4   Global Step: 79110   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:35:26,001-Speed 3323.41 samples/sec   Loss 3.2412   LearningRate 0.0582   Epoch: 4   Global Step: 79120   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:35:29,078-Speed 3328.71 samples/sec   Loss 3.2060   LearningRate 0.0582   Epoch: 4   Global Step: 79130   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:35:32,142-Speed 3342.68 samples/sec   Loss 3.1927   LearningRate 0.0582   Epoch: 4   Global Step: 79140   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:35:35,205-Speed 3343.85 samples/sec   Loss 3.2248   LearningRate 0.0582   Epoch: 4   Global Step: 79150   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:35:38,322-Speed 3285.97 samples/sec   Loss 3.1822   LearningRate 0.0582   Epoch: 4   Global Step: 79160   Fp16 Grad Scale: 262144   Required: 27 hours
Training: 2022-04-11 07:35:41,380-Speed 3349.82 samples/sec   Loss 3.2672   LearningRate 0.0582   Epoch: 4   Global Step: 79170   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:35:44,446-Speed 3340.89 samples/sec   Loss 3.1724   LearningRate 0.0582   Epoch: 4   Global Step: 79180   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:35:47,591-Speed 3256.02 samples/sec   Loss 3.1909   LearningRate 0.0582   Epoch: 4   Global Step: 79190   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:35:50,729-Speed 3264.33 samples/sec   Loss 3.1503   LearningRate 0.0582   Epoch: 4   Global Step: 79200   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:35:53,825-Speed 3308.92 samples/sec   Loss 3.1593   LearningRate 0.0582   Epoch: 4   Global Step: 79210   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:35:56,926-Speed 3302.15 samples/sec   Loss 3.2508   LearningRate 0.0582   Epoch: 4   Global Step: 79220   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:35:59,991-Speed 3342.46 samples/sec   Loss 3.1852   LearningRate 0.0582   Epoch: 4   Global Step: 79230   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:36:03,068-Speed 3328.19 samples/sec   Loss 3.1594   LearningRate 0.0582   Epoch: 4   Global Step: 79240   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:36:06,136-Speed 3339.01 samples/sec   Loss 3.2146   LearningRate 0.0582   Epoch: 4   Global Step: 79250   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:36:09,217-Speed 3324.45 samples/sec   Loss 3.2006   LearningRate 0.0582   Epoch: 4   Global Step: 79260   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-11 07:36:12,300-Speed 3322.55 samples/sec   Loss 3.1304   LearningRate 0.0581   Epoch: 4   Global Step: 79270   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-11 07:36:15,374-Speed 3331.17 samples/sec   Loss 3.1707   LearningRate 0.0581   Epoch: 4   Global Step: 79280   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-11 07:36:18,445-Speed 3335.95 samples/sec   Loss 3.1933   LearningRate 0.0581   Epoch: 4   Global Step: 79290   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-11 07:36:21,544-Speed 3305.00 samples/sec   Loss 3.1791   LearningRate 0.0581   Epoch: 4   Global Step: 79300   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-11 07:36:24,616-Speed 3333.19 samples/sec   Loss 3.2606   LearningRate 0.0581   Epoch: 4   Global Step: 79310   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-11 07:36:27,812-Speed 3205.93 samples/sec   Loss 3.1928   LearningRate 0.0581   Epoch: 4   Global Step: 79320   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-11 07:36:31,018-Speed 3194.40 samples/sec   Loss 3.1887   LearningRate 0.0581   Epoch: 4   Global Step: 79330   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-11 07:36:34,093-Speed 3331.04 samples/sec   Loss 3.1905   LearningRate 0.0581   Epoch: 4   Global Step: 79340   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-11 07:36:37,157-Speed 3342.79 samples/sec   Loss 3.2037   LearningRate 0.0581   Epoch: 4   Global Step: 79350   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-11 07:36:40,231-Speed 3331.20 samples/sec   Loss 3.1530   LearningRate 0.0581   Epoch: 4   Global Step: 79360   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:36:43,299-Speed 3339.60 samples/sec   Loss 3.1743   LearningRate 0.0581   Epoch: 4   Global Step: 79370   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:36:46,390-Speed 3313.03 samples/sec   Loss 3.2950   LearningRate 0.0581   Epoch: 4   Global Step: 79380   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:36:49,461-Speed 3335.38 samples/sec   Loss 3.2409   LearningRate 0.0581   Epoch: 4   Global Step: 79390   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:36:52,556-Speed 3309.13 samples/sec   Loss 3.2609   LearningRate 0.0581   Epoch: 4   Global Step: 79400   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:36:55,740-Speed 3216.68 samples/sec   Loss 3.2317   LearningRate 0.0581   Epoch: 4   Global Step: 79410   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:36:58,869-Speed 3273.41 samples/sec   Loss 3.2359   LearningRate 0.0581   Epoch: 4   Global Step: 79420   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:37:01,983-Speed 3289.59 samples/sec   Loss 3.2552   LearningRate 0.0581   Epoch: 4   Global Step: 79430   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:37:05,133-Speed 3252.27 samples/sec   Loss 3.1723   LearningRate 0.0581   Epoch: 4   Global Step: 79440   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:37:08,221-Speed 3316.97 samples/sec   Loss 3.1586   LearningRate 0.0581   Epoch: 4   Global Step: 79450   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:37:11,274-Speed 3354.93 samples/sec   Loss 3.1971   LearningRate 0.0581   Epoch: 4   Global Step: 79460   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-11 07:37:14,337-Speed 3343.17 samples/sec   Loss 3.1730   LearningRate 0.0581   Epoch: 4   Global Step: 79470   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-11 07:37:17,402-Speed 3341.92 samples/sec   Loss 3.1655   LearningRate 0.0581   Epoch: 4   Global Step: 79480   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-11 07:37:20,485-Speed 3322.28 samples/sec   Loss 3.2081   LearningRate 0.0580   Epoch: 4   Global Step: 79490   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-11 07:37:23,553-Speed 3338.32 samples/sec   Loss 3.2288   LearningRate 0.0580   Epoch: 4   Global Step: 79500   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-11 07:37:26,638-Speed 3320.35 samples/sec   Loss 3.2367   LearningRate 0.0580   Epoch: 4   Global Step: 79510   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-11 07:37:29,716-Speed 3327.45 samples/sec   Loss 3.2120   LearningRate 0.0580   Epoch: 4   Global Step: 79520   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-11 07:37:32,779-Speed 3343.70 samples/sec   Loss 3.1881   LearningRate 0.0580   Epoch: 4   Global Step: 79530   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-11 07:37:35,857-Speed 3328.36 samples/sec   Loss 3.1766   LearningRate 0.0580   Epoch: 4   Global Step: 79540   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-11 07:37:38,939-Speed 3322.78 samples/sec   Loss 3.1627   LearningRate 0.0580   Epoch: 4   Global Step: 79550   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-11 07:37:42,005-Speed 3340.58 samples/sec   Loss 3.2624   LearningRate 0.0580   Epoch: 4   Global Step: 79560   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:37:45,076-Speed 3335.32 samples/sec   Loss 3.2166   LearningRate 0.0580   Epoch: 4   Global Step: 79570   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:37:48,146-Speed 3336.34 samples/sec   Loss 3.1004   LearningRate 0.0580   Epoch: 4   Global Step: 79580   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:37:51,211-Speed 3341.13 samples/sec   Loss 3.1853   LearningRate 0.0580   Epoch: 4   Global Step: 79590   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:37:54,283-Speed 3334.62 samples/sec   Loss 3.1882   LearningRate 0.0580   Epoch: 4   Global Step: 79600   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:37:57,352-Speed 3336.81 samples/sec   Loss 3.2578   LearningRate 0.0580   Epoch: 4   Global Step: 79610   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:38:00,427-Speed 3331.89 samples/sec   Loss 3.2107   LearningRate 0.0580   Epoch: 4   Global Step: 79620   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-11 07:38:03,502-Speed 3330.68 samples/sec   Loss 3.2940   LearningRate 0.0580   Epoch: 4   Global Step: 79630   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-11 07:38:06,574-Speed 3334.54 samples/sec   Loss 3.1659   LearningRate 0.0580   Epoch: 4   Global Step: 79640   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-11 07:38:09,644-Speed 3336.02 samples/sec   Loss 3.2882   LearningRate 0.0580   Epoch: 4   Global Step: 79650   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-11 07:38:12,719-Speed 3330.07 samples/sec   Loss 3.1298   LearningRate 0.0580   Epoch: 4   Global Step: 79660   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-11 07:38:15,818-Speed 3305.53 samples/sec   Loss 3.1783   LearningRate 0.0580   Epoch: 4   Global Step: 79670   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-11 07:38:18,897-Speed 3326.46 samples/sec   Loss 3.2599   LearningRate 0.0580   Epoch: 4   Global Step: 79680   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-11 07:38:22,019-Speed 3281.02 samples/sec   Loss 3.2415   LearningRate 0.0580   Epoch: 4   Global Step: 79690   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-11 07:38:25,116-Speed 3307.21 samples/sec   Loss 3.1545   LearningRate 0.0579   Epoch: 4   Global Step: 79700   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-11 07:38:28,191-Speed 3330.60 samples/sec   Loss 3.2244   LearningRate 0.0579   Epoch: 4   Global Step: 79710   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-11 07:38:31,270-Speed 3327.32 samples/sec   Loss 3.2593   LearningRate 0.0579   Epoch: 4   Global Step: 79720   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:38:34,362-Speed 3312.42 samples/sec   Loss 3.2336   LearningRate 0.0579   Epoch: 4   Global Step: 79730   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:38:37,426-Speed 3342.66 samples/sec   Loss 3.2482   LearningRate 0.0579   Epoch: 4   Global Step: 79740   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:38:40,490-Speed 3342.57 samples/sec   Loss 3.2566   LearningRate 0.0579   Epoch: 4   Global Step: 79750   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:38:43,564-Speed 3332.05 samples/sec   Loss 3.2129   LearningRate 0.0579   Epoch: 4   Global Step: 79760   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:38:46,655-Speed 3313.81 samples/sec   Loss 3.2218   LearningRate 0.0579   Epoch: 4   Global Step: 79770   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:38:49,746-Speed 3312.91 samples/sec   Loss 3.2131   LearningRate 0.0579   Epoch: 4   Global Step: 79780   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:38:52,818-Speed 3334.53 samples/sec   Loss 3.2110   LearningRate 0.0579   Epoch: 4   Global Step: 79790   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:38:55,902-Speed 3321.99 samples/sec   Loss 3.1322   LearningRate 0.0579   Epoch: 4   Global Step: 79800   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:38:59,059-Speed 3243.51 samples/sec   Loss 3.1382   LearningRate 0.0579   Epoch: 4   Global Step: 79810   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:39:02,265-Speed 3195.12 samples/sec   Loss 3.2048   LearningRate 0.0579   Epoch: 4   Global Step: 79820   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:39:05,371-Speed 3297.90 samples/sec   Loss 3.2028   LearningRate 0.0579   Epoch: 4   Global Step: 79830   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:39:08,508-Speed 3264.79 samples/sec   Loss 3.1584   LearningRate 0.0579   Epoch: 4   Global Step: 79840   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:39:11,613-Speed 3298.07 samples/sec   Loss 3.1968   LearningRate 0.0579   Epoch: 4   Global Step: 79850   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:39:14,781-Speed 3233.90 samples/sec   Loss 3.2257   LearningRate 0.0579   Epoch: 4   Global Step: 79860   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:39:17,992-Speed 3189.79 samples/sec   Loss 3.2023   LearningRate 0.0579   Epoch: 4   Global Step: 79870   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:39:21,084-Speed 3312.30 samples/sec   Loss 3.1055   LearningRate 0.0579   Epoch: 4   Global Step: 79880   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:39:24,176-Speed 3313.61 samples/sec   Loss 3.2167   LearningRate 0.0579   Epoch: 4   Global Step: 79890   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:39:27,259-Speed 3321.95 samples/sec   Loss 3.1943   LearningRate 0.0579   Epoch: 4   Global Step: 79900   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:39:30,353-Speed 3310.00 samples/sec   Loss 3.1778   LearningRate 0.0579   Epoch: 4   Global Step: 79910   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:39:33,395-Speed 3366.73 samples/sec   Loss 3.3071   LearningRate 0.0578   Epoch: 4   Global Step: 79920   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-11 07:39:36,463-Speed 3339.49 samples/sec   Loss 3.1830   LearningRate 0.0578   Epoch: 4   Global Step: 79930   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-11 07:39:39,552-Speed 3315.55 samples/sec   Loss 3.1453   LearningRate 0.0578   Epoch: 4   Global Step: 79940   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-11 07:39:42,617-Speed 3341.20 samples/sec   Loss 3.2348   LearningRate 0.0578   Epoch: 4   Global Step: 79950   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-11 07:39:45,683-Speed 3340.48 samples/sec   Loss 3.2124   LearningRate 0.0578   Epoch: 4   Global Step: 79960   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-11 07:39:48,760-Speed 3329.21 samples/sec   Loss 3.0997   LearningRate 0.0578   Epoch: 4   Global Step: 79970   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-11 07:39:51,837-Speed 3328.71 samples/sec   Loss 3.2458   LearningRate 0.0578   Epoch: 4   Global Step: 79980   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-11 07:39:55,024-Speed 3214.35 samples/sec   Loss 3.2516   LearningRate 0.0578   Epoch: 4   Global Step: 79990   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-11 07:39:58,166-Speed 3259.87 samples/sec   Loss 3.2508   LearningRate 0.0578   Epoch: 4   Global Step: 80000   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-11 07:40:42,146-[lfw][80000]XNorm: 23.115559
Training: 2022-04-11 07:40:42,146-[lfw][80000]Accuracy-Flip: 0.99750+-0.00291
Training: 2022-04-11 07:40:42,147-[lfw][80000]Accuracy-Highest: 0.99817
Training: 2022-04-11 07:41:33,245-[cfp_fp][80000]XNorm: 21.707295
Training: 2022-04-11 07:41:33,245-[cfp_fp][80000]Accuracy-Flip: 0.98286+-0.00658
Training: 2022-04-11 07:41:33,246-[cfp_fp][80000]Accuracy-Highest: 0.98543
Training: 2022-04-11 07:42:17,242-[agedb_30][80000]XNorm: 23.286348
Training: 2022-04-11 07:42:17,242-[agedb_30][80000]Accuracy-Flip: 0.97800+-0.00674
Training: 2022-04-11 07:42:17,243-[agedb_30][80000]Accuracy-Highest: 0.98117
Training: 2022-04-11 07:42:20,301-Speed 72.04 samples/sec   Loss 3.1859   LearningRate 0.0578   Epoch: 4   Global Step: 80010   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-11 07:42:23,355-Speed 3354.36 samples/sec   Loss 3.2856   LearningRate 0.0578   Epoch: 4   Global Step: 80020   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:42:26,415-Speed 3347.62 samples/sec   Loss 3.1633   LearningRate 0.0578   Epoch: 4   Global Step: 80030   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:42:29,469-Speed 3353.24 samples/sec   Loss 3.1052   LearningRate 0.0578   Epoch: 4   Global Step: 80040   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:42:32,530-Speed 3346.04 samples/sec   Loss 3.1972   LearningRate 0.0578   Epoch: 4   Global Step: 80050   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:42:35,585-Speed 3352.39 samples/sec   Loss 3.0896   LearningRate 0.0578   Epoch: 4   Global Step: 80060   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:42:38,678-Speed 3312.02 samples/sec   Loss 3.1576   LearningRate 0.0578   Epoch: 4   Global Step: 80070   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-11 07:42:41,747-Speed 3337.26 samples/sec   Loss 3.2014   LearningRate 0.0578   Epoch: 4   Global Step: 80080   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-11 07:42:44,804-Speed 3350.15 samples/sec   Loss 3.2038   LearningRate 0.0578   Epoch: 4   Global Step: 80090   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-11 07:42:47,860-Speed 3351.45 samples/sec   Loss 3.2535   LearningRate 0.0578   Epoch: 4   Global Step: 80100   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-11 07:42:50,917-Speed 3351.08 samples/sec   Loss 3.1643   LearningRate 0.0578   Epoch: 4   Global Step: 80110   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-11 07:42:53,977-Speed 3347.38 samples/sec   Loss 3.2328   LearningRate 0.0578   Epoch: 4   Global Step: 80120   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-11 07:42:57,036-Speed 3348.14 samples/sec   Loss 3.1873   LearningRate 0.0578   Epoch: 4   Global Step: 80130   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-11 07:43:00,091-Speed 3351.98 samples/sec   Loss 3.0983   LearningRate 0.0577   Epoch: 4   Global Step: 80140   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-11 07:43:03,183-Speed 3313.14 samples/sec   Loss 3.1997   LearningRate 0.0577   Epoch: 4   Global Step: 80150   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-11 07:43:06,242-Speed 3348.39 samples/sec   Loss 3.2006   LearningRate 0.0577   Epoch: 4   Global Step: 80160   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-11 07:43:09,331-Speed 3315.20 samples/sec   Loss 3.1229   LearningRate 0.0577   Epoch: 4   Global Step: 80170   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:43:12,400-Speed 3337.79 samples/sec   Loss 3.1151   LearningRate 0.0577   Epoch: 4   Global Step: 80180   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:43:15,458-Speed 3349.73 samples/sec   Loss 3.2259   LearningRate 0.0577   Epoch: 4   Global Step: 80190   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:43:18,552-Speed 3309.76 samples/sec   Loss 3.1958   LearningRate 0.0577   Epoch: 4   Global Step: 80200   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:43:21,620-Speed 3338.72 samples/sec   Loss 3.2528   LearningRate 0.0577   Epoch: 4   Global Step: 80210   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:43:24,689-Speed 3337.39 samples/sec   Loss 3.1136   LearningRate 0.0577   Epoch: 4   Global Step: 80220   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:43:27,747-Speed 3349.91 samples/sec   Loss 3.2335   LearningRate 0.0577   Epoch: 4   Global Step: 80230   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-11 07:43:30,809-Speed 3344.19 samples/sec   Loss 3.2098   LearningRate 0.0577   Epoch: 4   Global Step: 80240   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-11 07:43:33,879-Speed 3336.67 samples/sec   Loss 3.1625   LearningRate 0.0577   Epoch: 4   Global Step: 80250   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-11 07:43:36,946-Speed 3339.04 samples/sec   Loss 3.1827   LearningRate 0.0577   Epoch: 4   Global Step: 80260   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-11 07:43:40,017-Speed 3335.49 samples/sec   Loss 3.0925   LearningRate 0.0577   Epoch: 4   Global Step: 80270   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-11 07:43:43,099-Speed 3324.01 samples/sec   Loss 3.2095   LearningRate 0.0577   Epoch: 4   Global Step: 80280   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-11 07:43:46,166-Speed 3339.14 samples/sec   Loss 3.2249   LearningRate 0.0577   Epoch: 4   Global Step: 80290   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-11 07:43:49,237-Speed 3335.78 samples/sec   Loss 3.1810   LearningRate 0.0577   Epoch: 4   Global Step: 80300   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-11 07:43:52,310-Speed 3332.50 samples/sec   Loss 3.2493   LearningRate 0.0577   Epoch: 4   Global Step: 80310   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-11 07:43:55,401-Speed 3314.05 samples/sec   Loss 3.1860   LearningRate 0.0577   Epoch: 4   Global Step: 80320   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-11 07:43:58,559-Speed 3243.07 samples/sec   Loss 3.1152   LearningRate 0.0577   Epoch: 4   Global Step: 80330   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:44:01,653-Speed 3310.57 samples/sec   Loss 3.2033   LearningRate 0.0577   Epoch: 4   Global Step: 80340   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:44:04,758-Speed 3298.49 samples/sec   Loss 3.0935   LearningRate 0.0577   Epoch: 4   Global Step: 80350   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:44:07,881-Speed 3280.44 samples/sec   Loss 3.2205   LearningRate 0.0576   Epoch: 4   Global Step: 80360   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:44:11,014-Speed 3269.77 samples/sec   Loss 3.2742   LearningRate 0.0576   Epoch: 4   Global Step: 80370   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:44:14,095-Speed 3323.58 samples/sec   Loss 3.1719   LearningRate 0.0576   Epoch: 4   Global Step: 80380   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-11 07:44:17,160-Speed 3342.50 samples/sec   Loss 3.1720   LearningRate 0.0576   Epoch: 4   Global Step: 80390   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-11 07:44:20,221-Speed 3345.87 samples/sec   Loss 3.2374   LearningRate 0.0576   Epoch: 4   Global Step: 80400   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-11 07:44:23,312-Speed 3313.03 samples/sec   Loss 3.2679   LearningRate 0.0576   Epoch: 4   Global Step: 80410   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-11 07:44:26,392-Speed 3325.97 samples/sec   Loss 3.2031   LearningRate 0.0576   Epoch: 4   Global Step: 80420   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-11 07:44:29,523-Speed 3270.51 samples/sec   Loss 3.1088   LearningRate 0.0576   Epoch: 4   Global Step: 80430   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-11 07:44:32,640-Speed 3286.34 samples/sec   Loss 3.1771   LearningRate 0.0576   Epoch: 4   Global Step: 80440   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-11 07:44:35,717-Speed 3328.94 samples/sec   Loss 3.1582   LearningRate 0.0576   Epoch: 4   Global Step: 80450   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-11 07:44:38,786-Speed 3337.12 samples/sec   Loss 3.2251   LearningRate 0.0576   Epoch: 4   Global Step: 80460   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-11 07:44:41,863-Speed 3328.81 samples/sec   Loss 3.2024   LearningRate 0.0576   Epoch: 4   Global Step: 80470   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-11 07:44:44,924-Speed 3346.51 samples/sec   Loss 3.1715   LearningRate 0.0576   Epoch: 4   Global Step: 80480   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:44:48,020-Speed 3308.39 samples/sec   Loss 3.2193   LearningRate 0.0576   Epoch: 4   Global Step: 80490   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:44:51,111-Speed 3313.18 samples/sec   Loss 3.1591   LearningRate 0.0576   Epoch: 4   Global Step: 80500   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:44:54,218-Speed 3297.28 samples/sec   Loss 3.1248   LearningRate 0.0576   Epoch: 4   Global Step: 80510   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:44:57,279-Speed 3345.52 samples/sec   Loss 3.1155   LearningRate 0.0576   Epoch: 4   Global Step: 80520   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:45:00,368-Speed 3316.07 samples/sec   Loss 3.1185   LearningRate 0.0576   Epoch: 4   Global Step: 80530   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:45:03,440-Speed 3334.67 samples/sec   Loss 3.2121   LearningRate 0.0576   Epoch: 4   Global Step: 80540   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:45:06,503-Speed 3344.19 samples/sec   Loss 3.2006   LearningRate 0.0576   Epoch: 4   Global Step: 80550   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:45:09,596-Speed 3310.39 samples/sec   Loss 3.1425   LearningRate 0.0576   Epoch: 4   Global Step: 80560   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:45:12,659-Speed 3344.52 samples/sec   Loss 3.1967   LearningRate 0.0576   Epoch: 4   Global Step: 80570   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:45:15,720-Speed 3345.87 samples/sec   Loss 3.2173   LearningRate 0.0575   Epoch: 4   Global Step: 80580   Fp16 Grad Scale: 262144   Required: 27 hours
Training: 2022-04-11 07:45:18,771-Speed 3357.26 samples/sec   Loss 3.1329   LearningRate 0.0575   Epoch: 4   Global Step: 80590   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:45:21,831-Speed 3346.62 samples/sec   Loss 3.2035   LearningRate 0.0575   Epoch: 4   Global Step: 80600   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:45:24,901-Speed 3336.86 samples/sec   Loss 3.2338   LearningRate 0.0575   Epoch: 4   Global Step: 80610   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:45:27,969-Speed 3338.47 samples/sec   Loss 3.1111   LearningRate 0.0575   Epoch: 4   Global Step: 80620   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:45:31,103-Speed 3268.13 samples/sec   Loss 3.2058   LearningRate 0.0575   Epoch: 4   Global Step: 80630   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:45:34,262-Speed 3242.51 samples/sec   Loss 3.1127   LearningRate 0.0575   Epoch: 4   Global Step: 80640   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:45:37,356-Speed 3310.71 samples/sec   Loss 3.0390   LearningRate 0.0575   Epoch: 4   Global Step: 80650   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:45:40,423-Speed 3339.48 samples/sec   Loss 3.1470   LearningRate 0.0575   Epoch: 4   Global Step: 80660   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:45:43,484-Speed 3345.95 samples/sec   Loss 3.1512   LearningRate 0.0575   Epoch: 4   Global Step: 80670   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:45:46,560-Speed 3329.57 samples/sec   Loss 3.1572   LearningRate 0.0575   Epoch: 4   Global Step: 80680   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:45:49,618-Speed 3349.69 samples/sec   Loss 3.1275   LearningRate 0.0575   Epoch: 4   Global Step: 80690   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:45:52,789-Speed 3229.65 samples/sec   Loss 3.1654   LearningRate 0.0575   Epoch: 4   Global Step: 80700   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:45:55,937-Speed 3253.70 samples/sec   Loss 3.1642   LearningRate 0.0575   Epoch: 4   Global Step: 80710   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:45:59,018-Speed 3324.53 samples/sec   Loss 3.1542   LearningRate 0.0575   Epoch: 4   Global Step: 80720   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:46:02,103-Speed 3319.88 samples/sec   Loss 3.2042   LearningRate 0.0575   Epoch: 4   Global Step: 80730   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:46:05,167-Speed 3343.11 samples/sec   Loss 3.3176   LearningRate 0.0575   Epoch: 4   Global Step: 80740   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:46:08,230-Speed 3343.90 samples/sec   Loss 3.1866   LearningRate 0.0575   Epoch: 4   Global Step: 80750   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:46:11,297-Speed 3338.92 samples/sec   Loss 3.1867   LearningRate 0.0575   Epoch: 4   Global Step: 80760   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:46:14,362-Speed 3341.69 samples/sec   Loss 3.1680   LearningRate 0.0575   Epoch: 4   Global Step: 80770   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:46:17,433-Speed 3336.10 samples/sec   Loss 3.2390   LearningRate 0.0575   Epoch: 4   Global Step: 80780   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:46:20,500-Speed 3339.84 samples/sec   Loss 3.1363   LearningRate 0.0575   Epoch: 4   Global Step: 80790   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:46:23,559-Speed 3348.10 samples/sec   Loss 3.1907   LearningRate 0.0574   Epoch: 4   Global Step: 80800   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-11 07:46:26,635-Speed 3329.68 samples/sec   Loss 3.1556   LearningRate 0.0574   Epoch: 4   Global Step: 80810   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-11 07:46:29,725-Speed 3314.70 samples/sec   Loss 3.1763   LearningRate 0.0574   Epoch: 4   Global Step: 80820   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-11 07:46:32,793-Speed 3338.08 samples/sec   Loss 3.2162   LearningRate 0.0574   Epoch: 4   Global Step: 80830   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-11 07:46:35,863-Speed 3336.44 samples/sec   Loss 3.1868   LearningRate 0.0574   Epoch: 4   Global Step: 80840   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-11 07:46:38,961-Speed 3305.99 samples/sec   Loss 3.2039   LearningRate 0.0574   Epoch: 4   Global Step: 80850   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-11 07:46:42,128-Speed 3233.98 samples/sec   Loss 3.1890   LearningRate 0.0574   Epoch: 4   Global Step: 80860   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-11 07:46:45,193-Speed 3342.03 samples/sec   Loss 3.1213   LearningRate 0.0574   Epoch: 4   Global Step: 80870   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-11 07:46:48,267-Speed 3331.95 samples/sec   Loss 3.2694   LearningRate 0.0574   Epoch: 4   Global Step: 80880   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-11 07:46:51,340-Speed 3333.86 samples/sec   Loss 3.2636   LearningRate 0.0574   Epoch: 4   Global Step: 80890   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-11 07:46:54,418-Speed 3327.41 samples/sec   Loss 3.1756   LearningRate 0.0574   Epoch: 4   Global Step: 80900   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:46:57,496-Speed 3327.13 samples/sec   Loss 3.1511   LearningRate 0.0574   Epoch: 4   Global Step: 80910   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:47:00,596-Speed 3304.07 samples/sec   Loss 3.2215   LearningRate 0.0574   Epoch: 4   Global Step: 80920   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:47:03,748-Speed 3249.70 samples/sec   Loss 3.2563   LearningRate 0.0574   Epoch: 4   Global Step: 80930   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-11 07:47:06,866-Speed 3285.43 samples/sec   Loss 3.1529   LearningRate 0.0574   Epoch: 4   Global Step: 80940   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-11 07:47:09,947-Speed 3323.98 samples/sec   Loss 3.1504   LearningRate 0.0574   Epoch: 4   Global Step: 80950   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-11 07:47:13,042-Speed 3308.99 samples/sec   Loss 3.1831   LearningRate 0.0574   Epoch: 4   Global Step: 80960   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-11 07:47:16,108-Speed 3340.64 samples/sec   Loss 3.1222   LearningRate 0.0574   Epoch: 4   Global Step: 80970   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-11 07:47:19,176-Speed 3340.60 samples/sec   Loss 3.1185   LearningRate 0.0574   Epoch: 4   Global Step: 80980   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-11 07:47:22,244-Speed 3337.87 samples/sec   Loss 3.2166   LearningRate 0.0574   Epoch: 4   Global Step: 80990   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-11 07:47:25,308-Speed 3343.04 samples/sec   Loss 3.1494   LearningRate 0.0574   Epoch: 4   Global Step: 81000   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-11 07:47:28,373-Speed 3341.59 samples/sec   Loss 3.1755   LearningRate 0.0574   Epoch: 4   Global Step: 81010   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-11 07:47:31,458-Speed 3320.19 samples/sec   Loss 3.1065   LearningRate 0.0573   Epoch: 4   Global Step: 81020   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-11 07:47:34,529-Speed 3334.76 samples/sec   Loss 3.1904   LearningRate 0.0573   Epoch: 4   Global Step: 81030   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:47:37,673-Speed 3258.06 samples/sec   Loss 3.1002   LearningRate 0.0573   Epoch: 4   Global Step: 81040   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:47:40,780-Speed 3296.83 samples/sec   Loss 3.1570   LearningRate 0.0573   Epoch: 4   Global Step: 81050   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:47:43,924-Speed 3257.85 samples/sec   Loss 3.1874   LearningRate 0.0573   Epoch: 4   Global Step: 81060   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:47:47,027-Speed 3300.26 samples/sec   Loss 3.1339   LearningRate 0.0573   Epoch: 4   Global Step: 81070   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:47:50,114-Speed 3318.48 samples/sec   Loss 3.3023   LearningRate 0.0573   Epoch: 4   Global Step: 81080   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:47:53,232-Speed 3285.00 samples/sec   Loss 3.1837   LearningRate 0.0573   Epoch: 4   Global Step: 81090   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:47:56,295-Speed 3343.53 samples/sec   Loss 3.0350   LearningRate 0.0573   Epoch: 4   Global Step: 81100   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:47:59,378-Speed 3321.88 samples/sec   Loss 3.2353   LearningRate 0.0573   Epoch: 4   Global Step: 81110   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:48:02,447-Speed 3338.18 samples/sec   Loss 3.0543   LearningRate 0.0573   Epoch: 4   Global Step: 81120   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:48:05,502-Speed 3353.15 samples/sec   Loss 3.1880   LearningRate 0.0573   Epoch: 4   Global Step: 81130   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:48:08,568-Speed 3340.18 samples/sec   Loss 3.1660   LearningRate 0.0573   Epoch: 4   Global Step: 81140   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:48:11,636-Speed 3338.47 samples/sec   Loss 3.1183   LearningRate 0.0573   Epoch: 4   Global Step: 81150   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:48:14,701-Speed 3341.95 samples/sec   Loss 3.2339   LearningRate 0.0573   Epoch: 4   Global Step: 81160   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:48:17,803-Speed 3301.33 samples/sec   Loss 3.2071   LearningRate 0.0573   Epoch: 4   Global Step: 81170   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:48:20,875-Speed 3334.72 samples/sec   Loss 3.1861   LearningRate 0.0573   Epoch: 4   Global Step: 81180   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:48:24,005-Speed 3271.40 samples/sec   Loss 3.2386   LearningRate 0.0573   Epoch: 4   Global Step: 81190   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:48:27,074-Speed 3337.86 samples/sec   Loss 3.1953   LearningRate 0.0573   Epoch: 4   Global Step: 81200   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:48:30,186-Speed 3291.68 samples/sec   Loss 3.1747   LearningRate 0.0573   Epoch: 4   Global Step: 81210   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:48:33,251-Speed 3341.40 samples/sec   Loss 3.1263   LearningRate 0.0573   Epoch: 4   Global Step: 81220   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:48:36,312-Speed 3346.61 samples/sec   Loss 3.1734   LearningRate 0.0573   Epoch: 4   Global Step: 81230   Fp16 Grad Scale: 262144   Required: 27 hours
Training: 2022-04-11 07:48:39,367-Speed 3352.82 samples/sec   Loss 3.1649   LearningRate 0.0572   Epoch: 4   Global Step: 81240   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:48:42,451-Speed 3321.17 samples/sec   Loss 3.1466   LearningRate 0.0572   Epoch: 4   Global Step: 81250   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:48:45,515-Speed 3342.35 samples/sec   Loss 3.2862   LearningRate 0.0572   Epoch: 4   Global Step: 81260   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:48:48,623-Speed 3295.86 samples/sec   Loss 3.0720   LearningRate 0.0572   Epoch: 4   Global Step: 81270   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-11 07:48:51,721-Speed 3305.88 samples/sec   Loss 3.1933   LearningRate 0.0572   Epoch: 4   Global Step: 81280   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-11 07:48:54,806-Speed 3319.91 samples/sec   Loss 3.1781   LearningRate 0.0572   Epoch: 4   Global Step: 81290   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-11 07:48:57,939-Speed 3269.45 samples/sec   Loss 3.1817   LearningRate 0.0572   Epoch: 4   Global Step: 81300   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-11 07:49:01,024-Speed 3320.26 samples/sec   Loss 3.1749   LearningRate 0.0572   Epoch: 4   Global Step: 81310   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-11 07:49:04,089-Speed 3341.46 samples/sec   Loss 3.1602   LearningRate 0.0572   Epoch: 4   Global Step: 81320   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-11 07:49:07,154-Speed 3342.24 samples/sec   Loss 3.1583   LearningRate 0.0572   Epoch: 4   Global Step: 81330   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-11 07:49:10,216-Speed 3344.97 samples/sec   Loss 3.1188   LearningRate 0.0572   Epoch: 4   Global Step: 81340   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-11 07:49:13,332-Speed 3286.71 samples/sec   Loss 3.2456   LearningRate 0.0572   Epoch: 4   Global Step: 81350   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-11 07:49:16,397-Speed 3341.45 samples/sec   Loss 3.1411   LearningRate 0.0572   Epoch: 4   Global Step: 81360   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-11 07:49:19,466-Speed 3337.41 samples/sec   Loss 3.1989   LearningRate 0.0572   Epoch: 4   Global Step: 81370   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:49:22,666-Speed 3200.98 samples/sec   Loss 3.1371   LearningRate 0.0572   Epoch: 4   Global Step: 81380   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:49:25,779-Speed 3290.58 samples/sec   Loss 3.1698   LearningRate 0.0572   Epoch: 4   Global Step: 81390   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:49:28,873-Speed 3310.11 samples/sec   Loss 3.1556   LearningRate 0.0572   Epoch: 4   Global Step: 81400   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:49:31,944-Speed 3334.82 samples/sec   Loss 3.1197   LearningRate 0.0572   Epoch: 4   Global Step: 81410   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:49:35,014-Speed 3336.95 samples/sec   Loss 3.1092   LearningRate 0.0572   Epoch: 4   Global Step: 81420   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:49:38,079-Speed 3341.83 samples/sec   Loss 3.1499   LearningRate 0.0572   Epoch: 4   Global Step: 81430   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:49:41,188-Speed 3293.83 samples/sec   Loss 3.1244   LearningRate 0.0572   Epoch: 4   Global Step: 81440   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:49:44,269-Speed 3324.33 samples/sec   Loss 3.1790   LearningRate 0.0572   Epoch: 4   Global Step: 81450   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:49:47,312-Speed 3366.43 samples/sec   Loss 3.1832   LearningRate 0.0572   Epoch: 4   Global Step: 81460   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-04-11 07:49:50,381-Speed 3337.48 samples/sec   Loss 3.1280   LearningRate 0.0571   Epoch: 4   Global Step: 81470   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-04-11 07:49:53,446-Speed 3341.81 samples/sec   Loss 3.1353   LearningRate 0.0571   Epoch: 4   Global Step: 81480   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-04-11 07:49:56,554-Speed 3294.73 samples/sec   Loss 3.1106   LearningRate 0.0571   Epoch: 4   Global Step: 81490   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-04-11 07:49:59,622-Speed 3338.98 samples/sec   Loss 3.1063   LearningRate 0.0571   Epoch: 4   Global Step: 81500   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-04-11 07:50:02,693-Speed 3334.95 samples/sec   Loss 3.2481   LearningRate 0.0571   Epoch: 4   Global Step: 81510   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-04-11 07:50:05,769-Speed 3329.72 samples/sec   Loss 3.1957   LearningRate 0.0571   Epoch: 4   Global Step: 81520   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-04-11 07:50:08,908-Speed 3262.99 samples/sec   Loss 3.1291   LearningRate 0.0571   Epoch: 4   Global Step: 81530   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-04-11 07:50:12,111-Speed 3197.83 samples/sec   Loss 3.1034   LearningRate 0.0571   Epoch: 4   Global Step: 81540   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-04-11 07:50:15,306-Speed 3206.39 samples/sec   Loss 3.1205   LearningRate 0.0571   Epoch: 4   Global Step: 81550   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-04-11 07:50:18,388-Speed 3323.23 samples/sec   Loss 3.0647   LearningRate 0.0571   Epoch: 4   Global Step: 81560   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-11 07:50:21,457-Speed 3337.34 samples/sec   Loss 3.2109   LearningRate 0.0571   Epoch: 4   Global Step: 81570   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-11 07:50:24,524-Speed 3339.63 samples/sec   Loss 3.1291   LearningRate 0.0571   Epoch: 4   Global Step: 81580   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-11 07:50:27,590-Speed 3340.62 samples/sec   Loss 3.1320   LearningRate 0.0571   Epoch: 4   Global Step: 81590   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-11 07:50:30,668-Speed 3327.27 samples/sec   Loss 3.2032   LearningRate 0.0571   Epoch: 4   Global Step: 81600   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-11 07:50:33,760-Speed 3312.19 samples/sec   Loss 3.2216   LearningRate 0.0571   Epoch: 4   Global Step: 81610   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-11 07:50:36,826-Speed 3341.28 samples/sec   Loss 3.1550   LearningRate 0.0571   Epoch: 4   Global Step: 81620   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-11 07:50:39,892-Speed 3340.47 samples/sec   Loss 3.2281   LearningRate 0.0571   Epoch: 4   Global Step: 81630   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-11 07:50:42,984-Speed 3313.37 samples/sec   Loss 3.1558   LearningRate 0.0571   Epoch: 4   Global Step: 81640   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-11 07:50:46,058-Speed 3331.76 samples/sec   Loss 3.0697   LearningRate 0.0571   Epoch: 4   Global Step: 81650   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-11 07:50:49,135-Speed 3328.14 samples/sec   Loss 3.1882   LearningRate 0.0571   Epoch: 4   Global Step: 81660   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:50:52,210-Speed 3331.07 samples/sec   Loss 3.1799   LearningRate 0.0571   Epoch: 4   Global Step: 81670   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:50:55,393-Speed 3217.82 samples/sec   Loss 3.2021   LearningRate 0.0571   Epoch: 4   Global Step: 81680   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:50:58,488-Speed 3309.58 samples/sec   Loss 3.2634   LearningRate 0.0570   Epoch: 4   Global Step: 81690   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:51:01,554-Speed 3340.61 samples/sec   Loss 3.1551   LearningRate 0.0570   Epoch: 4   Global Step: 81700   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:51:04,799-Speed 3156.78 samples/sec   Loss 3.1269   LearningRate 0.0570   Epoch: 4   Global Step: 81710   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:51:08,045-Speed 3155.33 samples/sec   Loss 3.1191   LearningRate 0.0570   Epoch: 4   Global Step: 81720   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:51:11,304-Speed 3142.62 samples/sec   Loss 3.1254   LearningRate 0.0570   Epoch: 4   Global Step: 81730   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:51:14,512-Speed 3192.61 samples/sec   Loss 3.0983   LearningRate 0.0570   Epoch: 4   Global Step: 81740   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:51:17,704-Speed 3208.61 samples/sec   Loss 3.2004   LearningRate 0.0570   Epoch: 4   Global Step: 81750   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:51:20,861-Speed 3244.66 samples/sec   Loss 3.1593   LearningRate 0.0570   Epoch: 4   Global Step: 81760   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:51:24,001-Speed 3261.76 samples/sec   Loss 3.1277   LearningRate 0.0570   Epoch: 4   Global Step: 81770   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:51:27,145-Speed 3257.53 samples/sec   Loss 3.1248   LearningRate 0.0570   Epoch: 4   Global Step: 81780   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:51:30,229-Speed 3320.72 samples/sec   Loss 3.2073   LearningRate 0.0570   Epoch: 4   Global Step: 81790   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:51:33,303-Speed 3332.30 samples/sec   Loss 3.1317   LearningRate 0.0570   Epoch: 4   Global Step: 81800   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:51:36,373-Speed 3336.66 samples/sec   Loss 3.2108   LearningRate 0.0570   Epoch: 4   Global Step: 81810   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:51:39,455-Speed 3323.96 samples/sec   Loss 3.0765   LearningRate 0.0570   Epoch: 4   Global Step: 81820   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:51:42,542-Speed 3316.87 samples/sec   Loss 3.1829   LearningRate 0.0570   Epoch: 4   Global Step: 81830   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:51:45,687-Speed 3257.49 samples/sec   Loss 3.1665   LearningRate 0.0570   Epoch: 4   Global Step: 81840   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:51:48,826-Speed 3262.12 samples/sec   Loss 3.1984   LearningRate 0.0570   Epoch: 4   Global Step: 81850   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:51:51,907-Speed 3324.15 samples/sec   Loss 3.1145   LearningRate 0.0570   Epoch: 4   Global Step: 81860   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:51:54,980-Speed 3333.75 samples/sec   Loss 3.1574   LearningRate 0.0570   Epoch: 4   Global Step: 81870   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 07:51:58,087-Speed 3296.61 samples/sec   Loss 3.1198   LearningRate 0.0570   Epoch: 4   Global Step: 81880   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 07:52:01,166-Speed 3326.60 samples/sec   Loss 3.1224   LearningRate 0.0570   Epoch: 4   Global Step: 81890   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 07:52:04,238-Speed 3333.84 samples/sec   Loss 3.1457   LearningRate 0.0570   Epoch: 4   Global Step: 81900   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 07:52:07,359-Speed 3282.13 samples/sec   Loss 3.2332   LearningRate 0.0569   Epoch: 4   Global Step: 81910   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 07:52:10,420-Speed 3346.21 samples/sec   Loss 3.2160   LearningRate 0.0569   Epoch: 4   Global Step: 81920   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 07:52:13,498-Speed 3326.90 samples/sec   Loss 3.2052   LearningRate 0.0569   Epoch: 4   Global Step: 81930   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 07:52:16,563-Speed 3341.77 samples/sec   Loss 3.1537   LearningRate 0.0569   Epoch: 4   Global Step: 81940   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 07:52:19,703-Speed 3261.77 samples/sec   Loss 3.1034   LearningRate 0.0569   Epoch: 4   Global Step: 81950   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 07:52:22,775-Speed 3334.70 samples/sec   Loss 3.1369   LearningRate 0.0569   Epoch: 4   Global Step: 81960   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 07:52:25,872-Speed 3307.12 samples/sec   Loss 3.1493   LearningRate 0.0569   Epoch: 4   Global Step: 81970   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 07:52:28,971-Speed 3305.11 samples/sec   Loss 3.1686   LearningRate 0.0569   Epoch: 4   Global Step: 81980   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 07:52:32,055-Speed 3321.52 samples/sec   Loss 3.1648   LearningRate 0.0569   Epoch: 4   Global Step: 81990   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 07:52:35,129-Speed 3330.99 samples/sec   Loss 3.1561   LearningRate 0.0569   Epoch: 4   Global Step: 82000   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 07:53:19,199-[lfw][82000]XNorm: 21.200448
Training: 2022-04-11 07:53:19,200-[lfw][82000]Accuracy-Flip: 0.99750+-0.00291
Training: 2022-04-11 07:53:19,200-[lfw][82000]Accuracy-Highest: 0.99817
Training: 2022-04-11 07:54:10,444-[cfp_fp][82000]XNorm: 19.921196
Training: 2022-04-11 07:54:10,444-[cfp_fp][82000]Accuracy-Flip: 0.98457+-0.00549
Training: 2022-04-11 07:54:10,445-[cfp_fp][82000]Accuracy-Highest: 0.98543
Training: 2022-04-11 07:54:54,847-[agedb_30][82000]XNorm: 21.432718
Training: 2022-04-11 07:54:54,847-[agedb_30][82000]Accuracy-Flip: 0.97967+-0.00846
Training: 2022-04-11 07:54:54,848-[agedb_30][82000]Accuracy-Highest: 0.98117
Training: 2022-04-11 07:54:57,927-Speed 71.71 samples/sec   Loss 3.0366   LearningRate 0.0569   Epoch: 4   Global Step: 82010   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:55:00,985-Speed 3349.14 samples/sec   Loss 3.1227   LearningRate 0.0569   Epoch: 4   Global Step: 82020   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:55:04,036-Speed 3356.81 samples/sec   Loss 3.2466   LearningRate 0.0569   Epoch: 4   Global Step: 82030   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:55:07,093-Speed 3350.08 samples/sec   Loss 3.1063   LearningRate 0.0569   Epoch: 4   Global Step: 82040   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:55:10,145-Speed 3356.17 samples/sec   Loss 3.1457   LearningRate 0.0569   Epoch: 4   Global Step: 82050   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:55:13,260-Speed 3287.70 samples/sec   Loss 3.1668   LearningRate 0.0569   Epoch: 4   Global Step: 82060   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:55:16,322-Speed 3344.92 samples/sec   Loss 3.1421   LearningRate 0.0569   Epoch: 4   Global Step: 82070   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:55:19,385-Speed 3344.67 samples/sec   Loss 3.1555   LearningRate 0.0569   Epoch: 4   Global Step: 82080   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:55:22,470-Speed 3319.25 samples/sec   Loss 3.1860   LearningRate 0.0569   Epoch: 4   Global Step: 82090   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:55:25,567-Speed 3308.04 samples/sec   Loss 3.1715   LearningRate 0.0569   Epoch: 4   Global Step: 82100   Fp16 Grad Scale: 262144   Required: 27 hours
Training: 2022-04-11 07:55:28,619-Speed 3356.12 samples/sec   Loss 3.1522   LearningRate 0.0569   Epoch: 4   Global Step: 82110   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:55:31,715-Speed 3308.33 samples/sec   Loss 3.1282   LearningRate 0.0569   Epoch: 4   Global Step: 82120   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:55:34,805-Speed 3313.69 samples/sec   Loss 3.1981   LearningRate 0.0568   Epoch: 4   Global Step: 82130   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:55:37,896-Speed 3314.57 samples/sec   Loss 3.2121   LearningRate 0.0568   Epoch: 4   Global Step: 82140   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:55:40,960-Speed 3342.55 samples/sec   Loss 3.1225   LearningRate 0.0568   Epoch: 4   Global Step: 82150   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:55:44,030-Speed 3335.56 samples/sec   Loss 3.2237   LearningRate 0.0568   Epoch: 4   Global Step: 82160   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:55:47,097-Speed 3339.35 samples/sec   Loss 3.2451   LearningRate 0.0568   Epoch: 4   Global Step: 82170   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:55:50,175-Speed 3328.98 samples/sec   Loss 3.1516   LearningRate 0.0568   Epoch: 4   Global Step: 82180   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:55:53,246-Speed 3335.06 samples/sec   Loss 3.1409   LearningRate 0.0568   Epoch: 4   Global Step: 82190   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:55:56,308-Speed 3345.16 samples/sec   Loss 3.1041   LearningRate 0.0568   Epoch: 4   Global Step: 82200   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:55:59,368-Speed 3346.87 samples/sec   Loss 3.0845   LearningRate 0.0568   Epoch: 4   Global Step: 82210   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:56:02,437-Speed 3337.67 samples/sec   Loss 3.2132   LearningRate 0.0568   Epoch: 4   Global Step: 82220   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:56:05,489-Speed 3355.64 samples/sec   Loss 3.0998   LearningRate 0.0568   Epoch: 4   Global Step: 82230   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-11 07:56:08,587-Speed 3306.30 samples/sec   Loss 3.1595   LearningRate 0.0568   Epoch: 4   Global Step: 82240   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-11 07:56:11,654-Speed 3338.59 samples/sec   Loss 3.1240   LearningRate 0.0568   Epoch: 4   Global Step: 82250   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-11 07:56:14,778-Speed 3279.50 samples/sec   Loss 3.1668   LearningRate 0.0568   Epoch: 4   Global Step: 82260   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-11 07:56:17,852-Speed 3332.04 samples/sec   Loss 3.1561   LearningRate 0.0568   Epoch: 4   Global Step: 82270   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-11 07:56:20,916-Speed 3342.88 samples/sec   Loss 3.1670   LearningRate 0.0568   Epoch: 4   Global Step: 82280   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-11 07:56:23,996-Speed 3324.86 samples/sec   Loss 3.1321   LearningRate 0.0568   Epoch: 4   Global Step: 82290   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-11 07:56:27,074-Speed 3327.77 samples/sec   Loss 3.1481   LearningRate 0.0568   Epoch: 4   Global Step: 82300   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-11 07:56:30,147-Speed 3332.87 samples/sec   Loss 3.1308   LearningRate 0.0568   Epoch: 4   Global Step: 82310   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-11 07:56:33,237-Speed 3314.57 samples/sec   Loss 3.1192   LearningRate 0.0568   Epoch: 4   Global Step: 82320   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-11 07:56:36,303-Speed 3340.69 samples/sec   Loss 3.1268   LearningRate 0.0568   Epoch: 4   Global Step: 82330   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:56:39,367-Speed 3343.21 samples/sec   Loss 3.1295   LearningRate 0.0568   Epoch: 4   Global Step: 82340   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:56:42,462-Speed 3309.08 samples/sec   Loss 3.0719   LearningRate 0.0567   Epoch: 4   Global Step: 82350   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:56:45,532-Speed 3336.87 samples/sec   Loss 3.1550   LearningRate 0.0567   Epoch: 4   Global Step: 82360   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:56:48,698-Speed 3234.45 samples/sec   Loss 3.1542   LearningRate 0.0567   Epoch: 4   Global Step: 82370   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:56:51,774-Speed 3329.87 samples/sec   Loss 3.0523   LearningRate 0.0567   Epoch: 4   Global Step: 82380   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:56:54,839-Speed 3341.89 samples/sec   Loss 3.1764   LearningRate 0.0567   Epoch: 4   Global Step: 82390   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:56:57,939-Speed 3303.42 samples/sec   Loss 3.0732   LearningRate 0.0567   Epoch: 4   Global Step: 82400   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:57:01,005-Speed 3340.82 samples/sec   Loss 3.1265   LearningRate 0.0567   Epoch: 4   Global Step: 82410   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:57:04,101-Speed 3308.92 samples/sec   Loss 3.1647   LearningRate 0.0567   Epoch: 4   Global Step: 82420   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:57:07,179-Speed 3326.91 samples/sec   Loss 3.1085   LearningRate 0.0567   Epoch: 4   Global Step: 82430   Fp16 Grad Scale: 262144   Required: 27 hours
Training: 2022-04-11 07:57:10,282-Speed 3301.95 samples/sec   Loss 3.1553   LearningRate 0.0567   Epoch: 4   Global Step: 82440   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:57:13,354-Speed 3333.87 samples/sec   Loss 3.1479   LearningRate 0.0567   Epoch: 4   Global Step: 82450   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-11 07:57:16,419-Speed 3342.20 samples/sec   Loss 3.2193   LearningRate 0.0567   Epoch: 4   Global Step: 82460   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-11 07:57:19,492-Speed 3332.28 samples/sec   Loss 3.1478   LearningRate 0.0567   Epoch: 4   Global Step: 82470   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-11 07:57:22,546-Speed 3354.63 samples/sec   Loss 3.1110   LearningRate 0.0567   Epoch: 4   Global Step: 82480   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-04-11 07:57:25,621-Speed 3330.21 samples/sec   Loss 3.1593   LearningRate 0.0567   Epoch: 4   Global Step: 82490   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-04-11 07:57:28,735-Speed 3289.22 samples/sec   Loss 3.1563   LearningRate 0.0567   Epoch: 4   Global Step: 82500   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-04-11 07:57:31,903-Speed 3233.22 samples/sec   Loss 3.0924   LearningRate 0.0567   Epoch: 4   Global Step: 82510   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-04-11 07:57:34,974-Speed 3335.79 samples/sec   Loss 3.2026   LearningRate 0.0567   Epoch: 4   Global Step: 82520   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-04-11 07:57:38,036-Speed 3345.69 samples/sec   Loss 3.1020   LearningRate 0.0567   Epoch: 4   Global Step: 82530   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-04-11 07:57:41,113-Speed 3328.70 samples/sec   Loss 3.1557   LearningRate 0.0567   Epoch: 4   Global Step: 82540   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-04-11 07:57:44,178-Speed 3341.02 samples/sec   Loss 3.1308   LearningRate 0.0567   Epoch: 4   Global Step: 82550   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-04-11 07:57:47,243-Speed 3342.70 samples/sec   Loss 3.0811   LearningRate 0.0567   Epoch: 4   Global Step: 82560   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-04-11 07:57:50,326-Speed 3321.52 samples/sec   Loss 3.1553   LearningRate 0.0566   Epoch: 4   Global Step: 82570   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-04-11 07:57:53,398-Speed 3334.93 samples/sec   Loss 3.2061   LearningRate 0.0566   Epoch: 4   Global Step: 82580   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 07:57:56,460-Speed 3344.71 samples/sec   Loss 3.1494   LearningRate 0.0566   Epoch: 4   Global Step: 82590   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 07:57:59,557-Speed 3307.31 samples/sec   Loss 3.0866   LearningRate 0.0566   Epoch: 4   Global Step: 82600   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 07:58:02,706-Speed 3252.64 samples/sec   Loss 3.1646   LearningRate 0.0566   Epoch: 4   Global Step: 82610   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 07:58:05,834-Speed 3274.30 samples/sec   Loss 3.0906   LearningRate 0.0566   Epoch: 4   Global Step: 82620   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 07:58:08,896-Speed 3345.22 samples/sec   Loss 3.1210   LearningRate 0.0566   Epoch: 4   Global Step: 82630   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 07:58:12,094-Speed 3202.87 samples/sec   Loss 3.2205   LearningRate 0.0566   Epoch: 4   Global Step: 82640   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 07:58:15,158-Speed 3342.91 samples/sec   Loss 3.0955   LearningRate 0.0566   Epoch: 4   Global Step: 82650   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 07:58:18,232-Speed 3332.07 samples/sec   Loss 3.1225   LearningRate 0.0566   Epoch: 4   Global Step: 82660   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 07:58:21,309-Speed 3328.32 samples/sec   Loss 3.1115   LearningRate 0.0566   Epoch: 4   Global Step: 82670   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 07:58:24,390-Speed 3324.20 samples/sec   Loss 3.0346   LearningRate 0.0566   Epoch: 4   Global Step: 82680   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 07:58:27,469-Speed 3326.60 samples/sec   Loss 3.1414   LearningRate 0.0566   Epoch: 4   Global Step: 82690   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 07:58:30,534-Speed 3342.07 samples/sec   Loss 3.1277   LearningRate 0.0566   Epoch: 4   Global Step: 82700   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 07:58:33,598-Speed 3343.12 samples/sec   Loss 3.1615   LearningRate 0.0566   Epoch: 4   Global Step: 82710   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 07:58:36,659-Speed 3345.74 samples/sec   Loss 3.2086   LearningRate 0.0566   Epoch: 4   Global Step: 82720   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 07:58:39,725-Speed 3340.65 samples/sec   Loss 3.0887   LearningRate 0.0566   Epoch: 4   Global Step: 82730   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 07:58:42,841-Speed 3287.79 samples/sec   Loss 3.2597   LearningRate 0.0566   Epoch: 4   Global Step: 82740   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 07:58:45,905-Speed 3342.15 samples/sec   Loss 3.1578   LearningRate 0.0566   Epoch: 4   Global Step: 82750   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 07:58:48,989-Speed 3321.08 samples/sec   Loss 3.1513   LearningRate 0.0566   Epoch: 4   Global Step: 82760   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 07:58:52,064-Speed 3331.13 samples/sec   Loss 3.2113   LearningRate 0.0566   Epoch: 4   Global Step: 82770   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 07:58:55,140-Speed 3329.09 samples/sec   Loss 3.1146   LearningRate 0.0566   Epoch: 4   Global Step: 82780   Fp16 Grad Scale: 262144   Required: 26 hours
Training: 2022-04-11 07:58:58,184-Speed 3365.17 samples/sec   Loss 3.1204   LearningRate 0.0565   Epoch: 4   Global Step: 82790   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 07:59:01,249-Speed 3341.96 samples/sec   Loss 3.1964   LearningRate 0.0565   Epoch: 4   Global Step: 82800   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 07:59:04,336-Speed 3317.75 samples/sec   Loss 3.1708   LearningRate 0.0565   Epoch: 4   Global Step: 82810   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 07:59:07,446-Speed 3293.40 samples/sec   Loss 3.1387   LearningRate 0.0565   Epoch: 4   Global Step: 82820   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 07:59:10,513-Speed 3340.23 samples/sec   Loss 3.0855   LearningRate 0.0565   Epoch: 4   Global Step: 82830   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 07:59:13,589-Speed 3329.42 samples/sec   Loss 3.1516   LearningRate 0.0565   Epoch: 4   Global Step: 82840   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 07:59:16,659-Speed 3336.64 samples/sec   Loss 3.2611   LearningRate 0.0565   Epoch: 4   Global Step: 82850   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 07:59:19,733-Speed 3331.60 samples/sec   Loss 3.1471   LearningRate 0.0565   Epoch: 4   Global Step: 82860   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 07:59:22,810-Speed 3327.99 samples/sec   Loss 3.0603   LearningRate 0.0565   Epoch: 4   Global Step: 82870   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 07:59:26,075-Speed 3137.93 samples/sec   Loss 3.1511   LearningRate 0.0565   Epoch: 4   Global Step: 82880   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 07:59:29,202-Speed 3275.66 samples/sec   Loss 3.2024   LearningRate 0.0565   Epoch: 4   Global Step: 82890   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 07:59:32,298-Speed 3307.99 samples/sec   Loss 3.1640   LearningRate 0.0565   Epoch: 4   Global Step: 82900   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 07:59:35,370-Speed 3333.95 samples/sec   Loss 3.2320   LearningRate 0.0565   Epoch: 4   Global Step: 82910   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 07:59:38,446-Speed 3329.19 samples/sec   Loss 3.0752   LearningRate 0.0565   Epoch: 4   Global Step: 82920   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 07:59:41,516-Speed 3336.62 samples/sec   Loss 3.1239   LearningRate 0.0565   Epoch: 4   Global Step: 82930   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 07:59:44,579-Speed 3344.24 samples/sec   Loss 3.1017   LearningRate 0.0565   Epoch: 4   Global Step: 82940   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 07:59:47,696-Speed 3286.08 samples/sec   Loss 3.0480   LearningRate 0.0565   Epoch: 4   Global Step: 82950   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 07:59:50,839-Speed 3257.99 samples/sec   Loss 3.2003   LearningRate 0.0565   Epoch: 4   Global Step: 82960   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 07:59:53,915-Speed 3330.44 samples/sec   Loss 3.1031   LearningRate 0.0565   Epoch: 4   Global Step: 82970   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 07:59:56,988-Speed 3332.74 samples/sec   Loss 3.1692   LearningRate 0.0565   Epoch: 4   Global Step: 82980   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 08:00:00,052-Speed 3343.27 samples/sec   Loss 3.1777   LearningRate 0.0565   Epoch: 4   Global Step: 82990   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 08:00:03,157-Speed 3298.02 samples/sec   Loss 3.1457   LearningRate 0.0565   Epoch: 4   Global Step: 83000   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 08:00:06,399-Speed 3159.55 samples/sec   Loss 3.1576   LearningRate 0.0565   Epoch: 4   Global Step: 83010   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 08:00:09,488-Speed 3315.40 samples/sec   Loss 3.0640   LearningRate 0.0564   Epoch: 4   Global Step: 83020   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 08:00:12,659-Speed 3230.25 samples/sec   Loss 3.1834   LearningRate 0.0564   Epoch: 4   Global Step: 83030   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 08:00:15,829-Speed 3231.01 samples/sec   Loss 2.9940   LearningRate 0.0564   Epoch: 4   Global Step: 83040   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 08:00:18,915-Speed 3319.44 samples/sec   Loss 3.1555   LearningRate 0.0564   Epoch: 4   Global Step: 83050   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 08:00:22,056-Speed 3261.12 samples/sec   Loss 3.1330   LearningRate 0.0564   Epoch: 4   Global Step: 83060   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 08:00:25,206-Speed 3250.87 samples/sec   Loss 3.1554   LearningRate 0.0564   Epoch: 4   Global Step: 83070   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 08:00:28,270-Speed 3342.66 samples/sec   Loss 3.1328   LearningRate 0.0564   Epoch: 4   Global Step: 83080   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:00:31,343-Speed 3333.81 samples/sec   Loss 3.1085   LearningRate 0.0564   Epoch: 4   Global Step: 83090   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:00:34,410-Speed 3339.49 samples/sec   Loss 3.2395   LearningRate 0.0564   Epoch: 4   Global Step: 83100   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:00:37,478-Speed 3337.42 samples/sec   Loss 3.1069   LearningRate 0.0564   Epoch: 4   Global Step: 83110   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:00:40,546-Speed 3338.30 samples/sec   Loss 3.0911   LearningRate 0.0564   Epoch: 4   Global Step: 83120   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:00:43,631-Speed 3320.57 samples/sec   Loss 3.1205   LearningRate 0.0564   Epoch: 4   Global Step: 83130   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:00:46,719-Speed 3316.89 samples/sec   Loss 3.0928   LearningRate 0.0564   Epoch: 4   Global Step: 83140   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:00:49,807-Speed 3317.21 samples/sec   Loss 3.1268   LearningRate 0.0564   Epoch: 4   Global Step: 83150   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:00:52,890-Speed 3321.85 samples/sec   Loss 3.2060   LearningRate 0.0564   Epoch: 4   Global Step: 83160   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:00:55,999-Speed 3294.98 samples/sec   Loss 3.0801   LearningRate 0.0564   Epoch: 4   Global Step: 83170   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:00:59,063-Speed 3342.65 samples/sec   Loss 3.0977   LearningRate 0.0564   Epoch: 4   Global Step: 83180   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:01:02,198-Speed 3266.38 samples/sec   Loss 3.1708   LearningRate 0.0564   Epoch: 4   Global Step: 83190   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 08:01:05,315-Speed 3286.05 samples/sec   Loss 3.0907   LearningRate 0.0564   Epoch: 4   Global Step: 83200   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 08:01:08,386-Speed 3335.64 samples/sec   Loss 3.1143   LearningRate 0.0564   Epoch: 4   Global Step: 83210   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 08:01:11,508-Speed 3280.71 samples/sec   Loss 3.1115   LearningRate 0.0564   Epoch: 4   Global Step: 83220   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 08:01:14,591-Speed 3325.71 samples/sec   Loss 3.0732   LearningRate 0.0564   Epoch: 4   Global Step: 83230   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 08:01:17,674-Speed 3322.95 samples/sec   Loss 3.0867   LearningRate 0.0563   Epoch: 4   Global Step: 83240   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 08:01:20,739-Speed 3341.95 samples/sec   Loss 3.0991   LearningRate 0.0563   Epoch: 4   Global Step: 83250   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 08:01:23,806-Speed 3339.72 samples/sec   Loss 3.1274   LearningRate 0.0563   Epoch: 4   Global Step: 83260   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 08:01:26,871-Speed 3341.36 samples/sec   Loss 3.1079   LearningRate 0.0563   Epoch: 4   Global Step: 83270   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 08:01:29,944-Speed 3332.48 samples/sec   Loss 3.1187   LearningRate 0.0563   Epoch: 4   Global Step: 83280   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 08:01:33,115-Speed 3230.31 samples/sec   Loss 3.0890   LearningRate 0.0563   Epoch: 4   Global Step: 83290   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:01:36,258-Speed 3258.51 samples/sec   Loss 3.0205   LearningRate 0.0563   Epoch: 4   Global Step: 83300   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:01:39,363-Speed 3299.24 samples/sec   Loss 3.0650   LearningRate 0.0563   Epoch: 4   Global Step: 83310   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:01:42,437-Speed 3332.00 samples/sec   Loss 3.1016   LearningRate 0.0563   Epoch: 4   Global Step: 83320   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:01:45,606-Speed 3232.29 samples/sec   Loss 3.1099   LearningRate 0.0563   Epoch: 4   Global Step: 83330   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:01:48,701-Speed 3309.34 samples/sec   Loss 3.1604   LearningRate 0.0563   Epoch: 4   Global Step: 83340   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:01:51,801-Speed 3304.12 samples/sec   Loss 3.0621   LearningRate 0.0563   Epoch: 4   Global Step: 83350   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:01:54,924-Speed 3279.19 samples/sec   Loss 3.1022   LearningRate 0.0563   Epoch: 4   Global Step: 83360   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:01:58,064-Speed 3261.89 samples/sec   Loss 3.1101   LearningRate 0.0563   Epoch: 4   Global Step: 83370   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:02:01,134-Speed 3337.09 samples/sec   Loss 3.0793   LearningRate 0.0563   Epoch: 4   Global Step: 83380   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:02:04,254-Speed 3281.67 samples/sec   Loss 3.1229   LearningRate 0.0563   Epoch: 4   Global Step: 83390   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:02:07,348-Speed 3310.81 samples/sec   Loss 3.1642   LearningRate 0.0563   Epoch: 4   Global Step: 83400   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:02:10,438-Speed 3315.04 samples/sec   Loss 3.1200   LearningRate 0.0563   Epoch: 4   Global Step: 83410   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:02:13,528-Speed 3314.50 samples/sec   Loss 3.1118   LearningRate 0.0563   Epoch: 4   Global Step: 83420   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:02:16,596-Speed 3338.96 samples/sec   Loss 3.1412   LearningRate 0.0563   Epoch: 4   Global Step: 83430   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:02:19,683-Speed 3317.81 samples/sec   Loss 3.0795   LearningRate 0.0563   Epoch: 4   Global Step: 83440   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:02:22,999-Speed 3088.87 samples/sec   Loss 3.1298   LearningRate 0.0563   Epoch: 4   Global Step: 83450   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:02:54,904-Speed 320.97 samples/sec   Loss 2.9027   LearningRate 0.0562   Epoch: 5   Global Step: 83460   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:02:58,103-Speed 3203.52 samples/sec   Loss 2.6137   LearningRate 0.0562   Epoch: 5   Global Step: 83470   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:03:01,228-Speed 3277.15 samples/sec   Loss 2.5527   LearningRate 0.0562   Epoch: 5   Global Step: 83480   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 08:03:04,302-Speed 3331.97 samples/sec   Loss 2.5371   LearningRate 0.0562   Epoch: 5   Global Step: 83490   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 08:03:07,363-Speed 3346.84 samples/sec   Loss 2.5277   LearningRate 0.0562   Epoch: 5   Global Step: 83500   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 08:03:10,427-Speed 3343.07 samples/sec   Loss 2.4628   LearningRate 0.0562   Epoch: 5   Global Step: 83510   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 08:03:13,583-Speed 3245.39 samples/sec   Loss 2.5224   LearningRate 0.0562   Epoch: 5   Global Step: 83520   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 08:03:16,684-Speed 3302.56 samples/sec   Loss 2.4828   LearningRate 0.0562   Epoch: 5   Global Step: 83530   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 08:03:19,800-Speed 3287.63 samples/sec   Loss 2.5459   LearningRate 0.0562   Epoch: 5   Global Step: 83540   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 08:03:22,979-Speed 3222.05 samples/sec   Loss 2.5820   LearningRate 0.0562   Epoch: 5   Global Step: 83550   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 08:03:26,215-Speed 3164.77 samples/sec   Loss 2.5328   LearningRate 0.0562   Epoch: 5   Global Step: 83560   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 08:03:29,374-Speed 3242.69 samples/sec   Loss 2.5891   LearningRate 0.0562   Epoch: 5   Global Step: 83570   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 08:03:32,670-Speed 3107.99 samples/sec   Loss 2.5208   LearningRate 0.0562   Epoch: 5   Global Step: 83580   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:03:35,774-Speed 3300.51 samples/sec   Loss 2.4573   LearningRate 0.0562   Epoch: 5   Global Step: 83590   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 08:03:39,059-Speed 3118.12 samples/sec   Loss 2.5347   LearningRate 0.0562   Epoch: 5   Global Step: 83600   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 08:03:42,438-Speed 3030.96 samples/sec   Loss 2.5806   LearningRate 0.0562   Epoch: 5   Global Step: 83610   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 08:03:45,510-Speed 3335.08 samples/sec   Loss 2.5592   LearningRate 0.0562   Epoch: 5   Global Step: 83620   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 08:03:48,586-Speed 3329.37 samples/sec   Loss 2.5447   LearningRate 0.0562   Epoch: 5   Global Step: 83630   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 08:03:51,771-Speed 3215.92 samples/sec   Loss 2.5501   LearningRate 0.0562   Epoch: 5   Global Step: 83640   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 08:03:54,975-Speed 3197.99 samples/sec   Loss 2.4803   LearningRate 0.0562   Epoch: 5   Global Step: 83650   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 08:03:58,061-Speed 3318.63 samples/sec   Loss 2.5335   LearningRate 0.0562   Epoch: 5   Global Step: 83660   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 08:04:01,147-Speed 3319.32 samples/sec   Loss 2.4762   LearningRate 0.0562   Epoch: 5   Global Step: 83670   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 08:04:04,294-Speed 3254.36 samples/sec   Loss 2.4914   LearningRate 0.0561   Epoch: 5   Global Step: 83680   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 08:04:07,374-Speed 3325.46 samples/sec   Loss 2.5667   LearningRate 0.0561   Epoch: 5   Global Step: 83690   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:04:10,487-Speed 3291.02 samples/sec   Loss 2.4862   LearningRate 0.0561   Epoch: 5   Global Step: 83700   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:04:13,637-Speed 3251.21 samples/sec   Loss 2.5161   LearningRate 0.0561   Epoch: 5   Global Step: 83710   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:04:16,891-Speed 3148.19 samples/sec   Loss 2.5247   LearningRate 0.0561   Epoch: 5   Global Step: 83720   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:04:20,011-Speed 3283.37 samples/sec   Loss 2.4276   LearningRate 0.0561   Epoch: 5   Global Step: 83730   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 08:04:23,153-Speed 3260.28 samples/sec   Loss 2.5588   LearningRate 0.0561   Epoch: 5   Global Step: 83740   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 08:04:26,260-Speed 3295.76 samples/sec   Loss 2.5758   LearningRate 0.0561   Epoch: 5   Global Step: 83750   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 08:04:29,345-Speed 3321.21 samples/sec   Loss 2.5659   LearningRate 0.0561   Epoch: 5   Global Step: 83760   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 08:04:32,515-Speed 3230.34 samples/sec   Loss 2.5124   LearningRate 0.0561   Epoch: 5   Global Step: 83770   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-04-11 08:04:35,635-Speed 3283.57 samples/sec   Loss 2.5534   LearningRate 0.0561   Epoch: 5   Global Step: 83780   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-04-11 08:04:38,723-Speed 3316.32 samples/sec   Loss 2.5261   LearningRate 0.0561   Epoch: 5   Global Step: 83790   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-04-11 08:04:41,797-Speed 3331.86 samples/sec   Loss 2.5615   LearningRate 0.0561   Epoch: 5   Global Step: 83800   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-04-11 08:04:44,969-Speed 3229.74 samples/sec   Loss 2.5729   LearningRate 0.0561   Epoch: 5   Global Step: 83810   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-04-11 08:04:48,201-Speed 3168.85 samples/sec   Loss 2.5392   LearningRate 0.0561   Epoch: 5   Global Step: 83820   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-04-11 08:04:51,349-Speed 3254.70 samples/sec   Loss 2.5870   LearningRate 0.0561   Epoch: 5   Global Step: 83830   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-04-11 08:04:54,432-Speed 3321.82 samples/sec   Loss 2.5285   LearningRate 0.0561   Epoch: 5   Global Step: 83840   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-04-11 08:04:57,527-Speed 3309.67 samples/sec   Loss 2.5839   LearningRate 0.0561   Epoch: 5   Global Step: 83850   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-04-11 08:05:00,622-Speed 3309.30 samples/sec   Loss 2.5572   LearningRate 0.0561   Epoch: 5   Global Step: 83860   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-04-11 08:05:03,703-Speed 3324.82 samples/sec   Loss 2.6006   LearningRate 0.0561   Epoch: 5   Global Step: 83870   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 08:05:06,773-Speed 3336.29 samples/sec   Loss 2.5813   LearningRate 0.0561   Epoch: 5   Global Step: 83880   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 08:05:09,841-Speed 3339.05 samples/sec   Loss 2.6118   LearningRate 0.0561   Epoch: 5   Global Step: 83890   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 08:05:12,919-Speed 3327.97 samples/sec   Loss 2.6040   LearningRate 0.0561   Epoch: 5   Global Step: 83900   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 08:05:16,002-Speed 3321.69 samples/sec   Loss 2.5574   LearningRate 0.0560   Epoch: 5   Global Step: 83910   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 08:05:19,094-Speed 3312.73 samples/sec   Loss 2.6374   LearningRate 0.0560   Epoch: 5   Global Step: 83920   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 08:05:22,168-Speed 3332.15 samples/sec   Loss 2.5506   LearningRate 0.0560   Epoch: 5   Global Step: 83930   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 08:05:25,291-Speed 3279.90 samples/sec   Loss 2.5493   LearningRate 0.0560   Epoch: 5   Global Step: 83940   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 08:05:28,367-Speed 3330.17 samples/sec   Loss 2.5320   LearningRate 0.0560   Epoch: 5   Global Step: 83950   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 08:05:31,440-Speed 3332.20 samples/sec   Loss 2.5720   LearningRate 0.0560   Epoch: 5   Global Step: 83960   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 08:05:34,629-Speed 3212.47 samples/sec   Loss 2.5395   LearningRate 0.0560   Epoch: 5   Global Step: 83970   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:05:37,863-Speed 3166.83 samples/sec   Loss 2.6372   LearningRate 0.0560   Epoch: 5   Global Step: 83980   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:05:41,106-Speed 3158.28 samples/sec   Loss 2.5727   LearningRate 0.0560   Epoch: 5   Global Step: 83990   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:05:44,182-Speed 3330.53 samples/sec   Loss 2.5729   LearningRate 0.0560   Epoch: 5   Global Step: 84000   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:06:28,194-[lfw][84000]XNorm: 24.065116
Training: 2022-04-11 08:06:28,194-[lfw][84000]Accuracy-Flip: 0.99800+-0.00267
Training: 2022-04-11 08:06:28,195-[lfw][84000]Accuracy-Highest: 0.99817
Training: 2022-04-11 08:07:19,236-[cfp_fp][84000]XNorm: 22.576386
Training: 2022-04-11 08:07:19,237-[cfp_fp][84000]Accuracy-Flip: 0.98557+-0.00529
Training: 2022-04-11 08:07:19,237-[cfp_fp][84000]Accuracy-Highest: 0.98557
Training: 2022-04-11 08:08:03,124-[agedb_30][84000]XNorm: 24.211899
Training: 2022-04-11 08:08:03,124-[agedb_30][84000]Accuracy-Flip: 0.98167+-0.00753
Training: 2022-04-11 08:08:03,125-[agedb_30][84000]Accuracy-Highest: 0.98167
Training: 2022-04-11 08:08:06,245-Speed 72.08 samples/sec   Loss 2.6176   LearningRate 0.0560   Epoch: 5   Global Step: 84010   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:08:09,287-Speed 3366.85 samples/sec   Loss 2.5742   LearningRate 0.0560   Epoch: 5   Global Step: 84020   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 08:08:12,339-Speed 3356.03 samples/sec   Loss 2.5719   LearningRate 0.0560   Epoch: 5   Global Step: 84030   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 08:08:15,397-Speed 3348.88 samples/sec   Loss 2.5800   LearningRate 0.0560   Epoch: 5   Global Step: 84040   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 08:08:18,449-Speed 3356.51 samples/sec   Loss 2.5787   LearningRate 0.0560   Epoch: 5   Global Step: 84050   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 08:08:21,511-Speed 3344.69 samples/sec   Loss 2.6088   LearningRate 0.0560   Epoch: 5   Global Step: 84060   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 08:08:24,565-Speed 3353.84 samples/sec   Loss 2.6119   LearningRate 0.0560   Epoch: 5   Global Step: 84070   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 08:08:27,638-Speed 3333.84 samples/sec   Loss 2.5846   LearningRate 0.0560   Epoch: 5   Global Step: 84080   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 08:08:30,731-Speed 3311.87 samples/sec   Loss 2.6348   LearningRate 0.0560   Epoch: 5   Global Step: 84090   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 08:08:33,863-Speed 3270.65 samples/sec   Loss 2.5673   LearningRate 0.0560   Epoch: 5   Global Step: 84100   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 08:08:36,929-Speed 3340.60 samples/sec   Loss 2.6020   LearningRate 0.0560   Epoch: 5   Global Step: 84110   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 08:08:40,012-Speed 3322.21 samples/sec   Loss 2.5163   LearningRate 0.0560   Epoch: 5   Global Step: 84120   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:08:43,077-Speed 3342.21 samples/sec   Loss 2.6385   LearningRate 0.0559   Epoch: 5   Global Step: 84130   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:08:46,165-Speed 3317.61 samples/sec   Loss 2.6684   LearningRate 0.0559   Epoch: 5   Global Step: 84140   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:08:49,243-Speed 3327.86 samples/sec   Loss 2.6114   LearningRate 0.0559   Epoch: 5   Global Step: 84150   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:08:52,332-Speed 3315.25 samples/sec   Loss 2.6346   LearningRate 0.0559   Epoch: 5   Global Step: 84160   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:08:55,396-Speed 3343.40 samples/sec   Loss 2.6324   LearningRate 0.0559   Epoch: 5   Global Step: 84170   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:08:58,471-Speed 3330.69 samples/sec   Loss 2.5416   LearningRate 0.0559   Epoch: 5   Global Step: 84180   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:09:01,553-Speed 3323.06 samples/sec   Loss 2.6079   LearningRate 0.0559   Epoch: 5   Global Step: 84190   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:09:04,667-Speed 3289.67 samples/sec   Loss 2.6542   LearningRate 0.0559   Epoch: 5   Global Step: 84200   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:09:07,825-Speed 3243.62 samples/sec   Loss 2.6061   LearningRate 0.0559   Epoch: 5   Global Step: 84210   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:09:11,029-Speed 3196.39 samples/sec   Loss 2.5776   LearningRate 0.0559   Epoch: 5   Global Step: 84220   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:09:14,110-Speed 3324.50 samples/sec   Loss 2.6181   LearningRate 0.0559   Epoch: 5   Global Step: 84230   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:09:17,253-Speed 3259.39 samples/sec   Loss 2.5896   LearningRate 0.0559   Epoch: 5   Global Step: 84240   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:09:20,377-Speed 3278.01 samples/sec   Loss 2.5984   LearningRate 0.0559   Epoch: 5   Global Step: 84250   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:09:23,463-Speed 3320.15 samples/sec   Loss 2.6011   LearningRate 0.0559   Epoch: 5   Global Step: 84260   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:09:26,529-Speed 3341.07 samples/sec   Loss 2.6554   LearningRate 0.0559   Epoch: 5   Global Step: 84270   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:09:29,637-Speed 3295.67 samples/sec   Loss 2.6073   LearningRate 0.0559   Epoch: 5   Global Step: 84280   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:09:32,763-Speed 3276.19 samples/sec   Loss 2.6325   LearningRate 0.0559   Epoch: 5   Global Step: 84290   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:09:35,858-Speed 3309.50 samples/sec   Loss 2.6363   LearningRate 0.0559   Epoch: 5   Global Step: 84300   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:09:38,939-Speed 3325.14 samples/sec   Loss 2.6505   LearningRate 0.0559   Epoch: 5   Global Step: 84310   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:09:42,016-Speed 3327.98 samples/sec   Loss 2.6146   LearningRate 0.0559   Epoch: 5   Global Step: 84320   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:09:45,087-Speed 3335.29 samples/sec   Loss 2.6198   LearningRate 0.0559   Epoch: 5   Global Step: 84330   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 08:09:48,158-Speed 3335.39 samples/sec   Loss 2.5594   LearningRate 0.0559   Epoch: 5   Global Step: 84340   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 08:09:51,224-Speed 3340.94 samples/sec   Loss 2.6018   LearningRate 0.0558   Epoch: 5   Global Step: 84350   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 08:09:54,306-Speed 3323.26 samples/sec   Loss 2.6221   LearningRate 0.0558   Epoch: 5   Global Step: 84360   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 08:09:57,432-Speed 3276.19 samples/sec   Loss 2.6350   LearningRate 0.0558   Epoch: 5   Global Step: 84370   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 08:10:00,543-Speed 3293.40 samples/sec   Loss 2.6004   LearningRate 0.0558   Epoch: 5   Global Step: 84380   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 08:10:03,611-Speed 3338.02 samples/sec   Loss 2.6581   LearningRate 0.0558   Epoch: 5   Global Step: 84390   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 08:10:06,727-Speed 3286.87 samples/sec   Loss 2.6110   LearningRate 0.0558   Epoch: 5   Global Step: 84400   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 08:10:09,935-Speed 3192.64 samples/sec   Loss 2.5613   LearningRate 0.0558   Epoch: 5   Global Step: 84410   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 08:10:12,993-Speed 3349.62 samples/sec   Loss 2.5955   LearningRate 0.0558   Epoch: 5   Global Step: 84420   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 08:10:16,056-Speed 3343.53 samples/sec   Loss 2.6568   LearningRate 0.0558   Epoch: 5   Global Step: 84430   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:10:19,113-Speed 3351.04 samples/sec   Loss 2.6218   LearningRate 0.0558   Epoch: 5   Global Step: 84440   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:10:22,176-Speed 3343.87 samples/sec   Loss 2.6866   LearningRate 0.0558   Epoch: 5   Global Step: 84450   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:10:25,225-Speed 3358.80 samples/sec   Loss 2.6624   LearningRate 0.0558   Epoch: 5   Global Step: 84460   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 08:10:28,283-Speed 3350.06 samples/sec   Loss 2.6740   LearningRate 0.0558   Epoch: 5   Global Step: 84470   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 08:10:31,376-Speed 3311.48 samples/sec   Loss 2.6195   LearningRate 0.0558   Epoch: 5   Global Step: 84480   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 08:10:34,469-Speed 3311.18 samples/sec   Loss 2.6453   LearningRate 0.0558   Epoch: 5   Global Step: 84490   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 08:10:37,535-Speed 3340.64 samples/sec   Loss 2.6431   LearningRate 0.0558   Epoch: 5   Global Step: 84500   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 08:10:40,715-Speed 3220.76 samples/sec   Loss 2.7415   LearningRate 0.0558   Epoch: 5   Global Step: 84510   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 08:10:43,904-Speed 3212.64 samples/sec   Loss 2.6514   LearningRate 0.0558   Epoch: 5   Global Step: 84520   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 08:10:47,056-Speed 3248.86 samples/sec   Loss 2.6382   LearningRate 0.0558   Epoch: 5   Global Step: 84530   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 08:10:50,119-Speed 3344.23 samples/sec   Loss 2.5716   LearningRate 0.0558   Epoch: 5   Global Step: 84540   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 08:10:53,181-Speed 3345.21 samples/sec   Loss 2.7139   LearningRate 0.0558   Epoch: 5   Global Step: 84550   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 08:10:56,242-Speed 3346.09 samples/sec   Loss 2.6186   LearningRate 0.0558   Epoch: 5   Global Step: 84560   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:10:59,300-Speed 3349.77 samples/sec   Loss 2.7165   LearningRate 0.0558   Epoch: 5   Global Step: 84570   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:11:02,355-Speed 3352.80 samples/sec   Loss 2.5838   LearningRate 0.0557   Epoch: 5   Global Step: 84580   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:11:05,435-Speed 3325.53 samples/sec   Loss 2.6519   LearningRate 0.0557   Epoch: 5   Global Step: 84590   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:11:08,540-Speed 3299.26 samples/sec   Loss 2.6974   LearningRate 0.0557   Epoch: 5   Global Step: 84600   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:11:11,595-Speed 3352.42 samples/sec   Loss 2.6814   LearningRate 0.0557   Epoch: 5   Global Step: 84610   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:11:14,659-Speed 3342.02 samples/sec   Loss 2.6691   LearningRate 0.0557   Epoch: 5   Global Step: 84620   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:11:18,653-Speed 2564.69 samples/sec   Loss 2.6750   LearningRate 0.0557   Epoch: 5   Global Step: 84630   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:11:21,713-Speed 3347.02 samples/sec   Loss 2.7042   LearningRate 0.0557   Epoch: 5   Global Step: 84640   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:11:24,775-Speed 3344.75 samples/sec   Loss 2.6655   LearningRate 0.0557   Epoch: 5   Global Step: 84650   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:11:27,874-Speed 3306.17 samples/sec   Loss 2.5471   LearningRate 0.0557   Epoch: 5   Global Step: 84660   Fp16 Grad Scale: 262144   Required: 26 hours
Training: 2022-04-11 08:11:31,000-Speed 3278.30 samples/sec   Loss 2.6292   LearningRate 0.0557   Epoch: 5   Global Step: 84670   Fp16 Grad Scale: 262144   Required: 26 hours
Training: 2022-04-11 08:11:34,175-Speed 3226.30 samples/sec   Loss 2.6143   LearningRate 0.0557   Epoch: 5   Global Step: 84680   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:11:37,309-Speed 3268.63 samples/sec   Loss 2.6635   LearningRate 0.0557   Epoch: 5   Global Step: 84690   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:11:40,444-Speed 3267.74 samples/sec   Loss 2.6324   LearningRate 0.0557   Epoch: 5   Global Step: 84700   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:11:43,554-Speed 3293.47 samples/sec   Loss 2.6342   LearningRate 0.0557   Epoch: 5   Global Step: 84710   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:11:46,616-Speed 3345.04 samples/sec   Loss 2.6535   LearningRate 0.0557   Epoch: 5   Global Step: 84720   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:11:49,741-Speed 3278.17 samples/sec   Loss 2.7321   LearningRate 0.0557   Epoch: 5   Global Step: 84730   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:11:52,798-Speed 3351.47 samples/sec   Loss 2.7836   LearningRate 0.0557   Epoch: 5   Global Step: 84740   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:11:55,878-Speed 3325.53 samples/sec   Loss 2.6411   LearningRate 0.0557   Epoch: 5   Global Step: 84750   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:11:59,068-Speed 3211.15 samples/sec   Loss 2.7020   LearningRate 0.0557   Epoch: 5   Global Step: 84760   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:12:02,131-Speed 3345.04 samples/sec   Loss 2.6971   LearningRate 0.0557   Epoch: 5   Global Step: 84770   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:12:05,167-Speed 3373.57 samples/sec   Loss 2.7124   LearningRate 0.0557   Epoch: 5   Global Step: 84780   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 08:12:08,267-Speed 3304.57 samples/sec   Loss 2.6815   LearningRate 0.0557   Epoch: 5   Global Step: 84790   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 08:12:11,376-Speed 3294.69 samples/sec   Loss 2.6728   LearningRate 0.0556   Epoch: 5   Global Step: 84800   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 08:12:14,470-Speed 3311.10 samples/sec   Loss 2.6832   LearningRate 0.0556   Epoch: 5   Global Step: 84810   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 08:12:17,546-Speed 3330.55 samples/sec   Loss 2.6893   LearningRate 0.0556   Epoch: 5   Global Step: 84820   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 08:12:20,666-Speed 3282.84 samples/sec   Loss 2.7039   LearningRate 0.0556   Epoch: 5   Global Step: 84830   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 08:12:23,890-Speed 3177.13 samples/sec   Loss 2.6455   LearningRate 0.0556   Epoch: 5   Global Step: 84840   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 08:12:26,988-Speed 3306.73 samples/sec   Loss 2.7144   LearningRate 0.0556   Epoch: 5   Global Step: 84850   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 08:12:30,057-Speed 3337.29 samples/sec   Loss 2.6966   LearningRate 0.0556   Epoch: 5   Global Step: 84860   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 08:12:33,191-Speed 3269.38 samples/sec   Loss 2.7138   LearningRate 0.0556   Epoch: 5   Global Step: 84870   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 08:12:36,291-Speed 3303.21 samples/sec   Loss 2.7152   LearningRate 0.0556   Epoch: 5   Global Step: 84880   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:12:39,437-Speed 3256.67 samples/sec   Loss 2.6850   LearningRate 0.0556   Epoch: 5   Global Step: 84890   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:12:42,570-Speed 3269.50 samples/sec   Loss 2.7636   LearningRate 0.0556   Epoch: 5   Global Step: 84900   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:12:45,632-Speed 3345.44 samples/sec   Loss 2.6034   LearningRate 0.0556   Epoch: 5   Global Step: 84910   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 08:12:48,744-Speed 3291.50 samples/sec   Loss 2.7109   LearningRate 0.0556   Epoch: 5   Global Step: 84920   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 08:12:51,839-Speed 3310.58 samples/sec   Loss 2.7258   LearningRate 0.0556   Epoch: 5   Global Step: 84930   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 08:12:54,935-Speed 3307.73 samples/sec   Loss 2.6931   LearningRate 0.0556   Epoch: 5   Global Step: 84940   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 08:12:58,026-Speed 3314.80 samples/sec   Loss 2.7243   LearningRate 0.0556   Epoch: 5   Global Step: 84950   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 08:13:01,186-Speed 3242.05 samples/sec   Loss 2.6895   LearningRate 0.0556   Epoch: 5   Global Step: 84960   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 08:13:04,274-Speed 3316.35 samples/sec   Loss 2.7207   LearningRate 0.0556   Epoch: 5   Global Step: 84970   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 08:13:07,347-Speed 3333.94 samples/sec   Loss 2.7949   LearningRate 0.0556   Epoch: 5   Global Step: 84980   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 08:13:10,432-Speed 3320.67 samples/sec   Loss 2.6839   LearningRate 0.0556   Epoch: 5   Global Step: 84990   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 08:13:13,497-Speed 3342.50 samples/sec   Loss 2.7870   LearningRate 0.0556   Epoch: 5   Global Step: 85000   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 08:13:16,558-Speed 3345.77 samples/sec   Loss 2.7082   LearningRate 0.0556   Epoch: 5   Global Step: 85010   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:13:19,674-Speed 3287.52 samples/sec   Loss 2.7022   LearningRate 0.0555   Epoch: 5   Global Step: 85020   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:13:22,756-Speed 3324.41 samples/sec   Loss 2.7099   LearningRate 0.0555   Epoch: 5   Global Step: 85030   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:13:25,862-Speed 3298.14 samples/sec   Loss 2.7089   LearningRate 0.0555   Epoch: 5   Global Step: 85040   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:13:29,070-Speed 3192.84 samples/sec   Loss 2.7207   LearningRate 0.0555   Epoch: 5   Global Step: 85050   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:13:32,208-Speed 3264.80 samples/sec   Loss 2.7262   LearningRate 0.0555   Epoch: 5   Global Step: 85060   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:13:35,292-Speed 3321.84 samples/sec   Loss 2.7101   LearningRate 0.0555   Epoch: 5   Global Step: 85070   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:13:38,383-Speed 3315.33 samples/sec   Loss 2.7406   LearningRate 0.0555   Epoch: 5   Global Step: 85080   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:13:41,550-Speed 3233.93 samples/sec   Loss 2.6170   LearningRate 0.0555   Epoch: 5   Global Step: 85090   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:13:44,627-Speed 3329.62 samples/sec   Loss 2.7189   LearningRate 0.0555   Epoch: 5   Global Step: 85100   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:13:47,806-Speed 3222.76 samples/sec   Loss 2.6757   LearningRate 0.0555   Epoch: 5   Global Step: 85110   Fp16 Grad Scale: 262144   Required: 26 hours
Training: 2022-04-11 08:13:50,927-Speed 3281.83 samples/sec   Loss 2.7672   LearningRate 0.0555   Epoch: 5   Global Step: 85120   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:13:54,089-Speed 3239.53 samples/sec   Loss 2.8677   LearningRate 0.0555   Epoch: 5   Global Step: 85130   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:13:57,229-Speed 3261.94 samples/sec   Loss 2.7471   LearningRate 0.0555   Epoch: 5   Global Step: 85140   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:14:00,327-Speed 3306.52 samples/sec   Loss 2.6947   LearningRate 0.0555   Epoch: 5   Global Step: 85150   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:14:03,482-Speed 3247.12 samples/sec   Loss 2.7554   LearningRate 0.0555   Epoch: 5   Global Step: 85160   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:14:06,600-Speed 3285.26 samples/sec   Loss 2.7356   LearningRate 0.0555   Epoch: 5   Global Step: 85170   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:14:09,743-Speed 3260.23 samples/sec   Loss 2.6858   LearningRate 0.0555   Epoch: 5   Global Step: 85180   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:14:12,865-Speed 3280.53 samples/sec   Loss 2.7022   LearningRate 0.0555   Epoch: 5   Global Step: 85190   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:14:15,984-Speed 3283.95 samples/sec   Loss 2.6756   LearningRate 0.0555   Epoch: 5   Global Step: 85200   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:14:19,127-Speed 3259.68 samples/sec   Loss 2.7824   LearningRate 0.0555   Epoch: 5   Global Step: 85210   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:14:22,297-Speed 3231.97 samples/sec   Loss 2.7541   LearningRate 0.0555   Epoch: 5   Global Step: 85220   Fp16 Grad Scale: 262144   Required: 26 hours
Training: 2022-04-11 08:14:25,443-Speed 3255.97 samples/sec   Loss 2.7145   LearningRate 0.0555   Epoch: 5   Global Step: 85230   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:14:28,672-Speed 3172.40 samples/sec   Loss 2.6604   LearningRate 0.0555   Epoch: 5   Global Step: 85240   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:14:31,828-Speed 3246.00 samples/sec   Loss 2.6677   LearningRate 0.0554   Epoch: 5   Global Step: 85250   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:14:34,892-Speed 3343.44 samples/sec   Loss 2.6812   LearningRate 0.0554   Epoch: 5   Global Step: 85260   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:14:37,989-Speed 3306.99 samples/sec   Loss 2.7567   LearningRate 0.0554   Epoch: 5   Global Step: 85270   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:14:41,093-Speed 3300.41 samples/sec   Loss 2.8013   LearningRate 0.0554   Epoch: 5   Global Step: 85280   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:14:44,178-Speed 3320.41 samples/sec   Loss 2.7523   LearningRate 0.0554   Epoch: 5   Global Step: 85290   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:14:47,277-Speed 3305.47 samples/sec   Loss 2.7506   LearningRate 0.0554   Epoch: 5   Global Step: 85300   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:14:50,393-Speed 3286.84 samples/sec   Loss 2.7574   LearningRate 0.0554   Epoch: 5   Global Step: 85310   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:14:53,522-Speed 3273.46 samples/sec   Loss 2.6798   LearningRate 0.0554   Epoch: 5   Global Step: 85320   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:14:56,656-Speed 3267.88 samples/sec   Loss 2.7044   LearningRate 0.0554   Epoch: 5   Global Step: 85330   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:14:59,751-Speed 3310.09 samples/sec   Loss 2.7265   LearningRate 0.0554   Epoch: 5   Global Step: 85340   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:15:02,870-Speed 3283.98 samples/sec   Loss 2.7213   LearningRate 0.0554   Epoch: 5   Global Step: 85350   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:15:06,008-Speed 3264.33 samples/sec   Loss 2.7824   LearningRate 0.0554   Epoch: 5   Global Step: 85360   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:15:09,105-Speed 3307.17 samples/sec   Loss 2.7216   LearningRate 0.0554   Epoch: 5   Global Step: 85370   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:15:12,189-Speed 3322.72 samples/sec   Loss 2.6988   LearningRate 0.0554   Epoch: 5   Global Step: 85380   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:15:15,378-Speed 3211.40 samples/sec   Loss 2.7091   LearningRate 0.0554   Epoch: 5   Global Step: 85390   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:15:18,471-Speed 3311.88 samples/sec   Loss 2.6847   LearningRate 0.0554   Epoch: 5   Global Step: 85400   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:15:21,542-Speed 3335.96 samples/sec   Loss 2.7059   LearningRate 0.0554   Epoch: 5   Global Step: 85410   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:15:24,631-Speed 3316.09 samples/sec   Loss 2.7352   LearningRate 0.0554   Epoch: 5   Global Step: 85420   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:15:27,687-Speed 3352.42 samples/sec   Loss 2.7384   LearningRate 0.0554   Epoch: 5   Global Step: 85430   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:15:30,825-Speed 3265.30 samples/sec   Loss 2.7079   LearningRate 0.0554   Epoch: 5   Global Step: 85440   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:15:33,940-Speed 3287.83 samples/sec   Loss 2.6945   LearningRate 0.0554   Epoch: 5   Global Step: 85450   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:15:37,040-Speed 3304.60 samples/sec   Loss 2.7528   LearningRate 0.0554   Epoch: 5   Global Step: 85460   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:15:40,117-Speed 3329.06 samples/sec   Loss 2.7516   LearningRate 0.0553   Epoch: 5   Global Step: 85470   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:15:43,182-Speed 3342.14 samples/sec   Loss 2.7088   LearningRate 0.0553   Epoch: 5   Global Step: 85480   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:15:46,306-Speed 3278.78 samples/sec   Loss 2.7541   LearningRate 0.0553   Epoch: 5   Global Step: 85490   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:15:49,409-Speed 3301.46 samples/sec   Loss 2.6730   LearningRate 0.0553   Epoch: 5   Global Step: 85500   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:15:52,498-Speed 3316.97 samples/sec   Loss 2.7159   LearningRate 0.0553   Epoch: 5   Global Step: 85510   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 08:15:55,671-Speed 3227.71 samples/sec   Loss 2.7101   LearningRate 0.0553   Epoch: 5   Global Step: 85520   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 08:15:58,848-Speed 3224.93 samples/sec   Loss 2.7429   LearningRate 0.0553   Epoch: 5   Global Step: 85530   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 08:16:01,987-Speed 3262.91 samples/sec   Loss 2.7237   LearningRate 0.0553   Epoch: 5   Global Step: 85540   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 08:16:05,092-Speed 3298.56 samples/sec   Loss 2.6678   LearningRate 0.0553   Epoch: 5   Global Step: 85550   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 08:16:08,178-Speed 3319.32 samples/sec   Loss 2.7867   LearningRate 0.0553   Epoch: 5   Global Step: 85560   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 08:16:11,246-Speed 3339.28 samples/sec   Loss 2.7317   LearningRate 0.0553   Epoch: 5   Global Step: 85570   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 08:16:14,381-Speed 3267.38 samples/sec   Loss 2.7464   LearningRate 0.0553   Epoch: 5   Global Step: 85580   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-04-11 08:16:17,455-Speed 3331.47 samples/sec   Loss 2.7278   LearningRate 0.0553   Epoch: 5   Global Step: 85590   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-04-11 08:16:20,539-Speed 3321.76 samples/sec   Loss 2.7368   LearningRate 0.0553   Epoch: 5   Global Step: 85600   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-04-11 08:16:23,614-Speed 3331.13 samples/sec   Loss 2.7736   LearningRate 0.0553   Epoch: 5   Global Step: 85610   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-04-11 08:16:26,774-Speed 3240.78 samples/sec   Loss 2.7418   LearningRate 0.0553   Epoch: 5   Global Step: 85620   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-04-11 08:16:29,890-Speed 3287.56 samples/sec   Loss 2.7411   LearningRate 0.0553   Epoch: 5   Global Step: 85630   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-04-11 08:16:32,952-Speed 3345.29 samples/sec   Loss 2.6735   LearningRate 0.0553   Epoch: 5   Global Step: 85640   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-04-11 08:16:36,066-Speed 3289.72 samples/sec   Loss 2.7230   LearningRate 0.0553   Epoch: 5   Global Step: 85650   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-04-11 08:16:39,188-Speed 3280.60 samples/sec   Loss 2.7393   LearningRate 0.0553   Epoch: 5   Global Step: 85660   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-04-11 08:16:42,290-Speed 3301.23 samples/sec   Loss 2.7630   LearningRate 0.0553   Epoch: 5   Global Step: 85670   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-04-11 08:16:45,403-Speed 3290.84 samples/sec   Loss 2.7639   LearningRate 0.0553   Epoch: 5   Global Step: 85680   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 08:16:48,521-Speed 3285.56 samples/sec   Loss 2.7926   LearningRate 0.0553   Epoch: 5   Global Step: 85690   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 08:16:51,704-Speed 3217.25 samples/sec   Loss 2.7880   LearningRate 0.0552   Epoch: 5   Global Step: 85700   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 08:16:54,869-Speed 3236.24 samples/sec   Loss 2.7459   LearningRate 0.0552   Epoch: 5   Global Step: 85710   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 08:16:58,032-Speed 3239.06 samples/sec   Loss 2.8350   LearningRate 0.0552   Epoch: 5   Global Step: 85720   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 08:17:01,090-Speed 3349.17 samples/sec   Loss 2.7712   LearningRate 0.0552   Epoch: 5   Global Step: 85730   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 08:17:04,155-Speed 3340.91 samples/sec   Loss 2.7429   LearningRate 0.0552   Epoch: 5   Global Step: 85740   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 08:17:07,224-Speed 3338.18 samples/sec   Loss 2.7441   LearningRate 0.0552   Epoch: 5   Global Step: 85750   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 08:17:10,306-Speed 3322.98 samples/sec   Loss 2.7997   LearningRate 0.0552   Epoch: 5   Global Step: 85760   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 08:17:13,407-Speed 3303.15 samples/sec   Loss 2.7436   LearningRate 0.0552   Epoch: 5   Global Step: 85770   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 08:17:16,465-Speed 3349.88 samples/sec   Loss 2.6409   LearningRate 0.0552   Epoch: 5   Global Step: 85780   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 08:17:19,551-Speed 3319.42 samples/sec   Loss 2.7951   LearningRate 0.0552   Epoch: 5   Global Step: 85790   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 08:17:22,623-Speed 3334.28 samples/sec   Loss 2.7143   LearningRate 0.0552   Epoch: 5   Global Step: 85800   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 08:17:25,688-Speed 3341.41 samples/sec   Loss 2.7264   LearningRate 0.0552   Epoch: 5   Global Step: 85810   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 08:17:28,788-Speed 3304.72 samples/sec   Loss 2.7524   LearningRate 0.0552   Epoch: 5   Global Step: 85820   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 08:17:31,856-Speed 3338.06 samples/sec   Loss 2.7909   LearningRate 0.0552   Epoch: 5   Global Step: 85830   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 08:17:34,926-Speed 3337.06 samples/sec   Loss 2.7874   LearningRate 0.0552   Epoch: 5   Global Step: 85840   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 08:17:38,096-Speed 3230.24 samples/sec   Loss 2.8486   LearningRate 0.0552   Epoch: 5   Global Step: 85850   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 08:17:41,254-Speed 3243.85 samples/sec   Loss 2.7685   LearningRate 0.0552   Epoch: 5   Global Step: 85860   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 08:17:44,434-Speed 3220.80 samples/sec   Loss 2.7658   LearningRate 0.0552   Epoch: 5   Global Step: 85870   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 08:17:47,596-Speed 3239.26 samples/sec   Loss 2.7246   LearningRate 0.0552   Epoch: 5   Global Step: 85880   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:17:50,659-Speed 3344.27 samples/sec   Loss 2.7072   LearningRate 0.0552   Epoch: 5   Global Step: 85890   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:17:53,825-Speed 3235.57 samples/sec   Loss 2.6839   LearningRate 0.0552   Epoch: 5   Global Step: 85900   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:17:56,932-Speed 3296.51 samples/sec   Loss 2.7181   LearningRate 0.0552   Epoch: 5   Global Step: 85910   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:18:00,005-Speed 3332.68 samples/sec   Loss 2.7961   LearningRate 0.0551   Epoch: 5   Global Step: 85920   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:18:03,110-Speed 3298.82 samples/sec   Loss 2.7563   LearningRate 0.0551   Epoch: 5   Global Step: 85930   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:18:06,181-Speed 3336.25 samples/sec   Loss 2.6914   LearningRate 0.0551   Epoch: 5   Global Step: 85940   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:18:09,244-Speed 3343.87 samples/sec   Loss 2.7179   LearningRate 0.0551   Epoch: 5   Global Step: 85950   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:18:12,330-Speed 3318.45 samples/sec   Loss 2.8368   LearningRate 0.0551   Epoch: 5   Global Step: 85960   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:18:15,392-Speed 3345.05 samples/sec   Loss 2.7821   LearningRate 0.0551   Epoch: 5   Global Step: 85970   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:18:18,453-Speed 3346.17 samples/sec   Loss 2.7086   LearningRate 0.0551   Epoch: 5   Global Step: 85980   Fp16 Grad Scale: 262144   Required: 26 hours
Training: 2022-04-11 08:18:21,504-Speed 3357.04 samples/sec   Loss 2.7399   LearningRate 0.0551   Epoch: 5   Global Step: 85990   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:18:24,667-Speed 3239.55 samples/sec   Loss 2.7749   LearningRate 0.0551   Epoch: 5   Global Step: 86000   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:19:09,866-[lfw][86000]XNorm: 22.121675
Training: 2022-04-11 08:19:09,866-[lfw][86000]Accuracy-Flip: 0.99817+-0.00273
Training: 2022-04-11 08:19:09,867-[lfw][86000]Accuracy-Highest: 0.99817
Training: 2022-04-11 08:20:01,431-[cfp_fp][86000]XNorm: 20.686443
Training: 2022-04-11 08:20:01,432-[cfp_fp][86000]Accuracy-Flip: 0.98471+-0.00491
Training: 2022-04-11 08:20:01,433-[cfp_fp][86000]Accuracy-Highest: 0.98557
Training: 2022-04-11 08:20:45,668-[agedb_30][86000]XNorm: 22.060804
Training: 2022-04-11 08:20:45,669-[agedb_30][86000]Accuracy-Flip: 0.97917+-0.00779
Training: 2022-04-11 08:20:45,669-[agedb_30][86000]Accuracy-Highest: 0.98167
Training: 2022-04-11 08:20:48,747-Speed 71.07 samples/sec   Loss 2.7539   LearningRate 0.0551   Epoch: 5   Global Step: 86010   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:20:51,802-Speed 3352.41 samples/sec   Loss 2.7873   LearningRate 0.0551   Epoch: 5   Global Step: 86020   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:20:54,855-Speed 3355.72 samples/sec   Loss 2.8030   LearningRate 0.0551   Epoch: 5   Global Step: 86030   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:20:57,963-Speed 3296.07 samples/sec   Loss 2.8711   LearningRate 0.0551   Epoch: 5   Global Step: 86040   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:21:01,068-Speed 3298.29 samples/sec   Loss 2.8131   LearningRate 0.0551   Epoch: 5   Global Step: 86050   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:21:04,113-Speed 3363.97 samples/sec   Loss 2.7641   LearningRate 0.0551   Epoch: 5   Global Step: 86060   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 08:21:07,190-Speed 3328.24 samples/sec   Loss 2.7497   LearningRate 0.0551   Epoch: 5   Global Step: 86070   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 08:21:10,254-Speed 3344.10 samples/sec   Loss 2.7669   LearningRate 0.0551   Epoch: 5   Global Step: 86080   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 08:21:13,325-Speed 3335.31 samples/sec   Loss 2.7119   LearningRate 0.0551   Epoch: 5   Global Step: 86090   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 08:21:16,490-Speed 3235.57 samples/sec   Loss 2.7545   LearningRate 0.0551   Epoch: 5   Global Step: 86100   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 08:21:19,558-Speed 3338.56 samples/sec   Loss 2.8207   LearningRate 0.0551   Epoch: 5   Global Step: 86110   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 08:21:22,688-Speed 3272.07 samples/sec   Loss 2.7246   LearningRate 0.0551   Epoch: 5   Global Step: 86120   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 08:21:25,773-Speed 3319.87 samples/sec   Loss 2.8466   LearningRate 0.0551   Epoch: 5   Global Step: 86130   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 08:21:28,851-Speed 3328.89 samples/sec   Loss 2.7357   LearningRate 0.0550   Epoch: 5   Global Step: 86140   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 08:21:31,924-Speed 3333.54 samples/sec   Loss 2.7613   LearningRate 0.0550   Epoch: 5   Global Step: 86150   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 08:21:34,994-Speed 3335.76 samples/sec   Loss 2.7612   LearningRate 0.0550   Epoch: 5   Global Step: 86160   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:21:38,063-Speed 3337.15 samples/sec   Loss 2.7020   LearningRate 0.0550   Epoch: 5   Global Step: 86170   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:21:41,123-Speed 3347.82 samples/sec   Loss 2.7450   LearningRate 0.0550   Epoch: 5   Global Step: 86180   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 08:21:44,197-Speed 3332.26 samples/sec   Loss 2.8324   LearningRate 0.0550   Epoch: 5   Global Step: 86190   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 08:21:47,294-Speed 3307.08 samples/sec   Loss 2.7143   LearningRate 0.0550   Epoch: 5   Global Step: 86200   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 08:21:50,371-Speed 3328.50 samples/sec   Loss 2.7317   LearningRate 0.0550   Epoch: 5   Global Step: 86210   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 08:21:53,470-Speed 3305.12 samples/sec   Loss 2.7402   LearningRate 0.0550   Epoch: 5   Global Step: 86220   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 08:21:56,579-Speed 3304.13 samples/sec   Loss 2.7670   LearningRate 0.0550   Epoch: 5   Global Step: 86230   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 08:21:59,656-Speed 3328.22 samples/sec   Loss 2.8559   LearningRate 0.0550   Epoch: 5   Global Step: 86240   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 08:22:02,741-Speed 3320.60 samples/sec   Loss 2.8103   LearningRate 0.0550   Epoch: 5   Global Step: 86250   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 08:22:05,867-Speed 3276.15 samples/sec   Loss 2.7998   LearningRate 0.0550   Epoch: 5   Global Step: 86260   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 08:22:08,947-Speed 3327.50 samples/sec   Loss 2.7463   LearningRate 0.0550   Epoch: 5   Global Step: 86270   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 08:22:12,013-Speed 3341.03 samples/sec   Loss 2.7115   LearningRate 0.0550   Epoch: 5   Global Step: 86280   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:22:15,131-Speed 3285.71 samples/sec   Loss 2.7212   LearningRate 0.0550   Epoch: 5   Global Step: 86290   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:22:18,207-Speed 3330.52 samples/sec   Loss 2.8438   LearningRate 0.0550   Epoch: 5   Global Step: 86300   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:22:21,281-Speed 3331.69 samples/sec   Loss 2.8208   LearningRate 0.0550   Epoch: 5   Global Step: 86310   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:22:24,345-Speed 3342.58 samples/sec   Loss 2.8473   LearningRate 0.0550   Epoch: 5   Global Step: 86320   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 08:22:27,475-Speed 3342.00 samples/sec   Loss 2.8024   LearningRate 0.0550   Epoch: 5   Global Step: 86330   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 08:22:30,544-Speed 3336.92 samples/sec   Loss 2.7708   LearningRate 0.0550   Epoch: 5   Global Step: 86340   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 08:22:33,606-Speed 3345.45 samples/sec   Loss 2.8038   LearningRate 0.0550   Epoch: 5   Global Step: 86350   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 08:22:36,698-Speed 3313.05 samples/sec   Loss 2.9144   LearningRate 0.0550   Epoch: 5   Global Step: 86360   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 08:22:39,756-Speed 3350.68 samples/sec   Loss 2.8020   LearningRate 0.0549   Epoch: 5   Global Step: 86370   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 08:22:42,826-Speed 3337.74 samples/sec   Loss 2.7238   LearningRate 0.0549   Epoch: 5   Global Step: 86380   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 08:22:45,904-Speed 3338.99 samples/sec   Loss 2.7323   LearningRate 0.0549   Epoch: 5   Global Step: 86390   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 08:22:48,966-Speed 3344.59 samples/sec   Loss 2.7237   LearningRate 0.0549   Epoch: 5   Global Step: 86400   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 08:22:52,125-Speed 3243.24 samples/sec   Loss 2.7674   LearningRate 0.0549   Epoch: 5   Global Step: 86410   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 08:22:55,336-Speed 3189.28 samples/sec   Loss 2.9156   LearningRate 0.0549   Epoch: 5   Global Step: 86420   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:22:58,457-Speed 3288.80 samples/sec   Loss 2.7866   LearningRate 0.0549   Epoch: 5   Global Step: 86430   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:23:01,513-Speed 3352.26 samples/sec   Loss 2.7887   LearningRate 0.0549   Epoch: 5   Global Step: 86440   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 08:23:04,637-Speed 3279.80 samples/sec   Loss 2.7546   LearningRate 0.0549   Epoch: 5   Global Step: 86450   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 08:23:07,748-Speed 3291.88 samples/sec   Loss 2.7767   LearningRate 0.0549   Epoch: 5   Global Step: 86460   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 08:23:10,823-Speed 3343.69 samples/sec   Loss 2.8686   LearningRate 0.0549   Epoch: 5   Global Step: 86470   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 08:23:13,990-Speed 3234.92 samples/sec   Loss 2.7714   LearningRate 0.0549   Epoch: 5   Global Step: 86480   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 08:23:17,075-Speed 3326.51 samples/sec   Loss 2.8410   LearningRate 0.0549   Epoch: 5   Global Step: 86490   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 08:23:20,194-Speed 3284.24 samples/sec   Loss 2.7761   LearningRate 0.0549   Epoch: 5   Global Step: 86500   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 08:23:23,265-Speed 3334.73 samples/sec   Loss 2.7152   LearningRate 0.0549   Epoch: 5   Global Step: 86510   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 08:23:26,355-Speed 3315.91 samples/sec   Loss 2.7940   LearningRate 0.0549   Epoch: 5   Global Step: 86520   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 08:23:29,507-Speed 3267.86 samples/sec   Loss 2.7848   LearningRate 0.0549   Epoch: 5   Global Step: 86530   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 08:23:32,573-Speed 3341.18 samples/sec   Loss 2.8120   LearningRate 0.0549   Epoch: 5   Global Step: 86540   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:23:35,680-Speed 3296.52 samples/sec   Loss 2.7926   LearningRate 0.0549   Epoch: 5   Global Step: 86550   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:23:38,815-Speed 3275.99 samples/sec   Loss 2.7903   LearningRate 0.0549   Epoch: 5   Global Step: 86560   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:23:41,973-Speed 3243.52 samples/sec   Loss 2.7675   LearningRate 0.0549   Epoch: 5   Global Step: 86570   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:23:45,068-Speed 3324.13 samples/sec   Loss 2.6912   LearningRate 0.0549   Epoch: 5   Global Step: 86580   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:23:48,154-Speed 3319.58 samples/sec   Loss 2.8248   LearningRate 0.0549   Epoch: 5   Global Step: 86590   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:23:51,247-Speed 3311.74 samples/sec   Loss 2.8096   LearningRate 0.0548   Epoch: 5   Global Step: 86600   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:23:54,354-Speed 3297.79 samples/sec   Loss 2.8438   LearningRate 0.0548   Epoch: 5   Global Step: 86610   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:23:57,556-Speed 3199.12 samples/sec   Loss 2.8396   LearningRate 0.0548   Epoch: 5   Global Step: 86620   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:24:00,649-Speed 3312.23 samples/sec   Loss 2.7479   LearningRate 0.0548   Epoch: 5   Global Step: 86630   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:24:03,768-Speed 3284.23 samples/sec   Loss 2.8316   LearningRate 0.0548   Epoch: 5   Global Step: 86640   Fp16 Grad Scale: 262144   Required: 26 hours
Training: 2022-04-11 08:24:06,838-Speed 3336.41 samples/sec   Loss 2.7402   LearningRate 0.0548   Epoch: 5   Global Step: 86650   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:24:09,945-Speed 3297.54 samples/sec   Loss 2.8506   LearningRate 0.0548   Epoch: 5   Global Step: 86660   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:24:13,074-Speed 3274.30 samples/sec   Loss 2.7800   LearningRate 0.0548   Epoch: 5   Global Step: 86670   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:24:16,312-Speed 3164.52 samples/sec   Loss 2.7797   LearningRate 0.0548   Epoch: 5   Global Step: 86680   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:24:19,562-Speed 3151.49 samples/sec   Loss 2.8056   LearningRate 0.0548   Epoch: 5   Global Step: 86690   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:24:22,661-Speed 3304.91 samples/sec   Loss 2.8158   LearningRate 0.0548   Epoch: 5   Global Step: 86700   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:24:25,817-Speed 3245.68 samples/sec   Loss 2.8054   LearningRate 0.0548   Epoch: 5   Global Step: 86710   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:24:28,922-Speed 3299.34 samples/sec   Loss 2.7572   LearningRate 0.0548   Epoch: 5   Global Step: 86720   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:24:31,988-Speed 3341.60 samples/sec   Loss 2.7746   LearningRate 0.0548   Epoch: 5   Global Step: 86730   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:24:35,121-Speed 3269.03 samples/sec   Loss 2.7140   LearningRate 0.0548   Epoch: 5   Global Step: 86740   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:24:38,273-Speed 3251.61 samples/sec   Loss 2.7261   LearningRate 0.0548   Epoch: 5   Global Step: 86750   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:24:41,402-Speed 3274.26 samples/sec   Loss 2.7009   LearningRate 0.0548   Epoch: 5   Global Step: 86760   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:24:44,501-Speed 3305.78 samples/sec   Loss 2.7368   LearningRate 0.0548   Epoch: 5   Global Step: 86770   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:24:47,595-Speed 3310.50 samples/sec   Loss 2.8645   LearningRate 0.0548   Epoch: 5   Global Step: 86780   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:24:50,722-Speed 3275.72 samples/sec   Loss 2.8350   LearningRate 0.0548   Epoch: 5   Global Step: 86790   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:24:53,899-Speed 3224.95 samples/sec   Loss 2.8466   LearningRate 0.0548   Epoch: 5   Global Step: 86800   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:24:57,056-Speed 3245.25 samples/sec   Loss 2.8592   LearningRate 0.0548   Epoch: 5   Global Step: 86810   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:25:00,142-Speed 3319.06 samples/sec   Loss 2.8091   LearningRate 0.0547   Epoch: 5   Global Step: 86820   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:25:03,239-Speed 3307.79 samples/sec   Loss 2.8025   LearningRate 0.0547   Epoch: 5   Global Step: 86830   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:25:06,355-Speed 3287.16 samples/sec   Loss 2.8158   LearningRate 0.0547   Epoch: 5   Global Step: 86840   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:25:09,457-Speed 3302.12 samples/sec   Loss 2.8762   LearningRate 0.0547   Epoch: 5   Global Step: 86850   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:25:12,548-Speed 3314.84 samples/sec   Loss 2.7489   LearningRate 0.0547   Epoch: 5   Global Step: 86860   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 08:25:15,611-Speed 3343.59 samples/sec   Loss 2.8121   LearningRate 0.0547   Epoch: 5   Global Step: 86870   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 08:25:18,671-Speed 3347.99 samples/sec   Loss 2.7609   LearningRate 0.0547   Epoch: 5   Global Step: 86880   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 08:25:21,824-Speed 3249.04 samples/sec   Loss 2.8405   LearningRate 0.0547   Epoch: 5   Global Step: 86890   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 08:25:24,899-Speed 3330.90 samples/sec   Loss 2.7643   LearningRate 0.0547   Epoch: 5   Global Step: 86900   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 08:25:28,027-Speed 3274.52 samples/sec   Loss 2.7697   LearningRate 0.0547   Epoch: 5   Global Step: 86910   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 08:25:31,193-Speed 3235.85 samples/sec   Loss 2.8977   LearningRate 0.0547   Epoch: 5   Global Step: 86920   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 08:25:34,363-Speed 3231.26 samples/sec   Loss 2.8543   LearningRate 0.0547   Epoch: 5   Global Step: 86930   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 08:25:37,528-Speed 3237.60 samples/sec   Loss 2.8516   LearningRate 0.0547   Epoch: 5   Global Step: 86940   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 08:25:40,631-Speed 3301.13 samples/sec   Loss 2.7658   LearningRate 0.0547   Epoch: 5   Global Step: 86950   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 08:25:43,775-Speed 3258.30 samples/sec   Loss 2.8753   LearningRate 0.0547   Epoch: 5   Global Step: 86960   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:25:46,855-Speed 3326.79 samples/sec   Loss 2.8597   LearningRate 0.0547   Epoch: 5   Global Step: 86970   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:25:49,971-Speed 3287.62 samples/sec   Loss 2.8118   LearningRate 0.0547   Epoch: 5   Global Step: 86980   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:25:53,049-Speed 3327.13 samples/sec   Loss 2.8889   LearningRate 0.0547   Epoch: 5   Global Step: 86990   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:25:56,137-Speed 3317.62 samples/sec   Loss 2.8952   LearningRate 0.0547   Epoch: 5   Global Step: 87000   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:25:59,212-Speed 3331.07 samples/sec   Loss 2.8371   LearningRate 0.0547   Epoch: 5   Global Step: 87010   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:26:02,358-Speed 3255.86 samples/sec   Loss 2.8163   LearningRate 0.0547   Epoch: 5   Global Step: 87020   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:26:05,442-Speed 3321.90 samples/sec   Loss 2.8093   LearningRate 0.0547   Epoch: 5   Global Step: 87030   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:26:08,595-Speed 3249.39 samples/sec   Loss 2.8769   LearningRate 0.0547   Epoch: 5   Global Step: 87040   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:26:11,663-Speed 3338.47 samples/sec   Loss 2.8423   LearningRate 0.0546   Epoch: 5   Global Step: 87050   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:26:14,752-Speed 3316.93 samples/sec   Loss 2.7594   LearningRate 0.0546   Epoch: 5   Global Step: 87060   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:26:17,904-Speed 3250.01 samples/sec   Loss 2.8919   LearningRate 0.0546   Epoch: 5   Global Step: 87070   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:26:21,092-Speed 3213.40 samples/sec   Loss 2.8226   LearningRate 0.0546   Epoch: 5   Global Step: 87080   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:26:24,204-Speed 3291.62 samples/sec   Loss 2.8694   LearningRate 0.0546   Epoch: 5   Global Step: 87090   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:26:27,306-Speed 3302.35 samples/sec   Loss 2.8106   LearningRate 0.0546   Epoch: 5   Global Step: 87100   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:26:30,386-Speed 3325.82 samples/sec   Loss 2.7901   LearningRate 0.0546   Epoch: 5   Global Step: 87110   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:26:33,567-Speed 3220.08 samples/sec   Loss 2.9263   LearningRate 0.0546   Epoch: 5   Global Step: 87120   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:26:36,649-Speed 3323.36 samples/sec   Loss 2.8110   LearningRate 0.0546   Epoch: 5   Global Step: 87130   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:26:39,762-Speed 3290.07 samples/sec   Loss 2.8113   LearningRate 0.0546   Epoch: 5   Global Step: 87140   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:26:42,829-Speed 3340.57 samples/sec   Loss 2.8173   LearningRate 0.0546   Epoch: 5   Global Step: 87150   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:26:45,956-Speed 3275.81 samples/sec   Loss 2.7951   LearningRate 0.0546   Epoch: 5   Global Step: 87160   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:26:49,066-Speed 3293.61 samples/sec   Loss 2.8966   LearningRate 0.0546   Epoch: 5   Global Step: 87170   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:26:52,180-Speed 3290.67 samples/sec   Loss 2.8370   LearningRate 0.0546   Epoch: 5   Global Step: 87180   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:26:55,258-Speed 3327.78 samples/sec   Loss 2.8378   LearningRate 0.0546   Epoch: 5   Global Step: 87190   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 08:26:58,383-Speed 3277.26 samples/sec   Loss 2.8292   LearningRate 0.0546   Epoch: 5   Global Step: 87200   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 08:27:01,463-Speed 3327.07 samples/sec   Loss 2.8653   LearningRate 0.0546   Epoch: 5   Global Step: 87210   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 08:27:04,608-Speed 3256.73 samples/sec   Loss 2.8016   LearningRate 0.0546   Epoch: 5   Global Step: 87220   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 08:27:07,734-Speed 3277.08 samples/sec   Loss 2.7623   LearningRate 0.0546   Epoch: 5   Global Step: 87230   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 08:27:10,814-Speed 3326.32 samples/sec   Loss 2.8129   LearningRate 0.0546   Epoch: 5   Global Step: 87240   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 08:27:13,894-Speed 3326.64 samples/sec   Loss 2.8459   LearningRate 0.0546   Epoch: 5   Global Step: 87250   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 08:27:17,007-Speed 3290.66 samples/sec   Loss 2.8088   LearningRate 0.0546   Epoch: 5   Global Step: 87260   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 08:27:20,148-Speed 3260.90 samples/sec   Loss 2.8670   LearningRate 0.0545   Epoch: 5   Global Step: 87270   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 08:27:23,224-Speed 3331.22 samples/sec   Loss 2.8779   LearningRate 0.0545   Epoch: 5   Global Step: 87280   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 08:27:26,379-Speed 3247.41 samples/sec   Loss 2.8452   LearningRate 0.0545   Epoch: 5   Global Step: 87290   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:27:29,443-Speed 3342.28 samples/sec   Loss 2.7922   LearningRate 0.0545   Epoch: 5   Global Step: 87300   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:27:32,539-Speed 3309.87 samples/sec   Loss 2.8058   LearningRate 0.0545   Epoch: 5   Global Step: 87310   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:27:35,594-Speed 3352.15 samples/sec   Loss 2.8475   LearningRate 0.0545   Epoch: 5   Global Step: 87320   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 08:27:38,786-Speed 3209.97 samples/sec   Loss 2.7663   LearningRate 0.0545   Epoch: 5   Global Step: 87330   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 08:27:41,927-Speed 3261.65 samples/sec   Loss 2.8469   LearningRate 0.0545   Epoch: 5   Global Step: 87340   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 08:27:45,143-Speed 3184.81 samples/sec   Loss 2.9102   LearningRate 0.0545   Epoch: 5   Global Step: 87350   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 08:27:48,304-Speed 3240.51 samples/sec   Loss 2.8378   LearningRate 0.0545   Epoch: 5   Global Step: 87360   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 08:27:51,370-Speed 3341.72 samples/sec   Loss 2.8655   LearningRate 0.0545   Epoch: 5   Global Step: 87370   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 08:27:54,484-Speed 3289.44 samples/sec   Loss 2.8221   LearningRate 0.0545   Epoch: 5   Global Step: 87380   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 08:27:57,565-Speed 3325.59 samples/sec   Loss 2.8410   LearningRate 0.0545   Epoch: 5   Global Step: 87390   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 08:28:00,712-Speed 3255.25 samples/sec   Loss 2.8420   LearningRate 0.0545   Epoch: 5   Global Step: 87400   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 08:28:03,903-Speed 3209.18 samples/sec   Loss 2.9403   LearningRate 0.0545   Epoch: 5   Global Step: 87410   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 08:28:06,981-Speed 3328.64 samples/sec   Loss 2.8567   LearningRate 0.0545   Epoch: 5   Global Step: 87420   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:28:10,061-Speed 3326.37 samples/sec   Loss 2.8660   LearningRate 0.0545   Epoch: 5   Global Step: 87430   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:28:13,186-Speed 3277.70 samples/sec   Loss 2.9825   LearningRate 0.0545   Epoch: 5   Global Step: 87440   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:28:16,328-Speed 3260.30 samples/sec   Loss 2.8635   LearningRate 0.0545   Epoch: 5   Global Step: 87450   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:28:19,420-Speed 3313.09 samples/sec   Loss 2.8602   LearningRate 0.0545   Epoch: 5   Global Step: 87460   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 08:28:22,529-Speed 3295.32 samples/sec   Loss 2.8246   LearningRate 0.0545   Epoch: 5   Global Step: 87470   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 08:28:25,669-Speed 3263.09 samples/sec   Loss 2.8689   LearningRate 0.0545   Epoch: 5   Global Step: 87480   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 08:28:28,813-Speed 3258.38 samples/sec   Loss 2.8286   LearningRate 0.0545   Epoch: 5   Global Step: 87490   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 08:28:32,063-Speed 3151.90 samples/sec   Loss 2.8444   LearningRate 0.0544   Epoch: 5   Global Step: 87500   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 08:28:35,162-Speed 3305.78 samples/sec   Loss 2.8398   LearningRate 0.0544   Epoch: 5   Global Step: 87510   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 08:28:38,266-Speed 3299.92 samples/sec   Loss 2.9409   LearningRate 0.0544   Epoch: 5   Global Step: 87520   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 08:28:41,387-Speed 3282.86 samples/sec   Loss 2.7825   LearningRate 0.0544   Epoch: 5   Global Step: 87530   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 08:28:44,524-Speed 3264.83 samples/sec   Loss 2.8211   LearningRate 0.0544   Epoch: 5   Global Step: 87540   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 08:28:47,658-Speed 3268.36 samples/sec   Loss 2.8124   LearningRate 0.0544   Epoch: 5   Global Step: 87550   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 08:28:50,740-Speed 3323.94 samples/sec   Loss 2.8616   LearningRate 0.0544   Epoch: 5   Global Step: 87560   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:28:53,860-Speed 3283.38 samples/sec   Loss 2.8616   LearningRate 0.0544   Epoch: 5   Global Step: 87570   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:28:56,949-Speed 3317.25 samples/sec   Loss 2.8061   LearningRate 0.0544   Epoch: 5   Global Step: 87580   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:29:00,049-Speed 3304.10 samples/sec   Loss 2.8013   LearningRate 0.0544   Epoch: 5   Global Step: 87590   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:29:03,182-Speed 3270.02 samples/sec   Loss 2.7201   LearningRate 0.0544   Epoch: 5   Global Step: 87600   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:29:06,247-Speed 3341.53 samples/sec   Loss 2.8403   LearningRate 0.0544   Epoch: 5   Global Step: 87610   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:29:09,402-Speed 3247.09 samples/sec   Loss 2.8478   LearningRate 0.0544   Epoch: 5   Global Step: 87620   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:29:12,490-Speed 3317.35 samples/sec   Loss 2.8020   LearningRate 0.0544   Epoch: 5   Global Step: 87630   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:29:15,636-Speed 3256.00 samples/sec   Loss 2.8775   LearningRate 0.0544   Epoch: 5   Global Step: 87640   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:29:18,704-Speed 3339.05 samples/sec   Loss 2.8516   LearningRate 0.0544   Epoch: 5   Global Step: 87650   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:29:21,774-Speed 3336.41 samples/sec   Loss 2.8350   LearningRate 0.0544   Epoch: 5   Global Step: 87660   Fp16 Grad Scale: 262144   Required: 26 hours
Training: 2022-04-11 08:29:24,868-Speed 3311.66 samples/sec   Loss 2.7959   LearningRate 0.0544   Epoch: 5   Global Step: 87670   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:29:28,037-Speed 3232.34 samples/sec   Loss 2.8438   LearningRate 0.0544   Epoch: 5   Global Step: 87680   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:29:31,107-Speed 3336.72 samples/sec   Loss 2.7917   LearningRate 0.0544   Epoch: 5   Global Step: 87690   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:29:34,233-Speed 3277.74 samples/sec   Loss 2.8727   LearningRate 0.0544   Epoch: 5   Global Step: 87700   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:29:37,306-Speed 3333.80 samples/sec   Loss 2.8052   LearningRate 0.0544   Epoch: 5   Global Step: 87710   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:29:40,448-Speed 3261.39 samples/sec   Loss 2.8241   LearningRate 0.0543   Epoch: 5   Global Step: 87720   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:29:43,537-Speed 3315.87 samples/sec   Loss 2.8696   LearningRate 0.0543   Epoch: 5   Global Step: 87730   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:29:46,678-Speed 3261.40 samples/sec   Loss 2.8491   LearningRate 0.0543   Epoch: 5   Global Step: 87740   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:29:49,749-Speed 3335.70 samples/sec   Loss 2.9303   LearningRate 0.0543   Epoch: 5   Global Step: 87750   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:29:52,839-Speed 3315.25 samples/sec   Loss 2.8003   LearningRate 0.0543   Epoch: 5   Global Step: 87760   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:29:55,929-Speed 3315.35 samples/sec   Loss 2.8825   LearningRate 0.0543   Epoch: 5   Global Step: 87770   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:29:59,018-Speed 3316.02 samples/sec   Loss 2.9066   LearningRate 0.0543   Epoch: 5   Global Step: 87780   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:30:02,125-Speed 3296.71 samples/sec   Loss 2.9654   LearningRate 0.0543   Epoch: 5   Global Step: 87790   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:30:05,236-Speed 3293.48 samples/sec   Loss 2.9109   LearningRate 0.0543   Epoch: 5   Global Step: 87800   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:30:08,417-Speed 3220.41 samples/sec   Loss 2.8644   LearningRate 0.0543   Epoch: 5   Global Step: 87810   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:30:11,492-Speed 3330.87 samples/sec   Loss 2.8982   LearningRate 0.0543   Epoch: 5   Global Step: 87820   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:30:14,590-Speed 3306.65 samples/sec   Loss 2.8050   LearningRate 0.0543   Epoch: 5   Global Step: 87830   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:30:17,678-Speed 3317.12 samples/sec   Loss 2.8504   LearningRate 0.0543   Epoch: 5   Global Step: 87840   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:30:20,798-Speed 3283.70 samples/sec   Loss 2.8767   LearningRate 0.0543   Epoch: 5   Global Step: 87850   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:30:23,893-Speed 3309.47 samples/sec   Loss 2.8781   LearningRate 0.0543   Epoch: 5   Global Step: 87860   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:30:26,960-Speed 3340.66 samples/sec   Loss 2.8172   LearningRate 0.0543   Epoch: 5   Global Step: 87870   Fp16 Grad Scale: 262144   Required: 26 hours
Training: 2022-04-11 08:30:30,018-Speed 3348.89 samples/sec   Loss 2.8640   LearningRate 0.0543   Epoch: 5   Global Step: 87880   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:30:33,087-Speed 3338.28 samples/sec   Loss 2.9814   LearningRate 0.0543   Epoch: 5   Global Step: 87890   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:30:36,186-Speed 3305.90 samples/sec   Loss 2.8468   LearningRate 0.0543   Epoch: 5   Global Step: 87900   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:30:39,257-Speed 3336.63 samples/sec   Loss 2.8468   LearningRate 0.0543   Epoch: 5   Global Step: 87910   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:30:42,361-Speed 3298.97 samples/sec   Loss 2.8801   LearningRate 0.0543   Epoch: 5   Global Step: 87920   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:30:45,428-Speed 3340.40 samples/sec   Loss 2.8442   LearningRate 0.0543   Epoch: 5   Global Step: 87930   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:30:48,553-Speed 3277.99 samples/sec   Loss 2.8652   LearningRate 0.0543   Epoch: 5   Global Step: 87940   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:30:51,674-Speed 3282.19 samples/sec   Loss 2.8611   LearningRate 0.0542   Epoch: 5   Global Step: 87950   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:30:54,853-Speed 3223.51 samples/sec   Loss 2.7971   LearningRate 0.0542   Epoch: 5   Global Step: 87960   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:30:57,992-Speed 3262.56 samples/sec   Loss 2.8673   LearningRate 0.0542   Epoch: 5   Global Step: 87970   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 08:31:01,134-Speed 3260.58 samples/sec   Loss 2.9045   LearningRate 0.0542   Epoch: 5   Global Step: 87980   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 08:31:04,284-Speed 3252.25 samples/sec   Loss 2.8346   LearningRate 0.0542   Epoch: 5   Global Step: 87990   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 08:31:07,393-Speed 3294.75 samples/sec   Loss 2.8820   LearningRate 0.0542   Epoch: 5   Global Step: 88000   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 08:31:52,273-[lfw][88000]XNorm: 23.610485
Training: 2022-04-11 08:31:52,274-[lfw][88000]Accuracy-Flip: 0.99783+-0.00259
Training: 2022-04-11 08:31:52,274-[lfw][88000]Accuracy-Highest: 0.99817
Training: 2022-04-11 08:32:43,635-[cfp_fp][88000]XNorm: 22.200067
Training: 2022-04-11 08:32:43,636-[cfp_fp][88000]Accuracy-Flip: 0.98529+-0.00419
Training: 2022-04-11 08:32:43,637-[cfp_fp][88000]Accuracy-Highest: 0.98557
Training: 2022-04-11 08:33:27,872-[agedb_30][88000]XNorm: 23.442548
Training: 2022-04-11 08:33:27,872-[agedb_30][88000]Accuracy-Flip: 0.98033+-0.00759
Training: 2022-04-11 08:33:27,873-[agedb_30][88000]Accuracy-Highest: 0.98167
Training: 2022-04-11 08:33:30,943-Speed 71.33 samples/sec   Loss 2.8584   LearningRate 0.0542   Epoch: 5   Global Step: 88010   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 08:33:34,013-Speed 3336.16 samples/sec   Loss 2.9235   LearningRate 0.0542   Epoch: 5   Global Step: 88020   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 08:33:37,073-Speed 3349.11 samples/sec   Loss 2.8253   LearningRate 0.0542   Epoch: 5   Global Step: 88030   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 08:33:40,133-Speed 3348.08 samples/sec   Loss 2.8777   LearningRate 0.0542   Epoch: 5   Global Step: 88040   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 08:33:43,191-Speed 3349.35 samples/sec   Loss 2.9161   LearningRate 0.0542   Epoch: 5   Global Step: 88050   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 08:33:46,262-Speed 3335.10 samples/sec   Loss 2.9034   LearningRate 0.0542   Epoch: 5   Global Step: 88060   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 08:33:49,319-Speed 3351.09 samples/sec   Loss 2.8668   LearningRate 0.0542   Epoch: 5   Global Step: 88070   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:33:52,403-Speed 3321.16 samples/sec   Loss 2.9117   LearningRate 0.0542   Epoch: 5   Global Step: 88080   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:33:55,466-Speed 3344.03 samples/sec   Loss 2.9654   LearningRate 0.0542   Epoch: 5   Global Step: 88090   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:33:58,534-Speed 3337.67 samples/sec   Loss 2.9290   LearningRate 0.0542   Epoch: 5   Global Step: 88100   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:34:01,631-Speed 3308.25 samples/sec   Loss 2.9525   LearningRate 0.0542   Epoch: 5   Global Step: 88110   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:34:04,729-Speed 3305.77 samples/sec   Loss 2.9024   LearningRate 0.0542   Epoch: 5   Global Step: 88120   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:34:07,862-Speed 3269.87 samples/sec   Loss 2.9010   LearningRate 0.0542   Epoch: 5   Global Step: 88130   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:34:12,642-Speed 2142.49 samples/sec   Loss 2.9379   LearningRate 0.0542   Epoch: 5   Global Step: 88140   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:34:15,704-Speed 3344.79 samples/sec   Loss 2.8141   LearningRate 0.0542   Epoch: 5   Global Step: 88150   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:34:21,422-Speed 1791.24 samples/sec   Loss 2.8845   LearningRate 0.0542   Epoch: 5   Global Step: 88160   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:34:24,540-Speed 3288.38 samples/sec   Loss 2.8931   LearningRate 0.0542   Epoch: 5   Global Step: 88170   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:34:27,607-Speed 3339.43 samples/sec   Loss 2.9235   LearningRate 0.0541   Epoch: 5   Global Step: 88180   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:34:30,733-Speed 3276.52 samples/sec   Loss 2.9144   LearningRate 0.0541   Epoch: 5   Global Step: 88190   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:34:33,956-Speed 3177.89 samples/sec   Loss 2.8496   LearningRate 0.0541   Epoch: 5   Global Step: 88200   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:34:37,038-Speed 3324.13 samples/sec   Loss 2.8586   LearningRate 0.0541   Epoch: 5   Global Step: 88210   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:34:40,199-Speed 3240.12 samples/sec   Loss 2.7877   LearningRate 0.0541   Epoch: 5   Global Step: 88220   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:34:43,275-Speed 3329.84 samples/sec   Loss 2.8768   LearningRate 0.0541   Epoch: 5   Global Step: 88230   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:34:46,336-Speed 3345.90 samples/sec   Loss 2.8432   LearningRate 0.0541   Epoch: 5   Global Step: 88240   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:34:49,399-Speed 3343.94 samples/sec   Loss 2.8358   LearningRate 0.0541   Epoch: 5   Global Step: 88250   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:34:52,499-Speed 3303.74 samples/sec   Loss 2.8688   LearningRate 0.0541   Epoch: 5   Global Step: 88260   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:34:55,578-Speed 3327.84 samples/sec   Loss 2.9107   LearningRate 0.0541   Epoch: 5   Global Step: 88270   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:34:58,651-Speed 3332.85 samples/sec   Loss 2.9163   LearningRate 0.0541   Epoch: 5   Global Step: 88280   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:35:01,709-Speed 3350.24 samples/sec   Loss 2.7760   LearningRate 0.0541   Epoch: 5   Global Step: 88290   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:35:04,769-Speed 3347.21 samples/sec   Loss 2.8503   LearningRate 0.0541   Epoch: 5   Global Step: 88300   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:35:07,829-Speed 3347.58 samples/sec   Loss 2.8402   LearningRate 0.0541   Epoch: 5   Global Step: 88310   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:35:10,895-Speed 3340.57 samples/sec   Loss 2.8343   LearningRate 0.0541   Epoch: 5   Global Step: 88320   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:35:13,954-Speed 3349.02 samples/sec   Loss 2.8469   LearningRate 0.0541   Epoch: 5   Global Step: 88330   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:35:17,021-Speed 3338.67 samples/sec   Loss 2.8478   LearningRate 0.0541   Epoch: 5   Global Step: 88340   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:35:20,092-Speed 3335.75 samples/sec   Loss 2.8365   LearningRate 0.0541   Epoch: 5   Global Step: 88350   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:35:23,179-Speed 3317.40 samples/sec   Loss 2.8264   LearningRate 0.0541   Epoch: 5   Global Step: 88360   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:35:26,335-Speed 3246.09 samples/sec   Loss 2.8608   LearningRate 0.0541   Epoch: 5   Global Step: 88370   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:35:29,547-Speed 3189.14 samples/sec   Loss 2.8503   LearningRate 0.0541   Epoch: 5   Global Step: 88380   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:35:32,684-Speed 3265.04 samples/sec   Loss 2.8343   LearningRate 0.0541   Epoch: 5   Global Step: 88390   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:35:35,780-Speed 3308.97 samples/sec   Loss 2.8203   LearningRate 0.0540   Epoch: 5   Global Step: 88400   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:35:38,835-Speed 3353.84 samples/sec   Loss 2.9084   LearningRate 0.0540   Epoch: 5   Global Step: 88410   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 08:35:41,917-Speed 3323.34 samples/sec   Loss 2.8798   LearningRate 0.0540   Epoch: 5   Global Step: 88420   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 08:35:45,038-Speed 3281.40 samples/sec   Loss 2.8264   LearningRate 0.0540   Epoch: 5   Global Step: 88430   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 08:35:48,127-Speed 3316.12 samples/sec   Loss 2.9215   LearningRate 0.0540   Epoch: 5   Global Step: 88440   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 08:35:51,186-Speed 3348.56 samples/sec   Loss 2.8908   LearningRate 0.0540   Epoch: 5   Global Step: 88450   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 08:35:54,260-Speed 3332.46 samples/sec   Loss 2.8991   LearningRate 0.0540   Epoch: 5   Global Step: 88460   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 08:35:57,320-Speed 3347.36 samples/sec   Loss 2.8286   LearningRate 0.0540   Epoch: 5   Global Step: 88470   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 08:36:00,379-Speed 3347.92 samples/sec   Loss 2.8657   LearningRate 0.0540   Epoch: 5   Global Step: 88480   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 08:36:03,439-Speed 3347.03 samples/sec   Loss 2.8346   LearningRate 0.0540   Epoch: 5   Global Step: 88490   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 08:36:06,498-Speed 3349.14 samples/sec   Loss 2.9175   LearningRate 0.0540   Epoch: 5   Global Step: 88500   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 08:36:09,642-Speed 3257.90 samples/sec   Loss 2.8746   LearningRate 0.0540   Epoch: 5   Global Step: 88510   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:36:12,700-Speed 3349.35 samples/sec   Loss 2.9293   LearningRate 0.0540   Epoch: 5   Global Step: 88520   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:36:15,762-Speed 3345.16 samples/sec   Loss 2.8725   LearningRate 0.0540   Epoch: 5   Global Step: 88530   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:36:18,825-Speed 3343.07 samples/sec   Loss 2.8491   LearningRate 0.0540   Epoch: 5   Global Step: 88540   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:36:21,905-Speed 3326.18 samples/sec   Loss 2.8950   LearningRate 0.0540   Epoch: 5   Global Step: 88550   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:36:24,964-Speed 3348.38 samples/sec   Loss 2.8786   LearningRate 0.0540   Epoch: 5   Global Step: 88560   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:36:28,025-Speed 3345.81 samples/sec   Loss 2.9176   LearningRate 0.0540   Epoch: 5   Global Step: 88570   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:36:31,088-Speed 3343.69 samples/sec   Loss 2.7958   LearningRate 0.0540   Epoch: 5   Global Step: 88580   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:36:34,167-Speed 3327.00 samples/sec   Loss 2.8297   LearningRate 0.0540   Epoch: 5   Global Step: 88590   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:36:37,230-Speed 3344.61 samples/sec   Loss 2.9274   LearningRate 0.0540   Epoch: 5   Global Step: 88600   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:36:40,288-Speed 3349.12 samples/sec   Loss 2.9147   LearningRate 0.0540   Epoch: 5   Global Step: 88610   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:36:43,354-Speed 3340.76 samples/sec   Loss 2.9081   LearningRate 0.0540   Epoch: 5   Global Step: 88620   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:36:46,418-Speed 3342.55 samples/sec   Loss 2.9341   LearningRate 0.0539   Epoch: 5   Global Step: 88630   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:36:49,483-Speed 3341.73 samples/sec   Loss 2.9182   LearningRate 0.0539   Epoch: 5   Global Step: 88640   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:36:52,544-Speed 3346.28 samples/sec   Loss 2.9048   LearningRate 0.0539   Epoch: 5   Global Step: 88650   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:36:55,607-Speed 3345.35 samples/sec   Loss 2.9062   LearningRate 0.0539   Epoch: 5   Global Step: 88660   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:36:58,688-Speed 3323.58 samples/sec   Loss 2.8775   LearningRate 0.0539   Epoch: 5   Global Step: 88670   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:37:01,762-Speed 3332.88 samples/sec   Loss 2.8758   LearningRate 0.0539   Epoch: 5   Global Step: 88680   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:37:04,838-Speed 3329.34 samples/sec   Loss 2.9341   LearningRate 0.0539   Epoch: 5   Global Step: 88690   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:37:07,964-Speed 3276.59 samples/sec   Loss 2.9250   LearningRate 0.0539   Epoch: 5   Global Step: 88700   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:37:11,041-Speed 3328.93 samples/sec   Loss 2.8775   LearningRate 0.0539   Epoch: 5   Global Step: 88710   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:37:14,109-Speed 3338.21 samples/sec   Loss 2.8573   LearningRate 0.0539   Epoch: 5   Global Step: 88720   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:37:17,166-Speed 3350.60 samples/sec   Loss 2.9075   LearningRate 0.0539   Epoch: 5   Global Step: 88730   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:37:20,226-Speed 3347.73 samples/sec   Loss 2.8630   LearningRate 0.0539   Epoch: 5   Global Step: 88740   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:37:23,284-Speed 3349.90 samples/sec   Loss 2.8405   LearningRate 0.0539   Epoch: 5   Global Step: 88750   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:37:26,343-Speed 3347.84 samples/sec   Loss 2.8589   LearningRate 0.0539   Epoch: 5   Global Step: 88760   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:37:29,408-Speed 3341.41 samples/sec   Loss 2.8887   LearningRate 0.0539   Epoch: 5   Global Step: 88770   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:37:32,469-Speed 3346.29 samples/sec   Loss 2.8661   LearningRate 0.0539   Epoch: 5   Global Step: 88780   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:37:35,527-Speed 3349.23 samples/sec   Loss 2.8828   LearningRate 0.0539   Epoch: 5   Global Step: 88790   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:37:38,586-Speed 3349.01 samples/sec   Loss 2.8862   LearningRate 0.0539   Epoch: 5   Global Step: 88800   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:37:41,646-Speed 3347.67 samples/sec   Loss 2.8111   LearningRate 0.0539   Epoch: 5   Global Step: 88810   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:37:44,713-Speed 3339.74 samples/sec   Loss 2.8490   LearningRate 0.0539   Epoch: 5   Global Step: 88820   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:37:47,838-Speed 3277.77 samples/sec   Loss 2.8824   LearningRate 0.0539   Epoch: 5   Global Step: 88830   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:37:50,928-Speed 3314.98 samples/sec   Loss 2.8700   LearningRate 0.0539   Epoch: 5   Global Step: 88840   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:37:54,028-Speed 3303.51 samples/sec   Loss 2.8841   LearningRate 0.0539   Epoch: 5   Global Step: 88850   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 08:37:57,179-Speed 3250.48 samples/sec   Loss 2.8632   LearningRate 0.0538   Epoch: 5   Global Step: 88860   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 08:38:00,257-Speed 3328.15 samples/sec   Loss 2.9058   LearningRate 0.0538   Epoch: 5   Global Step: 88870   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 08:38:03,335-Speed 3328.02 samples/sec   Loss 2.8452   LearningRate 0.0538   Epoch: 5   Global Step: 88880   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 08:38:06,470-Speed 3267.12 samples/sec   Loss 2.8790   LearningRate 0.0538   Epoch: 5   Global Step: 88890   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 08:38:09,615-Speed 3257.02 samples/sec   Loss 2.8205   LearningRate 0.0538   Epoch: 5   Global Step: 88900   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 08:38:12,697-Speed 3323.71 samples/sec   Loss 2.9043   LearningRate 0.0538   Epoch: 5   Global Step: 88910   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 08:38:15,758-Speed 3345.58 samples/sec   Loss 2.8887   LearningRate 0.0538   Epoch: 5   Global Step: 88920   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 08:38:18,886-Speed 3274.84 samples/sec   Loss 2.9294   LearningRate 0.0538   Epoch: 5   Global Step: 88930   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 08:38:22,035-Speed 3252.87 samples/sec   Loss 2.8906   LearningRate 0.0538   Epoch: 5   Global Step: 88940   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 08:38:25,108-Speed 3333.35 samples/sec   Loss 2.9321   LearningRate 0.0538   Epoch: 5   Global Step: 88950   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 08:38:28,218-Speed 3293.58 samples/sec   Loss 2.9745   LearningRate 0.0538   Epoch: 5   Global Step: 88960   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 08:38:31,280-Speed 3344.65 samples/sec   Loss 2.8765   LearningRate 0.0538   Epoch: 5   Global Step: 88970   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 08:38:34,344-Speed 3343.22 samples/sec   Loss 2.8712   LearningRate 0.0538   Epoch: 5   Global Step: 88980   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 08:38:37,461-Speed 3285.92 samples/sec   Loss 2.9012   LearningRate 0.0538   Epoch: 5   Global Step: 88990   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 08:38:40,520-Speed 3348.32 samples/sec   Loss 2.8569   LearningRate 0.0538   Epoch: 5   Global Step: 89000   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 08:38:43,586-Speed 3340.15 samples/sec   Loss 2.9265   LearningRate 0.0538   Epoch: 5   Global Step: 89010   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 08:38:46,689-Speed 3301.86 samples/sec   Loss 2.8808   LearningRate 0.0538   Epoch: 5   Global Step: 89020   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 08:38:49,750-Speed 3345.72 samples/sec   Loss 2.9580   LearningRate 0.0538   Epoch: 5   Global Step: 89030   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 08:38:52,832-Speed 3323.40 samples/sec   Loss 2.7938   LearningRate 0.0538   Epoch: 5   Global Step: 89040   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 08:38:55,909-Speed 3329.12 samples/sec   Loss 2.9095   LearningRate 0.0538   Epoch: 5   Global Step: 89050   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:38:58,976-Speed 3339.66 samples/sec   Loss 2.9040   LearningRate 0.0538   Epoch: 5   Global Step: 89060   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:39:02,041-Speed 3342.15 samples/sec   Loss 2.9207   LearningRate 0.0538   Epoch: 5   Global Step: 89070   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:39:05,102-Speed 3345.46 samples/sec   Loss 2.8980   LearningRate 0.0538   Epoch: 5   Global Step: 89080   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:39:08,203-Speed 3303.09 samples/sec   Loss 2.8223   LearningRate 0.0537   Epoch: 5   Global Step: 89090   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:39:11,275-Speed 3334.51 samples/sec   Loss 2.8675   LearningRate 0.0537   Epoch: 5   Global Step: 89100   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 08:39:14,363-Speed 3316.64 samples/sec   Loss 2.9265   LearningRate 0.0537   Epoch: 5   Global Step: 89110   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 08:39:17,459-Speed 3308.84 samples/sec   Loss 2.8601   LearningRate 0.0537   Epoch: 5   Global Step: 89120   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 08:39:20,525-Speed 3340.07 samples/sec   Loss 2.8902   LearningRate 0.0537   Epoch: 5   Global Step: 89130   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 08:39:23,637-Speed 3291.89 samples/sec   Loss 2.8998   LearningRate 0.0537   Epoch: 5   Global Step: 89140   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 08:39:26,855-Speed 3183.01 samples/sec   Loss 2.9325   LearningRate 0.0537   Epoch: 5   Global Step: 89150   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 08:39:30,000-Speed 3256.77 samples/sec   Loss 2.9233   LearningRate 0.0537   Epoch: 5   Global Step: 89160   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 08:39:33,095-Speed 3324.25 samples/sec   Loss 2.8596   LearningRate 0.0537   Epoch: 5   Global Step: 89170   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 08:39:36,156-Speed 3345.84 samples/sec   Loss 2.7975   LearningRate 0.0537   Epoch: 5   Global Step: 89180   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 08:39:39,221-Speed 3341.60 samples/sec   Loss 2.8780   LearningRate 0.0537   Epoch: 5   Global Step: 89190   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 08:39:42,304-Speed 3322.07 samples/sec   Loss 2.8956   LearningRate 0.0537   Epoch: 5   Global Step: 89200   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:39:45,506-Speed 3199.72 samples/sec   Loss 2.9224   LearningRate 0.0537   Epoch: 5   Global Step: 89210   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:39:48,745-Speed 3162.27 samples/sec   Loss 2.8457   LearningRate 0.0537   Epoch: 5   Global Step: 89220   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:39:51,972-Speed 3173.64 samples/sec   Loss 2.8697   LearningRate 0.0537   Epoch: 5   Global Step: 89230   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:39:55,159-Speed 3214.20 samples/sec   Loss 2.8667   LearningRate 0.0537   Epoch: 5   Global Step: 89240   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:39:58,310-Speed 3250.14 samples/sec   Loss 2.9012   LearningRate 0.0537   Epoch: 5   Global Step: 89250   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:40:01,434-Speed 3278.61 samples/sec   Loss 2.8842   LearningRate 0.0537   Epoch: 5   Global Step: 89260   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:40:04,491-Speed 3350.42 samples/sec   Loss 2.8571   LearningRate 0.0537   Epoch: 5   Global Step: 89270   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 08:40:07,555-Speed 3343.23 samples/sec   Loss 2.8909   LearningRate 0.0537   Epoch: 5   Global Step: 89280   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 08:40:10,671-Speed 3288.06 samples/sec   Loss 2.8673   LearningRate 0.0537   Epoch: 5   Global Step: 89290   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 08:40:13,751-Speed 3324.75 samples/sec   Loss 2.9157   LearningRate 0.0537   Epoch: 5   Global Step: 89300   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 08:40:16,836-Speed 3320.46 samples/sec   Loss 2.9149   LearningRate 0.0536   Epoch: 5   Global Step: 89310   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 08:40:19,911-Speed 3330.71 samples/sec   Loss 2.9987   LearningRate 0.0536   Epoch: 5   Global Step: 89320   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 08:40:22,974-Speed 3344.49 samples/sec   Loss 2.8622   LearningRate 0.0536   Epoch: 5   Global Step: 89330   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 08:40:26,043-Speed 3337.32 samples/sec   Loss 2.8477   LearningRate 0.0536   Epoch: 5   Global Step: 89340   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 08:40:29,205-Speed 3239.38 samples/sec   Loss 2.8912   LearningRate 0.0536   Epoch: 5   Global Step: 89350   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 08:40:32,281-Speed 3329.96 samples/sec   Loss 2.8825   LearningRate 0.0536   Epoch: 5   Global Step: 89360   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 08:40:35,351-Speed 3336.99 samples/sec   Loss 2.8692   LearningRate 0.0536   Epoch: 5   Global Step: 89370   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:40:38,415-Speed 3342.18 samples/sec   Loss 2.9742   LearningRate 0.0536   Epoch: 5   Global Step: 89380   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:40:41,489-Speed 3331.63 samples/sec   Loss 2.8742   LearningRate 0.0536   Epoch: 5   Global Step: 89390   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:40:44,567-Speed 3327.49 samples/sec   Loss 2.8988   LearningRate 0.0536   Epoch: 5   Global Step: 89400   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:40:47,673-Speed 3298.09 samples/sec   Loss 2.8937   LearningRate 0.0536   Epoch: 5   Global Step: 89410   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:40:50,744-Speed 3335.10 samples/sec   Loss 2.9867   LearningRate 0.0536   Epoch: 5   Global Step: 89420   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:40:53,807-Speed 3343.66 samples/sec   Loss 2.9706   LearningRate 0.0536   Epoch: 5   Global Step: 89430   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:40:56,864-Speed 3351.31 samples/sec   Loss 2.8506   LearningRate 0.0536   Epoch: 5   Global Step: 89440   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 08:40:59,929-Speed 3341.50 samples/sec   Loss 2.8517   LearningRate 0.0536   Epoch: 5   Global Step: 89450   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 08:41:02,995-Speed 3340.97 samples/sec   Loss 2.8600   LearningRate 0.0536   Epoch: 5   Global Step: 89460   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 08:41:06,059-Speed 3342.30 samples/sec   Loss 2.8835   LearningRate 0.0536   Epoch: 5   Global Step: 89470   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 08:41:09,143-Speed 3320.95 samples/sec   Loss 2.8312   LearningRate 0.0536   Epoch: 5   Global Step: 89480   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 08:41:12,245-Speed 3302.48 samples/sec   Loss 2.8619   LearningRate 0.0536   Epoch: 5   Global Step: 89490   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 08:41:15,311-Speed 3340.53 samples/sec   Loss 2.9441   LearningRate 0.0536   Epoch: 5   Global Step: 89500   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 08:41:18,398-Speed 3318.52 samples/sec   Loss 2.9074   LearningRate 0.0536   Epoch: 5   Global Step: 89510   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 08:41:21,476-Speed 3327.01 samples/sec   Loss 2.9491   LearningRate 0.0536   Epoch: 5   Global Step: 89520   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 08:41:24,614-Speed 3264.47 samples/sec   Loss 2.8688   LearningRate 0.0536   Epoch: 5   Global Step: 89530   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 08:41:27,690-Speed 3330.10 samples/sec   Loss 2.8718   LearningRate 0.0535   Epoch: 5   Global Step: 89540   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:41:30,756-Speed 3341.00 samples/sec   Loss 2.8693   LearningRate 0.0535   Epoch: 5   Global Step: 89550   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:41:33,845-Speed 3315.57 samples/sec   Loss 2.9369   LearningRate 0.0535   Epoch: 5   Global Step: 89560   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:41:36,913-Speed 3338.74 samples/sec   Loss 3.0320   LearningRate 0.0535   Epoch: 5   Global Step: 89570   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:41:40,028-Speed 3288.73 samples/sec   Loss 2.9279   LearningRate 0.0535   Epoch: 5   Global Step: 89580   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:41:43,117-Speed 3315.25 samples/sec   Loss 2.8548   LearningRate 0.0535   Epoch: 5   Global Step: 89590   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:41:46,218-Speed 3303.49 samples/sec   Loss 2.8952   LearningRate 0.0535   Epoch: 5   Global Step: 89600   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:41:49,307-Speed 3316.41 samples/sec   Loss 2.8837   LearningRate 0.0535   Epoch: 5   Global Step: 89610   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:41:52,379-Speed 3335.18 samples/sec   Loss 2.9055   LearningRate 0.0535   Epoch: 5   Global Step: 89620   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:41:55,486-Speed 3297.04 samples/sec   Loss 2.8683   LearningRate 0.0535   Epoch: 5   Global Step: 89630   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:41:58,550-Speed 3342.87 samples/sec   Loss 2.9352   LearningRate 0.0535   Epoch: 5   Global Step: 89640   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:42:01,650-Speed 3303.58 samples/sec   Loss 2.8999   LearningRate 0.0535   Epoch: 5   Global Step: 89650   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:42:04,723-Speed 3334.00 samples/sec   Loss 2.8543   LearningRate 0.0535   Epoch: 5   Global Step: 89660   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:42:07,797-Speed 3332.86 samples/sec   Loss 2.9027   LearningRate 0.0535   Epoch: 5   Global Step: 89670   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 08:42:10,887-Speed 3314.95 samples/sec   Loss 2.8771   LearningRate 0.0535   Epoch: 5   Global Step: 89680   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 08:42:13,956-Speed 3337.99 samples/sec   Loss 2.9104   LearningRate 0.0535   Epoch: 5   Global Step: 89690   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 08:42:17,076-Speed 3283.06 samples/sec   Loss 2.8789   LearningRate 0.0535   Epoch: 5   Global Step: 89700   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 08:42:20,181-Speed 3298.80 samples/sec   Loss 2.9011   LearningRate 0.0535   Epoch: 5   Global Step: 89710   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 08:42:23,261-Speed 3324.96 samples/sec   Loss 2.9048   LearningRate 0.0535   Epoch: 5   Global Step: 89720   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 08:42:26,329-Speed 3338.40 samples/sec   Loss 2.8899   LearningRate 0.0535   Epoch: 5   Global Step: 89730   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 08:42:29,397-Speed 3339.07 samples/sec   Loss 2.9373   LearningRate 0.0535   Epoch: 5   Global Step: 89740   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 08:42:32,468-Speed 3334.43 samples/sec   Loss 2.9440   LearningRate 0.0535   Epoch: 5   Global Step: 89750   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 08:42:35,561-Speed 3312.73 samples/sec   Loss 2.8101   LearningRate 0.0535   Epoch: 5   Global Step: 89760   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 08:42:38,628-Speed 3338.99 samples/sec   Loss 2.8320   LearningRate 0.0534   Epoch: 5   Global Step: 89770   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:42:41,692-Speed 3343.67 samples/sec   Loss 2.8992   LearningRate 0.0534   Epoch: 5   Global Step: 89780   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:42:44,779-Speed 3317.68 samples/sec   Loss 2.9393   LearningRate 0.0534   Epoch: 5   Global Step: 89790   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:42:47,867-Speed 3316.76 samples/sec   Loss 2.9076   LearningRate 0.0534   Epoch: 5   Global Step: 89800   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:42:50,935-Speed 3338.57 samples/sec   Loss 2.8954   LearningRate 0.0534   Epoch: 5   Global Step: 89810   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:42:54,078-Speed 3258.58 samples/sec   Loss 2.9207   LearningRate 0.0534   Epoch: 5   Global Step: 89820   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:42:57,194-Speed 3287.79 samples/sec   Loss 2.9158   LearningRate 0.0534   Epoch: 5   Global Step: 89830   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:43:00,287-Speed 3312.09 samples/sec   Loss 2.9215   LearningRate 0.0534   Epoch: 5   Global Step: 89840   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:43:03,372-Speed 3320.09 samples/sec   Loss 2.8453   LearningRate 0.0534   Epoch: 5   Global Step: 89850   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:43:06,449-Speed 3328.09 samples/sec   Loss 2.9381   LearningRate 0.0534   Epoch: 5   Global Step: 89860   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:43:09,579-Speed 3273.09 samples/sec   Loss 2.9186   LearningRate 0.0534   Epoch: 5   Global Step: 89870   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:43:12,661-Speed 3322.73 samples/sec   Loss 2.9461   LearningRate 0.0534   Epoch: 5   Global Step: 89880   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:43:15,871-Speed 3191.25 samples/sec   Loss 2.8852   LearningRate 0.0534   Epoch: 5   Global Step: 89890   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:43:18,972-Speed 3302.64 samples/sec   Loss 2.8505   LearningRate 0.0534   Epoch: 5   Global Step: 89900   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:43:22,053-Speed 3323.94 samples/sec   Loss 2.8886   LearningRate 0.0534   Epoch: 5   Global Step: 89910   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:43:25,147-Speed 3311.17 samples/sec   Loss 2.9033   LearningRate 0.0534   Epoch: 5   Global Step: 89920   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:43:28,247-Speed 3303.98 samples/sec   Loss 2.9279   LearningRate 0.0534   Epoch: 5   Global Step: 89930   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:43:31,381-Speed 3268.56 samples/sec   Loss 2.9476   LearningRate 0.0534   Epoch: 5   Global Step: 89940   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 08:43:34,514-Speed 3268.84 samples/sec   Loss 2.9555   LearningRate 0.0534   Epoch: 5   Global Step: 89950   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 08:43:37,605-Speed 3313.36 samples/sec   Loss 2.8966   LearningRate 0.0534   Epoch: 5   Global Step: 89960   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 08:43:40,681-Speed 3330.19 samples/sec   Loss 2.7943   LearningRate 0.0534   Epoch: 5   Global Step: 89970   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 08:43:43,749-Speed 3339.05 samples/sec   Loss 2.8517   LearningRate 0.0534   Epoch: 5   Global Step: 89980   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 08:43:46,823-Speed 3331.36 samples/sec   Loss 2.8677   LearningRate 0.0534   Epoch: 5   Global Step: 89990   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 08:43:49,910-Speed 3318.15 samples/sec   Loss 2.7879   LearningRate 0.0533   Epoch: 5   Global Step: 90000   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 08:44:34,214-[lfw][90000]XNorm: 22.871210
Training: 2022-04-11 08:44:34,214-[lfw][90000]Accuracy-Flip: 0.99800+-0.00296
Training: 2022-04-11 08:44:34,215-[lfw][90000]Accuracy-Highest: 0.99817
Training: 2022-04-11 08:45:25,487-[cfp_fp][90000]XNorm: 21.735875
Training: 2022-04-11 08:45:25,488-[cfp_fp][90000]Accuracy-Flip: 0.98486+-0.00390
Training: 2022-04-11 08:45:25,488-[cfp_fp][90000]Accuracy-Highest: 0.98557
Training: 2022-04-11 08:46:09,529-[agedb_30][90000]XNorm: 23.131157
Training: 2022-04-11 08:46:09,530-[agedb_30][90000]Accuracy-Flip: 0.98200+-0.00649
Training: 2022-04-11 08:46:09,531-[agedb_30][90000]Accuracy-Highest: 0.98200
Training: 2022-04-11 08:46:12,599-Speed 71.76 samples/sec   Loss 2.8684   LearningRate 0.0533   Epoch: 5   Global Step: 90010   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 08:46:15,654-Speed 3352.21 samples/sec   Loss 2.8756   LearningRate 0.0533   Epoch: 5   Global Step: 90020   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 08:46:18,713-Speed 3349.17 samples/sec   Loss 2.9009   LearningRate 0.0533   Epoch: 5   Global Step: 90030   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 08:46:21,820-Speed 3296.25 samples/sec   Loss 2.8865   LearningRate 0.0533   Epoch: 5   Global Step: 90040   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:46:24,883-Speed 3343.76 samples/sec   Loss 2.9217   LearningRate 0.0533   Epoch: 5   Global Step: 90050   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:46:27,977-Speed 3310.79 samples/sec   Loss 2.8790   LearningRate 0.0533   Epoch: 5   Global Step: 90060   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:46:31,033-Speed 3351.05 samples/sec   Loss 2.9798   LearningRate 0.0533   Epoch: 5   Global Step: 90070   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:46:34,107-Speed 3332.69 samples/sec   Loss 2.8086   LearningRate 0.0533   Epoch: 5   Global Step: 90080   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:46:37,187-Speed 3325.59 samples/sec   Loss 2.8773   LearningRate 0.0533   Epoch: 5   Global Step: 90090   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:46:40,263-Speed 3330.17 samples/sec   Loss 2.9965   LearningRate 0.0533   Epoch: 5   Global Step: 90100   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:46:43,321-Speed 3348.92 samples/sec   Loss 2.8520   LearningRate 0.0533   Epoch: 5   Global Step: 90110   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:46:46,379-Speed 3349.94 samples/sec   Loss 2.8288   LearningRate 0.0533   Epoch: 5   Global Step: 90120   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:46:49,509-Speed 3272.11 samples/sec   Loss 2.8302   LearningRate 0.0533   Epoch: 5   Global Step: 90130   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:46:52,581-Speed 3334.77 samples/sec   Loss 2.8797   LearningRate 0.0533   Epoch: 5   Global Step: 90140   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:46:55,685-Speed 3300.17 samples/sec   Loss 2.8672   LearningRate 0.0533   Epoch: 5   Global Step: 90150   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:46:58,781-Speed 3307.93 samples/sec   Loss 2.8962   LearningRate 0.0533   Epoch: 5   Global Step: 90160   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:47:01,838-Speed 3351.02 samples/sec   Loss 2.8690   LearningRate 0.0533   Epoch: 5   Global Step: 90170   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 08:47:04,919-Speed 3324.01 samples/sec   Loss 2.8281   LearningRate 0.0533   Epoch: 5   Global Step: 90180   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 08:47:08,008-Speed 3316.51 samples/sec   Loss 2.8666   LearningRate 0.0533   Epoch: 5   Global Step: 90190   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 08:47:11,078-Speed 3336.00 samples/sec   Loss 2.8394   LearningRate 0.0533   Epoch: 5   Global Step: 90200   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 08:47:14,159-Speed 3324.88 samples/sec   Loss 2.9030   LearningRate 0.0533   Epoch: 5   Global Step: 90210   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 08:47:17,239-Speed 3325.01 samples/sec   Loss 2.9037   LearningRate 0.0533   Epoch: 5   Global Step: 90220   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 08:47:20,330-Speed 3313.78 samples/sec   Loss 2.9731   LearningRate 0.0532   Epoch: 5   Global Step: 90230   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 08:47:23,408-Speed 3328.19 samples/sec   Loss 2.9650   LearningRate 0.0532   Epoch: 5   Global Step: 90240   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 08:47:26,475-Speed 3340.03 samples/sec   Loss 2.8930   LearningRate 0.0532   Epoch: 5   Global Step: 90250   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 08:47:29,540-Speed 3341.76 samples/sec   Loss 2.8431   LearningRate 0.0532   Epoch: 5   Global Step: 90260   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 08:47:32,663-Speed 3279.26 samples/sec   Loss 2.9680   LearningRate 0.0532   Epoch: 5   Global Step: 90270   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:47:35,745-Speed 3323.52 samples/sec   Loss 2.9616   LearningRate 0.0532   Epoch: 5   Global Step: 90280   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:47:38,831-Speed 3320.05 samples/sec   Loss 2.9410   LearningRate 0.0532   Epoch: 5   Global Step: 90290   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:47:41,935-Speed 3298.71 samples/sec   Loss 2.8830   LearningRate 0.0532   Epoch: 5   Global Step: 90300   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:47:45,024-Speed 3315.81 samples/sec   Loss 2.9367   LearningRate 0.0532   Epoch: 5   Global Step: 90310   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:47:48,091-Speed 3339.98 samples/sec   Loss 2.9415   LearningRate 0.0532   Epoch: 5   Global Step: 90320   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:47:51,153-Speed 3345.51 samples/sec   Loss 2.8930   LearningRate 0.0532   Epoch: 5   Global Step: 90330   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:47:54,246-Speed 3311.61 samples/sec   Loss 2.8902   LearningRate 0.0532   Epoch: 5   Global Step: 90340   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:47:57,317-Speed 3334.62 samples/sec   Loss 2.8777   LearningRate 0.0532   Epoch: 5   Global Step: 90350   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:48:00,394-Speed 3329.78 samples/sec   Loss 2.9162   LearningRate 0.0532   Epoch: 5   Global Step: 90360   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:48:03,609-Speed 3185.64 samples/sec   Loss 2.8921   LearningRate 0.0532   Epoch: 5   Global Step: 90370   Fp16 Grad Scale: 262144   Required: 26 hours
Training: 2022-04-11 08:48:06,747-Speed 3264.14 samples/sec   Loss 2.9079   LearningRate 0.0532   Epoch: 5   Global Step: 90380   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:48:09,810-Speed 3343.78 samples/sec   Loss 2.9614   LearningRate 0.0532   Epoch: 5   Global Step: 90390   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:48:12,897-Speed 3317.84 samples/sec   Loss 2.9144   LearningRate 0.0532   Epoch: 5   Global Step: 90400   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:48:15,992-Speed 3309.18 samples/sec   Loss 2.9145   LearningRate 0.0532   Epoch: 5   Global Step: 90410   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:48:19,052-Speed 3347.38 samples/sec   Loss 2.8871   LearningRate 0.0532   Epoch: 5   Global Step: 90420   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:48:22,098-Speed 3363.33 samples/sec   Loss 2.9365   LearningRate 0.0532   Epoch: 5   Global Step: 90430   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 08:48:25,183-Speed 3319.79 samples/sec   Loss 2.8988   LearningRate 0.0532   Epoch: 5   Global Step: 90440   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 08:48:28,250-Speed 3339.40 samples/sec   Loss 2.9515   LearningRate 0.0532   Epoch: 5   Global Step: 90450   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 08:48:31,340-Speed 3314.84 samples/sec   Loss 2.9557   LearningRate 0.0531   Epoch: 5   Global Step: 90460   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 08:48:34,530-Speed 3211.13 samples/sec   Loss 2.9147   LearningRate 0.0531   Epoch: 5   Global Step: 90470   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 08:48:37,596-Speed 3341.14 samples/sec   Loss 2.9763   LearningRate 0.0531   Epoch: 5   Global Step: 90480   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 08:48:40,668-Speed 3333.84 samples/sec   Loss 2.8653   LearningRate 0.0531   Epoch: 5   Global Step: 90490   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 08:48:43,742-Speed 3332.57 samples/sec   Loss 2.9405   LearningRate 0.0531   Epoch: 5   Global Step: 90500   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 08:48:46,800-Speed 3349.33 samples/sec   Loss 2.8966   LearningRate 0.0531   Epoch: 5   Global Step: 90510   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 08:48:49,857-Speed 3350.21 samples/sec   Loss 2.9476   LearningRate 0.0531   Epoch: 5   Global Step: 90520   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 08:48:52,912-Speed 3352.39 samples/sec   Loss 2.8916   LearningRate 0.0531   Epoch: 5   Global Step: 90530   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:48:55,973-Speed 3347.45 samples/sec   Loss 2.9091   LearningRate 0.0531   Epoch: 5   Global Step: 90540   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:48:59,032-Speed 3347.85 samples/sec   Loss 2.9281   LearningRate 0.0531   Epoch: 5   Global Step: 90550   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:49:02,089-Speed 3350.44 samples/sec   Loss 2.8808   LearningRate 0.0531   Epoch: 5   Global Step: 90560   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:49:05,148-Speed 3348.03 samples/sec   Loss 2.8392   LearningRate 0.0531   Epoch: 5   Global Step: 90570   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:49:08,276-Speed 3274.85 samples/sec   Loss 2.8682   LearningRate 0.0531   Epoch: 5   Global Step: 90580   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:49:11,336-Speed 3347.28 samples/sec   Loss 2.8234   LearningRate 0.0531   Epoch: 5   Global Step: 90590   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:49:14,390-Speed 3354.08 samples/sec   Loss 2.9081   LearningRate 0.0531   Epoch: 5   Global Step: 90600   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:49:17,450-Speed 3346.99 samples/sec   Loss 2.8875   LearningRate 0.0531   Epoch: 5   Global Step: 90610   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:49:20,511-Speed 3345.91 samples/sec   Loss 2.9597   LearningRate 0.0531   Epoch: 5   Global Step: 90620   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:49:23,625-Speed 3288.86 samples/sec   Loss 2.8704   LearningRate 0.0531   Epoch: 5   Global Step: 90630   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:49:26,794-Speed 3232.42 samples/sec   Loss 2.8319   LearningRate 0.0531   Epoch: 5   Global Step: 90640   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:49:29,949-Speed 3246.02 samples/sec   Loss 2.9510   LearningRate 0.0531   Epoch: 5   Global Step: 90650   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:49:33,132-Speed 3218.83 samples/sec   Loss 2.8600   LearningRate 0.0531   Epoch: 5   Global Step: 90660   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:49:36,373-Speed 3159.80 samples/sec   Loss 2.9485   LearningRate 0.0531   Epoch: 5   Global Step: 90670   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:49:39,475-Speed 3302.20 samples/sec   Loss 2.9225   LearningRate 0.0531   Epoch: 5   Global Step: 90680   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:49:42,530-Speed 3352.68 samples/sec   Loss 2.8250   LearningRate 0.0530   Epoch: 5   Global Step: 90690   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:49:45,697-Speed 3233.75 samples/sec   Loss 2.8353   LearningRate 0.0530   Epoch: 5   Global Step: 90700   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:49:48,929-Speed 3169.43 samples/sec   Loss 2.8348   LearningRate 0.0530   Epoch: 5   Global Step: 90710   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:49:52,046-Speed 3285.58 samples/sec   Loss 2.8854   LearningRate 0.0530   Epoch: 5   Global Step: 90720   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:49:55,098-Speed 3356.55 samples/sec   Loss 2.9257   LearningRate 0.0530   Epoch: 5   Global Step: 90730   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:49:58,241-Speed 3258.49 samples/sec   Loss 2.9033   LearningRate 0.0530   Epoch: 5   Global Step: 90740   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:50:01,335-Speed 3310.89 samples/sec   Loss 2.8832   LearningRate 0.0530   Epoch: 5   Global Step: 90750   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:50:04,409-Speed 3331.17 samples/sec   Loss 2.8448   LearningRate 0.0530   Epoch: 5   Global Step: 90760   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:50:07,508-Speed 3305.16 samples/sec   Loss 2.8648   LearningRate 0.0530   Epoch: 5   Global Step: 90770   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:50:10,573-Speed 3341.91 samples/sec   Loss 2.8996   LearningRate 0.0530   Epoch: 5   Global Step: 90780   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:50:13,671-Speed 3306.63 samples/sec   Loss 2.8951   LearningRate 0.0530   Epoch: 5   Global Step: 90790   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 08:50:16,783-Speed 3291.18 samples/sec   Loss 2.9835   LearningRate 0.0530   Epoch: 5   Global Step: 90800   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 08:50:19,860-Speed 3328.77 samples/sec   Loss 2.8565   LearningRate 0.0530   Epoch: 5   Global Step: 90810   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 08:50:22,948-Speed 3316.81 samples/sec   Loss 2.8778   LearningRate 0.0530   Epoch: 5   Global Step: 90820   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 08:50:26,026-Speed 3328.97 samples/sec   Loss 2.8193   LearningRate 0.0530   Epoch: 5   Global Step: 90830   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 08:50:29,168-Speed 3260.48 samples/sec   Loss 2.8832   LearningRate 0.0530   Epoch: 5   Global Step: 90840   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 08:50:32,240-Speed 3334.02 samples/sec   Loss 2.8525   LearningRate 0.0530   Epoch: 5   Global Step: 90850   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 08:50:35,340-Speed 3303.41 samples/sec   Loss 2.8809   LearningRate 0.0530   Epoch: 5   Global Step: 90860   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 08:50:38,416-Speed 3329.72 samples/sec   Loss 2.8580   LearningRate 0.0530   Epoch: 5   Global Step: 90870   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 08:50:41,482-Speed 3341.85 samples/sec   Loss 2.9016   LearningRate 0.0530   Epoch: 5   Global Step: 90880   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 08:50:44,562-Speed 3325.60 samples/sec   Loss 2.8979   LearningRate 0.0530   Epoch: 5   Global Step: 90890   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:50:47,636-Speed 3331.75 samples/sec   Loss 2.8248   LearningRate 0.0530   Epoch: 5   Global Step: 90900   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:50:50,749-Speed 3290.48 samples/sec   Loss 2.8308   LearningRate 0.0530   Epoch: 5   Global Step: 90910   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:50:53,825-Speed 3329.62 samples/sec   Loss 2.8214   LearningRate 0.0529   Epoch: 5   Global Step: 90920   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:50:56,884-Speed 3348.18 samples/sec   Loss 2.8941   LearningRate 0.0529   Epoch: 5   Global Step: 90930   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:50:59,947-Speed 3343.50 samples/sec   Loss 2.8770   LearningRate 0.0529   Epoch: 5   Global Step: 90940   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:51:03,054-Speed 3297.04 samples/sec   Loss 2.9673   LearningRate 0.0529   Epoch: 5   Global Step: 90950   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:51:06,185-Speed 3271.22 samples/sec   Loss 2.8971   LearningRate 0.0529   Epoch: 5   Global Step: 90960   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:51:09,271-Speed 3319.04 samples/sec   Loss 2.9238   LearningRate 0.0529   Epoch: 5   Global Step: 90970   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:51:12,329-Speed 3349.42 samples/sec   Loss 2.8787   LearningRate 0.0529   Epoch: 5   Global Step: 90980   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:51:15,524-Speed 3206.13 samples/sec   Loss 2.9263   LearningRate 0.0529   Epoch: 5   Global Step: 90990   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:51:18,632-Speed 3296.02 samples/sec   Loss 2.9374   LearningRate 0.0529   Epoch: 5   Global Step: 91000   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 08:51:21,746-Speed 3289.12 samples/sec   Loss 2.8552   LearningRate 0.0529   Epoch: 5   Global Step: 91010   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 08:51:24,864-Speed 3284.90 samples/sec   Loss 2.8882   LearningRate 0.0529   Epoch: 5   Global Step: 91020   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 08:51:27,964-Speed 3303.39 samples/sec   Loss 2.8676   LearningRate 0.0529   Epoch: 5   Global Step: 91030   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 08:51:31,108-Speed 3258.22 samples/sec   Loss 2.9158   LearningRate 0.0529   Epoch: 5   Global Step: 91040   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 08:51:34,228-Speed 3283.15 samples/sec   Loss 2.8955   LearningRate 0.0529   Epoch: 5   Global Step: 91050   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 08:51:37,298-Speed 3337.19 samples/sec   Loss 2.9332   LearningRate 0.0529   Epoch: 5   Global Step: 91060   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 08:51:40,372-Speed 3331.74 samples/sec   Loss 2.8599   LearningRate 0.0529   Epoch: 5   Global Step: 91070   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 08:51:43,588-Speed 3184.73 samples/sec   Loss 2.9693   LearningRate 0.0529   Epoch: 5   Global Step: 91080   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 08:51:46,704-Speed 3287.19 samples/sec   Loss 2.8429   LearningRate 0.0529   Epoch: 5   Global Step: 91090   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 08:51:49,818-Speed 3289.46 samples/sec   Loss 2.8592   LearningRate 0.0529   Epoch: 5   Global Step: 91100   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:51:52,894-Speed 3329.41 samples/sec   Loss 2.9392   LearningRate 0.0529   Epoch: 5   Global Step: 91110   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:51:56,011-Speed 3286.23 samples/sec   Loss 2.8780   LearningRate 0.0529   Epoch: 5   Global Step: 91120   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 08:51:59,147-Speed 3266.51 samples/sec   Loss 2.8870   LearningRate 0.0529   Epoch: 5   Global Step: 91130   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 08:52:02,213-Speed 3341.01 samples/sec   Loss 2.9110   LearningRate 0.0528   Epoch: 5   Global Step: 91140   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 08:52:05,349-Speed 3266.07 samples/sec   Loss 2.8418   LearningRate 0.0528   Epoch: 5   Global Step: 91150   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 08:52:08,449-Speed 3303.82 samples/sec   Loss 2.8874   LearningRate 0.0528   Epoch: 5   Global Step: 91160   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 08:52:11,551-Speed 3302.18 samples/sec   Loss 2.8171   LearningRate 0.0528   Epoch: 5   Global Step: 91170   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 08:52:14,683-Speed 3270.47 samples/sec   Loss 2.9925   LearningRate 0.0528   Epoch: 5   Global Step: 91180   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 08:52:17,793-Speed 3293.15 samples/sec   Loss 2.8249   LearningRate 0.0528   Epoch: 5   Global Step: 91190   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 08:52:20,865-Speed 3334.10 samples/sec   Loss 2.9066   LearningRate 0.0528   Epoch: 5   Global Step: 91200   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 08:52:23,980-Speed 3288.56 samples/sec   Loss 2.8893   LearningRate 0.0528   Epoch: 5   Global Step: 91210   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 08:52:27,172-Speed 3208.27 samples/sec   Loss 2.8739   LearningRate 0.0528   Epoch: 5   Global Step: 91220   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:52:30,259-Speed 3318.78 samples/sec   Loss 2.9019   LearningRate 0.0528   Epoch: 5   Global Step: 91230   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:52:33,314-Speed 3351.95 samples/sec   Loss 2.8564   LearningRate 0.0528   Epoch: 5   Global Step: 91240   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-04-11 08:52:36,438-Speed 3278.86 samples/sec   Loss 2.9311   LearningRate 0.0528   Epoch: 5   Global Step: 91250   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-04-11 08:52:39,574-Speed 3266.01 samples/sec   Loss 2.9194   LearningRate 0.0528   Epoch: 5   Global Step: 91260   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-04-11 08:52:42,694-Speed 3283.40 samples/sec   Loss 2.9338   LearningRate 0.0528   Epoch: 5   Global Step: 91270   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-04-11 08:52:45,842-Speed 3253.75 samples/sec   Loss 2.8947   LearningRate 0.0528   Epoch: 5   Global Step: 91280   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-04-11 08:52:48,910-Speed 3338.40 samples/sec   Loss 2.9083   LearningRate 0.0528   Epoch: 5   Global Step: 91290   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-04-11 08:52:52,012-Speed 3302.10 samples/sec   Loss 2.9753   LearningRate 0.0528   Epoch: 5   Global Step: 91300   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-04-11 08:52:55,075-Speed 3343.27 samples/sec   Loss 2.9329   LearningRate 0.0528   Epoch: 5   Global Step: 91310   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-04-11 08:52:58,135-Speed 3347.40 samples/sec   Loss 2.8974   LearningRate 0.0528   Epoch: 5   Global Step: 91320   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-04-11 08:53:01,210-Speed 3330.75 samples/sec   Loss 2.9375   LearningRate 0.0528   Epoch: 5   Global Step: 91330   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-04-11 08:53:04,273-Speed 3344.73 samples/sec   Loss 2.8332   LearningRate 0.0528   Epoch: 5   Global Step: 91340   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 08:53:07,341-Speed 3338.47 samples/sec   Loss 2.9422   LearningRate 0.0528   Epoch: 5   Global Step: 91350   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 08:53:10,421-Speed 3325.07 samples/sec   Loss 2.8671   LearningRate 0.0528   Epoch: 5   Global Step: 91360   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 08:53:13,488-Speed 3340.35 samples/sec   Loss 2.9536   LearningRate 0.0527   Epoch: 5   Global Step: 91370   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 08:53:16,550-Speed 3344.51 samples/sec   Loss 2.8188   LearningRate 0.0527   Epoch: 5   Global Step: 91380   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 08:53:19,626-Speed 3330.66 samples/sec   Loss 2.8907   LearningRate 0.0527   Epoch: 5   Global Step: 91390   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 08:53:22,688-Speed 3344.50 samples/sec   Loss 2.9641   LearningRate 0.0527   Epoch: 5   Global Step: 91400   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 08:53:25,757-Speed 3338.32 samples/sec   Loss 2.8555   LearningRate 0.0527   Epoch: 5   Global Step: 91410   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 08:53:28,817-Speed 3346.46 samples/sec   Loss 2.9007   LearningRate 0.0527   Epoch: 5   Global Step: 91420   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 08:53:31,894-Speed 3328.62 samples/sec   Loss 2.9515   LearningRate 0.0527   Epoch: 5   Global Step: 91430   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 08:53:34,960-Speed 3340.57 samples/sec   Loss 2.8910   LearningRate 0.0527   Epoch: 5   Global Step: 91440   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:53:38,092-Speed 3270.77 samples/sec   Loss 2.8841   LearningRate 0.0527   Epoch: 5   Global Step: 91450   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:53:41,159-Speed 3338.87 samples/sec   Loss 2.9551   LearningRate 0.0527   Epoch: 5   Global Step: 91460   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 08:53:44,228-Speed 3337.61 samples/sec   Loss 2.9645   LearningRate 0.0527   Epoch: 5   Global Step: 91470   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 08:53:47,294-Speed 3340.75 samples/sec   Loss 2.8970   LearningRate 0.0527   Epoch: 5   Global Step: 91480   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 08:53:50,358-Speed 3343.66 samples/sec   Loss 2.9339   LearningRate 0.0527   Epoch: 5   Global Step: 91490   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 08:53:53,439-Speed 3323.34 samples/sec   Loss 2.9941   LearningRate 0.0527   Epoch: 5   Global Step: 91500   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 08:53:56,501-Speed 3345.16 samples/sec   Loss 2.9002   LearningRate 0.0527   Epoch: 5   Global Step: 91510   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 08:53:59,580-Speed 3326.94 samples/sec   Loss 2.9608   LearningRate 0.0527   Epoch: 5   Global Step: 91520   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 08:54:02,660-Speed 3325.68 samples/sec   Loss 3.0072   LearningRate 0.0527   Epoch: 5   Global Step: 91530   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 08:54:05,730-Speed 3335.89 samples/sec   Loss 2.8665   LearningRate 0.0527   Epoch: 5   Global Step: 91540   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 08:54:08,795-Speed 3342.12 samples/sec   Loss 2.9212   LearningRate 0.0527   Epoch: 5   Global Step: 91550   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 08:54:11,869-Speed 3332.51 samples/sec   Loss 2.9244   LearningRate 0.0527   Epoch: 5   Global Step: 91560   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:54:14,958-Speed 3315.37 samples/sec   Loss 2.8915   LearningRate 0.0527   Epoch: 5   Global Step: 91570   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:54:18,021-Speed 3344.11 samples/sec   Loss 2.8900   LearningRate 0.0527   Epoch: 5   Global Step: 91580   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:54:21,087-Speed 3340.38 samples/sec   Loss 2.8838   LearningRate 0.0527   Epoch: 5   Global Step: 91590   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:54:24,162-Speed 3331.19 samples/sec   Loss 2.8602   LearningRate 0.0526   Epoch: 5   Global Step: 91600   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:54:27,238-Speed 3330.69 samples/sec   Loss 2.9920   LearningRate 0.0526   Epoch: 5   Global Step: 91610   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:54:30,316-Speed 3327.59 samples/sec   Loss 2.9930   LearningRate 0.0526   Epoch: 5   Global Step: 91620   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:54:33,477-Speed 3239.25 samples/sec   Loss 2.9303   LearningRate 0.0526   Epoch: 5   Global Step: 91630   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:54:36,548-Speed 3336.13 samples/sec   Loss 2.9266   LearningRate 0.0526   Epoch: 5   Global Step: 91640   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 08:54:39,622-Speed 3332.69 samples/sec   Loss 2.8953   LearningRate 0.0526   Epoch: 5   Global Step: 91650   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 08:54:42,751-Speed 3272.74 samples/sec   Loss 2.9627   LearningRate 0.0526   Epoch: 5   Global Step: 91660   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 08:54:45,841-Speed 3315.48 samples/sec   Loss 2.9418   LearningRate 0.0526   Epoch: 5   Global Step: 91670   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 08:54:49,007-Speed 3235.15 samples/sec   Loss 2.9234   LearningRate 0.0526   Epoch: 5   Global Step: 91680   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 08:54:52,102-Speed 3309.18 samples/sec   Loss 2.9217   LearningRate 0.0526   Epoch: 5   Global Step: 91690   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 08:54:55,201-Speed 3305.34 samples/sec   Loss 2.9205   LearningRate 0.0526   Epoch: 5   Global Step: 91700   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 08:54:58,284-Speed 3323.55 samples/sec   Loss 2.9368   LearningRate 0.0526   Epoch: 5   Global Step: 91710   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 08:55:01,351-Speed 3339.66 samples/sec   Loss 2.9167   LearningRate 0.0526   Epoch: 5   Global Step: 91720   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 08:55:04,419-Speed 3337.58 samples/sec   Loss 2.8712   LearningRate 0.0526   Epoch: 5   Global Step: 91730   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 08:55:07,501-Speed 3324.15 samples/sec   Loss 2.8933   LearningRate 0.0526   Epoch: 5   Global Step: 91740   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:55:10,587-Speed 3318.92 samples/sec   Loss 2.9614   LearningRate 0.0526   Epoch: 5   Global Step: 91750   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-11 08:55:13,724-Speed 3265.09 samples/sec   Loss 2.9235   LearningRate 0.0526   Epoch: 5   Global Step: 91760   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-11 08:55:16,798-Speed 3331.66 samples/sec   Loss 2.8853   LearningRate 0.0526   Epoch: 5   Global Step: 91770   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-11 08:55:19,866-Speed 3338.93 samples/sec   Loss 2.8967   LearningRate 0.0526   Epoch: 5   Global Step: 91780   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-11 08:55:22,934-Speed 3338.81 samples/sec   Loss 2.8648   LearningRate 0.0526   Epoch: 5   Global Step: 91790   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-11 08:55:26,018-Speed 3321.47 samples/sec   Loss 2.8561   LearningRate 0.0526   Epoch: 5   Global Step: 91800   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-11 08:55:29,110-Speed 3312.39 samples/sec   Loss 2.9055   LearningRate 0.0526   Epoch: 5   Global Step: 91810   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-11 08:55:32,196-Speed 3318.55 samples/sec   Loss 2.8816   LearningRate 0.0526   Epoch: 5   Global Step: 91820   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-11 08:55:35,266-Speed 3335.62 samples/sec   Loss 2.9176   LearningRate 0.0525   Epoch: 5   Global Step: 91830   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-11 08:55:38,395-Speed 3273.82 samples/sec   Loss 2.8016   LearningRate 0.0525   Epoch: 5   Global Step: 91840   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-11 08:55:41,495-Speed 3304.47 samples/sec   Loss 2.8912   LearningRate 0.0525   Epoch: 5   Global Step: 91850   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 08:55:44,559-Speed 3342.94 samples/sec   Loss 2.8554   LearningRate 0.0525   Epoch: 5   Global Step: 91860   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 08:55:47,629-Speed 3335.48 samples/sec   Loss 2.8581   LearningRate 0.0525   Epoch: 5   Global Step: 91870   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 08:55:50,697-Speed 3339.20 samples/sec   Loss 2.8858   LearningRate 0.0525   Epoch: 5   Global Step: 91880   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 08:55:53,761-Speed 3342.70 samples/sec   Loss 2.8871   LearningRate 0.0525   Epoch: 5   Global Step: 91890   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 08:55:56,956-Speed 3205.56 samples/sec   Loss 2.8654   LearningRate 0.0525   Epoch: 5   Global Step: 91900   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 08:56:00,005-Speed 3358.95 samples/sec   Loss 2.8814   LearningRate 0.0525   Epoch: 5   Global Step: 91910   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-11 08:56:03,070-Speed 3341.69 samples/sec   Loss 2.8514   LearningRate 0.0525   Epoch: 5   Global Step: 91920   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-11 08:56:06,161-Speed 3313.96 samples/sec   Loss 2.8696   LearningRate 0.0525   Epoch: 5   Global Step: 91930   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-11 08:56:09,240-Speed 3326.68 samples/sec   Loss 2.9716   LearningRate 0.0525   Epoch: 5   Global Step: 91940   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-11 08:56:12,300-Speed 3347.28 samples/sec   Loss 2.9185   LearningRate 0.0525   Epoch: 5   Global Step: 91950   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-04-11 08:56:15,421-Speed 3281.42 samples/sec   Loss 2.9702   LearningRate 0.0525   Epoch: 5   Global Step: 91960   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-04-11 08:56:18,562-Speed 3261.74 samples/sec   Loss 2.8749   LearningRate 0.0525   Epoch: 5   Global Step: 91970   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-04-11 08:56:21,648-Speed 3319.04 samples/sec   Loss 2.8734   LearningRate 0.0525   Epoch: 5   Global Step: 91980   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-04-11 08:56:24,722-Speed 3331.99 samples/sec   Loss 2.9033   LearningRate 0.0525   Epoch: 5   Global Step: 91990   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-04-11 08:56:27,882-Speed 3241.58 samples/sec   Loss 2.8927   LearningRate 0.0525   Epoch: 5   Global Step: 92000   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-04-11 08:57:11,771-[lfw][92000]XNorm: 21.650391
Training: 2022-04-11 08:57:11,771-[lfw][92000]Accuracy-Flip: 0.99750+-0.00291
Training: 2022-04-11 08:57:11,772-[lfw][92000]Accuracy-Highest: 0.99817
Training: 2022-04-11 08:58:02,677-[cfp_fp][92000]XNorm: 20.560312
Training: 2022-04-11 08:58:02,677-[cfp_fp][92000]Accuracy-Flip: 0.98500+-0.00576
Training: 2022-04-11 08:58:02,678-[cfp_fp][92000]Accuracy-Highest: 0.98557
Training: 2022-04-11 08:58:46,437-[agedb_30][92000]XNorm: 21.564771
Training: 2022-04-11 08:58:46,438-[agedb_30][92000]Accuracy-Flip: 0.98050+-0.00727
Training: 2022-04-11 08:58:46,438-[agedb_30][92000]Accuracy-Highest: 0.98200
Training: 2022-04-11 08:58:49,501-Speed 72.31 samples/sec   Loss 2.8286   LearningRate 0.0525   Epoch: 5   Global Step: 92010   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-04-11 08:58:52,571-Speed 3337.00 samples/sec   Loss 2.9615   LearningRate 0.0525   Epoch: 5   Global Step: 92020   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-04-11 08:58:55,691-Speed 3282.38 samples/sec   Loss 2.8950   LearningRate 0.0525   Epoch: 5   Global Step: 92030   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-04-11 08:58:58,841-Speed 3252.40 samples/sec   Loss 2.8878   LearningRate 0.0525   Epoch: 5   Global Step: 92040   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-04-11 08:59:01,900-Speed 3348.07 samples/sec   Loss 3.0105   LearningRate 0.0525   Epoch: 5   Global Step: 92050   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 08:59:04,980-Speed 3325.55 samples/sec   Loss 2.8717   LearningRate 0.0524   Epoch: 5   Global Step: 92060   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 08:59:08,034-Speed 3353.90 samples/sec   Loss 2.9222   LearningRate 0.0524   Epoch: 5   Global Step: 92070   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 08:59:11,095-Speed 3346.34 samples/sec   Loss 2.9951   LearningRate 0.0524   Epoch: 5   Global Step: 92080   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 08:59:14,237-Speed 3259.40 samples/sec   Loss 2.9266   LearningRate 0.0524   Epoch: 5   Global Step: 92090   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 08:59:17,324-Speed 3318.36 samples/sec   Loss 2.9092   LearningRate 0.0524   Epoch: 5   Global Step: 92100   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 08:59:20,382-Speed 3348.62 samples/sec   Loss 2.9238   LearningRate 0.0524   Epoch: 5   Global Step: 92110   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 08:59:23,445-Speed 3344.94 samples/sec   Loss 2.8263   LearningRate 0.0524   Epoch: 5   Global Step: 92120   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 08:59:26,508-Speed 3343.31 samples/sec   Loss 2.9413   LearningRate 0.0524   Epoch: 5   Global Step: 92130   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 08:59:29,579-Speed 3336.11 samples/sec   Loss 2.9464   LearningRate 0.0524   Epoch: 5   Global Step: 92140   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 08:59:32,648-Speed 3337.27 samples/sec   Loss 2.8530   LearningRate 0.0524   Epoch: 5   Global Step: 92150   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:59:35,721-Speed 3331.95 samples/sec   Loss 2.9168   LearningRate 0.0524   Epoch: 5   Global Step: 92160   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 08:59:38,776-Speed 3353.29 samples/sec   Loss 2.8683   LearningRate 0.0524   Epoch: 5   Global Step: 92170   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 08:59:41,839-Speed 3343.74 samples/sec   Loss 2.8886   LearningRate 0.0524   Epoch: 5   Global Step: 92180   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 08:59:44,919-Speed 3325.77 samples/sec   Loss 2.8713   LearningRate 0.0524   Epoch: 5   Global Step: 92190   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 08:59:48,002-Speed 3323.08 samples/sec   Loss 2.8896   LearningRate 0.0524   Epoch: 5   Global Step: 92200   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 08:59:51,067-Speed 3342.70 samples/sec   Loss 2.9222   LearningRate 0.0524   Epoch: 5   Global Step: 92210   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 08:59:54,157-Speed 3314.17 samples/sec   Loss 2.9083   LearningRate 0.0524   Epoch: 5   Global Step: 92220   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 08:59:57,252-Speed 3309.73 samples/sec   Loss 2.9046   LearningRate 0.0524   Epoch: 5   Global Step: 92230   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 09:00:00,386-Speed 3268.68 samples/sec   Loss 2.9203   LearningRate 0.0524   Epoch: 5   Global Step: 92240   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 09:00:03,496-Speed 3292.81 samples/sec   Loss 2.9075   LearningRate 0.0524   Epoch: 5   Global Step: 92250   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 09:00:06,589-Speed 3311.50 samples/sec   Loss 2.9472   LearningRate 0.0524   Epoch: 5   Global Step: 92260   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 09:00:09,649-Speed 3347.27 samples/sec   Loss 2.9530   LearningRate 0.0524   Epoch: 5   Global Step: 92270   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 09:00:12,719-Speed 3337.13 samples/sec   Loss 2.9056   LearningRate 0.0524   Epoch: 5   Global Step: 92280   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 09:00:15,847-Speed 3273.74 samples/sec   Loss 2.8838   LearningRate 0.0524   Epoch: 5   Global Step: 92290   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-11 09:00:18,922-Speed 3331.27 samples/sec   Loss 2.8112   LearningRate 0.0523   Epoch: 5   Global Step: 92300   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 09:00:22,034-Speed 3291.58 samples/sec   Loss 2.9894   LearningRate 0.0523   Epoch: 5   Global Step: 92310   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 09:00:25,121-Speed 3317.91 samples/sec   Loss 2.9056   LearningRate 0.0523   Epoch: 5   Global Step: 92320   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 09:00:28,185-Speed 3343.01 samples/sec   Loss 2.8796   LearningRate 0.0523   Epoch: 5   Global Step: 92330   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 09:00:31,245-Speed 3347.02 samples/sec   Loss 2.9643   LearningRate 0.0523   Epoch: 5   Global Step: 92340   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 09:00:34,316-Speed 3335.25 samples/sec   Loss 2.8801   LearningRate 0.0523   Epoch: 5   Global Step: 92350   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 09:00:37,370-Speed 3353.85 samples/sec   Loss 2.8668   LearningRate 0.0523   Epoch: 5   Global Step: 92360   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 09:00:40,453-Speed 3322.02 samples/sec   Loss 2.8115   LearningRate 0.0523   Epoch: 5   Global Step: 92370   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 09:00:43,655-Speed 3200.56 samples/sec   Loss 2.8903   LearningRate 0.0523   Epoch: 5   Global Step: 92380   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 09:00:46,746-Speed 3313.00 samples/sec   Loss 2.9277   LearningRate 0.0523   Epoch: 5   Global Step: 92390   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-11 09:00:49,822-Speed 3329.49 samples/sec   Loss 2.9366   LearningRate 0.0523   Epoch: 5   Global Step: 92400   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:00:52,913-Speed 3314.60 samples/sec   Loss 2.9861   LearningRate 0.0523   Epoch: 5   Global Step: 92410   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:00:55,991-Speed 3327.88 samples/sec   Loss 2.8483   LearningRate 0.0523   Epoch: 5   Global Step: 92420   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-11 09:00:59,079-Speed 3316.83 samples/sec   Loss 2.9282   LearningRate 0.0523   Epoch: 5   Global Step: 92430   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-11 09:01:02,140-Speed 3346.10 samples/sec   Loss 2.8587   LearningRate 0.0523   Epoch: 5   Global Step: 92440   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-11 09:01:05,282-Speed 3259.83 samples/sec   Loss 2.9674   LearningRate 0.0523   Epoch: 5   Global Step: 92450   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-11 09:01:08,352-Speed 3337.12 samples/sec   Loss 2.9555   LearningRate 0.0523   Epoch: 5   Global Step: 92460   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-11 09:01:11,446-Speed 3310.43 samples/sec   Loss 2.9100   LearningRate 0.0523   Epoch: 5   Global Step: 92470   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-11 09:01:14,566-Speed 3283.45 samples/sec   Loss 2.9415   LearningRate 0.0523   Epoch: 5   Global Step: 92480   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-11 09:01:17,625-Speed 3348.23 samples/sec   Loss 2.9710   LearningRate 0.0523   Epoch: 5   Global Step: 92490   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-11 09:01:20,686-Speed 3345.97 samples/sec   Loss 2.8519   LearningRate 0.0523   Epoch: 5   Global Step: 92500   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-11 09:01:23,752-Speed 3340.85 samples/sec   Loss 2.8271   LearningRate 0.0523   Epoch: 5   Global Step: 92510   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-11 09:01:26,852-Speed 3304.40 samples/sec   Loss 2.9207   LearningRate 0.0523   Epoch: 5   Global Step: 92520   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:01:29,946-Speed 3309.94 samples/sec   Loss 2.8323   LearningRate 0.0522   Epoch: 5   Global Step: 92530   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:01:33,033-Speed 3318.38 samples/sec   Loss 2.9266   LearningRate 0.0522   Epoch: 5   Global Step: 92540   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:01:36,192-Speed 3242.37 samples/sec   Loss 2.8998   LearningRate 0.0522   Epoch: 5   Global Step: 92550   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:01:39,337-Speed 3256.69 samples/sec   Loss 2.9114   LearningRate 0.0522   Epoch: 5   Global Step: 92560   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:01:42,456-Speed 3283.45 samples/sec   Loss 2.8536   LearningRate 0.0522   Epoch: 5   Global Step: 92570   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:01:45,557-Speed 3303.29 samples/sec   Loss 2.9127   LearningRate 0.0522   Epoch: 5   Global Step: 92580   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:01:48,673-Speed 3286.87 samples/sec   Loss 2.8890   LearningRate 0.0522   Epoch: 5   Global Step: 92590   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:01:51,737-Speed 3343.29 samples/sec   Loss 2.8759   LearningRate 0.0522   Epoch: 5   Global Step: 92600   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:01:54,797-Speed 3347.37 samples/sec   Loss 2.8788   LearningRate 0.0522   Epoch: 5   Global Step: 92610   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:01:57,854-Speed 3351.43 samples/sec   Loss 2.9103   LearningRate 0.0522   Epoch: 5   Global Step: 92620   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:02:00,943-Speed 3315.42 samples/sec   Loss 2.9308   LearningRate 0.0522   Epoch: 5   Global Step: 92630   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:02:04,030-Speed 3317.45 samples/sec   Loss 2.8536   LearningRate 0.0522   Epoch: 5   Global Step: 92640   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:02:07,119-Speed 3316.34 samples/sec   Loss 2.8864   LearningRate 0.0522   Epoch: 5   Global Step: 92650   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:02:10,198-Speed 3326.16 samples/sec   Loss 2.8803   LearningRate 0.0522   Epoch: 5   Global Step: 92660   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:02:13,265-Speed 3340.14 samples/sec   Loss 2.8435   LearningRate 0.0522   Epoch: 5   Global Step: 92670   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:02:16,330-Speed 3341.31 samples/sec   Loss 2.9022   LearningRate 0.0522   Epoch: 5   Global Step: 92680   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:02:19,398-Speed 3338.90 samples/sec   Loss 2.9552   LearningRate 0.0522   Epoch: 5   Global Step: 92690   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:02:22,456-Speed 3348.83 samples/sec   Loss 2.8857   LearningRate 0.0522   Epoch: 5   Global Step: 92700   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:02:25,542-Speed 3320.01 samples/sec   Loss 2.8958   LearningRate 0.0522   Epoch: 5   Global Step: 92710   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:02:28,592-Speed 3357.86 samples/sec   Loss 2.8498   LearningRate 0.0522   Epoch: 5   Global Step: 92720   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:02:31,650-Speed 3349.28 samples/sec   Loss 2.8859   LearningRate 0.0522   Epoch: 5   Global Step: 92730   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:02:34,708-Speed 3349.68 samples/sec   Loss 2.9039   LearningRate 0.0522   Epoch: 5   Global Step: 92740   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:02:37,765-Speed 3350.46 samples/sec   Loss 2.8841   LearningRate 0.0522   Epoch: 5   Global Step: 92750   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:02:40,827-Speed 3344.53 samples/sec   Loss 2.8611   LearningRate 0.0521   Epoch: 5   Global Step: 92760   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:02:43,921-Speed 3310.84 samples/sec   Loss 2.8778   LearningRate 0.0521   Epoch: 5   Global Step: 92770   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-11 09:02:47,025-Speed 3300.15 samples/sec   Loss 2.8949   LearningRate 0.0521   Epoch: 5   Global Step: 92780   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-11 09:02:50,095-Speed 3336.39 samples/sec   Loss 2.9308   LearningRate 0.0521   Epoch: 5   Global Step: 92790   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-11 09:02:53,175-Speed 3325.74 samples/sec   Loss 2.9190   LearningRate 0.0521   Epoch: 5   Global Step: 92800   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-11 09:02:56,251-Speed 3329.09 samples/sec   Loss 2.9157   LearningRate 0.0521   Epoch: 5   Global Step: 92810   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-11 09:02:59,320-Speed 3338.44 samples/sec   Loss 2.8446   LearningRate 0.0521   Epoch: 5   Global Step: 92820   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-11 09:03:02,410-Speed 3314.83 samples/sec   Loss 2.8981   LearningRate 0.0521   Epoch: 5   Global Step: 92830   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-04-11 09:03:05,485-Speed 3330.51 samples/sec   Loss 2.9750   LearningRate 0.0521   Epoch: 5   Global Step: 92840   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-04-11 09:03:08,549-Speed 3342.99 samples/sec   Loss 2.9437   LearningRate 0.0521   Epoch: 5   Global Step: 92850   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-04-11 09:03:11,616-Speed 3339.74 samples/sec   Loss 2.8692   LearningRate 0.0521   Epoch: 5   Global Step: 92860   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-04-11 09:03:14,694-Speed 3328.24 samples/sec   Loss 2.8950   LearningRate 0.0521   Epoch: 5   Global Step: 92870   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-04-11 09:03:17,761-Speed 3338.87 samples/sec   Loss 2.7848   LearningRate 0.0521   Epoch: 5   Global Step: 92880   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-04-11 09:03:20,904-Speed 3258.97 samples/sec   Loss 2.8483   LearningRate 0.0521   Epoch: 5   Global Step: 92890   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-04-11 09:03:24,120-Speed 3185.66 samples/sec   Loss 2.9655   LearningRate 0.0521   Epoch: 5   Global Step: 92900   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-04-11 09:03:27,233-Speed 3289.49 samples/sec   Loss 2.8640   LearningRate 0.0521   Epoch: 5   Global Step: 92910   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-04-11 09:03:30,312-Speed 3326.90 samples/sec   Loss 2.9005   LearningRate 0.0521   Epoch: 5   Global Step: 92920   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-04-11 09:03:33,374-Speed 3345.06 samples/sec   Loss 2.9377   LearningRate 0.0521   Epoch: 5   Global Step: 92930   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-11 09:03:36,533-Speed 3242.97 samples/sec   Loss 2.8359   LearningRate 0.0521   Epoch: 5   Global Step: 92940   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-11 09:03:39,607-Speed 3331.91 samples/sec   Loss 2.9556   LearningRate 0.0521   Epoch: 5   Global Step: 92950   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-11 09:03:42,661-Speed 3352.77 samples/sec   Loss 2.9538   LearningRate 0.0521   Epoch: 5   Global Step: 92960   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-11 09:03:45,735-Speed 3333.23 samples/sec   Loss 2.8972   LearningRate 0.0521   Epoch: 5   Global Step: 92970   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-11 09:03:48,804-Speed 3336.79 samples/sec   Loss 2.9202   LearningRate 0.0521   Epoch: 5   Global Step: 92980   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-11 09:03:51,877-Speed 3333.47 samples/sec   Loss 2.7672   LearningRate 0.0520   Epoch: 5   Global Step: 92990   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-11 09:03:54,965-Speed 3316.57 samples/sec   Loss 2.8983   LearningRate 0.0520   Epoch: 5   Global Step: 93000   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-11 09:03:58,047-Speed 3323.40 samples/sec   Loss 2.8596   LearningRate 0.0520   Epoch: 5   Global Step: 93010   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-11 09:04:01,111-Speed 3343.30 samples/sec   Loss 2.8893   LearningRate 0.0520   Epoch: 5   Global Step: 93020   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-11 09:04:04,175-Speed 3342.75 samples/sec   Loss 2.9060   LearningRate 0.0520   Epoch: 5   Global Step: 93030   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:04:07,259-Speed 3321.54 samples/sec   Loss 2.9341   LearningRate 0.0520   Epoch: 5   Global Step: 93040   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:04:10,331-Speed 3334.56 samples/sec   Loss 2.9161   LearningRate 0.0520   Epoch: 5   Global Step: 93050   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:04:13,395-Speed 3342.79 samples/sec   Loss 2.8225   LearningRate 0.0520   Epoch: 5   Global Step: 93060   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:04:16,448-Speed 3355.06 samples/sec   Loss 2.8358   LearningRate 0.0520   Epoch: 5   Global Step: 93070   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-11 09:04:19,516-Speed 3337.96 samples/sec   Loss 2.7676   LearningRate 0.0520   Epoch: 5   Global Step: 93080   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-11 09:04:22,587-Speed 3335.21 samples/sec   Loss 2.8215   LearningRate 0.0520   Epoch: 5   Global Step: 93090   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-11 09:04:25,653-Speed 3340.92 samples/sec   Loss 2.9445   LearningRate 0.0520   Epoch: 5   Global Step: 93100   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-11 09:04:28,786-Speed 3269.87 samples/sec   Loss 2.9910   LearningRate 0.0520   Epoch: 5   Global Step: 93110   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-11 09:04:31,843-Speed 3350.34 samples/sec   Loss 2.8468   LearningRate 0.0520   Epoch: 5   Global Step: 93120   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-11 09:04:34,925-Speed 3323.31 samples/sec   Loss 2.9628   LearningRate 0.0520   Epoch: 5   Global Step: 93130   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-11 09:04:38,012-Speed 3318.06 samples/sec   Loss 2.9016   LearningRate 0.0520   Epoch: 5   Global Step: 93140   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-11 09:04:41,074-Speed 3345.30 samples/sec   Loss 2.8721   LearningRate 0.0520   Epoch: 5   Global Step: 93150   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-11 09:04:44,161-Speed 3317.73 samples/sec   Loss 2.8975   LearningRate 0.0520   Epoch: 5   Global Step: 93160   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-11 09:04:47,289-Speed 3274.70 samples/sec   Loss 2.9399   LearningRate 0.0520   Epoch: 5   Global Step: 93170   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-11 09:04:50,363-Speed 3331.96 samples/sec   Loss 2.9433   LearningRate 0.0520   Epoch: 5   Global Step: 93180   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-11 09:04:53,447-Speed 3321.64 samples/sec   Loss 2.9773   LearningRate 0.0520   Epoch: 5   Global Step: 93190   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-11 09:04:56,517-Speed 3336.08 samples/sec   Loss 2.8774   LearningRate 0.0520   Epoch: 5   Global Step: 93200   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-11 09:04:59,588-Speed 3335.32 samples/sec   Loss 2.8643   LearningRate 0.0520   Epoch: 5   Global Step: 93210   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-11 09:05:02,655-Speed 3339.07 samples/sec   Loss 2.8934   LearningRate 0.0519   Epoch: 5   Global Step: 93220   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-11 09:05:05,745-Speed 3315.75 samples/sec   Loss 2.9457   LearningRate 0.0519   Epoch: 5   Global Step: 93230   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-11 09:05:08,808-Speed 3342.91 samples/sec   Loss 2.9021   LearningRate 0.0519   Epoch: 5   Global Step: 93240   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-11 09:05:11,897-Speed 3316.17 samples/sec   Loss 2.9237   LearningRate 0.0519   Epoch: 5   Global Step: 93250   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-11 09:05:15,012-Speed 3288.24 samples/sec   Loss 2.8282   LearningRate 0.0519   Epoch: 5   Global Step: 93260   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-11 09:05:18,110-Speed 3307.21 samples/sec   Loss 2.9057   LearningRate 0.0519   Epoch: 5   Global Step: 93270   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:05:21,261-Speed 3250.50 samples/sec   Loss 2.9228   LearningRate 0.0519   Epoch: 5   Global Step: 93280   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:05:24,336-Speed 3329.69 samples/sec   Loss 2.9373   LearningRate 0.0519   Epoch: 5   Global Step: 93290   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:05:27,437-Speed 3303.28 samples/sec   Loss 2.9260   LearningRate 0.0519   Epoch: 5   Global Step: 93300   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:05:30,504-Speed 3340.13 samples/sec   Loss 2.8224   LearningRate 0.0519   Epoch: 5   Global Step: 93310   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:05:33,558-Speed 3354.17 samples/sec   Loss 2.9139   LearningRate 0.0519   Epoch: 5   Global Step: 93320   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-11 09:05:36,637-Speed 3326.12 samples/sec   Loss 2.9302   LearningRate 0.0519   Epoch: 5   Global Step: 93330   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-11 09:05:39,713-Speed 3330.50 samples/sec   Loss 2.8955   LearningRate 0.0519   Epoch: 5   Global Step: 93340   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-11 09:05:42,772-Speed 3347.98 samples/sec   Loss 2.8942   LearningRate 0.0519   Epoch: 5   Global Step: 93350   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-11 09:05:45,848-Speed 3329.78 samples/sec   Loss 2.8970   LearningRate 0.0519   Epoch: 5   Global Step: 93360   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-11 09:05:48,910-Speed 3345.59 samples/sec   Loss 2.8530   LearningRate 0.0519   Epoch: 5   Global Step: 93370   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-11 09:05:52,029-Speed 3285.16 samples/sec   Loss 2.8708   LearningRate 0.0519   Epoch: 5   Global Step: 93380   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-11 09:05:55,121-Speed 3313.12 samples/sec   Loss 2.9215   LearningRate 0.0519   Epoch: 5   Global Step: 93390   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-11 09:05:58,250-Speed 3272.96 samples/sec   Loss 2.9166   LearningRate 0.0519   Epoch: 5   Global Step: 93400   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-11 09:06:01,431-Speed 3220.30 samples/sec   Loss 2.8816   LearningRate 0.0519   Epoch: 5   Global Step: 93410   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-11 09:06:04,526-Speed 3309.22 samples/sec   Loss 2.9142   LearningRate 0.0519   Epoch: 5   Global Step: 93420   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:06:07,630-Speed 3300.56 samples/sec   Loss 2.9541   LearningRate 0.0519   Epoch: 5   Global Step: 93430   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:06:10,792-Speed 3238.58 samples/sec   Loss 2.8791   LearningRate 0.0519   Epoch: 5   Global Step: 93440   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:06:13,869-Speed 3328.96 samples/sec   Loss 2.9247   LearningRate 0.0518   Epoch: 5   Global Step: 93450   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:06:16,933-Speed 3343.02 samples/sec   Loss 2.8936   LearningRate 0.0518   Epoch: 5   Global Step: 93460   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:06:20,108-Speed 3226.72 samples/sec   Loss 2.9318   LearningRate 0.0518   Epoch: 5   Global Step: 93470   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:06:23,288-Speed 3220.30 samples/sec   Loss 2.8843   LearningRate 0.0518   Epoch: 5   Global Step: 93480   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-11 09:06:26,372-Speed 3322.16 samples/sec   Loss 2.8581   LearningRate 0.0518   Epoch: 5   Global Step: 93490   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-11 09:06:29,438-Speed 3341.09 samples/sec   Loss 2.9198   LearningRate 0.0518   Epoch: 5   Global Step: 93500   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-11 09:06:32,505-Speed 3339.96 samples/sec   Loss 2.9297   LearningRate 0.0518   Epoch: 5   Global Step: 93510   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-11 09:06:35,566-Speed 3346.36 samples/sec   Loss 2.9137   LearningRate 0.0518   Epoch: 5   Global Step: 93520   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-11 09:06:38,646-Speed 3325.20 samples/sec   Loss 2.8562   LearningRate 0.0518   Epoch: 5   Global Step: 93530   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-11 09:06:41,877-Speed 3170.48 samples/sec   Loss 2.8622   LearningRate 0.0518   Epoch: 5   Global Step: 93540   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-11 09:06:44,977-Speed 3303.44 samples/sec   Loss 2.8629   LearningRate 0.0518   Epoch: 5   Global Step: 93550   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-11 09:06:48,047-Speed 3336.97 samples/sec   Loss 2.8924   LearningRate 0.0518   Epoch: 5   Global Step: 93560   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-11 09:06:51,231-Speed 3216.29 samples/sec   Loss 2.9074   LearningRate 0.0518   Epoch: 5   Global Step: 93570   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-11 09:06:54,361-Speed 3272.94 samples/sec   Loss 2.8799   LearningRate 0.0518   Epoch: 5   Global Step: 93580   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-11 09:06:57,454-Speed 3311.83 samples/sec   Loss 2.9410   LearningRate 0.0518   Epoch: 5   Global Step: 93590   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-11 09:07:00,525-Speed 3335.39 samples/sec   Loss 2.8894   LearningRate 0.0518   Epoch: 5   Global Step: 93600   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-11 09:07:03,588-Speed 3344.43 samples/sec   Loss 2.9557   LearningRate 0.0518   Epoch: 5   Global Step: 93610   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-11 09:07:06,656-Speed 3338.63 samples/sec   Loss 2.8377   LearningRate 0.0518   Epoch: 5   Global Step: 93620   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-11 09:07:09,730-Speed 3332.05 samples/sec   Loss 2.9826   LearningRate 0.0518   Epoch: 5   Global Step: 93630   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-11 09:07:12,795-Speed 3341.75 samples/sec   Loss 2.8760   LearningRate 0.0518   Epoch: 5   Global Step: 93640   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-11 09:07:15,876-Speed 3325.91 samples/sec   Loss 2.9404   LearningRate 0.0518   Epoch: 5   Global Step: 93650   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-11 09:07:18,937-Speed 3345.75 samples/sec   Loss 2.7622   LearningRate 0.0518   Epoch: 5   Global Step: 93660   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-11 09:07:21,998-Speed 3346.85 samples/sec   Loss 2.8593   LearningRate 0.0518   Epoch: 5   Global Step: 93670   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-11 09:07:25,097-Speed 3305.01 samples/sec   Loss 2.8597   LearningRate 0.0517   Epoch: 5   Global Step: 93680   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:07:28,198-Speed 3303.20 samples/sec   Loss 2.8793   LearningRate 0.0517   Epoch: 5   Global Step: 93690   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:07:31,259-Speed 3346.74 samples/sec   Loss 2.8789   LearningRate 0.0517   Epoch: 5   Global Step: 93700   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:07:34,322-Speed 3343.28 samples/sec   Loss 2.8811   LearningRate 0.0517   Epoch: 5   Global Step: 93710   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:07:37,376-Speed 3354.25 samples/sec   Loss 2.9047   LearningRate 0.0517   Epoch: 5   Global Step: 93720   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-11 09:07:40,546-Speed 3231.78 samples/sec   Loss 2.8759   LearningRate 0.0517   Epoch: 5   Global Step: 93730   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-11 09:07:43,636-Speed 3314.11 samples/sec   Loss 2.9295   LearningRate 0.0517   Epoch: 5   Global Step: 93740   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-11 09:07:46,723-Speed 3318.00 samples/sec   Loss 2.9121   LearningRate 0.0517   Epoch: 5   Global Step: 93750   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-11 09:07:49,948-Speed 3175.76 samples/sec   Loss 2.8601   LearningRate 0.0517   Epoch: 5   Global Step: 93760   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-11 09:07:53,064-Speed 3287.82 samples/sec   Loss 2.9737   LearningRate 0.0517   Epoch: 5   Global Step: 93770   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-11 09:07:56,164-Speed 3303.64 samples/sec   Loss 2.8845   LearningRate 0.0517   Epoch: 5   Global Step: 93780   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-11 09:07:59,282-Speed 3285.45 samples/sec   Loss 2.9376   LearningRate 0.0517   Epoch: 5   Global Step: 93790   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-11 09:08:02,381-Speed 3305.15 samples/sec   Loss 2.9000   LearningRate 0.0517   Epoch: 5   Global Step: 93800   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-11 09:08:05,461-Speed 3325.27 samples/sec   Loss 2.8853   LearningRate 0.0517   Epoch: 5   Global Step: 93810   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-11 09:08:08,628-Speed 3235.05 samples/sec   Loss 2.9198   LearningRate 0.0517   Epoch: 5   Global Step: 93820   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:08:11,742-Speed 3288.39 samples/sec   Loss 2.8974   LearningRate 0.0517   Epoch: 5   Global Step: 93830   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:08:14,846-Speed 3300.10 samples/sec   Loss 2.8816   LearningRate 0.0517   Epoch: 5   Global Step: 93840   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:08:17,961-Speed 3288.43 samples/sec   Loss 2.9224   LearningRate 0.0517   Epoch: 5   Global Step: 93850   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:08:21,086-Speed 3277.12 samples/sec   Loss 2.8995   LearningRate 0.0517   Epoch: 5   Global Step: 93860   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:08:24,156-Speed 3336.05 samples/sec   Loss 2.8886   LearningRate 0.0517   Epoch: 5   Global Step: 93870   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-11 09:08:27,237-Speed 3325.40 samples/sec   Loss 2.9281   LearningRate 0.0517   Epoch: 5   Global Step: 93880   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-11 09:08:30,301-Speed 3343.18 samples/sec   Loss 2.8893   LearningRate 0.0517   Epoch: 5   Global Step: 93890   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-11 09:08:33,364-Speed 3343.30 samples/sec   Loss 2.8876   LearningRate 0.0517   Epoch: 5   Global Step: 93900   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-11 09:08:36,441-Speed 3328.57 samples/sec   Loss 2.9216   LearningRate 0.0517   Epoch: 5   Global Step: 93910   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-11 09:08:39,589-Speed 3254.02 samples/sec   Loss 2.9634   LearningRate 0.0516   Epoch: 5   Global Step: 93920   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-11 09:08:42,775-Speed 3215.56 samples/sec   Loss 2.9479   LearningRate 0.0516   Epoch: 5   Global Step: 93930   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-11 09:08:45,910-Speed 3267.55 samples/sec   Loss 2.8834   LearningRate 0.0516   Epoch: 5   Global Step: 93940   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-11 09:08:49,019-Speed 3293.90 samples/sec   Loss 2.8652   LearningRate 0.0516   Epoch: 5   Global Step: 93950   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-11 09:08:52,178-Speed 3242.66 samples/sec   Loss 2.9330   LearningRate 0.0516   Epoch: 5   Global Step: 93960   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-11 09:08:55,245-Speed 3339.09 samples/sec   Loss 2.9669   LearningRate 0.0516   Epoch: 5   Global Step: 93970   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:08:58,383-Speed 3264.80 samples/sec   Loss 2.8888   LearningRate 0.0516   Epoch: 5   Global Step: 93980   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:09:01,461-Speed 3327.57 samples/sec   Loss 2.9481   LearningRate 0.0516   Epoch: 5   Global Step: 93990   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:09:04,534-Speed 3333.59 samples/sec   Loss 2.8323   LearningRate 0.0516   Epoch: 5   Global Step: 94000   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:09:48,515-[lfw][94000]XNorm: 20.013484
Training: 2022-04-11 09:09:48,516-[lfw][94000]Accuracy-Flip: 0.99800+-0.00296
Training: 2022-04-11 09:09:48,517-[lfw][94000]Accuracy-Highest: 0.99817
Training: 2022-04-11 09:10:39,541-[cfp_fp][94000]XNorm: 18.759799
Training: 2022-04-11 09:10:39,541-[cfp_fp][94000]Accuracy-Flip: 0.98271+-0.00584
Training: 2022-04-11 09:10:39,542-[cfp_fp][94000]Accuracy-Highest: 0.98557
Training: 2022-04-11 09:11:23,416-[agedb_30][94000]XNorm: 20.514254
Training: 2022-04-11 09:11:23,417-[agedb_30][94000]Accuracy-Flip: 0.98133+-0.00666
Training: 2022-04-11 09:11:23,417-[agedb_30][94000]Accuracy-Highest: 0.98200
Training: 2022-04-11 09:11:26,604-Speed 72.08 samples/sec   Loss 2.8181   LearningRate 0.0516   Epoch: 5   Global Step: 94010   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:11:29,657-Speed 3354.69 samples/sec   Loss 2.9232   LearningRate 0.0516   Epoch: 5   Global Step: 94020   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:11:32,751-Speed 3311.06 samples/sec   Loss 2.9619   LearningRate 0.0516   Epoch: 5   Global Step: 94030   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:11:35,925-Speed 3227.07 samples/sec   Loss 2.8879   LearningRate 0.0516   Epoch: 5   Global Step: 94040   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:11:39,024-Speed 3304.59 samples/sec   Loss 2.8418   LearningRate 0.0516   Epoch: 5   Global Step: 94050   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:11:42,115-Speed 3314.18 samples/sec   Loss 2.8290   LearningRate 0.0516   Epoch: 5   Global Step: 94060   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:11:45,363-Speed 3153.96 samples/sec   Loss 2.8450   LearningRate 0.0516   Epoch: 5   Global Step: 94070   Fp16 Grad Scale: 262144   Required: 25 hours
Training: 2022-04-11 09:11:48,488-Speed 3276.87 samples/sec   Loss 2.8799   LearningRate 0.0516   Epoch: 5   Global Step: 94080   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:11:51,586-Speed 3306.09 samples/sec   Loss 2.8964   LearningRate 0.0516   Epoch: 5   Global Step: 94090   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:11:54,646-Speed 3347.43 samples/sec   Loss 2.9011   LearningRate 0.0516   Epoch: 5   Global Step: 94100   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:11:57,719-Speed 3332.94 samples/sec   Loss 2.9125   LearningRate 0.0516   Epoch: 5   Global Step: 94110   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:12:00,778-Speed 3348.55 samples/sec   Loss 2.8223   LearningRate 0.0516   Epoch: 5   Global Step: 94120   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:12:03,853-Speed 3330.80 samples/sec   Loss 2.9384   LearningRate 0.0516   Epoch: 5   Global Step: 94130   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:12:06,914-Speed 3346.63 samples/sec   Loss 2.8991   LearningRate 0.0516   Epoch: 5   Global Step: 94140   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:12:09,971-Speed 3350.87 samples/sec   Loss 2.8419   LearningRate 0.0515   Epoch: 5   Global Step: 94150   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:12:13,031-Speed 3346.21 samples/sec   Loss 2.9423   LearningRate 0.0515   Epoch: 5   Global Step: 94160   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:12:16,115-Speed 3321.93 samples/sec   Loss 2.8869   LearningRate 0.0515   Epoch: 5   Global Step: 94170   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:12:19,169-Speed 3353.93 samples/sec   Loss 2.9890   LearningRate 0.0515   Epoch: 5   Global Step: 94180   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:12:22,233-Speed 3342.85 samples/sec   Loss 2.9602   LearningRate 0.0515   Epoch: 5   Global Step: 94190   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:12:25,303-Speed 3336.84 samples/sec   Loss 2.9925   LearningRate 0.0515   Epoch: 5   Global Step: 94200   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:12:28,362-Speed 3347.56 samples/sec   Loss 2.9395   LearningRate 0.0515   Epoch: 5   Global Step: 94210   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:12:31,442-Speed 3326.02 samples/sec   Loss 2.8754   LearningRate 0.0515   Epoch: 5   Global Step: 94220   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:12:34,558-Speed 3287.49 samples/sec   Loss 2.7827   LearningRate 0.0515   Epoch: 5   Global Step: 94230   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:12:37,622-Speed 3343.11 samples/sec   Loss 2.9289   LearningRate 0.0515   Epoch: 5   Global Step: 94240   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:12:40,681-Speed 3347.62 samples/sec   Loss 2.8822   LearningRate 0.0515   Epoch: 5   Global Step: 94250   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-11 09:12:43,742-Speed 3346.06 samples/sec   Loss 2.8429   LearningRate 0.0515   Epoch: 5   Global Step: 94260   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-11 09:12:46,859-Speed 3286.13 samples/sec   Loss 2.8785   LearningRate 0.0515   Epoch: 5   Global Step: 94270   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-11 09:12:49,947-Speed 3317.16 samples/sec   Loss 2.9536   LearningRate 0.0515   Epoch: 5   Global Step: 94280   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-11 09:12:53,070-Speed 3279.88 samples/sec   Loss 2.8617   LearningRate 0.0515   Epoch: 5   Global Step: 94290   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-11 09:12:56,136-Speed 3340.60 samples/sec   Loss 2.8787   LearningRate 0.0515   Epoch: 5   Global Step: 94300   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-11 09:12:59,199-Speed 3343.32 samples/sec   Loss 2.7892   LearningRate 0.0515   Epoch: 5   Global Step: 94310   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-11 09:13:02,286-Speed 3318.67 samples/sec   Loss 2.8966   LearningRate 0.0515   Epoch: 5   Global Step: 94320   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-11 09:13:05,407-Speed 3281.63 samples/sec   Loss 2.9373   LearningRate 0.0515   Epoch: 5   Global Step: 94330   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-11 09:13:08,495-Speed 3316.98 samples/sec   Loss 2.8679   LearningRate 0.0515   Epoch: 5   Global Step: 94340   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-11 09:13:11,583-Speed 3317.02 samples/sec   Loss 3.0093   LearningRate 0.0515   Epoch: 5   Global Step: 94350   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:13:14,722-Speed 3263.12 samples/sec   Loss 2.8154   LearningRate 0.0515   Epoch: 5   Global Step: 94360   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-11 09:13:17,792-Speed 3336.48 samples/sec   Loss 2.8623   LearningRate 0.0515   Epoch: 5   Global Step: 94370   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-11 09:13:20,918-Speed 3276.77 samples/sec   Loss 2.9391   LearningRate 0.0514   Epoch: 5   Global Step: 94380   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-11 09:13:24,059-Speed 3261.73 samples/sec   Loss 2.9465   LearningRate 0.0514   Epoch: 5   Global Step: 94390   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-11 09:13:27,139-Speed 3324.84 samples/sec   Loss 2.8944   LearningRate 0.0514   Epoch: 5   Global Step: 94400   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-11 09:13:30,204-Speed 3342.23 samples/sec   Loss 2.9172   LearningRate 0.0514   Epoch: 5   Global Step: 94410   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-11 09:13:33,294-Speed 3314.79 samples/sec   Loss 2.9271   LearningRate 0.0514   Epoch: 5   Global Step: 94420   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-11 09:13:36,353-Speed 3348.44 samples/sec   Loss 2.9568   LearningRate 0.0514   Epoch: 5   Global Step: 94430   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-11 09:13:39,452-Speed 3304.66 samples/sec   Loss 2.8658   LearningRate 0.0514   Epoch: 5   Global Step: 94440   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-11 09:13:42,520-Speed 3338.44 samples/sec   Loss 2.8785   LearningRate 0.0514   Epoch: 5   Global Step: 94450   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-11 09:13:45,602-Speed 3324.01 samples/sec   Loss 2.8467   LearningRate 0.0514   Epoch: 5   Global Step: 94460   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:13:48,685-Speed 3323.04 samples/sec   Loss 2.9022   LearningRate 0.0514   Epoch: 5   Global Step: 94470   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:13:51,826-Speed 3260.53 samples/sec   Loss 2.8511   LearningRate 0.0514   Epoch: 5   Global Step: 94480   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:13:54,908-Speed 3322.55 samples/sec   Loss 2.9498   LearningRate 0.0514   Epoch: 5   Global Step: 94490   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:13:58,019-Speed 3292.67 samples/sec   Loss 2.8374   LearningRate 0.0514   Epoch: 5   Global Step: 94500   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:14:01,172-Speed 3249.47 samples/sec   Loss 2.8033   LearningRate 0.0514   Epoch: 5   Global Step: 94510   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:14:04,305-Speed 3269.25 samples/sec   Loss 2.8159   LearningRate 0.0514   Epoch: 5   Global Step: 94520   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:14:07,371-Speed 3341.00 samples/sec   Loss 2.8661   LearningRate 0.0514   Epoch: 5   Global Step: 94530   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-11 09:14:10,444-Speed 3333.34 samples/sec   Loss 2.9160   LearningRate 0.0514   Epoch: 5   Global Step: 94540   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-11 09:14:13,539-Speed 3311.00 samples/sec   Loss 2.9394   LearningRate 0.0514   Epoch: 5   Global Step: 94550   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-11 09:14:16,688-Speed 3252.93 samples/sec   Loss 2.8521   LearningRate 0.0514   Epoch: 5   Global Step: 94560   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-11 09:14:19,806-Speed 3285.18 samples/sec   Loss 2.9321   LearningRate 0.0514   Epoch: 5   Global Step: 94570   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-11 09:14:22,885-Speed 3326.57 samples/sec   Loss 2.9272   LearningRate 0.0514   Epoch: 5   Global Step: 94580   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-11 09:14:25,947-Speed 3345.83 samples/sec   Loss 2.8941   LearningRate 0.0514   Epoch: 5   Global Step: 94590   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-11 09:14:29,033-Speed 3318.70 samples/sec   Loss 2.8812   LearningRate 0.0514   Epoch: 5   Global Step: 94600   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-11 09:14:32,095-Speed 3346.01 samples/sec   Loss 2.9239   LearningRate 0.0513   Epoch: 5   Global Step: 94610   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-11 09:14:35,162-Speed 3338.96 samples/sec   Loss 2.8496   LearningRate 0.0513   Epoch: 5   Global Step: 94620   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-11 09:14:38,239-Speed 3329.66 samples/sec   Loss 2.8500   LearningRate 0.0513   Epoch: 5   Global Step: 94630   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:14:41,308-Speed 3337.40 samples/sec   Loss 2.9157   LearningRate 0.0513   Epoch: 5   Global Step: 94640   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:14:44,404-Speed 3308.18 samples/sec   Loss 2.9694   LearningRate 0.0513   Epoch: 5   Global Step: 94650   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:14:47,474-Speed 3336.19 samples/sec   Loss 2.9566   LearningRate 0.0513   Epoch: 5   Global Step: 94660   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:14:50,540-Speed 3341.04 samples/sec   Loss 2.8346   LearningRate 0.0513   Epoch: 5   Global Step: 94670   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:14:53,613-Speed 3333.12 samples/sec   Loss 2.9002   LearningRate 0.0513   Epoch: 5   Global Step: 94680   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:14:56,693-Speed 3325.01 samples/sec   Loss 2.8484   LearningRate 0.0513   Epoch: 5   Global Step: 94690   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:14:59,774-Speed 3324.58 samples/sec   Loss 2.8969   LearningRate 0.0513   Epoch: 5   Global Step: 94700   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:15:02,868-Speed 3310.21 samples/sec   Loss 2.8468   LearningRate 0.0513   Epoch: 5   Global Step: 94710   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:15:06,004-Speed 3266.56 samples/sec   Loss 2.8171   LearningRate 0.0513   Epoch: 5   Global Step: 94720   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-11 09:15:09,151-Speed 3255.17 samples/sec   Loss 2.8390   LearningRate 0.0513   Epoch: 5   Global Step: 94730   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-11 09:15:12,229-Speed 3327.59 samples/sec   Loss 2.8595   LearningRate 0.0513   Epoch: 5   Global Step: 94740   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-11 09:15:15,299-Speed 3336.54 samples/sec   Loss 2.8375   LearningRate 0.0513   Epoch: 5   Global Step: 94750   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-11 09:15:18,371-Speed 3334.43 samples/sec   Loss 2.9089   LearningRate 0.0513   Epoch: 5   Global Step: 94760   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-11 09:15:21,460-Speed 3316.02 samples/sec   Loss 2.8452   LearningRate 0.0513   Epoch: 5   Global Step: 94770   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-11 09:15:24,567-Speed 3296.41 samples/sec   Loss 2.9352   LearningRate 0.0513   Epoch: 5   Global Step: 94780   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-11 09:15:27,645-Speed 3327.64 samples/sec   Loss 2.8076   LearningRate 0.0513   Epoch: 5   Global Step: 94790   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-11 09:15:30,713-Speed 3339.29 samples/sec   Loss 2.9402   LearningRate 0.0513   Epoch: 5   Global Step: 94800   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-11 09:15:33,775-Speed 3344.61 samples/sec   Loss 2.8236   LearningRate 0.0513   Epoch: 5   Global Step: 94810   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-11 09:15:36,855-Speed 3325.53 samples/sec   Loss 2.8911   LearningRate 0.0513   Epoch: 5   Global Step: 94820   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:15:39,940-Speed 3320.05 samples/sec   Loss 2.8959   LearningRate 0.0513   Epoch: 5   Global Step: 94830   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:15:43,016-Speed 3330.80 samples/sec   Loss 2.8681   LearningRate 0.0513   Epoch: 5   Global Step: 94840   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:15:46,098-Speed 3322.40 samples/sec   Loss 2.9173   LearningRate 0.0512   Epoch: 5   Global Step: 94850   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:15:49,230-Speed 3270.82 samples/sec   Loss 2.8461   LearningRate 0.0512   Epoch: 5   Global Step: 94860   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:15:52,310-Speed 3324.94 samples/sec   Loss 2.8192   LearningRate 0.0512   Epoch: 5   Global Step: 94870   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:15:55,402-Speed 3312.96 samples/sec   Loss 2.9075   LearningRate 0.0512   Epoch: 5   Global Step: 94880   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:15:58,464-Speed 3345.28 samples/sec   Loss 2.8642   LearningRate 0.0512   Epoch: 5   Global Step: 94890   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:16:01,542-Speed 3328.08 samples/sec   Loss 2.8083   LearningRate 0.0512   Epoch: 5   Global Step: 94900   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:16:04,615-Speed 3333.24 samples/sec   Loss 2.9628   LearningRate 0.0512   Epoch: 5   Global Step: 94910   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:16:07,669-Speed 3353.53 samples/sec   Loss 2.9115   LearningRate 0.0512   Epoch: 5   Global Step: 94920   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:16:10,735-Speed 3340.41 samples/sec   Loss 2.7657   LearningRate 0.0512   Epoch: 5   Global Step: 94930   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:16:13,788-Speed 3355.02 samples/sec   Loss 2.8053   LearningRate 0.0512   Epoch: 5   Global Step: 94940   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-11 09:16:16,857-Speed 3337.45 samples/sec   Loss 2.9403   LearningRate 0.0512   Epoch: 5   Global Step: 94950   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-11 09:16:19,928-Speed 3335.23 samples/sec   Loss 2.8955   LearningRate 0.0512   Epoch: 5   Global Step: 94960   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-11 09:16:22,992-Speed 3343.22 samples/sec   Loss 2.7898   LearningRate 0.0512   Epoch: 5   Global Step: 94970   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-04-11 09:16:26,116-Speed 3278.81 samples/sec   Loss 2.8588   LearningRate 0.0512   Epoch: 5   Global Step: 94980   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-04-11 09:16:29,187-Speed 3335.12 samples/sec   Loss 2.7996   LearningRate 0.0512   Epoch: 5   Global Step: 94990   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-04-11 09:16:32,260-Speed 3333.23 samples/sec   Loss 2.8783   LearningRate 0.0512   Epoch: 5   Global Step: 95000   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-04-11 09:16:35,335-Speed 3330.64 samples/sec   Loss 2.8910   LearningRate 0.0512   Epoch: 5   Global Step: 95010   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-04-11 09:16:38,400-Speed 3342.03 samples/sec   Loss 2.8991   LearningRate 0.0512   Epoch: 5   Global Step: 95020   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-04-11 09:16:41,498-Speed 3306.20 samples/sec   Loss 2.8318   LearningRate 0.0512   Epoch: 5   Global Step: 95030   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-04-11 09:16:44,591-Speed 3311.42 samples/sec   Loss 2.8856   LearningRate 0.0512   Epoch: 5   Global Step: 95040   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-04-11 09:16:47,703-Speed 3292.40 samples/sec   Loss 2.9176   LearningRate 0.0512   Epoch: 5   Global Step: 95050   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-04-11 09:16:50,769-Speed 3340.25 samples/sec   Loss 2.8632   LearningRate 0.0512   Epoch: 5   Global Step: 95060   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-04-11 09:16:53,841-Speed 3334.10 samples/sec   Loss 2.9219   LearningRate 0.0512   Epoch: 5   Global Step: 95070   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-11 09:16:56,922-Speed 3324.24 samples/sec   Loss 2.8814   LearningRate 0.0511   Epoch: 5   Global Step: 95080   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-11 09:16:59,991-Speed 3338.47 samples/sec   Loss 2.8930   LearningRate 0.0511   Epoch: 5   Global Step: 95090   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-11 09:17:03,069-Speed 3327.02 samples/sec   Loss 2.9288   LearningRate 0.0511   Epoch: 5   Global Step: 95100   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-11 09:17:06,149-Speed 3325.77 samples/sec   Loss 2.9267   LearningRate 0.0511   Epoch: 5   Global Step: 95110   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-11 09:17:09,336-Speed 3213.56 samples/sec   Loss 2.8954   LearningRate 0.0511   Epoch: 5   Global Step: 95120   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-11 09:17:12,500-Speed 3237.81 samples/sec   Loss 2.8581   LearningRate 0.0511   Epoch: 5   Global Step: 95130   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-11 09:17:15,571-Speed 3335.61 samples/sec   Loss 2.7621   LearningRate 0.0511   Epoch: 5   Global Step: 95140   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-11 09:17:18,696-Speed 3277.48 samples/sec   Loss 2.8656   LearningRate 0.0511   Epoch: 5   Global Step: 95150   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-11 09:17:21,817-Speed 3281.30 samples/sec   Loss 2.8115   LearningRate 0.0511   Epoch: 5   Global Step: 95160   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-11 09:17:24,890-Speed 3332.98 samples/sec   Loss 2.8297   LearningRate 0.0511   Epoch: 5   Global Step: 95170   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:17:27,991-Speed 3303.92 samples/sec   Loss 2.8403   LearningRate 0.0511   Epoch: 5   Global Step: 95180   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:17:31,058-Speed 3339.39 samples/sec   Loss 2.8386   LearningRate 0.0511   Epoch: 5   Global Step: 95190   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:17:34,157-Speed 3304.66 samples/sec   Loss 2.8593   LearningRate 0.0511   Epoch: 5   Global Step: 95200   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:17:37,222-Speed 3341.54 samples/sec   Loss 2.8574   LearningRate 0.0511   Epoch: 5   Global Step: 95210   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:17:40,300-Speed 3328.80 samples/sec   Loss 2.8684   LearningRate 0.0511   Epoch: 5   Global Step: 95220   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:17:43,380-Speed 3325.68 samples/sec   Loss 2.8676   LearningRate 0.0511   Epoch: 5   Global Step: 95230   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:17:46,499-Speed 3283.31 samples/sec   Loss 2.8589   LearningRate 0.0511   Epoch: 5   Global Step: 95240   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:17:49,574-Speed 3332.06 samples/sec   Loss 2.8335   LearningRate 0.0511   Epoch: 5   Global Step: 95250   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:17:52,650-Speed 3330.07 samples/sec   Loss 2.8420   LearningRate 0.0511   Epoch: 5   Global Step: 95260   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:17:55,803-Speed 3248.37 samples/sec   Loss 2.9170   LearningRate 0.0511   Epoch: 5   Global Step: 95270   Fp16 Grad Scale: 262144   Required: 25 hours
Training: 2022-04-11 09:17:58,898-Speed 3310.16 samples/sec   Loss 2.8857   LearningRate 0.0511   Epoch: 5   Global Step: 95280   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:18:01,956-Speed 3349.41 samples/sec   Loss 2.8754   LearningRate 0.0511   Epoch: 5   Global Step: 95290   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-11 09:18:05,023-Speed 3339.24 samples/sec   Loss 2.8315   LearningRate 0.0511   Epoch: 5   Global Step: 95300   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-11 09:18:08,153-Speed 3272.17 samples/sec   Loss 2.8906   LearningRate 0.0510   Epoch: 5   Global Step: 95310   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-11 09:18:11,218-Speed 3341.67 samples/sec   Loss 2.8853   LearningRate 0.0510   Epoch: 5   Global Step: 95320   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-11 09:18:14,318-Speed 3304.33 samples/sec   Loss 2.8510   LearningRate 0.0510   Epoch: 5   Global Step: 95330   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-11 09:18:17,386-Speed 3338.93 samples/sec   Loss 2.8319   LearningRate 0.0510   Epoch: 5   Global Step: 95340   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-11 09:18:20,451-Speed 3341.13 samples/sec   Loss 2.8222   LearningRate 0.0510   Epoch: 5   Global Step: 95350   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-11 09:18:23,583-Speed 3270.72 samples/sec   Loss 2.9335   LearningRate 0.0510   Epoch: 5   Global Step: 95360   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-11 09:18:26,666-Speed 3322.10 samples/sec   Loss 2.8572   LearningRate 0.0510   Epoch: 5   Global Step: 95370   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-11 09:18:29,733-Speed 3340.34 samples/sec   Loss 2.8772   LearningRate 0.0510   Epoch: 5   Global Step: 95380   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-11 09:18:32,801-Speed 3338.29 samples/sec   Loss 2.9036   LearningRate 0.0510   Epoch: 5   Global Step: 95390   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:18:35,858-Speed 3351.54 samples/sec   Loss 2.8371   LearningRate 0.0510   Epoch: 5   Global Step: 95400   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-11 09:18:38,921-Speed 3342.93 samples/sec   Loss 2.7757   LearningRate 0.0510   Epoch: 5   Global Step: 95410   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-11 09:18:41,988-Speed 3340.54 samples/sec   Loss 2.8195   LearningRate 0.0510   Epoch: 5   Global Step: 95420   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-11 09:18:45,069-Speed 3324.39 samples/sec   Loss 2.8715   LearningRate 0.0510   Epoch: 5   Global Step: 95430   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-11 09:18:48,157-Speed 3316.76 samples/sec   Loss 2.8756   LearningRate 0.0510   Epoch: 5   Global Step: 95440   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-11 09:18:51,225-Speed 3338.49 samples/sec   Loss 2.8799   LearningRate 0.0510   Epoch: 5   Global Step: 95450   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-11 09:18:54,292-Speed 3339.88 samples/sec   Loss 2.7991   LearningRate 0.0510   Epoch: 5   Global Step: 95460   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-11 09:18:57,432-Speed 3261.40 samples/sec   Loss 2.9244   LearningRate 0.0510   Epoch: 5   Global Step: 95470   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-11 09:19:00,501-Speed 3338.85 samples/sec   Loss 2.9098   LearningRate 0.0510   Epoch: 5   Global Step: 95480   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-11 09:19:03,566-Speed 3341.00 samples/sec   Loss 2.8575   LearningRate 0.0510   Epoch: 5   Global Step: 95490   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-11 09:19:06,641-Speed 3330.94 samples/sec   Loss 2.8917   LearningRate 0.0510   Epoch: 5   Global Step: 95500   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:19:09,730-Speed 3316.80 samples/sec   Loss 2.7991   LearningRate 0.0510   Epoch: 5   Global Step: 95510   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:19:12,798-Speed 3337.93 samples/sec   Loss 2.8138   LearningRate 0.0510   Epoch: 5   Global Step: 95520   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:19:15,873-Speed 3331.50 samples/sec   Loss 2.9335   LearningRate 0.0510   Epoch: 5   Global Step: 95530   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:19:18,979-Speed 3297.47 samples/sec   Loss 2.8015   LearningRate 0.0510   Epoch: 5   Global Step: 95540   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:19:22,050-Speed 3335.57 samples/sec   Loss 2.8996   LearningRate 0.0509   Epoch: 5   Global Step: 95550   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:19:25,132-Speed 3323.71 samples/sec   Loss 2.8392   LearningRate 0.0509   Epoch: 5   Global Step: 95560   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:19:28,197-Speed 3343.15 samples/sec   Loss 2.8485   LearningRate 0.0509   Epoch: 5   Global Step: 95570   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:19:31,264-Speed 3339.40 samples/sec   Loss 2.8103   LearningRate 0.0509   Epoch: 5   Global Step: 95580   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:19:34,326-Speed 3345.40 samples/sec   Loss 2.9493   LearningRate 0.0509   Epoch: 5   Global Step: 95590   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:19:37,397-Speed 3334.63 samples/sec   Loss 2.8539   LearningRate 0.0509   Epoch: 5   Global Step: 95600   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:19:40,466-Speed 3337.46 samples/sec   Loss 2.9007   LearningRate 0.0509   Epoch: 5   Global Step: 95610   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:19:43,535-Speed 3337.19 samples/sec   Loss 2.9194   LearningRate 0.0509   Epoch: 5   Global Step: 95620   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:19:46,603-Speed 3338.87 samples/sec   Loss 2.9592   LearningRate 0.0509   Epoch: 5   Global Step: 95630   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:19:49,701-Speed 3306.03 samples/sec   Loss 2.9130   LearningRate 0.0509   Epoch: 5   Global Step: 95640   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:19:52,769-Speed 3339.57 samples/sec   Loss 2.9486   LearningRate 0.0509   Epoch: 5   Global Step: 95650   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:19:55,833-Speed 3342.78 samples/sec   Loss 2.9600   LearningRate 0.0509   Epoch: 5   Global Step: 95660   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:19:58,979-Speed 3256.45 samples/sec   Loss 2.8291   LearningRate 0.0509   Epoch: 5   Global Step: 95670   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:20:02,049-Speed 3335.64 samples/sec   Loss 2.7993   LearningRate 0.0509   Epoch: 5   Global Step: 95680   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:20:05,126-Speed 3329.47 samples/sec   Loss 2.8775   LearningRate 0.0509   Epoch: 5   Global Step: 95690   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:20:08,184-Speed 3348.33 samples/sec   Loss 2.8763   LearningRate 0.0509   Epoch: 5   Global Step: 95700   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:20:11,259-Speed 3332.26 samples/sec   Loss 2.7911   LearningRate 0.0509   Epoch: 5   Global Step: 95710   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:20:14,315-Speed 3351.95 samples/sec   Loss 2.8950   LearningRate 0.0509   Epoch: 5   Global Step: 95720   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-11 09:20:17,390-Speed 3331.26 samples/sec   Loss 2.8504   LearningRate 0.0509   Epoch: 5   Global Step: 95730   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-11 09:20:20,458-Speed 3338.80 samples/sec   Loss 2.8374   LearningRate 0.0509   Epoch: 5   Global Step: 95740   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-11 09:20:23,589-Speed 3271.20 samples/sec   Loss 2.8239   LearningRate 0.0509   Epoch: 5   Global Step: 95750   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-11 09:20:26,692-Speed 3301.93 samples/sec   Loss 2.8919   LearningRate 0.0509   Epoch: 5   Global Step: 95760   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-11 09:20:29,768-Speed 3329.28 samples/sec   Loss 2.8060   LearningRate 0.0509   Epoch: 5   Global Step: 95770   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-11 09:20:32,882-Speed 3289.19 samples/sec   Loss 2.8888   LearningRate 0.0508   Epoch: 5   Global Step: 95780   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-11 09:20:35,954-Speed 3334.22 samples/sec   Loss 2.8612   LearningRate 0.0508   Epoch: 5   Global Step: 95790   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-11 09:20:39,043-Speed 3316.48 samples/sec   Loss 2.9087   LearningRate 0.0508   Epoch: 5   Global Step: 95800   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-11 09:20:42,112-Speed 3337.45 samples/sec   Loss 2.8708   LearningRate 0.0508   Epoch: 5   Global Step: 95810   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-11 09:20:45,191-Speed 3326.16 samples/sec   Loss 2.8385   LearningRate 0.0508   Epoch: 5   Global Step: 95820   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:20:48,307-Speed 3287.52 samples/sec   Loss 2.8573   LearningRate 0.0508   Epoch: 5   Global Step: 95830   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:20:51,381-Speed 3332.18 samples/sec   Loss 2.7963   LearningRate 0.0508   Epoch: 5   Global Step: 95840   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:20:54,463-Speed 3323.66 samples/sec   Loss 2.8662   LearningRate 0.0508   Epoch: 5   Global Step: 95850   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:20:57,530-Speed 3339.31 samples/sec   Loss 2.7663   LearningRate 0.0508   Epoch: 5   Global Step: 95860   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:21:00,606-Speed 3330.15 samples/sec   Loss 2.8521   LearningRate 0.0508   Epoch: 5   Global Step: 95870   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:21:03,679-Speed 3331.98 samples/sec   Loss 2.8491   LearningRate 0.0508   Epoch: 5   Global Step: 95880   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:21:06,757-Speed 3328.26 samples/sec   Loss 2.9003   LearningRate 0.0508   Epoch: 5   Global Step: 95890   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:21:09,844-Speed 3317.97 samples/sec   Loss 2.8661   LearningRate 0.0508   Epoch: 5   Global Step: 95900   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:21:12,945-Speed 3303.66 samples/sec   Loss 2.8770   LearningRate 0.0508   Epoch: 5   Global Step: 95910   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:21:16,024-Speed 3325.67 samples/sec   Loss 2.8351   LearningRate 0.0508   Epoch: 5   Global Step: 95920   Fp16 Grad Scale: 262144   Required: 25 hours
Training: 2022-04-11 09:21:19,099-Speed 3330.69 samples/sec   Loss 2.8362   LearningRate 0.0508   Epoch: 5   Global Step: 95930   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:21:22,202-Speed 3303.25 samples/sec   Loss 2.9093   LearningRate 0.0508   Epoch: 5   Global Step: 95940   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:21:25,370-Speed 3233.14 samples/sec   Loss 2.8763   LearningRate 0.0508   Epoch: 5   Global Step: 95950   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:21:28,438-Speed 3338.68 samples/sec   Loss 2.8794   LearningRate 0.0508   Epoch: 5   Global Step: 95960   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:21:31,512-Speed 3331.14 samples/sec   Loss 2.8627   LearningRate 0.0508   Epoch: 5   Global Step: 95970   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:21:34,590-Speed 3327.93 samples/sec   Loss 2.7949   LearningRate 0.0508   Epoch: 5   Global Step: 95980   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:21:37,672-Speed 3323.13 samples/sec   Loss 2.9175   LearningRate 0.0508   Epoch: 5   Global Step: 95990   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:21:40,743-Speed 3335.65 samples/sec   Loss 2.8678   LearningRate 0.0508   Epoch: 5   Global Step: 96000   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:22:24,449-[lfw][96000]XNorm: 22.028936
Training: 2022-04-11 09:22:24,450-[lfw][96000]Accuracy-Flip: 0.99800+-0.00221
Training: 2022-04-11 09:22:24,450-[lfw][96000]Accuracy-Highest: 0.99817
Training: 2022-04-11 09:23:15,626-[cfp_fp][96000]XNorm: 20.748122
Training: 2022-04-11 09:23:15,627-[cfp_fp][96000]Accuracy-Flip: 0.98400+-0.00588
Training: 2022-04-11 09:23:15,627-[cfp_fp][96000]Accuracy-Highest: 0.98557
Training: 2022-04-11 09:23:59,473-[agedb_30][96000]XNorm: 22.395659
Training: 2022-04-11 09:23:59,474-[agedb_30][96000]Accuracy-Flip: 0.98133+-0.00752
Training: 2022-04-11 09:23:59,474-[agedb_30][96000]Accuracy-Highest: 0.98200
Training: 2022-04-11 09:24:02,536-Speed 72.22 samples/sec   Loss 2.8229   LearningRate 0.0507   Epoch: 5   Global Step: 96010   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:24:05,618-Speed 3323.70 samples/sec   Loss 2.8994   LearningRate 0.0507   Epoch: 5   Global Step: 96020   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:24:08,679-Speed 3346.82 samples/sec   Loss 2.8131   LearningRate 0.0507   Epoch: 5   Global Step: 96030   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:24:11,783-Speed 3299.46 samples/sec   Loss 2.8738   LearningRate 0.0507   Epoch: 5   Global Step: 96040   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:24:14,857-Speed 3332.15 samples/sec   Loss 2.8408   LearningRate 0.0507   Epoch: 5   Global Step: 96050   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:24:17,926-Speed 3337.17 samples/sec   Loss 2.8577   LearningRate 0.0507   Epoch: 5   Global Step: 96060   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:24:20,981-Speed 3353.21 samples/sec   Loss 2.9562   LearningRate 0.0507   Epoch: 5   Global Step: 96070   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:24:24,041-Speed 3346.96 samples/sec   Loss 2.8859   LearningRate 0.0507   Epoch: 5   Global Step: 96080   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:24:27,102-Speed 3346.34 samples/sec   Loss 2.8500   LearningRate 0.0507   Epoch: 5   Global Step: 96090   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:24:30,173-Speed 3334.83 samples/sec   Loss 2.8174   LearningRate 0.0507   Epoch: 5   Global Step: 96100   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:24:33,231-Speed 3349.57 samples/sec   Loss 2.8681   LearningRate 0.0507   Epoch: 5   Global Step: 96110   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:24:36,289-Speed 3349.65 samples/sec   Loss 2.8104   LearningRate 0.0507   Epoch: 5   Global Step: 96120   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:24:39,338-Speed 3358.63 samples/sec   Loss 2.8197   LearningRate 0.0507   Epoch: 5   Global Step: 96130   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:24:42,436-Speed 3306.99 samples/sec   Loss 2.9178   LearningRate 0.0507   Epoch: 5   Global Step: 96140   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:24:45,499-Speed 3343.93 samples/sec   Loss 2.8012   LearningRate 0.0507   Epoch: 5   Global Step: 96150   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:24:48,614-Speed 3288.74 samples/sec   Loss 2.8946   LearningRate 0.0507   Epoch: 5   Global Step: 96160   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-11 09:24:51,682-Speed 3338.56 samples/sec   Loss 2.8074   LearningRate 0.0507   Epoch: 5   Global Step: 96170   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-04-11 09:24:54,752-Speed 3336.36 samples/sec   Loss 2.8757   LearningRate 0.0507   Epoch: 5   Global Step: 96180   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-04-11 09:24:57,864-Speed 3292.20 samples/sec   Loss 2.9715   LearningRate 0.0507   Epoch: 5   Global Step: 96190   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-04-11 09:25:00,927-Speed 3344.01 samples/sec   Loss 2.8358   LearningRate 0.0507   Epoch: 5   Global Step: 96200   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-04-11 09:25:04,043-Speed 3286.41 samples/sec   Loss 2.8668   LearningRate 0.0507   Epoch: 5   Global Step: 96210   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-04-11 09:25:07,115-Speed 3334.85 samples/sec   Loss 2.8931   LearningRate 0.0507   Epoch: 5   Global Step: 96220   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-04-11 09:25:10,231-Speed 3286.75 samples/sec   Loss 2.8782   LearningRate 0.0507   Epoch: 5   Global Step: 96230   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-04-11 09:25:13,303-Speed 3333.97 samples/sec   Loss 2.8597   LearningRate 0.0507   Epoch: 5   Global Step: 96240   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-04-11 09:25:16,418-Speed 3288.93 samples/sec   Loss 2.7954   LearningRate 0.0506   Epoch: 5   Global Step: 96250   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-04-11 09:25:19,526-Speed 3295.72 samples/sec   Loss 2.8922   LearningRate 0.0506   Epoch: 5   Global Step: 96260   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-04-11 09:25:22,618-Speed 3312.33 samples/sec   Loss 2.8913   LearningRate 0.0506   Epoch: 5   Global Step: 96270   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-11 09:25:25,682-Speed 3343.09 samples/sec   Loss 2.8146   LearningRate 0.0506   Epoch: 5   Global Step: 96280   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-11 09:25:28,782-Speed 3304.49 samples/sec   Loss 2.8032   LearningRate 0.0506   Epoch: 5   Global Step: 96290   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-11 09:25:31,845-Speed 3343.84 samples/sec   Loss 2.8722   LearningRate 0.0506   Epoch: 5   Global Step: 96300   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-11 09:25:34,919-Speed 3331.55 samples/sec   Loss 2.8109   LearningRate 0.0506   Epoch: 5   Global Step: 96310   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-11 09:25:37,991-Speed 3334.83 samples/sec   Loss 2.8309   LearningRate 0.0506   Epoch: 5   Global Step: 96320   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-11 09:25:41,094-Speed 3301.35 samples/sec   Loss 2.9019   LearningRate 0.0506   Epoch: 5   Global Step: 96330   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-11 09:25:44,234-Speed 3261.79 samples/sec   Loss 2.8362   LearningRate 0.0506   Epoch: 5   Global Step: 96340   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-11 09:25:47,358-Speed 3278.15 samples/sec   Loss 2.8174   LearningRate 0.0506   Epoch: 5   Global Step: 96350   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-11 09:25:50,507-Speed 3253.48 samples/sec   Loss 2.8617   LearningRate 0.0506   Epoch: 5   Global Step: 96360   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-11 09:25:53,709-Speed 3197.97 samples/sec   Loss 2.8344   LearningRate 0.0506   Epoch: 5   Global Step: 96370   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:25:56,828-Speed 3284.93 samples/sec   Loss 2.9264   LearningRate 0.0506   Epoch: 5   Global Step: 96380   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-11 09:25:59,895-Speed 3339.71 samples/sec   Loss 2.8561   LearningRate 0.0506   Epoch: 5   Global Step: 96390   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-11 09:26:03,079-Speed 3216.67 samples/sec   Loss 2.8159   LearningRate 0.0506   Epoch: 5   Global Step: 96400   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-11 09:26:06,166-Speed 3317.77 samples/sec   Loss 2.8397   LearningRate 0.0506   Epoch: 5   Global Step: 96410   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-11 09:26:09,309-Speed 3258.59 samples/sec   Loss 2.8419   LearningRate 0.0506   Epoch: 5   Global Step: 96420   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-11 09:26:12,383-Speed 3332.99 samples/sec   Loss 2.8887   LearningRate 0.0506   Epoch: 5   Global Step: 96430   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-11 09:26:15,444-Speed 3346.38 samples/sec   Loss 2.8169   LearningRate 0.0506   Epoch: 5   Global Step: 96440   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-11 09:26:18,507-Speed 3344.44 samples/sec   Loss 2.8269   LearningRate 0.0506   Epoch: 5   Global Step: 96450   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-11 09:26:21,572-Speed 3341.70 samples/sec   Loss 2.8374   LearningRate 0.0506   Epoch: 5   Global Step: 96460   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-11 09:26:24,665-Speed 3311.44 samples/sec   Loss 2.9076   LearningRate 0.0506   Epoch: 5   Global Step: 96470   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-11 09:26:27,735-Speed 3335.98 samples/sec   Loss 2.8436   LearningRate 0.0505   Epoch: 5   Global Step: 96480   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:26:30,882-Speed 3254.78 samples/sec   Loss 2.8423   LearningRate 0.0505   Epoch: 5   Global Step: 96490   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:26:33,946-Speed 3342.95 samples/sec   Loss 2.8420   LearningRate 0.0505   Epoch: 5   Global Step: 96500   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:26:37,020-Speed 3331.91 samples/sec   Loss 2.9671   LearningRate 0.0505   Epoch: 5   Global Step: 96510   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:26:40,083-Speed 3344.93 samples/sec   Loss 2.8789   LearningRate 0.0505   Epoch: 5   Global Step: 96520   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:26:43,143-Speed 3346.88 samples/sec   Loss 2.8577   LearningRate 0.0505   Epoch: 5   Global Step: 96530   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:26:46,209-Speed 3340.47 samples/sec   Loss 2.7930   LearningRate 0.0505   Epoch: 5   Global Step: 96540   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:26:49,276-Speed 3340.01 samples/sec   Loss 2.8321   LearningRate 0.0505   Epoch: 5   Global Step: 96550   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:26:52,354-Speed 3327.04 samples/sec   Loss 2.8981   LearningRate 0.0505   Epoch: 5   Global Step: 96560   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:26:55,430-Speed 3330.16 samples/sec   Loss 2.8736   LearningRate 0.0505   Epoch: 5   Global Step: 96570   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:26:58,472-Speed 3367.58 samples/sec   Loss 2.9086   LearningRate 0.0505   Epoch: 5   Global Step: 96580   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-11 09:27:01,532-Speed 3346.61 samples/sec   Loss 2.8775   LearningRate 0.0505   Epoch: 5   Global Step: 96590   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-11 09:27:04,596-Speed 3344.54 samples/sec   Loss 2.8831   LearningRate 0.0505   Epoch: 5   Global Step: 96600   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-11 09:27:07,657-Speed 3345.73 samples/sec   Loss 2.8193   LearningRate 0.0505   Epoch: 5   Global Step: 96610   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-11 09:27:10,726-Speed 3337.25 samples/sec   Loss 2.9316   LearningRate 0.0505   Epoch: 5   Global Step: 96620   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-11 09:27:13,791-Speed 3341.97 samples/sec   Loss 2.8033   LearningRate 0.0505   Epoch: 5   Global Step: 96630   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-11 09:27:16,897-Speed 3297.90 samples/sec   Loss 2.8561   LearningRate 0.0505   Epoch: 5   Global Step: 96640   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-11 09:27:19,960-Speed 3344.48 samples/sec   Loss 2.8963   LearningRate 0.0505   Epoch: 5   Global Step: 96650   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-11 09:27:23,024-Speed 3342.58 samples/sec   Loss 2.8760   LearningRate 0.0505   Epoch: 5   Global Step: 96660   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-11 09:27:26,101-Speed 3328.49 samples/sec   Loss 2.7790   LearningRate 0.0505   Epoch: 5   Global Step: 96670   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-11 09:27:29,172-Speed 3335.77 samples/sec   Loss 2.8628   LearningRate 0.0505   Epoch: 5   Global Step: 96680   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:27:32,255-Speed 3321.98 samples/sec   Loss 2.8905   LearningRate 0.0505   Epoch: 5   Global Step: 96690   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:27:35,321-Speed 3340.77 samples/sec   Loss 2.8805   LearningRate 0.0505   Epoch: 5   Global Step: 96700   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:27:38,403-Speed 3323.38 samples/sec   Loss 2.8661   LearningRate 0.0505   Epoch: 5   Global Step: 96710   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:27:41,533-Speed 3272.03 samples/sec   Loss 2.8685   LearningRate 0.0504   Epoch: 5   Global Step: 96720   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:27:44,694-Speed 3240.63 samples/sec   Loss 2.8239   LearningRate 0.0504   Epoch: 5   Global Step: 96730   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:27:47,848-Speed 3247.39 samples/sec   Loss 2.7892   LearningRate 0.0504   Epoch: 5   Global Step: 96740   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:27:50,962-Speed 3289.80 samples/sec   Loss 2.8358   LearningRate 0.0504   Epoch: 5   Global Step: 96750   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:27:54,091-Speed 3272.61 samples/sec   Loss 2.8837   LearningRate 0.0504   Epoch: 5   Global Step: 96760   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:27:57,303-Speed 3189.48 samples/sec   Loss 2.8756   LearningRate 0.0504   Epoch: 5   Global Step: 96770   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:28:00,399-Speed 3308.92 samples/sec   Loss 2.7873   LearningRate 0.0504   Epoch: 5   Global Step: 96780   Fp16 Grad Scale: 262144   Required: 25 hours
Training: 2022-04-11 09:28:03,470-Speed 3334.78 samples/sec   Loss 2.8123   LearningRate 0.0504   Epoch: 5   Global Step: 96790   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:28:06,540-Speed 3336.39 samples/sec   Loss 2.8005   LearningRate 0.0504   Epoch: 5   Global Step: 96800   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:28:09,641-Speed 3303.66 samples/sec   Loss 2.8059   LearningRate 0.0504   Epoch: 5   Global Step: 96810   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:28:12,755-Speed 3289.83 samples/sec   Loss 2.8819   LearningRate 0.0504   Epoch: 5   Global Step: 96820   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:28:15,870-Speed 3287.58 samples/sec   Loss 2.8699   LearningRate 0.0504   Epoch: 5   Global Step: 96830   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:28:18,946-Speed 3330.00 samples/sec   Loss 2.8865   LearningRate 0.0504   Epoch: 5   Global Step: 96840   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:28:22,020-Speed 3331.88 samples/sec   Loss 2.7734   LearningRate 0.0504   Epoch: 5   Global Step: 96850   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:28:25,087-Speed 3340.20 samples/sec   Loss 2.9044   LearningRate 0.0504   Epoch: 5   Global Step: 96860   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:28:28,149-Speed 3344.08 samples/sec   Loss 2.8975   LearningRate 0.0504   Epoch: 5   Global Step: 96870   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:28:31,223-Speed 3332.30 samples/sec   Loss 2.8883   LearningRate 0.0504   Epoch: 5   Global Step: 96880   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:28:34,362-Speed 3263.97 samples/sec   Loss 2.8829   LearningRate 0.0504   Epoch: 5   Global Step: 96890   Fp16 Grad Scale: 262144   Required: 25 hours
Training: 2022-04-11 09:28:37,467-Speed 3297.85 samples/sec   Loss 2.8613   LearningRate 0.0504   Epoch: 5   Global Step: 96900   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:28:40,538-Speed 3335.67 samples/sec   Loss 2.8398   LearningRate 0.0504   Epoch: 5   Global Step: 96910   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-11 09:28:43,602-Speed 3342.65 samples/sec   Loss 2.7985   LearningRate 0.0504   Epoch: 5   Global Step: 96920   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-11 09:28:46,671-Speed 3337.65 samples/sec   Loss 2.8556   LearningRate 0.0504   Epoch: 5   Global Step: 96930   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-11 09:28:49,758-Speed 3318.77 samples/sec   Loss 2.7346   LearningRate 0.0504   Epoch: 5   Global Step: 96940   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-11 09:28:52,865-Speed 3296.18 samples/sec   Loss 2.8033   LearningRate 0.0503   Epoch: 5   Global Step: 96950   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-11 09:28:55,936-Speed 3335.64 samples/sec   Loss 2.8120   LearningRate 0.0503   Epoch: 5   Global Step: 96960   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-11 09:28:59,006-Speed 3335.42 samples/sec   Loss 2.8753   LearningRate 0.0503   Epoch: 5   Global Step: 96970   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-11 09:29:02,087-Speed 3325.59 samples/sec   Loss 2.7785   LearningRate 0.0503   Epoch: 5   Global Step: 96980   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-11 09:29:05,189-Speed 3301.35 samples/sec   Loss 2.8683   LearningRate 0.0503   Epoch: 5   Global Step: 96990   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-11 09:29:08,276-Speed 3318.07 samples/sec   Loss 2.8554   LearningRate 0.0503   Epoch: 5   Global Step: 97000   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-11 09:29:11,341-Speed 3341.53 samples/sec   Loss 2.8736   LearningRate 0.0503   Epoch: 5   Global Step: 97010   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:29:14,405-Speed 3343.05 samples/sec   Loss 2.8688   LearningRate 0.0503   Epoch: 5   Global Step: 97020   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-11 09:29:17,470-Speed 3342.00 samples/sec   Loss 2.9196   LearningRate 0.0503   Epoch: 5   Global Step: 97030   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-11 09:29:20,573-Speed 3300.77 samples/sec   Loss 2.9345   LearningRate 0.0503   Epoch: 5   Global Step: 97040   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-11 09:29:23,694-Speed 3281.85 samples/sec   Loss 2.8228   LearningRate 0.0503   Epoch: 5   Global Step: 97050   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-11 09:29:26,798-Speed 3299.82 samples/sec   Loss 2.8201   LearningRate 0.0503   Epoch: 5   Global Step: 97060   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-11 09:29:29,873-Speed 3331.20 samples/sec   Loss 2.8593   LearningRate 0.0503   Epoch: 5   Global Step: 97070   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-11 09:29:32,962-Speed 3315.89 samples/sec   Loss 2.8220   LearningRate 0.0503   Epoch: 5   Global Step: 97080   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-11 09:29:36,102-Speed 3261.84 samples/sec   Loss 2.8398   LearningRate 0.0503   Epoch: 5   Global Step: 97090   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-11 09:29:39,207-Speed 3298.82 samples/sec   Loss 2.7753   LearningRate 0.0503   Epoch: 5   Global Step: 97100   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-11 09:29:42,344-Speed 3265.08 samples/sec   Loss 2.8398   LearningRate 0.0503   Epoch: 5   Global Step: 97110   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-11 09:29:45,486-Speed 3260.14 samples/sec   Loss 2.8555   LearningRate 0.0503   Epoch: 5   Global Step: 97120   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:29:48,588-Speed 3301.35 samples/sec   Loss 2.8504   LearningRate 0.0503   Epoch: 5   Global Step: 97130   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:29:51,795-Speed 3194.24 samples/sec   Loss 2.8831   LearningRate 0.0503   Epoch: 5   Global Step: 97140   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:29:54,917-Speed 3281.10 samples/sec   Loss 2.8624   LearningRate 0.0503   Epoch: 5   Global Step: 97150   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:29:57,985-Speed 3338.83 samples/sec   Loss 2.8923   LearningRate 0.0503   Epoch: 5   Global Step: 97160   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:30:01,051-Speed 3339.64 samples/sec   Loss 2.9058   LearningRate 0.0503   Epoch: 5   Global Step: 97170   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:30:04,132-Speed 3324.61 samples/sec   Loss 2.9144   LearningRate 0.0503   Epoch: 5   Global Step: 97180   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:30:07,211-Speed 3327.42 samples/sec   Loss 2.8776   LearningRate 0.0502   Epoch: 5   Global Step: 97190   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-11 09:30:10,289-Speed 3328.03 samples/sec   Loss 2.9030   LearningRate 0.0502   Epoch: 5   Global Step: 97200   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-11 09:30:13,355-Speed 3340.09 samples/sec   Loss 2.8037   LearningRate 0.0502   Epoch: 5   Global Step: 97210   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-11 09:30:16,422-Speed 3339.97 samples/sec   Loss 2.8427   LearningRate 0.0502   Epoch: 5   Global Step: 97220   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-11 09:30:19,550-Speed 3273.84 samples/sec   Loss 2.9369   LearningRate 0.0502   Epoch: 5   Global Step: 97230   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-11 09:30:22,635-Speed 3320.43 samples/sec   Loss 2.8694   LearningRate 0.0502   Epoch: 5   Global Step: 97240   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-11 09:30:25,764-Speed 3273.13 samples/sec   Loss 2.8665   LearningRate 0.0502   Epoch: 5   Global Step: 97250   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-11 09:30:28,848-Speed 3321.77 samples/sec   Loss 2.8720   LearningRate 0.0502   Epoch: 5   Global Step: 97260   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-11 09:30:31,911-Speed 3343.41 samples/sec   Loss 2.8956   LearningRate 0.0502   Epoch: 5   Global Step: 97270   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-11 09:30:34,980-Speed 3337.68 samples/sec   Loss 2.7446   LearningRate 0.0502   Epoch: 5   Global Step: 97280   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-11 09:30:38,047-Speed 3339.06 samples/sec   Loss 2.8630   LearningRate 0.0502   Epoch: 5   Global Step: 97290   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:30:41,119-Speed 3334.67 samples/sec   Loss 2.7971   LearningRate 0.0502   Epoch: 5   Global Step: 97300   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:30:44,199-Speed 3327.12 samples/sec   Loss 2.8409   LearningRate 0.0502   Epoch: 5   Global Step: 97310   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:30:47,272-Speed 3332.82 samples/sec   Loss 2.7976   LearningRate 0.0502   Epoch: 5   Global Step: 97320   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:30:50,376-Speed 3299.84 samples/sec   Loss 2.9035   LearningRate 0.0502   Epoch: 5   Global Step: 97330   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:30:53,575-Speed 3202.31 samples/sec   Loss 2.8218   LearningRate 0.0502   Epoch: 5   Global Step: 97340   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:30:56,676-Speed 3302.63 samples/sec   Loss 2.8235   LearningRate 0.0502   Epoch: 5   Global Step: 97350   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:30:59,795-Speed 3283.99 samples/sec   Loss 2.7762   LearningRate 0.0502   Epoch: 5   Global Step: 97360   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:31:02,885-Speed 3314.46 samples/sec   Loss 2.8311   LearningRate 0.0502   Epoch: 5   Global Step: 97370   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:31:05,965-Speed 3326.48 samples/sec   Loss 2.8067   LearningRate 0.0502   Epoch: 5   Global Step: 97380   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:31:09,018-Speed 3354.04 samples/sec   Loss 2.8649   LearningRate 0.0502   Epoch: 5   Global Step: 97390   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:31:12,083-Speed 3341.45 samples/sec   Loss 2.8131   LearningRate 0.0502   Epoch: 5   Global Step: 97400   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:31:15,161-Speed 3328.13 samples/sec   Loss 2.7867   LearningRate 0.0502   Epoch: 5   Global Step: 97410   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:31:18,241-Speed 3325.84 samples/sec   Loss 2.8305   LearningRate 0.0501   Epoch: 5   Global Step: 97420   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:31:21,313-Speed 3334.87 samples/sec   Loss 2.8482   LearningRate 0.0501   Epoch: 5   Global Step: 97430   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:31:24,510-Speed 3203.32 samples/sec   Loss 2.8083   LearningRate 0.0501   Epoch: 5   Global Step: 97440   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:31:27,610-Speed 3304.66 samples/sec   Loss 2.8317   LearningRate 0.0501   Epoch: 5   Global Step: 97450   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:31:30,856-Speed 3154.95 samples/sec   Loss 2.8319   LearningRate 0.0501   Epoch: 5   Global Step: 97460   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:31:33,983-Speed 3275.99 samples/sec   Loss 2.7828   LearningRate 0.0501   Epoch: 5   Global Step: 97470   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:31:37,151-Speed 3232.77 samples/sec   Loss 2.8673   LearningRate 0.0501   Epoch: 5   Global Step: 97480   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:31:40,319-Speed 3233.91 samples/sec   Loss 2.8674   LearningRate 0.0501   Epoch: 5   Global Step: 97490   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:31:43,382-Speed 3343.68 samples/sec   Loss 2.8068   LearningRate 0.0501   Epoch: 5   Global Step: 97500   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:31:46,530-Speed 3253.39 samples/sec   Loss 2.8205   LearningRate 0.0501   Epoch: 5   Global Step: 97510   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:31:49,651-Speed 3281.57 samples/sec   Loss 2.7880   LearningRate 0.0501   Epoch: 5   Global Step: 97520   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:31:52,746-Speed 3310.65 samples/sec   Loss 2.8021   LearningRate 0.0501   Epoch: 5   Global Step: 97530   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:31:55,811-Speed 3341.16 samples/sec   Loss 2.8437   LearningRate 0.0501   Epoch: 5   Global Step: 97540   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:31:58,879-Speed 3338.37 samples/sec   Loss 2.7877   LearningRate 0.0501   Epoch: 5   Global Step: 97550   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:32:01,968-Speed 3315.62 samples/sec   Loss 2.9001   LearningRate 0.0501   Epoch: 5   Global Step: 97560   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:32:05,072-Speed 3301.08 samples/sec   Loss 2.8081   LearningRate 0.0501   Epoch: 5   Global Step: 97570   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:32:08,181-Speed 3294.13 samples/sec   Loss 2.8563   LearningRate 0.0501   Epoch: 5   Global Step: 97580   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:32:11,293-Speed 3291.06 samples/sec   Loss 2.8403   LearningRate 0.0501   Epoch: 5   Global Step: 97590   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:32:14,379-Speed 3320.31 samples/sec   Loss 2.9021   LearningRate 0.0501   Epoch: 5   Global Step: 97600   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:32:17,449-Speed 3337.00 samples/sec   Loss 2.8445   LearningRate 0.0501   Epoch: 5   Global Step: 97610   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-11 09:32:20,560-Speed 3292.41 samples/sec   Loss 2.8601   LearningRate 0.0501   Epoch: 5   Global Step: 97620   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-11 09:32:23,638-Speed 3327.71 samples/sec   Loss 2.8809   LearningRate 0.0501   Epoch: 5   Global Step: 97630   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-11 09:32:26,702-Speed 3342.79 samples/sec   Loss 2.8951   LearningRate 0.0501   Epoch: 5   Global Step: 97640   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-11 09:32:29,795-Speed 3311.70 samples/sec   Loss 2.8580   LearningRate 0.0501   Epoch: 5   Global Step: 97650   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-11 09:32:32,883-Speed 3317.02 samples/sec   Loss 2.8354   LearningRate 0.0500   Epoch: 5   Global Step: 97660   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-11 09:32:35,951-Speed 3339.97 samples/sec   Loss 2.8152   LearningRate 0.0500   Epoch: 5   Global Step: 97670   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-11 09:32:39,049-Speed 3306.31 samples/sec   Loss 2.8784   LearningRate 0.0500   Epoch: 5   Global Step: 97680   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-11 09:32:42,200-Speed 3250.17 samples/sec   Loss 2.7739   LearningRate 0.0500   Epoch: 5   Global Step: 97690   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-11 09:32:45,348-Speed 3254.02 samples/sec   Loss 2.8281   LearningRate 0.0500   Epoch: 5   Global Step: 97700   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-11 09:32:48,418-Speed 3336.72 samples/sec   Loss 2.8425   LearningRate 0.0500   Epoch: 5   Global Step: 97710   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:32:51,481-Speed 3344.09 samples/sec   Loss 2.8298   LearningRate 0.0500   Epoch: 5   Global Step: 97720   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:32:54,576-Speed 3308.77 samples/sec   Loss 2.8584   LearningRate 0.0500   Epoch: 5   Global Step: 97730   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:32:57,638-Speed 3345.01 samples/sec   Loss 2.8421   LearningRate 0.0500   Epoch: 5   Global Step: 97740   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:33:00,751-Speed 3290.16 samples/sec   Loss 2.7481   LearningRate 0.0500   Epoch: 5   Global Step: 97750   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:33:03,835-Speed 3321.91 samples/sec   Loss 2.8312   LearningRate 0.0500   Epoch: 5   Global Step: 97760   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:33:06,930-Speed 3309.48 samples/sec   Loss 2.8792   LearningRate 0.0500   Epoch: 5   Global Step: 97770   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:33:09,989-Speed 3347.80 samples/sec   Loss 2.8584   LearningRate 0.0500   Epoch: 5   Global Step: 97780   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:33:13,060-Speed 3335.21 samples/sec   Loss 2.8932   LearningRate 0.0500   Epoch: 5   Global Step: 97790   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:33:16,145-Speed 3320.35 samples/sec   Loss 2.8471   LearningRate 0.0500   Epoch: 5   Global Step: 97800   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:33:19,247-Speed 3302.51 samples/sec   Loss 2.8654   LearningRate 0.0500   Epoch: 5   Global Step: 97810   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:33:22,380-Speed 3268.93 samples/sec   Loss 2.8613   LearningRate 0.0500   Epoch: 5   Global Step: 97820   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:33:25,552-Speed 3229.54 samples/sec   Loss 2.8498   LearningRate 0.0500   Epoch: 5   Global Step: 97830   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:33:28,686-Speed 3268.64 samples/sec   Loss 2.8774   LearningRate 0.0500   Epoch: 5   Global Step: 97840   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:33:31,760-Speed 3332.23 samples/sec   Loss 2.8746   LearningRate 0.0500   Epoch: 5   Global Step: 97850   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:33:34,827-Speed 3339.61 samples/sec   Loss 2.9169   LearningRate 0.0500   Epoch: 5   Global Step: 97860   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:33:37,882-Speed 3352.80 samples/sec   Loss 2.8908   LearningRate 0.0500   Epoch: 5   Global Step: 97870   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-11 09:33:40,944-Speed 3344.78 samples/sec   Loss 2.8487   LearningRate 0.0500   Epoch: 5   Global Step: 97880   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-11 09:33:44,015-Speed 3335.14 samples/sec   Loss 2.8216   LearningRate 0.0500   Epoch: 5   Global Step: 97890   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-11 09:33:47,083-Speed 3340.37 samples/sec   Loss 2.9365   LearningRate 0.0499   Epoch: 5   Global Step: 97900   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-11 09:33:50,148-Speed 3341.86 samples/sec   Loss 2.8261   LearningRate 0.0499   Epoch: 5   Global Step: 97910   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-11 09:33:53,215-Speed 3339.42 samples/sec   Loss 2.8080   LearningRate 0.0499   Epoch: 5   Global Step: 97920   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-11 09:33:56,291-Speed 3328.92 samples/sec   Loss 2.7450   LearningRate 0.0499   Epoch: 5   Global Step: 97930   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-11 09:33:59,374-Speed 3323.23 samples/sec   Loss 2.8768   LearningRate 0.0499   Epoch: 5   Global Step: 97940   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-11 09:34:02,518-Speed 3257.39 samples/sec   Loss 2.7943   LearningRate 0.0499   Epoch: 5   Global Step: 97950   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-11 09:34:05,589-Speed 3335.30 samples/sec   Loss 2.8147   LearningRate 0.0499   Epoch: 5   Global Step: 97960   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-11 09:34:08,773-Speed 3216.58 samples/sec   Loss 2.8268   LearningRate 0.0499   Epoch: 5   Global Step: 97970   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:34:11,889-Speed 3287.14 samples/sec   Loss 2.8130   LearningRate 0.0499   Epoch: 5   Global Step: 97980   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:34:14,987-Speed 3306.85 samples/sec   Loss 2.8620   LearningRate 0.0499   Epoch: 5   Global Step: 97990   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:34:18,180-Speed 3207.56 samples/sec   Loss 2.8384   LearningRate 0.0499   Epoch: 5   Global Step: 98000   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:35:02,261-[lfw][98000]XNorm: 21.638328
Training: 2022-04-11 09:35:02,261-[lfw][98000]Accuracy-Flip: 0.99783+-0.00308
Training: 2022-04-11 09:35:02,262-[lfw][98000]Accuracy-Highest: 0.99817
Training: 2022-04-11 09:35:53,000-[cfp_fp][98000]XNorm: 20.588497
Training: 2022-04-11 09:35:53,000-[cfp_fp][98000]Accuracy-Flip: 0.98471+-0.00629
Training: 2022-04-11 09:35:53,001-[cfp_fp][98000]Accuracy-Highest: 0.98557
Training: 2022-04-11 09:36:36,647-[agedb_30][98000]XNorm: 21.650965
Training: 2022-04-11 09:36:36,648-[agedb_30][98000]Accuracy-Flip: 0.97900+-0.00588
Training: 2022-04-11 09:36:36,648-[agedb_30][98000]Accuracy-Highest: 0.98200
Training: 2022-04-11 09:36:39,711-Speed 72.35 samples/sec   Loss 2.7756   LearningRate 0.0499   Epoch: 5   Global Step: 98010   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:36:42,838-Speed 3275.34 samples/sec   Loss 2.7939   LearningRate 0.0499   Epoch: 5   Global Step: 98020   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:36:45,909-Speed 3334.84 samples/sec   Loss 2.8500   LearningRate 0.0499   Epoch: 5   Global Step: 98030   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:36:48,987-Speed 3327.50 samples/sec   Loss 2.7885   LearningRate 0.0499   Epoch: 5   Global Step: 98040   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:36:52,056-Speed 3337.57 samples/sec   Loss 2.8345   LearningRate 0.0499   Epoch: 5   Global Step: 98050   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:36:55,120-Speed 3342.83 samples/sec   Loss 2.8085   LearningRate 0.0499   Epoch: 5   Global Step: 98060   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:36:58,172-Speed 3355.93 samples/sec   Loss 2.8200   LearningRate 0.0499   Epoch: 5   Global Step: 98070   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:37:01,244-Speed 3334.38 samples/sec   Loss 2.8542   LearningRate 0.0499   Epoch: 5   Global Step: 98080   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:37:04,315-Speed 3335.29 samples/sec   Loss 2.8587   LearningRate 0.0499   Epoch: 5   Global Step: 98090   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:37:07,381-Speed 3341.12 samples/sec   Loss 2.8254   LearningRate 0.0499   Epoch: 5   Global Step: 98100   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:37:10,460-Speed 3326.88 samples/sec   Loss 2.7773   LearningRate 0.0499   Epoch: 5   Global Step: 98110   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:37:13,538-Speed 3327.78 samples/sec   Loss 2.8767   LearningRate 0.0499   Epoch: 5   Global Step: 98120   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:37:16,612-Speed 3332.30 samples/sec   Loss 2.7972   LearningRate 0.0498   Epoch: 5   Global Step: 98130   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:37:19,720-Speed 3296.28 samples/sec   Loss 2.8916   LearningRate 0.0498   Epoch: 5   Global Step: 98140   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:37:22,788-Speed 3338.36 samples/sec   Loss 2.7906   LearningRate 0.0498   Epoch: 5   Global Step: 98150   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:37:25,980-Speed 3208.89 samples/sec   Loss 2.7940   LearningRate 0.0498   Epoch: 5   Global Step: 98160   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:37:29,046-Speed 3340.28 samples/sec   Loss 2.7204   LearningRate 0.0498   Epoch: 5   Global Step: 98170   Fp16 Grad Scale: 262144   Required: 25 hours
Training: 2022-04-11 09:37:32,096-Speed 3358.57 samples/sec   Loss 2.8626   LearningRate 0.0498   Epoch: 5   Global Step: 98180   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-11 09:37:35,239-Speed 3259.29 samples/sec   Loss 2.8021   LearningRate 0.0498   Epoch: 5   Global Step: 98190   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-11 09:37:38,318-Speed 3326.50 samples/sec   Loss 2.8296   LearningRate 0.0498   Epoch: 5   Global Step: 98200   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-11 09:37:41,408-Speed 3314.77 samples/sec   Loss 2.7725   LearningRate 0.0498   Epoch: 5   Global Step: 98210   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-11 09:37:44,554-Speed 3257.13 samples/sec   Loss 2.8206   LearningRate 0.0498   Epoch: 5   Global Step: 98220   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-11 09:37:47,662-Speed 3297.12 samples/sec   Loss 2.8139   LearningRate 0.0498   Epoch: 5   Global Step: 98230   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-11 09:37:50,776-Speed 3288.69 samples/sec   Loss 2.8085   LearningRate 0.0498   Epoch: 5   Global Step: 98240   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-11 09:37:53,867-Speed 3313.89 samples/sec   Loss 2.8573   LearningRate 0.0498   Epoch: 5   Global Step: 98250   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-11 09:37:56,996-Speed 3273.93 samples/sec   Loss 2.8419   LearningRate 0.0498   Epoch: 5   Global Step: 98260   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-11 09:38:00,111-Speed 3287.93 samples/sec   Loss 2.8695   LearningRate 0.0498   Epoch: 5   Global Step: 98270   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-11 09:38:03,306-Speed 3206.34 samples/sec   Loss 2.8157   LearningRate 0.0498   Epoch: 5   Global Step: 98280   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:38:06,420-Speed 3288.98 samples/sec   Loss 2.8759   LearningRate 0.0498   Epoch: 5   Global Step: 98290   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:38:09,508-Speed 3316.12 samples/sec   Loss 2.8267   LearningRate 0.0498   Epoch: 5   Global Step: 98300   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:38:12,569-Speed 3346.33 samples/sec   Loss 2.8308   LearningRate 0.0498   Epoch: 5   Global Step: 98310   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:38:15,628-Speed 3348.48 samples/sec   Loss 2.8624   LearningRate 0.0498   Epoch: 5   Global Step: 98320   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:38:18,691-Speed 3344.60 samples/sec   Loss 2.8322   LearningRate 0.0498   Epoch: 5   Global Step: 98330   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:38:21,755-Speed 3343.12 samples/sec   Loss 2.7715   LearningRate 0.0498   Epoch: 5   Global Step: 98340   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:38:24,852-Speed 3307.28 samples/sec   Loss 2.8771   LearningRate 0.0498   Epoch: 5   Global Step: 98350   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:38:27,922-Speed 3336.75 samples/sec   Loss 2.7937   LearningRate 0.0498   Epoch: 5   Global Step: 98360   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-11 09:38:30,978-Speed 3351.05 samples/sec   Loss 2.8146   LearningRate 0.0497   Epoch: 5   Global Step: 98370   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-11 09:38:34,057-Speed 3327.29 samples/sec   Loss 2.8471   LearningRate 0.0497   Epoch: 5   Global Step: 98380   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-11 09:38:37,125-Speed 3338.03 samples/sec   Loss 2.8419   LearningRate 0.0497   Epoch: 5   Global Step: 98390   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-11 09:38:40,214-Speed 3316.50 samples/sec   Loss 2.7023   LearningRate 0.0497   Epoch: 5   Global Step: 98400   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-11 09:38:43,329-Speed 3288.59 samples/sec   Loss 2.8094   LearningRate 0.0497   Epoch: 5   Global Step: 98410   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-11 09:38:46,383-Speed 3352.65 samples/sec   Loss 2.8087   LearningRate 0.0497   Epoch: 5   Global Step: 98420   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-11 09:38:49,473-Speed 3315.74 samples/sec   Loss 2.7784   LearningRate 0.0497   Epoch: 5   Global Step: 98430   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-11 09:38:52,539-Speed 3341.21 samples/sec   Loss 2.7519   LearningRate 0.0497   Epoch: 5   Global Step: 98440   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-11 09:38:55,617-Speed 3326.64 samples/sec   Loss 2.7991   LearningRate 0.0497   Epoch: 5   Global Step: 98450   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-11 09:38:58,692-Speed 3331.53 samples/sec   Loss 2.7950   LearningRate 0.0497   Epoch: 5   Global Step: 98460   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:39:01,756-Speed 3343.48 samples/sec   Loss 2.8780   LearningRate 0.0497   Epoch: 5   Global Step: 98470   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:39:04,886-Speed 3272.06 samples/sec   Loss 2.7746   LearningRate 0.0497   Epoch: 5   Global Step: 98480   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:39:07,955-Speed 3337.08 samples/sec   Loss 2.8551   LearningRate 0.0497   Epoch: 5   Global Step: 98490   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:39:11,017-Speed 3344.95 samples/sec   Loss 2.8184   LearningRate 0.0497   Epoch: 5   Global Step: 98500   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:39:14,123-Speed 3298.04 samples/sec   Loss 2.8639   LearningRate 0.0497   Epoch: 5   Global Step: 98510   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:39:17,184-Speed 3346.45 samples/sec   Loss 2.7939   LearningRate 0.0497   Epoch: 5   Global Step: 98520   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:39:20,240-Speed 3351.10 samples/sec   Loss 2.7622   LearningRate 0.0497   Epoch: 5   Global Step: 98530   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:39:23,318-Speed 3327.43 samples/sec   Loss 2.8405   LearningRate 0.0497   Epoch: 5   Global Step: 98540   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:39:26,477-Speed 3242.52 samples/sec   Loss 2.7960   LearningRate 0.0497   Epoch: 5   Global Step: 98550   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:39:29,613-Speed 3266.69 samples/sec   Loss 2.8299   LearningRate 0.0497   Epoch: 5   Global Step: 98560   Fp16 Grad Scale: 262144   Required: 25 hours
Training: 2022-04-11 09:39:32,700-Speed 3318.41 samples/sec   Loss 2.8938   LearningRate 0.0497   Epoch: 5   Global Step: 98570   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:39:35,784-Speed 3320.08 samples/sec   Loss 2.9198   LearningRate 0.0497   Epoch: 5   Global Step: 98580   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:39:38,846-Speed 3345.63 samples/sec   Loss 2.8945   LearningRate 0.0497   Epoch: 5   Global Step: 98590   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:39:41,954-Speed 3295.98 samples/sec   Loss 2.8122   LearningRate 0.0497   Epoch: 5   Global Step: 98600   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:39:45,019-Speed 3341.72 samples/sec   Loss 2.9063   LearningRate 0.0496   Epoch: 5   Global Step: 98610   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:39:48,081-Speed 3344.74 samples/sec   Loss 2.7896   LearningRate 0.0496   Epoch: 5   Global Step: 98620   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:39:51,139-Speed 3350.36 samples/sec   Loss 2.8552   LearningRate 0.0496   Epoch: 5   Global Step: 98630   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:39:54,202-Speed 3343.37 samples/sec   Loss 2.7527   LearningRate 0.0496   Epoch: 5   Global Step: 98640   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:39:57,282-Speed 3324.97 samples/sec   Loss 2.6778   LearningRate 0.0496   Epoch: 5   Global Step: 98650   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:40:00,389-Speed 3296.93 samples/sec   Loss 2.8249   LearningRate 0.0496   Epoch: 5   Global Step: 98660   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:40:03,475-Speed 3320.14 samples/sec   Loss 2.9140   LearningRate 0.0496   Epoch: 5   Global Step: 98670   Fp16 Grad Scale: 262144   Required: 25 hours
Training: 2022-04-11 09:40:06,617-Speed 3259.24 samples/sec   Loss 2.9133   LearningRate 0.0496   Epoch: 5   Global Step: 98680   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:40:09,691-Speed 3332.44 samples/sec   Loss 2.7817   LearningRate 0.0496   Epoch: 5   Global Step: 98690   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:40:12,757-Speed 3340.24 samples/sec   Loss 2.8693   LearningRate 0.0496   Epoch: 5   Global Step: 98700   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:40:15,874-Speed 3286.13 samples/sec   Loss 2.7343   LearningRate 0.0496   Epoch: 5   Global Step: 98710   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:40:19,041-Speed 3235.15 samples/sec   Loss 2.8681   LearningRate 0.0496   Epoch: 5   Global Step: 98720   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:40:22,100-Speed 3347.59 samples/sec   Loss 2.8749   LearningRate 0.0496   Epoch: 5   Global Step: 98730   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:40:25,164-Speed 3343.54 samples/sec   Loss 2.7977   LearningRate 0.0496   Epoch: 5   Global Step: 98740   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:40:28,242-Speed 3327.42 samples/sec   Loss 2.8338   LearningRate 0.0496   Epoch: 5   Global Step: 98750   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:40:31,309-Speed 3339.66 samples/sec   Loss 2.8484   LearningRate 0.0496   Epoch: 5   Global Step: 98760   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:40:34,375-Speed 3340.75 samples/sec   Loss 2.8088   LearningRate 0.0496   Epoch: 5   Global Step: 98770   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:40:37,427-Speed 3355.84 samples/sec   Loss 2.8489   LearningRate 0.0496   Epoch: 5   Global Step: 98780   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:40:40,503-Speed 3330.23 samples/sec   Loss 2.7577   LearningRate 0.0496   Epoch: 5   Global Step: 98790   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:40:43,567-Speed 3343.44 samples/sec   Loss 2.8287   LearningRate 0.0496   Epoch: 5   Global Step: 98800   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:40:46,626-Speed 3348.23 samples/sec   Loss 2.7947   LearningRate 0.0496   Epoch: 5   Global Step: 98810   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:40:49,698-Speed 3335.41 samples/sec   Loss 2.8682   LearningRate 0.0496   Epoch: 5   Global Step: 98820   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:40:52,761-Speed 3343.92 samples/sec   Loss 2.8171   LearningRate 0.0496   Epoch: 5   Global Step: 98830   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:40:55,840-Speed 3326.39 samples/sec   Loss 2.8403   LearningRate 0.0495   Epoch: 5   Global Step: 98840   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:40:58,937-Speed 3307.47 samples/sec   Loss 2.8473   LearningRate 0.0495   Epoch: 5   Global Step: 98850   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:41:01,997-Speed 3347.17 samples/sec   Loss 2.8059   LearningRate 0.0495   Epoch: 5   Global Step: 98860   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:41:05,074-Speed 3327.94 samples/sec   Loss 2.8127   LearningRate 0.0495   Epoch: 5   Global Step: 98870   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:41:08,134-Speed 3347.88 samples/sec   Loss 2.8912   LearningRate 0.0495   Epoch: 5   Global Step: 98880   Fp16 Grad Scale: 262144   Required: 25 hours
Training: 2022-04-11 09:41:11,190-Speed 3351.19 samples/sec   Loss 2.8227   LearningRate 0.0495   Epoch: 5   Global Step: 98890   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:41:14,249-Speed 3348.57 samples/sec   Loss 2.8478   LearningRate 0.0495   Epoch: 5   Global Step: 98900   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:41:17,362-Speed 3291.62 samples/sec   Loss 2.7964   LearningRate 0.0495   Epoch: 5   Global Step: 98910   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:41:20,421-Speed 3348.72 samples/sec   Loss 2.8464   LearningRate 0.0495   Epoch: 5   Global Step: 98920   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:41:23,537-Speed 3286.57 samples/sec   Loss 2.8149   LearningRate 0.0495   Epoch: 5   Global Step: 98930   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:41:26,638-Speed 3303.18 samples/sec   Loss 2.8347   LearningRate 0.0495   Epoch: 5   Global Step: 98940   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:41:29,764-Speed 3276.61 samples/sec   Loss 2.8788   LearningRate 0.0495   Epoch: 5   Global Step: 98950   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:41:32,856-Speed 3313.02 samples/sec   Loss 2.8216   LearningRate 0.0495   Epoch: 5   Global Step: 98960   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:41:35,942-Speed 3318.70 samples/sec   Loss 2.7532   LearningRate 0.0495   Epoch: 5   Global Step: 98970   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:41:39,058-Speed 3287.92 samples/sec   Loss 2.8566   LearningRate 0.0495   Epoch: 5   Global Step: 98980   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:41:42,181-Speed 3279.44 samples/sec   Loss 2.8802   LearningRate 0.0495   Epoch: 5   Global Step: 98990   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:41:45,244-Speed 3343.42 samples/sec   Loss 2.7703   LearningRate 0.0495   Epoch: 5   Global Step: 99000   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-11 09:41:48,307-Speed 3345.00 samples/sec   Loss 2.8688   LearningRate 0.0495   Epoch: 5   Global Step: 99010   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-11 09:41:51,384-Speed 3329.17 samples/sec   Loss 2.8572   LearningRate 0.0495   Epoch: 5   Global Step: 99020   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-11 09:41:54,508-Speed 3279.59 samples/sec   Loss 2.7828   LearningRate 0.0495   Epoch: 5   Global Step: 99030   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-11 09:41:57,569-Speed 3345.76 samples/sec   Loss 2.7869   LearningRate 0.0495   Epoch: 5   Global Step: 99040   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-11 09:42:00,657-Speed 3316.91 samples/sec   Loss 2.8675   LearningRate 0.0495   Epoch: 5   Global Step: 99050   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-11 09:42:03,716-Speed 3348.07 samples/sec   Loss 2.8356   LearningRate 0.0495   Epoch: 5   Global Step: 99060   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-11 09:42:06,778-Speed 3346.20 samples/sec   Loss 2.8645   LearningRate 0.0495   Epoch: 5   Global Step: 99070   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-11 09:42:09,837-Speed 3347.99 samples/sec   Loss 2.8052   LearningRate 0.0494   Epoch: 5   Global Step: 99080   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-11 09:42:12,910-Speed 3332.25 samples/sec   Loss 2.8418   LearningRate 0.0494   Epoch: 5   Global Step: 99090   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-11 09:42:15,970-Speed 3347.38 samples/sec   Loss 2.8022   LearningRate 0.0494   Epoch: 5   Global Step: 99100   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:42:19,048-Speed 3329.23 samples/sec   Loss 2.8101   LearningRate 0.0494   Epoch: 5   Global Step: 99110   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:42:22,118-Speed 3336.77 samples/sec   Loss 2.7792   LearningRate 0.0494   Epoch: 5   Global Step: 99120   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:42:25,174-Speed 3351.93 samples/sec   Loss 2.8125   LearningRate 0.0494   Epoch: 5   Global Step: 99130   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:42:28,232-Speed 3348.55 samples/sec   Loss 2.8507   LearningRate 0.0494   Epoch: 5   Global Step: 99140   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-11 09:42:31,324-Speed 3312.57 samples/sec   Loss 2.8456   LearningRate 0.0494   Epoch: 5   Global Step: 99150   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-11 09:42:34,392-Speed 3339.10 samples/sec   Loss 2.7030   LearningRate 0.0494   Epoch: 5   Global Step: 99160   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-11 09:42:37,458-Speed 3341.28 samples/sec   Loss 2.8248   LearningRate 0.0494   Epoch: 5   Global Step: 99170   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-11 09:42:40,527-Speed 3337.18 samples/sec   Loss 2.8645   LearningRate 0.0494   Epoch: 5   Global Step: 99180   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-11 09:42:43,666-Speed 3262.61 samples/sec   Loss 2.8091   LearningRate 0.0494   Epoch: 5   Global Step: 99190   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-11 09:42:46,729-Speed 3344.10 samples/sec   Loss 2.8276   LearningRate 0.0494   Epoch: 5   Global Step: 99200   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-11 09:42:49,804-Speed 3331.33 samples/sec   Loss 2.8407   LearningRate 0.0494   Epoch: 5   Global Step: 99210   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-11 09:42:52,877-Speed 3332.82 samples/sec   Loss 2.8188   LearningRate 0.0494   Epoch: 5   Global Step: 99220   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-11 09:42:55,951-Speed 3332.48 samples/sec   Loss 2.8127   LearningRate 0.0494   Epoch: 5   Global Step: 99230   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-11 09:42:59,029-Speed 3327.94 samples/sec   Loss 2.7642   LearningRate 0.0494   Epoch: 5   Global Step: 99240   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:43:02,097-Speed 3338.65 samples/sec   Loss 2.7804   LearningRate 0.0494   Epoch: 5   Global Step: 99250   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:43:05,158-Speed 3345.90 samples/sec   Loss 2.8849   LearningRate 0.0494   Epoch: 5   Global Step: 99260   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:43:08,219-Speed 3345.84 samples/sec   Loss 2.8028   LearningRate 0.0494   Epoch: 5   Global Step: 99270   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-11 09:43:11,307-Speed 3317.12 samples/sec   Loss 2.7925   LearningRate 0.0494   Epoch: 5   Global Step: 99280   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-11 09:43:14,484-Speed 3224.08 samples/sec   Loss 2.8521   LearningRate 0.0494   Epoch: 5   Global Step: 99290   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-11 09:43:17,585-Speed 3302.93 samples/sec   Loss 2.7973   LearningRate 0.0494   Epoch: 5   Global Step: 99300   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-11 09:43:20,647-Speed 3346.89 samples/sec   Loss 2.7820   LearningRate 0.0494   Epoch: 5   Global Step: 99310   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-11 09:43:23,703-Speed 3350.75 samples/sec   Loss 2.7713   LearningRate 0.0493   Epoch: 5   Global Step: 99320   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-11 09:43:26,789-Speed 3319.18 samples/sec   Loss 2.8626   LearningRate 0.0493   Epoch: 5   Global Step: 99330   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-11 09:43:29,914-Speed 3277.98 samples/sec   Loss 2.8016   LearningRate 0.0493   Epoch: 5   Global Step: 99340   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-11 09:43:33,019-Speed 3298.24 samples/sec   Loss 2.8092   LearningRate 0.0493   Epoch: 5   Global Step: 99350   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-11 09:43:36,126-Speed 3296.99 samples/sec   Loss 2.7870   LearningRate 0.0493   Epoch: 5   Global Step: 99360   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-11 09:43:39,225-Speed 3306.14 samples/sec   Loss 2.7723   LearningRate 0.0493   Epoch: 5   Global Step: 99370   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:43:42,357-Speed 3270.59 samples/sec   Loss 2.8790   LearningRate 0.0493   Epoch: 5   Global Step: 99380   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:43:45,572-Speed 3185.96 samples/sec   Loss 2.7663   LearningRate 0.0493   Epoch: 5   Global Step: 99390   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:43:48,668-Speed 3308.27 samples/sec   Loss 2.8165   LearningRate 0.0493   Epoch: 5   Global Step: 99400   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:43:51,839-Speed 3229.82 samples/sec   Loss 2.8088   LearningRate 0.0493   Epoch: 5   Global Step: 99410   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:43:54,920-Speed 3325.23 samples/sec   Loss 2.7731   LearningRate 0.0493   Epoch: 5   Global Step: 99420   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:43:58,022-Speed 3302.14 samples/sec   Loss 2.8059   LearningRate 0.0493   Epoch: 5   Global Step: 99430   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:44:01,107-Speed 3319.51 samples/sec   Loss 2.8060   LearningRate 0.0493   Epoch: 5   Global Step: 99440   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:44:04,176-Speed 3337.95 samples/sec   Loss 2.8941   LearningRate 0.0493   Epoch: 5   Global Step: 99450   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:44:07,328-Speed 3250.08 samples/sec   Loss 2.8188   LearningRate 0.0493   Epoch: 5   Global Step: 99460   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-11 09:44:10,437-Speed 3294.59 samples/sec   Loss 2.7915   LearningRate 0.0493   Epoch: 5   Global Step: 99470   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-11 09:44:13,498-Speed 3345.95 samples/sec   Loss 2.8519   LearningRate 0.0493   Epoch: 5   Global Step: 99480   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-11 09:44:16,573-Speed 3330.82 samples/sec   Loss 2.7399   LearningRate 0.0493   Epoch: 5   Global Step: 99490   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-11 09:44:19,644-Speed 3335.17 samples/sec   Loss 2.8193   LearningRate 0.0493   Epoch: 5   Global Step: 99500   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-11 09:44:22,764-Speed 3283.35 samples/sec   Loss 2.9112   LearningRate 0.0493   Epoch: 5   Global Step: 99510   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-11 09:44:25,888-Speed 3278.52 samples/sec   Loss 2.8307   LearningRate 0.0493   Epoch: 5   Global Step: 99520   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-11 09:44:28,976-Speed 3316.22 samples/sec   Loss 2.8385   LearningRate 0.0493   Epoch: 5   Global Step: 99530   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-11 09:44:32,071-Speed 3310.31 samples/sec   Loss 2.8243   LearningRate 0.0493   Epoch: 5   Global Step: 99540   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-11 09:44:35,158-Speed 3317.61 samples/sec   Loss 2.7990   LearningRate 0.0493   Epoch: 5   Global Step: 99550   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-11 09:44:38,248-Speed 3315.94 samples/sec   Loss 2.7618   LearningRate 0.0492   Epoch: 5   Global Step: 99560   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:44:41,316-Speed 3338.11 samples/sec   Loss 2.8508   LearningRate 0.0492   Epoch: 5   Global Step: 99570   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:44:44,434-Speed 3285.88 samples/sec   Loss 2.7640   LearningRate 0.0492   Epoch: 5   Global Step: 99580   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:44:47,484-Speed 3357.51 samples/sec   Loss 2.8072   LearningRate 0.0492   Epoch: 5   Global Step: 99590   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-11 09:44:50,580-Speed 3309.06 samples/sec   Loss 2.7488   LearningRate 0.0492   Epoch: 5   Global Step: 99600   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-11 09:44:53,690-Speed 3293.51 samples/sec   Loss 2.7540   LearningRate 0.0492   Epoch: 5   Global Step: 99610   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-11 09:44:56,756-Speed 3340.74 samples/sec   Loss 2.6934   LearningRate 0.0492   Epoch: 5   Global Step: 99620   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-11 09:44:59,836-Speed 3325.91 samples/sec   Loss 2.8269   LearningRate 0.0492   Epoch: 5   Global Step: 99630   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-11 09:45:03,015-Speed 3221.88 samples/sec   Loss 2.8541   LearningRate 0.0492   Epoch: 5   Global Step: 99640   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-11 09:45:06,095-Speed 3325.06 samples/sec   Loss 2.7762   LearningRate 0.0492   Epoch: 5   Global Step: 99650   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-11 09:45:09,167-Speed 3334.94 samples/sec   Loss 2.7732   LearningRate 0.0492   Epoch: 5   Global Step: 99660   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-11 09:45:12,253-Speed 3318.30 samples/sec   Loss 2.8343   LearningRate 0.0492   Epoch: 5   Global Step: 99670   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-11 09:45:15,316-Speed 3343.74 samples/sec   Loss 2.7442   LearningRate 0.0492   Epoch: 5   Global Step: 99680   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-11 09:45:18,399-Speed 3322.44 samples/sec   Loss 2.8288   LearningRate 0.0492   Epoch: 5   Global Step: 99690   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:45:21,473-Speed 3332.25 samples/sec   Loss 2.8319   LearningRate 0.0492   Epoch: 5   Global Step: 99700   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:45:24,545-Speed 3334.33 samples/sec   Loss 2.8803   LearningRate 0.0492   Epoch: 5   Global Step: 99710   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:45:27,612-Speed 3339.65 samples/sec   Loss 2.8270   LearningRate 0.0492   Epoch: 5   Global Step: 99720   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:45:30,681-Speed 3337.53 samples/sec   Loss 2.7864   LearningRate 0.0492   Epoch: 5   Global Step: 99730   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:45:33,813-Speed 3269.88 samples/sec   Loss 2.7728   LearningRate 0.0492   Epoch: 5   Global Step: 99740   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:45:36,934-Speed 3281.93 samples/sec   Loss 2.7806   LearningRate 0.0492   Epoch: 5   Global Step: 99750   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:45:39,998-Speed 3342.77 samples/sec   Loss 2.7328   LearningRate 0.0492   Epoch: 5   Global Step: 99760   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:45:43,089-Speed 3313.73 samples/sec   Loss 2.8330   LearningRate 0.0492   Epoch: 5   Global Step: 99770   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:45:46,174-Speed 3320.05 samples/sec   Loss 2.7830   LearningRate 0.0492   Epoch: 5   Global Step: 99780   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:45:49,273-Speed 3306.17 samples/sec   Loss 2.7399   LearningRate 0.0491   Epoch: 5   Global Step: 99790   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:45:52,345-Speed 3333.62 samples/sec   Loss 2.8466   LearningRate 0.0491   Epoch: 5   Global Step: 99800   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:45:55,412-Speed 3340.21 samples/sec   Loss 2.8274   LearningRate 0.0491   Epoch: 5   Global Step: 99810   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:45:58,484-Speed 3333.44 samples/sec   Loss 2.7647   LearningRate 0.0491   Epoch: 5   Global Step: 99820   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:46:01,552-Speed 3339.32 samples/sec   Loss 2.8129   LearningRate 0.0491   Epoch: 5   Global Step: 99830   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:46:04,653-Speed 3302.43 samples/sec   Loss 2.8756   LearningRate 0.0491   Epoch: 5   Global Step: 99840   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:46:07,723-Speed 3337.68 samples/sec   Loss 2.8391   LearningRate 0.0491   Epoch: 5   Global Step: 99850   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:46:10,801-Speed 3327.55 samples/sec   Loss 2.8547   LearningRate 0.0491   Epoch: 5   Global Step: 99860   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:46:13,876-Speed 3330.61 samples/sec   Loss 2.8329   LearningRate 0.0491   Epoch: 5   Global Step: 99870   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:46:16,971-Speed 3309.13 samples/sec   Loss 2.8703   LearningRate 0.0491   Epoch: 5   Global Step: 99880   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-11 09:46:20,075-Speed 3299.69 samples/sec   Loss 2.8199   LearningRate 0.0491   Epoch: 5   Global Step: 99890   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-11 09:46:23,158-Speed 3323.21 samples/sec   Loss 2.7823   LearningRate 0.0491   Epoch: 5   Global Step: 99900   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-11 09:46:26,226-Speed 3338.25 samples/sec   Loss 2.8141   LearningRate 0.0491   Epoch: 5   Global Step: 99910   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-11 09:46:29,290-Speed 3343.56 samples/sec   Loss 2.7863   LearningRate 0.0491   Epoch: 5   Global Step: 99920   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-11 09:46:32,380-Speed 3314.96 samples/sec   Loss 2.7839   LearningRate 0.0491   Epoch: 5   Global Step: 99930   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-11 09:46:35,487-Speed 3296.52 samples/sec   Loss 2.7397   LearningRate 0.0491   Epoch: 5   Global Step: 99940   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-11 09:46:38,607-Speed 3282.52 samples/sec   Loss 2.7637   LearningRate 0.0491   Epoch: 5   Global Step: 99950   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-11 09:46:41,679-Speed 3334.37 samples/sec   Loss 2.7465   LearningRate 0.0491   Epoch: 5   Global Step: 99960   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-11 09:46:44,743-Speed 3342.79 samples/sec   Loss 2.7463   LearningRate 0.0491   Epoch: 5   Global Step: 99970   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-11 09:46:47,895-Speed 3249.88 samples/sec   Loss 2.8381   LearningRate 0.0491   Epoch: 5   Global Step: 99980   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:46:50,962-Speed 3340.04 samples/sec   Loss 2.8469   LearningRate 0.0491   Epoch: 5   Global Step: 99990   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:46:54,046-Speed 3321.71 samples/sec   Loss 2.8778   LearningRate 0.0491   Epoch: 5   Global Step: 100000   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:47:38,455-[lfw][100000]XNorm: 22.055729
Training: 2022-04-11 09:47:38,455-[lfw][100000]Accuracy-Flip: 0.99800+-0.00296
Training: 2022-04-11 09:47:38,456-[lfw][100000]Accuracy-Highest: 0.99817
Training: 2022-04-11 09:48:29,489-[cfp_fp][100000]XNorm: 21.061827
Training: 2022-04-11 09:48:29,490-[cfp_fp][100000]Accuracy-Flip: 0.98143+-0.00613
Training: 2022-04-11 09:48:29,491-[cfp_fp][100000]Accuracy-Highest: 0.98557
Training: 2022-04-11 09:49:13,400-[agedb_30][100000]XNorm: 22.439924
Training: 2022-04-11 09:49:13,400-[agedb_30][100000]Accuracy-Flip: 0.98033+-0.00722
Training: 2022-04-11 09:49:13,401-[agedb_30][100000]Accuracy-Highest: 0.98200
Training: 2022-04-11 09:49:16,482-Speed 71.89 samples/sec   Loss 2.7811   LearningRate 0.0491   Epoch: 5   Global Step: 100010   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:49:19,526-Speed 3364.29 samples/sec   Loss 2.8144   LearningRate 0.0491   Epoch: 5   Global Step: 100020   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:49:22,587-Speed 3346.10 samples/sec   Loss 2.7578   LearningRate 0.0490   Epoch: 5   Global Step: 100030   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:49:25,640-Speed 3355.21 samples/sec   Loss 2.8821   LearningRate 0.0490   Epoch: 5   Global Step: 100040   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:49:28,697-Speed 3350.83 samples/sec   Loss 2.6997   LearningRate 0.0490   Epoch: 5   Global Step: 100050   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:49:31,750-Speed 3354.60 samples/sec   Loss 2.8654   LearningRate 0.0490   Epoch: 5   Global Step: 100060   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:49:34,815-Speed 3342.36 samples/sec   Loss 2.8094   LearningRate 0.0490   Epoch: 5   Global Step: 100070   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:49:37,862-Speed 3361.76 samples/sec   Loss 2.7896   LearningRate 0.0490   Epoch: 5   Global Step: 100080   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:49:40,949-Speed 3317.84 samples/sec   Loss 2.8271   LearningRate 0.0490   Epoch: 5   Global Step: 100090   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:49:44,003-Speed 3353.85 samples/sec   Loss 2.7496   LearningRate 0.0490   Epoch: 5   Global Step: 100100   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:49:47,078-Speed 3331.18 samples/sec   Loss 2.8390   LearningRate 0.0490   Epoch: 5   Global Step: 100110   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:49:50,137-Speed 3348.31 samples/sec   Loss 2.8264   LearningRate 0.0490   Epoch: 5   Global Step: 100120   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:49:53,198-Speed 3345.87 samples/sec   Loss 2.8143   LearningRate 0.0490   Epoch: 5   Global Step: 100130   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:49:56,504-Speed 3098.26 samples/sec   Loss 2.7927   LearningRate 0.0490   Epoch: 5   Global Step: 100140   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:50:28,494-Speed 320.12 samples/sec   Loss 2.5960   LearningRate 0.0490   Epoch: 6   Global Step: 100150   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:50:31,702-Speed 3193.07 samples/sec   Loss 2.2016   LearningRate 0.0490   Epoch: 6   Global Step: 100160   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:50:34,971-Speed 3133.38 samples/sec   Loss 2.2380   LearningRate 0.0490   Epoch: 6   Global Step: 100170   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:50:38,061-Speed 3314.89 samples/sec   Loss 2.2529   LearningRate 0.0490   Epoch: 6   Global Step: 100180   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:50:41,150-Speed 3316.48 samples/sec   Loss 2.1971   LearningRate 0.0490   Epoch: 6   Global Step: 100190   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:50:44,208-Speed 3350.34 samples/sec   Loss 2.2462   LearningRate 0.0490   Epoch: 6   Global Step: 100200   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:50:47,298-Speed 3313.85 samples/sec   Loss 2.1799   LearningRate 0.0490   Epoch: 6   Global Step: 100210   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:50:50,373-Speed 3331.56 samples/sec   Loss 2.1975   LearningRate 0.0490   Epoch: 6   Global Step: 100220   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:50:53,438-Speed 3341.65 samples/sec   Loss 2.2020   LearningRate 0.0490   Epoch: 6   Global Step: 100230   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:50:56,509-Speed 3335.72 samples/sec   Loss 2.1979   LearningRate 0.0490   Epoch: 6   Global Step: 100240   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:50:59,568-Speed 3348.31 samples/sec   Loss 2.2625   LearningRate 0.0490   Epoch: 6   Global Step: 100250   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:51:02,629-Speed 3345.66 samples/sec   Loss 2.2333   LearningRate 0.0490   Epoch: 6   Global Step: 100260   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:51:05,698-Speed 3338.56 samples/sec   Loss 2.1856   LearningRate 0.0489   Epoch: 6   Global Step: 100270   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:51:08,777-Speed 3326.73 samples/sec   Loss 2.2313   LearningRate 0.0489   Epoch: 6   Global Step: 100280   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:51:11,857-Speed 3326.55 samples/sec   Loss 2.2949   LearningRate 0.0489   Epoch: 6   Global Step: 100290   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:51:15,551-Speed 2772.69 samples/sec   Loss 2.1700   LearningRate 0.0489   Epoch: 6   Global Step: 100300   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:51:18,647-Speed 3308.10 samples/sec   Loss 2.2686   LearningRate 0.0489   Epoch: 6   Global Step: 100310   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:51:21,808-Speed 3241.17 samples/sec   Loss 2.2070   LearningRate 0.0489   Epoch: 6   Global Step: 100320   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:51:24,883-Speed 3331.75 samples/sec   Loss 2.1742   LearningRate 0.0489   Epoch: 6   Global Step: 100330   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:51:27,962-Speed 3326.86 samples/sec   Loss 2.2094   LearningRate 0.0489   Epoch: 6   Global Step: 100340   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:51:31,057-Speed 3309.38 samples/sec   Loss 2.2494   LearningRate 0.0489   Epoch: 6   Global Step: 100350   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:51:34,157-Speed 3303.84 samples/sec   Loss 2.2819   LearningRate 0.0489   Epoch: 6   Global Step: 100360   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:51:37,419-Speed 3140.57 samples/sec   Loss 2.2377   LearningRate 0.0489   Epoch: 6   Global Step: 100370   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:51:40,511-Speed 3312.79 samples/sec   Loss 2.1664   LearningRate 0.0489   Epoch: 6   Global Step: 100380   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:51:43,698-Speed 3215.09 samples/sec   Loss 2.2195   LearningRate 0.0489   Epoch: 6   Global Step: 100390   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:51:47,873-Speed 2453.33 samples/sec   Loss 2.2028   LearningRate 0.0489   Epoch: 6   Global Step: 100400   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:51:50,947-Speed 3332.34 samples/sec   Loss 2.1626   LearningRate 0.0489   Epoch: 6   Global Step: 100410   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:51:54,080-Speed 3270.22 samples/sec   Loss 2.2539   LearningRate 0.0489   Epoch: 6   Global Step: 100420   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:51:57,262-Speed 3220.05 samples/sec   Loss 2.2128   LearningRate 0.0489   Epoch: 6   Global Step: 100430   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:52:00,376-Speed 3289.52 samples/sec   Loss 2.2449   LearningRate 0.0489   Epoch: 6   Global Step: 100440   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:52:03,450-Speed 3332.22 samples/sec   Loss 2.2182   LearningRate 0.0489   Epoch: 6   Global Step: 100450   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:52:06,636-Speed 3215.51 samples/sec   Loss 2.2396   LearningRate 0.0489   Epoch: 6   Global Step: 100460   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:52:09,814-Speed 3223.76 samples/sec   Loss 2.1983   LearningRate 0.0489   Epoch: 6   Global Step: 100470   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:52:12,927-Speed 3291.12 samples/sec   Loss 2.2385   LearningRate 0.0489   Epoch: 6   Global Step: 100480   Fp16 Grad Scale: 262144   Required: 25 hours
Training: 2022-04-11 09:52:15,983-Speed 3352.15 samples/sec   Loss 2.2640   LearningRate 0.0489   Epoch: 6   Global Step: 100490   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:52:19,053-Speed 3336.63 samples/sec   Loss 2.2243   LearningRate 0.0489   Epoch: 6   Global Step: 100500   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:52:22,116-Speed 3345.00 samples/sec   Loss 2.2806   LearningRate 0.0488   Epoch: 6   Global Step: 100510   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:52:25,250-Speed 3268.77 samples/sec   Loss 2.2667   LearningRate 0.0488   Epoch: 6   Global Step: 100520   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:52:28,326-Speed 3330.36 samples/sec   Loss 2.2665   LearningRate 0.0488   Epoch: 6   Global Step: 100530   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:52:31,403-Speed 3329.09 samples/sec   Loss 2.2211   LearningRate 0.0488   Epoch: 6   Global Step: 100540   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:52:34,510-Speed 3297.25 samples/sec   Loss 2.2542   LearningRate 0.0488   Epoch: 6   Global Step: 100550   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:52:37,724-Speed 3186.64 samples/sec   Loss 2.2115   LearningRate 0.0488   Epoch: 6   Global Step: 100560   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-11 09:52:40,933-Speed 3192.83 samples/sec   Loss 2.2358   LearningRate 0.0488   Epoch: 6   Global Step: 100570   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-11 09:52:44,075-Speed 3260.61 samples/sec   Loss 2.2839   LearningRate 0.0488   Epoch: 6   Global Step: 100580   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-11 09:52:47,210-Speed 3267.30 samples/sec   Loss 2.2577   LearningRate 0.0488   Epoch: 6   Global Step: 100590   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-11 09:52:50,272-Speed 3346.21 samples/sec   Loss 2.2979   LearningRate 0.0488   Epoch: 6   Global Step: 100600   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-11 09:52:53,373-Speed 3303.93 samples/sec   Loss 2.3081   LearningRate 0.0488   Epoch: 6   Global Step: 100610   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-11 09:52:56,503-Speed 3272.53 samples/sec   Loss 2.2549   LearningRate 0.0488   Epoch: 6   Global Step: 100620   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-11 09:52:59,606-Speed 3300.80 samples/sec   Loss 2.2521   LearningRate 0.0488   Epoch: 6   Global Step: 100630   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-11 09:53:02,723-Speed 3287.34 samples/sec   Loss 2.2387   LearningRate 0.0488   Epoch: 6   Global Step: 100640   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-11 09:53:05,817-Speed 3309.77 samples/sec   Loss 2.2530   LearningRate 0.0488   Epoch: 6   Global Step: 100650   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-11 09:53:08,888-Speed 3335.33 samples/sec   Loss 2.3080   LearningRate 0.0488   Epoch: 6   Global Step: 100660   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:53:11,956-Speed 3340.25 samples/sec   Loss 2.2107   LearningRate 0.0488   Epoch: 6   Global Step: 100670   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-11 09:53:15,019-Speed 3344.18 samples/sec   Loss 2.2576   LearningRate 0.0488   Epoch: 6   Global Step: 100680   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-11 09:53:18,157-Speed 3264.73 samples/sec   Loss 2.3297   LearningRate 0.0488   Epoch: 6   Global Step: 100690   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-11 09:53:21,224-Speed 3340.20 samples/sec   Loss 2.2532   LearningRate 0.0488   Epoch: 6   Global Step: 100700   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-11 09:53:24,385-Speed 3240.07 samples/sec   Loss 2.3339   LearningRate 0.0488   Epoch: 6   Global Step: 100710   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-11 09:53:27,456-Speed 3336.35 samples/sec   Loss 2.2427   LearningRate 0.0488   Epoch: 6   Global Step: 100720   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-11 09:53:30,614-Speed 3243.58 samples/sec   Loss 2.2868   LearningRate 0.0488   Epoch: 6   Global Step: 100730   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-11 09:53:33,753-Speed 3263.82 samples/sec   Loss 2.2412   LearningRate 0.0488   Epoch: 6   Global Step: 100740   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-11 09:53:36,835-Speed 3322.71 samples/sec   Loss 2.2431   LearningRate 0.0487   Epoch: 6   Global Step: 100750   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-11 09:53:40,143-Speed 3096.55 samples/sec   Loss 2.2966   LearningRate 0.0487   Epoch: 6   Global Step: 100760   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-11 09:53:43,257-Speed 3289.33 samples/sec   Loss 2.2731   LearningRate 0.0487   Epoch: 6   Global Step: 100770   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:53:46,389-Speed 3271.09 samples/sec   Loss 2.2643   LearningRate 0.0487   Epoch: 6   Global Step: 100780   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:53:49,462-Speed 3333.12 samples/sec   Loss 2.2642   LearningRate 0.0487   Epoch: 6   Global Step: 100790   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:53:52,557-Speed 3309.37 samples/sec   Loss 2.2369   LearningRate 0.0487   Epoch: 6   Global Step: 100800   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:53:55,652-Speed 3310.70 samples/sec   Loss 2.2941   LearningRate 0.0487   Epoch: 6   Global Step: 100810   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:53:58,784-Speed 3270.67 samples/sec   Loss 2.3470   LearningRate 0.0487   Epoch: 6   Global Step: 100820   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:54:01,862-Speed 3327.84 samples/sec   Loss 2.2470   LearningRate 0.0487   Epoch: 6   Global Step: 100830   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:54:04,966-Speed 3300.05 samples/sec   Loss 2.2716   LearningRate 0.0487   Epoch: 6   Global Step: 100840   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:54:08,213-Speed 3155.21 samples/sec   Loss 2.2830   LearningRate 0.0487   Epoch: 6   Global Step: 100850   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:54:11,316-Speed 3300.78 samples/sec   Loss 2.3657   LearningRate 0.0487   Epoch: 6   Global Step: 100860   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:54:14,486-Speed 3231.88 samples/sec   Loss 2.2556   LearningRate 0.0487   Epoch: 6   Global Step: 100870   Fp16 Grad Scale: 262144   Required: 25 hours
Training: 2022-04-11 09:54:17,647-Speed 3241.50 samples/sec   Loss 2.3341   LearningRate 0.0487   Epoch: 6   Global Step: 100880   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:54:20,781-Speed 3268.04 samples/sec   Loss 2.2958   LearningRate 0.0487   Epoch: 6   Global Step: 100890   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:54:23,902-Speed 3282.25 samples/sec   Loss 2.3093   LearningRate 0.0487   Epoch: 6   Global Step: 100900   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:54:27,071-Speed 3232.46 samples/sec   Loss 2.3257   LearningRate 0.0487   Epoch: 6   Global Step: 100910   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:54:30,176-Speed 3299.17 samples/sec   Loss 2.2596   LearningRate 0.0487   Epoch: 6   Global Step: 100920   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:54:33,268-Speed 3313.15 samples/sec   Loss 2.2917   LearningRate 0.0487   Epoch: 6   Global Step: 100930   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:54:36,340-Speed 3335.25 samples/sec   Loss 2.3021   LearningRate 0.0487   Epoch: 6   Global Step: 100940   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:54:39,408-Speed 3338.60 samples/sec   Loss 2.2853   LearningRate 0.0487   Epoch: 6   Global Step: 100950   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:54:42,562-Speed 3248.10 samples/sec   Loss 2.2933   LearningRate 0.0487   Epoch: 6   Global Step: 100960   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:54:45,654-Speed 3312.67 samples/sec   Loss 2.3190   LearningRate 0.0487   Epoch: 6   Global Step: 100970   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:54:48,799-Speed 3256.98 samples/sec   Loss 2.3815   LearningRate 0.0487   Epoch: 6   Global Step: 100980   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:54:51,891-Speed 3312.44 samples/sec   Loss 2.2200   LearningRate 0.0486   Epoch: 6   Global Step: 100990   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:54:55,007-Speed 3287.83 samples/sec   Loss 2.2399   LearningRate 0.0486   Epoch: 6   Global Step: 101000   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:54:58,077-Speed 3336.28 samples/sec   Loss 2.3699   LearningRate 0.0486   Epoch: 6   Global Step: 101010   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:55:01,287-Speed 3190.81 samples/sec   Loss 2.3004   LearningRate 0.0486   Epoch: 6   Global Step: 101020   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:55:04,404-Speed 3287.64 samples/sec   Loss 2.2369   LearningRate 0.0486   Epoch: 6   Global Step: 101030   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:55:07,527-Speed 3280.53 samples/sec   Loss 2.2853   LearningRate 0.0486   Epoch: 6   Global Step: 101040   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:55:10,596-Speed 3337.43 samples/sec   Loss 2.2710   LearningRate 0.0486   Epoch: 6   Global Step: 101050   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:55:13,717-Speed 3281.28 samples/sec   Loss 2.3147   LearningRate 0.0486   Epoch: 6   Global Step: 101060   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:55:16,891-Speed 3227.33 samples/sec   Loss 2.3466   LearningRate 0.0486   Epoch: 6   Global Step: 101070   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:55:20,006-Speed 3289.43 samples/sec   Loss 2.2467   LearningRate 0.0486   Epoch: 6   Global Step: 101080   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:55:23,077-Speed 3335.03 samples/sec   Loss 2.3194   LearningRate 0.0486   Epoch: 6   Global Step: 101090   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:55:26,402-Speed 3081.38 samples/sec   Loss 2.2978   LearningRate 0.0486   Epoch: 6   Global Step: 101100   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:55:29,578-Speed 3226.05 samples/sec   Loss 2.2730   LearningRate 0.0486   Epoch: 6   Global Step: 101110   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:55:32,695-Speed 3285.79 samples/sec   Loss 2.3619   LearningRate 0.0486   Epoch: 6   Global Step: 101120   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:55:35,823-Speed 3274.56 samples/sec   Loss 2.3473   LearningRate 0.0486   Epoch: 6   Global Step: 101130   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:55:38,878-Speed 3352.84 samples/sec   Loss 2.2889   LearningRate 0.0486   Epoch: 6   Global Step: 101140   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-11 09:55:42,037-Speed 3243.01 samples/sec   Loss 2.2798   LearningRate 0.0486   Epoch: 6   Global Step: 101150   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-11 09:55:45,108-Speed 3335.82 samples/sec   Loss 2.3784   LearningRate 0.0486   Epoch: 6   Global Step: 101160   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-11 09:55:48,276-Speed 3233.64 samples/sec   Loss 2.2734   LearningRate 0.0486   Epoch: 6   Global Step: 101170   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-11 09:55:51,347-Speed 3336.44 samples/sec   Loss 2.3429   LearningRate 0.0486   Epoch: 6   Global Step: 101180   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-11 09:55:54,479-Speed 3270.74 samples/sec   Loss 2.2638   LearningRate 0.0486   Epoch: 6   Global Step: 101190   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-11 09:55:57,601-Speed 3281.42 samples/sec   Loss 2.3058   LearningRate 0.0486   Epoch: 6   Global Step: 101200   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-11 09:56:00,726-Speed 3277.91 samples/sec   Loss 2.2677   LearningRate 0.0486   Epoch: 6   Global Step: 101210   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-11 09:56:03,814-Speed 3317.05 samples/sec   Loss 2.3565   LearningRate 0.0486   Epoch: 6   Global Step: 101220   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-11 09:56:06,979-Speed 3237.33 samples/sec   Loss 2.3227   LearningRate 0.0485   Epoch: 6   Global Step: 101230   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-11 09:56:10,063-Speed 3321.83 samples/sec   Loss 2.3371   LearningRate 0.0485   Epoch: 6   Global Step: 101240   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:56:13,139-Speed 3330.52 samples/sec   Loss 2.3519   LearningRate 0.0485   Epoch: 6   Global Step: 101250   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:56:16,268-Speed 3274.10 samples/sec   Loss 2.3217   LearningRate 0.0485   Epoch: 6   Global Step: 101260   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:56:19,361-Speed 3311.68 samples/sec   Loss 2.3059   LearningRate 0.0485   Epoch: 6   Global Step: 101270   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:56:22,513-Speed 3250.18 samples/sec   Loss 2.3339   LearningRate 0.0485   Epoch: 6   Global Step: 101280   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:56:25,653-Speed 3262.64 samples/sec   Loss 2.3274   LearningRate 0.0485   Epoch: 6   Global Step: 101290   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:56:28,758-Speed 3298.64 samples/sec   Loss 2.3234   LearningRate 0.0485   Epoch: 6   Global Step: 101300   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:56:31,874-Speed 3288.76 samples/sec   Loss 2.3040   LearningRate 0.0485   Epoch: 6   Global Step: 101310   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:56:35,046-Speed 3229.57 samples/sec   Loss 2.3008   LearningRate 0.0485   Epoch: 6   Global Step: 101320   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:56:38,153-Speed 3296.89 samples/sec   Loss 2.3215   LearningRate 0.0485   Epoch: 6   Global Step: 101330   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:56:41,211-Speed 3349.21 samples/sec   Loss 2.3567   LearningRate 0.0485   Epoch: 6   Global Step: 101340   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:56:44,329-Speed 3285.43 samples/sec   Loss 2.3776   LearningRate 0.0485   Epoch: 6   Global Step: 101350   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:56:47,483-Speed 3248.03 samples/sec   Loss 2.4089   LearningRate 0.0485   Epoch: 6   Global Step: 101360   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:56:50,652-Speed 3232.93 samples/sec   Loss 2.3108   LearningRate 0.0485   Epoch: 6   Global Step: 101370   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:56:53,720-Speed 3338.74 samples/sec   Loss 2.2698   LearningRate 0.0485   Epoch: 6   Global Step: 101380   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:56:56,833-Speed 3290.72 samples/sec   Loss 2.3425   LearningRate 0.0485   Epoch: 6   Global Step: 101390   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:56:59,921-Speed 3317.78 samples/sec   Loss 2.4015   LearningRate 0.0485   Epoch: 6   Global Step: 101400   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:57:02,999-Speed 3327.45 samples/sec   Loss 2.3514   LearningRate 0.0485   Epoch: 6   Global Step: 101410   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:57:06,166-Speed 3234.47 samples/sec   Loss 2.3677   LearningRate 0.0485   Epoch: 6   Global Step: 101420   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:57:09,316-Speed 3253.02 samples/sec   Loss 2.3077   LearningRate 0.0485   Epoch: 6   Global Step: 101430   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:57:12,373-Speed 3351.29 samples/sec   Loss 2.3021   LearningRate 0.0485   Epoch: 6   Global Step: 101440   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:57:15,508-Speed 3267.24 samples/sec   Loss 2.3769   LearningRate 0.0485   Epoch: 6   Global Step: 101450   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:57:18,648-Speed 3261.24 samples/sec   Loss 2.3558   LearningRate 0.0485   Epoch: 6   Global Step: 101460   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:57:21,808-Speed 3242.56 samples/sec   Loss 2.3667   LearningRate 0.0484   Epoch: 6   Global Step: 101470   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:57:24,955-Speed 3254.82 samples/sec   Loss 2.3352   LearningRate 0.0484   Epoch: 6   Global Step: 101480   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:57:28,060-Speed 3298.86 samples/sec   Loss 2.3334   LearningRate 0.0484   Epoch: 6   Global Step: 101490   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:57:31,287-Speed 3175.02 samples/sec   Loss 2.3316   LearningRate 0.0484   Epoch: 6   Global Step: 101500   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 09:57:34,412-Speed 3277.70 samples/sec   Loss 2.3533   LearningRate 0.0484   Epoch: 6   Global Step: 101510   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 09:57:37,511-Speed 3306.92 samples/sec   Loss 2.3175   LearningRate 0.0484   Epoch: 6   Global Step: 101520   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 09:57:40,582-Speed 3335.61 samples/sec   Loss 2.2870   LearningRate 0.0484   Epoch: 6   Global Step: 101530   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 09:57:43,715-Speed 3269.23 samples/sec   Loss 2.2673   LearningRate 0.0484   Epoch: 6   Global Step: 101540   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 09:57:46,809-Speed 3310.48 samples/sec   Loss 2.3547   LearningRate 0.0484   Epoch: 6   Global Step: 101550   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 09:57:50,047-Speed 3164.43 samples/sec   Loss 2.3569   LearningRate 0.0484   Epoch: 6   Global Step: 101560   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 09:57:53,210-Speed 3238.27 samples/sec   Loss 2.3804   LearningRate 0.0484   Epoch: 6   Global Step: 101570   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 09:57:56,328-Speed 3285.48 samples/sec   Loss 2.3544   LearningRate 0.0484   Epoch: 6   Global Step: 101580   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 09:57:59,405-Speed 3328.51 samples/sec   Loss 2.3792   LearningRate 0.0484   Epoch: 6   Global Step: 101590   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 09:58:02,520-Speed 3288.23 samples/sec   Loss 2.3860   LearningRate 0.0484   Epoch: 6   Global Step: 101600   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 09:58:05,613-Speed 3312.04 samples/sec   Loss 2.3793   LearningRate 0.0484   Epoch: 6   Global Step: 101610   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 09:58:08,770-Speed 3245.31 samples/sec   Loss 2.2972   LearningRate 0.0484   Epoch: 6   Global Step: 101620   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 09:58:11,856-Speed 3319.12 samples/sec   Loss 2.3811   LearningRate 0.0484   Epoch: 6   Global Step: 101630   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 09:58:14,980-Speed 3279.50 samples/sec   Loss 2.3840   LearningRate 0.0484   Epoch: 6   Global Step: 101640   Fp16 Grad Scale: 262144   Required: 24 hours
Training: 2022-04-11 09:58:18,027-Speed 3361.74 samples/sec   Loss 2.4071   LearningRate 0.0484   Epoch: 6   Global Step: 101650   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-11 09:58:21,106-Speed 3326.41 samples/sec   Loss 2.3534   LearningRate 0.0484   Epoch: 6   Global Step: 101660   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-11 09:58:24,278-Speed 3229.30 samples/sec   Loss 2.3460   LearningRate 0.0484   Epoch: 6   Global Step: 101670   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-11 09:58:27,384-Speed 3298.83 samples/sec   Loss 2.4335   LearningRate 0.0484   Epoch: 6   Global Step: 101680   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-11 09:58:30,558-Speed 3226.98 samples/sec   Loss 2.3219   LearningRate 0.0484   Epoch: 6   Global Step: 101690   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-11 09:58:33,687-Speed 3273.42 samples/sec   Loss 2.3075   LearningRate 0.0484   Epoch: 6   Global Step: 101700   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-11 09:58:36,794-Speed 3296.89 samples/sec   Loss 2.3640   LearningRate 0.0483   Epoch: 6   Global Step: 101710   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-11 09:58:39,936-Speed 3260.70 samples/sec   Loss 2.3400   LearningRate 0.0483   Epoch: 6   Global Step: 101720   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-11 09:58:43,019-Speed 3322.22 samples/sec   Loss 2.3345   LearningRate 0.0483   Epoch: 6   Global Step: 101730   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-11 09:58:46,138-Speed 3284.75 samples/sec   Loss 2.4455   LearningRate 0.0483   Epoch: 6   Global Step: 101740   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-11 09:58:49,214-Speed 3330.53 samples/sec   Loss 2.3460   LearningRate 0.0483   Epoch: 6   Global Step: 101750   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 09:58:52,291-Speed 3328.36 samples/sec   Loss 2.3927   LearningRate 0.0483   Epoch: 6   Global Step: 101760   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 09:58:55,401-Speed 3294.02 samples/sec   Loss 2.3513   LearningRate 0.0483   Epoch: 6   Global Step: 101770   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 09:58:58,513-Speed 3292.25 samples/sec   Loss 2.3745   LearningRate 0.0483   Epoch: 6   Global Step: 101780   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 09:59:01,614-Speed 3302.70 samples/sec   Loss 2.3489   LearningRate 0.0483   Epoch: 6   Global Step: 101790   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 09:59:04,691-Speed 3330.46 samples/sec   Loss 2.4075   LearningRate 0.0483   Epoch: 6   Global Step: 101800   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 09:59:07,761-Speed 3337.03 samples/sec   Loss 2.3686   LearningRate 0.0483   Epoch: 6   Global Step: 101810   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 09:59:10,865-Speed 3299.62 samples/sec   Loss 2.3843   LearningRate 0.0483   Epoch: 6   Global Step: 101820   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 09:59:13,948-Speed 3322.66 samples/sec   Loss 2.3538   LearningRate 0.0483   Epoch: 6   Global Step: 101830   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 09:59:17,041-Speed 3312.50 samples/sec   Loss 2.3775   LearningRate 0.0483   Epoch: 6   Global Step: 101840   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 09:59:20,108-Speed 3339.67 samples/sec   Loss 2.3413   LearningRate 0.0483   Epoch: 6   Global Step: 101850   Fp16 Grad Scale: 262144   Required: 24 hours
Training: 2022-04-11 09:59:23,226-Speed 3284.62 samples/sec   Loss 2.3526   LearningRate 0.0483   Epoch: 6   Global Step: 101860   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 09:59:26,305-Speed 3327.48 samples/sec   Loss 2.3613   LearningRate 0.0483   Epoch: 6   Global Step: 101870   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 09:59:29,382-Speed 3328.75 samples/sec   Loss 2.4194   LearningRate 0.0483   Epoch: 6   Global Step: 101880   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 09:59:32,548-Speed 3236.01 samples/sec   Loss 2.4919   LearningRate 0.0483   Epoch: 6   Global Step: 101890   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 09:59:35,686-Speed 3265.12 samples/sec   Loss 2.4005   LearningRate 0.0483   Epoch: 6   Global Step: 101900   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 09:59:38,804-Speed 3284.78 samples/sec   Loss 2.4516   LearningRate 0.0483   Epoch: 6   Global Step: 101910   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 09:59:41,880-Speed 3330.78 samples/sec   Loss 2.4421   LearningRate 0.0483   Epoch: 6   Global Step: 101920   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 09:59:44,986-Speed 3298.44 samples/sec   Loss 2.4250   LearningRate 0.0483   Epoch: 6   Global Step: 101930   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 09:59:48,157-Speed 3230.69 samples/sec   Loss 2.3480   LearningRate 0.0483   Epoch: 6   Global Step: 101940   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 09:59:51,358-Speed 3200.34 samples/sec   Loss 2.3862   LearningRate 0.0482   Epoch: 6   Global Step: 101950   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 09:59:54,562-Speed 3197.36 samples/sec   Loss 2.4094   LearningRate 0.0482   Epoch: 6   Global Step: 101960   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 09:59:57,741-Speed 3222.37 samples/sec   Loss 2.3448   LearningRate 0.0482   Epoch: 6   Global Step: 101970   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:00:00,879-Speed 3264.39 samples/sec   Loss 2.3535   LearningRate 0.0482   Epoch: 6   Global Step: 101980   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:00:03,963-Speed 3320.66 samples/sec   Loss 2.4283   LearningRate 0.0482   Epoch: 6   Global Step: 101990   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:00:07,057-Speed 3311.86 samples/sec   Loss 2.3809   LearningRate 0.0482   Epoch: 6   Global Step: 102000   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:00:52,106-[lfw][102000]XNorm: 22.320298
Training: 2022-04-11 10:00:52,106-[lfw][102000]Accuracy-Flip: 0.99817+-0.00273
Training: 2022-04-11 10:00:52,107-[lfw][102000]Accuracy-Highest: 0.99817
Training: 2022-04-11 10:01:43,679-[cfp_fp][102000]XNorm: 21.054869
Training: 2022-04-11 10:01:43,680-[cfp_fp][102000]Accuracy-Flip: 0.98614+-0.00626
Training: 2022-04-11 10:01:43,681-[cfp_fp][102000]Accuracy-Highest: 0.98614
Training: 2022-04-11 10:02:28,369-[agedb_30][102000]XNorm: 22.528507
Training: 2022-04-11 10:02:28,370-[agedb_30][102000]Accuracy-Flip: 0.98050+-0.00711
Training: 2022-04-11 10:02:28,370-[agedb_30][102000]Accuracy-Highest: 0.98200
Training: 2022-04-11 10:02:31,459-Speed 70.91 samples/sec   Loss 2.4200   LearningRate 0.0482   Epoch: 6   Global Step: 102010   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 10:02:34,566-Speed 3296.97 samples/sec   Loss 2.3936   LearningRate 0.0482   Epoch: 6   Global Step: 102020   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 10:02:37,623-Speed 3351.24 samples/sec   Loss 2.3506   LearningRate 0.0482   Epoch: 6   Global Step: 102030   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 10:02:40,682-Speed 3347.71 samples/sec   Loss 2.4254   LearningRate 0.0482   Epoch: 6   Global Step: 102040   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 10:02:43,759-Speed 3329.21 samples/sec   Loss 2.4070   LearningRate 0.0482   Epoch: 6   Global Step: 102050   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 10:02:46,812-Speed 3354.60 samples/sec   Loss 2.4103   LearningRate 0.0482   Epoch: 6   Global Step: 102060   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 10:02:49,873-Speed 3346.80 samples/sec   Loss 2.4040   LearningRate 0.0482   Epoch: 6   Global Step: 102070   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 10:02:52,961-Speed 3317.27 samples/sec   Loss 2.3462   LearningRate 0.0482   Epoch: 6   Global Step: 102080   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 10:02:56,039-Speed 3328.21 samples/sec   Loss 2.4229   LearningRate 0.0482   Epoch: 6   Global Step: 102090   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-11 10:02:59,112-Speed 3333.18 samples/sec   Loss 2.3983   LearningRate 0.0482   Epoch: 6   Global Step: 102100   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:03:02,177-Speed 3342.22 samples/sec   Loss 2.3465   LearningRate 0.0482   Epoch: 6   Global Step: 102110   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:03:05,265-Speed 3317.59 samples/sec   Loss 2.4314   LearningRate 0.0482   Epoch: 6   Global Step: 102120   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:03:08,399-Speed 3267.54 samples/sec   Loss 2.3841   LearningRate 0.0482   Epoch: 6   Global Step: 102130   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:03:11,582-Speed 3218.49 samples/sec   Loss 2.4121   LearningRate 0.0482   Epoch: 6   Global Step: 102140   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:03:14,681-Speed 3305.20 samples/sec   Loss 2.4267   LearningRate 0.0482   Epoch: 6   Global Step: 102150   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:03:17,756-Speed 3330.99 samples/sec   Loss 2.4247   LearningRate 0.0482   Epoch: 6   Global Step: 102160   Fp16 Grad Scale: 262144   Required: 24 hours
Training: 2022-04-11 10:03:20,809-Speed 3355.90 samples/sec   Loss 2.4053   LearningRate 0.0482   Epoch: 6   Global Step: 102170   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:03:23,875-Speed 3340.30 samples/sec   Loss 2.4037   LearningRate 0.0482   Epoch: 6   Global Step: 102180   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:03:27,010-Speed 3267.83 samples/sec   Loss 2.3467   LearningRate 0.0481   Epoch: 6   Global Step: 102190   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:03:30,086-Speed 3330.16 samples/sec   Loss 2.5059   LearningRate 0.0481   Epoch: 6   Global Step: 102200   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:03:33,186-Speed 3303.98 samples/sec   Loss 2.3672   LearningRate 0.0481   Epoch: 6   Global Step: 102210   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:03:36,365-Speed 3221.98 samples/sec   Loss 2.3531   LearningRate 0.0481   Epoch: 6   Global Step: 102220   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:03:39,431-Speed 3341.55 samples/sec   Loss 2.3892   LearningRate 0.0481   Epoch: 6   Global Step: 102230   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:03:42,502-Speed 3335.07 samples/sec   Loss 2.4173   LearningRate 0.0481   Epoch: 6   Global Step: 102240   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:03:45,582-Speed 3325.35 samples/sec   Loss 2.3928   LearningRate 0.0481   Epoch: 6   Global Step: 102250   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:03:48,687-Speed 3299.35 samples/sec   Loss 2.3113   LearningRate 0.0481   Epoch: 6   Global Step: 102260   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:03:51,838-Speed 3251.21 samples/sec   Loss 2.3775   LearningRate 0.0481   Epoch: 6   Global Step: 102270   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:03:54,999-Speed 3240.63 samples/sec   Loss 2.4404   LearningRate 0.0481   Epoch: 6   Global Step: 102280   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:03:58,163-Speed 3238.69 samples/sec   Loss 2.3692   LearningRate 0.0481   Epoch: 6   Global Step: 102290   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:04:01,252-Speed 3315.41 samples/sec   Loss 2.3588   LearningRate 0.0481   Epoch: 6   Global Step: 102300   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:04:04,341-Speed 3316.92 samples/sec   Loss 2.3276   LearningRate 0.0481   Epoch: 6   Global Step: 102310   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:04:07,426-Speed 3319.99 samples/sec   Loss 2.4546   LearningRate 0.0481   Epoch: 6   Global Step: 102320   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:04:10,562-Speed 3266.44 samples/sec   Loss 2.4213   LearningRate 0.0481   Epoch: 6   Global Step: 102330   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:04:13,651-Speed 3316.40 samples/sec   Loss 2.4172   LearningRate 0.0481   Epoch: 6   Global Step: 102340   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:04:16,820-Speed 3232.40 samples/sec   Loss 2.4067   LearningRate 0.0481   Epoch: 6   Global Step: 102350   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:04:19,955-Speed 3267.34 samples/sec   Loss 2.3484   LearningRate 0.0481   Epoch: 6   Global Step: 102360   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:04:23,055-Speed 3305.26 samples/sec   Loss 2.4102   LearningRate 0.0481   Epoch: 6   Global Step: 102370   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-11 10:04:26,221-Speed 3235.02 samples/sec   Loss 2.4416   LearningRate 0.0481   Epoch: 6   Global Step: 102380   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-11 10:04:29,320-Speed 3305.88 samples/sec   Loss 2.3769   LearningRate 0.0481   Epoch: 6   Global Step: 102390   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-11 10:04:32,394-Speed 3332.37 samples/sec   Loss 2.3516   LearningRate 0.0481   Epoch: 6   Global Step: 102400   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-11 10:04:35,494-Speed 3304.89 samples/sec   Loss 2.3769   LearningRate 0.0481   Epoch: 6   Global Step: 102410   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-11 10:04:38,600-Speed 3297.32 samples/sec   Loss 2.3501   LearningRate 0.0481   Epoch: 6   Global Step: 102420   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-11 10:04:41,671-Speed 3336.78 samples/sec   Loss 2.4059   LearningRate 0.0480   Epoch: 6   Global Step: 102430   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-11 10:04:44,752-Speed 3324.74 samples/sec   Loss 2.4588   LearningRate 0.0480   Epoch: 6   Global Step: 102440   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-11 10:04:47,859-Speed 3297.71 samples/sec   Loss 2.3692   LearningRate 0.0480   Epoch: 6   Global Step: 102450   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-11 10:04:50,986-Speed 3275.74 samples/sec   Loss 2.4166   LearningRate 0.0480   Epoch: 6   Global Step: 102460   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-11 10:04:54,046-Speed 3347.80 samples/sec   Loss 2.4136   LearningRate 0.0480   Epoch: 6   Global Step: 102470   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:04:57,112-Speed 3340.96 samples/sec   Loss 2.4237   LearningRate 0.0480   Epoch: 6   Global Step: 102480   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:05:00,187-Speed 3331.60 samples/sec   Loss 2.3819   LearningRate 0.0480   Epoch: 6   Global Step: 102490   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:05:03,329-Speed 3259.94 samples/sec   Loss 2.4002   LearningRate 0.0480   Epoch: 6   Global Step: 102500   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:05:06,455-Speed 3277.28 samples/sec   Loss 2.5138   LearningRate 0.0480   Epoch: 6   Global Step: 102510   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:05:09,582-Speed 3275.88 samples/sec   Loss 2.3182   LearningRate 0.0480   Epoch: 6   Global Step: 102520   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:05:12,741-Speed 3243.34 samples/sec   Loss 2.4586   LearningRate 0.0480   Epoch: 6   Global Step: 102530   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:05:15,816-Speed 3331.59 samples/sec   Loss 2.4512   LearningRate 0.0480   Epoch: 6   Global Step: 102540   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:05:18,905-Speed 3315.38 samples/sec   Loss 2.4338   LearningRate 0.0480   Epoch: 6   Global Step: 102550   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:05:21,976-Speed 3335.33 samples/sec   Loss 2.3987   LearningRate 0.0480   Epoch: 6   Global Step: 102560   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:05:25,284-Speed 3096.46 samples/sec   Loss 2.3602   LearningRate 0.0480   Epoch: 6   Global Step: 102570   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:05:28,350-Speed 3341.91 samples/sec   Loss 2.4179   LearningRate 0.0480   Epoch: 6   Global Step: 102580   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:05:31,440-Speed 3314.58 samples/sec   Loss 2.4103   LearningRate 0.0480   Epoch: 6   Global Step: 102590   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:05:34,525-Speed 3320.85 samples/sec   Loss 2.4405   LearningRate 0.0480   Epoch: 6   Global Step: 102600   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:05:37,613-Speed 3317.57 samples/sec   Loss 2.4160   LearningRate 0.0480   Epoch: 6   Global Step: 102610   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:05:40,673-Speed 3347.14 samples/sec   Loss 2.4515   LearningRate 0.0480   Epoch: 6   Global Step: 102620   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:05:43,754-Speed 3324.76 samples/sec   Loss 2.4589   LearningRate 0.0480   Epoch: 6   Global Step: 102630   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:05:46,822-Speed 3338.98 samples/sec   Loss 2.3811   LearningRate 0.0480   Epoch: 6   Global Step: 102640   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:05:49,892-Speed 3336.35 samples/sec   Loss 2.3950   LearningRate 0.0480   Epoch: 6   Global Step: 102650   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:05:52,990-Speed 3307.48 samples/sec   Loss 2.4341   LearningRate 0.0480   Epoch: 6   Global Step: 102660   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:05:56,057-Speed 3339.75 samples/sec   Loss 2.3938   LearningRate 0.0479   Epoch: 6   Global Step: 102670   Fp16 Grad Scale: 262144   Required: 24 hours
Training: 2022-04-11 10:05:59,163-Speed 3298.29 samples/sec   Loss 2.4524   LearningRate 0.0479   Epoch: 6   Global Step: 102680   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-04-11 10:06:02,221-Speed 3349.67 samples/sec   Loss 2.4383   LearningRate 0.0479   Epoch: 6   Global Step: 102690   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-04-11 10:06:05,293-Speed 3334.77 samples/sec   Loss 2.4535   LearningRate 0.0479   Epoch: 6   Global Step: 102700   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-04-11 10:06:08,405-Speed 3290.61 samples/sec   Loss 2.4798   LearningRate 0.0479   Epoch: 6   Global Step: 102710   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-04-11 10:06:11,532-Speed 3276.11 samples/sec   Loss 2.3751   LearningRate 0.0479   Epoch: 6   Global Step: 102720   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-04-11 10:06:14,600-Speed 3339.42 samples/sec   Loss 2.3768   LearningRate 0.0479   Epoch: 6   Global Step: 102730   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-04-11 10:06:17,762-Speed 3239.99 samples/sec   Loss 2.4453   LearningRate 0.0479   Epoch: 6   Global Step: 102740   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-04-11 10:06:20,832-Speed 3336.41 samples/sec   Loss 2.4532   LearningRate 0.0479   Epoch: 6   Global Step: 102750   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-04-11 10:06:23,953-Speed 3282.55 samples/sec   Loss 2.4047   LearningRate 0.0479   Epoch: 6   Global Step: 102760   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-04-11 10:06:27,035-Speed 3323.24 samples/sec   Loss 2.5013   LearningRate 0.0479   Epoch: 6   Global Step: 102770   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-04-11 10:06:30,115-Speed 3326.27 samples/sec   Loss 2.4183   LearningRate 0.0479   Epoch: 6   Global Step: 102780   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-11 10:06:33,206-Speed 3314.23 samples/sec   Loss 2.4351   LearningRate 0.0479   Epoch: 6   Global Step: 102790   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-11 10:06:36,337-Speed 3271.36 samples/sec   Loss 2.3893   LearningRate 0.0479   Epoch: 6   Global Step: 102800   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-11 10:06:39,408-Speed 3335.93 samples/sec   Loss 2.4323   LearningRate 0.0479   Epoch: 6   Global Step: 102810   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-11 10:06:42,471-Speed 3343.88 samples/sec   Loss 2.4767   LearningRate 0.0479   Epoch: 6   Global Step: 102820   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-11 10:06:45,588-Speed 3286.53 samples/sec   Loss 2.3942   LearningRate 0.0479   Epoch: 6   Global Step: 102830   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-11 10:06:48,650-Speed 3345.71 samples/sec   Loss 2.4746   LearningRate 0.0479   Epoch: 6   Global Step: 102840   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-11 10:06:51,734-Speed 3320.45 samples/sec   Loss 2.4382   LearningRate 0.0479   Epoch: 6   Global Step: 102850   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-11 10:06:54,855-Speed 3283.25 samples/sec   Loss 2.4187   LearningRate 0.0479   Epoch: 6   Global Step: 102860   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-11 10:06:57,998-Speed 3259.07 samples/sec   Loss 2.4133   LearningRate 0.0479   Epoch: 6   Global Step: 102870   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-11 10:07:01,088-Speed 3315.32 samples/sec   Loss 2.4785   LearningRate 0.0479   Epoch: 6   Global Step: 102880   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:07:04,224-Speed 3267.28 samples/sec   Loss 2.4356   LearningRate 0.0479   Epoch: 6   Global Step: 102890   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:07:07,284-Speed 3348.57 samples/sec   Loss 2.4654   LearningRate 0.0479   Epoch: 6   Global Step: 102900   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:07:10,356-Speed 3334.21 samples/sec   Loss 2.4607   LearningRate 0.0478   Epoch: 6   Global Step: 102910   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:07:13,475-Speed 3283.50 samples/sec   Loss 2.4198   LearningRate 0.0478   Epoch: 6   Global Step: 102920   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:07:16,615-Speed 3262.40 samples/sec   Loss 2.3430   LearningRate 0.0478   Epoch: 6   Global Step: 102930   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-11 10:07:19,702-Speed 3318.84 samples/sec   Loss 2.4548   LearningRate 0.0478   Epoch: 6   Global Step: 102940   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-11 10:07:22,791-Speed 3315.34 samples/sec   Loss 2.4266   LearningRate 0.0478   Epoch: 6   Global Step: 102950   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-11 10:07:25,872-Speed 3325.73 samples/sec   Loss 2.4526   LearningRate 0.0478   Epoch: 6   Global Step: 102960   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-11 10:07:28,934-Speed 3344.62 samples/sec   Loss 2.4625   LearningRate 0.0478   Epoch: 6   Global Step: 102970   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-11 10:07:32,006-Speed 3334.44 samples/sec   Loss 2.4431   LearningRate 0.0478   Epoch: 6   Global Step: 102980   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-11 10:07:35,172-Speed 3235.91 samples/sec   Loss 2.4007   LearningRate 0.0478   Epoch: 6   Global Step: 102990   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-11 10:07:38,370-Speed 3203.38 samples/sec   Loss 2.3825   LearningRate 0.0478   Epoch: 6   Global Step: 103000   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-11 10:07:41,485-Speed 3288.06 samples/sec   Loss 2.4854   LearningRate 0.0478   Epoch: 6   Global Step: 103010   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-11 10:07:44,603-Speed 3286.50 samples/sec   Loss 2.4591   LearningRate 0.0478   Epoch: 6   Global Step: 103020   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-11 10:07:47,704-Speed 3303.23 samples/sec   Loss 2.4811   LearningRate 0.0478   Epoch: 6   Global Step: 103030   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:07:50,783-Speed 3326.05 samples/sec   Loss 2.4443   LearningRate 0.0478   Epoch: 6   Global Step: 103040   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:07:53,907-Speed 3278.92 samples/sec   Loss 2.4945   LearningRate 0.0478   Epoch: 6   Global Step: 103050   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:07:57,007-Speed 3305.92 samples/sec   Loss 2.4542   LearningRate 0.0478   Epoch: 6   Global Step: 103060   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:08:00,072-Speed 3341.89 samples/sec   Loss 2.4417   LearningRate 0.0478   Epoch: 6   Global Step: 103070   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:08:03,163-Speed 3313.71 samples/sec   Loss 2.4626   LearningRate 0.0478   Epoch: 6   Global Step: 103080   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:08:06,275-Speed 3291.84 samples/sec   Loss 2.4295   LearningRate 0.0478   Epoch: 6   Global Step: 103090   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:08:09,458-Speed 3218.17 samples/sec   Loss 2.4261   LearningRate 0.0478   Epoch: 6   Global Step: 103100   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:08:12,598-Speed 3262.25 samples/sec   Loss 2.4342   LearningRate 0.0478   Epoch: 6   Global Step: 103110   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:08:15,655-Speed 3351.20 samples/sec   Loss 2.5266   LearningRate 0.0478   Epoch: 6   Global Step: 103120   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-04-11 10:08:18,748-Speed 3312.38 samples/sec   Loss 2.4575   LearningRate 0.0478   Epoch: 6   Global Step: 103130   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-04-11 10:08:21,828-Speed 3325.64 samples/sec   Loss 2.4602   LearningRate 0.0478   Epoch: 6   Global Step: 103140   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-04-11 10:08:24,973-Speed 3257.48 samples/sec   Loss 2.4317   LearningRate 0.0477   Epoch: 6   Global Step: 103150   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-04-11 10:08:28,054-Speed 3324.52 samples/sec   Loss 2.3680   LearningRate 0.0477   Epoch: 6   Global Step: 103160   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-04-11 10:08:31,155-Speed 3303.26 samples/sec   Loss 2.5086   LearningRate 0.0477   Epoch: 6   Global Step: 103170   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-04-11 10:08:34,285-Speed 3273.63 samples/sec   Loss 2.5172   LearningRate 0.0477   Epoch: 6   Global Step: 103180   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-04-11 10:08:37,420-Speed 3266.89 samples/sec   Loss 2.4533   LearningRate 0.0477   Epoch: 6   Global Step: 103190   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-04-11 10:08:40,509-Speed 3317.62 samples/sec   Loss 2.5011   LearningRate 0.0477   Epoch: 6   Global Step: 103200   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-04-11 10:08:43,594-Speed 3320.78 samples/sec   Loss 2.4432   LearningRate 0.0477   Epoch: 6   Global Step: 103210   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-04-11 10:08:46,656-Speed 3345.94 samples/sec   Loss 2.4843   LearningRate 0.0477   Epoch: 6   Global Step: 103220   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-11 10:08:49,808-Speed 3249.89 samples/sec   Loss 2.4905   LearningRate 0.0477   Epoch: 6   Global Step: 103230   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-11 10:08:52,872-Speed 3342.77 samples/sec   Loss 2.4528   LearningRate 0.0477   Epoch: 6   Global Step: 103240   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-11 10:08:56,048-Speed 3225.98 samples/sec   Loss 2.4564   LearningRate 0.0477   Epoch: 6   Global Step: 103250   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-11 10:08:59,113-Speed 3342.49 samples/sec   Loss 2.4225   LearningRate 0.0477   Epoch: 6   Global Step: 103260   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-11 10:09:02,219-Speed 3298.44 samples/sec   Loss 2.5103   LearningRate 0.0477   Epoch: 6   Global Step: 103270   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-11 10:09:05,319-Speed 3303.82 samples/sec   Loss 2.4563   LearningRate 0.0477   Epoch: 6   Global Step: 103280   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-11 10:09:08,385-Speed 3341.37 samples/sec   Loss 2.4836   LearningRate 0.0477   Epoch: 6   Global Step: 103290   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-11 10:09:11,544-Speed 3243.07 samples/sec   Loss 2.4113   LearningRate 0.0477   Epoch: 6   Global Step: 103300   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-11 10:09:14,628-Speed 3322.30 samples/sec   Loss 2.4903   LearningRate 0.0477   Epoch: 6   Global Step: 103310   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-11 10:09:17,752-Speed 3278.63 samples/sec   Loss 2.5024   LearningRate 0.0477   Epoch: 6   Global Step: 103320   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:09:20,816-Speed 3343.39 samples/sec   Loss 2.4432   LearningRate 0.0477   Epoch: 6   Global Step: 103330   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:09:23,949-Speed 3269.87 samples/sec   Loss 2.3945   LearningRate 0.0477   Epoch: 6   Global Step: 103340   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:09:27,022-Speed 3333.25 samples/sec   Loss 2.5089   LearningRate 0.0477   Epoch: 6   Global Step: 103350   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:09:30,186-Speed 3237.56 samples/sec   Loss 2.4897   LearningRate 0.0477   Epoch: 6   Global Step: 103360   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:09:33,268-Speed 3324.68 samples/sec   Loss 2.4204   LearningRate 0.0477   Epoch: 6   Global Step: 103370   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:09:36,375-Speed 3296.55 samples/sec   Loss 2.4627   LearningRate 0.0477   Epoch: 6   Global Step: 103380   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:09:39,462-Speed 3318.87 samples/sec   Loss 2.4559   LearningRate 0.0476   Epoch: 6   Global Step: 103390   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:09:42,579-Speed 3286.27 samples/sec   Loss 2.4562   LearningRate 0.0476   Epoch: 6   Global Step: 103400   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:09:45,663-Speed 3320.75 samples/sec   Loss 2.4739   LearningRate 0.0476   Epoch: 6   Global Step: 103410   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:09:48,771-Speed 3296.17 samples/sec   Loss 2.4666   LearningRate 0.0476   Epoch: 6   Global Step: 103420   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:09:51,852-Speed 3324.93 samples/sec   Loss 2.4048   LearningRate 0.0476   Epoch: 6   Global Step: 103430   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:09:55,039-Speed 3213.77 samples/sec   Loss 2.4823   LearningRate 0.0476   Epoch: 6   Global Step: 103440   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:09:58,126-Speed 3318.18 samples/sec   Loss 2.5119   LearningRate 0.0476   Epoch: 6   Global Step: 103450   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:10:01,284-Speed 3244.42 samples/sec   Loss 2.5147   LearningRate 0.0476   Epoch: 6   Global Step: 103460   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:10:04,403-Speed 3284.70 samples/sec   Loss 2.4063   LearningRate 0.0476   Epoch: 6   Global Step: 103470   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:10:07,553-Speed 3251.63 samples/sec   Loss 2.4534   LearningRate 0.0476   Epoch: 6   Global Step: 103480   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:10:10,653-Speed 3304.30 samples/sec   Loss 2.3893   LearningRate 0.0476   Epoch: 6   Global Step: 103490   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:10:13,753-Speed 3303.47 samples/sec   Loss 2.4129   LearningRate 0.0476   Epoch: 6   Global Step: 103500   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:10:16,830-Speed 3329.88 samples/sec   Loss 2.4351   LearningRate 0.0476   Epoch: 6   Global Step: 103510   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:10:19,912-Speed 3323.22 samples/sec   Loss 2.4688   LearningRate 0.0476   Epoch: 6   Global Step: 103520   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:10:22,982-Speed 3337.03 samples/sec   Loss 2.5187   LearningRate 0.0476   Epoch: 6   Global Step: 103530   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:10:26,116-Speed 3268.99 samples/sec   Loss 2.4905   LearningRate 0.0476   Epoch: 6   Global Step: 103540   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:10:29,186-Speed 3336.07 samples/sec   Loss 2.5020   LearningRate 0.0476   Epoch: 6   Global Step: 103550   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:10:32,330-Speed 3258.29 samples/sec   Loss 2.5441   LearningRate 0.0476   Epoch: 6   Global Step: 103560   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:10:35,398-Speed 3339.68 samples/sec   Loss 2.5339   LearningRate 0.0476   Epoch: 6   Global Step: 103570   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-11 10:10:38,531-Speed 3270.00 samples/sec   Loss 2.5506   LearningRate 0.0476   Epoch: 6   Global Step: 103580   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-11 10:10:41,621-Speed 3314.21 samples/sec   Loss 2.4271   LearningRate 0.0476   Epoch: 6   Global Step: 103590   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-11 10:10:44,767-Speed 3256.07 samples/sec   Loss 2.4306   LearningRate 0.0476   Epoch: 6   Global Step: 103600   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-11 10:10:47,859-Speed 3313.04 samples/sec   Loss 2.4847   LearningRate 0.0476   Epoch: 6   Global Step: 103610   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-11 10:10:50,988-Speed 3274.37 samples/sec   Loss 2.4602   LearningRate 0.0476   Epoch: 6   Global Step: 103620   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-11 10:10:54,067-Speed 3327.74 samples/sec   Loss 2.5176   LearningRate 0.0475   Epoch: 6   Global Step: 103630   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-11 10:10:57,150-Speed 3322.14 samples/sec   Loss 2.4944   LearningRate 0.0475   Epoch: 6   Global Step: 103640   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-11 10:11:00,243-Speed 3311.70 samples/sec   Loss 2.5261   LearningRate 0.0475   Epoch: 6   Global Step: 103650   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-11 10:11:03,335-Speed 3313.59 samples/sec   Loss 2.5145   LearningRate 0.0475   Epoch: 6   Global Step: 103660   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-11 10:11:06,460-Speed 3277.45 samples/sec   Loss 2.5412   LearningRate 0.0475   Epoch: 6   Global Step: 103670   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:11:09,548-Speed 3318.00 samples/sec   Loss 2.4648   LearningRate 0.0475   Epoch: 6   Global Step: 103680   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:11:12,645-Speed 3306.34 samples/sec   Loss 2.5079   LearningRate 0.0475   Epoch: 6   Global Step: 103690   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:11:15,746-Speed 3304.49 samples/sec   Loss 2.4869   LearningRate 0.0475   Epoch: 6   Global Step: 103700   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:11:18,825-Speed 3326.76 samples/sec   Loss 2.4758   LearningRate 0.0475   Epoch: 6   Global Step: 103710   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:11:22,052-Speed 3174.66 samples/sec   Loss 2.5450   LearningRate 0.0475   Epoch: 6   Global Step: 103720   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-11 10:11:25,191-Speed 3263.31 samples/sec   Loss 2.5378   LearningRate 0.0475   Epoch: 6   Global Step: 103730   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-11 10:11:28,289-Speed 3307.43 samples/sec   Loss 2.5119   LearningRate 0.0475   Epoch: 6   Global Step: 103740   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-11 10:11:31,390-Speed 3302.59 samples/sec   Loss 2.6027   LearningRate 0.0475   Epoch: 6   Global Step: 103750   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-11 10:11:34,512-Speed 3281.27 samples/sec   Loss 2.4905   LearningRate 0.0475   Epoch: 6   Global Step: 103760   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-11 10:11:37,666-Speed 3248.21 samples/sec   Loss 2.5238   LearningRate 0.0475   Epoch: 6   Global Step: 103770   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-11 10:11:40,803-Speed 3265.56 samples/sec   Loss 2.5052   LearningRate 0.0475   Epoch: 6   Global Step: 103780   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-11 10:11:43,901-Speed 3306.15 samples/sec   Loss 2.4728   LearningRate 0.0475   Epoch: 6   Global Step: 103790   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-11 10:11:47,085-Speed 3217.47 samples/sec   Loss 2.5227   LearningRate 0.0475   Epoch: 6   Global Step: 103800   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-11 10:11:50,174-Speed 3315.60 samples/sec   Loss 2.5471   LearningRate 0.0475   Epoch: 6   Global Step: 103810   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-11 10:11:53,273-Speed 3305.03 samples/sec   Loss 2.5969   LearningRate 0.0475   Epoch: 6   Global Step: 103820   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:11:56,360-Speed 3318.91 samples/sec   Loss 2.4569   LearningRate 0.0475   Epoch: 6   Global Step: 103830   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-11 10:11:59,452-Speed 3312.58 samples/sec   Loss 2.4999   LearningRate 0.0475   Epoch: 6   Global Step: 103840   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-11 10:12:02,521-Speed 3336.92 samples/sec   Loss 2.4524   LearningRate 0.0475   Epoch: 6   Global Step: 103850   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-11 10:12:05,626-Speed 3299.52 samples/sec   Loss 2.5109   LearningRate 0.0475   Epoch: 6   Global Step: 103860   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-11 10:12:08,710-Speed 3321.85 samples/sec   Loss 2.5305   LearningRate 0.0475   Epoch: 6   Global Step: 103870   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-11 10:12:11,792-Speed 3323.53 samples/sec   Loss 2.5293   LearningRate 0.0474   Epoch: 6   Global Step: 103880   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-11 10:12:14,916-Speed 3279.73 samples/sec   Loss 2.4793   LearningRate 0.0474   Epoch: 6   Global Step: 103890   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-11 10:12:18,017-Speed 3303.35 samples/sec   Loss 2.5402   LearningRate 0.0474   Epoch: 6   Global Step: 103900   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-11 10:12:21,081-Speed 3343.53 samples/sec   Loss 2.5047   LearningRate 0.0474   Epoch: 6   Global Step: 103910   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-11 10:12:24,208-Speed 3276.23 samples/sec   Loss 2.4398   LearningRate 0.0474   Epoch: 6   Global Step: 103920   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-11 10:12:27,326-Speed 3285.66 samples/sec   Loss 2.4649   LearningRate 0.0474   Epoch: 6   Global Step: 103930   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:12:30,446-Speed 3283.25 samples/sec   Loss 2.5225   LearningRate 0.0474   Epoch: 6   Global Step: 103940   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:12:33,628-Speed 3219.89 samples/sec   Loss 2.5556   LearningRate 0.0474   Epoch: 6   Global Step: 103950   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:12:36,694-Speed 3340.72 samples/sec   Loss 2.5097   LearningRate 0.0474   Epoch: 6   Global Step: 103960   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:12:39,786-Speed 3313.30 samples/sec   Loss 2.5477   LearningRate 0.0474   Epoch: 6   Global Step: 103970   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:12:42,884-Speed 3307.16 samples/sec   Loss 2.4868   LearningRate 0.0474   Epoch: 6   Global Step: 103980   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:12:46,018-Speed 3267.72 samples/sec   Loss 2.5380   LearningRate 0.0474   Epoch: 6   Global Step: 103990   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:12:49,085-Speed 3340.77 samples/sec   Loss 2.5187   LearningRate 0.0474   Epoch: 6   Global Step: 104000   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:13:34,867-[lfw][104000]XNorm: 22.439375
Training: 2022-04-11 10:13:34,868-[lfw][104000]Accuracy-Flip: 0.99800+-0.00267
Training: 2022-04-11 10:13:34,868-[lfw][104000]Accuracy-Highest: 0.99817
Training: 2022-04-11 10:14:26,566-[cfp_fp][104000]XNorm: 21.416988
Training: 2022-04-11 10:14:26,567-[cfp_fp][104000]Accuracy-Flip: 0.98471+-0.00589
Training: 2022-04-11 10:14:26,568-[cfp_fp][104000]Accuracy-Highest: 0.98614
Training: 2022-04-11 10:15:11,016-[agedb_30][104000]XNorm: 22.726809
Training: 2022-04-11 10:15:11,016-[agedb_30][104000]Accuracy-Flip: 0.98250+-0.00676
Training: 2022-04-11 10:15:11,017-[agedb_30][104000]Accuracy-Highest: 0.98250
Training: 2022-04-11 10:15:14,088-Speed 70.62 samples/sec   Loss 2.5143   LearningRate 0.0474   Epoch: 6   Global Step: 104010   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:15:17,144-Speed 3352.19 samples/sec   Loss 2.4817   LearningRate 0.0474   Epoch: 6   Global Step: 104020   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:15:20,191-Speed 3361.74 samples/sec   Loss 2.4933   LearningRate 0.0474   Epoch: 6   Global Step: 104030   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:15:23,290-Speed 3304.56 samples/sec   Loss 2.4765   LearningRate 0.0474   Epoch: 6   Global Step: 104040   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:15:26,374-Speed 3321.53 samples/sec   Loss 2.5214   LearningRate 0.0474   Epoch: 6   Global Step: 104050   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:15:29,439-Speed 3342.12 samples/sec   Loss 2.5052   LearningRate 0.0474   Epoch: 6   Global Step: 104060   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:15:32,517-Speed 3327.84 samples/sec   Loss 2.5024   LearningRate 0.0474   Epoch: 6   Global Step: 104070   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:15:35,581-Speed 3342.44 samples/sec   Loss 2.5150   LearningRate 0.0474   Epoch: 6   Global Step: 104080   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:15:38,652-Speed 3336.03 samples/sec   Loss 2.4702   LearningRate 0.0474   Epoch: 6   Global Step: 104090   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:15:41,713-Speed 3345.42 samples/sec   Loss 2.4984   LearningRate 0.0474   Epoch: 6   Global Step: 104100   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:15:44,775-Speed 3345.77 samples/sec   Loss 2.4979   LearningRate 0.0474   Epoch: 6   Global Step: 104110   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:15:47,834-Speed 3348.28 samples/sec   Loss 2.5414   LearningRate 0.0473   Epoch: 6   Global Step: 104120   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:15:50,910-Speed 3329.66 samples/sec   Loss 2.5558   LearningRate 0.0473   Epoch: 6   Global Step: 104130   Fp16 Grad Scale: 262144   Required: 24 hours
Training: 2022-04-11 10:15:53,965-Speed 3352.79 samples/sec   Loss 2.5379   LearningRate 0.0473   Epoch: 6   Global Step: 104140   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:15:57,026-Speed 3345.55 samples/sec   Loss 2.4892   LearningRate 0.0473   Epoch: 6   Global Step: 104150   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:16:00,104-Speed 3328.26 samples/sec   Loss 2.4580   LearningRate 0.0473   Epoch: 6   Global Step: 104160   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:16:03,168-Speed 3341.90 samples/sec   Loss 2.4321   LearningRate 0.0473   Epoch: 6   Global Step: 104170   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:16:06,235-Speed 3340.08 samples/sec   Loss 2.5355   LearningRate 0.0473   Epoch: 6   Global Step: 104180   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:16:09,296-Speed 3346.41 samples/sec   Loss 2.5506   LearningRate 0.0473   Epoch: 6   Global Step: 104190   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:16:12,357-Speed 3346.30 samples/sec   Loss 2.5254   LearningRate 0.0473   Epoch: 6   Global Step: 104200   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:16:15,428-Speed 3334.59 samples/sec   Loss 2.5615   LearningRate 0.0473   Epoch: 6   Global Step: 104210   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-11 10:16:18,491-Speed 3345.04 samples/sec   Loss 2.5118   LearningRate 0.0473   Epoch: 6   Global Step: 104220   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-11 10:16:21,565-Speed 3331.47 samples/sec   Loss 2.5954   LearningRate 0.0473   Epoch: 6   Global Step: 104230   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-11 10:16:24,653-Speed 3317.62 samples/sec   Loss 2.5366   LearningRate 0.0473   Epoch: 6   Global Step: 104240   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-11 10:16:27,723-Speed 3337.18 samples/sec   Loss 2.4642   LearningRate 0.0473   Epoch: 6   Global Step: 104250   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-11 10:16:30,786-Speed 3344.57 samples/sec   Loss 2.4996   LearningRate 0.0473   Epoch: 6   Global Step: 104260   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-11 10:16:33,845-Speed 3348.11 samples/sec   Loss 2.4948   LearningRate 0.0473   Epoch: 6   Global Step: 104270   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-11 10:16:36,940-Speed 3309.98 samples/sec   Loss 2.4658   LearningRate 0.0473   Epoch: 6   Global Step: 104280   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-11 10:16:40,050-Speed 3293.96 samples/sec   Loss 2.5026   LearningRate 0.0473   Epoch: 6   Global Step: 104290   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-11 10:16:43,151-Speed 3301.89 samples/sec   Loss 2.5107   LearningRate 0.0473   Epoch: 6   Global Step: 104300   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-11 10:16:46,264-Speed 3291.03 samples/sec   Loss 2.5222   LearningRate 0.0473   Epoch: 6   Global Step: 104310   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:16:49,408-Speed 3257.62 samples/sec   Loss 2.5094   LearningRate 0.0473   Epoch: 6   Global Step: 104320   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:16:52,495-Speed 3318.63 samples/sec   Loss 2.5662   LearningRate 0.0473   Epoch: 6   Global Step: 104330   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:16:55,555-Speed 3346.66 samples/sec   Loss 2.5125   LearningRate 0.0473   Epoch: 6   Global Step: 104340   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:16:58,667-Speed 3291.59 samples/sec   Loss 2.5493   LearningRate 0.0473   Epoch: 6   Global Step: 104350   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:17:01,741-Speed 3332.46 samples/sec   Loss 2.5597   LearningRate 0.0472   Epoch: 6   Global Step: 104360   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:17:04,803-Speed 3344.69 samples/sec   Loss 2.5265   LearningRate 0.0472   Epoch: 6   Global Step: 104370   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:17:07,867-Speed 3342.55 samples/sec   Loss 2.5719   LearningRate 0.0472   Epoch: 6   Global Step: 104380   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:17:10,926-Speed 3348.75 samples/sec   Loss 2.5413   LearningRate 0.0472   Epoch: 6   Global Step: 104390   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:17:13,988-Speed 3345.35 samples/sec   Loss 2.5668   LearningRate 0.0472   Epoch: 6   Global Step: 104400   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:17:17,044-Speed 3350.92 samples/sec   Loss 2.5388   LearningRate 0.0472   Epoch: 6   Global Step: 104410   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:17:20,161-Speed 3286.54 samples/sec   Loss 2.5701   LearningRate 0.0472   Epoch: 6   Global Step: 104420   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:17:23,239-Speed 3327.20 samples/sec   Loss 2.4712   LearningRate 0.0472   Epoch: 6   Global Step: 104430   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:17:26,308-Speed 3337.87 samples/sec   Loss 2.6127   LearningRate 0.0472   Epoch: 6   Global Step: 104440   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:17:29,409-Speed 3303.29 samples/sec   Loss 2.5238   LearningRate 0.0472   Epoch: 6   Global Step: 104450   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:17:32,526-Speed 3286.61 samples/sec   Loss 2.4823   LearningRate 0.0472   Epoch: 6   Global Step: 104460   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:17:35,618-Speed 3312.32 samples/sec   Loss 2.5334   LearningRate 0.0472   Epoch: 6   Global Step: 104470   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:17:38,691-Speed 3334.41 samples/sec   Loss 2.5847   LearningRate 0.0472   Epoch: 6   Global Step: 104480   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:17:41,838-Speed 3254.30 samples/sec   Loss 2.4615   LearningRate 0.0472   Epoch: 6   Global Step: 104490   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:17:48,098-Speed 1635.98 samples/sec   Loss 2.4190   LearningRate 0.0472   Epoch: 6   Global Step: 104500   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:17:53,496-Speed 1897.24 samples/sec   Loss 2.5526   LearningRate 0.0472   Epoch: 6   Global Step: 104510   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:17:56,563-Speed 3339.70 samples/sec   Loss 2.4819   LearningRate 0.0472   Epoch: 6   Global Step: 104520   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:17:59,624-Speed 3346.49 samples/sec   Loss 2.6066   LearningRate 0.0472   Epoch: 6   Global Step: 104530   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:18:02,735-Speed 3292.49 samples/sec   Loss 2.5184   LearningRate 0.0472   Epoch: 6   Global Step: 104540   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:18:05,858-Speed 3280.54 samples/sec   Loss 2.4959   LearningRate 0.0472   Epoch: 6   Global Step: 104550   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:18:08,912-Speed 3353.96 samples/sec   Loss 2.5214   LearningRate 0.0472   Epoch: 6   Global Step: 104560   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:18:11,968-Speed 3351.65 samples/sec   Loss 2.5975   LearningRate 0.0472   Epoch: 6   Global Step: 104570   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:18:15,024-Speed 3351.00 samples/sec   Loss 2.5239   LearningRate 0.0472   Epoch: 6   Global Step: 104580   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:18:18,094-Speed 3337.37 samples/sec   Loss 2.4750   LearningRate 0.0472   Epoch: 6   Global Step: 104590   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:18:21,164-Speed 3336.04 samples/sec   Loss 2.5711   LearningRate 0.0471   Epoch: 6   Global Step: 104600   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:18:24,239-Speed 3331.30 samples/sec   Loss 2.5845   LearningRate 0.0471   Epoch: 6   Global Step: 104610   Fp16 Grad Scale: 262144   Required: 24 hours
Training: 2022-04-11 10:18:27,293-Speed 3353.52 samples/sec   Loss 2.5795   LearningRate 0.0471   Epoch: 6   Global Step: 104620   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:18:30,353-Speed 3347.40 samples/sec   Loss 2.5344   LearningRate 0.0471   Epoch: 6   Global Step: 104630   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:18:33,423-Speed 3336.34 samples/sec   Loss 2.4333   LearningRate 0.0471   Epoch: 6   Global Step: 104640   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:18:36,484-Speed 3346.46 samples/sec   Loss 2.5142   LearningRate 0.0471   Epoch: 6   Global Step: 104650   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:18:39,611-Speed 3275.16 samples/sec   Loss 2.5286   LearningRate 0.0471   Epoch: 6   Global Step: 104660   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:18:42,767-Speed 3245.84 samples/sec   Loss 2.5218   LearningRate 0.0471   Epoch: 6   Global Step: 104670   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:18:45,912-Speed 3257.03 samples/sec   Loss 2.5839   LearningRate 0.0471   Epoch: 6   Global Step: 104680   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:18:48,974-Speed 3345.04 samples/sec   Loss 2.5235   LearningRate 0.0471   Epoch: 6   Global Step: 104690   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:18:52,048-Speed 3332.61 samples/sec   Loss 2.4973   LearningRate 0.0471   Epoch: 6   Global Step: 104700   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:18:55,129-Speed 3324.49 samples/sec   Loss 2.5485   LearningRate 0.0471   Epoch: 6   Global Step: 104710   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:18:58,193-Speed 3342.87 samples/sec   Loss 2.5455   LearningRate 0.0471   Epoch: 6   Global Step: 104720   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:19:01,300-Speed 3295.87 samples/sec   Loss 2.5457   LearningRate 0.0471   Epoch: 6   Global Step: 104730   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:19:04,361-Speed 3346.72 samples/sec   Loss 2.5536   LearningRate 0.0471   Epoch: 6   Global Step: 104740   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:19:07,426-Speed 3342.51 samples/sec   Loss 2.4519   LearningRate 0.0471   Epoch: 6   Global Step: 104750   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:19:10,507-Speed 3323.99 samples/sec   Loss 2.5568   LearningRate 0.0471   Epoch: 6   Global Step: 104760   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:19:13,573-Speed 3340.93 samples/sec   Loss 2.5362   LearningRate 0.0471   Epoch: 6   Global Step: 104770   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:19:16,653-Speed 3324.77 samples/sec   Loss 2.5242   LearningRate 0.0471   Epoch: 6   Global Step: 104780   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:19:19,716-Speed 3344.40 samples/sec   Loss 2.5959   LearningRate 0.0471   Epoch: 6   Global Step: 104790   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:19:22,792-Speed 3330.55 samples/sec   Loss 2.4835   LearningRate 0.0471   Epoch: 6   Global Step: 104800   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:19:25,880-Speed 3316.85 samples/sec   Loss 2.4900   LearningRate 0.0471   Epoch: 6   Global Step: 104810   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:19:28,958-Speed 3328.68 samples/sec   Loss 2.5082   LearningRate 0.0471   Epoch: 6   Global Step: 104820   Fp16 Grad Scale: 262144   Required: 24 hours
Training: 2022-04-11 10:19:32,042-Speed 3321.16 samples/sec   Loss 2.5361   LearningRate 0.0471   Epoch: 6   Global Step: 104830   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:19:35,139-Speed 3306.90 samples/sec   Loss 2.5837   LearningRate 0.0471   Epoch: 6   Global Step: 104840   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:19:38,230-Speed 3313.51 samples/sec   Loss 2.5011   LearningRate 0.0470   Epoch: 6   Global Step: 104850   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:19:41,284-Speed 3354.65 samples/sec   Loss 2.4486   LearningRate 0.0470   Epoch: 6   Global Step: 104860   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-11 10:19:44,380-Speed 3307.28 samples/sec   Loss 2.5241   LearningRate 0.0470   Epoch: 6   Global Step: 104870   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-11 10:19:47,457-Speed 3329.30 samples/sec   Loss 2.5060   LearningRate 0.0470   Epoch: 6   Global Step: 104880   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-11 10:19:50,571-Speed 3289.17 samples/sec   Loss 2.4729   LearningRate 0.0470   Epoch: 6   Global Step: 104890   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-11 10:19:53,630-Speed 3348.23 samples/sec   Loss 2.5590   LearningRate 0.0470   Epoch: 6   Global Step: 104900   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-11 10:19:56,708-Speed 3327.77 samples/sec   Loss 2.4723   LearningRate 0.0470   Epoch: 6   Global Step: 104910   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-11 10:19:59,780-Speed 3333.88 samples/sec   Loss 2.5638   LearningRate 0.0470   Epoch: 6   Global Step: 104920   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-11 10:20:02,852-Speed 3334.87 samples/sec   Loss 2.5911   LearningRate 0.0470   Epoch: 6   Global Step: 104930   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-11 10:20:06,004-Speed 3249.77 samples/sec   Loss 2.4951   LearningRate 0.0470   Epoch: 6   Global Step: 104940   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-11 10:20:09,071-Speed 3339.31 samples/sec   Loss 2.5190   LearningRate 0.0470   Epoch: 6   Global Step: 104950   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-11 10:20:12,152-Speed 3324.33 samples/sec   Loss 2.5229   LearningRate 0.0470   Epoch: 6   Global Step: 104960   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:20:15,215-Speed 3343.75 samples/sec   Loss 2.5563   LearningRate 0.0470   Epoch: 6   Global Step: 104970   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:20:18,300-Speed 3320.81 samples/sec   Loss 2.4512   LearningRate 0.0470   Epoch: 6   Global Step: 104980   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:20:21,377-Speed 3328.79 samples/sec   Loss 2.6069   LearningRate 0.0470   Epoch: 6   Global Step: 104990   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:20:24,446-Speed 3337.13 samples/sec   Loss 2.5238   LearningRate 0.0470   Epoch: 6   Global Step: 105000   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:20:27,507-Speed 3346.25 samples/sec   Loss 2.5598   LearningRate 0.0470   Epoch: 6   Global Step: 105010   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:20:30,574-Speed 3339.34 samples/sec   Loss 2.6020   LearningRate 0.0470   Epoch: 6   Global Step: 105020   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:20:33,709-Speed 3267.48 samples/sec   Loss 2.4948   LearningRate 0.0470   Epoch: 6   Global Step: 105030   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:20:36,773-Speed 3343.95 samples/sec   Loss 2.5477   LearningRate 0.0470   Epoch: 6   Global Step: 105040   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:20:39,840-Speed 3339.27 samples/sec   Loss 2.6195   LearningRate 0.0470   Epoch: 6   Global Step: 105050   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:20:42,896-Speed 3351.37 samples/sec   Loss 2.5362   LearningRate 0.0470   Epoch: 6   Global Step: 105060   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:20:45,955-Speed 3348.77 samples/sec   Loss 2.5542   LearningRate 0.0470   Epoch: 6   Global Step: 105070   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:20:49,041-Speed 3318.89 samples/sec   Loss 2.5609   LearningRate 0.0470   Epoch: 6   Global Step: 105080   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:20:52,122-Speed 3324.61 samples/sec   Loss 2.5777   LearningRate 0.0469   Epoch: 6   Global Step: 105090   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:20:55,203-Speed 3324.79 samples/sec   Loss 2.5686   LearningRate 0.0469   Epoch: 6   Global Step: 105100   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:20:58,287-Speed 3320.71 samples/sec   Loss 2.6130   LearningRate 0.0469   Epoch: 6   Global Step: 105110   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:21:01,344-Speed 3350.70 samples/sec   Loss 2.5917   LearningRate 0.0469   Epoch: 6   Global Step: 105120   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:21:04,411-Speed 3339.18 samples/sec   Loss 2.4984   LearningRate 0.0469   Epoch: 6   Global Step: 105130   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:21:07,477-Speed 3341.49 samples/sec   Loss 2.5773   LearningRate 0.0469   Epoch: 6   Global Step: 105140   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:21:10,553-Speed 3329.59 samples/sec   Loss 2.5126   LearningRate 0.0469   Epoch: 6   Global Step: 105150   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:21:13,609-Speed 3352.34 samples/sec   Loss 2.5611   LearningRate 0.0469   Epoch: 6   Global Step: 105160   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:21:16,665-Speed 3350.46 samples/sec   Loss 2.5156   LearningRate 0.0469   Epoch: 6   Global Step: 105170   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-11 10:21:19,729-Speed 3343.81 samples/sec   Loss 2.5975   LearningRate 0.0469   Epoch: 6   Global Step: 105180   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-11 10:21:22,920-Speed 3209.27 samples/sec   Loss 2.5874   LearningRate 0.0469   Epoch: 6   Global Step: 105190   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-11 10:21:25,992-Speed 3334.29 samples/sec   Loss 2.6160   LearningRate 0.0469   Epoch: 6   Global Step: 105200   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-11 10:21:29,065-Speed 3334.08 samples/sec   Loss 2.5255   LearningRate 0.0469   Epoch: 6   Global Step: 105210   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-11 10:21:32,130-Speed 3341.89 samples/sec   Loss 2.5422   LearningRate 0.0469   Epoch: 6   Global Step: 105220   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-11 10:21:35,196-Speed 3340.26 samples/sec   Loss 2.5770   LearningRate 0.0469   Epoch: 6   Global Step: 105230   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-11 10:21:38,292-Speed 3308.79 samples/sec   Loss 2.5424   LearningRate 0.0469   Epoch: 6   Global Step: 105240   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-11 10:21:41,358-Speed 3340.12 samples/sec   Loss 2.5366   LearningRate 0.0469   Epoch: 6   Global Step: 105250   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-11 10:21:44,434-Speed 3329.59 samples/sec   Loss 2.4896   LearningRate 0.0469   Epoch: 6   Global Step: 105260   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-11 10:21:47,547-Speed 3289.84 samples/sec   Loss 2.5450   LearningRate 0.0469   Epoch: 6   Global Step: 105270   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:21:50,627-Speed 3326.66 samples/sec   Loss 2.5784   LearningRate 0.0469   Epoch: 6   Global Step: 105280   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:21:53,675-Speed 3359.97 samples/sec   Loss 2.5781   LearningRate 0.0469   Epoch: 6   Global Step: 105290   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-11 10:21:56,748-Speed 3333.01 samples/sec   Loss 2.5806   LearningRate 0.0469   Epoch: 6   Global Step: 105300   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-11 10:21:59,811-Speed 3344.73 samples/sec   Loss 2.5526   LearningRate 0.0469   Epoch: 6   Global Step: 105310   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-11 10:22:02,882-Speed 3334.86 samples/sec   Loss 2.4909   LearningRate 0.0469   Epoch: 6   Global Step: 105320   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-11 10:22:05,943-Speed 3346.12 samples/sec   Loss 2.5087   LearningRate 0.0469   Epoch: 6   Global Step: 105330   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-11 10:22:09,063-Speed 3283.03 samples/sec   Loss 2.5246   LearningRate 0.0468   Epoch: 6   Global Step: 105340   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-11 10:22:12,170-Speed 3296.05 samples/sec   Loss 2.5170   LearningRate 0.0468   Epoch: 6   Global Step: 105350   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-11 10:22:15,237-Speed 3339.98 samples/sec   Loss 2.5400   LearningRate 0.0468   Epoch: 6   Global Step: 105360   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-11 10:22:18,301-Speed 3343.17 samples/sec   Loss 2.5753   LearningRate 0.0468   Epoch: 6   Global Step: 105370   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-11 10:22:21,371-Speed 3335.45 samples/sec   Loss 2.5444   LearningRate 0.0468   Epoch: 6   Global Step: 105380   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-11 10:22:24,448-Speed 3329.89 samples/sec   Loss 2.5999   LearningRate 0.0468   Epoch: 6   Global Step: 105390   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:22:27,570-Speed 3280.66 samples/sec   Loss 2.5815   LearningRate 0.0468   Epoch: 6   Global Step: 105400   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:22:30,649-Speed 3325.78 samples/sec   Loss 2.5328   LearningRate 0.0468   Epoch: 6   Global Step: 105410   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:22:33,753-Speed 3300.29 samples/sec   Loss 2.5138   LearningRate 0.0468   Epoch: 6   Global Step: 105420   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:22:36,826-Speed 3333.46 samples/sec   Loss 2.5337   LearningRate 0.0468   Epoch: 6   Global Step: 105430   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:22:39,909-Speed 3322.47 samples/sec   Loss 2.5134   LearningRate 0.0468   Epoch: 6   Global Step: 105440   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:22:42,999-Speed 3314.33 samples/sec   Loss 2.5453   LearningRate 0.0468   Epoch: 6   Global Step: 105450   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:22:46,110-Speed 3292.15 samples/sec   Loss 2.5223   LearningRate 0.0468   Epoch: 6   Global Step: 105460   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:22:49,178-Speed 3338.55 samples/sec   Loss 2.6093   LearningRate 0.0468   Epoch: 6   Global Step: 105470   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:22:52,274-Speed 3309.10 samples/sec   Loss 2.5205   LearningRate 0.0468   Epoch: 6   Global Step: 105480   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:22:55,326-Speed 3356.17 samples/sec   Loss 2.5810   LearningRate 0.0468   Epoch: 6   Global Step: 105490   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:22:58,398-Speed 3334.09 samples/sec   Loss 2.5449   LearningRate 0.0468   Epoch: 6   Global Step: 105500   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:23:01,486-Speed 3316.61 samples/sec   Loss 2.6020   LearningRate 0.0468   Epoch: 6   Global Step: 105510   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:23:04,557-Speed 3335.92 samples/sec   Loss 2.5742   LearningRate 0.0468   Epoch: 6   Global Step: 105520   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:23:07,639-Speed 3322.68 samples/sec   Loss 2.5201   LearningRate 0.0468   Epoch: 6   Global Step: 105530   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:23:10,707-Speed 3339.16 samples/sec   Loss 2.5611   LearningRate 0.0468   Epoch: 6   Global Step: 105540   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:23:13,782-Speed 3331.20 samples/sec   Loss 2.5747   LearningRate 0.0468   Epoch: 6   Global Step: 105550   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:23:16,850-Speed 3338.26 samples/sec   Loss 2.5666   LearningRate 0.0468   Epoch: 6   Global Step: 105560   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:23:19,915-Speed 3341.25 samples/sec   Loss 2.5362   LearningRate 0.0468   Epoch: 6   Global Step: 105570   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:23:22,978-Speed 3344.94 samples/sec   Loss 2.5172   LearningRate 0.0467   Epoch: 6   Global Step: 105580   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:23:26,052-Speed 3330.93 samples/sec   Loss 2.5443   LearningRate 0.0467   Epoch: 6   Global Step: 105590   Fp16 Grad Scale: 262144   Required: 24 hours
Training: 2022-04-11 10:23:29,120-Speed 3339.05 samples/sec   Loss 2.5294   LearningRate 0.0467   Epoch: 6   Global Step: 105600   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:23:32,193-Speed 3333.11 samples/sec   Loss 2.5600   LearningRate 0.0467   Epoch: 6   Global Step: 105610   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:23:35,260-Speed 3340.16 samples/sec   Loss 2.5534   LearningRate 0.0467   Epoch: 6   Global Step: 105620   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:23:38,344-Speed 3320.72 samples/sec   Loss 2.5203   LearningRate 0.0467   Epoch: 6   Global Step: 105630   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:23:41,414-Speed 3336.18 samples/sec   Loss 2.5206   LearningRate 0.0467   Epoch: 6   Global Step: 105640   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:23:44,476-Speed 3345.44 samples/sec   Loss 2.4874   LearningRate 0.0467   Epoch: 6   Global Step: 105650   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:23:47,553-Speed 3328.53 samples/sec   Loss 2.6013   LearningRate 0.0467   Epoch: 6   Global Step: 105660   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:23:50,624-Speed 3335.50 samples/sec   Loss 2.5404   LearningRate 0.0467   Epoch: 6   Global Step: 105670   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:23:53,694-Speed 3336.83 samples/sec   Loss 2.5327   LearningRate 0.0467   Epoch: 6   Global Step: 105680   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:23:56,767-Speed 3332.00 samples/sec   Loss 2.5814   LearningRate 0.0467   Epoch: 6   Global Step: 105690   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:23:59,882-Speed 3289.01 samples/sec   Loss 2.5034   LearningRate 0.0467   Epoch: 6   Global Step: 105700   Fp16 Grad Scale: 262144   Required: 24 hours
Training: 2022-04-11 10:24:03,003-Speed 3282.21 samples/sec   Loss 2.5152   LearningRate 0.0467   Epoch: 6   Global Step: 105710   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:24:06,102-Speed 3305.08 samples/sec   Loss 2.4817   LearningRate 0.0467   Epoch: 6   Global Step: 105720   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:24:09,177-Speed 3330.54 samples/sec   Loss 2.5978   LearningRate 0.0467   Epoch: 6   Global Step: 105730   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:24:12,245-Speed 3338.18 samples/sec   Loss 2.5696   LearningRate 0.0467   Epoch: 6   Global Step: 105740   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:24:15,359-Speed 3289.49 samples/sec   Loss 2.5456   LearningRate 0.0467   Epoch: 6   Global Step: 105750   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:24:18,452-Speed 3311.49 samples/sec   Loss 2.4884   LearningRate 0.0467   Epoch: 6   Global Step: 105760   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:24:21,516-Speed 3343.27 samples/sec   Loss 2.6765   LearningRate 0.0467   Epoch: 6   Global Step: 105770   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:24:24,579-Speed 3343.94 samples/sec   Loss 2.5445   LearningRate 0.0467   Epoch: 6   Global Step: 105780   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:24:27,661-Speed 3323.54 samples/sec   Loss 2.6328   LearningRate 0.0467   Epoch: 6   Global Step: 105790   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:24:30,728-Speed 3339.12 samples/sec   Loss 2.5522   LearningRate 0.0467   Epoch: 6   Global Step: 105800   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:24:33,782-Speed 3353.86 samples/sec   Loss 2.6030   LearningRate 0.0467   Epoch: 6   Global Step: 105810   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:24:36,847-Speed 3341.96 samples/sec   Loss 2.6265   LearningRate 0.0466   Epoch: 6   Global Step: 105820   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:24:39,988-Speed 3262.21 samples/sec   Loss 2.5405   LearningRate 0.0466   Epoch: 6   Global Step: 105830   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:24:43,065-Speed 3328.35 samples/sec   Loss 2.5836   LearningRate 0.0466   Epoch: 6   Global Step: 105840   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:24:46,156-Speed 3314.73 samples/sec   Loss 2.5488   LearningRate 0.0466   Epoch: 6   Global Step: 105850   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:24:49,242-Speed 3319.65 samples/sec   Loss 2.5659   LearningRate 0.0466   Epoch: 6   Global Step: 105860   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:24:52,375-Speed 3269.27 samples/sec   Loss 2.5928   LearningRate 0.0466   Epoch: 6   Global Step: 105870   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:24:55,508-Speed 3270.14 samples/sec   Loss 2.5757   LearningRate 0.0466   Epoch: 6   Global Step: 105880   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:24:58,639-Speed 3271.94 samples/sec   Loss 2.5677   LearningRate 0.0466   Epoch: 6   Global Step: 105890   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-11 10:25:01,714-Speed 3330.44 samples/sec   Loss 2.7065   LearningRate 0.0466   Epoch: 6   Global Step: 105900   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-11 10:25:04,861-Speed 3255.17 samples/sec   Loss 2.6243   LearningRate 0.0466   Epoch: 6   Global Step: 105910   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-11 10:25:07,934-Speed 3332.73 samples/sec   Loss 2.5150   LearningRate 0.0466   Epoch: 6   Global Step: 105920   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-11 10:25:11,036-Speed 3301.97 samples/sec   Loss 2.5891   LearningRate 0.0466   Epoch: 6   Global Step: 105930   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-11 10:25:14,161-Speed 3278.02 samples/sec   Loss 2.5622   LearningRate 0.0466   Epoch: 6   Global Step: 105940   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-11 10:25:17,223-Speed 3344.85 samples/sec   Loss 2.5875   LearningRate 0.0466   Epoch: 6   Global Step: 105950   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-11 10:25:20,288-Speed 3342.45 samples/sec   Loss 2.7012   LearningRate 0.0466   Epoch: 6   Global Step: 105960   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-11 10:25:23,352-Speed 3343.30 samples/sec   Loss 2.6491   LearningRate 0.0466   Epoch: 6   Global Step: 105970   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-11 10:25:26,424-Speed 3333.46 samples/sec   Loss 2.5112   LearningRate 0.0466   Epoch: 6   Global Step: 105980   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-11 10:25:29,549-Speed 3277.78 samples/sec   Loss 2.6146   LearningRate 0.0466   Epoch: 6   Global Step: 105990   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:25:32,686-Speed 3265.24 samples/sec   Loss 2.5893   LearningRate 0.0466   Epoch: 6   Global Step: 106000   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:26:17,134-[lfw][106000]XNorm: 20.759801
Training: 2022-04-11 10:26:17,135-[lfw][106000]Accuracy-Flip: 0.99800+-0.00296
Training: 2022-04-11 10:26:17,136-[lfw][106000]Accuracy-Highest: 0.99817
Training: 2022-04-11 10:27:08,520-[cfp_fp][106000]XNorm: 19.581719
Training: 2022-04-11 10:27:08,521-[cfp_fp][106000]Accuracy-Flip: 0.98414+-0.00577
Training: 2022-04-11 10:27:08,521-[cfp_fp][106000]Accuracy-Highest: 0.98614
Training: 2022-04-11 10:27:53,135-[agedb_30][106000]XNorm: 20.752442
Training: 2022-04-11 10:27:53,136-[agedb_30][106000]Accuracy-Flip: 0.98117+-0.00699
Training: 2022-04-11 10:27:53,137-[agedb_30][106000]Accuracy-Highest: 0.98250
Training: 2022-04-11 10:27:56,210-Speed 71.35 samples/sec   Loss 2.5398   LearningRate 0.0466   Epoch: 6   Global Step: 106010   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:27:59,266-Speed 3351.78 samples/sec   Loss 2.5536   LearningRate 0.0466   Epoch: 6   Global Step: 106020   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:28:02,323-Speed 3350.82 samples/sec   Loss 2.5693   LearningRate 0.0466   Epoch: 6   Global Step: 106030   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:28:05,396-Speed 3333.67 samples/sec   Loss 2.6037   LearningRate 0.0466   Epoch: 6   Global Step: 106040   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:28:08,458-Speed 3344.80 samples/sec   Loss 2.4858   LearningRate 0.0466   Epoch: 6   Global Step: 106050   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:28:11,505-Speed 3361.38 samples/sec   Loss 2.4697   LearningRate 0.0466   Epoch: 6   Global Step: 106060   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-11 10:28:14,566-Speed 3347.47 samples/sec   Loss 2.5694   LearningRate 0.0465   Epoch: 6   Global Step: 106070   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-11 10:28:17,633-Speed 3340.20 samples/sec   Loss 2.4958   LearningRate 0.0465   Epoch: 6   Global Step: 106080   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-11 10:28:20,697-Speed 3342.72 samples/sec   Loss 2.5617   LearningRate 0.0465   Epoch: 6   Global Step: 106090   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-11 10:28:23,760-Speed 3343.50 samples/sec   Loss 2.5581   LearningRate 0.0465   Epoch: 6   Global Step: 106100   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-11 10:28:26,823-Speed 3343.80 samples/sec   Loss 2.5654   LearningRate 0.0465   Epoch: 6   Global Step: 106110   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-11 10:28:29,900-Speed 3329.12 samples/sec   Loss 2.5082   LearningRate 0.0465   Epoch: 6   Global Step: 106120   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-11 10:28:32,982-Speed 3322.77 samples/sec   Loss 2.5332   LearningRate 0.0465   Epoch: 6   Global Step: 106130   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-11 10:28:36,045-Speed 3343.91 samples/sec   Loss 2.5329   LearningRate 0.0465   Epoch: 6   Global Step: 106140   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-11 10:28:39,113-Speed 3339.43 samples/sec   Loss 2.6077   LearningRate 0.0465   Epoch: 6   Global Step: 106150   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-11 10:28:42,208-Speed 3308.82 samples/sec   Loss 2.6088   LearningRate 0.0465   Epoch: 6   Global Step: 106160   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:28:45,272-Speed 3342.24 samples/sec   Loss 2.5336   LearningRate 0.0465   Epoch: 6   Global Step: 106170   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:28:48,332-Speed 3347.18 samples/sec   Loss 2.6002   LearningRate 0.0465   Epoch: 6   Global Step: 106180   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:28:51,434-Speed 3302.10 samples/sec   Loss 2.5583   LearningRate 0.0465   Epoch: 6   Global Step: 106190   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:28:54,514-Speed 3325.38 samples/sec   Loss 2.6206   LearningRate 0.0465   Epoch: 6   Global Step: 106200   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:28:57,583-Speed 3337.22 samples/sec   Loss 2.6118   LearningRate 0.0465   Epoch: 6   Global Step: 106210   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:29:00,678-Speed 3311.26 samples/sec   Loss 2.6461   LearningRate 0.0465   Epoch: 6   Global Step: 106220   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:29:03,751-Speed 3332.89 samples/sec   Loss 2.5957   LearningRate 0.0465   Epoch: 6   Global Step: 106230   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:29:06,818-Speed 3339.92 samples/sec   Loss 2.5608   LearningRate 0.0465   Epoch: 6   Global Step: 106240   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:29:09,918-Speed 3303.54 samples/sec   Loss 2.5770   LearningRate 0.0465   Epoch: 6   Global Step: 106250   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:29:12,988-Speed 3336.82 samples/sec   Loss 2.5439   LearningRate 0.0465   Epoch: 6   Global Step: 106260   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:29:16,053-Speed 3341.91 samples/sec   Loss 2.5766   LearningRate 0.0465   Epoch: 6   Global Step: 106270   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:29:19,134-Speed 3324.43 samples/sec   Loss 2.6408   LearningRate 0.0465   Epoch: 6   Global Step: 106280   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:29:22,212-Speed 3327.52 samples/sec   Loss 2.6459   LearningRate 0.0465   Epoch: 6   Global Step: 106290   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:29:25,284-Speed 3334.44 samples/sec   Loss 2.6143   LearningRate 0.0465   Epoch: 6   Global Step: 106300   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:29:28,350-Speed 3340.71 samples/sec   Loss 2.6014   LearningRate 0.0464   Epoch: 6   Global Step: 106310   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:29:31,432-Speed 3322.59 samples/sec   Loss 2.5262   LearningRate 0.0464   Epoch: 6   Global Step: 106320   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:29:34,535-Speed 3301.11 samples/sec   Loss 2.5534   LearningRate 0.0464   Epoch: 6   Global Step: 106330   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:29:37,601-Speed 3341.44 samples/sec   Loss 2.5591   LearningRate 0.0464   Epoch: 6   Global Step: 106340   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:29:40,663-Speed 3344.59 samples/sec   Loss 2.6391   LearningRate 0.0464   Epoch: 6   Global Step: 106350   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:29:43,719-Speed 3351.63 samples/sec   Loss 2.5753   LearningRate 0.0464   Epoch: 6   Global Step: 106360   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:29:46,801-Speed 3323.04 samples/sec   Loss 2.5507   LearningRate 0.0464   Epoch: 6   Global Step: 106370   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:29:49,891-Speed 3315.54 samples/sec   Loss 2.5996   LearningRate 0.0464   Epoch: 6   Global Step: 106380   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:29:52,958-Speed 3340.18 samples/sec   Loss 2.5872   LearningRate 0.0464   Epoch: 6   Global Step: 106390   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:29:56,018-Speed 3346.52 samples/sec   Loss 2.5077   LearningRate 0.0464   Epoch: 6   Global Step: 106400   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:29:59,096-Speed 3327.74 samples/sec   Loss 2.5824   LearningRate 0.0464   Epoch: 6   Global Step: 106410   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:30:02,153-Speed 3351.06 samples/sec   Loss 2.5862   LearningRate 0.0464   Epoch: 6   Global Step: 106420   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:30:05,225-Speed 3335.62 samples/sec   Loss 2.5360   LearningRate 0.0464   Epoch: 6   Global Step: 106430   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:30:08,281-Speed 3351.43 samples/sec   Loss 2.5517   LearningRate 0.0464   Epoch: 6   Global Step: 106440   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:30:11,352-Speed 3334.72 samples/sec   Loss 2.6475   LearningRate 0.0464   Epoch: 6   Global Step: 106450   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:30:14,401-Speed 3359.48 samples/sec   Loss 2.5693   LearningRate 0.0464   Epoch: 6   Global Step: 106460   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:30:17,466-Speed 3342.33 samples/sec   Loss 2.5485   LearningRate 0.0464   Epoch: 6   Global Step: 106470   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:30:20,529-Speed 3343.41 samples/sec   Loss 2.5877   LearningRate 0.0464   Epoch: 6   Global Step: 106480   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:30:23,606-Speed 3329.22 samples/sec   Loss 2.6283   LearningRate 0.0464   Epoch: 6   Global Step: 106490   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:30:26,674-Speed 3338.81 samples/sec   Loss 2.6079   LearningRate 0.0464   Epoch: 6   Global Step: 106500   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:30:29,764-Speed 3314.35 samples/sec   Loss 2.5820   LearningRate 0.0464   Epoch: 6   Global Step: 106510   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:30:32,821-Speed 3350.09 samples/sec   Loss 2.5117   LearningRate 0.0464   Epoch: 6   Global Step: 106520   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:30:35,884-Speed 3344.25 samples/sec   Loss 2.6174   LearningRate 0.0464   Epoch: 6   Global Step: 106530   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:30:39,005-Speed 3282.28 samples/sec   Loss 2.6660   LearningRate 0.0464   Epoch: 6   Global Step: 106540   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:30:42,077-Speed 3334.68 samples/sec   Loss 2.5741   LearningRate 0.0464   Epoch: 6   Global Step: 106550   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:30:45,131-Speed 3353.32 samples/sec   Loss 2.5166   LearningRate 0.0463   Epoch: 6   Global Step: 106560   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:30:48,294-Speed 3237.83 samples/sec   Loss 2.5130   LearningRate 0.0463   Epoch: 6   Global Step: 106570   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:30:51,360-Speed 3341.40 samples/sec   Loss 2.5334   LearningRate 0.0463   Epoch: 6   Global Step: 106580   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:30:54,443-Speed 3321.65 samples/sec   Loss 2.5049   LearningRate 0.0463   Epoch: 6   Global Step: 106590   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-11 10:30:57,528-Speed 3320.32 samples/sec   Loss 2.5343   LearningRate 0.0463   Epoch: 6   Global Step: 106600   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-11 10:31:00,605-Speed 3329.32 samples/sec   Loss 2.5749   LearningRate 0.0463   Epoch: 6   Global Step: 106610   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-11 10:31:03,674-Speed 3337.17 samples/sec   Loss 2.6021   LearningRate 0.0463   Epoch: 6   Global Step: 106620   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-11 10:31:06,778-Speed 3301.54 samples/sec   Loss 2.6065   LearningRate 0.0463   Epoch: 6   Global Step: 106630   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-11 10:31:09,844-Speed 3340.27 samples/sec   Loss 2.6365   LearningRate 0.0463   Epoch: 6   Global Step: 106640   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-11 10:31:12,918-Speed 3332.86 samples/sec   Loss 2.6276   LearningRate 0.0463   Epoch: 6   Global Step: 106650   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-11 10:31:15,977-Speed 3348.67 samples/sec   Loss 2.5451   LearningRate 0.0463   Epoch: 6   Global Step: 106660   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-11 10:31:19,040-Speed 3344.36 samples/sec   Loss 2.5973   LearningRate 0.0463   Epoch: 6   Global Step: 106670   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-11 10:31:22,100-Speed 3346.71 samples/sec   Loss 2.5416   LearningRate 0.0463   Epoch: 6   Global Step: 106680   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-11 10:31:25,181-Speed 3324.91 samples/sec   Loss 2.5654   LearningRate 0.0463   Epoch: 6   Global Step: 106690   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:31:28,240-Speed 3347.96 samples/sec   Loss 2.6173   LearningRate 0.0463   Epoch: 6   Global Step: 106700   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:31:31,327-Speed 3318.74 samples/sec   Loss 2.5455   LearningRate 0.0463   Epoch: 6   Global Step: 106710   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:31:34,395-Speed 3338.85 samples/sec   Loss 2.5719   LearningRate 0.0463   Epoch: 6   Global Step: 106720   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-11 10:31:37,486-Speed 3314.16 samples/sec   Loss 2.5993   LearningRate 0.0463   Epoch: 6   Global Step: 106730   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-11 10:31:40,552-Speed 3339.74 samples/sec   Loss 2.6297   LearningRate 0.0463   Epoch: 6   Global Step: 106740   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-11 10:31:43,641-Speed 3315.83 samples/sec   Loss 2.6042   LearningRate 0.0463   Epoch: 6   Global Step: 106750   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-11 10:31:46,750-Speed 3294.95 samples/sec   Loss 2.5271   LearningRate 0.0463   Epoch: 6   Global Step: 106760   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-11 10:31:49,816-Speed 3340.81 samples/sec   Loss 2.6056   LearningRate 0.0463   Epoch: 6   Global Step: 106770   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-11 10:31:52,878-Speed 3344.74 samples/sec   Loss 2.5441   LearningRate 0.0463   Epoch: 6   Global Step: 106780   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-11 10:31:55,944-Speed 3340.47 samples/sec   Loss 2.5919   LearningRate 0.0463   Epoch: 6   Global Step: 106790   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-11 10:31:59,085-Speed 3261.36 samples/sec   Loss 2.5926   LearningRate 0.0462   Epoch: 6   Global Step: 106800   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-11 10:32:02,159-Speed 3331.88 samples/sec   Loss 2.6027   LearningRate 0.0462   Epoch: 6   Global Step: 106810   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-11 10:32:05,231-Speed 3334.65 samples/sec   Loss 2.4959   LearningRate 0.0462   Epoch: 6   Global Step: 106820   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-11 10:32:08,295-Speed 3342.08 samples/sec   Loss 2.5882   LearningRate 0.0462   Epoch: 6   Global Step: 106830   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-11 10:32:11,397-Speed 3302.80 samples/sec   Loss 2.5878   LearningRate 0.0462   Epoch: 6   Global Step: 106840   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-11 10:32:14,461-Speed 3341.96 samples/sec   Loss 2.5646   LearningRate 0.0462   Epoch: 6   Global Step: 106850   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-11 10:32:17,526-Speed 3341.52 samples/sec   Loss 2.5055   LearningRate 0.0462   Epoch: 6   Global Step: 106860   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-11 10:32:20,596-Speed 3336.91 samples/sec   Loss 2.5437   LearningRate 0.0462   Epoch: 6   Global Step: 106870   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-11 10:32:23,679-Speed 3322.76 samples/sec   Loss 2.5518   LearningRate 0.0462   Epoch: 6   Global Step: 106880   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-11 10:32:26,742-Speed 3343.48 samples/sec   Loss 2.5730   LearningRate 0.0462   Epoch: 6   Global Step: 106890   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-11 10:32:29,815-Speed 3333.54 samples/sec   Loss 2.5285   LearningRate 0.0462   Epoch: 6   Global Step: 106900   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-11 10:32:32,909-Speed 3311.29 samples/sec   Loss 2.5642   LearningRate 0.0462   Epoch: 6   Global Step: 106910   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-11 10:32:35,974-Speed 3341.97 samples/sec   Loss 2.5816   LearningRate 0.0462   Epoch: 6   Global Step: 106920   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:32:39,053-Speed 3325.91 samples/sec   Loss 2.6105   LearningRate 0.0462   Epoch: 6   Global Step: 106930   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:32:42,133-Speed 3325.73 samples/sec   Loss 2.5232   LearningRate 0.0462   Epoch: 6   Global Step: 106940   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:32:45,234-Speed 3303.89 samples/sec   Loss 2.5595   LearningRate 0.0462   Epoch: 6   Global Step: 106950   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:32:48,333-Speed 3304.51 samples/sec   Loss 2.5803   LearningRate 0.0462   Epoch: 6   Global Step: 106960   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:32:51,406-Speed 3333.18 samples/sec   Loss 2.5655   LearningRate 0.0462   Epoch: 6   Global Step: 106970   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:32:54,474-Speed 3339.26 samples/sec   Loss 2.5946   LearningRate 0.0462   Epoch: 6   Global Step: 106980   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:32:57,539-Speed 3341.93 samples/sec   Loss 2.5630   LearningRate 0.0462   Epoch: 6   Global Step: 106990   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:33:00,604-Speed 3341.42 samples/sec   Loss 2.5943   LearningRate 0.0462   Epoch: 6   Global Step: 107000   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:33:03,670-Speed 3340.80 samples/sec   Loss 2.6110   LearningRate 0.0462   Epoch: 6   Global Step: 107010   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-11 10:33:06,751-Speed 3324.45 samples/sec   Loss 2.5763   LearningRate 0.0462   Epoch: 6   Global Step: 107020   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-11 10:33:09,841-Speed 3315.21 samples/sec   Loss 2.5644   LearningRate 0.0462   Epoch: 6   Global Step: 107030   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-11 10:33:12,921-Speed 3326.32 samples/sec   Loss 2.5560   LearningRate 0.0462   Epoch: 6   Global Step: 107040   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-11 10:33:15,999-Speed 3327.17 samples/sec   Loss 2.6654   LearningRate 0.0461   Epoch: 6   Global Step: 107050   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-11 10:33:19,080-Speed 3323.84 samples/sec   Loss 2.5377   LearningRate 0.0461   Epoch: 6   Global Step: 107060   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-11 10:33:22,145-Speed 3341.77 samples/sec   Loss 2.5642   LearningRate 0.0461   Epoch: 6   Global Step: 107070   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-11 10:33:25,225-Speed 3326.55 samples/sec   Loss 2.5992   LearningRate 0.0461   Epoch: 6   Global Step: 107080   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-11 10:33:28,309-Speed 3321.00 samples/sec   Loss 2.6139   LearningRate 0.0461   Epoch: 6   Global Step: 107090   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-11 10:33:31,388-Speed 3326.83 samples/sec   Loss 2.5987   LearningRate 0.0461   Epoch: 6   Global Step: 107100   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-11 10:33:34,486-Speed 3305.38 samples/sec   Loss 2.5836   LearningRate 0.0461   Epoch: 6   Global Step: 107110   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:33:37,553-Speed 3340.28 samples/sec   Loss 2.6117   LearningRate 0.0461   Epoch: 6   Global Step: 107120   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:33:40,628-Speed 3330.89 samples/sec   Loss 2.6383   LearningRate 0.0461   Epoch: 6   Global Step: 107130   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:33:43,687-Speed 3347.76 samples/sec   Loss 2.5812   LearningRate 0.0461   Epoch: 6   Global Step: 107140   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:33:46,748-Speed 3346.37 samples/sec   Loss 2.5345   LearningRate 0.0461   Epoch: 6   Global Step: 107150   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:33:49,818-Speed 3336.89 samples/sec   Loss 2.5979   LearningRate 0.0461   Epoch: 6   Global Step: 107160   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:33:52,886-Speed 3338.46 samples/sec   Loss 2.5743   LearningRate 0.0461   Epoch: 6   Global Step: 107170   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:33:55,977-Speed 3313.71 samples/sec   Loss 2.5862   LearningRate 0.0461   Epoch: 6   Global Step: 107180   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:33:59,049-Speed 3334.21 samples/sec   Loss 2.5110   LearningRate 0.0461   Epoch: 6   Global Step: 107190   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:34:02,118-Speed 3337.30 samples/sec   Loss 2.6653   LearningRate 0.0461   Epoch: 6   Global Step: 107200   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:34:05,173-Speed 3352.46 samples/sec   Loss 2.5031   LearningRate 0.0461   Epoch: 6   Global Step: 107210   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:34:08,238-Speed 3342.19 samples/sec   Loss 2.5844   LearningRate 0.0461   Epoch: 6   Global Step: 107220   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:34:11,302-Speed 3343.86 samples/sec   Loss 2.6179   LearningRate 0.0461   Epoch: 6   Global Step: 107230   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:34:14,367-Speed 3340.67 samples/sec   Loss 2.6160   LearningRate 0.0461   Epoch: 6   Global Step: 107240   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:34:17,434-Speed 3340.27 samples/sec   Loss 2.6211   LearningRate 0.0461   Epoch: 6   Global Step: 107250   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:34:20,513-Speed 3326.24 samples/sec   Loss 2.6010   LearningRate 0.0461   Epoch: 6   Global Step: 107260   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:34:23,602-Speed 3316.10 samples/sec   Loss 2.5677   LearningRate 0.0461   Epoch: 6   Global Step: 107270   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:34:26,670-Speed 3338.75 samples/sec   Loss 2.6088   LearningRate 0.0461   Epoch: 6   Global Step: 107280   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:34:29,775-Speed 3298.45 samples/sec   Loss 2.5875   LearningRate 0.0460   Epoch: 6   Global Step: 107290   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:34:32,852-Speed 3329.12 samples/sec   Loss 2.6177   LearningRate 0.0460   Epoch: 6   Global Step: 107300   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:34:35,920-Speed 3338.11 samples/sec   Loss 2.5643   LearningRate 0.0460   Epoch: 6   Global Step: 107310   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:34:38,984-Speed 3343.71 samples/sec   Loss 2.5967   LearningRate 0.0460   Epoch: 6   Global Step: 107320   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:34:42,063-Speed 3327.07 samples/sec   Loss 2.5809   LearningRate 0.0460   Epoch: 6   Global Step: 107330   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:34:45,125-Speed 3344.58 samples/sec   Loss 2.6461   LearningRate 0.0460   Epoch: 6   Global Step: 107340   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:34:48,202-Speed 3328.88 samples/sec   Loss 2.5385   LearningRate 0.0460   Epoch: 6   Global Step: 107350   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:34:51,284-Speed 3323.50 samples/sec   Loss 2.6259   LearningRate 0.0460   Epoch: 6   Global Step: 107360   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:34:54,355-Speed 3335.47 samples/sec   Loss 2.6126   LearningRate 0.0460   Epoch: 6   Global Step: 107370   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:34:57,421-Speed 3340.20 samples/sec   Loss 2.5581   LearningRate 0.0460   Epoch: 6   Global Step: 107380   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:35:00,487-Speed 3341.56 samples/sec   Loss 2.5559   LearningRate 0.0460   Epoch: 6   Global Step: 107390   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:35:03,555-Speed 3338.44 samples/sec   Loss 2.5993   LearningRate 0.0460   Epoch: 6   Global Step: 107400   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:35:06,635-Speed 3325.18 samples/sec   Loss 2.5732   LearningRate 0.0460   Epoch: 6   Global Step: 107410   Fp16 Grad Scale: 262144   Required: 24 hours
Training: 2022-04-11 10:35:09,690-Speed 3352.81 samples/sec   Loss 2.5290   LearningRate 0.0460   Epoch: 6   Global Step: 107420   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-11 10:35:12,753-Speed 3344.20 samples/sec   Loss 2.5437   LearningRate 0.0460   Epoch: 6   Global Step: 107430   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-11 10:35:15,819-Speed 3340.19 samples/sec   Loss 2.5834   LearningRate 0.0460   Epoch: 6   Global Step: 107440   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-11 10:35:18,933-Speed 3290.45 samples/sec   Loss 2.5554   LearningRate 0.0460   Epoch: 6   Global Step: 107450   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-11 10:35:22,013-Speed 3325.14 samples/sec   Loss 2.6033   LearningRate 0.0460   Epoch: 6   Global Step: 107460   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-11 10:35:25,096-Speed 3322.44 samples/sec   Loss 2.6323   LearningRate 0.0460   Epoch: 6   Global Step: 107470   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-11 10:35:28,176-Speed 3325.69 samples/sec   Loss 2.6273   LearningRate 0.0460   Epoch: 6   Global Step: 107480   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-11 10:35:31,243-Speed 3340.44 samples/sec   Loss 2.5863   LearningRate 0.0460   Epoch: 6   Global Step: 107490   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-11 10:35:34,310-Speed 3338.98 samples/sec   Loss 2.6202   LearningRate 0.0460   Epoch: 6   Global Step: 107500   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-11 10:35:37,411-Speed 3303.74 samples/sec   Loss 2.5096   LearningRate 0.0460   Epoch: 6   Global Step: 107510   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-11 10:35:40,478-Speed 3339.30 samples/sec   Loss 2.5548   LearningRate 0.0460   Epoch: 6   Global Step: 107520   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:35:43,553-Speed 3331.85 samples/sec   Loss 2.6145   LearningRate 0.0460   Epoch: 6   Global Step: 107530   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:35:46,652-Speed 3304.73 samples/sec   Loss 2.5655   LearningRate 0.0459   Epoch: 6   Global Step: 107540   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:35:49,730-Speed 3327.20 samples/sec   Loss 2.6222   LearningRate 0.0459   Epoch: 6   Global Step: 107550   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:35:52,795-Speed 3342.03 samples/sec   Loss 2.5488   LearningRate 0.0459   Epoch: 6   Global Step: 107560   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:35:55,959-Speed 3237.06 samples/sec   Loss 2.5382   LearningRate 0.0459   Epoch: 6   Global Step: 107570   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:35:59,026-Speed 3340.08 samples/sec   Loss 2.6303   LearningRate 0.0459   Epoch: 6   Global Step: 107580   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:36:02,089-Speed 3344.44 samples/sec   Loss 2.6393   LearningRate 0.0459   Epoch: 6   Global Step: 107590   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:36:05,153-Speed 3342.88 samples/sec   Loss 2.6450   LearningRate 0.0459   Epoch: 6   Global Step: 107600   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:36:08,214-Speed 3346.16 samples/sec   Loss 2.5803   LearningRate 0.0459   Epoch: 6   Global Step: 107610   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:36:11,288-Speed 3331.35 samples/sec   Loss 2.5676   LearningRate 0.0459   Epoch: 6   Global Step: 107620   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:36:14,353-Speed 3341.91 samples/sec   Loss 2.5267   LearningRate 0.0459   Epoch: 6   Global Step: 107630   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:36:17,436-Speed 3321.79 samples/sec   Loss 2.5683   LearningRate 0.0459   Epoch: 6   Global Step: 107640   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:36:20,517-Speed 3325.42 samples/sec   Loss 2.5640   LearningRate 0.0459   Epoch: 6   Global Step: 107650   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:36:23,632-Speed 3287.99 samples/sec   Loss 2.6071   LearningRate 0.0459   Epoch: 6   Global Step: 107660   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:36:26,698-Speed 3340.27 samples/sec   Loss 2.6382   LearningRate 0.0459   Epoch: 6   Global Step: 107670   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:36:29,770-Speed 3334.70 samples/sec   Loss 2.5673   LearningRate 0.0459   Epoch: 6   Global Step: 107680   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:36:32,854-Speed 3321.01 samples/sec   Loss 2.5482   LearningRate 0.0459   Epoch: 6   Global Step: 107690   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:36:35,917-Speed 3343.47 samples/sec   Loss 2.6143   LearningRate 0.0459   Epoch: 6   Global Step: 107700   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:36:39,016-Speed 3304.78 samples/sec   Loss 2.5371   LearningRate 0.0459   Epoch: 6   Global Step: 107710   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:36:42,079-Speed 3344.06 samples/sec   Loss 2.5681   LearningRate 0.0459   Epoch: 6   Global Step: 107720   Fp16 Grad Scale: 262144   Required: 24 hours
Training: 2022-04-11 10:36:45,155-Speed 3330.24 samples/sec   Loss 2.6237   LearningRate 0.0459   Epoch: 6   Global Step: 107730   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:36:48,224-Speed 3337.53 samples/sec   Loss 2.6235   LearningRate 0.0459   Epoch: 6   Global Step: 107740   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:36:51,307-Speed 3321.95 samples/sec   Loss 2.5003   LearningRate 0.0459   Epoch: 6   Global Step: 107750   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:36:54,394-Speed 3317.68 samples/sec   Loss 2.5125   LearningRate 0.0459   Epoch: 6   Global Step: 107760   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:36:57,470-Speed 3329.83 samples/sec   Loss 2.5576   LearningRate 0.0459   Epoch: 6   Global Step: 107770   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:37:00,547-Speed 3328.65 samples/sec   Loss 2.5759   LearningRate 0.0459   Epoch: 6   Global Step: 107780   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:37:03,621-Speed 3332.91 samples/sec   Loss 2.6548   LearningRate 0.0458   Epoch: 6   Global Step: 107790   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:37:06,693-Speed 3333.31 samples/sec   Loss 2.6139   LearningRate 0.0458   Epoch: 6   Global Step: 107800   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:37:09,807-Speed 3289.50 samples/sec   Loss 2.5374   LearningRate 0.0458   Epoch: 6   Global Step: 107810   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:37:12,874-Speed 3340.50 samples/sec   Loss 2.6359   LearningRate 0.0458   Epoch: 6   Global Step: 107820   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:37:15,927-Speed 3354.90 samples/sec   Loss 2.5558   LearningRate 0.0458   Epoch: 6   Global Step: 107830   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:37:18,978-Speed 3357.15 samples/sec   Loss 2.6013   LearningRate 0.0458   Epoch: 6   Global Step: 107840   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-11 10:37:22,043-Speed 3342.06 samples/sec   Loss 2.5517   LearningRate 0.0458   Epoch: 6   Global Step: 107850   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-11 10:37:25,121-Speed 3327.44 samples/sec   Loss 2.5363   LearningRate 0.0458   Epoch: 6   Global Step: 107860   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-11 10:37:28,208-Speed 3318.06 samples/sec   Loss 2.5720   LearningRate 0.0458   Epoch: 6   Global Step: 107870   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-11 10:37:31,287-Speed 3326.01 samples/sec   Loss 2.6783   LearningRate 0.0458   Epoch: 6   Global Step: 107880   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-11 10:37:34,390-Speed 3300.54 samples/sec   Loss 2.5380   LearningRate 0.0458   Epoch: 6   Global Step: 107890   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-11 10:37:37,454-Speed 3343.68 samples/sec   Loss 2.5946   LearningRate 0.0458   Epoch: 6   Global Step: 107900   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-11 10:37:40,544-Speed 3314.29 samples/sec   Loss 2.5322   LearningRate 0.0458   Epoch: 6   Global Step: 107910   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-11 10:37:43,644-Speed 3304.26 samples/sec   Loss 2.5641   LearningRate 0.0458   Epoch: 6   Global Step: 107920   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-11 10:37:46,719-Speed 3331.21 samples/sec   Loss 2.6556   LearningRate 0.0458   Epoch: 6   Global Step: 107930   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-11 10:37:49,822-Speed 3300.80 samples/sec   Loss 2.6576   LearningRate 0.0458   Epoch: 6   Global Step: 107940   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:37:52,897-Speed 3331.46 samples/sec   Loss 2.5845   LearningRate 0.0458   Epoch: 6   Global Step: 107950   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:37:55,965-Speed 3337.95 samples/sec   Loss 2.5339   LearningRate 0.0458   Epoch: 6   Global Step: 107960   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:37:59,049-Speed 3321.66 samples/sec   Loss 2.5814   LearningRate 0.0458   Epoch: 6   Global Step: 107970   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:38:02,123-Speed 3330.83 samples/sec   Loss 2.5092   LearningRate 0.0458   Epoch: 6   Global Step: 107980   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:38:05,191-Speed 3338.59 samples/sec   Loss 2.5012   LearningRate 0.0458   Epoch: 6   Global Step: 107990   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:38:08,266-Speed 3331.67 samples/sec   Loss 2.5473   LearningRate 0.0458   Epoch: 6   Global Step: 108000   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:38:52,178-[lfw][108000]XNorm: 22.657624
Training: 2022-04-11 10:38:52,179-[lfw][108000]Accuracy-Flip: 0.99783+-0.00289
Training: 2022-04-11 10:38:52,180-[lfw][108000]Accuracy-Highest: 0.99817
Training: 2022-04-11 10:39:43,012-[cfp_fp][108000]XNorm: 21.695675
Training: 2022-04-11 10:39:43,012-[cfp_fp][108000]Accuracy-Flip: 0.98614+-0.00531
Training: 2022-04-11 10:39:43,013-[cfp_fp][108000]Accuracy-Highest: 0.98614
Training: 2022-04-11 10:40:27,076-[agedb_30][108000]XNorm: 22.607145
Training: 2022-04-11 10:40:27,077-[agedb_30][108000]Accuracy-Flip: 0.98217+-0.00803
Training: 2022-04-11 10:40:27,077-[agedb_30][108000]Accuracy-Highest: 0.98250
Training: 2022-04-11 10:40:30,166-Speed 72.16 samples/sec   Loss 2.6189   LearningRate 0.0458   Epoch: 6   Global Step: 108010   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:40:33,226-Speed 3347.29 samples/sec   Loss 2.5937   LearningRate 0.0458   Epoch: 6   Global Step: 108020   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:40:36,302-Speed 3329.28 samples/sec   Loss 2.6096   LearningRate 0.0457   Epoch: 6   Global Step: 108030   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:40:39,354-Speed 3355.77 samples/sec   Loss 2.5388   LearningRate 0.0457   Epoch: 6   Global Step: 108040   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:40:42,440-Speed 3319.25 samples/sec   Loss 2.5944   LearningRate 0.0457   Epoch: 6   Global Step: 108050   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:40:45,547-Speed 3297.30 samples/sec   Loss 2.5861   LearningRate 0.0457   Epoch: 6   Global Step: 108060   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:40:48,604-Speed 3350.74 samples/sec   Loss 2.6161   LearningRate 0.0457   Epoch: 6   Global Step: 108070   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:40:51,672-Speed 3338.82 samples/sec   Loss 2.5994   LearningRate 0.0457   Epoch: 6   Global Step: 108080   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-11 10:40:54,762-Speed 3314.31 samples/sec   Loss 2.5967   LearningRate 0.0457   Epoch: 6   Global Step: 108090   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-11 10:40:57,829-Speed 3339.45 samples/sec   Loss 2.5397   LearningRate 0.0457   Epoch: 6   Global Step: 108100   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-11 10:41:00,935-Speed 3297.46 samples/sec   Loss 2.5952   LearningRate 0.0457   Epoch: 6   Global Step: 108110   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-11 10:41:04,057-Speed 3282.75 samples/sec   Loss 2.5332   LearningRate 0.0457   Epoch: 6   Global Step: 108120   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-11 10:41:07,122-Speed 3341.47 samples/sec   Loss 2.6319   LearningRate 0.0457   Epoch: 6   Global Step: 108130   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-11 10:41:10,190-Speed 3338.50 samples/sec   Loss 2.5702   LearningRate 0.0457   Epoch: 6   Global Step: 108140   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-11 10:41:13,342-Speed 3249.84 samples/sec   Loss 2.6673   LearningRate 0.0457   Epoch: 6   Global Step: 108150   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-11 10:41:16,412-Speed 3336.38 samples/sec   Loss 2.5665   LearningRate 0.0457   Epoch: 6   Global Step: 108160   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-11 10:41:19,475-Speed 3344.16 samples/sec   Loss 2.6153   LearningRate 0.0457   Epoch: 6   Global Step: 108170   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-11 10:41:22,540-Speed 3342.17 samples/sec   Loss 2.6401   LearningRate 0.0457   Epoch: 6   Global Step: 108180   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:41:25,617-Speed 3328.08 samples/sec   Loss 2.6458   LearningRate 0.0457   Epoch: 6   Global Step: 108190   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:41:28,681-Speed 3343.01 samples/sec   Loss 2.4759   LearningRate 0.0457   Epoch: 6   Global Step: 108200   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:41:31,751-Speed 3336.77 samples/sec   Loss 2.6202   LearningRate 0.0457   Epoch: 6   Global Step: 108210   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:41:34,815-Speed 3342.61 samples/sec   Loss 2.6250   LearningRate 0.0457   Epoch: 6   Global Step: 108220   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:41:37,897-Speed 3323.68 samples/sec   Loss 2.6021   LearningRate 0.0457   Epoch: 6   Global Step: 108230   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:41:41,009-Speed 3291.42 samples/sec   Loss 2.5492   LearningRate 0.0457   Epoch: 6   Global Step: 108240   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:41:44,099-Speed 3314.18 samples/sec   Loss 2.6235   LearningRate 0.0457   Epoch: 6   Global Step: 108250   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:41:47,250-Speed 3250.87 samples/sec   Loss 2.5718   LearningRate 0.0457   Epoch: 6   Global Step: 108260   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:41:50,308-Speed 3349.12 samples/sec   Loss 2.6149   LearningRate 0.0457   Epoch: 6   Global Step: 108270   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:41:53,358-Speed 3358.84 samples/sec   Loss 2.5232   LearningRate 0.0456   Epoch: 6   Global Step: 108280   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:41:56,434-Speed 3328.70 samples/sec   Loss 2.6050   LearningRate 0.0456   Epoch: 6   Global Step: 108290   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:41:59,507-Speed 3333.84 samples/sec   Loss 2.6442   LearningRate 0.0456   Epoch: 6   Global Step: 108300   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:42:02,629-Speed 3281.18 samples/sec   Loss 2.5616   LearningRate 0.0456   Epoch: 6   Global Step: 108310   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:42:05,707-Speed 3327.70 samples/sec   Loss 2.5785   LearningRate 0.0456   Epoch: 6   Global Step: 108320   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:42:08,837-Speed 3271.87 samples/sec   Loss 2.6131   LearningRate 0.0456   Epoch: 6   Global Step: 108330   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:42:11,920-Speed 3322.32 samples/sec   Loss 2.5741   LearningRate 0.0456   Epoch: 6   Global Step: 108340   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:42:15,094-Speed 3227.42 samples/sec   Loss 2.6296   LearningRate 0.0456   Epoch: 6   Global Step: 108350   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:42:18,163-Speed 3337.15 samples/sec   Loss 2.6265   LearningRate 0.0456   Epoch: 6   Global Step: 108360   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:42:21,281-Speed 3285.73 samples/sec   Loss 2.6222   LearningRate 0.0456   Epoch: 6   Global Step: 108370   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:42:24,366-Speed 3320.52 samples/sec   Loss 2.6448   LearningRate 0.0456   Epoch: 6   Global Step: 108380   Fp16 Grad Scale: 262144   Required: 24 hours
Training: 2022-04-11 10:42:27,424-Speed 3349.25 samples/sec   Loss 2.5885   LearningRate 0.0456   Epoch: 6   Global Step: 108390   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:42:30,488-Speed 3342.75 samples/sec   Loss 2.5784   LearningRate 0.0456   Epoch: 6   Global Step: 108400   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:42:33,548-Speed 3347.71 samples/sec   Loss 2.5642   LearningRate 0.0456   Epoch: 6   Global Step: 108410   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:42:36,637-Speed 3315.74 samples/sec   Loss 2.6434   LearningRate 0.0456   Epoch: 6   Global Step: 108420   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:42:39,736-Speed 3304.75 samples/sec   Loss 2.5887   LearningRate 0.0456   Epoch: 6   Global Step: 108430   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:42:42,800-Speed 3343.86 samples/sec   Loss 2.5819   LearningRate 0.0456   Epoch: 6   Global Step: 108440   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:42:45,863-Speed 3343.39 samples/sec   Loss 2.6416   LearningRate 0.0456   Epoch: 6   Global Step: 108450   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:42:48,926-Speed 3344.34 samples/sec   Loss 2.5819   LearningRate 0.0456   Epoch: 6   Global Step: 108460   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:42:51,985-Speed 3348.36 samples/sec   Loss 2.6263   LearningRate 0.0456   Epoch: 6   Global Step: 108470   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:42:55,056-Speed 3335.06 samples/sec   Loss 2.5044   LearningRate 0.0456   Epoch: 6   Global Step: 108480   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:42:58,110-Speed 3354.30 samples/sec   Loss 2.5237   LearningRate 0.0456   Epoch: 6   Global Step: 108490   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:43:01,181-Speed 3334.66 samples/sec   Loss 2.6479   LearningRate 0.0456   Epoch: 6   Global Step: 108500   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:43:04,245-Speed 3343.77 samples/sec   Loss 2.6352   LearningRate 0.0456   Epoch: 6   Global Step: 108510   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:43:07,393-Speed 3253.50 samples/sec   Loss 2.5593   LearningRate 0.0456   Epoch: 6   Global Step: 108520   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:43:10,474-Speed 3323.63 samples/sec   Loss 2.5905   LearningRate 0.0455   Epoch: 6   Global Step: 108530   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:43:13,538-Speed 3344.32 samples/sec   Loss 2.5809   LearningRate 0.0455   Epoch: 6   Global Step: 108540   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:43:16,595-Speed 3350.94 samples/sec   Loss 2.6323   LearningRate 0.0455   Epoch: 6   Global Step: 108550   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:43:19,663-Speed 3337.70 samples/sec   Loss 2.6011   LearningRate 0.0455   Epoch: 6   Global Step: 108560   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:43:22,734-Speed 3335.64 samples/sec   Loss 2.5842   LearningRate 0.0455   Epoch: 6   Global Step: 108570   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:43:25,807-Speed 3333.07 samples/sec   Loss 2.5825   LearningRate 0.0455   Epoch: 6   Global Step: 108580   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:43:28,860-Speed 3355.74 samples/sec   Loss 2.5696   LearningRate 0.0455   Epoch: 6   Global Step: 108590   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:43:31,935-Speed 3329.90 samples/sec   Loss 2.6105   LearningRate 0.0455   Epoch: 6   Global Step: 108600   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:43:35,002-Speed 3339.92 samples/sec   Loss 2.5234   LearningRate 0.0455   Epoch: 6   Global Step: 108610   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:43:38,078-Speed 3330.11 samples/sec   Loss 2.5457   LearningRate 0.0455   Epoch: 6   Global Step: 108620   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:43:41,157-Speed 3326.39 samples/sec   Loss 2.5686   LearningRate 0.0455   Epoch: 6   Global Step: 108630   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:43:44,221-Speed 3343.29 samples/sec   Loss 2.5723   LearningRate 0.0455   Epoch: 6   Global Step: 108640   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:43:47,294-Speed 3332.94 samples/sec   Loss 2.6065   LearningRate 0.0455   Epoch: 6   Global Step: 108650   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:43:50,393-Speed 3306.23 samples/sec   Loss 2.5930   LearningRate 0.0455   Epoch: 6   Global Step: 108660   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:43:53,461-Speed 3338.29 samples/sec   Loss 2.5312   LearningRate 0.0455   Epoch: 6   Global Step: 108670   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:43:56,546-Speed 3320.02 samples/sec   Loss 2.5169   LearningRate 0.0455   Epoch: 6   Global Step: 108680   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:43:59,682-Speed 3266.85 samples/sec   Loss 2.5862   LearningRate 0.0455   Epoch: 6   Global Step: 108690   Fp16 Grad Scale: 262144   Required: 24 hours
Training: 2022-04-11 10:44:02,747-Speed 3341.67 samples/sec   Loss 2.5130   LearningRate 0.0455   Epoch: 6   Global Step: 108700   Fp16 Grad Scale: 262144   Required: 24 hours
Training: 2022-04-11 10:44:05,823-Speed 3330.96 samples/sec   Loss 2.5462   LearningRate 0.0455   Epoch: 6   Global Step: 108710   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:44:08,943-Speed 3282.77 samples/sec   Loss 2.6416   LearningRate 0.0455   Epoch: 6   Global Step: 108720   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-11 10:44:12,028-Speed 3319.77 samples/sec   Loss 2.6239   LearningRate 0.0455   Epoch: 6   Global Step: 108730   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-11 10:44:15,090-Speed 3345.75 samples/sec   Loss 2.5305   LearningRate 0.0455   Epoch: 6   Global Step: 108740   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-11 10:44:18,164-Speed 3331.85 samples/sec   Loss 2.6146   LearningRate 0.0455   Epoch: 6   Global Step: 108750   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-11 10:44:21,285-Speed 3281.75 samples/sec   Loss 2.5653   LearningRate 0.0455   Epoch: 6   Global Step: 108760   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-11 10:44:24,345-Speed 3346.91 samples/sec   Loss 2.5767   LearningRate 0.0454   Epoch: 6   Global Step: 108770   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-11 10:44:27,421-Speed 3330.62 samples/sec   Loss 2.6533   LearningRate 0.0454   Epoch: 6   Global Step: 108780   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-11 10:44:30,486-Speed 3340.59 samples/sec   Loss 2.5898   LearningRate 0.0454   Epoch: 6   Global Step: 108790   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-11 10:44:33,573-Speed 3318.62 samples/sec   Loss 2.5987   LearningRate 0.0454   Epoch: 6   Global Step: 108800   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-11 10:44:36,727-Speed 3247.77 samples/sec   Loss 2.6386   LearningRate 0.0454   Epoch: 6   Global Step: 108810   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-11 10:44:39,801-Speed 3332.49 samples/sec   Loss 2.5535   LearningRate 0.0454   Epoch: 6   Global Step: 108820   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:44:42,885-Speed 3321.29 samples/sec   Loss 2.5478   LearningRate 0.0454   Epoch: 6   Global Step: 108830   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:44:45,957-Speed 3335.21 samples/sec   Loss 2.5888   LearningRate 0.0454   Epoch: 6   Global Step: 108840   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:44:49,022-Speed 3340.99 samples/sec   Loss 2.5357   LearningRate 0.0454   Epoch: 6   Global Step: 108850   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:44:52,098-Speed 3330.06 samples/sec   Loss 2.5837   LearningRate 0.0454   Epoch: 6   Global Step: 108860   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:44:55,206-Speed 3295.89 samples/sec   Loss 2.6105   LearningRate 0.0454   Epoch: 6   Global Step: 108870   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:44:58,286-Speed 3326.69 samples/sec   Loss 2.5751   LearningRate 0.0454   Epoch: 6   Global Step: 108880   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:45:01,367-Speed 3324.33 samples/sec   Loss 2.5760   LearningRate 0.0454   Epoch: 6   Global Step: 108890   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:45:04,427-Speed 3347.02 samples/sec   Loss 2.5316   LearningRate 0.0454   Epoch: 6   Global Step: 108900   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:45:07,578-Speed 3250.14 samples/sec   Loss 2.5808   LearningRate 0.0454   Epoch: 6   Global Step: 108910   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:45:10,700-Speed 3282.98 samples/sec   Loss 2.6493   LearningRate 0.0454   Epoch: 6   Global Step: 108920   Fp16 Grad Scale: 262144   Required: 24 hours
Training: 2022-04-11 10:45:13,762-Speed 3344.27 samples/sec   Loss 2.5820   LearningRate 0.0454   Epoch: 6   Global Step: 108930   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-11 10:45:16,884-Speed 3281.86 samples/sec   Loss 2.6323   LearningRate 0.0454   Epoch: 6   Global Step: 108940   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-11 10:45:19,958-Speed 3331.85 samples/sec   Loss 2.5589   LearningRate 0.0454   Epoch: 6   Global Step: 108950   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-11 10:45:23,021-Speed 3344.11 samples/sec   Loss 2.6116   LearningRate 0.0454   Epoch: 6   Global Step: 108960   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-11 10:45:26,135-Speed 3288.73 samples/sec   Loss 2.5664   LearningRate 0.0454   Epoch: 6   Global Step: 108970   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-11 10:45:29,199-Speed 3342.24 samples/sec   Loss 2.5381   LearningRate 0.0454   Epoch: 6   Global Step: 108980   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-11 10:45:32,271-Speed 3334.94 samples/sec   Loss 2.5903   LearningRate 0.0454   Epoch: 6   Global Step: 108990   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-11 10:45:35,342-Speed 3334.60 samples/sec   Loss 2.5965   LearningRate 0.0454   Epoch: 6   Global Step: 109000   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-11 10:45:38,444-Speed 3302.20 samples/sec   Loss 2.5848   LearningRate 0.0454   Epoch: 6   Global Step: 109010   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-11 10:45:41,582-Speed 3263.68 samples/sec   Loss 2.5763   LearningRate 0.0453   Epoch: 6   Global Step: 109020   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-11 10:45:44,685-Speed 3301.29 samples/sec   Loss 2.5921   LearningRate 0.0453   Epoch: 6   Global Step: 109030   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:45:47,758-Speed 3333.15 samples/sec   Loss 2.6599   LearningRate 0.0453   Epoch: 6   Global Step: 109040   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:45:50,843-Speed 3319.85 samples/sec   Loss 2.6381   LearningRate 0.0453   Epoch: 6   Global Step: 109050   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:45:53,921-Speed 3328.14 samples/sec   Loss 2.5958   LearningRate 0.0453   Epoch: 6   Global Step: 109060   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:45:56,997-Speed 3329.83 samples/sec   Loss 2.5776   LearningRate 0.0453   Epoch: 6   Global Step: 109070   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:46:00,077-Speed 3325.54 samples/sec   Loss 2.6438   LearningRate 0.0453   Epoch: 6   Global Step: 109080   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:46:03,152-Speed 3330.62 samples/sec   Loss 2.5538   LearningRate 0.0453   Epoch: 6   Global Step: 109090   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:46:06,229-Speed 3329.23 samples/sec   Loss 2.5422   LearningRate 0.0453   Epoch: 6   Global Step: 109100   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:46:09,303-Speed 3331.59 samples/sec   Loss 2.5880   LearningRate 0.0453   Epoch: 6   Global Step: 109110   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:46:12,383-Speed 3325.38 samples/sec   Loss 2.6207   LearningRate 0.0453   Epoch: 6   Global Step: 109120   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:46:15,461-Speed 3328.67 samples/sec   Loss 2.5892   LearningRate 0.0453   Epoch: 6   Global Step: 109130   Fp16 Grad Scale: 262144   Required: 24 hours
Training: 2022-04-11 10:46:18,512-Speed 3356.10 samples/sec   Loss 2.4855   LearningRate 0.0453   Epoch: 6   Global Step: 109140   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:46:21,606-Speed 3311.14 samples/sec   Loss 2.5293   LearningRate 0.0453   Epoch: 6   Global Step: 109150   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:46:24,682-Speed 3329.45 samples/sec   Loss 2.5844   LearningRate 0.0453   Epoch: 6   Global Step: 109160   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:46:27,769-Speed 3317.39 samples/sec   Loss 2.5475   LearningRate 0.0453   Epoch: 6   Global Step: 109170   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:46:30,831-Speed 3346.11 samples/sec   Loss 2.6314   LearningRate 0.0453   Epoch: 6   Global Step: 109180   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:46:33,890-Speed 3348.27 samples/sec   Loss 2.6013   LearningRate 0.0453   Epoch: 6   Global Step: 109190   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:46:36,969-Speed 3326.82 samples/sec   Loss 2.5894   LearningRate 0.0453   Epoch: 6   Global Step: 109200   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:46:40,032-Speed 3343.11 samples/sec   Loss 2.5398   LearningRate 0.0453   Epoch: 6   Global Step: 109210   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:46:43,101-Speed 3337.99 samples/sec   Loss 2.5679   LearningRate 0.0453   Epoch: 6   Global Step: 109220   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:46:46,163-Speed 3345.38 samples/sec   Loss 2.6106   LearningRate 0.0453   Epoch: 6   Global Step: 109230   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:46:49,216-Speed 3354.79 samples/sec   Loss 2.6444   LearningRate 0.0453   Epoch: 6   Global Step: 109240   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:46:52,278-Speed 3346.84 samples/sec   Loss 2.6340   LearningRate 0.0453   Epoch: 6   Global Step: 109250   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:46:55,352-Speed 3331.57 samples/sec   Loss 2.6612   LearningRate 0.0453   Epoch: 6   Global Step: 109260   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:46:58,414-Speed 3345.78 samples/sec   Loss 2.6606   LearningRate 0.0452   Epoch: 6   Global Step: 109270   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:47:01,481-Speed 3339.23 samples/sec   Loss 2.5703   LearningRate 0.0452   Epoch: 6   Global Step: 109280   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:47:04,542-Speed 3347.11 samples/sec   Loss 2.6480   LearningRate 0.0452   Epoch: 6   Global Step: 109290   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:47:07,679-Speed 3265.22 samples/sec   Loss 2.5939   LearningRate 0.0452   Epoch: 6   Global Step: 109300   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:47:10,755-Speed 3329.55 samples/sec   Loss 2.5691   LearningRate 0.0452   Epoch: 6   Global Step: 109310   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:47:13,840-Speed 3320.63 samples/sec   Loss 2.6022   LearningRate 0.0452   Epoch: 6   Global Step: 109320   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:47:16,945-Speed 3299.23 samples/sec   Loss 2.6176   LearningRate 0.0452   Epoch: 6   Global Step: 109330   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:47:19,990-Speed 3363.56 samples/sec   Loss 2.5702   LearningRate 0.0452   Epoch: 6   Global Step: 109340   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:47:23,091-Speed 3302.70 samples/sec   Loss 2.6737   LearningRate 0.0452   Epoch: 6   Global Step: 109350   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:47:26,157-Speed 3340.96 samples/sec   Loss 2.6017   LearningRate 0.0452   Epoch: 6   Global Step: 109360   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-11 10:47:29,226-Speed 3337.55 samples/sec   Loss 2.6472   LearningRate 0.0452   Epoch: 6   Global Step: 109370   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-11 10:47:32,309-Speed 3321.58 samples/sec   Loss 2.5723   LearningRate 0.0452   Epoch: 6   Global Step: 109380   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-11 10:47:35,370-Speed 3346.87 samples/sec   Loss 2.6671   LearningRate 0.0452   Epoch: 6   Global Step: 109390   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-11 10:47:38,441-Speed 3335.34 samples/sec   Loss 2.5665   LearningRate 0.0452   Epoch: 6   Global Step: 109400   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-11 10:47:41,561-Speed 3282.73 samples/sec   Loss 2.5428   LearningRate 0.0452   Epoch: 6   Global Step: 109410   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-11 10:47:44,656-Speed 3309.37 samples/sec   Loss 2.5348   LearningRate 0.0452   Epoch: 6   Global Step: 109420   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-11 10:47:47,716-Speed 3346.83 samples/sec   Loss 2.5875   LearningRate 0.0452   Epoch: 6   Global Step: 109430   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-11 10:47:50,800-Speed 3321.29 samples/sec   Loss 2.5396   LearningRate 0.0452   Epoch: 6   Global Step: 109440   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-11 10:47:53,907-Speed 3296.68 samples/sec   Loss 2.6028   LearningRate 0.0452   Epoch: 6   Global Step: 109450   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-11 10:47:56,972-Speed 3342.30 samples/sec   Loss 2.5860   LearningRate 0.0452   Epoch: 6   Global Step: 109460   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:48:00,070-Speed 3305.99 samples/sec   Loss 2.5527   LearningRate 0.0452   Epoch: 6   Global Step: 109470   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:48:03,153-Speed 3322.31 samples/sec   Loss 2.6409   LearningRate 0.0452   Epoch: 6   Global Step: 109480   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:48:06,247-Speed 3310.25 samples/sec   Loss 2.5900   LearningRate 0.0452   Epoch: 6   Global Step: 109490   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:48:09,341-Speed 3310.86 samples/sec   Loss 2.5405   LearningRate 0.0452   Epoch: 6   Global Step: 109500   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:48:12,424-Speed 3321.79 samples/sec   Loss 2.6201   LearningRate 0.0452   Epoch: 6   Global Step: 109510   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:48:15,496-Speed 3335.44 samples/sec   Loss 2.5771   LearningRate 0.0451   Epoch: 6   Global Step: 109520   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:48:18,573-Speed 3328.30 samples/sec   Loss 2.6087   LearningRate 0.0451   Epoch: 6   Global Step: 109530   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:48:21,656-Speed 3322.17 samples/sec   Loss 2.6243   LearningRate 0.0451   Epoch: 6   Global Step: 109540   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:48:24,816-Speed 3240.88 samples/sec   Loss 2.6137   LearningRate 0.0451   Epoch: 6   Global Step: 109550   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:48:27,949-Speed 3269.56 samples/sec   Loss 2.6197   LearningRate 0.0451   Epoch: 6   Global Step: 109560   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:48:31,023-Speed 3332.59 samples/sec   Loss 2.5212   LearningRate 0.0451   Epoch: 6   Global Step: 109570   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:48:34,150-Speed 3275.53 samples/sec   Loss 2.4898   LearningRate 0.0451   Epoch: 6   Global Step: 109580   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:48:37,260-Speed 3292.91 samples/sec   Loss 2.6055   LearningRate 0.0451   Epoch: 6   Global Step: 109590   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:48:40,329-Speed 3337.33 samples/sec   Loss 2.6371   LearningRate 0.0451   Epoch: 6   Global Step: 109600   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:48:43,411-Speed 3323.92 samples/sec   Loss 2.6039   LearningRate 0.0451   Epoch: 6   Global Step: 109610   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:48:46,492-Speed 3324.63 samples/sec   Loss 2.5707   LearningRate 0.0451   Epoch: 6   Global Step: 109620   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:48:49,618-Speed 3275.86 samples/sec   Loss 2.5464   LearningRate 0.0451   Epoch: 6   Global Step: 109630   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:48:52,695-Speed 3328.91 samples/sec   Loss 2.6157   LearningRate 0.0451   Epoch: 6   Global Step: 109640   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:48:55,860-Speed 3236.81 samples/sec   Loss 2.5998   LearningRate 0.0451   Epoch: 6   Global Step: 109650   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:48:58,975-Speed 3287.77 samples/sec   Loss 2.6908   LearningRate 0.0451   Epoch: 6   Global Step: 109660   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:49:02,040-Speed 3341.52 samples/sec   Loss 2.5798   LearningRate 0.0451   Epoch: 6   Global Step: 109670   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:49:05,134-Speed 3310.41 samples/sec   Loss 2.5250   LearningRate 0.0451   Epoch: 6   Global Step: 109680   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:49:08,198-Speed 3343.21 samples/sec   Loss 2.6062   LearningRate 0.0451   Epoch: 6   Global Step: 109690   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:49:11,262-Speed 3343.68 samples/sec   Loss 2.5119   LearningRate 0.0451   Epoch: 6   Global Step: 109700   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:49:14,340-Speed 3326.77 samples/sec   Loss 2.5645   LearningRate 0.0451   Epoch: 6   Global Step: 109710   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:49:17,408-Speed 3339.74 samples/sec   Loss 2.6414   LearningRate 0.0451   Epoch: 6   Global Step: 109720   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:49:20,476-Speed 3338.73 samples/sec   Loss 2.5607   LearningRate 0.0451   Epoch: 6   Global Step: 109730   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:49:23,541-Speed 3341.47 samples/sec   Loss 2.6099   LearningRate 0.0451   Epoch: 6   Global Step: 109740   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-11 10:49:26,622-Speed 3324.10 samples/sec   Loss 2.5762   LearningRate 0.0451   Epoch: 6   Global Step: 109750   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-11 10:49:29,728-Speed 3298.28 samples/sec   Loss 2.5854   LearningRate 0.0451   Epoch: 6   Global Step: 109760   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-11 10:49:32,800-Speed 3334.05 samples/sec   Loss 2.5560   LearningRate 0.0450   Epoch: 6   Global Step: 109770   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-11 10:49:35,864-Speed 3342.23 samples/sec   Loss 2.5607   LearningRate 0.0450   Epoch: 6   Global Step: 109780   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-11 10:49:38,940-Speed 3329.57 samples/sec   Loss 2.6125   LearningRate 0.0450   Epoch: 6   Global Step: 109790   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-11 10:49:42,006-Speed 3340.79 samples/sec   Loss 2.6105   LearningRate 0.0450   Epoch: 6   Global Step: 109800   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-11 10:49:45,075-Speed 3338.24 samples/sec   Loss 2.6012   LearningRate 0.0450   Epoch: 6   Global Step: 109810   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-11 10:49:48,161-Speed 3319.10 samples/sec   Loss 2.5307   LearningRate 0.0450   Epoch: 6   Global Step: 109820   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-11 10:49:51,240-Speed 3326.81 samples/sec   Loss 2.5242   LearningRate 0.0450   Epoch: 6   Global Step: 109830   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-11 10:49:54,311-Speed 3335.02 samples/sec   Loss 2.5467   LearningRate 0.0450   Epoch: 6   Global Step: 109840   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:49:57,381-Speed 3336.17 samples/sec   Loss 2.5857   LearningRate 0.0450   Epoch: 6   Global Step: 109850   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:50:00,460-Speed 3326.35 samples/sec   Loss 2.5460   LearningRate 0.0450   Epoch: 6   Global Step: 109860   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:50:03,527-Speed 3339.77 samples/sec   Loss 2.5870   LearningRate 0.0450   Epoch: 6   Global Step: 109870   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:50:06,594-Speed 3339.40 samples/sec   Loss 2.6017   LearningRate 0.0450   Epoch: 6   Global Step: 109880   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:50:09,704-Speed 3294.19 samples/sec   Loss 2.4913   LearningRate 0.0450   Epoch: 6   Global Step: 109890   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:50:12,766-Speed 3345.16 samples/sec   Loss 2.5391   LearningRate 0.0450   Epoch: 6   Global Step: 109900   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:50:15,846-Speed 3325.52 samples/sec   Loss 2.6053   LearningRate 0.0450   Epoch: 6   Global Step: 109910   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:50:18,913-Speed 3339.64 samples/sec   Loss 2.5483   LearningRate 0.0450   Epoch: 6   Global Step: 109920   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:50:21,988-Speed 3330.42 samples/sec   Loss 2.5367   LearningRate 0.0450   Epoch: 6   Global Step: 109930   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:50:25,055-Speed 3339.48 samples/sec   Loss 2.5564   LearningRate 0.0450   Epoch: 6   Global Step: 109940   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:50:28,120-Speed 3342.30 samples/sec   Loss 2.5833   LearningRate 0.0450   Epoch: 6   Global Step: 109950   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:50:31,194-Speed 3332.52 samples/sec   Loss 2.5956   LearningRate 0.0450   Epoch: 6   Global Step: 109960   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:50:34,257-Speed 3343.37 samples/sec   Loss 2.5513   LearningRate 0.0450   Epoch: 6   Global Step: 109970   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:50:37,336-Speed 3327.06 samples/sec   Loss 2.6235   LearningRate 0.0450   Epoch: 6   Global Step: 109980   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:50:40,424-Speed 3317.15 samples/sec   Loss 2.6605   LearningRate 0.0450   Epoch: 6   Global Step: 109990   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:50:43,506-Speed 3322.80 samples/sec   Loss 2.6382   LearningRate 0.0450   Epoch: 6   Global Step: 110000   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:51:28,214-[lfw][110000]XNorm: 21.490594
Training: 2022-04-11 10:51:28,214-[lfw][110000]Accuracy-Flip: 0.99800+-0.00296
Training: 2022-04-11 10:51:28,215-[lfw][110000]Accuracy-Highest: 0.99817
Training: 2022-04-11 10:52:19,693-[cfp_fp][110000]XNorm: 20.363431
Training: 2022-04-11 10:52:19,694-[cfp_fp][110000]Accuracy-Flip: 0.98443+-0.00624
Training: 2022-04-11 10:52:19,694-[cfp_fp][110000]Accuracy-Highest: 0.98614
Training: 2022-04-11 10:53:03,922-[agedb_30][110000]XNorm: 21.997844
Training: 2022-04-11 10:53:03,922-[agedb_30][110000]Accuracy-Flip: 0.98183+-0.00709
Training: 2022-04-11 10:53:03,923-[agedb_30][110000]Accuracy-Highest: 0.98250
Training: 2022-04-11 10:53:06,989-Speed 71.37 samples/sec   Loss 2.5894   LearningRate 0.0450   Epoch: 6   Global Step: 110010   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:53:10,110-Speed 3282.46 samples/sec   Loss 2.5905   LearningRate 0.0449   Epoch: 6   Global Step: 110020   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:53:13,163-Speed 3354.47 samples/sec   Loss 2.6254   LearningRate 0.0449   Epoch: 6   Global Step: 110030   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:53:16,242-Speed 3326.76 samples/sec   Loss 2.6100   LearningRate 0.0449   Epoch: 6   Global Step: 110040   Fp16 Grad Scale: 262144   Required: 24 hours
Training: 2022-04-11 10:53:19,363-Speed 3281.85 samples/sec   Loss 2.5666   LearningRate 0.0449   Epoch: 6   Global Step: 110050   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:53:22,435-Speed 3333.87 samples/sec   Loss 2.6044   LearningRate 0.0449   Epoch: 6   Global Step: 110060   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:53:25,570-Speed 3267.14 samples/sec   Loss 2.5658   LearningRate 0.0449   Epoch: 6   Global Step: 110070   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:53:28,680-Speed 3294.28 samples/sec   Loss 2.5750   LearningRate 0.0449   Epoch: 6   Global Step: 110080   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:53:31,733-Speed 3355.01 samples/sec   Loss 2.6272   LearningRate 0.0449   Epoch: 6   Global Step: 110090   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-11 10:53:34,790-Speed 3350.38 samples/sec   Loss 2.6222   LearningRate 0.0449   Epoch: 6   Global Step: 110100   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-11 10:53:37,898-Speed 3295.54 samples/sec   Loss 2.5956   LearningRate 0.0449   Epoch: 6   Global Step: 110110   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-11 10:53:40,998-Speed 3303.48 samples/sec   Loss 2.5959   LearningRate 0.0449   Epoch: 6   Global Step: 110120   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-11 10:53:44,081-Speed 3322.89 samples/sec   Loss 2.6438   LearningRate 0.0449   Epoch: 6   Global Step: 110130   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-11 10:53:47,202-Speed 3281.93 samples/sec   Loss 2.5962   LearningRate 0.0449   Epoch: 6   Global Step: 110140   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-11 10:53:50,271-Speed 3337.58 samples/sec   Loss 2.5184   LearningRate 0.0449   Epoch: 6   Global Step: 110150   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-11 10:53:53,368-Speed 3307.16 samples/sec   Loss 2.5667   LearningRate 0.0449   Epoch: 6   Global Step: 110160   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-11 10:53:56,456-Speed 3317.18 samples/sec   Loss 2.5353   LearningRate 0.0449   Epoch: 6   Global Step: 110170   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-11 10:53:59,529-Speed 3332.81 samples/sec   Loss 2.5487   LearningRate 0.0449   Epoch: 6   Global Step: 110180   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-11 10:54:02,586-Speed 3350.10 samples/sec   Loss 2.5803   LearningRate 0.0449   Epoch: 6   Global Step: 110190   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:54:05,664-Speed 3328.32 samples/sec   Loss 2.5471   LearningRate 0.0449   Epoch: 6   Global Step: 110200   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:54:08,770-Speed 3297.44 samples/sec   Loss 2.6268   LearningRate 0.0449   Epoch: 6   Global Step: 110210   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:54:11,830-Speed 3347.36 samples/sec   Loss 2.5669   LearningRate 0.0449   Epoch: 6   Global Step: 110220   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:54:14,889-Speed 3347.70 samples/sec   Loss 2.5837   LearningRate 0.0449   Epoch: 6   Global Step: 110230   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:54:17,952-Speed 3344.91 samples/sec   Loss 2.5605   LearningRate 0.0449   Epoch: 6   Global Step: 110240   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:54:21,030-Speed 3327.18 samples/sec   Loss 2.5898   LearningRate 0.0449   Epoch: 6   Global Step: 110250   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:54:24,075-Speed 3363.95 samples/sec   Loss 2.5773   LearningRate 0.0449   Epoch: 6   Global Step: 110260   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-11 10:54:27,167-Speed 3311.85 samples/sec   Loss 2.5748   LearningRate 0.0448   Epoch: 6   Global Step: 110270   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-11 10:54:30,243-Speed 3330.64 samples/sec   Loss 2.5550   LearningRate 0.0448   Epoch: 6   Global Step: 110280   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-11 10:54:33,298-Speed 3352.55 samples/sec   Loss 2.6160   LearningRate 0.0448   Epoch: 6   Global Step: 110290   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-11 10:54:36,361-Speed 3344.02 samples/sec   Loss 2.5694   LearningRate 0.0448   Epoch: 6   Global Step: 110300   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-11 10:54:39,420-Speed 3347.80 samples/sec   Loss 2.6415   LearningRate 0.0448   Epoch: 6   Global Step: 110310   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-11 10:54:42,481-Speed 3345.88 samples/sec   Loss 2.5636   LearningRate 0.0448   Epoch: 6   Global Step: 110320   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-11 10:54:45,543-Speed 3344.80 samples/sec   Loss 2.6405   LearningRate 0.0448   Epoch: 6   Global Step: 110330   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-11 10:54:48,614-Speed 3335.80 samples/sec   Loss 2.5907   LearningRate 0.0448   Epoch: 6   Global Step: 110340   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-11 10:54:51,677-Speed 3343.72 samples/sec   Loss 2.5752   LearningRate 0.0448   Epoch: 6   Global Step: 110350   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-11 10:54:54,739-Speed 3345.48 samples/sec   Loss 2.5530   LearningRate 0.0448   Epoch: 6   Global Step: 110360   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:54:57,799-Speed 3346.66 samples/sec   Loss 2.5390   LearningRate 0.0448   Epoch: 6   Global Step: 110370   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:55:00,858-Speed 3348.90 samples/sec   Loss 2.6664   LearningRate 0.0448   Epoch: 6   Global Step: 110380   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:55:03,919-Speed 3345.43 samples/sec   Loss 2.6183   LearningRate 0.0448   Epoch: 6   Global Step: 110390   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:55:06,994-Speed 3331.07 samples/sec   Loss 2.6205   LearningRate 0.0448   Epoch: 6   Global Step: 110400   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:55:10,079-Speed 3319.84 samples/sec   Loss 2.6010   LearningRate 0.0448   Epoch: 6   Global Step: 110410   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:55:13,144-Speed 3342.36 samples/sec   Loss 2.6236   LearningRate 0.0448   Epoch: 6   Global Step: 110420   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:55:16,200-Speed 3351.24 samples/sec   Loss 2.6240   LearningRate 0.0448   Epoch: 6   Global Step: 110430   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:55:19,283-Speed 3322.74 samples/sec   Loss 2.6335   LearningRate 0.0448   Epoch: 6   Global Step: 110440   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:55:22,345-Speed 3344.97 samples/sec   Loss 2.6729   LearningRate 0.0448   Epoch: 6   Global Step: 110450   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:55:25,390-Speed 3363.50 samples/sec   Loss 2.6101   LearningRate 0.0448   Epoch: 6   Global Step: 110460   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:55:28,450-Speed 3346.70 samples/sec   Loss 2.4710   LearningRate 0.0448   Epoch: 6   Global Step: 110470   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:55:31,493-Speed 3365.86 samples/sec   Loss 2.4312   LearningRate 0.0448   Epoch: 6   Global Step: 110480   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-11 10:55:34,560-Speed 3340.03 samples/sec   Loss 2.6760   LearningRate 0.0448   Epoch: 6   Global Step: 110490   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-11 10:55:37,656-Speed 3308.42 samples/sec   Loss 2.6233   LearningRate 0.0448   Epoch: 6   Global Step: 110500   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-11 10:55:40,782-Speed 3276.69 samples/sec   Loss 2.6117   LearningRate 0.0447   Epoch: 6   Global Step: 110510   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-11 10:55:43,885-Speed 3300.57 samples/sec   Loss 2.5311   LearningRate 0.0447   Epoch: 6   Global Step: 110520   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-11 10:55:46,971-Speed 3318.88 samples/sec   Loss 2.5113   LearningRate 0.0447   Epoch: 6   Global Step: 110530   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-11 10:55:50,065-Speed 3311.02 samples/sec   Loss 2.5710   LearningRate 0.0447   Epoch: 6   Global Step: 110540   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-11 10:55:53,136-Speed 3334.49 samples/sec   Loss 2.5786   LearningRate 0.0447   Epoch: 6   Global Step: 110550   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-11 10:55:56,198-Speed 3345.82 samples/sec   Loss 2.6389   LearningRate 0.0447   Epoch: 6   Global Step: 110560   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-11 10:55:59,314-Speed 3287.27 samples/sec   Loss 2.5904   LearningRate 0.0447   Epoch: 6   Global Step: 110570   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-11 10:56:02,372-Speed 3348.83 samples/sec   Loss 2.5656   LearningRate 0.0447   Epoch: 6   Global Step: 110580   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-11 10:56:05,439-Speed 3340.06 samples/sec   Loss 2.5300   LearningRate 0.0447   Epoch: 6   Global Step: 110590   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-11 10:56:08,550-Speed 3292.01 samples/sec   Loss 2.6131   LearningRate 0.0447   Epoch: 6   Global Step: 110600   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-11 10:56:11,672-Speed 3280.79 samples/sec   Loss 2.5575   LearningRate 0.0447   Epoch: 6   Global Step: 110610   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-11 10:56:14,796-Speed 3278.79 samples/sec   Loss 2.5854   LearningRate 0.0447   Epoch: 6   Global Step: 110620   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-11 10:56:17,866-Speed 3337.19 samples/sec   Loss 2.6373   LearningRate 0.0447   Epoch: 6   Global Step: 110630   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-11 10:56:20,958-Speed 3312.46 samples/sec   Loss 2.6145   LearningRate 0.0447   Epoch: 6   Global Step: 110640   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-11 10:56:24,051-Speed 3311.18 samples/sec   Loss 2.5603   LearningRate 0.0447   Epoch: 6   Global Step: 110650   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-11 10:56:27,110-Speed 3348.83 samples/sec   Loss 2.5547   LearningRate 0.0447   Epoch: 6   Global Step: 110660   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-11 10:56:30,169-Speed 3348.09 samples/sec   Loss 2.5627   LearningRate 0.0447   Epoch: 6   Global Step: 110670   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-11 10:56:33,241-Speed 3334.23 samples/sec   Loss 2.5131   LearningRate 0.0447   Epoch: 6   Global Step: 110680   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:56:36,328-Speed 3317.79 samples/sec   Loss 2.5689   LearningRate 0.0447   Epoch: 6   Global Step: 110690   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:56:39,395-Speed 3339.65 samples/sec   Loss 2.5562   LearningRate 0.0447   Epoch: 6   Global Step: 110700   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:56:42,490-Speed 3309.46 samples/sec   Loss 2.4550   LearningRate 0.0447   Epoch: 6   Global Step: 110710   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:56:45,565-Speed 3331.16 samples/sec   Loss 2.5024   LearningRate 0.0447   Epoch: 6   Global Step: 110720   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:56:48,633-Speed 3337.79 samples/sec   Loss 2.5981   LearningRate 0.0447   Epoch: 6   Global Step: 110730   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:56:51,696-Speed 3344.92 samples/sec   Loss 2.6624   LearningRate 0.0447   Epoch: 6   Global Step: 110740   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:56:54,758-Speed 3344.84 samples/sec   Loss 2.5896   LearningRate 0.0447   Epoch: 6   Global Step: 110750   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:56:57,870-Speed 3291.51 samples/sec   Loss 2.6149   LearningRate 0.0446   Epoch: 6   Global Step: 110760   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:57:00,950-Speed 3326.17 samples/sec   Loss 2.6008   LearningRate 0.0446   Epoch: 6   Global Step: 110770   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:57:04,017-Speed 3338.63 samples/sec   Loss 2.5469   LearningRate 0.0446   Epoch: 6   Global Step: 110780   Fp16 Grad Scale: 262144   Required: 24 hours
Training: 2022-04-11 10:57:07,076-Speed 3349.03 samples/sec   Loss 2.6140   LearningRate 0.0446   Epoch: 6   Global Step: 110790   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:57:10,180-Speed 3299.67 samples/sec   Loss 2.4931   LearningRate 0.0446   Epoch: 6   Global Step: 110800   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:57:13,240-Speed 3346.84 samples/sec   Loss 2.6222   LearningRate 0.0446   Epoch: 6   Global Step: 110810   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:57:16,307-Speed 3340.70 samples/sec   Loss 2.5930   LearningRate 0.0446   Epoch: 6   Global Step: 110820   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:57:19,382-Speed 3330.48 samples/sec   Loss 2.5366   LearningRate 0.0446   Epoch: 6   Global Step: 110830   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-11 10:57:22,480-Speed 3305.78 samples/sec   Loss 2.6086   LearningRate 0.0446   Epoch: 6   Global Step: 110840   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-11 10:57:25,569-Speed 3316.75 samples/sec   Loss 2.5006   LearningRate 0.0446   Epoch: 6   Global Step: 110850   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-11 10:57:28,634-Speed 3341.64 samples/sec   Loss 2.6147   LearningRate 0.0446   Epoch: 6   Global Step: 110860   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-11 10:57:31,732-Speed 3305.57 samples/sec   Loss 2.5601   LearningRate 0.0446   Epoch: 6   Global Step: 110870   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-11 10:57:34,796-Speed 3343.92 samples/sec   Loss 2.5920   LearningRate 0.0446   Epoch: 6   Global Step: 110880   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-11 10:57:37,868-Speed 3334.50 samples/sec   Loss 2.6303   LearningRate 0.0446   Epoch: 6   Global Step: 110890   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-11 10:57:40,927-Speed 3347.69 samples/sec   Loss 2.6082   LearningRate 0.0446   Epoch: 6   Global Step: 110900   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-11 10:57:43,987-Speed 3347.25 samples/sec   Loss 2.6585   LearningRate 0.0446   Epoch: 6   Global Step: 110910   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-11 10:57:47,060-Speed 3333.31 samples/sec   Loss 2.5697   LearningRate 0.0446   Epoch: 6   Global Step: 110920   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-11 10:57:50,122-Speed 3345.63 samples/sec   Loss 2.5391   LearningRate 0.0446   Epoch: 6   Global Step: 110930   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:57:53,178-Speed 3352.03 samples/sec   Loss 2.6145   LearningRate 0.0446   Epoch: 6   Global Step: 110940   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-11 10:57:56,242-Speed 3342.15 samples/sec   Loss 2.5242   LearningRate 0.0446   Epoch: 6   Global Step: 110950   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-11 10:57:59,326-Speed 3321.98 samples/sec   Loss 2.5509   LearningRate 0.0446   Epoch: 6   Global Step: 110960   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-11 10:58:02,391-Speed 3341.70 samples/sec   Loss 2.5750   LearningRate 0.0446   Epoch: 6   Global Step: 110970   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-11 10:58:05,452-Speed 3345.73 samples/sec   Loss 2.6479   LearningRate 0.0446   Epoch: 6   Global Step: 110980   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-11 10:58:08,543-Speed 3313.65 samples/sec   Loss 2.5165   LearningRate 0.0446   Epoch: 6   Global Step: 110990   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-11 10:58:11,603-Speed 3347.79 samples/sec   Loss 2.6050   LearningRate 0.0446   Epoch: 6   Global Step: 111000   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-11 10:58:14,666-Speed 3344.40 samples/sec   Loss 2.5815   LearningRate 0.0445   Epoch: 6   Global Step: 111010   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-11 10:58:17,746-Speed 3324.52 samples/sec   Loss 2.5547   LearningRate 0.0445   Epoch: 6   Global Step: 111020   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-11 10:58:20,803-Speed 3351.17 samples/sec   Loss 2.6370   LearningRate 0.0445   Epoch: 6   Global Step: 111030   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-11 10:58:23,882-Speed 3326.83 samples/sec   Loss 2.6672   LearningRate 0.0445   Epoch: 6   Global Step: 111040   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:58:26,963-Speed 3324.99 samples/sec   Loss 2.5590   LearningRate 0.0445   Epoch: 6   Global Step: 111050   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-11 10:58:30,014-Speed 3357.35 samples/sec   Loss 2.5476   LearningRate 0.0445   Epoch: 6   Global Step: 111060   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-11 10:58:33,075-Speed 3345.88 samples/sec   Loss 2.5820   LearningRate 0.0445   Epoch: 6   Global Step: 111070   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-11 10:58:36,196-Speed 3281.80 samples/sec   Loss 2.6082   LearningRate 0.0445   Epoch: 6   Global Step: 111080   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-11 10:58:39,256-Speed 3347.27 samples/sec   Loss 2.6036   LearningRate 0.0445   Epoch: 6   Global Step: 111090   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-11 10:58:42,369-Speed 3290.15 samples/sec   Loss 2.6463   LearningRate 0.0445   Epoch: 6   Global Step: 111100   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-11 10:58:45,508-Speed 3263.08 samples/sec   Loss 2.5698   LearningRate 0.0445   Epoch: 6   Global Step: 111110   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-11 10:58:48,572-Speed 3342.86 samples/sec   Loss 2.6588   LearningRate 0.0445   Epoch: 6   Global Step: 111120   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-11 10:58:51,641-Speed 3338.58 samples/sec   Loss 2.6052   LearningRate 0.0445   Epoch: 6   Global Step: 111130   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-11 10:58:54,736-Speed 3309.20 samples/sec   Loss 2.5383   LearningRate 0.0445   Epoch: 6   Global Step: 111140   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-11 10:58:57,803-Speed 3340.29 samples/sec   Loss 2.5351   LearningRate 0.0445   Epoch: 6   Global Step: 111150   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-11 10:59:00,866-Speed 3343.89 samples/sec   Loss 2.5993   LearningRate 0.0445   Epoch: 6   Global Step: 111160   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 10:59:03,952-Speed 3319.63 samples/sec   Loss 2.5605   LearningRate 0.0445   Epoch: 6   Global Step: 111170   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 10:59:07,011-Speed 3347.41 samples/sec   Loss 2.4873   LearningRate 0.0445   Epoch: 6   Global Step: 111180   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 10:59:10,081-Speed 3336.36 samples/sec   Loss 2.5715   LearningRate 0.0445   Epoch: 6   Global Step: 111190   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 10:59:13,172-Speed 3314.03 samples/sec   Loss 2.6054   LearningRate 0.0445   Epoch: 6   Global Step: 111200   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 10:59:16,269-Speed 3307.67 samples/sec   Loss 2.5273   LearningRate 0.0445   Epoch: 6   Global Step: 111210   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 10:59:19,382-Speed 3290.15 samples/sec   Loss 2.5828   LearningRate 0.0445   Epoch: 6   Global Step: 111220   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 10:59:22,449-Speed 3339.91 samples/sec   Loss 2.6383   LearningRate 0.0445   Epoch: 6   Global Step: 111230   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 10:59:25,601-Speed 3248.98 samples/sec   Loss 2.5142   LearningRate 0.0445   Epoch: 6   Global Step: 111240   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 10:59:28,676-Speed 3331.31 samples/sec   Loss 2.6141   LearningRate 0.0445   Epoch: 6   Global Step: 111250   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 10:59:31,794-Speed 3284.95 samples/sec   Loss 2.5952   LearningRate 0.0444   Epoch: 6   Global Step: 111260   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 10:59:34,879-Speed 3319.91 samples/sec   Loss 2.5801   LearningRate 0.0444   Epoch: 6   Global Step: 111270   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 10:59:38,012-Speed 3269.03 samples/sec   Loss 2.5271   LearningRate 0.0444   Epoch: 6   Global Step: 111280   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 10:59:41,088-Speed 3330.26 samples/sec   Loss 2.6059   LearningRate 0.0444   Epoch: 6   Global Step: 111290   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 10:59:44,169-Speed 3323.98 samples/sec   Loss 2.5698   LearningRate 0.0444   Epoch: 6   Global Step: 111300   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 10:59:47,249-Speed 3325.81 samples/sec   Loss 2.6849   LearningRate 0.0444   Epoch: 6   Global Step: 111310   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 10:59:50,310-Speed 3345.88 samples/sec   Loss 2.6049   LearningRate 0.0444   Epoch: 6   Global Step: 111320   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 10:59:53,390-Speed 3325.98 samples/sec   Loss 2.6286   LearningRate 0.0444   Epoch: 6   Global Step: 111330   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 10:59:56,465-Speed 3330.93 samples/sec   Loss 2.5987   LearningRate 0.0444   Epoch: 6   Global Step: 111340   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 10:59:59,543-Speed 3328.05 samples/sec   Loss 2.5738   LearningRate 0.0444   Epoch: 6   Global Step: 111350   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:00:02,687-Speed 3258.13 samples/sec   Loss 2.6324   LearningRate 0.0444   Epoch: 6   Global Step: 111360   Fp16 Grad Scale: 262144   Required: 23 hours
Training: 2022-04-11 11:00:05,759-Speed 3333.37 samples/sec   Loss 2.5926   LearningRate 0.0444   Epoch: 6   Global Step: 111370   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:00:08,853-Speed 3311.07 samples/sec   Loss 2.5747   LearningRate 0.0444   Epoch: 6   Global Step: 111380   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:00:11,922-Speed 3337.57 samples/sec   Loss 2.5825   LearningRate 0.0444   Epoch: 6   Global Step: 111390   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:00:14,995-Speed 3333.07 samples/sec   Loss 2.5985   LearningRate 0.0444   Epoch: 6   Global Step: 111400   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:00:18,057-Speed 3344.92 samples/sec   Loss 2.5198   LearningRate 0.0444   Epoch: 6   Global Step: 111410   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:00:21,120-Speed 3344.20 samples/sec   Loss 2.5752   LearningRate 0.0444   Epoch: 6   Global Step: 111420   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:00:24,191-Speed 3335.40 samples/sec   Loss 2.5900   LearningRate 0.0444   Epoch: 6   Global Step: 111430   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:00:27,270-Speed 3326.08 samples/sec   Loss 2.5761   LearningRate 0.0444   Epoch: 6   Global Step: 111440   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:00:30,340-Speed 3335.98 samples/sec   Loss 2.5531   LearningRate 0.0444   Epoch: 6   Global Step: 111450   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:00:33,431-Speed 3313.44 samples/sec   Loss 2.6167   LearningRate 0.0444   Epoch: 6   Global Step: 111460   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:00:36,484-Speed 3355.35 samples/sec   Loss 2.5879   LearningRate 0.0444   Epoch: 6   Global Step: 111470   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:00:39,587-Speed 3301.34 samples/sec   Loss 2.5883   LearningRate 0.0444   Epoch: 6   Global Step: 111480   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:00:42,652-Speed 3342.29 samples/sec   Loss 2.6458   LearningRate 0.0444   Epoch: 6   Global Step: 111490   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:00:45,731-Speed 3326.09 samples/sec   Loss 2.6103   LearningRate 0.0444   Epoch: 6   Global Step: 111500   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:00:48,808-Speed 3328.39 samples/sec   Loss 2.6141   LearningRate 0.0443   Epoch: 6   Global Step: 111510   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:00:51,885-Speed 3329.96 samples/sec   Loss 2.4997   LearningRate 0.0443   Epoch: 6   Global Step: 111520   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:00:54,950-Speed 3341.44 samples/sec   Loss 2.5976   LearningRate 0.0443   Epoch: 6   Global Step: 111530   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:00:58,037-Speed 3318.04 samples/sec   Loss 2.6670   LearningRate 0.0443   Epoch: 6   Global Step: 111540   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:01:01,127-Speed 3314.51 samples/sec   Loss 2.5168   LearningRate 0.0443   Epoch: 6   Global Step: 111550   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:01:04,213-Speed 3319.76 samples/sec   Loss 2.5543   LearningRate 0.0443   Epoch: 6   Global Step: 111560   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:01:07,285-Speed 3333.08 samples/sec   Loss 2.5495   LearningRate 0.0443   Epoch: 6   Global Step: 111570   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:01:10,480-Speed 3206.28 samples/sec   Loss 2.5639   LearningRate 0.0443   Epoch: 6   Global Step: 111580   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:01:13,546-Speed 3340.96 samples/sec   Loss 2.5374   LearningRate 0.0443   Epoch: 6   Global Step: 111590   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:01:16,631-Speed 3319.87 samples/sec   Loss 2.5858   LearningRate 0.0443   Epoch: 6   Global Step: 111600   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-11 11:01:19,711-Speed 3324.95 samples/sec   Loss 2.5344   LearningRate 0.0443   Epoch: 6   Global Step: 111610   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-11 11:01:22,777-Speed 3341.51 samples/sec   Loss 2.5942   LearningRate 0.0443   Epoch: 6   Global Step: 111620   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-11 11:01:25,867-Speed 3313.83 samples/sec   Loss 2.5463   LearningRate 0.0443   Epoch: 6   Global Step: 111630   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-11 11:01:28,955-Speed 3317.32 samples/sec   Loss 2.5561   LearningRate 0.0443   Epoch: 6   Global Step: 111640   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-11 11:01:32,042-Speed 3317.47 samples/sec   Loss 2.5043   LearningRate 0.0443   Epoch: 6   Global Step: 111650   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-11 11:01:35,126-Speed 3322.12 samples/sec   Loss 2.6623   LearningRate 0.0443   Epoch: 6   Global Step: 111660   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-11 11:01:38,231-Speed 3298.79 samples/sec   Loss 2.4947   LearningRate 0.0443   Epoch: 6   Global Step: 111670   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-11 11:01:41,302-Speed 3334.53 samples/sec   Loss 2.5397   LearningRate 0.0443   Epoch: 6   Global Step: 111680   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-11 11:01:44,426-Speed 3278.99 samples/sec   Loss 2.5331   LearningRate 0.0443   Epoch: 6   Global Step: 111690   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-11 11:01:47,510-Speed 3321.03 samples/sec   Loss 2.6294   LearningRate 0.0443   Epoch: 6   Global Step: 111700   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-11 11:01:50,583-Speed 3333.09 samples/sec   Loss 2.5619   LearningRate 0.0443   Epoch: 6   Global Step: 111710   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-11 11:01:53,754-Speed 3229.99 samples/sec   Loss 2.5388   LearningRate 0.0443   Epoch: 6   Global Step: 111720   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-11 11:01:56,941-Speed 3213.81 samples/sec   Loss 2.5546   LearningRate 0.0443   Epoch: 6   Global Step: 111730   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-11 11:02:00,006-Speed 3342.67 samples/sec   Loss 2.5736   LearningRate 0.0443   Epoch: 6   Global Step: 111740   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-11 11:02:03,156-Speed 3251.16 samples/sec   Loss 2.4905   LearningRate 0.0443   Epoch: 6   Global Step: 111750   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-11 11:02:06,233-Speed 3329.03 samples/sec   Loss 2.5547   LearningRate 0.0443   Epoch: 6   Global Step: 111760   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-11 11:02:09,324-Speed 3313.44 samples/sec   Loss 2.6173   LearningRate 0.0442   Epoch: 6   Global Step: 111770   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-11 11:02:12,394-Speed 3336.96 samples/sec   Loss 2.5135   LearningRate 0.0442   Epoch: 6   Global Step: 111780   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-11 11:02:15,479-Speed 3320.46 samples/sec   Loss 2.5524   LearningRate 0.0442   Epoch: 6   Global Step: 111790   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-11 11:02:18,596-Speed 3286.50 samples/sec   Loss 2.5781   LearningRate 0.0442   Epoch: 6   Global Step: 111800   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:02:21,673-Speed 3327.87 samples/sec   Loss 2.6093   LearningRate 0.0442   Epoch: 6   Global Step: 111810   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:02:24,745-Speed 3334.38 samples/sec   Loss 2.5554   LearningRate 0.0442   Epoch: 6   Global Step: 111820   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:02:27,827-Speed 3323.61 samples/sec   Loss 2.5061   LearningRate 0.0442   Epoch: 6   Global Step: 111830   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:02:30,896-Speed 3338.67 samples/sec   Loss 2.5754   LearningRate 0.0442   Epoch: 6   Global Step: 111840   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:02:33,985-Speed 3315.19 samples/sec   Loss 2.6085   LearningRate 0.0442   Epoch: 6   Global Step: 111850   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:02:37,064-Speed 3327.14 samples/sec   Loss 2.5722   LearningRate 0.0442   Epoch: 6   Global Step: 111860   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:02:40,157-Speed 3311.62 samples/sec   Loss 2.6412   LearningRate 0.0442   Epoch: 6   Global Step: 111870   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:02:43,223-Speed 3340.30 samples/sec   Loss 2.6085   LearningRate 0.0442   Epoch: 6   Global Step: 111880   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:02:46,304-Speed 3324.41 samples/sec   Loss 2.5989   LearningRate 0.0442   Epoch: 6   Global Step: 111890   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-11 11:02:49,418-Speed 3289.35 samples/sec   Loss 2.5625   LearningRate 0.0442   Epoch: 6   Global Step: 111900   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-11 11:02:52,521-Speed 3301.69 samples/sec   Loss 2.5110   LearningRate 0.0442   Epoch: 6   Global Step: 111910   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-11 11:02:55,621-Speed 3303.98 samples/sec   Loss 2.5920   LearningRate 0.0442   Epoch: 6   Global Step: 111920   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-11 11:02:58,696-Speed 3329.97 samples/sec   Loss 2.5784   LearningRate 0.0442   Epoch: 6   Global Step: 111930   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-11 11:03:01,770-Speed 3332.59 samples/sec   Loss 2.5498   LearningRate 0.0442   Epoch: 6   Global Step: 111940   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-11 11:03:04,838-Speed 3338.08 samples/sec   Loss 2.6332   LearningRate 0.0442   Epoch: 6   Global Step: 111950   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-11 11:03:07,903-Speed 3342.31 samples/sec   Loss 2.6417   LearningRate 0.0442   Epoch: 6   Global Step: 111960   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-11 11:03:10,978-Speed 3331.08 samples/sec   Loss 2.5351   LearningRate 0.0442   Epoch: 6   Global Step: 111970   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-11 11:03:14,062-Speed 3320.65 samples/sec   Loss 2.5822   LearningRate 0.0442   Epoch: 6   Global Step: 111980   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-11 11:03:17,145-Speed 3322.33 samples/sec   Loss 2.5820   LearningRate 0.0442   Epoch: 6   Global Step: 111990   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:03:20,218-Speed 3333.78 samples/sec   Loss 2.6241   LearningRate 0.0442   Epoch: 6   Global Step: 112000   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:04:04,559-[lfw][112000]XNorm: 23.996880
Training: 2022-04-11 11:04:04,559-[lfw][112000]Accuracy-Flip: 0.99783+-0.00299
Training: 2022-04-11 11:04:04,560-[lfw][112000]Accuracy-Highest: 0.99817
Training: 2022-04-11 11:04:55,943-[cfp_fp][112000]XNorm: 22.893282
Training: 2022-04-11 11:04:55,944-[cfp_fp][112000]Accuracy-Flip: 0.98500+-0.00448
Training: 2022-04-11 11:04:55,944-[cfp_fp][112000]Accuracy-Highest: 0.98614
Training: 2022-04-11 11:05:39,894-[agedb_30][112000]XNorm: 24.522085
Training: 2022-04-11 11:05:39,894-[agedb_30][112000]Accuracy-Flip: 0.98150+-0.00769
Training: 2022-04-11 11:05:39,895-[agedb_30][112000]Accuracy-Highest: 0.98250
Training: 2022-04-11 11:05:42,959-Speed 71.74 samples/sec   Loss 2.5706   LearningRate 0.0442   Epoch: 6   Global Step: 112010   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:05:46,028-Speed 3336.72 samples/sec   Loss 2.5447   LearningRate 0.0441   Epoch: 6   Global Step: 112020   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:05:49,139-Speed 3292.52 samples/sec   Loss 2.5663   LearningRate 0.0441   Epoch: 6   Global Step: 112030   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:05:52,191-Speed 3356.34 samples/sec   Loss 2.6168   LearningRate 0.0441   Epoch: 6   Global Step: 112040   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:05:55,265-Speed 3331.62 samples/sec   Loss 2.6200   LearningRate 0.0441   Epoch: 6   Global Step: 112050   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:05:58,334-Speed 3337.25 samples/sec   Loss 2.6064   LearningRate 0.0441   Epoch: 6   Global Step: 112060   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:06:01,523-Speed 3212.62 samples/sec   Loss 2.5713   LearningRate 0.0441   Epoch: 6   Global Step: 112070   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:06:04,649-Speed 3277.35 samples/sec   Loss 2.6140   LearningRate 0.0441   Epoch: 6   Global Step: 112080   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-11 11:06:07,712-Speed 3343.15 samples/sec   Loss 2.6298   LearningRate 0.0441   Epoch: 6   Global Step: 112090   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-11 11:06:10,783-Speed 3335.51 samples/sec   Loss 2.5458   LearningRate 0.0441   Epoch: 6   Global Step: 112100   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-11 11:06:13,840-Speed 3350.98 samples/sec   Loss 2.5550   LearningRate 0.0441   Epoch: 6   Global Step: 112110   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-11 11:06:16,920-Speed 3325.69 samples/sec   Loss 2.4864   LearningRate 0.0441   Epoch: 6   Global Step: 112120   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-11 11:06:20,020-Speed 3302.97 samples/sec   Loss 2.5551   LearningRate 0.0441   Epoch: 6   Global Step: 112130   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-11 11:06:23,118-Speed 3306.80 samples/sec   Loss 2.5935   LearningRate 0.0441   Epoch: 6   Global Step: 112140   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-11 11:06:26,202-Speed 3321.51 samples/sec   Loss 2.6512   LearningRate 0.0441   Epoch: 6   Global Step: 112150   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-11 11:06:29,280-Speed 3327.26 samples/sec   Loss 2.5924   LearningRate 0.0441   Epoch: 6   Global Step: 112160   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-11 11:06:32,381-Speed 3303.32 samples/sec   Loss 2.5939   LearningRate 0.0441   Epoch: 6   Global Step: 112170   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-11 11:06:35,455-Speed 3331.11 samples/sec   Loss 2.5099   LearningRate 0.0441   Epoch: 6   Global Step: 112180   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-11 11:06:38,531-Speed 3330.24 samples/sec   Loss 2.5026   LearningRate 0.0441   Epoch: 6   Global Step: 112190   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-11 11:06:41,613-Speed 3322.95 samples/sec   Loss 2.5099   LearningRate 0.0441   Epoch: 6   Global Step: 112200   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-11 11:06:44,692-Speed 3327.10 samples/sec   Loss 2.6092   LearningRate 0.0441   Epoch: 6   Global Step: 112210   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-11 11:06:47,772-Speed 3325.81 samples/sec   Loss 2.5111   LearningRate 0.0441   Epoch: 6   Global Step: 112220   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-11 11:06:50,845-Speed 3332.05 samples/sec   Loss 2.5721   LearningRate 0.0441   Epoch: 6   Global Step: 112230   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-11 11:06:53,922-Speed 3328.93 samples/sec   Loss 2.5447   LearningRate 0.0441   Epoch: 6   Global Step: 112240   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-11 11:06:57,013-Speed 3314.21 samples/sec   Loss 2.6045   LearningRate 0.0441   Epoch: 6   Global Step: 112250   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-11 11:07:00,102-Speed 3315.60 samples/sec   Loss 2.6237   LearningRate 0.0441   Epoch: 6   Global Step: 112260   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-11 11:07:03,192-Speed 3313.80 samples/sec   Loss 2.5754   LearningRate 0.0440   Epoch: 6   Global Step: 112270   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-11 11:07:06,250-Speed 3349.54 samples/sec   Loss 2.6251   LearningRate 0.0440   Epoch: 6   Global Step: 112280   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-11 11:07:09,312-Speed 3345.72 samples/sec   Loss 2.5659   LearningRate 0.0440   Epoch: 6   Global Step: 112290   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:07:12,379-Speed 3339.62 samples/sec   Loss 2.6009   LearningRate 0.0440   Epoch: 6   Global Step: 112300   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:07:15,456-Speed 3328.65 samples/sec   Loss 2.5387   LearningRate 0.0440   Epoch: 6   Global Step: 112310   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:07:18,539-Speed 3321.96 samples/sec   Loss 2.5793   LearningRate 0.0440   Epoch: 6   Global Step: 112320   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:07:21,604-Speed 3342.22 samples/sec   Loss 2.5817   LearningRate 0.0440   Epoch: 6   Global Step: 112330   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:07:24,688-Speed 3321.61 samples/sec   Loss 2.5979   LearningRate 0.0440   Epoch: 6   Global Step: 112340   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:07:27,824-Speed 3265.81 samples/sec   Loss 2.5729   LearningRate 0.0440   Epoch: 6   Global Step: 112350   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:07:30,894-Speed 3335.88 samples/sec   Loss 2.5537   LearningRate 0.0440   Epoch: 6   Global Step: 112360   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:07:33,989-Speed 3310.24 samples/sec   Loss 2.5416   LearningRate 0.0440   Epoch: 6   Global Step: 112370   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-11 11:07:37,072-Speed 3321.80 samples/sec   Loss 2.5325   LearningRate 0.0440   Epoch: 6   Global Step: 112380   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-11 11:07:40,150-Speed 3327.86 samples/sec   Loss 2.5748   LearningRate 0.0440   Epoch: 6   Global Step: 112390   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-11 11:07:43,220-Speed 3336.63 samples/sec   Loss 2.5364   LearningRate 0.0440   Epoch: 6   Global Step: 112400   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-11 11:07:46,278-Speed 3349.20 samples/sec   Loss 2.5865   LearningRate 0.0440   Epoch: 6   Global Step: 112410   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-11 11:07:49,356-Speed 3327.40 samples/sec   Loss 2.5758   LearningRate 0.0440   Epoch: 6   Global Step: 112420   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-11 11:07:52,422-Speed 3341.15 samples/sec   Loss 2.5584   LearningRate 0.0440   Epoch: 6   Global Step: 112430   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-11 11:07:55,531-Speed 3294.96 samples/sec   Loss 2.5181   LearningRate 0.0440   Epoch: 6   Global Step: 112440   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-11 11:07:58,599-Speed 3337.60 samples/sec   Loss 2.5791   LearningRate 0.0440   Epoch: 6   Global Step: 112450   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-11 11:08:01,700-Speed 3303.80 samples/sec   Loss 2.5515   LearningRate 0.0440   Epoch: 6   Global Step: 112460   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-11 11:08:04,766-Speed 3340.73 samples/sec   Loss 2.6400   LearningRate 0.0440   Epoch: 6   Global Step: 112470   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:08:07,841-Speed 3330.50 samples/sec   Loss 2.6090   LearningRate 0.0440   Epoch: 6   Global Step: 112480   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:08:10,957-Speed 3286.97 samples/sec   Loss 2.5795   LearningRate 0.0440   Epoch: 6   Global Step: 112490   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:08:14,085-Speed 3274.85 samples/sec   Loss 2.6166   LearningRate 0.0440   Epoch: 6   Global Step: 112500   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:08:17,161-Speed 3333.17 samples/sec   Loss 2.5535   LearningRate 0.0440   Epoch: 6   Global Step: 112510   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:08:20,228-Speed 3339.68 samples/sec   Loss 2.5467   LearningRate 0.0439   Epoch: 6   Global Step: 112520   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:08:23,296-Speed 3338.96 samples/sec   Loss 2.5159   LearningRate 0.0439   Epoch: 6   Global Step: 112530   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:08:26,368-Speed 3333.54 samples/sec   Loss 2.5654   LearningRate 0.0439   Epoch: 6   Global Step: 112540   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:08:29,436-Speed 3339.07 samples/sec   Loss 2.5645   LearningRate 0.0439   Epoch: 6   Global Step: 112550   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:08:32,513-Speed 3328.07 samples/sec   Loss 2.5337   LearningRate 0.0439   Epoch: 6   Global Step: 112560   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:08:35,595-Speed 3324.94 samples/sec   Loss 2.5894   LearningRate 0.0439   Epoch: 6   Global Step: 112570   Fp16 Grad Scale: 262144   Required: 23 hours
Training: 2022-04-11 11:08:38,647-Speed 3355.81 samples/sec   Loss 2.6278   LearningRate 0.0439   Epoch: 6   Global Step: 112580   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:08:41,709-Speed 3345.38 samples/sec   Loss 2.5803   LearningRate 0.0439   Epoch: 6   Global Step: 112590   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:08:44,795-Speed 3319.45 samples/sec   Loss 2.6015   LearningRate 0.0439   Epoch: 6   Global Step: 112600   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:08:47,858-Speed 3344.20 samples/sec   Loss 2.5944   LearningRate 0.0439   Epoch: 6   Global Step: 112610   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:08:50,921-Speed 3342.91 samples/sec   Loss 2.5458   LearningRate 0.0439   Epoch: 6   Global Step: 112620   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:08:53,991-Speed 3336.53 samples/sec   Loss 2.5451   LearningRate 0.0439   Epoch: 6   Global Step: 112630   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:08:57,067-Speed 3329.92 samples/sec   Loss 2.5816   LearningRate 0.0439   Epoch: 6   Global Step: 112640   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:09:00,163-Speed 3308.76 samples/sec   Loss 2.6012   LearningRate 0.0439   Epoch: 6   Global Step: 112650   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-11 11:09:03,259-Speed 3308.07 samples/sec   Loss 2.5751   LearningRate 0.0439   Epoch: 6   Global Step: 112660   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-11 11:09:06,320-Speed 3346.88 samples/sec   Loss 2.5641   LearningRate 0.0439   Epoch: 6   Global Step: 112670   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-11 11:09:09,384-Speed 3343.25 samples/sec   Loss 2.5794   LearningRate 0.0439   Epoch: 6   Global Step: 112680   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-11 11:09:12,494-Speed 3293.41 samples/sec   Loss 2.5858   LearningRate 0.0439   Epoch: 6   Global Step: 112690   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-11 11:09:15,556-Speed 3345.01 samples/sec   Loss 2.4981   LearningRate 0.0439   Epoch: 6   Global Step: 112700   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-11 11:09:18,672-Speed 3288.51 samples/sec   Loss 2.5715   LearningRate 0.0439   Epoch: 6   Global Step: 112710   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-11 11:09:21,816-Speed 3257.07 samples/sec   Loss 2.5607   LearningRate 0.0439   Epoch: 6   Global Step: 112720   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-11 11:09:24,911-Speed 3310.34 samples/sec   Loss 2.5669   LearningRate 0.0439   Epoch: 6   Global Step: 112730   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-11 11:09:27,978-Speed 3339.16 samples/sec   Loss 2.6429   LearningRate 0.0439   Epoch: 6   Global Step: 112740   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-11 11:09:31,053-Speed 3331.30 samples/sec   Loss 2.5920   LearningRate 0.0439   Epoch: 6   Global Step: 112750   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:09:34,116-Speed 3344.22 samples/sec   Loss 2.5929   LearningRate 0.0439   Epoch: 6   Global Step: 112760   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:09:37,180-Speed 3343.97 samples/sec   Loss 2.6332   LearningRate 0.0438   Epoch: 6   Global Step: 112770   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-11 11:09:40,240-Speed 3346.93 samples/sec   Loss 2.5056   LearningRate 0.0438   Epoch: 6   Global Step: 112780   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-11 11:09:43,301-Speed 3345.73 samples/sec   Loss 2.5637   LearningRate 0.0438   Epoch: 6   Global Step: 112790   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-11 11:09:46,386-Speed 3321.07 samples/sec   Loss 2.5379   LearningRate 0.0438   Epoch: 6   Global Step: 112800   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-11 11:09:49,444-Speed 3348.85 samples/sec   Loss 2.6221   LearningRate 0.0438   Epoch: 6   Global Step: 112810   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-11 11:09:52,574-Speed 3273.06 samples/sec   Loss 2.6214   LearningRate 0.0438   Epoch: 6   Global Step: 112820   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-11 11:09:55,706-Speed 3269.64 samples/sec   Loss 2.5753   LearningRate 0.0438   Epoch: 6   Global Step: 112830   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-11 11:09:58,770-Speed 3343.97 samples/sec   Loss 2.5932   LearningRate 0.0438   Epoch: 6   Global Step: 112840   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-11 11:10:01,840-Speed 3337.13 samples/sec   Loss 2.4935   LearningRate 0.0438   Epoch: 6   Global Step: 112850   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-11 11:10:04,905-Speed 3341.14 samples/sec   Loss 2.5574   LearningRate 0.0438   Epoch: 6   Global Step: 112860   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-11 11:10:07,963-Speed 3349.46 samples/sec   Loss 2.5765   LearningRate 0.0438   Epoch: 6   Global Step: 112870   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-11 11:10:11,040-Speed 3329.01 samples/sec   Loss 2.5253   LearningRate 0.0438   Epoch: 6   Global Step: 112880   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-11 11:10:14,101-Speed 3346.69 samples/sec   Loss 2.4901   LearningRate 0.0438   Epoch: 6   Global Step: 112890   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-11 11:10:17,166-Speed 3341.49 samples/sec   Loss 2.5656   LearningRate 0.0438   Epoch: 6   Global Step: 112900   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-11 11:10:20,241-Speed 3330.28 samples/sec   Loss 2.5494   LearningRate 0.0438   Epoch: 6   Global Step: 112910   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-11 11:10:23,309-Speed 3339.28 samples/sec   Loss 2.5454   LearningRate 0.0438   Epoch: 6   Global Step: 112920   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-11 11:10:26,374-Speed 3341.68 samples/sec   Loss 2.4323   LearningRate 0.0438   Epoch: 6   Global Step: 112930   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-11 11:10:29,444-Speed 3337.34 samples/sec   Loss 2.6854   LearningRate 0.0438   Epoch: 6   Global Step: 112940   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-11 11:10:32,518-Speed 3331.52 samples/sec   Loss 2.5419   LearningRate 0.0438   Epoch: 6   Global Step: 112950   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-11 11:10:35,593-Speed 3330.96 samples/sec   Loss 2.5596   LearningRate 0.0438   Epoch: 6   Global Step: 112960   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-11 11:10:38,664-Speed 3335.79 samples/sec   Loss 2.5986   LearningRate 0.0438   Epoch: 6   Global Step: 112970   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-11 11:10:41,735-Speed 3335.52 samples/sec   Loss 2.5280   LearningRate 0.0438   Epoch: 6   Global Step: 112980   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-11 11:10:44,795-Speed 3346.45 samples/sec   Loss 2.5322   LearningRate 0.0438   Epoch: 6   Global Step: 112990   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-11 11:10:47,870-Speed 3331.86 samples/sec   Loss 2.5398   LearningRate 0.0438   Epoch: 6   Global Step: 113000   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-11 11:10:50,947-Speed 3328.73 samples/sec   Loss 2.5751   LearningRate 0.0438   Epoch: 6   Global Step: 113010   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-11 11:10:54,093-Speed 3256.07 samples/sec   Loss 2.5828   LearningRate 0.0437   Epoch: 6   Global Step: 113020   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:10:57,203-Speed 3293.05 samples/sec   Loss 2.5692   LearningRate 0.0437   Epoch: 6   Global Step: 113030   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:11:00,298-Speed 3309.22 samples/sec   Loss 2.6255   LearningRate 0.0437   Epoch: 6   Global Step: 113040   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:11:03,379-Speed 3324.75 samples/sec   Loss 2.6008   LearningRate 0.0437   Epoch: 6   Global Step: 113050   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:11:06,496-Speed 3286.87 samples/sec   Loss 2.5950   LearningRate 0.0437   Epoch: 6   Global Step: 113060   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:11:09,595-Speed 3304.13 samples/sec   Loss 2.6044   LearningRate 0.0437   Epoch: 6   Global Step: 113070   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:11:12,675-Speed 3326.00 samples/sec   Loss 2.6187   LearningRate 0.0437   Epoch: 6   Global Step: 113080   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:11:15,735-Speed 3347.33 samples/sec   Loss 2.5336   LearningRate 0.0437   Epoch: 6   Global Step: 113090   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:11:18,811-Speed 3329.49 samples/sec   Loss 2.5372   LearningRate 0.0437   Epoch: 6   Global Step: 113100   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:11:21,890-Speed 3325.93 samples/sec   Loss 2.5761   LearningRate 0.0437   Epoch: 6   Global Step: 113110   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:11:24,981-Speed 3313.98 samples/sec   Loss 2.6303   LearningRate 0.0437   Epoch: 6   Global Step: 113120   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:11:28,057-Speed 3330.52 samples/sec   Loss 2.5840   LearningRate 0.0437   Epoch: 6   Global Step: 113130   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:11:31,169-Speed 3291.42 samples/sec   Loss 2.5995   LearningRate 0.0437   Epoch: 6   Global Step: 113140   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:11:34,249-Speed 3324.90 samples/sec   Loss 2.6406   LearningRate 0.0437   Epoch: 6   Global Step: 113150   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:11:37,365-Speed 3287.48 samples/sec   Loss 2.5454   LearningRate 0.0437   Epoch: 6   Global Step: 113160   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:11:40,431-Speed 3340.32 samples/sec   Loss 2.5417   LearningRate 0.0437   Epoch: 6   Global Step: 113170   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:11:43,494-Speed 3344.09 samples/sec   Loss 2.5671   LearningRate 0.0437   Epoch: 6   Global Step: 113180   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:11:46,592-Speed 3306.72 samples/sec   Loss 2.5629   LearningRate 0.0437   Epoch: 6   Global Step: 113190   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:11:49,733-Speed 3260.54 samples/sec   Loss 2.4994   LearningRate 0.0437   Epoch: 6   Global Step: 113200   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:11:52,806-Speed 3333.10 samples/sec   Loss 2.5924   LearningRate 0.0437   Epoch: 6   Global Step: 113210   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:11:55,861-Speed 3352.69 samples/sec   Loss 2.6706   LearningRate 0.0437   Epoch: 6   Global Step: 113220   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:11:58,946-Speed 3319.89 samples/sec   Loss 2.5497   LearningRate 0.0437   Epoch: 6   Global Step: 113230   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:12:02,060-Speed 3289.12 samples/sec   Loss 2.6254   LearningRate 0.0437   Epoch: 6   Global Step: 113240   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:12:05,123-Speed 3344.52 samples/sec   Loss 2.6284   LearningRate 0.0437   Epoch: 6   Global Step: 113250   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:12:08,243-Speed 3281.93 samples/sec   Loss 2.6084   LearningRate 0.0437   Epoch: 6   Global Step: 113260   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:12:11,318-Speed 3331.20 samples/sec   Loss 2.6052   LearningRate 0.0437   Epoch: 6   Global Step: 113270   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:12:14,422-Speed 3300.13 samples/sec   Loss 2.6109   LearningRate 0.0436   Epoch: 6   Global Step: 113280   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:12:17,485-Speed 3344.04 samples/sec   Loss 2.5510   LearningRate 0.0436   Epoch: 6   Global Step: 113290   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:12:20,551-Speed 3340.51 samples/sec   Loss 2.5553   LearningRate 0.0436   Epoch: 6   Global Step: 113300   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:12:23,622-Speed 3335.45 samples/sec   Loss 2.5425   LearningRate 0.0436   Epoch: 6   Global Step: 113310   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:12:26,732-Speed 3294.10 samples/sec   Loss 2.5189   LearningRate 0.0436   Epoch: 6   Global Step: 113320   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-11 11:12:29,890-Speed 3243.25 samples/sec   Loss 2.6145   LearningRate 0.0436   Epoch: 6   Global Step: 113330   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-11 11:12:32,972-Speed 3323.52 samples/sec   Loss 2.5419   LearningRate 0.0436   Epoch: 6   Global Step: 113340   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-11 11:12:36,045-Speed 3332.44 samples/sec   Loss 2.5631   LearningRate 0.0436   Epoch: 6   Global Step: 113350   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-11 11:12:39,108-Speed 3344.81 samples/sec   Loss 2.4945   LearningRate 0.0436   Epoch: 6   Global Step: 113360   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-11 11:12:42,174-Speed 3340.02 samples/sec   Loss 2.5558   LearningRate 0.0436   Epoch: 6   Global Step: 113370   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-11 11:12:45,235-Speed 3346.12 samples/sec   Loss 2.6125   LearningRate 0.0436   Epoch: 6   Global Step: 113380   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-11 11:12:48,308-Speed 3333.00 samples/sec   Loss 2.5261   LearningRate 0.0436   Epoch: 6   Global Step: 113390   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-11 11:12:51,375-Speed 3339.59 samples/sec   Loss 2.5170   LearningRate 0.0436   Epoch: 6   Global Step: 113400   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-11 11:12:54,438-Speed 3343.79 samples/sec   Loss 2.5091   LearningRate 0.0436   Epoch: 6   Global Step: 113410   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-11 11:12:57,523-Speed 3320.84 samples/sec   Loss 2.5455   LearningRate 0.0436   Epoch: 6   Global Step: 113420   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:13:00,603-Speed 3325.01 samples/sec   Loss 2.5171   LearningRate 0.0436   Epoch: 6   Global Step: 113430   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:13:03,668-Speed 3342.64 samples/sec   Loss 2.6367   LearningRate 0.0436   Epoch: 6   Global Step: 113440   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:13:06,737-Speed 3337.06 samples/sec   Loss 2.6297   LearningRate 0.0436   Epoch: 6   Global Step: 113450   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:13:09,819-Speed 3323.62 samples/sec   Loss 2.4855   LearningRate 0.0436   Epoch: 6   Global Step: 113460   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:13:12,894-Speed 3330.19 samples/sec   Loss 2.5309   LearningRate 0.0436   Epoch: 6   Global Step: 113470   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:13:15,975-Speed 3325.16 samples/sec   Loss 2.5395   LearningRate 0.0436   Epoch: 6   Global Step: 113480   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:13:19,051-Speed 3329.18 samples/sec   Loss 2.5042   LearningRate 0.0436   Epoch: 6   Global Step: 113490   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:13:22,175-Speed 3278.54 samples/sec   Loss 2.5736   LearningRate 0.0436   Epoch: 6   Global Step: 113500   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:13:25,274-Speed 3305.37 samples/sec   Loss 2.5189   LearningRate 0.0436   Epoch: 6   Global Step: 113510   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:13:28,364-Speed 3315.77 samples/sec   Loss 2.5501   LearningRate 0.0436   Epoch: 6   Global Step: 113520   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:13:31,431-Speed 3339.09 samples/sec   Loss 2.5614   LearningRate 0.0435   Epoch: 6   Global Step: 113530   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:13:34,544-Speed 3291.03 samples/sec   Loss 2.4832   LearningRate 0.0435   Epoch: 6   Global Step: 113540   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-11 11:13:37,679-Speed 3266.84 samples/sec   Loss 2.6050   LearningRate 0.0435   Epoch: 6   Global Step: 113550   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-11 11:13:40,796-Speed 3285.68 samples/sec   Loss 2.5292   LearningRate 0.0435   Epoch: 6   Global Step: 113560   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-11 11:13:43,860-Speed 3342.83 samples/sec   Loss 2.5321   LearningRate 0.0435   Epoch: 6   Global Step: 113570   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-11 11:13:46,924-Speed 3342.77 samples/sec   Loss 2.5716   LearningRate 0.0435   Epoch: 6   Global Step: 113580   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-11 11:13:50,014-Speed 3315.18 samples/sec   Loss 2.5279   LearningRate 0.0435   Epoch: 6   Global Step: 113590   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-11 11:13:53,084-Speed 3336.04 samples/sec   Loss 2.5846   LearningRate 0.0435   Epoch: 6   Global Step: 113600   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-11 11:13:56,147-Speed 3344.67 samples/sec   Loss 2.6332   LearningRate 0.0435   Epoch: 6   Global Step: 113610   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-11 11:13:59,246-Speed 3305.02 samples/sec   Loss 2.5879   LearningRate 0.0435   Epoch: 6   Global Step: 113620   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-11 11:14:02,354-Speed 3294.81 samples/sec   Loss 2.5579   LearningRate 0.0435   Epoch: 6   Global Step: 113630   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-11 11:14:05,419-Speed 3341.63 samples/sec   Loss 2.5477   LearningRate 0.0435   Epoch: 6   Global Step: 113640   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:14:08,498-Speed 3326.47 samples/sec   Loss 2.5755   LearningRate 0.0435   Epoch: 6   Global Step: 113650   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:14:11,579-Speed 3324.82 samples/sec   Loss 2.5351   LearningRate 0.0435   Epoch: 6   Global Step: 113660   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:14:14,641-Speed 3344.33 samples/sec   Loss 2.5134   LearningRate 0.0435   Epoch: 6   Global Step: 113670   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:14:17,707-Speed 3340.72 samples/sec   Loss 2.5342   LearningRate 0.0435   Epoch: 6   Global Step: 113680   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:14:20,833-Speed 3277.32 samples/sec   Loss 2.5324   LearningRate 0.0435   Epoch: 6   Global Step: 113690   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:14:23,913-Speed 3326.00 samples/sec   Loss 2.5227   LearningRate 0.0435   Epoch: 6   Global Step: 113700   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:14:27,031-Speed 3284.24 samples/sec   Loss 2.5824   LearningRate 0.0435   Epoch: 6   Global Step: 113710   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:14:30,137-Speed 3298.18 samples/sec   Loss 2.5929   LearningRate 0.0435   Epoch: 6   Global Step: 113720   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:14:33,227-Speed 3314.65 samples/sec   Loss 2.5196   LearningRate 0.0435   Epoch: 6   Global Step: 113730   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:14:36,282-Speed 3352.49 samples/sec   Loss 2.5413   LearningRate 0.0435   Epoch: 6   Global Step: 113740   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:14:39,368-Speed 3318.50 samples/sec   Loss 2.5492   LearningRate 0.0435   Epoch: 6   Global Step: 113750   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:14:42,432-Speed 3343.19 samples/sec   Loss 2.5212   LearningRate 0.0435   Epoch: 6   Global Step: 113760   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:14:45,528-Speed 3307.99 samples/sec   Loss 2.5917   LearningRate 0.0435   Epoch: 6   Global Step: 113770   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:14:48,608-Speed 3326.64 samples/sec   Loss 2.5388   LearningRate 0.0434   Epoch: 6   Global Step: 113780   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:14:51,723-Speed 3288.53 samples/sec   Loss 2.5400   LearningRate 0.0434   Epoch: 6   Global Step: 113790   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:14:54,806-Speed 3321.79 samples/sec   Loss 2.5366   LearningRate 0.0434   Epoch: 6   Global Step: 113800   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:14:57,886-Speed 3326.04 samples/sec   Loss 2.5812   LearningRate 0.0434   Epoch: 6   Global Step: 113810   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:15:00,978-Speed 3312.07 samples/sec   Loss 2.5644   LearningRate 0.0434   Epoch: 6   Global Step: 113820   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:15:04,071-Speed 3312.35 samples/sec   Loss 2.6024   LearningRate 0.0434   Epoch: 6   Global Step: 113830   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:15:07,141-Speed 3335.45 samples/sec   Loss 2.5366   LearningRate 0.0434   Epoch: 6   Global Step: 113840   Fp16 Grad Scale: 262144   Required: 23 hours
Training: 2022-04-11 11:15:10,196-Speed 3353.00 samples/sec   Loss 2.6213   LearningRate 0.0434   Epoch: 6   Global Step: 113850   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:15:13,267-Speed 3335.48 samples/sec   Loss 2.5924   LearningRate 0.0434   Epoch: 6   Global Step: 113860   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:15:16,330-Speed 3345.08 samples/sec   Loss 2.4941   LearningRate 0.0434   Epoch: 6   Global Step: 113870   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:15:19,396-Speed 3341.25 samples/sec   Loss 2.6045   LearningRate 0.0434   Epoch: 6   Global Step: 113880   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:15:22,462-Speed 3340.03 samples/sec   Loss 2.6033   LearningRate 0.0434   Epoch: 6   Global Step: 113890   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:15:25,544-Speed 3323.25 samples/sec   Loss 2.5592   LearningRate 0.0434   Epoch: 6   Global Step: 113900   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:15:28,634-Speed 3314.36 samples/sec   Loss 2.6033   LearningRate 0.0434   Epoch: 6   Global Step: 113910   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:15:31,720-Speed 3319.91 samples/sec   Loss 2.5421   LearningRate 0.0434   Epoch: 6   Global Step: 113920   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:15:34,786-Speed 3341.11 samples/sec   Loss 2.5772   LearningRate 0.0434   Epoch: 6   Global Step: 113930   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:15:37,873-Speed 3317.48 samples/sec   Loss 2.5811   LearningRate 0.0434   Epoch: 6   Global Step: 113940   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:15:40,931-Speed 3349.21 samples/sec   Loss 2.5025   LearningRate 0.0434   Epoch: 6   Global Step: 113950   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:15:44,002-Speed 3334.98 samples/sec   Loss 2.5187   LearningRate 0.0434   Epoch: 6   Global Step: 113960   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:15:47,069-Speed 3340.43 samples/sec   Loss 2.5201   LearningRate 0.0434   Epoch: 6   Global Step: 113970   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:15:50,216-Speed 3254.09 samples/sec   Loss 2.5241   LearningRate 0.0434   Epoch: 6   Global Step: 113980   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:15:53,321-Speed 3299.35 samples/sec   Loss 2.5491   LearningRate 0.0434   Epoch: 6   Global Step: 113990   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:15:56,463-Speed 3259.88 samples/sec   Loss 2.5540   LearningRate 0.0434   Epoch: 6   Global Step: 114000   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:16:41,000-[lfw][114000]XNorm: 23.402697
Training: 2022-04-11 11:16:41,001-[lfw][114000]Accuracy-Flip: 0.99800+-0.00287
Training: 2022-04-11 11:16:41,001-[lfw][114000]Accuracy-Highest: 0.99817
Training: 2022-04-11 11:17:32,398-[cfp_fp][114000]XNorm: 22.355344
Training: 2022-04-11 11:17:32,398-[cfp_fp][114000]Accuracy-Flip: 0.98643+-0.00604
Training: 2022-04-11 11:17:32,399-[cfp_fp][114000]Accuracy-Highest: 0.98643
Training: 2022-04-11 11:18:16,430-[agedb_30][114000]XNorm: 23.586956
Training: 2022-04-11 11:18:16,431-[agedb_30][114000]Accuracy-Flip: 0.98317+-0.00787
Training: 2022-04-11 11:18:16,431-[agedb_30][114000]Accuracy-Highest: 0.98317
Training: 2022-04-11 11:18:19,510-Speed 71.59 samples/sec   Loss 2.5608   LearningRate 0.0434   Epoch: 6   Global Step: 114010   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:18:22,591-Speed 3324.25 samples/sec   Loss 2.6048   LearningRate 0.0434   Epoch: 6   Global Step: 114020   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:18:25,652-Speed 3346.32 samples/sec   Loss 2.5883   LearningRate 0.0434   Epoch: 6   Global Step: 114030   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:18:28,822-Speed 3231.08 samples/sec   Loss 2.5556   LearningRate 0.0433   Epoch: 6   Global Step: 114040   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:18:31,881-Speed 3347.42 samples/sec   Loss 2.5610   LearningRate 0.0433   Epoch: 6   Global Step: 114050   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:18:34,975-Speed 3311.10 samples/sec   Loss 2.5707   LearningRate 0.0433   Epoch: 6   Global Step: 114060   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-11 11:18:38,031-Speed 3351.30 samples/sec   Loss 2.5514   LearningRate 0.0433   Epoch: 6   Global Step: 114070   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-11 11:18:41,119-Speed 3316.64 samples/sec   Loss 2.5499   LearningRate 0.0433   Epoch: 6   Global Step: 114080   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-11 11:18:44,177-Speed 3349.40 samples/sec   Loss 2.6692   LearningRate 0.0433   Epoch: 6   Global Step: 114090   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-11 11:18:47,256-Speed 3327.16 samples/sec   Loss 2.6277   LearningRate 0.0433   Epoch: 6   Global Step: 114100   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-11 11:18:50,319-Speed 3343.69 samples/sec   Loss 2.5372   LearningRate 0.0433   Epoch: 6   Global Step: 114110   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-11 11:18:53,391-Speed 3334.29 samples/sec   Loss 2.5335   LearningRate 0.0433   Epoch: 6   Global Step: 114120   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-11 11:18:56,454-Speed 3343.34 samples/sec   Loss 2.4969   LearningRate 0.0433   Epoch: 6   Global Step: 114130   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-11 11:18:59,513-Speed 3348.65 samples/sec   Loss 2.5192   LearningRate 0.0433   Epoch: 6   Global Step: 114140   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-11 11:19:02,582-Speed 3337.31 samples/sec   Loss 2.5220   LearningRate 0.0433   Epoch: 6   Global Step: 114150   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-11 11:19:05,631-Speed 3359.37 samples/sec   Loss 2.5588   LearningRate 0.0433   Epoch: 6   Global Step: 114160   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-11 11:19:08,763-Speed 3269.93 samples/sec   Loss 2.5240   LearningRate 0.0433   Epoch: 6   Global Step: 114170   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-11 11:19:11,841-Speed 3328.57 samples/sec   Loss 2.5251   LearningRate 0.0433   Epoch: 6   Global Step: 114180   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-11 11:19:14,912-Speed 3335.74 samples/sec   Loss 2.5432   LearningRate 0.0433   Epoch: 6   Global Step: 114190   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-11 11:19:17,972-Speed 3346.56 samples/sec   Loss 2.5548   LearningRate 0.0433   Epoch: 6   Global Step: 114200   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-11 11:19:21,036-Speed 3343.13 samples/sec   Loss 2.5524   LearningRate 0.0433   Epoch: 6   Global Step: 114210   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-11 11:19:24,117-Speed 3324.35 samples/sec   Loss 2.5480   LearningRate 0.0433   Epoch: 6   Global Step: 114220   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-11 11:19:27,171-Speed 3353.21 samples/sec   Loss 2.5177   LearningRate 0.0433   Epoch: 6   Global Step: 114230   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-11 11:19:30,239-Speed 3338.98 samples/sec   Loss 2.5078   LearningRate 0.0433   Epoch: 6   Global Step: 114240   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-11 11:19:33,295-Speed 3351.26 samples/sec   Loss 2.5681   LearningRate 0.0433   Epoch: 6   Global Step: 114250   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-11 11:19:36,356-Speed 3346.62 samples/sec   Loss 2.5376   LearningRate 0.0433   Epoch: 6   Global Step: 114260   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:19:39,436-Speed 3325.27 samples/sec   Loss 2.5214   LearningRate 0.0433   Epoch: 6   Global Step: 114270   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:19:42,494-Speed 3349.14 samples/sec   Loss 2.5102   LearningRate 0.0433   Epoch: 6   Global Step: 114280   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:19:45,570-Speed 3330.43 samples/sec   Loss 2.4956   LearningRate 0.0432   Epoch: 6   Global Step: 114290   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:19:48,635-Speed 3341.66 samples/sec   Loss 2.5543   LearningRate 0.0432   Epoch: 6   Global Step: 114300   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:19:51,716-Speed 3324.43 samples/sec   Loss 2.6079   LearningRate 0.0432   Epoch: 6   Global Step: 114310   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:19:54,804-Speed 3316.61 samples/sec   Loss 2.4859   LearningRate 0.0432   Epoch: 6   Global Step: 114320   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:19:57,913-Speed 3294.58 samples/sec   Loss 2.5090   LearningRate 0.0432   Epoch: 6   Global Step: 114330   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:20:00,992-Speed 3326.98 samples/sec   Loss 2.5416   LearningRate 0.0432   Epoch: 6   Global Step: 114340   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:20:04,056-Speed 3342.31 samples/sec   Loss 2.5171   LearningRate 0.0432   Epoch: 6   Global Step: 114350   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:20:07,102-Speed 3362.18 samples/sec   Loss 2.4865   LearningRate 0.0432   Epoch: 6   Global Step: 114360   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:20:10,166-Speed 3344.52 samples/sec   Loss 2.5853   LearningRate 0.0432   Epoch: 6   Global Step: 114370   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:20:13,237-Speed 3335.42 samples/sec   Loss 2.5458   LearningRate 0.0432   Epoch: 6   Global Step: 114380   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:20:16,367-Speed 3272.34 samples/sec   Loss 2.5917   LearningRate 0.0432   Epoch: 6   Global Step: 114390   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:20:19,488-Speed 3281.53 samples/sec   Loss 2.5216   LearningRate 0.0432   Epoch: 6   Global Step: 114400   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:20:22,616-Speed 3274.17 samples/sec   Loss 2.5407   LearningRate 0.0432   Epoch: 6   Global Step: 114410   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:20:25,674-Speed 3349.76 samples/sec   Loss 2.5045   LearningRate 0.0432   Epoch: 6   Global Step: 114420   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:20:28,731-Speed 3350.53 samples/sec   Loss 2.5701   LearningRate 0.0432   Epoch: 6   Global Step: 114430   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:20:31,803-Speed 3335.20 samples/sec   Loss 2.5602   LearningRate 0.0432   Epoch: 6   Global Step: 114440   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:20:34,860-Speed 3350.37 samples/sec   Loss 2.5943   LearningRate 0.0432   Epoch: 6   Global Step: 114450   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:20:37,919-Speed 3348.00 samples/sec   Loss 2.4603   LearningRate 0.0432   Epoch: 6   Global Step: 114460   Fp16 Grad Scale: 262144   Required: 23 hours
Training: 2022-04-11 11:20:40,970-Speed 3357.20 samples/sec   Loss 2.5376   LearningRate 0.0432   Epoch: 6   Global Step: 114470   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:20:44,030-Speed 3347.50 samples/sec   Loss 2.4570   LearningRate 0.0432   Epoch: 6   Global Step: 114480   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:20:47,100-Speed 3337.07 samples/sec   Loss 2.5347   LearningRate 0.0432   Epoch: 6   Global Step: 114490   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:20:50,182-Speed 3323.19 samples/sec   Loss 2.6046   LearningRate 0.0432   Epoch: 6   Global Step: 114500   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:20:53,249-Speed 3339.42 samples/sec   Loss 2.5751   LearningRate 0.0432   Epoch: 6   Global Step: 114510   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:20:56,353-Speed 3299.08 samples/sec   Loss 2.4733   LearningRate 0.0432   Epoch: 6   Global Step: 114520   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:20:59,412-Speed 3348.94 samples/sec   Loss 2.5512   LearningRate 0.0432   Epoch: 6   Global Step: 114530   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:21:02,481-Speed 3337.20 samples/sec   Loss 2.4878   LearningRate 0.0431   Epoch: 6   Global Step: 114540   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:21:05,555-Speed 3331.89 samples/sec   Loss 2.5077   LearningRate 0.0431   Epoch: 6   Global Step: 114550   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:21:08,618-Speed 3344.45 samples/sec   Loss 2.4642   LearningRate 0.0431   Epoch: 6   Global Step: 114560   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:21:11,671-Speed 3354.89 samples/sec   Loss 2.4809   LearningRate 0.0431   Epoch: 6   Global Step: 114570   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:21:14,728-Speed 3349.88 samples/sec   Loss 2.5306   LearningRate 0.0431   Epoch: 6   Global Step: 114580   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:21:17,819-Speed 3314.43 samples/sec   Loss 2.4813   LearningRate 0.0431   Epoch: 6   Global Step: 114590   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:21:20,877-Speed 3349.78 samples/sec   Loss 2.5217   LearningRate 0.0431   Epoch: 6   Global Step: 114600   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:21:23,933-Speed 3351.58 samples/sec   Loss 2.5474   LearningRate 0.0431   Epoch: 6   Global Step: 114610   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:21:27,048-Speed 3287.48 samples/sec   Loss 2.4897   LearningRate 0.0431   Epoch: 6   Global Step: 114620   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:21:30,107-Speed 3349.06 samples/sec   Loss 2.5960   LearningRate 0.0431   Epoch: 6   Global Step: 114630   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:21:33,180-Speed 3331.93 samples/sec   Loss 2.5301   LearningRate 0.0431   Epoch: 6   Global Step: 114640   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:21:36,258-Speed 3328.49 samples/sec   Loss 2.5819   LearningRate 0.0431   Epoch: 6   Global Step: 114650   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:21:39,341-Speed 3321.96 samples/sec   Loss 2.5252   LearningRate 0.0431   Epoch: 6   Global Step: 114660   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:21:42,409-Speed 3338.27 samples/sec   Loss 2.5697   LearningRate 0.0431   Epoch: 6   Global Step: 114670   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:21:45,496-Speed 3318.36 samples/sec   Loss 2.5483   LearningRate 0.0431   Epoch: 6   Global Step: 114680   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:21:48,564-Speed 3338.45 samples/sec   Loss 2.4351   LearningRate 0.0431   Epoch: 6   Global Step: 114690   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:21:51,627-Speed 3343.92 samples/sec   Loss 2.5656   LearningRate 0.0431   Epoch: 6   Global Step: 114700   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:21:54,706-Speed 3326.70 samples/sec   Loss 2.5556   LearningRate 0.0431   Epoch: 6   Global Step: 114710   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:21:57,770-Speed 3342.46 samples/sec   Loss 2.5943   LearningRate 0.0431   Epoch: 6   Global Step: 114720   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:22:00,861-Speed 3313.82 samples/sec   Loss 2.5435   LearningRate 0.0431   Epoch: 6   Global Step: 114730   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:22:03,975-Speed 3289.20 samples/sec   Loss 2.5209   LearningRate 0.0431   Epoch: 6   Global Step: 114740   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:22:07,041-Speed 3340.67 samples/sec   Loss 2.6345   LearningRate 0.0431   Epoch: 6   Global Step: 114750   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:22:10,124-Speed 3322.15 samples/sec   Loss 2.5306   LearningRate 0.0431   Epoch: 6   Global Step: 114760   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:22:13,182-Speed 3349.91 samples/sec   Loss 2.5175   LearningRate 0.0431   Epoch: 6   Global Step: 114770   Fp16 Grad Scale: 262144   Required: 23 hours
Training: 2022-04-11 11:22:16,230-Speed 3360.55 samples/sec   Loss 2.5422   LearningRate 0.0431   Epoch: 6   Global Step: 114780   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:22:19,295-Speed 3342.23 samples/sec   Loss 2.5344   LearningRate 0.0431   Epoch: 6   Global Step: 114790   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:22:22,395-Speed 3303.60 samples/sec   Loss 2.5585   LearningRate 0.0430   Epoch: 6   Global Step: 114800   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:22:25,465-Speed 3335.78 samples/sec   Loss 2.5796   LearningRate 0.0430   Epoch: 6   Global Step: 114810   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:22:28,545-Speed 3326.01 samples/sec   Loss 2.5969   LearningRate 0.0430   Epoch: 6   Global Step: 114820   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:22:31,617-Speed 3334.25 samples/sec   Loss 2.5454   LearningRate 0.0430   Epoch: 6   Global Step: 114830   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:22:34,687-Speed 3336.12 samples/sec   Loss 2.5151   LearningRate 0.0430   Epoch: 6   Global Step: 114840   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:22:37,764-Speed 3328.43 samples/sec   Loss 2.5729   LearningRate 0.0430   Epoch: 6   Global Step: 114850   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:22:40,839-Speed 3331.81 samples/sec   Loss 2.5728   LearningRate 0.0430   Epoch: 6   Global Step: 114860   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:22:43,919-Speed 3324.76 samples/sec   Loss 2.4548   LearningRate 0.0430   Epoch: 6   Global Step: 114870   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:22:46,990-Speed 3345.91 samples/sec   Loss 2.6071   LearningRate 0.0430   Epoch: 6   Global Step: 114880   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:22:50,052-Speed 3344.41 samples/sec   Loss 2.6006   LearningRate 0.0430   Epoch: 6   Global Step: 114890   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:22:53,114-Speed 3345.80 samples/sec   Loss 2.5366   LearningRate 0.0430   Epoch: 6   Global Step: 114900   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:22:56,178-Speed 3342.09 samples/sec   Loss 2.5206   LearningRate 0.0430   Epoch: 6   Global Step: 114910   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:22:59,324-Speed 3289.42 samples/sec   Loss 2.6096   LearningRate 0.0430   Epoch: 6   Global Step: 114920   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:23:02,394-Speed 3336.51 samples/sec   Loss 2.4819   LearningRate 0.0430   Epoch: 6   Global Step: 114930   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-11 11:23:05,457-Speed 3343.38 samples/sec   Loss 2.5417   LearningRate 0.0430   Epoch: 6   Global Step: 114940   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-11 11:23:08,569-Speed 3291.34 samples/sec   Loss 2.4752   LearningRate 0.0430   Epoch: 6   Global Step: 114950   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-11 11:23:11,676-Speed 3296.38 samples/sec   Loss 2.5280   LearningRate 0.0430   Epoch: 6   Global Step: 114960   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-11 11:23:14,737-Speed 3346.85 samples/sec   Loss 2.5849   LearningRate 0.0430   Epoch: 6   Global Step: 114970   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-11 11:23:17,812-Speed 3338.63 samples/sec   Loss 2.5580   LearningRate 0.0430   Epoch: 6   Global Step: 114980   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-11 11:23:20,882-Speed 3335.30 samples/sec   Loss 2.4773   LearningRate 0.0430   Epoch: 6   Global Step: 114990   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-11 11:23:24,059-Speed 3224.11 samples/sec   Loss 2.5903   LearningRate 0.0430   Epoch: 6   Global Step: 115000   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-11 11:23:27,149-Speed 3315.21 samples/sec   Loss 2.5071   LearningRate 0.0430   Epoch: 6   Global Step: 115010   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-11 11:23:30,221-Speed 3337.19 samples/sec   Loss 2.5618   LearningRate 0.0430   Epoch: 6   Global Step: 115020   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-11 11:23:33,295-Speed 3331.41 samples/sec   Loss 2.5502   LearningRate 0.0430   Epoch: 6   Global Step: 115030   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-11 11:23:36,359-Speed 3343.15 samples/sec   Loss 2.5663   LearningRate 0.0430   Epoch: 6   Global Step: 115040   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-11 11:23:39,480-Speed 3281.51 samples/sec   Loss 2.4405   LearningRate 0.0429   Epoch: 6   Global Step: 115050   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-11 11:23:42,549-Speed 3338.17 samples/sec   Loss 2.5618   LearningRate 0.0429   Epoch: 6   Global Step: 115060   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-11 11:23:45,641-Speed 3333.73 samples/sec   Loss 2.5040   LearningRate 0.0429   Epoch: 6   Global Step: 115070   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-11 11:23:48,749-Speed 3295.66 samples/sec   Loss 2.5525   LearningRate 0.0429   Epoch: 6   Global Step: 115080   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-11 11:23:51,821-Speed 3333.30 samples/sec   Loss 2.4862   LearningRate 0.0429   Epoch: 6   Global Step: 115090   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-11 11:23:54,898-Speed 3329.06 samples/sec   Loss 2.5689   LearningRate 0.0429   Epoch: 6   Global Step: 115100   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-11 11:23:57,960-Speed 3347.67 samples/sec   Loss 2.5434   LearningRate 0.0429   Epoch: 6   Global Step: 115110   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-11 11:24:01,042-Speed 3322.73 samples/sec   Loss 2.5718   LearningRate 0.0429   Epoch: 6   Global Step: 115120   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-11 11:24:04,116-Speed 3332.09 samples/sec   Loss 2.6188   LearningRate 0.0429   Epoch: 6   Global Step: 115130   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-11 11:24:07,190-Speed 3332.03 samples/sec   Loss 2.5463   LearningRate 0.0429   Epoch: 6   Global Step: 115140   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-11 11:24:10,267-Speed 3328.55 samples/sec   Loss 2.5635   LearningRate 0.0429   Epoch: 6   Global Step: 115150   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-11 11:24:13,333-Speed 3340.17 samples/sec   Loss 2.5267   LearningRate 0.0429   Epoch: 6   Global Step: 115160   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-11 11:24:16,405-Speed 3343.28 samples/sec   Loss 2.5667   LearningRate 0.0429   Epoch: 6   Global Step: 115170   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-11 11:24:19,470-Speed 3341.25 samples/sec   Loss 2.5298   LearningRate 0.0429   Epoch: 6   Global Step: 115180   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-11 11:24:22,534-Speed 3346.95 samples/sec   Loss 2.5732   LearningRate 0.0429   Epoch: 6   Global Step: 115190   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:24:25,595-Speed 3346.43 samples/sec   Loss 2.5325   LearningRate 0.0429   Epoch: 6   Global Step: 115200   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:24:28,672-Speed 3328.75 samples/sec   Loss 2.6091   LearningRate 0.0429   Epoch: 6   Global Step: 115210   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:24:31,750-Speed 3328.06 samples/sec   Loss 2.5199   LearningRate 0.0429   Epoch: 6   Global Step: 115220   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:24:34,814-Speed 3346.20 samples/sec   Loss 2.6043   LearningRate 0.0429   Epoch: 6   Global Step: 115230   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:24:37,887-Speed 3332.40 samples/sec   Loss 2.5119   LearningRate 0.0429   Epoch: 6   Global Step: 115240   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:24:40,956-Speed 3337.30 samples/sec   Loss 2.5658   LearningRate 0.0429   Epoch: 6   Global Step: 115250   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:24:44,018-Speed 3345.47 samples/sec   Loss 2.5545   LearningRate 0.0429   Epoch: 6   Global Step: 115260   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:24:47,091-Speed 3334.21 samples/sec   Loss 2.4883   LearningRate 0.0429   Epoch: 6   Global Step: 115270   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-11 11:24:50,161-Speed 3343.84 samples/sec   Loss 2.6245   LearningRate 0.0429   Epoch: 6   Global Step: 115280   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-11 11:24:53,220-Speed 3347.95 samples/sec   Loss 2.5736   LearningRate 0.0429   Epoch: 6   Global Step: 115290   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-11 11:24:56,336-Speed 3291.09 samples/sec   Loss 2.5259   LearningRate 0.0429   Epoch: 6   Global Step: 115300   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-11 11:24:59,451-Speed 3287.98 samples/sec   Loss 2.5191   LearningRate 0.0428   Epoch: 6   Global Step: 115310   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-11 11:25:02,512-Speed 3346.51 samples/sec   Loss 2.5055   LearningRate 0.0428   Epoch: 6   Global Step: 115320   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-11 11:25:05,590-Speed 3327.31 samples/sec   Loss 2.5735   LearningRate 0.0428   Epoch: 6   Global Step: 115330   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-11 11:25:08,708-Speed 3346.53 samples/sec   Loss 2.5042   LearningRate 0.0428   Epoch: 6   Global Step: 115340   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-11 11:25:11,771-Speed 3344.45 samples/sec   Loss 2.5260   LearningRate 0.0428   Epoch: 6   Global Step: 115350   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-11 11:25:14,902-Speed 3270.79 samples/sec   Loss 2.4608   LearningRate 0.0428   Epoch: 6   Global Step: 115360   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-11 11:25:17,985-Speed 3323.31 samples/sec   Loss 2.4555   LearningRate 0.0428   Epoch: 6   Global Step: 115370   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-11 11:25:21,084-Speed 3309.34 samples/sec   Loss 2.5058   LearningRate 0.0428   Epoch: 6   Global Step: 115380   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-11 11:25:24,197-Speed 3290.53 samples/sec   Loss 2.4916   LearningRate 0.0428   Epoch: 6   Global Step: 115390   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-11 11:25:27,333-Speed 3327.63 samples/sec   Loss 2.5135   LearningRate 0.0428   Epoch: 6   Global Step: 115400   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-11 11:25:30,398-Speed 3341.82 samples/sec   Loss 2.5600   LearningRate 0.0428   Epoch: 6   Global Step: 115410   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-11 11:25:33,472-Speed 3332.17 samples/sec   Loss 2.5098   LearningRate 0.0428   Epoch: 6   Global Step: 115420   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-11 11:25:36,535-Speed 3343.43 samples/sec   Loss 2.5325   LearningRate 0.0428   Epoch: 6   Global Step: 115430   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-11 11:25:39,603-Speed 3348.99 samples/sec   Loss 2.6042   LearningRate 0.0428   Epoch: 6   Global Step: 115440   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-11 11:25:42,755-Speed 3249.28 samples/sec   Loss 2.5495   LearningRate 0.0428   Epoch: 6   Global Step: 115450   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-11 11:25:45,869-Speed 3289.26 samples/sec   Loss 2.5898   LearningRate 0.0428   Epoch: 6   Global Step: 115460   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-11 11:25:48,932-Speed 3344.01 samples/sec   Loss 2.5287   LearningRate 0.0428   Epoch: 6   Global Step: 115470   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:25:52,011-Speed 3331.79 samples/sec   Loss 2.5329   LearningRate 0.0428   Epoch: 6   Global Step: 115480   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:25:55,095-Speed 3321.46 samples/sec   Loss 2.4342   LearningRate 0.0428   Epoch: 6   Global Step: 115490   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:25:58,210-Speed 3325.35 samples/sec   Loss 2.5322   LearningRate 0.0428   Epoch: 6   Global Step: 115500   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:26:01,301-Speed 3312.86 samples/sec   Loss 2.4756   LearningRate 0.0428   Epoch: 6   Global Step: 115510   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:26:04,370-Speed 3338.17 samples/sec   Loss 2.5685   LearningRate 0.0428   Epoch: 6   Global Step: 115520   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:26:07,429-Speed 3348.13 samples/sec   Loss 2.5642   LearningRate 0.0428   Epoch: 6   Global Step: 115530   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:26:10,498-Speed 3337.32 samples/sec   Loss 2.4905   LearningRate 0.0428   Epoch: 6   Global Step: 115540   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:26:13,568-Speed 3336.35 samples/sec   Loss 2.5094   LearningRate 0.0428   Epoch: 6   Global Step: 115550   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:26:16,654-Speed 3318.54 samples/sec   Loss 2.5100   LearningRate 0.0427   Epoch: 6   Global Step: 115560   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:26:19,794-Speed 3262.86 samples/sec   Loss 2.5109   LearningRate 0.0427   Epoch: 6   Global Step: 115570   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:26:22,900-Speed 3297.61 samples/sec   Loss 2.4858   LearningRate 0.0427   Epoch: 6   Global Step: 115580   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:26:25,963-Speed 3345.20 samples/sec   Loss 2.4357   LearningRate 0.0427   Epoch: 6   Global Step: 115590   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:26:29,024-Speed 3345.53 samples/sec   Loss 2.4407   LearningRate 0.0427   Epoch: 6   Global Step: 115600   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:26:32,114-Speed 3314.90 samples/sec   Loss 2.5346   LearningRate 0.0427   Epoch: 6   Global Step: 115610   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:26:35,160-Speed 3362.96 samples/sec   Loss 2.5797   LearningRate 0.0427   Epoch: 6   Global Step: 115620   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-11 11:26:38,255-Speed 3327.01 samples/sec   Loss 2.5221   LearningRate 0.0427   Epoch: 6   Global Step: 115630   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-11 11:26:41,320-Speed 3341.14 samples/sec   Loss 2.5477   LearningRate 0.0427   Epoch: 6   Global Step: 115640   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-11 11:26:44,394-Speed 3332.39 samples/sec   Loss 2.5533   LearningRate 0.0427   Epoch: 6   Global Step: 115650   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-11 11:26:47,455-Speed 3345.76 samples/sec   Loss 2.5725   LearningRate 0.0427   Epoch: 6   Global Step: 115660   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-11 11:26:50,554-Speed 3331.82 samples/sec   Loss 2.4821   LearningRate 0.0427   Epoch: 6   Global Step: 115670   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-11 11:26:53,788-Speed 3167.09 samples/sec   Loss 2.4585   LearningRate 0.0427   Epoch: 6   Global Step: 115680   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-11 11:26:56,894-Speed 3312.52 samples/sec   Loss 2.5546   LearningRate 0.0427   Epoch: 6   Global Step: 115690   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-11 11:26:59,971-Speed 3329.52 samples/sec   Loss 2.5585   LearningRate 0.0427   Epoch: 6   Global Step: 115700   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-11 11:27:03,037-Speed 3340.38 samples/sec   Loss 2.5045   LearningRate 0.0427   Epoch: 6   Global Step: 115710   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-11 11:27:06,101-Speed 3342.31 samples/sec   Loss 2.6275   LearningRate 0.0427   Epoch: 6   Global Step: 115720   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-11 11:27:09,164-Speed 3349.82 samples/sec   Loss 2.4858   LearningRate 0.0427   Epoch: 6   Global Step: 115730   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-11 11:27:12,223-Speed 3349.01 samples/sec   Loss 2.4898   LearningRate 0.0427   Epoch: 6   Global Step: 115740   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-11 11:27:15,284-Speed 3345.79 samples/sec   Loss 2.5489   LearningRate 0.0427   Epoch: 6   Global Step: 115750   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-11 11:27:18,356-Speed 3333.94 samples/sec   Loss 2.5652   LearningRate 0.0427   Epoch: 6   Global Step: 115760   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-11 11:27:21,439-Speed 3327.41 samples/sec   Loss 2.5613   LearningRate 0.0427   Epoch: 6   Global Step: 115770   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-11 11:27:24,504-Speed 3342.19 samples/sec   Loss 2.5286   LearningRate 0.0427   Epoch: 6   Global Step: 115780   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-11 11:27:27,577-Speed 3336.00 samples/sec   Loss 2.5496   LearningRate 0.0427   Epoch: 6   Global Step: 115790   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-11 11:27:30,657-Speed 3324.80 samples/sec   Loss 2.5884   LearningRate 0.0427   Epoch: 6   Global Step: 115800   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-11 11:27:33,719-Speed 3344.68 samples/sec   Loss 2.5541   LearningRate 0.0427   Epoch: 6   Global Step: 115810   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-11 11:27:36,812-Speed 3312.04 samples/sec   Loss 2.5038   LearningRate 0.0426   Epoch: 6   Global Step: 115820   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:27:39,912-Speed 3334.26 samples/sec   Loss 2.4920   LearningRate 0.0426   Epoch: 6   Global Step: 115830   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:27:43,008-Speed 3308.70 samples/sec   Loss 2.5011   LearningRate 0.0426   Epoch: 6   Global Step: 115840   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:27:46,075-Speed 3339.01 samples/sec   Loss 2.4781   LearningRate 0.0426   Epoch: 6   Global Step: 115850   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:27:49,153-Speed 3328.49 samples/sec   Loss 2.4964   LearningRate 0.0426   Epoch: 6   Global Step: 115860   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:27:52,222-Speed 3337.16 samples/sec   Loss 2.5387   LearningRate 0.0426   Epoch: 6   Global Step: 115870   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:27:55,298-Speed 3329.93 samples/sec   Loss 2.5174   LearningRate 0.0426   Epoch: 6   Global Step: 115880   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:27:58,381-Speed 3322.35 samples/sec   Loss 2.5214   LearningRate 0.0426   Epoch: 6   Global Step: 115890   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:28:01,451-Speed 3335.99 samples/sec   Loss 2.5270   LearningRate 0.0426   Epoch: 6   Global Step: 115900   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:28:04,514-Speed 3344.79 samples/sec   Loss 2.4996   LearningRate 0.0426   Epoch: 6   Global Step: 115910   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:28:07,578-Speed 3342.37 samples/sec   Loss 2.5796   LearningRate 0.0426   Epoch: 6   Global Step: 115920   Fp16 Grad Scale: 262144   Required: 23 hours
Training: 2022-04-11 11:28:10,634-Speed 3355.35 samples/sec   Loss 2.5523   LearningRate 0.0426   Epoch: 6   Global Step: 115930   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:28:13,714-Speed 3326.05 samples/sec   Loss 2.5750   LearningRate 0.0426   Epoch: 6   Global Step: 115940   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:28:16,839-Speed 3277.81 samples/sec   Loss 2.4701   LearningRate 0.0426   Epoch: 6   Global Step: 115950   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:28:19,922-Speed 3343.43 samples/sec   Loss 2.5143   LearningRate 0.0426   Epoch: 6   Global Step: 115960   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:28:23,000-Speed 3327.42 samples/sec   Loss 2.5351   LearningRate 0.0426   Epoch: 6   Global Step: 115970   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:28:26,139-Speed 3330.99 samples/sec   Loss 2.5533   LearningRate 0.0426   Epoch: 6   Global Step: 115980   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:28:29,286-Speed 3254.97 samples/sec   Loss 2.6011   LearningRate 0.0426   Epoch: 6   Global Step: 115990   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:28:32,430-Speed 3256.71 samples/sec   Loss 2.4954   LearningRate 0.0426   Epoch: 6   Global Step: 116000   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:29:16,522-[lfw][116000]XNorm: 21.918602
Training: 2022-04-11 11:29:16,523-[lfw][116000]Accuracy-Flip: 0.99783+-0.00279
Training: 2022-04-11 11:29:16,523-[lfw][116000]Accuracy-Highest: 0.99817
Training: 2022-04-11 11:30:07,625-[cfp_fp][116000]XNorm: 21.183093
Training: 2022-04-11 11:30:07,625-[cfp_fp][116000]Accuracy-Flip: 0.98471+-0.00470
Training: 2022-04-11 11:30:07,626-[cfp_fp][116000]Accuracy-Highest: 0.98643
Training: 2022-04-11 11:30:51,449-[agedb_30][116000]XNorm: 22.530877
Training: 2022-04-11 11:30:51,449-[agedb_30][116000]Accuracy-Flip: 0.98267+-0.00680
Training: 2022-04-11 11:30:51,450-[agedb_30][116000]Accuracy-Highest: 0.98317
Training: 2022-04-11 11:30:54,522-Speed 72.07 samples/sec   Loss 2.6035   LearningRate 0.0426   Epoch: 6   Global Step: 116010   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:30:57,582-Speed 3347.12 samples/sec   Loss 2.5538   LearningRate 0.0426   Epoch: 6   Global Step: 116020   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:31:00,672-Speed 3314.43 samples/sec   Loss 2.5318   LearningRate 0.0426   Epoch: 6   Global Step: 116030   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:31:03,735-Speed 3345.03 samples/sec   Loss 2.5618   LearningRate 0.0426   Epoch: 6   Global Step: 116040   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:31:06,787-Speed 3355.20 samples/sec   Loss 2.5832   LearningRate 0.0426   Epoch: 6   Global Step: 116050   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:31:09,844-Speed 3351.14 samples/sec   Loss 2.5205   LearningRate 0.0426   Epoch: 6   Global Step: 116060   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:31:12,909-Speed 3341.55 samples/sec   Loss 2.5636   LearningRate 0.0425   Epoch: 6   Global Step: 116070   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:31:15,985-Speed 3329.85 samples/sec   Loss 2.5531   LearningRate 0.0425   Epoch: 6   Global Step: 116080   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:31:19,075-Speed 3313.79 samples/sec   Loss 2.5496   LearningRate 0.0425   Epoch: 6   Global Step: 116090   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:31:22,138-Speed 3345.13 samples/sec   Loss 2.5615   LearningRate 0.0425   Epoch: 6   Global Step: 116100   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:31:25,225-Speed 3317.77 samples/sec   Loss 2.5396   LearningRate 0.0425   Epoch: 6   Global Step: 116110   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:31:28,282-Speed 3350.88 samples/sec   Loss 2.5114   LearningRate 0.0425   Epoch: 6   Global Step: 116120   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:31:31,338-Speed 3351.32 samples/sec   Loss 2.5022   LearningRate 0.0425   Epoch: 6   Global Step: 116130   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:31:34,397-Speed 3348.47 samples/sec   Loss 2.5237   LearningRate 0.0425   Epoch: 6   Global Step: 116140   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:31:37,460-Speed 3343.83 samples/sec   Loss 2.5485   LearningRate 0.0425   Epoch: 6   Global Step: 116150   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:31:40,523-Speed 3343.85 samples/sec   Loss 2.4420   LearningRate 0.0425   Epoch: 6   Global Step: 116160   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:31:43,606-Speed 3321.54 samples/sec   Loss 2.5612   LearningRate 0.0425   Epoch: 6   Global Step: 116170   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:31:46,681-Speed 3331.64 samples/sec   Loss 2.5550   LearningRate 0.0425   Epoch: 6   Global Step: 116180   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:31:49,745-Speed 3343.12 samples/sec   Loss 2.5002   LearningRate 0.0425   Epoch: 6   Global Step: 116190   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:31:52,888-Speed 3258.40 samples/sec   Loss 2.5609   LearningRate 0.0425   Epoch: 6   Global Step: 116200   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:31:55,958-Speed 3337.07 samples/sec   Loss 2.5413   LearningRate 0.0425   Epoch: 6   Global Step: 116210   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:31:59,024-Speed 3340.56 samples/sec   Loss 2.4910   LearningRate 0.0425   Epoch: 6   Global Step: 116220   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:32:02,081-Speed 3350.56 samples/sec   Loss 2.4935   LearningRate 0.0425   Epoch: 6   Global Step: 116230   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:32:05,146-Speed 3342.33 samples/sec   Loss 2.4770   LearningRate 0.0425   Epoch: 6   Global Step: 116240   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:32:08,193-Speed 3361.52 samples/sec   Loss 2.4302   LearningRate 0.0425   Epoch: 6   Global Step: 116250   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-11 11:32:11,266-Speed 3332.14 samples/sec   Loss 2.5005   LearningRate 0.0425   Epoch: 6   Global Step: 116260   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-11 11:32:14,413-Speed 3254.74 samples/sec   Loss 2.5152   LearningRate 0.0425   Epoch: 6   Global Step: 116270   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-11 11:32:17,490-Speed 3329.85 samples/sec   Loss 2.5706   LearningRate 0.0425   Epoch: 6   Global Step: 116280   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-11 11:32:20,560-Speed 3335.96 samples/sec   Loss 2.4968   LearningRate 0.0425   Epoch: 6   Global Step: 116290   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-11 11:32:23,622-Speed 3344.29 samples/sec   Loss 2.5087   LearningRate 0.0425   Epoch: 6   Global Step: 116300   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-11 11:32:26,697-Speed 3331.49 samples/sec   Loss 2.4680   LearningRate 0.0425   Epoch: 6   Global Step: 116310   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-11 11:32:29,766-Speed 3337.60 samples/sec   Loss 2.4779   LearningRate 0.0425   Epoch: 6   Global Step: 116320   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-11 11:32:32,836-Speed 3335.78 samples/sec   Loss 2.4631   LearningRate 0.0424   Epoch: 6   Global Step: 116330   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-11 11:32:35,910-Speed 3332.34 samples/sec   Loss 2.4799   LearningRate 0.0424   Epoch: 6   Global Step: 116340   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-11 11:32:38,976-Speed 3340.89 samples/sec   Loss 2.4549   LearningRate 0.0424   Epoch: 6   Global Step: 116350   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:32:42,036-Speed 3346.78 samples/sec   Loss 2.5363   LearningRate 0.0424   Epoch: 6   Global Step: 116360   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:32:45,101-Speed 3341.66 samples/sec   Loss 2.5373   LearningRate 0.0424   Epoch: 6   Global Step: 116370   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:32:48,169-Speed 3338.76 samples/sec   Loss 2.5348   LearningRate 0.0424   Epoch: 6   Global Step: 116380   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:32:51,249-Speed 3325.30 samples/sec   Loss 2.4565   LearningRate 0.0424   Epoch: 6   Global Step: 116390   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:32:54,321-Speed 3334.57 samples/sec   Loss 2.5803   LearningRate 0.0424   Epoch: 6   Global Step: 116400   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:32:57,421-Speed 3304.02 samples/sec   Loss 2.5203   LearningRate 0.0424   Epoch: 6   Global Step: 116410   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:33:00,503-Speed 3323.58 samples/sec   Loss 2.5469   LearningRate 0.0424   Epoch: 6   Global Step: 116420   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:33:03,561-Speed 3349.04 samples/sec   Loss 2.5581   LearningRate 0.0424   Epoch: 6   Global Step: 116430   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:33:06,635-Speed 3331.37 samples/sec   Loss 2.4977   LearningRate 0.0424   Epoch: 6   Global Step: 116440   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-11 11:33:09,693-Speed 3349.91 samples/sec   Loss 2.5181   LearningRate 0.0424   Epoch: 6   Global Step: 116450   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-11 11:33:12,754-Speed 3346.87 samples/sec   Loss 2.4320   LearningRate 0.0424   Epoch: 6   Global Step: 116460   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-11 11:33:15,816-Speed 3344.15 samples/sec   Loss 2.5495   LearningRate 0.0424   Epoch: 6   Global Step: 116470   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-11 11:33:18,892-Speed 3330.34 samples/sec   Loss 2.5232   LearningRate 0.0424   Epoch: 6   Global Step: 116480   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-11 11:33:21,947-Speed 3352.22 samples/sec   Loss 2.4656   LearningRate 0.0424   Epoch: 6   Global Step: 116490   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-11 11:33:25,073-Speed 3276.86 samples/sec   Loss 2.4582   LearningRate 0.0424   Epoch: 6   Global Step: 116500   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-11 11:33:28,153-Speed 3325.67 samples/sec   Loss 2.5752   LearningRate 0.0424   Epoch: 6   Global Step: 116510   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-11 11:33:31,248-Speed 3308.80 samples/sec   Loss 2.5046   LearningRate 0.0424   Epoch: 6   Global Step: 116520   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-11 11:33:34,308-Speed 3347.87 samples/sec   Loss 2.4926   LearningRate 0.0424   Epoch: 6   Global Step: 116530   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-11 11:33:37,375-Speed 3340.15 samples/sec   Loss 2.5188   LearningRate 0.0424   Epoch: 6   Global Step: 116540   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:33:40,451-Speed 3329.35 samples/sec   Loss 2.5901   LearningRate 0.0424   Epoch: 6   Global Step: 116550   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:33:43,515-Speed 3342.85 samples/sec   Loss 2.5339   LearningRate 0.0424   Epoch: 6   Global Step: 116560   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:33:46,577-Speed 3345.64 samples/sec   Loss 2.5044   LearningRate 0.0424   Epoch: 6   Global Step: 116570   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:33:49,646-Speed 3336.61 samples/sec   Loss 2.4830   LearningRate 0.0424   Epoch: 6   Global Step: 116580   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:33:52,746-Speed 3303.97 samples/sec   Loss 2.5687   LearningRate 0.0423   Epoch: 6   Global Step: 116590   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:33:55,804-Speed 3349.02 samples/sec   Loss 2.6028   LearningRate 0.0423   Epoch: 6   Global Step: 116600   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-11 11:33:58,866-Speed 3345.72 samples/sec   Loss 2.5775   LearningRate 0.0423   Epoch: 6   Global Step: 116610   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-11 11:34:01,955-Speed 3315.86 samples/sec   Loss 2.5339   LearningRate 0.0423   Epoch: 6   Global Step: 116620   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-11 11:34:05,021-Speed 3340.32 samples/sec   Loss 2.5642   LearningRate 0.0423   Epoch: 6   Global Step: 116630   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-11 11:34:08,096-Speed 3331.64 samples/sec   Loss 2.4814   LearningRate 0.0423   Epoch: 6   Global Step: 116640   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-11 11:34:11,178-Speed 3322.76 samples/sec   Loss 2.5422   LearningRate 0.0423   Epoch: 6   Global Step: 116650   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-11 11:34:14,256-Speed 3327.96 samples/sec   Loss 2.5554   LearningRate 0.0423   Epoch: 6   Global Step: 116660   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-11 11:34:17,315-Speed 3348.28 samples/sec   Loss 2.4578   LearningRate 0.0423   Epoch: 6   Global Step: 116670   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-11 11:34:20,382-Speed 3339.51 samples/sec   Loss 2.5890   LearningRate 0.0423   Epoch: 6   Global Step: 116680   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-11 11:34:23,490-Speed 3295.16 samples/sec   Loss 2.5532   LearningRate 0.0423   Epoch: 6   Global Step: 116690   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-11 11:34:26,576-Speed 3319.47 samples/sec   Loss 2.5158   LearningRate 0.0423   Epoch: 6   Global Step: 116700   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:34:29,711-Speed 3266.79 samples/sec   Loss 2.4823   LearningRate 0.0423   Epoch: 6   Global Step: 116710   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:34:32,790-Speed 3327.16 samples/sec   Loss 2.4368   LearningRate 0.0423   Epoch: 6   Global Step: 116720   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:34:35,857-Speed 3339.28 samples/sec   Loss 2.4335   LearningRate 0.0423   Epoch: 6   Global Step: 116730   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:34:38,917-Speed 3346.81 samples/sec   Loss 2.4933   LearningRate 0.0423   Epoch: 6   Global Step: 116740   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:34:42,012-Speed 3310.22 samples/sec   Loss 2.5111   LearningRate 0.0423   Epoch: 6   Global Step: 116750   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:34:45,072-Speed 3346.49 samples/sec   Loss 2.5813   LearningRate 0.0423   Epoch: 6   Global Step: 116760   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:34:48,146-Speed 3331.96 samples/sec   Loss 2.5251   LearningRate 0.0423   Epoch: 6   Global Step: 116770   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:34:51,205-Speed 3348.40 samples/sec   Loss 2.5302   LearningRate 0.0423   Epoch: 6   Global Step: 116780   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:34:54,299-Speed 3309.77 samples/sec   Loss 2.5501   LearningRate 0.0423   Epoch: 6   Global Step: 116790   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:34:57,375-Speed 3330.74 samples/sec   Loss 2.5451   LearningRate 0.0423   Epoch: 6   Global Step: 116800   Fp16 Grad Scale: 262144   Required: 23 hours
Training: 2022-04-11 11:35:00,480-Speed 3298.68 samples/sec   Loss 2.5083   LearningRate 0.0423   Epoch: 6   Global Step: 116810   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:35:03,549-Speed 3337.43 samples/sec   Loss 2.5003   LearningRate 0.0423   Epoch: 6   Global Step: 116820   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:35:06,865-Speed 3088.58 samples/sec   Loss 2.5293   LearningRate 0.0423   Epoch: 6   Global Step: 116830   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:35:38,349-Speed 325.26 samples/sec   Loss 2.3027   LearningRate 0.0422   Epoch: 7   Global Step: 116840   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:35:42,165-Speed 2684.30 samples/sec   Loss 2.0141   LearningRate 0.0422   Epoch: 7   Global Step: 116850   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:35:46,202-Speed 2537.26 samples/sec   Loss 1.9981   LearningRate 0.0422   Epoch: 7   Global Step: 116860   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:35:49,280-Speed 3327.43 samples/sec   Loss 1.9665   LearningRate 0.0422   Epoch: 7   Global Step: 116870   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:35:52,375-Speed 3309.73 samples/sec   Loss 1.9888   LearningRate 0.0422   Epoch: 7   Global Step: 116880   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:35:55,510-Speed 3269.99 samples/sec   Loss 1.9469   LearningRate 0.0422   Epoch: 7   Global Step: 116890   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:35:58,638-Speed 3273.78 samples/sec   Loss 1.9443   LearningRate 0.0422   Epoch: 7   Global Step: 116900   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:36:01,720-Speed 3324.54 samples/sec   Loss 1.9601   LearningRate 0.0422   Epoch: 7   Global Step: 116910   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:36:04,840-Speed 3283.24 samples/sec   Loss 1.9170   LearningRate 0.0422   Epoch: 7   Global Step: 116920   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:36:07,963-Speed 3280.38 samples/sec   Loss 1.9411   LearningRate 0.0422   Epoch: 7   Global Step: 116930   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-11 11:36:11,043-Speed 3325.32 samples/sec   Loss 1.9593   LearningRate 0.0422   Epoch: 7   Global Step: 116940   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-11 11:36:14,165-Speed 3281.87 samples/sec   Loss 1.9823   LearningRate 0.0422   Epoch: 7   Global Step: 116950   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-11 11:36:17,285-Speed 3284.02 samples/sec   Loss 1.9460   LearningRate 0.0422   Epoch: 7   Global Step: 116960   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-11 11:36:20,458-Speed 3227.97 samples/sec   Loss 1.8951   LearningRate 0.0422   Epoch: 7   Global Step: 116970   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-11 11:36:23,603-Speed 3257.53 samples/sec   Loss 1.8981   LearningRate 0.0422   Epoch: 7   Global Step: 116980   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-11 11:36:27,467-Speed 2651.20 samples/sec   Loss 1.9210   LearningRate 0.0422   Epoch: 7   Global Step: 116990   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-11 11:36:30,600-Speed 3270.22 samples/sec   Loss 1.9467   LearningRate 0.0422   Epoch: 7   Global Step: 117000   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-11 11:36:33,722-Speed 3281.15 samples/sec   Loss 1.9593   LearningRate 0.0422   Epoch: 7   Global Step: 117010   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-11 11:36:36,836-Speed 3289.56 samples/sec   Loss 2.0148   LearningRate 0.0422   Epoch: 7   Global Step: 117020   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-11 11:36:39,971-Speed 3267.43 samples/sec   Loss 1.9393   LearningRate 0.0422   Epoch: 7   Global Step: 117030   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:36:43,152-Speed 3220.61 samples/sec   Loss 1.9350   LearningRate 0.0422   Epoch: 7   Global Step: 117040   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:36:46,234-Speed 3324.16 samples/sec   Loss 1.9828   LearningRate 0.0422   Epoch: 7   Global Step: 117050   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:36:49,356-Speed 3281.14 samples/sec   Loss 1.9688   LearningRate 0.0422   Epoch: 7   Global Step: 117060   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:36:52,525-Speed 3233.07 samples/sec   Loss 1.9845   LearningRate 0.0422   Epoch: 7   Global Step: 117070   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:36:55,608-Speed 3322.82 samples/sec   Loss 1.9095   LearningRate 0.0422   Epoch: 7   Global Step: 117080   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:36:58,776-Speed 3233.07 samples/sec   Loss 1.9278   LearningRate 0.0422   Epoch: 7   Global Step: 117090   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:37:01,866-Speed 3315.96 samples/sec   Loss 1.9602   LearningRate 0.0421   Epoch: 7   Global Step: 117100   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:37:04,944-Speed 3328.53 samples/sec   Loss 1.9850   LearningRate 0.0421   Epoch: 7   Global Step: 117110   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:37:08,060-Speed 3288.09 samples/sec   Loss 1.9757   LearningRate 0.0421   Epoch: 7   Global Step: 117120   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:37:11,142-Speed 3324.04 samples/sec   Loss 1.9471   LearningRate 0.0421   Epoch: 7   Global Step: 117130   Fp16 Grad Scale: 262144   Required: 23 hours
Training: 2022-04-11 11:37:14,231-Speed 3315.63 samples/sec   Loss 1.9530   LearningRate 0.0421   Epoch: 7   Global Step: 117140   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:37:17,331-Speed 3304.95 samples/sec   Loss 1.9173   LearningRate 0.0421   Epoch: 7   Global Step: 117150   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:37:20,415-Speed 3322.44 samples/sec   Loss 1.9828   LearningRate 0.0421   Epoch: 7   Global Step: 117160   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:37:23,531-Speed 3287.16 samples/sec   Loss 1.9199   LearningRate 0.0421   Epoch: 7   Global Step: 117170   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:37:26,628-Speed 3307.31 samples/sec   Loss 1.9630   LearningRate 0.0421   Epoch: 7   Global Step: 117180   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:37:29,704-Speed 3330.98 samples/sec   Loss 1.9769   LearningRate 0.0421   Epoch: 7   Global Step: 117190   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:37:32,773-Speed 3336.79 samples/sec   Loss 1.9334   LearningRate 0.0421   Epoch: 7   Global Step: 117200   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:37:35,899-Speed 3277.51 samples/sec   Loss 1.9776   LearningRate 0.0421   Epoch: 7   Global Step: 117210   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:37:38,996-Speed 3307.89 samples/sec   Loss 1.8948   LearningRate 0.0421   Epoch: 7   Global Step: 117220   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:37:42,101-Speed 3298.80 samples/sec   Loss 1.9523   LearningRate 0.0421   Epoch: 7   Global Step: 117230   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:37:45,217-Speed 3287.85 samples/sec   Loss 1.9928   LearningRate 0.0421   Epoch: 7   Global Step: 117240   Fp16 Grad Scale: 262144   Required: 23 hours
Training: 2022-04-11 11:37:48,384-Speed 3234.67 samples/sec   Loss 1.9921   LearningRate 0.0421   Epoch: 7   Global Step: 117250   Fp16 Grad Scale: 262144   Required: 23 hours
Training: 2022-04-11 11:37:51,485-Speed 3303.96 samples/sec   Loss 1.9781   LearningRate 0.0421   Epoch: 7   Global Step: 117260   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:37:54,623-Speed 3265.24 samples/sec   Loss 1.9685   LearningRate 0.0421   Epoch: 7   Global Step: 117270   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:37:57,721-Speed 3307.08 samples/sec   Loss 2.0007   LearningRate 0.0421   Epoch: 7   Global Step: 117280   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:38:00,796-Speed 3331.52 samples/sec   Loss 1.9375   LearningRate 0.0421   Epoch: 7   Global Step: 117290   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:38:03,949-Speed 3248.57 samples/sec   Loss 1.9767   LearningRate 0.0421   Epoch: 7   Global Step: 117300   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:38:07,049-Speed 3304.48 samples/sec   Loss 2.0099   LearningRate 0.0421   Epoch: 7   Global Step: 117310   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:38:10,181-Speed 3271.41 samples/sec   Loss 1.9909   LearningRate 0.0421   Epoch: 7   Global Step: 117320   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:38:13,363-Speed 3219.53 samples/sec   Loss 1.9237   LearningRate 0.0421   Epoch: 7   Global Step: 117330   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-11 11:38:16,475-Speed 3291.30 samples/sec   Loss 1.9849   LearningRate 0.0421   Epoch: 7   Global Step: 117340   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-11 11:38:19,591-Speed 3287.45 samples/sec   Loss 1.9646   LearningRate 0.0421   Epoch: 7   Global Step: 117350   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-11 11:38:22,686-Speed 3309.38 samples/sec   Loss 1.9915   LearningRate 0.0420   Epoch: 7   Global Step: 117360   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-11 11:38:25,754-Speed 3340.09 samples/sec   Loss 2.0831   LearningRate 0.0420   Epoch: 7   Global Step: 117370   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-11 11:38:28,835-Speed 3323.81 samples/sec   Loss 2.0260   LearningRate 0.0420   Epoch: 7   Global Step: 117380   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-11 11:38:31,931-Speed 3309.45 samples/sec   Loss 1.9667   LearningRate 0.0420   Epoch: 7   Global Step: 117390   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-11 11:38:35,020-Speed 3315.81 samples/sec   Loss 2.0369   LearningRate 0.0420   Epoch: 7   Global Step: 117400   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-11 11:38:38,097-Speed 3329.70 samples/sec   Loss 1.9594   LearningRate 0.0420   Epoch: 7   Global Step: 117410   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-11 11:38:41,167-Speed 3336.06 samples/sec   Loss 2.0100   LearningRate 0.0420   Epoch: 7   Global Step: 117420   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-11 11:38:44,349-Speed 3220.24 samples/sec   Loss 1.9555   LearningRate 0.0420   Epoch: 7   Global Step: 117430   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:38:47,489-Speed 3262.21 samples/sec   Loss 2.0122   LearningRate 0.0420   Epoch: 7   Global Step: 117440   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:38:50,615-Speed 3277.08 samples/sec   Loss 1.9630   LearningRate 0.0420   Epoch: 7   Global Step: 117450   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:38:53,741-Speed 3276.87 samples/sec   Loss 1.9403   LearningRate 0.0420   Epoch: 7   Global Step: 117460   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:38:56,842-Speed 3302.85 samples/sec   Loss 1.9403   LearningRate 0.0420   Epoch: 7   Global Step: 117470   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:38:59,915-Speed 3334.10 samples/sec   Loss 2.0136   LearningRate 0.0420   Epoch: 7   Global Step: 117480   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-11 11:39:02,996-Speed 3324.31 samples/sec   Loss 1.9745   LearningRate 0.0420   Epoch: 7   Global Step: 117490   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-11 11:39:06,138-Speed 3260.75 samples/sec   Loss 1.9701   LearningRate 0.0420   Epoch: 7   Global Step: 117500   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-11 11:39:09,246-Speed 3295.77 samples/sec   Loss 2.0519   LearningRate 0.0420   Epoch: 7   Global Step: 117510   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-11 11:39:12,323-Speed 3329.12 samples/sec   Loss 1.9820   LearningRate 0.0420   Epoch: 7   Global Step: 117520   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-11 11:39:15,409-Speed 3319.53 samples/sec   Loss 2.0173   LearningRate 0.0420   Epoch: 7   Global Step: 117530   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-11 11:39:18,514-Speed 3300.62 samples/sec   Loss 1.9943   LearningRate 0.0420   Epoch: 7   Global Step: 117540   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-11 11:39:21,616-Speed 3302.11 samples/sec   Loss 2.0197   LearningRate 0.0420   Epoch: 7   Global Step: 117550   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-11 11:39:24,734-Speed 3284.73 samples/sec   Loss 1.9816   LearningRate 0.0420   Epoch: 7   Global Step: 117560   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-11 11:39:27,884-Speed 3251.97 samples/sec   Loss 1.9270   LearningRate 0.0420   Epoch: 7   Global Step: 117570   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-11 11:39:31,008-Speed 3279.62 samples/sec   Loss 1.9538   LearningRate 0.0420   Epoch: 7   Global Step: 117580   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:39:34,103-Speed 3309.67 samples/sec   Loss 2.0378   LearningRate 0.0420   Epoch: 7   Global Step: 117590   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:39:37,199-Speed 3309.10 samples/sec   Loss 2.0686   LearningRate 0.0420   Epoch: 7   Global Step: 117600   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:39:40,284-Speed 3320.72 samples/sec   Loss 1.9817   LearningRate 0.0419   Epoch: 7   Global Step: 117610   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:39:43,403-Speed 3284.47 samples/sec   Loss 2.0003   LearningRate 0.0419   Epoch: 7   Global Step: 117620   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:39:46,475-Speed 3334.22 samples/sec   Loss 1.9960   LearningRate 0.0419   Epoch: 7   Global Step: 117630   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:39:49,581-Speed 3298.59 samples/sec   Loss 2.0263   LearningRate 0.0419   Epoch: 7   Global Step: 117640   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:39:52,665-Speed 3321.66 samples/sec   Loss 1.9712   LearningRate 0.0419   Epoch: 7   Global Step: 117650   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:39:55,775-Speed 3294.97 samples/sec   Loss 2.0394   LearningRate 0.0419   Epoch: 7   Global Step: 117660   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:39:58,893-Speed 3285.54 samples/sec   Loss 2.0249   LearningRate 0.0419   Epoch: 7   Global Step: 117670   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:40:02,024-Speed 3271.47 samples/sec   Loss 1.9961   LearningRate 0.0419   Epoch: 7   Global Step: 117680   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:40:05,180-Speed 3246.83 samples/sec   Loss 1.9849   LearningRate 0.0419   Epoch: 7   Global Step: 117690   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:40:08,261-Speed 3325.82 samples/sec   Loss 2.0450   LearningRate 0.0419   Epoch: 7   Global Step: 117700   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:40:11,366-Speed 3298.65 samples/sec   Loss 2.0247   LearningRate 0.0419   Epoch: 7   Global Step: 117710   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:40:14,491-Speed 3278.03 samples/sec   Loss 1.9936   LearningRate 0.0419   Epoch: 7   Global Step: 117720   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:40:17,621-Speed 3273.76 samples/sec   Loss 2.0637   LearningRate 0.0419   Epoch: 7   Global Step: 117730   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:40:20,786-Speed 3235.39 samples/sec   Loss 1.9959   LearningRate 0.0419   Epoch: 7   Global Step: 117740   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:40:23,886-Speed 3305.22 samples/sec   Loss 2.0814   LearningRate 0.0419   Epoch: 7   Global Step: 117750   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:40:26,993-Speed 3296.79 samples/sec   Loss 2.0422   LearningRate 0.0419   Epoch: 7   Global Step: 117760   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:40:30,067-Speed 3332.71 samples/sec   Loss 1.9782   LearningRate 0.0419   Epoch: 7   Global Step: 117770   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:40:33,155-Speed 3317.24 samples/sec   Loss 1.9961   LearningRate 0.0419   Epoch: 7   Global Step: 117780   Fp16 Grad Scale: 262144   Required: 23 hours
Training: 2022-04-11 11:40:36,283-Speed 3274.74 samples/sec   Loss 1.9445   LearningRate 0.0419   Epoch: 7   Global Step: 117790   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:40:39,397-Speed 3289.90 samples/sec   Loss 2.0698   LearningRate 0.0419   Epoch: 7   Global Step: 117800   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:40:42,475-Speed 3327.41 samples/sec   Loss 2.0442   LearningRate 0.0419   Epoch: 7   Global Step: 117810   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:40:45,553-Speed 3328.57 samples/sec   Loss 2.0247   LearningRate 0.0419   Epoch: 7   Global Step: 117820   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:40:48,654-Speed 3302.98 samples/sec   Loss 2.0368   LearningRate 0.0419   Epoch: 7   Global Step: 117830   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:40:51,739-Speed 3320.03 samples/sec   Loss 2.0761   LearningRate 0.0419   Epoch: 7   Global Step: 117840   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:40:54,879-Speed 3263.37 samples/sec   Loss 2.0514   LearningRate 0.0419   Epoch: 7   Global Step: 117850   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:40:57,946-Speed 3339.04 samples/sec   Loss 2.0274   LearningRate 0.0419   Epoch: 7   Global Step: 117860   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:41:01,029-Speed 3323.51 samples/sec   Loss 2.1014   LearningRate 0.0418   Epoch: 7   Global Step: 117870   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:41:04,136-Speed 3296.25 samples/sec   Loss 1.9863   LearningRate 0.0418   Epoch: 7   Global Step: 117880   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:41:07,208-Speed 3335.12 samples/sec   Loss 2.0621   LearningRate 0.0418   Epoch: 7   Global Step: 117890   Fp16 Grad Scale: 262144   Required: 23 hours
Training: 2022-04-11 11:41:10,288-Speed 3325.34 samples/sec   Loss 1.9837   LearningRate 0.0418   Epoch: 7   Global Step: 117900   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:41:13,380-Speed 3311.83 samples/sec   Loss 2.0606   LearningRate 0.0418   Epoch: 7   Global Step: 117910   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:41:16,496-Speed 3287.01 samples/sec   Loss 2.0778   LearningRate 0.0418   Epoch: 7   Global Step: 117920   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:41:19,586-Speed 3316.43 samples/sec   Loss 2.0855   LearningRate 0.0418   Epoch: 7   Global Step: 117930   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:41:22,661-Speed 3331.16 samples/sec   Loss 2.0499   LearningRate 0.0418   Epoch: 7   Global Step: 117940   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:41:25,743-Speed 3323.16 samples/sec   Loss 2.0612   LearningRate 0.0418   Epoch: 7   Global Step: 117950   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:41:28,847-Speed 3300.56 samples/sec   Loss 1.9815   LearningRate 0.0418   Epoch: 7   Global Step: 117960   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:41:31,935-Speed 3317.34 samples/sec   Loss 2.0140   LearningRate 0.0418   Epoch: 7   Global Step: 117970   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:41:35,012-Speed 3329.80 samples/sec   Loss 2.0002   LearningRate 0.0418   Epoch: 7   Global Step: 117980   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:41:38,113-Speed 3302.91 samples/sec   Loss 2.0279   LearningRate 0.0418   Epoch: 7   Global Step: 117990   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-11 11:41:41,187-Speed 3332.28 samples/sec   Loss 1.9962   LearningRate 0.0418   Epoch: 7   Global Step: 118000   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-11 11:42:25,913-[lfw][118000]XNorm: 21.397800
Training: 2022-04-11 11:42:25,914-[lfw][118000]Accuracy-Flip: 0.99800+-0.00267
Training: 2022-04-11 11:42:25,914-[lfw][118000]Accuracy-Highest: 0.99817
Training: 2022-04-11 11:43:17,005-[cfp_fp][118000]XNorm: 20.749756
Training: 2022-04-11 11:43:17,006-[cfp_fp][118000]Accuracy-Flip: 0.98529+-0.00642
Training: 2022-04-11 11:43:17,007-[cfp_fp][118000]Accuracy-Highest: 0.98643
Training: 2022-04-11 11:44:00,937-[agedb_30][118000]XNorm: 21.849119
Training: 2022-04-11 11:44:00,938-[agedb_30][118000]Accuracy-Flip: 0.98133+-0.00737
Training: 2022-04-11 11:44:00,938-[agedb_30][118000]Accuracy-Highest: 0.98317
Training: 2022-04-11 11:44:04,039-Speed 71.68 samples/sec   Loss 2.0547   LearningRate 0.0418   Epoch: 7   Global Step: 118010   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-11 11:44:07,093-Speed 3353.75 samples/sec   Loss 2.0732   LearningRate 0.0418   Epoch: 7   Global Step: 118020   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-11 11:44:10,165-Speed 3335.36 samples/sec   Loss 2.0541   LearningRate 0.0418   Epoch: 7   Global Step: 118030   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-11 11:44:13,216-Speed 3356.91 samples/sec   Loss 1.9929   LearningRate 0.0418   Epoch: 7   Global Step: 118040   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-11 11:44:16,273-Speed 3349.81 samples/sec   Loss 2.0743   LearningRate 0.0418   Epoch: 7   Global Step: 118050   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-11 11:44:19,421-Speed 3253.88 samples/sec   Loss 2.0474   LearningRate 0.0418   Epoch: 7   Global Step: 118060   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-11 11:44:22,480-Speed 3349.59 samples/sec   Loss 2.0536   LearningRate 0.0418   Epoch: 7   Global Step: 118070   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-11 11:44:25,538-Speed 3349.65 samples/sec   Loss 2.0886   LearningRate 0.0418   Epoch: 7   Global Step: 118080   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-11 11:44:28,657-Speed 3284.65 samples/sec   Loss 2.0614   LearningRate 0.0418   Epoch: 7   Global Step: 118090   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:44:31,752-Speed 3309.52 samples/sec   Loss 2.0137   LearningRate 0.0418   Epoch: 7   Global Step: 118100   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:44:34,834-Speed 3322.53 samples/sec   Loss 2.0231   LearningRate 0.0418   Epoch: 7   Global Step: 118110   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:44:37,961-Speed 3276.57 samples/sec   Loss 2.0336   LearningRate 0.0418   Epoch: 7   Global Step: 118120   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:44:41,034-Speed 3333.38 samples/sec   Loss 2.1014   LearningRate 0.0417   Epoch: 7   Global Step: 118130   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-11 11:44:44,123-Speed 3316.83 samples/sec   Loss 2.0416   LearningRate 0.0417   Epoch: 7   Global Step: 118140   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-11 11:44:47,211-Speed 3316.33 samples/sec   Loss 2.1024   LearningRate 0.0417   Epoch: 7   Global Step: 118150   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-11 11:44:50,285-Speed 3332.43 samples/sec   Loss 2.0151   LearningRate 0.0417   Epoch: 7   Global Step: 118160   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-11 11:44:53,345-Speed 3347.02 samples/sec   Loss 2.0559   LearningRate 0.0417   Epoch: 7   Global Step: 118170   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-11 11:44:56,434-Speed 3315.94 samples/sec   Loss 2.1434   LearningRate 0.0417   Epoch: 7   Global Step: 118180   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-11 11:44:59,509-Speed 3331.30 samples/sec   Loss 2.0497   LearningRate 0.0417   Epoch: 7   Global Step: 118190   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-11 11:45:02,640-Speed 3271.91 samples/sec   Loss 2.1001   LearningRate 0.0417   Epoch: 7   Global Step: 118200   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-11 11:45:05,746-Speed 3297.15 samples/sec   Loss 2.0497   LearningRate 0.0417   Epoch: 7   Global Step: 118210   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-11 11:45:08,807-Speed 3346.43 samples/sec   Loss 2.0369   LearningRate 0.0417   Epoch: 7   Global Step: 118220   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-11 11:45:11,881-Speed 3331.89 samples/sec   Loss 2.0668   LearningRate 0.0417   Epoch: 7   Global Step: 118230   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:45:15,003-Speed 3280.75 samples/sec   Loss 2.0465   LearningRate 0.0417   Epoch: 7   Global Step: 118240   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:45:18,110-Speed 3296.47 samples/sec   Loss 2.0317   LearningRate 0.0417   Epoch: 7   Global Step: 118250   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:45:21,200-Speed 3315.04 samples/sec   Loss 2.0339   LearningRate 0.0417   Epoch: 7   Global Step: 118260   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:45:24,312-Speed 3291.93 samples/sec   Loss 2.0629   LearningRate 0.0417   Epoch: 7   Global Step: 118270   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:45:27,388-Speed 3329.61 samples/sec   Loss 2.0656   LearningRate 0.0417   Epoch: 7   Global Step: 118280   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:45:30,449-Speed 3345.84 samples/sec   Loss 2.0703   LearningRate 0.0417   Epoch: 7   Global Step: 118290   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:45:33,541-Speed 3313.93 samples/sec   Loss 2.0564   LearningRate 0.0417   Epoch: 7   Global Step: 118300   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:45:36,616-Speed 3330.74 samples/sec   Loss 2.0918   LearningRate 0.0417   Epoch: 7   Global Step: 118310   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:45:39,686-Speed 3335.98 samples/sec   Loss 2.0877   LearningRate 0.0417   Epoch: 7   Global Step: 118320   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:45:42,740-Speed 3353.42 samples/sec   Loss 2.0977   LearningRate 0.0417   Epoch: 7   Global Step: 118330   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:45:45,891-Speed 3251.51 samples/sec   Loss 2.1046   LearningRate 0.0417   Epoch: 7   Global Step: 118340   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:45:49,030-Speed 3263.02 samples/sec   Loss 2.0864   LearningRate 0.0417   Epoch: 7   Global Step: 118350   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:45:52,129-Speed 3304.12 samples/sec   Loss 2.1200   LearningRate 0.0417   Epoch: 7   Global Step: 118360   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:45:55,200-Speed 3335.38 samples/sec   Loss 2.0694   LearningRate 0.0417   Epoch: 7   Global Step: 118370   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:45:58,270-Speed 3337.56 samples/sec   Loss 2.0813   LearningRate 0.0417   Epoch: 7   Global Step: 118380   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:46:01,366-Speed 3308.28 samples/sec   Loss 2.0382   LearningRate 0.0416   Epoch: 7   Global Step: 118390   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:46:04,444-Speed 3327.96 samples/sec   Loss 2.0582   LearningRate 0.0416   Epoch: 7   Global Step: 118400   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:46:07,516-Speed 3334.03 samples/sec   Loss 2.0729   LearningRate 0.0416   Epoch: 7   Global Step: 118410   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:46:10,591-Speed 3331.46 samples/sec   Loss 2.0828   LearningRate 0.0416   Epoch: 7   Global Step: 118420   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:46:13,663-Speed 3334.03 samples/sec   Loss 2.0821   LearningRate 0.0416   Epoch: 7   Global Step: 118430   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:46:16,771-Speed 3295.23 samples/sec   Loss 2.1309   LearningRate 0.0416   Epoch: 7   Global Step: 118440   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:46:19,916-Speed 3256.65 samples/sec   Loss 2.0370   LearningRate 0.0416   Epoch: 7   Global Step: 118450   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:46:22,992-Speed 3330.10 samples/sec   Loss 2.0291   LearningRate 0.0416   Epoch: 7   Global Step: 118460   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:46:26,066-Speed 3332.08 samples/sec   Loss 2.1091   LearningRate 0.0416   Epoch: 7   Global Step: 118470   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:46:29,140-Speed 3331.80 samples/sec   Loss 2.0812   LearningRate 0.0416   Epoch: 7   Global Step: 118480   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:46:32,210-Speed 3336.74 samples/sec   Loss 2.0652   LearningRate 0.0416   Epoch: 7   Global Step: 118490   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:46:35,280-Speed 3335.53 samples/sec   Loss 2.0295   LearningRate 0.0416   Epoch: 7   Global Step: 118500   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:46:38,352-Speed 3334.96 samples/sec   Loss 2.1087   LearningRate 0.0416   Epoch: 7   Global Step: 118510   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:46:41,483-Speed 3271.54 samples/sec   Loss 2.0630   LearningRate 0.0416   Epoch: 7   Global Step: 118520   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:46:44,566-Speed 3322.66 samples/sec   Loss 2.1335   LearningRate 0.0416   Epoch: 7   Global Step: 118530   Fp16 Grad Scale: 262144   Required: 23 hours
Training: 2022-04-11 11:46:47,674-Speed 3295.67 samples/sec   Loss 2.0688   LearningRate 0.0416   Epoch: 7   Global Step: 118540   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:46:50,746-Speed 3333.84 samples/sec   Loss 2.0042   LearningRate 0.0416   Epoch: 7   Global Step: 118550   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:46:53,815-Speed 3337.25 samples/sec   Loss 2.1294   LearningRate 0.0416   Epoch: 7   Global Step: 118560   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:46:56,887-Speed 3334.90 samples/sec   Loss 2.0809   LearningRate 0.0416   Epoch: 7   Global Step: 118570   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:46:59,943-Speed 3350.67 samples/sec   Loss 2.1171   LearningRate 0.0416   Epoch: 7   Global Step: 118580   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-11 11:47:03,048-Speed 3298.63 samples/sec   Loss 2.0995   LearningRate 0.0416   Epoch: 7   Global Step: 118590   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-11 11:47:06,217-Speed 3233.19 samples/sec   Loss 2.0586   LearningRate 0.0416   Epoch: 7   Global Step: 118600   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-11 11:47:09,300-Speed 3322.90 samples/sec   Loss 2.0818   LearningRate 0.0416   Epoch: 7   Global Step: 118610   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-11 11:47:12,372-Speed 3334.28 samples/sec   Loss 2.0679   LearningRate 0.0416   Epoch: 7   Global Step: 118620   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-11 11:47:15,515-Speed 3258.86 samples/sec   Loss 2.1037   LearningRate 0.0416   Epoch: 7   Global Step: 118630   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-11 11:47:18,599-Speed 3321.21 samples/sec   Loss 2.0504   LearningRate 0.0416   Epoch: 7   Global Step: 118640   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-11 11:47:21,858-Speed 3142.12 samples/sec   Loss 2.0040   LearningRate 0.0415   Epoch: 7   Global Step: 118650   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-11 11:47:24,972-Speed 3289.61 samples/sec   Loss 2.1129   LearningRate 0.0415   Epoch: 7   Global Step: 118660   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-11 11:47:28,041-Speed 3337.94 samples/sec   Loss 2.1142   LearningRate 0.0415   Epoch: 7   Global Step: 118670   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-11 11:47:31,240-Speed 3201.27 samples/sec   Loss 2.0658   LearningRate 0.0415   Epoch: 7   Global Step: 118680   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:47:34,386-Speed 3256.24 samples/sec   Loss 2.0250   LearningRate 0.0415   Epoch: 7   Global Step: 118690   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:47:37,487-Speed 3302.22 samples/sec   Loss 2.0865   LearningRate 0.0415   Epoch: 7   Global Step: 118700   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:47:40,600-Speed 3291.27 samples/sec   Loss 2.0798   LearningRate 0.0415   Epoch: 7   Global Step: 118710   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:47:43,717-Speed 3285.91 samples/sec   Loss 2.1134   LearningRate 0.0415   Epoch: 7   Global Step: 118720   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-11 11:47:46,876-Speed 3242.10 samples/sec   Loss 2.0970   LearningRate 0.0415   Epoch: 7   Global Step: 118730   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-11 11:47:50,096-Speed 3181.31 samples/sec   Loss 2.0967   LearningRate 0.0415   Epoch: 7   Global Step: 118740   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-11 11:47:53,197-Speed 3303.18 samples/sec   Loss 2.1385   LearningRate 0.0415   Epoch: 7   Global Step: 118750   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-11 11:47:56,269-Speed 3333.40 samples/sec   Loss 2.1361   LearningRate 0.0415   Epoch: 7   Global Step: 118760   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-11 11:47:59,332-Speed 3344.31 samples/sec   Loss 2.1271   LearningRate 0.0415   Epoch: 7   Global Step: 118770   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-11 11:48:02,405-Speed 3333.65 samples/sec   Loss 2.0901   LearningRate 0.0415   Epoch: 7   Global Step: 118780   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-11 11:48:05,506-Speed 3302.62 samples/sec   Loss 2.0668   LearningRate 0.0415   Epoch: 7   Global Step: 118790   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-11 11:48:08,576-Speed 3335.74 samples/sec   Loss 2.1330   LearningRate 0.0415   Epoch: 7   Global Step: 118800   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-11 11:48:11,655-Speed 3326.54 samples/sec   Loss 2.0952   LearningRate 0.0415   Epoch: 7   Global Step: 118810   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-11 11:48:14,734-Speed 3327.73 samples/sec   Loss 2.1150   LearningRate 0.0415   Epoch: 7   Global Step: 118820   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-11 11:48:17,854-Speed 3282.76 samples/sec   Loss 2.0822   LearningRate 0.0415   Epoch: 7   Global Step: 118830   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-11 11:48:20,932-Speed 3327.88 samples/sec   Loss 2.0970   LearningRate 0.0415   Epoch: 7   Global Step: 118840   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-11 11:48:24,015-Speed 3322.26 samples/sec   Loss 2.0849   LearningRate 0.0415   Epoch: 7   Global Step: 118850   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-11 11:48:27,088-Speed 3332.85 samples/sec   Loss 2.0862   LearningRate 0.0415   Epoch: 7   Global Step: 118860   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-11 11:48:30,190-Speed 3301.40 samples/sec   Loss 2.1563   LearningRate 0.0415   Epoch: 7   Global Step: 118870   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-11 11:48:33,286-Speed 3308.45 samples/sec   Loss 2.1186   LearningRate 0.0415   Epoch: 7   Global Step: 118880   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-11 11:48:36,446-Speed 3241.39 samples/sec   Loss 2.1718   LearningRate 0.0415   Epoch: 7   Global Step: 118890   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-11 11:48:39,517-Speed 3335.18 samples/sec   Loss 2.1158   LearningRate 0.0415   Epoch: 7   Global Step: 118900   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-11 11:48:42,587-Speed 3336.42 samples/sec   Loss 2.1317   LearningRate 0.0414   Epoch: 7   Global Step: 118910   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-11 11:48:45,665-Speed 3328.25 samples/sec   Loss 2.1123   LearningRate 0.0414   Epoch: 7   Global Step: 118920   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-11 11:48:48,738-Speed 3332.97 samples/sec   Loss 2.1038   LearningRate 0.0414   Epoch: 7   Global Step: 118930   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-11 11:48:51,839-Speed 3303.08 samples/sec   Loss 2.1124   LearningRate 0.0414   Epoch: 7   Global Step: 118940   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-11 11:48:54,934-Speed 3310.04 samples/sec   Loss 2.0808   LearningRate 0.0414   Epoch: 7   Global Step: 118950   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-11 11:48:58,072-Speed 3263.57 samples/sec   Loss 2.0392   LearningRate 0.0414   Epoch: 7   Global Step: 118960   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-11 11:49:01,178-Speed 3297.66 samples/sec   Loss 2.1637   LearningRate 0.0414   Epoch: 7   Global Step: 118970   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:49:04,258-Speed 3325.32 samples/sec   Loss 2.0492   LearningRate 0.0414   Epoch: 7   Global Step: 118980   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:49:07,454-Speed 3205.42 samples/sec   Loss 2.1249   LearningRate 0.0414   Epoch: 7   Global Step: 118990   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:49:10,611-Speed 3244.39 samples/sec   Loss 2.0881   LearningRate 0.0414   Epoch: 7   Global Step: 119000   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:49:13,732-Speed 3282.53 samples/sec   Loss 2.1558   LearningRate 0.0414   Epoch: 7   Global Step: 119010   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:49:16,897-Speed 3236.50 samples/sec   Loss 2.1470   LearningRate 0.0414   Epoch: 7   Global Step: 119020   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-11 11:49:20,078-Speed 3219.53 samples/sec   Loss 2.1930   LearningRate 0.0414   Epoch: 7   Global Step: 119030   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-11 11:49:23,180-Speed 3302.22 samples/sec   Loss 2.1153   LearningRate 0.0414   Epoch: 7   Global Step: 119040   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-11 11:49:26,249-Speed 3337.71 samples/sec   Loss 2.1332   LearningRate 0.0414   Epoch: 7   Global Step: 119050   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-11 11:49:29,316-Speed 3339.42 samples/sec   Loss 2.1333   LearningRate 0.0414   Epoch: 7   Global Step: 119060   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-11 11:49:32,379-Speed 3343.45 samples/sec   Loss 2.1524   LearningRate 0.0414   Epoch: 7   Global Step: 119070   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-11 11:49:35,451-Speed 3334.00 samples/sec   Loss 2.1056   LearningRate 0.0414   Epoch: 7   Global Step: 119080   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-11 11:49:38,536-Speed 3320.61 samples/sec   Loss 2.1962   LearningRate 0.0414   Epoch: 7   Global Step: 119090   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-11 11:49:41,645-Speed 3294.30 samples/sec   Loss 2.1751   LearningRate 0.0414   Epoch: 7   Global Step: 119100   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-11 11:49:44,848-Speed 3198.01 samples/sec   Loss 2.1141   LearningRate 0.0414   Epoch: 7   Global Step: 119110   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-11 11:49:47,927-Speed 3326.80 samples/sec   Loss 2.1257   LearningRate 0.0414   Epoch: 7   Global Step: 119120   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:49:51,067-Speed 3261.80 samples/sec   Loss 2.1882   LearningRate 0.0414   Epoch: 7   Global Step: 119130   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:49:54,158-Speed 3313.68 samples/sec   Loss 2.1189   LearningRate 0.0414   Epoch: 7   Global Step: 119140   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:49:57,296-Speed 3264.79 samples/sec   Loss 2.0430   LearningRate 0.0414   Epoch: 7   Global Step: 119150   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:50:00,366-Speed 3335.92 samples/sec   Loss 2.0793   LearningRate 0.0414   Epoch: 7   Global Step: 119160   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:50:03,441-Speed 3330.74 samples/sec   Loss 2.1635   LearningRate 0.0413   Epoch: 7   Global Step: 119170   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:50:06,521-Speed 3326.23 samples/sec   Loss 2.1255   LearningRate 0.0413   Epoch: 7   Global Step: 119180   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:50:09,618-Speed 3306.41 samples/sec   Loss 2.1206   LearningRate 0.0413   Epoch: 7   Global Step: 119190   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:50:12,700-Speed 3323.80 samples/sec   Loss 2.1563   LearningRate 0.0413   Epoch: 7   Global Step: 119200   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:50:15,811-Speed 3292.69 samples/sec   Loss 2.1503   LearningRate 0.0413   Epoch: 7   Global Step: 119210   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:50:18,867-Speed 3351.22 samples/sec   Loss 2.1445   LearningRate 0.0413   Epoch: 7   Global Step: 119220   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:50:21,940-Speed 3333.37 samples/sec   Loss 2.1199   LearningRate 0.0413   Epoch: 7   Global Step: 119230   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:50:25,010-Speed 3336.38 samples/sec   Loss 2.1396   LearningRate 0.0413   Epoch: 7   Global Step: 119240   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:50:28,115-Speed 3298.91 samples/sec   Loss 2.2230   LearningRate 0.0413   Epoch: 7   Global Step: 119250   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:50:31,185-Speed 3336.55 samples/sec   Loss 2.1879   LearningRate 0.0413   Epoch: 7   Global Step: 119260   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:50:34,295-Speed 3292.79 samples/sec   Loss 2.1190   LearningRate 0.0413   Epoch: 7   Global Step: 119270   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:50:37,387-Speed 3313.14 samples/sec   Loss 2.1502   LearningRate 0.0413   Epoch: 7   Global Step: 119280   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:50:40,455-Speed 3339.16 samples/sec   Loss 2.1365   LearningRate 0.0413   Epoch: 7   Global Step: 119290   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:50:43,589-Speed 3267.38 samples/sec   Loss 2.2032   LearningRate 0.0413   Epoch: 7   Global Step: 119300   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:50:46,729-Speed 3262.06 samples/sec   Loss 2.1740   LearningRate 0.0413   Epoch: 7   Global Step: 119310   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:50:49,821-Speed 3312.94 samples/sec   Loss 2.0978   LearningRate 0.0413   Epoch: 7   Global Step: 119320   Fp16 Grad Scale: 262144   Required: 23 hours
Training: 2022-04-11 11:50:52,957-Speed 3266.68 samples/sec   Loss 2.1732   LearningRate 0.0413   Epoch: 7   Global Step: 119330   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:50:56,065-Speed 3295.70 samples/sec   Loss 2.1268   LearningRate 0.0413   Epoch: 7   Global Step: 119340   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-11 11:50:59,177-Speed 3291.76 samples/sec   Loss 2.1926   LearningRate 0.0413   Epoch: 7   Global Step: 119350   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-11 11:51:02,319-Speed 3259.68 samples/sec   Loss 2.1524   LearningRate 0.0413   Epoch: 7   Global Step: 119360   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-11 11:51:05,459-Speed 3261.46 samples/sec   Loss 2.1430   LearningRate 0.0413   Epoch: 7   Global Step: 119370   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-11 11:51:08,558-Speed 3306.61 samples/sec   Loss 2.1002   LearningRate 0.0413   Epoch: 7   Global Step: 119380   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-11 11:51:11,658-Speed 3304.24 samples/sec   Loss 2.1342   LearningRate 0.0413   Epoch: 7   Global Step: 119390   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-11 11:51:14,763-Speed 3298.54 samples/sec   Loss 2.1021   LearningRate 0.0413   Epoch: 7   Global Step: 119400   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-11 11:51:17,834-Speed 3334.85 samples/sec   Loss 2.1555   LearningRate 0.0413   Epoch: 7   Global Step: 119410   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-11 11:51:20,976-Speed 3260.11 samples/sec   Loss 2.1802   LearningRate 0.0413   Epoch: 7   Global Step: 119420   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-11 11:51:24,042-Speed 3340.89 samples/sec   Loss 2.1628   LearningRate 0.0412   Epoch: 7   Global Step: 119430   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-11 11:51:27,119-Speed 3328.83 samples/sec   Loss 2.1710   LearningRate 0.0412   Epoch: 7   Global Step: 119440   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:51:30,190-Speed 3334.17 samples/sec   Loss 2.2290   LearningRate 0.0412   Epoch: 7   Global Step: 119450   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:51:33,338-Speed 3253.42 samples/sec   Loss 2.2006   LearningRate 0.0412   Epoch: 7   Global Step: 119460   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:51:36,506-Speed 3234.34 samples/sec   Loss 2.2041   LearningRate 0.0412   Epoch: 7   Global Step: 119470   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:51:39,746-Speed 3160.90 samples/sec   Loss 2.0922   LearningRate 0.0412   Epoch: 7   Global Step: 119480   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:51:43,002-Speed 3145.11 samples/sec   Loss 2.1126   LearningRate 0.0412   Epoch: 7   Global Step: 119490   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:51:46,068-Speed 3340.91 samples/sec   Loss 2.1254   LearningRate 0.0412   Epoch: 7   Global Step: 119500   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:51:49,143-Speed 3331.20 samples/sec   Loss 2.1481   LearningRate 0.0412   Epoch: 7   Global Step: 119510   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:51:52,258-Speed 3288.04 samples/sec   Loss 2.2025   LearningRate 0.0412   Epoch: 7   Global Step: 119520   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:51:55,332-Speed 3332.48 samples/sec   Loss 2.1051   LearningRate 0.0412   Epoch: 7   Global Step: 119530   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:51:58,405-Speed 3332.94 samples/sec   Loss 2.1704   LearningRate 0.0412   Epoch: 7   Global Step: 119540   Fp16 Grad Scale: 262144   Required: 23 hours
Training: 2022-04-11 11:52:01,467-Speed 3344.51 samples/sec   Loss 2.1363   LearningRate 0.0412   Epoch: 7   Global Step: 119550   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:52:04,553-Speed 3318.98 samples/sec   Loss 2.1629   LearningRate 0.0412   Epoch: 7   Global Step: 119560   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:52:07,651-Speed 3306.88 samples/sec   Loss 2.1825   LearningRate 0.0412   Epoch: 7   Global Step: 119570   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:52:10,694-Speed 3366.12 samples/sec   Loss 2.1114   LearningRate 0.0412   Epoch: 7   Global Step: 119580   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-11 11:52:13,763-Speed 3337.02 samples/sec   Loss 2.0916   LearningRate 0.0412   Epoch: 7   Global Step: 119590   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-11 11:52:16,830-Speed 3339.74 samples/sec   Loss 2.1139   LearningRate 0.0412   Epoch: 7   Global Step: 119600   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-11 11:52:19,917-Speed 3318.45 samples/sec   Loss 2.1318   LearningRate 0.0412   Epoch: 7   Global Step: 119610   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-11 11:52:22,996-Speed 3325.88 samples/sec   Loss 2.1309   LearningRate 0.0412   Epoch: 7   Global Step: 119620   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-11 11:52:26,061-Speed 3342.04 samples/sec   Loss 2.1107   LearningRate 0.0412   Epoch: 7   Global Step: 119630   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-11 11:52:29,131-Speed 3336.82 samples/sec   Loss 2.2214   LearningRate 0.0412   Epoch: 7   Global Step: 119640   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-11 11:52:32,258-Speed 3275.45 samples/sec   Loss 2.1724   LearningRate 0.0412   Epoch: 7   Global Step: 119650   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-11 11:52:35,351-Speed 3311.33 samples/sec   Loss 2.1843   LearningRate 0.0412   Epoch: 7   Global Step: 119660   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-11 11:52:38,440-Speed 3316.09 samples/sec   Loss 2.1992   LearningRate 0.0412   Epoch: 7   Global Step: 119670   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-11 11:52:41,558-Speed 3285.73 samples/sec   Loss 2.1893   LearningRate 0.0412   Epoch: 7   Global Step: 119680   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-11 11:52:44,751-Speed 3206.87 samples/sec   Loss 2.1748   LearningRate 0.0411   Epoch: 7   Global Step: 119690   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-11 11:52:47,819-Speed 3338.85 samples/sec   Loss 2.2328   LearningRate 0.0411   Epoch: 7   Global Step: 119700   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-11 11:52:50,892-Speed 3333.08 samples/sec   Loss 2.2386   LearningRate 0.0411   Epoch: 7   Global Step: 119710   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-11 11:52:53,971-Speed 3327.48 samples/sec   Loss 2.1069   LearningRate 0.0411   Epoch: 7   Global Step: 119720   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-11 11:52:57,044-Speed 3333.10 samples/sec   Loss 2.1108   LearningRate 0.0411   Epoch: 7   Global Step: 119730   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-11 11:53:00,113-Speed 3336.60 samples/sec   Loss 2.1443   LearningRate 0.0411   Epoch: 7   Global Step: 119740   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-11 11:53:03,181-Speed 3338.31 samples/sec   Loss 2.1478   LearningRate 0.0411   Epoch: 7   Global Step: 119750   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-11 11:53:06,284-Speed 3301.18 samples/sec   Loss 2.1914   LearningRate 0.0411   Epoch: 7   Global Step: 119760   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-11 11:53:09,366-Speed 3323.68 samples/sec   Loss 2.1802   LearningRate 0.0411   Epoch: 7   Global Step: 119770   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-11 11:53:12,439-Speed 3332.83 samples/sec   Loss 2.1576   LearningRate 0.0411   Epoch: 7   Global Step: 119780   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:53:15,520-Speed 3325.10 samples/sec   Loss 2.1454   LearningRate 0.0411   Epoch: 7   Global Step: 119790   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:53:18,594-Speed 3332.01 samples/sec   Loss 2.1422   LearningRate 0.0411   Epoch: 7   Global Step: 119800   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:53:21,673-Speed 3326.76 samples/sec   Loss 2.1978   LearningRate 0.0411   Epoch: 7   Global Step: 119810   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:53:24,740-Speed 3339.02 samples/sec   Loss 2.1810   LearningRate 0.0411   Epoch: 7   Global Step: 119820   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:53:27,833-Speed 3311.80 samples/sec   Loss 2.2185   LearningRate 0.0411   Epoch: 7   Global Step: 119830   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:53:30,908-Speed 3331.77 samples/sec   Loss 2.1964   LearningRate 0.0411   Epoch: 7   Global Step: 119840   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:53:34,010-Speed 3301.34 samples/sec   Loss 2.1051   LearningRate 0.0411   Epoch: 7   Global Step: 119850   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:53:37,091-Speed 3325.08 samples/sec   Loss 2.1162   LearningRate 0.0411   Epoch: 7   Global Step: 119860   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:53:40,201-Speed 3293.38 samples/sec   Loss 2.1745   LearningRate 0.0411   Epoch: 7   Global Step: 119870   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:53:43,261-Speed 3346.41 samples/sec   Loss 2.2200   LearningRate 0.0411   Epoch: 7   Global Step: 119880   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:53:46,334-Speed 3333.89 samples/sec   Loss 2.1684   LearningRate 0.0411   Epoch: 7   Global Step: 119890   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-11 11:53:49,413-Speed 3326.27 samples/sec   Loss 2.1995   LearningRate 0.0411   Epoch: 7   Global Step: 119900   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-11 11:53:52,532-Speed 3283.93 samples/sec   Loss 2.1467   LearningRate 0.0411   Epoch: 7   Global Step: 119910   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-11 11:53:55,642-Speed 3293.57 samples/sec   Loss 2.1832   LearningRate 0.0411   Epoch: 7   Global Step: 119920   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-11 11:53:58,713-Speed 3335.40 samples/sec   Loss 2.1769   LearningRate 0.0411   Epoch: 7   Global Step: 119930   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-11 11:54:01,782-Speed 3337.97 samples/sec   Loss 2.1410   LearningRate 0.0411   Epoch: 7   Global Step: 119940   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-11 11:54:04,883-Speed 3302.95 samples/sec   Loss 2.1991   LearningRate 0.0410   Epoch: 7   Global Step: 119950   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-11 11:54:07,956-Speed 3333.68 samples/sec   Loss 2.1822   LearningRate 0.0410   Epoch: 7   Global Step: 119960   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-11 11:54:11,041-Speed 3320.07 samples/sec   Loss 2.1524   LearningRate 0.0410   Epoch: 7   Global Step: 119970   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-11 11:54:14,128-Speed 3317.55 samples/sec   Loss 2.1755   LearningRate 0.0410   Epoch: 7   Global Step: 119980   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-11 11:54:17,214-Speed 3319.65 samples/sec   Loss 2.1764   LearningRate 0.0410   Epoch: 7   Global Step: 119990   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:54:20,333-Speed 3283.09 samples/sec   Loss 2.1463   LearningRate 0.0410   Epoch: 7   Global Step: 120000   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:55:04,381-[lfw][120000]XNorm: 22.420710
Training: 2022-04-11 11:55:04,381-[lfw][120000]Accuracy-Flip: 0.99783+-0.00289
Training: 2022-04-11 11:55:04,382-[lfw][120000]Accuracy-Highest: 0.99817
Training: 2022-04-11 11:55:55,553-[cfp_fp][120000]XNorm: 21.888708
Training: 2022-04-11 11:55:55,554-[cfp_fp][120000]Accuracy-Flip: 0.98543+-0.00418
Training: 2022-04-11 11:55:55,554-[cfp_fp][120000]Accuracy-Highest: 0.98643
Training: 2022-04-11 11:56:39,578-[agedb_30][120000]XNorm: 23.310018
Training: 2022-04-11 11:56:39,579-[agedb_30][120000]Accuracy-Flip: 0.98267+-0.00659
Training: 2022-04-11 11:56:39,579-[agedb_30][120000]Accuracy-Highest: 0.98317
Training: 2022-04-11 11:56:42,682-Speed 71.94 samples/sec   Loss 2.2355   LearningRate 0.0410   Epoch: 7   Global Step: 120010   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:56:45,780-Speed 3306.46 samples/sec   Loss 2.1681   LearningRate 0.0410   Epoch: 7   Global Step: 120020   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:56:48,877-Speed 3306.60 samples/sec   Loss 2.1966   LearningRate 0.0410   Epoch: 7   Global Step: 120030   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:56:51,945-Speed 3339.26 samples/sec   Loss 2.1613   LearningRate 0.0410   Epoch: 7   Global Step: 120040   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:56:55,098-Speed 3248.13 samples/sec   Loss 2.1959   LearningRate 0.0410   Epoch: 7   Global Step: 120050   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:56:58,227-Speed 3273.68 samples/sec   Loss 2.2059   LearningRate 0.0410   Epoch: 7   Global Step: 120060   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:57:01,323-Speed 3308.77 samples/sec   Loss 2.1706   LearningRate 0.0410   Epoch: 7   Global Step: 120070   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:57:04,398-Speed 3331.30 samples/sec   Loss 2.0934   LearningRate 0.0410   Epoch: 7   Global Step: 120080   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:57:07,487-Speed 3315.60 samples/sec   Loss 2.2192   LearningRate 0.0410   Epoch: 7   Global Step: 120090   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:57:10,550-Speed 3344.62 samples/sec   Loss 2.1839   LearningRate 0.0410   Epoch: 7   Global Step: 120100   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:57:13,622-Speed 3334.73 samples/sec   Loss 2.1204   LearningRate 0.0410   Epoch: 7   Global Step: 120110   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:57:16,713-Speed 3312.93 samples/sec   Loss 2.2153   LearningRate 0.0410   Epoch: 7   Global Step: 120120   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:57:19,842-Speed 3273.34 samples/sec   Loss 2.1296   LearningRate 0.0410   Epoch: 7   Global Step: 120130   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:57:22,941-Speed 3305.98 samples/sec   Loss 2.1735   LearningRate 0.0410   Epoch: 7   Global Step: 120140   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:57:26,033-Speed 3312.38 samples/sec   Loss 2.2062   LearningRate 0.0410   Epoch: 7   Global Step: 120150   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:57:29,102-Speed 3336.96 samples/sec   Loss 2.2228   LearningRate 0.0410   Epoch: 7   Global Step: 120160   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:57:32,165-Speed 3343.50 samples/sec   Loss 2.2407   LearningRate 0.0410   Epoch: 7   Global Step: 120170   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-11 11:57:35,237-Speed 3335.13 samples/sec   Loss 2.1979   LearningRate 0.0410   Epoch: 7   Global Step: 120180   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-11 11:57:38,332-Speed 3309.07 samples/sec   Loss 2.1303   LearningRate 0.0410   Epoch: 7   Global Step: 120190   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-11 11:57:41,437-Speed 3298.27 samples/sec   Loss 2.1389   LearningRate 0.0410   Epoch: 7   Global Step: 120200   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-11 11:57:44,509-Speed 3334.40 samples/sec   Loss 2.1443   LearningRate 0.0409   Epoch: 7   Global Step: 120210   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-11 11:57:47,609-Speed 3304.65 samples/sec   Loss 2.2200   LearningRate 0.0409   Epoch: 7   Global Step: 120220   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-11 11:57:50,682-Speed 3332.50 samples/sec   Loss 2.1575   LearningRate 0.0409   Epoch: 7   Global Step: 120230   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-11 11:57:53,756-Speed 3332.17 samples/sec   Loss 2.1867   LearningRate 0.0409   Epoch: 7   Global Step: 120240   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-11 11:57:56,829-Speed 3332.97 samples/sec   Loss 2.1923   LearningRate 0.0409   Epoch: 7   Global Step: 120250   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-11 11:58:00,003-Speed 3226.56 samples/sec   Loss 2.1550   LearningRate 0.0409   Epoch: 7   Global Step: 120260   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-11 11:58:03,079-Speed 3329.64 samples/sec   Loss 2.1779   LearningRate 0.0409   Epoch: 7   Global Step: 120270   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:58:06,158-Speed 3326.61 samples/sec   Loss 2.2236   LearningRate 0.0409   Epoch: 7   Global Step: 120280   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:58:09,239-Speed 3324.34 samples/sec   Loss 2.1760   LearningRate 0.0409   Epoch: 7   Global Step: 120290   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:58:12,352-Speed 3290.85 samples/sec   Loss 2.1673   LearningRate 0.0409   Epoch: 7   Global Step: 120300   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:58:15,427-Speed 3330.34 samples/sec   Loss 2.1677   LearningRate 0.0409   Epoch: 7   Global Step: 120310   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:58:18,565-Speed 3264.39 samples/sec   Loss 2.1998   LearningRate 0.0409   Epoch: 7   Global Step: 120320   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:58:21,678-Speed 3289.94 samples/sec   Loss 2.1923   LearningRate 0.0409   Epoch: 7   Global Step: 120330   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:58:24,807-Speed 3274.45 samples/sec   Loss 2.1818   LearningRate 0.0409   Epoch: 7   Global Step: 120340   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:58:27,892-Speed 3320.50 samples/sec   Loss 2.2363   LearningRate 0.0409   Epoch: 7   Global Step: 120350   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:58:30,990-Speed 3306.00 samples/sec   Loss 2.2639   LearningRate 0.0409   Epoch: 7   Global Step: 120360   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-11 11:58:34,085-Speed 3309.23 samples/sec   Loss 2.1322   LearningRate 0.0409   Epoch: 7   Global Step: 120370   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-11 11:58:37,217-Speed 3270.09 samples/sec   Loss 2.2230   LearningRate 0.0409   Epoch: 7   Global Step: 120380   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-11 11:58:40,283-Speed 3340.73 samples/sec   Loss 2.2644   LearningRate 0.0409   Epoch: 7   Global Step: 120390   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-11 11:58:43,353-Speed 3336.26 samples/sec   Loss 2.2224   LearningRate 0.0409   Epoch: 7   Global Step: 120400   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-11 11:58:46,445-Speed 3313.12 samples/sec   Loss 2.1894   LearningRate 0.0409   Epoch: 7   Global Step: 120410   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-11 11:58:49,519-Speed 3331.53 samples/sec   Loss 2.2503   LearningRate 0.0409   Epoch: 7   Global Step: 120420   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-11 11:58:52,587-Speed 3339.15 samples/sec   Loss 2.1138   LearningRate 0.0409   Epoch: 7   Global Step: 120430   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-11 11:58:55,651-Speed 3342.00 samples/sec   Loss 2.2509   LearningRate 0.0409   Epoch: 7   Global Step: 120440   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-11 11:58:58,791-Speed 3262.51 samples/sec   Loss 2.2344   LearningRate 0.0409   Epoch: 7   Global Step: 120450   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-11 11:59:01,863-Speed 3333.25 samples/sec   Loss 2.1743   LearningRate 0.0409   Epoch: 7   Global Step: 120460   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:59:04,931-Speed 3338.79 samples/sec   Loss 2.2196   LearningRate 0.0408   Epoch: 7   Global Step: 120470   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:59:08,007-Speed 3330.03 samples/sec   Loss 2.1789   LearningRate 0.0408   Epoch: 7   Global Step: 120480   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:59:11,073-Speed 3341.61 samples/sec   Loss 2.2321   LearningRate 0.0408   Epoch: 7   Global Step: 120490   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:59:14,132-Speed 3347.43 samples/sec   Loss 2.2478   LearningRate 0.0408   Epoch: 7   Global Step: 120500   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:59:17,207-Speed 3331.16 samples/sec   Loss 2.2070   LearningRate 0.0408   Epoch: 7   Global Step: 120510   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:59:20,274-Speed 3339.19 samples/sec   Loss 2.1802   LearningRate 0.0408   Epoch: 7   Global Step: 120520   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:59:23,341-Speed 3339.88 samples/sec   Loss 2.1554   LearningRate 0.0408   Epoch: 7   Global Step: 120530   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:59:26,433-Speed 3312.29 samples/sec   Loss 2.2394   LearningRate 0.0408   Epoch: 7   Global Step: 120540   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:59:29,501-Speed 3338.70 samples/sec   Loss 2.2908   LearningRate 0.0408   Epoch: 7   Global Step: 120550   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:59:32,577-Speed 3329.18 samples/sec   Loss 2.2756   LearningRate 0.0408   Epoch: 7   Global Step: 120560   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:59:35,642-Speed 3342.70 samples/sec   Loss 2.3035   LearningRate 0.0408   Epoch: 7   Global Step: 120570   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:59:38,703-Speed 3345.75 samples/sec   Loss 2.2289   LearningRate 0.0408   Epoch: 7   Global Step: 120580   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 11:59:41,754-Speed 3356.84 samples/sec   Loss 2.2027   LearningRate 0.0408   Epoch: 7   Global Step: 120590   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-11 11:59:44,862-Speed 3295.69 samples/sec   Loss 2.1700   LearningRate 0.0408   Epoch: 7   Global Step: 120600   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-11 11:59:47,978-Speed 3287.62 samples/sec   Loss 2.2028   LearningRate 0.0408   Epoch: 7   Global Step: 120610   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-11 11:59:51,055-Speed 3328.38 samples/sec   Loss 2.2137   LearningRate 0.0408   Epoch: 7   Global Step: 120620   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-11 11:59:54,129-Speed 3332.34 samples/sec   Loss 2.2723   LearningRate 0.0408   Epoch: 7   Global Step: 120630   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-11 11:59:57,227-Speed 3305.79 samples/sec   Loss 2.2815   LearningRate 0.0408   Epoch: 7   Global Step: 120640   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-11 12:00:00,291-Speed 3343.21 samples/sec   Loss 2.2467   LearningRate 0.0408   Epoch: 7   Global Step: 120650   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-11 12:00:03,389-Speed 3306.90 samples/sec   Loss 2.2038   LearningRate 0.0408   Epoch: 7   Global Step: 120660   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-11 12:00:06,468-Speed 3326.29 samples/sec   Loss 2.1680   LearningRate 0.0408   Epoch: 7   Global Step: 120670   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-11 12:00:09,575-Speed 3295.94 samples/sec   Loss 2.2430   LearningRate 0.0408   Epoch: 7   Global Step: 120680   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-11 12:00:12,671-Speed 3309.05 samples/sec   Loss 2.2024   LearningRate 0.0408   Epoch: 7   Global Step: 120690   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 12:00:15,879-Speed 3192.93 samples/sec   Loss 2.2211   LearningRate 0.0408   Epoch: 7   Global Step: 120700   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 12:00:18,970-Speed 3314.16 samples/sec   Loss 2.1256   LearningRate 0.0408   Epoch: 7   Global Step: 120710   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 12:00:22,055-Speed 3319.59 samples/sec   Loss 2.2096   LearningRate 0.0408   Epoch: 7   Global Step: 120720   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 12:00:25,250-Speed 3205.82 samples/sec   Loss 2.2188   LearningRate 0.0407   Epoch: 7   Global Step: 120730   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 12:00:28,385-Speed 3267.87 samples/sec   Loss 2.1637   LearningRate 0.0407   Epoch: 7   Global Step: 120740   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 12:00:31,632-Speed 3154.17 samples/sec   Loss 2.1952   LearningRate 0.0407   Epoch: 7   Global Step: 120750   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 12:00:34,719-Speed 3317.76 samples/sec   Loss 2.2103   LearningRate 0.0407   Epoch: 7   Global Step: 120760   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 12:00:37,825-Speed 3297.30 samples/sec   Loss 2.2641   LearningRate 0.0407   Epoch: 7   Global Step: 120770   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 12:00:40,939-Speed 3289.93 samples/sec   Loss 2.2542   LearningRate 0.0407   Epoch: 7   Global Step: 120780   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-11 12:00:44,012-Speed 3333.25 samples/sec   Loss 2.2198   LearningRate 0.0407   Epoch: 7   Global Step: 120790   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 12:00:47,199-Speed 3213.90 samples/sec   Loss 2.1934   LearningRate 0.0407   Epoch: 7   Global Step: 120800   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 12:00:50,372-Speed 3228.42 samples/sec   Loss 2.1836   LearningRate 0.0407   Epoch: 7   Global Step: 120810   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 12:00:53,444-Speed 3333.72 samples/sec   Loss 2.2217   LearningRate 0.0407   Epoch: 7   Global Step: 120820   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 12:00:56,544-Speed 3304.78 samples/sec   Loss 2.1951   LearningRate 0.0407   Epoch: 7   Global Step: 120830   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 12:00:59,619-Speed 3330.38 samples/sec   Loss 2.2278   LearningRate 0.0407   Epoch: 7   Global Step: 120840   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 12:01:07,112-Speed 1366.92 samples/sec   Loss 2.2123   LearningRate 0.0407   Epoch: 7   Global Step: 120850   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 12:01:10,271-Speed 3242.57 samples/sec   Loss 2.2298   LearningRate 0.0407   Epoch: 7   Global Step: 120860   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 12:01:13,334-Speed 3343.32 samples/sec   Loss 2.1792   LearningRate 0.0407   Epoch: 7   Global Step: 120870   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 12:01:16,424-Speed 3315.43 samples/sec   Loss 2.2033   LearningRate 0.0407   Epoch: 7   Global Step: 120880   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 12:01:19,511-Speed 3317.76 samples/sec   Loss 2.1956   LearningRate 0.0407   Epoch: 7   Global Step: 120890   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 12:01:22,580-Speed 3337.80 samples/sec   Loss 2.2811   LearningRate 0.0407   Epoch: 7   Global Step: 120900   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 12:01:25,662-Speed 3322.75 samples/sec   Loss 2.2256   LearningRate 0.0407   Epoch: 7   Global Step: 120910   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 12:01:28,738-Speed 3329.82 samples/sec   Loss 2.1758   LearningRate 0.0407   Epoch: 7   Global Step: 120920   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 12:01:31,816-Speed 3327.92 samples/sec   Loss 2.2624   LearningRate 0.0407   Epoch: 7   Global Step: 120930   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 12:01:34,937-Speed 3281.77 samples/sec   Loss 2.2249   LearningRate 0.0407   Epoch: 7   Global Step: 120940   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 12:01:38,008-Speed 3335.14 samples/sec   Loss 2.1295   LearningRate 0.0407   Epoch: 7   Global Step: 120950   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 12:01:41,089-Speed 3324.36 samples/sec   Loss 2.1882   LearningRate 0.0407   Epoch: 7   Global Step: 120960   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 12:01:44,164-Speed 3330.64 samples/sec   Loss 2.1976   LearningRate 0.0407   Epoch: 7   Global Step: 120970   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 12:01:47,237-Speed 3332.82 samples/sec   Loss 2.2283   LearningRate 0.0407   Epoch: 7   Global Step: 120980   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 12:01:50,319-Speed 3323.92 samples/sec   Loss 2.1925   LearningRate 0.0406   Epoch: 7   Global Step: 120990   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 12:01:53,422-Speed 3300.77 samples/sec   Loss 2.2045   LearningRate 0.0406   Epoch: 7   Global Step: 121000   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 12:01:56,497-Speed 3331.22 samples/sec   Loss 2.2295   LearningRate 0.0406   Epoch: 7   Global Step: 121010   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 12:01:59,569-Speed 3333.28 samples/sec   Loss 2.1592   LearningRate 0.0406   Epoch: 7   Global Step: 121020   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 12:02:02,645-Speed 3329.76 samples/sec   Loss 2.2000   LearningRate 0.0406   Epoch: 7   Global Step: 121030   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 12:02:05,707-Speed 3345.43 samples/sec   Loss 2.2004   LearningRate 0.0406   Epoch: 7   Global Step: 121040   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 12:02:08,807-Speed 3304.67 samples/sec   Loss 2.2195   LearningRate 0.0406   Epoch: 7   Global Step: 121050   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 12:02:11,900-Speed 3311.30 samples/sec   Loss 2.2384   LearningRate 0.0406   Epoch: 7   Global Step: 121060   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 12:02:14,992-Speed 3312.04 samples/sec   Loss 2.1615   LearningRate 0.0406   Epoch: 7   Global Step: 121070   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 12:02:18,105-Speed 3290.35 samples/sec   Loss 2.2096   LearningRate 0.0406   Epoch: 7   Global Step: 121080   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 12:02:21,206-Speed 3303.31 samples/sec   Loss 2.2469   LearningRate 0.0406   Epoch: 7   Global Step: 121090   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 12:02:24,318-Speed 3291.28 samples/sec   Loss 2.2765   LearningRate 0.0406   Epoch: 7   Global Step: 121100   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-11 12:02:27,456-Speed 3263.74 samples/sec   Loss 2.2495   LearningRate 0.0406   Epoch: 7   Global Step: 121110   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-11 12:02:30,593-Speed 3265.80 samples/sec   Loss 2.2603   LearningRate 0.0406   Epoch: 7   Global Step: 121120   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-11 12:02:33,825-Speed 3168.38 samples/sec   Loss 2.1895   LearningRate 0.0406   Epoch: 7   Global Step: 121130   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-11 12:02:37,085-Speed 3142.01 samples/sec   Loss 2.2419   LearningRate 0.0406   Epoch: 7   Global Step: 121140   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-11 12:02:40,281-Speed 3204.80 samples/sec   Loss 2.2394   LearningRate 0.0406   Epoch: 7   Global Step: 121150   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-11 12:02:43,366-Speed 3319.96 samples/sec   Loss 2.2718   LearningRate 0.0406   Epoch: 7   Global Step: 121160   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-11 12:02:46,502-Speed 3267.37 samples/sec   Loss 2.2062   LearningRate 0.0406   Epoch: 7   Global Step: 121170   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-11 12:02:49,626-Speed 3278.16 samples/sec   Loss 2.2036   LearningRate 0.0406   Epoch: 7   Global Step: 121180   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-11 12:02:52,709-Speed 3322.74 samples/sec   Loss 2.2450   LearningRate 0.0406   Epoch: 7   Global Step: 121190   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-11 12:02:55,786-Speed 3328.19 samples/sec   Loss 2.2651   LearningRate 0.0406   Epoch: 7   Global Step: 121200   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 12:02:58,871-Speed 3320.43 samples/sec   Loss 2.1943   LearningRate 0.0406   Epoch: 7   Global Step: 121210   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 12:03:02,069-Speed 3202.52 samples/sec   Loss 2.2298   LearningRate 0.0406   Epoch: 7   Global Step: 121220   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 12:03:05,166-Speed 3307.96 samples/sec   Loss 2.3138   LearningRate 0.0406   Epoch: 7   Global Step: 121230   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 12:03:08,263-Speed 3307.49 samples/sec   Loss 2.2217   LearningRate 0.0406   Epoch: 7   Global Step: 121240   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 12:03:11,348-Speed 3319.91 samples/sec   Loss 2.2619   LearningRate 0.0405   Epoch: 7   Global Step: 121250   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 12:03:14,427-Speed 3325.90 samples/sec   Loss 2.2511   LearningRate 0.0405   Epoch: 7   Global Step: 121260   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 12:03:17,522-Speed 3309.96 samples/sec   Loss 2.2439   LearningRate 0.0405   Epoch: 7   Global Step: 121270   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 12:03:20,629-Speed 3296.08 samples/sec   Loss 2.2104   LearningRate 0.0405   Epoch: 7   Global Step: 121280   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 12:03:23,716-Speed 3318.21 samples/sec   Loss 2.1377   LearningRate 0.0405   Epoch: 7   Global Step: 121290   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 12:03:26,808-Speed 3312.90 samples/sec   Loss 2.1878   LearningRate 0.0405   Epoch: 7   Global Step: 121300   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 12:03:29,944-Speed 3265.46 samples/sec   Loss 2.2092   LearningRate 0.0405   Epoch: 7   Global Step: 121310   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 12:03:33,108-Speed 3237.66 samples/sec   Loss 2.3031   LearningRate 0.0405   Epoch: 7   Global Step: 121320   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 12:03:36,210-Speed 3301.56 samples/sec   Loss 2.1897   LearningRate 0.0405   Epoch: 7   Global Step: 121330   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 12:03:39,328-Speed 3285.17 samples/sec   Loss 2.2735   LearningRate 0.0405   Epoch: 7   Global Step: 121340   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 12:03:42,499-Speed 3229.89 samples/sec   Loss 2.2674   LearningRate 0.0405   Epoch: 7   Global Step: 121350   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 12:03:45,777-Speed 3124.44 samples/sec   Loss 2.2765   LearningRate 0.0405   Epoch: 7   Global Step: 121360   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 12:03:48,983-Speed 3194.85 samples/sec   Loss 2.2484   LearningRate 0.0405   Epoch: 7   Global Step: 121370   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 12:03:52,143-Speed 3241.57 samples/sec   Loss 2.2877   LearningRate 0.0405   Epoch: 7   Global Step: 121380   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 12:03:55,360-Speed 3184.40 samples/sec   Loss 2.2856   LearningRate 0.0405   Epoch: 7   Global Step: 121390   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 12:03:58,525-Speed 3237.24 samples/sec   Loss 2.2546   LearningRate 0.0405   Epoch: 7   Global Step: 121400   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 12:04:01,609-Speed 3320.82 samples/sec   Loss 2.2951   LearningRate 0.0405   Epoch: 7   Global Step: 121410   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 12:04:04,765-Speed 3244.86 samples/sec   Loss 2.2455   LearningRate 0.0405   Epoch: 7   Global Step: 121420   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 12:04:07,950-Speed 3216.09 samples/sec   Loss 2.2042   LearningRate 0.0405   Epoch: 7   Global Step: 121430   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 12:04:11,047-Speed 3306.56 samples/sec   Loss 2.2102   LearningRate 0.0405   Epoch: 7   Global Step: 121440   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 12:04:14,143-Speed 3309.17 samples/sec   Loss 2.2420   LearningRate 0.0405   Epoch: 7   Global Step: 121450   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 12:04:17,237-Speed 3310.43 samples/sec   Loss 2.2828   LearningRate 0.0405   Epoch: 7   Global Step: 121460   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 12:04:20,316-Speed 3326.20 samples/sec   Loss 2.2271   LearningRate 0.0405   Epoch: 7   Global Step: 121470   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 12:04:23,402-Speed 3319.01 samples/sec   Loss 2.2368   LearningRate 0.0405   Epoch: 7   Global Step: 121480   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 12:04:26,496-Speed 3311.08 samples/sec   Loss 2.2216   LearningRate 0.0405   Epoch: 7   Global Step: 121490   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 12:04:29,580-Speed 3321.69 samples/sec   Loss 2.2199   LearningRate 0.0405   Epoch: 7   Global Step: 121500   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 12:04:32,674-Speed 3310.25 samples/sec   Loss 2.3130   LearningRate 0.0404   Epoch: 7   Global Step: 121510   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 12:04:35,759-Speed 3320.94 samples/sec   Loss 2.2562   LearningRate 0.0404   Epoch: 7   Global Step: 121520   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 12:04:38,910-Speed 3249.61 samples/sec   Loss 2.1975   LearningRate 0.0404   Epoch: 7   Global Step: 121530   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 12:04:42,022-Speed 3291.39 samples/sec   Loss 2.2619   LearningRate 0.0404   Epoch: 7   Global Step: 121540   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 12:04:45,123-Speed 3303.78 samples/sec   Loss 2.2793   LearningRate 0.0404   Epoch: 7   Global Step: 121550   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 12:04:48,220-Speed 3307.01 samples/sec   Loss 2.2638   LearningRate 0.0404   Epoch: 7   Global Step: 121560   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 12:04:51,351-Speed 3270.77 samples/sec   Loss 2.1903   LearningRate 0.0404   Epoch: 7   Global Step: 121570   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 12:04:54,448-Speed 3308.23 samples/sec   Loss 2.2468   LearningRate 0.0404   Epoch: 7   Global Step: 121580   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 12:04:57,560-Speed 3290.57 samples/sec   Loss 2.2461   LearningRate 0.0404   Epoch: 7   Global Step: 121590   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 12:05:00,708-Speed 3253.82 samples/sec   Loss 2.2466   LearningRate 0.0404   Epoch: 7   Global Step: 121600   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 12:05:03,801-Speed 3312.30 samples/sec   Loss 2.3055   LearningRate 0.0404   Epoch: 7   Global Step: 121610   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 12:05:06,882-Speed 3323.80 samples/sec   Loss 2.2298   LearningRate 0.0404   Epoch: 7   Global Step: 121620   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 12:05:09,981-Speed 3305.26 samples/sec   Loss 2.2313   LearningRate 0.0404   Epoch: 7   Global Step: 121630   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 12:05:13,135-Speed 3247.17 samples/sec   Loss 2.2507   LearningRate 0.0404   Epoch: 7   Global Step: 121640   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 12:05:16,272-Speed 3265.53 samples/sec   Loss 2.1988   LearningRate 0.0404   Epoch: 7   Global Step: 121650   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 12:05:19,435-Speed 3237.71 samples/sec   Loss 2.2182   LearningRate 0.0404   Epoch: 7   Global Step: 121660   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 12:05:22,512-Speed 3328.68 samples/sec   Loss 2.2674   LearningRate 0.0404   Epoch: 7   Global Step: 121670   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 12:05:25,604-Speed 3312.87 samples/sec   Loss 2.2568   LearningRate 0.0404   Epoch: 7   Global Step: 121680   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 12:05:28,694-Speed 3314.42 samples/sec   Loss 2.2336   LearningRate 0.0404   Epoch: 7   Global Step: 121690   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 12:05:31,790-Speed 3308.50 samples/sec   Loss 2.2571   LearningRate 0.0404   Epoch: 7   Global Step: 121700   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 12:05:35,000-Speed 3190.75 samples/sec   Loss 2.2621   LearningRate 0.0404   Epoch: 7   Global Step: 121710   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 12:05:38,220-Speed 3181.10 samples/sec   Loss 2.1878   LearningRate 0.0404   Epoch: 7   Global Step: 121720   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 12:05:41,355-Speed 3268.35 samples/sec   Loss 2.1689   LearningRate 0.0404   Epoch: 7   Global Step: 121730   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 12:05:44,458-Speed 3300.48 samples/sec   Loss 2.2147   LearningRate 0.0404   Epoch: 7   Global Step: 121740   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 12:05:47,539-Speed 3324.62 samples/sec   Loss 2.2855   LearningRate 0.0404   Epoch: 7   Global Step: 121750   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 12:05:50,624-Speed 3320.57 samples/sec   Loss 2.2233   LearningRate 0.0404   Epoch: 7   Global Step: 121760   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 12:05:53,706-Speed 3322.68 samples/sec   Loss 2.2543   LearningRate 0.0404   Epoch: 7   Global Step: 121770   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 12:05:56,799-Speed 3311.87 samples/sec   Loss 2.2356   LearningRate 0.0403   Epoch: 7   Global Step: 121780   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 12:05:59,897-Speed 3305.78 samples/sec   Loss 2.3011   LearningRate 0.0403   Epoch: 7   Global Step: 121790   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 12:06:02,984-Speed 3317.56 samples/sec   Loss 2.2414   LearningRate 0.0403   Epoch: 7   Global Step: 121800   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 12:06:06,066-Speed 3324.14 samples/sec   Loss 2.2607   LearningRate 0.0403   Epoch: 7   Global Step: 121810   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 12:06:09,155-Speed 3315.72 samples/sec   Loss 2.2973   LearningRate 0.0403   Epoch: 7   Global Step: 121820   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 12:06:12,386-Speed 3169.96 samples/sec   Loss 2.2692   LearningRate 0.0403   Epoch: 7   Global Step: 121830   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 12:06:15,562-Speed 3224.48 samples/sec   Loss 2.2623   LearningRate 0.0403   Epoch: 7   Global Step: 121840   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 12:06:18,743-Speed 3220.33 samples/sec   Loss 2.2161   LearningRate 0.0403   Epoch: 7   Global Step: 121850   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 12:06:21,871-Speed 3275.16 samples/sec   Loss 2.2388   LearningRate 0.0403   Epoch: 7   Global Step: 121860   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 12:06:24,954-Speed 3322.56 samples/sec   Loss 2.2942   LearningRate 0.0403   Epoch: 7   Global Step: 121870   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 12:06:28,086-Speed 3269.52 samples/sec   Loss 2.3012   LearningRate 0.0403   Epoch: 7   Global Step: 121880   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 12:06:31,184-Speed 3306.45 samples/sec   Loss 2.2468   LearningRate 0.0403   Epoch: 7   Global Step: 121890   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 12:06:34,340-Speed 3245.92 samples/sec   Loss 2.2752   LearningRate 0.0403   Epoch: 7   Global Step: 121900   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 12:06:37,440-Speed 3304.55 samples/sec   Loss 2.2375   LearningRate 0.0403   Epoch: 7   Global Step: 121910   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 12:06:40,535-Speed 3309.00 samples/sec   Loss 2.2936   LearningRate 0.0403   Epoch: 7   Global Step: 121920   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 12:06:43,648-Speed 3290.21 samples/sec   Loss 2.2456   LearningRate 0.0403   Epoch: 7   Global Step: 121930   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 12:06:46,764-Speed 3287.57 samples/sec   Loss 2.2663   LearningRate 0.0403   Epoch: 7   Global Step: 121940   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 12:06:49,898-Speed 3267.56 samples/sec   Loss 2.2870   LearningRate 0.0403   Epoch: 7   Global Step: 121950   Fp16 Grad Scale: 262144   Required: 22 hours
Training: 2022-04-11 12:06:52,957-Speed 3348.71 samples/sec   Loss 2.2860   LearningRate 0.0403   Epoch: 7   Global Step: 121960   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 12:06:56,068-Speed 3292.83 samples/sec   Loss 2.2369   LearningRate 0.0403   Epoch: 7   Global Step: 121970   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 12:06:59,349-Speed 3121.57 samples/sec   Loss 2.2322   LearningRate 0.0403   Epoch: 7   Global Step: 121980   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 12:07:02,444-Speed 3309.53 samples/sec   Loss 2.3153   LearningRate 0.0403   Epoch: 7   Global Step: 121990   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 12:07:05,534-Speed 3314.37 samples/sec   Loss 2.2955   LearningRate 0.0403   Epoch: 7   Global Step: 122000   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 12:07:49,742-[lfw][122000]XNorm: 22.079913
Training: 2022-04-11 12:07:49,742-[lfw][122000]Accuracy-Flip: 0.99783+-0.00289
Training: 2022-04-11 12:07:49,743-[lfw][122000]Accuracy-Highest: 0.99817
Training: 2022-04-11 12:08:41,317-[cfp_fp][122000]XNorm: 21.984118
Training: 2022-04-11 12:08:41,318-[cfp_fp][122000]Accuracy-Flip: 0.98700+-0.00497
Training: 2022-04-11 12:08:41,318-[cfp_fp][122000]Accuracy-Highest: 0.98700
Training: 2022-04-11 12:09:25,798-[agedb_30][122000]XNorm: 22.676181
Training: 2022-04-11 12:09:25,799-[agedb_30][122000]Accuracy-Flip: 0.98283+-0.00775
Training: 2022-04-11 12:09:25,799-[agedb_30][122000]Accuracy-Highest: 0.98317
Training: 2022-04-11 12:09:28,923-Speed 71.41 samples/sec   Loss 2.2825   LearningRate 0.0403   Epoch: 7   Global Step: 122010   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 12:09:32,007-Speed 3321.01 samples/sec   Loss 2.2530   LearningRate 0.0403   Epoch: 7   Global Step: 122020   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 12:09:35,077-Speed 3337.00 samples/sec   Loss 2.2263   LearningRate 0.0403   Epoch: 7   Global Step: 122030   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 12:09:38,173-Speed 3308.20 samples/sec   Loss 2.2990   LearningRate 0.0402   Epoch: 7   Global Step: 122040   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 12:09:41,284-Speed 3291.90 samples/sec   Loss 2.2676   LearningRate 0.0402   Epoch: 7   Global Step: 122050   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 12:09:44,431-Speed 3254.19 samples/sec   Loss 2.1710   LearningRate 0.0402   Epoch: 7   Global Step: 122060   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 12:09:47,575-Speed 3258.39 samples/sec   Loss 2.2790   LearningRate 0.0402   Epoch: 7   Global Step: 122070   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 12:09:50,712-Speed 3265.43 samples/sec   Loss 2.3072   LearningRate 0.0402   Epoch: 7   Global Step: 122080   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 12:09:53,891-Speed 3221.64 samples/sec   Loss 2.2684   LearningRate 0.0402   Epoch: 7   Global Step: 122090   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 12:09:57,077-Speed 3215.08 samples/sec   Loss 2.2706   LearningRate 0.0402   Epoch: 7   Global Step: 122100   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 12:10:00,162-Speed 3319.96 samples/sec   Loss 2.2853   LearningRate 0.0402   Epoch: 7   Global Step: 122110   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 12:10:03,387-Speed 3176.21 samples/sec   Loss 2.2272   LearningRate 0.0402   Epoch: 7   Global Step: 122120   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 12:10:06,636-Speed 3152.48 samples/sec   Loss 2.2759   LearningRate 0.0402   Epoch: 7   Global Step: 122130   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 12:10:09,839-Speed 3197.77 samples/sec   Loss 2.3044   LearningRate 0.0402   Epoch: 7   Global Step: 122140   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 12:10:13,009-Speed 3231.22 samples/sec   Loss 2.2224   LearningRate 0.0402   Epoch: 7   Global Step: 122150   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 12:10:16,171-Speed 3238.76 samples/sec   Loss 2.2382   LearningRate 0.0402   Epoch: 7   Global Step: 122160   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 12:10:19,288-Speed 3286.42 samples/sec   Loss 2.2194   LearningRate 0.0402   Epoch: 7   Global Step: 122170   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 12:10:22,378-Speed 3315.89 samples/sec   Loss 2.1965   LearningRate 0.0402   Epoch: 7   Global Step: 122180   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 12:10:25,455-Speed 3327.74 samples/sec   Loss 2.2710   LearningRate 0.0402   Epoch: 7   Global Step: 122190   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 12:10:28,615-Speed 3241.53 samples/sec   Loss 2.3133   LearningRate 0.0402   Epoch: 7   Global Step: 122200   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 12:10:31,798-Speed 3217.68 samples/sec   Loss 2.2740   LearningRate 0.0402   Epoch: 7   Global Step: 122210   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 12:10:34,883-Speed 3320.67 samples/sec   Loss 2.2900   LearningRate 0.0402   Epoch: 7   Global Step: 122220   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 12:10:38,036-Speed 3249.40 samples/sec   Loss 2.2646   LearningRate 0.0402   Epoch: 7   Global Step: 122230   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 12:10:41,160-Speed 3279.13 samples/sec   Loss 2.3026   LearningRate 0.0402   Epoch: 7   Global Step: 122240   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 12:10:44,284-Speed 3277.92 samples/sec   Loss 2.3166   LearningRate 0.0402   Epoch: 7   Global Step: 122250   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 12:10:47,454-Speed 3231.23 samples/sec   Loss 2.2323   LearningRate 0.0402   Epoch: 7   Global Step: 122260   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 12:10:50,551-Speed 3307.75 samples/sec   Loss 2.2541   LearningRate 0.0402   Epoch: 7   Global Step: 122270   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 12:10:53,710-Speed 3242.15 samples/sec   Loss 2.2231   LearningRate 0.0402   Epoch: 7   Global Step: 122280   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 12:10:56,915-Speed 3195.79 samples/sec   Loss 2.2690   LearningRate 0.0402   Epoch: 7   Global Step: 122290   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 12:11:00,039-Speed 3278.66 samples/sec   Loss 2.2622   LearningRate 0.0401   Epoch: 7   Global Step: 122300   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 12:11:03,136-Speed 3308.09 samples/sec   Loss 2.2643   LearningRate 0.0401   Epoch: 7   Global Step: 122310   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 12:11:06,296-Speed 3240.74 samples/sec   Loss 2.2900   LearningRate 0.0401   Epoch: 7   Global Step: 122320   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 12:11:09,413-Speed 3286.11 samples/sec   Loss 2.2979   LearningRate 0.0401   Epoch: 7   Global Step: 122330   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 12:11:12,585-Speed 3228.55 samples/sec   Loss 2.2863   LearningRate 0.0401   Epoch: 7   Global Step: 122340   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 12:11:15,779-Speed 3206.50 samples/sec   Loss 2.2931   LearningRate 0.0401   Epoch: 7   Global Step: 122350   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 12:11:18,887-Speed 3296.32 samples/sec   Loss 2.2935   LearningRate 0.0401   Epoch: 7   Global Step: 122360   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 12:11:22,068-Speed 3219.75 samples/sec   Loss 2.3173   LearningRate 0.0401   Epoch: 7   Global Step: 122370   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 12:11:25,187-Speed 3283.96 samples/sec   Loss 2.2462   LearningRate 0.0401   Epoch: 7   Global Step: 122380   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 12:11:28,365-Speed 3222.73 samples/sec   Loss 2.2884   LearningRate 0.0401   Epoch: 7   Global Step: 122390   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 12:11:31,445-Speed 3325.32 samples/sec   Loss 2.2894   LearningRate 0.0401   Epoch: 7   Global Step: 122400   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 12:11:34,526-Speed 3324.39 samples/sec   Loss 2.2687   LearningRate 0.0401   Epoch: 7   Global Step: 122410   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 12:11:37,613-Speed 3318.57 samples/sec   Loss 2.2535   LearningRate 0.0401   Epoch: 7   Global Step: 122420   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 12:11:40,693-Speed 3324.68 samples/sec   Loss 2.3904   LearningRate 0.0401   Epoch: 7   Global Step: 122430   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 12:11:43,790-Speed 3308.14 samples/sec   Loss 2.2486   LearningRate 0.0401   Epoch: 7   Global Step: 122440   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 12:11:46,886-Speed 3307.84 samples/sec   Loss 2.2386   LearningRate 0.0401   Epoch: 7   Global Step: 122450   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 12:11:49,972-Speed 3319.45 samples/sec   Loss 2.2600   LearningRate 0.0401   Epoch: 7   Global Step: 122460   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 12:11:53,055-Speed 3321.82 samples/sec   Loss 2.2745   LearningRate 0.0401   Epoch: 7   Global Step: 122470   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 12:11:56,207-Speed 3249.63 samples/sec   Loss 2.2605   LearningRate 0.0401   Epoch: 7   Global Step: 122480   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 12:11:59,333-Speed 3276.27 samples/sec   Loss 2.2823   LearningRate 0.0401   Epoch: 7   Global Step: 122490   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 12:12:02,418-Speed 3320.70 samples/sec   Loss 2.2538   LearningRate 0.0401   Epoch: 7   Global Step: 122500   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 12:12:05,510-Speed 3312.14 samples/sec   Loss 2.2731   LearningRate 0.0401   Epoch: 7   Global Step: 122510   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 12:12:08,572-Speed 3345.56 samples/sec   Loss 2.2335   LearningRate 0.0401   Epoch: 7   Global Step: 122520   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-11 12:12:11,645-Speed 3333.12 samples/sec   Loss 2.3191   LearningRate 0.0401   Epoch: 7   Global Step: 122530   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-11 12:12:14,723-Speed 3327.33 samples/sec   Loss 2.2386   LearningRate 0.0401   Epoch: 7   Global Step: 122540   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-11 12:12:17,802-Speed 3326.06 samples/sec   Loss 2.3343   LearningRate 0.0401   Epoch: 7   Global Step: 122550   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-11 12:12:20,887-Speed 3320.15 samples/sec   Loss 2.2659   LearningRate 0.0401   Epoch: 7   Global Step: 122560   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-11 12:12:23,988-Speed 3303.36 samples/sec   Loss 2.2440   LearningRate 0.0400   Epoch: 7   Global Step: 122570   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-11 12:12:27,116-Speed 3274.17 samples/sec   Loss 2.2515   LearningRate 0.0400   Epoch: 7   Global Step: 122580   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-11 12:12:30,210-Speed 3310.55 samples/sec   Loss 2.3019   LearningRate 0.0400   Epoch: 7   Global Step: 122590   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-11 12:12:33,292-Speed 3323.33 samples/sec   Loss 2.2210   LearningRate 0.0400   Epoch: 7   Global Step: 122600   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-11 12:12:36,384-Speed 3312.81 samples/sec   Loss 2.3285   LearningRate 0.0400   Epoch: 7   Global Step: 122610   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-11 12:12:39,559-Speed 3226.55 samples/sec   Loss 2.3110   LearningRate 0.0400   Epoch: 7   Global Step: 122620   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 12:12:42,690-Speed 3270.80 samples/sec   Loss 2.2396   LearningRate 0.0400   Epoch: 7   Global Step: 122630   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 12:12:45,833-Speed 3258.73 samples/sec   Loss 2.2738   LearningRate 0.0400   Epoch: 7   Global Step: 122640   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 12:12:48,941-Speed 3295.15 samples/sec   Loss 2.3136   LearningRate 0.0400   Epoch: 7   Global Step: 122650   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 12:12:52,029-Speed 3317.02 samples/sec   Loss 2.2267   LearningRate 0.0400   Epoch: 7   Global Step: 122660   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 12:12:55,134-Speed 3298.88 samples/sec   Loss 2.3056   LearningRate 0.0400   Epoch: 7   Global Step: 122670   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 12:12:58,219-Speed 3319.60 samples/sec   Loss 2.2786   LearningRate 0.0400   Epoch: 7   Global Step: 122680   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 12:13:01,310-Speed 3313.85 samples/sec   Loss 2.3101   LearningRate 0.0400   Epoch: 7   Global Step: 122690   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 12:13:04,420-Speed 3294.02 samples/sec   Loss 2.2698   LearningRate 0.0400   Epoch: 7   Global Step: 122700   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 12:13:07,554-Speed 3267.60 samples/sec   Loss 2.2558   LearningRate 0.0400   Epoch: 7   Global Step: 122710   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 12:13:10,654-Speed 3304.09 samples/sec   Loss 2.2380   LearningRate 0.0400   Epoch: 7   Global Step: 122720   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 12:13:13,740-Speed 3319.52 samples/sec   Loss 2.3354   LearningRate 0.0400   Epoch: 7   Global Step: 122730   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 12:13:16,923-Speed 3217.38 samples/sec   Loss 2.2415   LearningRate 0.0400   Epoch: 7   Global Step: 122740   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 12:13:20,143-Speed 3180.70 samples/sec   Loss 2.3308   LearningRate 0.0400   Epoch: 7   Global Step: 122750   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 12:13:23,236-Speed 3311.47 samples/sec   Loss 2.2393   LearningRate 0.0400   Epoch: 7   Global Step: 122760   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 12:13:26,391-Speed 3246.99 samples/sec   Loss 2.2008   LearningRate 0.0400   Epoch: 7   Global Step: 122770   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 12:13:29,516-Speed 3277.43 samples/sec   Loss 2.1880   LearningRate 0.0400   Epoch: 7   Global Step: 122780   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 12:13:32,625-Speed 3294.71 samples/sec   Loss 2.3133   LearningRate 0.0400   Epoch: 7   Global Step: 122790   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 12:13:35,702-Speed 3328.77 samples/sec   Loss 2.3143   LearningRate 0.0400   Epoch: 7   Global Step: 122800   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 12:13:38,860-Speed 3242.88 samples/sec   Loss 2.2174   LearningRate 0.0400   Epoch: 7   Global Step: 122810   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 12:13:42,027-Speed 3233.85 samples/sec   Loss 2.2260   LearningRate 0.0400   Epoch: 7   Global Step: 122820   Fp16 Grad Scale: 262144   Required: 22 hours
Training: 2022-04-11 12:13:45,183-Speed 3246.64 samples/sec   Loss 2.2689   LearningRate 0.0399   Epoch: 7   Global Step: 122830   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 12:13:48,323-Speed 3261.94 samples/sec   Loss 2.2233   LearningRate 0.0399   Epoch: 7   Global Step: 122840   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 12:13:51,467-Speed 3256.88 samples/sec   Loss 2.3409   LearningRate 0.0399   Epoch: 7   Global Step: 122850   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 12:13:54,578-Speed 3292.80 samples/sec   Loss 2.2889   LearningRate 0.0399   Epoch: 7   Global Step: 122860   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 12:13:57,681-Speed 3301.23 samples/sec   Loss 2.3134   LearningRate 0.0399   Epoch: 7   Global Step: 122870   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 12:14:00,796-Speed 3288.28 samples/sec   Loss 2.2591   LearningRate 0.0399   Epoch: 7   Global Step: 122880   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 12:14:04,047-Speed 3150.96 samples/sec   Loss 2.2711   LearningRate 0.0399   Epoch: 7   Global Step: 122890   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 12:14:07,197-Speed 3251.59 samples/sec   Loss 2.2648   LearningRate 0.0399   Epoch: 7   Global Step: 122900   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 12:14:10,311-Speed 3289.19 samples/sec   Loss 2.3134   LearningRate 0.0399   Epoch: 7   Global Step: 122910   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 12:14:13,437-Speed 3277.04 samples/sec   Loss 2.2371   LearningRate 0.0399   Epoch: 7   Global Step: 122920   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 12:14:16,531-Speed 3310.07 samples/sec   Loss 2.3073   LearningRate 0.0399   Epoch: 7   Global Step: 122930   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 12:14:19,624-Speed 3311.50 samples/sec   Loss 2.2692   LearningRate 0.0399   Epoch: 7   Global Step: 122940   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 12:14:22,707-Speed 3322.06 samples/sec   Loss 2.2830   LearningRate 0.0399   Epoch: 7   Global Step: 122950   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 12:14:25,872-Speed 3235.71 samples/sec   Loss 2.2388   LearningRate 0.0399   Epoch: 7   Global Step: 122960   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 12:14:28,997-Speed 3277.99 samples/sec   Loss 2.2338   LearningRate 0.0399   Epoch: 7   Global Step: 122970   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 12:14:32,119-Speed 3280.88 samples/sec   Loss 2.3191   LearningRate 0.0399   Epoch: 7   Global Step: 122980   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 12:14:35,241-Speed 3281.35 samples/sec   Loss 2.2561   LearningRate 0.0399   Epoch: 7   Global Step: 122990   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 12:14:38,348-Speed 3296.47 samples/sec   Loss 2.3488   LearningRate 0.0399   Epoch: 7   Global Step: 123000   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 12:14:41,434-Speed 3318.68 samples/sec   Loss 2.2760   LearningRate 0.0399   Epoch: 7   Global Step: 123010   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 12:14:44,636-Speed 3198.89 samples/sec   Loss 2.2528   LearningRate 0.0399   Epoch: 7   Global Step: 123020   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 12:14:47,718-Speed 3322.77 samples/sec   Loss 2.3211   LearningRate 0.0399   Epoch: 7   Global Step: 123030   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 12:14:50,800-Speed 3323.87 samples/sec   Loss 2.3093   LearningRate 0.0399   Epoch: 7   Global Step: 123040   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 12:14:53,881-Speed 3324.26 samples/sec   Loss 2.2811   LearningRate 0.0399   Epoch: 7   Global Step: 123050   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 12:14:56,997-Speed 3286.46 samples/sec   Loss 2.2545   LearningRate 0.0399   Epoch: 7   Global Step: 123060   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 12:15:00,116-Speed 3284.16 samples/sec   Loss 2.3159   LearningRate 0.0399   Epoch: 7   Global Step: 123070   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 12:15:03,215-Speed 3305.33 samples/sec   Loss 2.3062   LearningRate 0.0399   Epoch: 7   Global Step: 123080   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 12:15:06,291-Speed 3329.89 samples/sec   Loss 2.2810   LearningRate 0.0398   Epoch: 7   Global Step: 123090   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 12:15:09,358-Speed 3339.45 samples/sec   Loss 2.3130   LearningRate 0.0398   Epoch: 7   Global Step: 123100   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 12:15:12,438-Speed 3325.74 samples/sec   Loss 2.2982   LearningRate 0.0398   Epoch: 7   Global Step: 123110   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 12:15:15,512-Speed 3331.88 samples/sec   Loss 2.2539   LearningRate 0.0398   Epoch: 7   Global Step: 123120   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 12:15:18,615-Speed 3300.44 samples/sec   Loss 2.2322   LearningRate 0.0398   Epoch: 7   Global Step: 123130   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 12:15:21,723-Speed 3295.77 samples/sec   Loss 2.2929   LearningRate 0.0398   Epoch: 7   Global Step: 123140   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 12:15:24,917-Speed 3206.91 samples/sec   Loss 2.3267   LearningRate 0.0398   Epoch: 7   Global Step: 123150   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 12:15:28,046-Speed 3274.34 samples/sec   Loss 2.3220   LearningRate 0.0398   Epoch: 7   Global Step: 123160   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 12:15:31,159-Speed 3289.81 samples/sec   Loss 2.2567   LearningRate 0.0398   Epoch: 7   Global Step: 123170   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 12:15:34,305-Speed 3256.00 samples/sec   Loss 2.2718   LearningRate 0.0398   Epoch: 7   Global Step: 123180   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 12:15:37,385-Speed 3325.01 samples/sec   Loss 2.2090   LearningRate 0.0398   Epoch: 7   Global Step: 123190   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 12:15:40,469-Speed 3320.80 samples/sec   Loss 2.2613   LearningRate 0.0398   Epoch: 7   Global Step: 123200   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 12:15:43,568-Speed 3305.68 samples/sec   Loss 2.2650   LearningRate 0.0398   Epoch: 7   Global Step: 123210   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 12:15:46,674-Speed 3297.26 samples/sec   Loss 2.2753   LearningRate 0.0398   Epoch: 7   Global Step: 123220   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 12:15:49,750-Speed 3329.58 samples/sec   Loss 2.2887   LearningRate 0.0398   Epoch: 7   Global Step: 123230   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 12:15:52,866-Speed 3287.19 samples/sec   Loss 2.2922   LearningRate 0.0398   Epoch: 7   Global Step: 123240   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 12:15:56,129-Speed 3139.69 samples/sec   Loss 2.2939   LearningRate 0.0398   Epoch: 7   Global Step: 123250   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 12:15:59,250-Speed 3280.92 samples/sec   Loss 2.2377   LearningRate 0.0398   Epoch: 7   Global Step: 123260   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 12:16:02,335-Speed 3320.29 samples/sec   Loss 2.2739   LearningRate 0.0398   Epoch: 7   Global Step: 123270   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 12:16:05,431-Speed 3307.77 samples/sec   Loss 2.2130   LearningRate 0.0398   Epoch: 7   Global Step: 123280   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 12:16:08,546-Speed 3288.76 samples/sec   Loss 2.2664   LearningRate 0.0398   Epoch: 7   Global Step: 123290   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 12:16:11,753-Speed 3193.76 samples/sec   Loss 2.2679   LearningRate 0.0398   Epoch: 7   Global Step: 123300   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 12:16:14,881-Speed 3275.09 samples/sec   Loss 2.3468   LearningRate 0.0398   Epoch: 7   Global Step: 123310   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 12:16:17,958-Speed 3328.24 samples/sec   Loss 2.2900   LearningRate 0.0398   Epoch: 7   Global Step: 123320   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 12:16:21,051-Speed 3312.16 samples/sec   Loss 2.3070   LearningRate 0.0398   Epoch: 7   Global Step: 123330   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 12:16:24,284-Speed 3167.81 samples/sec   Loss 2.3505   LearningRate 0.0398   Epoch: 7   Global Step: 123340   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 12:16:27,448-Speed 3237.97 samples/sec   Loss 2.2453   LearningRate 0.0398   Epoch: 7   Global Step: 123350   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 12:16:30,547-Speed 3304.44 samples/sec   Loss 2.2912   LearningRate 0.0397   Epoch: 7   Global Step: 123360   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 12:16:33,626-Speed 3326.59 samples/sec   Loss 2.2693   LearningRate 0.0397   Epoch: 7   Global Step: 123370   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 12:16:36,705-Speed 3326.69 samples/sec   Loss 2.3035   LearningRate 0.0397   Epoch: 7   Global Step: 123380   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 12:16:39,802-Speed 3307.82 samples/sec   Loss 2.2494   LearningRate 0.0397   Epoch: 7   Global Step: 123390   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 12:16:42,881-Speed 3326.20 samples/sec   Loss 2.3203   LearningRate 0.0397   Epoch: 7   Global Step: 123400   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 12:16:45,988-Speed 3296.80 samples/sec   Loss 2.2922   LearningRate 0.0397   Epoch: 7   Global Step: 123410   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 12:16:49,123-Speed 3266.20 samples/sec   Loss 2.3193   LearningRate 0.0397   Epoch: 7   Global Step: 123420   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 12:16:52,304-Speed 3220.89 samples/sec   Loss 2.2405   LearningRate 0.0397   Epoch: 7   Global Step: 123430   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 12:16:55,434-Speed 3272.08 samples/sec   Loss 2.2718   LearningRate 0.0397   Epoch: 7   Global Step: 123440   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 12:16:58,518-Speed 3321.83 samples/sec   Loss 2.3298   LearningRate 0.0397   Epoch: 7   Global Step: 123450   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 12:17:01,620-Speed 3301.30 samples/sec   Loss 2.2925   LearningRate 0.0397   Epoch: 7   Global Step: 123460   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 12:17:04,737-Speed 3286.22 samples/sec   Loss 2.2894   LearningRate 0.0397   Epoch: 7   Global Step: 123470   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 12:17:07,928-Speed 3209.82 samples/sec   Loss 2.2294   LearningRate 0.0397   Epoch: 7   Global Step: 123480   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 12:17:11,054-Speed 3276.02 samples/sec   Loss 2.3017   LearningRate 0.0397   Epoch: 7   Global Step: 123490   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 12:17:14,152-Speed 3306.26 samples/sec   Loss 2.3185   LearningRate 0.0397   Epoch: 7   Global Step: 123500   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 12:17:17,223-Speed 3336.03 samples/sec   Loss 2.2939   LearningRate 0.0397   Epoch: 7   Global Step: 123510   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-11 12:17:20,303-Speed 3325.15 samples/sec   Loss 2.2456   LearningRate 0.0397   Epoch: 7   Global Step: 123520   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-11 12:17:23,429-Speed 3276.29 samples/sec   Loss 2.3345   LearningRate 0.0397   Epoch: 7   Global Step: 123530   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-11 12:17:26,537-Speed 3296.32 samples/sec   Loss 2.2755   LearningRate 0.0397   Epoch: 7   Global Step: 123540   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-11 12:17:29,628-Speed 3312.46 samples/sec   Loss 2.2286   LearningRate 0.0397   Epoch: 7   Global Step: 123550   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-11 12:17:32,709-Speed 3325.29 samples/sec   Loss 2.2610   LearningRate 0.0397   Epoch: 7   Global Step: 123560   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-11 12:17:35,832-Speed 3278.82 samples/sec   Loss 2.2544   LearningRate 0.0397   Epoch: 7   Global Step: 123570   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-11 12:17:38,907-Speed 3331.66 samples/sec   Loss 2.3031   LearningRate 0.0397   Epoch: 7   Global Step: 123580   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-11 12:17:42,002-Speed 3308.70 samples/sec   Loss 2.3387   LearningRate 0.0397   Epoch: 7   Global Step: 123590   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-11 12:17:45,132-Speed 3273.25 samples/sec   Loss 2.2899   LearningRate 0.0397   Epoch: 7   Global Step: 123600   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-11 12:17:48,224-Speed 3311.79 samples/sec   Loss 2.2977   LearningRate 0.0397   Epoch: 7   Global Step: 123610   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 12:17:51,386-Speed 3240.65 samples/sec   Loss 2.3009   LearningRate 0.0396   Epoch: 7   Global Step: 123620   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 12:17:54,465-Speed 3326.68 samples/sec   Loss 2.2949   LearningRate 0.0396   Epoch: 7   Global Step: 123630   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 12:17:57,558-Speed 3311.00 samples/sec   Loss 2.3052   LearningRate 0.0396   Epoch: 7   Global Step: 123640   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 12:18:00,677-Speed 3283.53 samples/sec   Loss 2.2776   LearningRate 0.0396   Epoch: 7   Global Step: 123650   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 12:18:03,762-Speed 3321.21 samples/sec   Loss 2.2882   LearningRate 0.0396   Epoch: 7   Global Step: 123660   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 12:18:06,845-Speed 3322.42 samples/sec   Loss 2.3521   LearningRate 0.0396   Epoch: 7   Global Step: 123670   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 12:18:09,960-Speed 3288.09 samples/sec   Loss 2.3425   LearningRate 0.0396   Epoch: 7   Global Step: 123680   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 12:18:13,047-Speed 3318.86 samples/sec   Loss 2.2923   LearningRate 0.0396   Epoch: 7   Global Step: 123690   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 12:18:16,133-Speed 3318.80 samples/sec   Loss 2.3140   LearningRate 0.0396   Epoch: 7   Global Step: 123700   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 12:18:19,253-Speed 3283.01 samples/sec   Loss 2.3421   LearningRate 0.0396   Epoch: 7   Global Step: 123710   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 12:18:22,336-Speed 3321.48 samples/sec   Loss 2.2795   LearningRate 0.0396   Epoch: 7   Global Step: 123720   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 12:18:25,443-Speed 3297.18 samples/sec   Loss 2.3101   LearningRate 0.0396   Epoch: 7   Global Step: 123730   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 12:18:28,508-Speed 3341.50 samples/sec   Loss 2.3246   LearningRate 0.0396   Epoch: 7   Global Step: 123740   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 12:18:31,618-Speed 3293.90 samples/sec   Loss 2.2937   LearningRate 0.0396   Epoch: 7   Global Step: 123750   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 12:18:34,716-Speed 3305.23 samples/sec   Loss 2.2949   LearningRate 0.0396   Epoch: 7   Global Step: 123760   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 12:18:37,799-Speed 3322.98 samples/sec   Loss 2.2910   LearningRate 0.0396   Epoch: 7   Global Step: 123770   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 12:18:40,875-Speed 3329.02 samples/sec   Loss 2.2702   LearningRate 0.0396   Epoch: 7   Global Step: 123780   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 12:18:43,960-Speed 3320.01 samples/sec   Loss 2.2716   LearningRate 0.0396   Epoch: 7   Global Step: 123790   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 12:18:47,054-Speed 3310.36 samples/sec   Loss 2.3541   LearningRate 0.0396   Epoch: 7   Global Step: 123800   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 12:18:50,173-Speed 3283.96 samples/sec   Loss 2.2841   LearningRate 0.0396   Epoch: 7   Global Step: 123810   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 12:18:53,263-Speed 3316.53 samples/sec   Loss 2.3039   LearningRate 0.0396   Epoch: 7   Global Step: 123820   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 12:18:56,365-Speed 3301.36 samples/sec   Loss 2.3245   LearningRate 0.0396   Epoch: 7   Global Step: 123830   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 12:18:59,452-Speed 3318.33 samples/sec   Loss 2.3273   LearningRate 0.0396   Epoch: 7   Global Step: 123840   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 12:19:02,560-Speed 3295.30 samples/sec   Loss 2.3327   LearningRate 0.0396   Epoch: 7   Global Step: 123850   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 12:19:05,646-Speed 3318.80 samples/sec   Loss 2.2913   LearningRate 0.0396   Epoch: 7   Global Step: 123860   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 12:19:08,775-Speed 3273.11 samples/sec   Loss 2.2776   LearningRate 0.0396   Epoch: 7   Global Step: 123870   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 12:19:11,859-Speed 3322.27 samples/sec   Loss 2.2502   LearningRate 0.0396   Epoch: 7   Global Step: 123880   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 12:19:15,054-Speed 3205.52 samples/sec   Loss 2.3766   LearningRate 0.0395   Epoch: 7   Global Step: 123890   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 12:19:18,236-Speed 3218.71 samples/sec   Loss 2.3027   LearningRate 0.0395   Epoch: 7   Global Step: 123900   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 12:19:21,323-Speed 3318.03 samples/sec   Loss 2.2715   LearningRate 0.0395   Epoch: 7   Global Step: 123910   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 12:19:24,472-Speed 3252.55 samples/sec   Loss 2.3338   LearningRate 0.0395   Epoch: 7   Global Step: 123920   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 12:19:27,575-Speed 3301.14 samples/sec   Loss 2.2404   LearningRate 0.0395   Epoch: 7   Global Step: 123930   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 12:19:30,779-Speed 3196.60 samples/sec   Loss 2.3022   LearningRate 0.0395   Epoch: 7   Global Step: 123940   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 12:19:33,899-Speed 3282.62 samples/sec   Loss 2.2892   LearningRate 0.0395   Epoch: 7   Global Step: 123950   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 12:19:37,031-Speed 3271.37 samples/sec   Loss 2.3161   LearningRate 0.0395   Epoch: 7   Global Step: 123960   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 12:19:40,259-Speed 3172.35 samples/sec   Loss 2.3648   LearningRate 0.0395   Epoch: 7   Global Step: 123970   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 12:19:43,426-Speed 3234.02 samples/sec   Loss 2.3161   LearningRate 0.0395   Epoch: 7   Global Step: 123980   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 12:19:46,524-Speed 3305.84 samples/sec   Loss 2.3454   LearningRate 0.0395   Epoch: 7   Global Step: 123990   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 12:19:49,607-Speed 3322.93 samples/sec   Loss 2.2778   LearningRate 0.0395   Epoch: 7   Global Step: 124000   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 12:20:33,684-[lfw][124000]XNorm: 23.557848
Training: 2022-04-11 12:20:33,684-[lfw][124000]Accuracy-Flip: 0.99750+-0.00300
Training: 2022-04-11 12:20:33,685-[lfw][124000]Accuracy-Highest: 0.99817
Training: 2022-04-11 12:21:24,756-[cfp_fp][124000]XNorm: 22.411495
Training: 2022-04-11 12:21:24,757-[cfp_fp][124000]Accuracy-Flip: 0.98657+-0.00617
Training: 2022-04-11 12:21:24,757-[cfp_fp][124000]Accuracy-Highest: 0.98700
Training: 2022-04-11 12:22:08,778-[agedb_30][124000]XNorm: 23.864167
Training: 2022-04-11 12:22:08,779-[agedb_30][124000]Accuracy-Flip: 0.98200+-0.00714
Training: 2022-04-11 12:22:08,779-[agedb_30][124000]Accuracy-Highest: 0.98317
Training: 2022-04-11 12:22:11,959-Speed 71.93 samples/sec   Loss 2.3105   LearningRate 0.0395   Epoch: 7   Global Step: 124010   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 12:22:15,036-Speed 3329.37 samples/sec   Loss 2.3074   LearningRate 0.0395   Epoch: 7   Global Step: 124020   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 12:22:18,109-Speed 3333.17 samples/sec   Loss 2.3127   LearningRate 0.0395   Epoch: 7   Global Step: 124030   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 12:22:21,184-Speed 3330.85 samples/sec   Loss 2.3298   LearningRate 0.0395   Epoch: 7   Global Step: 124040   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 12:22:24,249-Speed 3341.82 samples/sec   Loss 2.2694   LearningRate 0.0395   Epoch: 7   Global Step: 124050   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 12:22:27,345-Speed 3308.28 samples/sec   Loss 2.3182   LearningRate 0.0395   Epoch: 7   Global Step: 124060   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 12:22:30,452-Speed 3297.02 samples/sec   Loss 2.2834   LearningRate 0.0395   Epoch: 7   Global Step: 124070   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 12:22:33,525-Speed 3332.28 samples/sec   Loss 2.2860   LearningRate 0.0395   Epoch: 7   Global Step: 124080   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 12:22:36,603-Speed 3327.70 samples/sec   Loss 2.2607   LearningRate 0.0395   Epoch: 7   Global Step: 124090   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 12:22:39,670-Speed 3340.34 samples/sec   Loss 2.2400   LearningRate 0.0395   Epoch: 7   Global Step: 124100   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 12:22:42,734-Speed 3342.68 samples/sec   Loss 2.3396   LearningRate 0.0395   Epoch: 7   Global Step: 124110   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 12:22:45,797-Speed 3344.87 samples/sec   Loss 2.2398   LearningRate 0.0395   Epoch: 7   Global Step: 124120   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 12:22:48,867-Speed 3335.71 samples/sec   Loss 2.2890   LearningRate 0.0395   Epoch: 7   Global Step: 124130   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 12:22:51,938-Speed 3335.72 samples/sec   Loss 2.3059   LearningRate 0.0395   Epoch: 7   Global Step: 124140   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 12:22:55,023-Speed 3319.38 samples/sec   Loss 2.3382   LearningRate 0.0395   Epoch: 7   Global Step: 124150   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 12:22:58,144-Speed 3281.84 samples/sec   Loss 2.2951   LearningRate 0.0394   Epoch: 7   Global Step: 124160   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 12:23:01,259-Speed 3287.52 samples/sec   Loss 2.3555   LearningRate 0.0394   Epoch: 7   Global Step: 124170   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 12:23:04,348-Speed 3316.81 samples/sec   Loss 2.3129   LearningRate 0.0394   Epoch: 7   Global Step: 124180   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 12:23:07,491-Speed 3261.11 samples/sec   Loss 2.2878   LearningRate 0.0394   Epoch: 7   Global Step: 124190   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 12:23:10,562-Speed 3334.43 samples/sec   Loss 2.3271   LearningRate 0.0394   Epoch: 7   Global Step: 124200   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 12:23:13,619-Speed 3349.94 samples/sec   Loss 2.2516   LearningRate 0.0394   Epoch: 7   Global Step: 124210   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 12:23:16,683-Speed 3343.02 samples/sec   Loss 2.2176   LearningRate 0.0394   Epoch: 7   Global Step: 124220   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 12:23:19,790-Speed 3296.52 samples/sec   Loss 2.2915   LearningRate 0.0394   Epoch: 7   Global Step: 124230   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 12:23:22,890-Speed 3304.17 samples/sec   Loss 2.3750   LearningRate 0.0394   Epoch: 7   Global Step: 124240   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 12:23:25,979-Speed 3316.07 samples/sec   Loss 2.3091   LearningRate 0.0394   Epoch: 7   Global Step: 124250   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 12:23:29,117-Speed 3263.76 samples/sec   Loss 2.3949   LearningRate 0.0394   Epoch: 7   Global Step: 124260   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 12:23:32,195-Speed 3328.39 samples/sec   Loss 2.2494   LearningRate 0.0394   Epoch: 7   Global Step: 124270   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 12:23:35,432-Speed 3163.85 samples/sec   Loss 2.2758   LearningRate 0.0394   Epoch: 7   Global Step: 124280   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 12:23:38,536-Speed 3299.21 samples/sec   Loss 2.3227   LearningRate 0.0394   Epoch: 7   Global Step: 124290   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 12:23:41,672-Speed 3266.40 samples/sec   Loss 2.3232   LearningRate 0.0394   Epoch: 7   Global Step: 124300   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 12:23:44,844-Speed 3228.75 samples/sec   Loss 2.2921   LearningRate 0.0394   Epoch: 7   Global Step: 124310   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 12:23:47,934-Speed 3315.22 samples/sec   Loss 2.3233   LearningRate 0.0394   Epoch: 7   Global Step: 124320   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 12:23:51,009-Speed 3330.35 samples/sec   Loss 2.3670   LearningRate 0.0394   Epoch: 7   Global Step: 124330   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 12:23:54,147-Speed 3265.06 samples/sec   Loss 2.3099   LearningRate 0.0394   Epoch: 7   Global Step: 124340   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 12:23:57,235-Speed 3316.59 samples/sec   Loss 2.2754   LearningRate 0.0394   Epoch: 7   Global Step: 124350   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 12:24:00,306-Speed 3335.55 samples/sec   Loss 2.3421   LearningRate 0.0394   Epoch: 7   Global Step: 124360   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 12:24:03,390-Speed 3320.52 samples/sec   Loss 2.3281   LearningRate 0.0394   Epoch: 7   Global Step: 124370   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 12:24:06,468-Speed 3327.55 samples/sec   Loss 2.2344   LearningRate 0.0394   Epoch: 7   Global Step: 124380   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 12:24:09,528-Speed 3347.30 samples/sec   Loss 2.3460   LearningRate 0.0394   Epoch: 7   Global Step: 124390   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 12:24:12,611-Speed 3321.49 samples/sec   Loss 2.3304   LearningRate 0.0394   Epoch: 7   Global Step: 124400   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 12:24:15,698-Speed 3318.84 samples/sec   Loss 2.2665   LearningRate 0.0394   Epoch: 7   Global Step: 124410   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 12:24:18,778-Speed 3325.05 samples/sec   Loss 2.3804   LearningRate 0.0393   Epoch: 7   Global Step: 124420   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 12:24:21,849-Speed 3336.26 samples/sec   Loss 2.4096   LearningRate 0.0393   Epoch: 7   Global Step: 124430   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 12:24:24,918-Speed 3337.71 samples/sec   Loss 2.3665   LearningRate 0.0393   Epoch: 7   Global Step: 124440   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 12:24:28,014-Speed 3308.03 samples/sec   Loss 2.3122   LearningRate 0.0393   Epoch: 7   Global Step: 124450   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 12:24:31,103-Speed 3314.94 samples/sec   Loss 2.2854   LearningRate 0.0393   Epoch: 7   Global Step: 124460   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 12:24:34,173-Speed 3336.53 samples/sec   Loss 2.2665   LearningRate 0.0393   Epoch: 7   Global Step: 124470   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 12:24:37,245-Speed 3334.63 samples/sec   Loss 2.2731   LearningRate 0.0393   Epoch: 7   Global Step: 124480   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 12:24:40,330-Speed 3319.59 samples/sec   Loss 2.4067   LearningRate 0.0393   Epoch: 7   Global Step: 124490   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 12:24:43,524-Speed 3206.39 samples/sec   Loss 2.3201   LearningRate 0.0393   Epoch: 7   Global Step: 124500   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 12:24:46,594-Speed 3337.11 samples/sec   Loss 2.2758   LearningRate 0.0393   Epoch: 7   Global Step: 124510   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 12:24:49,668-Speed 3332.50 samples/sec   Loss 2.3121   LearningRate 0.0393   Epoch: 7   Global Step: 124520   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 12:24:52,782-Speed 3288.98 samples/sec   Loss 2.2657   LearningRate 0.0393   Epoch: 7   Global Step: 124530   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 12:24:55,902-Speed 3282.73 samples/sec   Loss 2.2784   LearningRate 0.0393   Epoch: 7   Global Step: 124540   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 12:24:58,974-Speed 3333.88 samples/sec   Loss 2.3326   LearningRate 0.0393   Epoch: 7   Global Step: 124550   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 12:25:02,065-Speed 3312.94 samples/sec   Loss 2.3018   LearningRate 0.0393   Epoch: 7   Global Step: 124560   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 12:25:05,151-Speed 3320.14 samples/sec   Loss 2.2765   LearningRate 0.0393   Epoch: 7   Global Step: 124570   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 12:25:08,238-Speed 3317.32 samples/sec   Loss 2.3271   LearningRate 0.0393   Epoch: 7   Global Step: 124580   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 12:25:11,289-Speed 3356.92 samples/sec   Loss 2.2662   LearningRate 0.0393   Epoch: 7   Global Step: 124590   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 12:25:14,376-Speed 3317.68 samples/sec   Loss 2.3204   LearningRate 0.0393   Epoch: 7   Global Step: 124600   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 12:25:17,445-Speed 3338.08 samples/sec   Loss 2.2992   LearningRate 0.0393   Epoch: 7   Global Step: 124610   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 12:25:20,634-Speed 3212.58 samples/sec   Loss 2.3763   LearningRate 0.0393   Epoch: 7   Global Step: 124620   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 12:25:23,831-Speed 3203.30 samples/sec   Loss 2.3340   LearningRate 0.0393   Epoch: 7   Global Step: 124630   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 12:25:26,907-Speed 3329.37 samples/sec   Loss 2.3192   LearningRate 0.0393   Epoch: 7   Global Step: 124640   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 12:25:30,011-Speed 3300.39 samples/sec   Loss 2.3834   LearningRate 0.0393   Epoch: 7   Global Step: 124650   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 12:25:33,148-Speed 3264.75 samples/sec   Loss 2.2851   LearningRate 0.0393   Epoch: 7   Global Step: 124660   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 12:25:36,217-Speed 3337.79 samples/sec   Loss 2.3047   LearningRate 0.0393   Epoch: 7   Global Step: 124670   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 12:25:39,287-Speed 3336.26 samples/sec   Loss 2.2675   LearningRate 0.0393   Epoch: 7   Global Step: 124680   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 12:25:42,429-Speed 3259.80 samples/sec   Loss 2.3545   LearningRate 0.0392   Epoch: 7   Global Step: 124690   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 12:25:45,498-Speed 3337.34 samples/sec   Loss 2.2880   LearningRate 0.0392   Epoch: 7   Global Step: 124700   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 12:25:48,594-Speed 3308.14 samples/sec   Loss 2.3420   LearningRate 0.0392   Epoch: 7   Global Step: 124710   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 12:25:51,703-Speed 3294.52 samples/sec   Loss 2.2991   LearningRate 0.0392   Epoch: 7   Global Step: 124720   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 12:25:54,799-Speed 3308.36 samples/sec   Loss 2.3330   LearningRate 0.0392   Epoch: 7   Global Step: 124730   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 12:25:57,916-Speed 3286.15 samples/sec   Loss 2.3095   LearningRate 0.0392   Epoch: 7   Global Step: 124740   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 12:26:01,023-Speed 3296.24 samples/sec   Loss 2.3097   LearningRate 0.0392   Epoch: 7   Global Step: 124750   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 12:26:04,132-Speed 3294.52 samples/sec   Loss 2.2787   LearningRate 0.0392   Epoch: 7   Global Step: 124760   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 12:26:07,209-Speed 3328.93 samples/sec   Loss 2.2482   LearningRate 0.0392   Epoch: 7   Global Step: 124770   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 12:26:10,367-Speed 3242.86 samples/sec   Loss 2.2763   LearningRate 0.0392   Epoch: 7   Global Step: 124780   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 12:26:13,499-Speed 3270.21 samples/sec   Loss 2.3185   LearningRate 0.0392   Epoch: 7   Global Step: 124790   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 12:26:16,584-Speed 3319.93 samples/sec   Loss 2.2844   LearningRate 0.0392   Epoch: 7   Global Step: 124800   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 12:26:19,679-Speed 3309.55 samples/sec   Loss 2.2941   LearningRate 0.0392   Epoch: 7   Global Step: 124810   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 12:26:22,759-Speed 3326.29 samples/sec   Loss 2.2796   LearningRate 0.0392   Epoch: 7   Global Step: 124820   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 12:26:25,842-Speed 3322.32 samples/sec   Loss 2.3201   LearningRate 0.0392   Epoch: 7   Global Step: 124830   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 12:26:28,957-Speed 3288.60 samples/sec   Loss 2.2699   LearningRate 0.0392   Epoch: 7   Global Step: 124840   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 12:26:32,167-Speed 3191.15 samples/sec   Loss 2.3196   LearningRate 0.0392   Epoch: 7   Global Step: 124850   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 12:26:35,255-Speed 3317.90 samples/sec   Loss 2.2823   LearningRate 0.0392   Epoch: 7   Global Step: 124860   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 12:26:38,388-Speed 3269.04 samples/sec   Loss 2.2338   LearningRate 0.0392   Epoch: 7   Global Step: 124870   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 12:26:41,581-Speed 3207.42 samples/sec   Loss 2.2957   LearningRate 0.0392   Epoch: 7   Global Step: 124880   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 12:26:44,660-Speed 3326.92 samples/sec   Loss 2.3181   LearningRate 0.0392   Epoch: 7   Global Step: 124890   Fp16 Grad Scale: 262144   Required: 22 hours
Training: 2022-04-11 12:26:47,759-Speed 3304.90 samples/sec   Loss 2.3116   LearningRate 0.0392   Epoch: 7   Global Step: 124900   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 12:26:50,975-Speed 3185.10 samples/sec   Loss 2.2635   LearningRate 0.0392   Epoch: 7   Global Step: 124910   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 12:26:54,054-Speed 3326.70 samples/sec   Loss 2.3401   LearningRate 0.0392   Epoch: 7   Global Step: 124920   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 12:26:57,140-Speed 3318.44 samples/sec   Loss 2.2512   LearningRate 0.0392   Epoch: 7   Global Step: 124930   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 12:27:00,365-Speed 3175.90 samples/sec   Loss 2.2532   LearningRate 0.0392   Epoch: 7   Global Step: 124940   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 12:27:03,525-Speed 3241.39 samples/sec   Loss 2.2621   LearningRate 0.0391   Epoch: 7   Global Step: 124950   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 12:27:06,599-Speed 3331.60 samples/sec   Loss 2.3312   LearningRate 0.0391   Epoch: 7   Global Step: 124960   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 12:27:09,688-Speed 3316.93 samples/sec   Loss 2.2714   LearningRate 0.0391   Epoch: 7   Global Step: 124970   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 12:27:12,840-Speed 3248.62 samples/sec   Loss 2.2465   LearningRate 0.0391   Epoch: 7   Global Step: 124980   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 12:27:16,056-Speed 3185.14 samples/sec   Loss 2.2659   LearningRate 0.0391   Epoch: 7   Global Step: 124990   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 12:27:19,143-Speed 3318.54 samples/sec   Loss 2.2751   LearningRate 0.0391   Epoch: 7   Global Step: 125000   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 12:27:22,264-Speed 3280.89 samples/sec   Loss 2.3258   LearningRate 0.0391   Epoch: 7   Global Step: 125010   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 12:27:25,473-Speed 3192.01 samples/sec   Loss 2.3086   LearningRate 0.0391   Epoch: 7   Global Step: 125020   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 12:27:28,552-Speed 3326.82 samples/sec   Loss 2.3408   LearningRate 0.0391   Epoch: 7   Global Step: 125030   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 12:27:31,634-Speed 3322.30 samples/sec   Loss 2.4189   LearningRate 0.0391   Epoch: 7   Global Step: 125040   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 12:27:34,713-Speed 3327.38 samples/sec   Loss 2.2992   LearningRate 0.0391   Epoch: 7   Global Step: 125050   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 12:27:37,783-Speed 3335.80 samples/sec   Loss 2.2927   LearningRate 0.0391   Epoch: 7   Global Step: 125060   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 12:27:40,936-Speed 3249.39 samples/sec   Loss 2.3526   LearningRate 0.0391   Epoch: 7   Global Step: 125070   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 12:27:44,028-Speed 3312.41 samples/sec   Loss 2.3155   LearningRate 0.0391   Epoch: 7   Global Step: 125080   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 12:27:47,247-Speed 3181.50 samples/sec   Loss 2.2788   LearningRate 0.0391   Epoch: 7   Global Step: 125090   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 12:27:50,337-Speed 3314.86 samples/sec   Loss 2.3323   LearningRate 0.0391   Epoch: 7   Global Step: 125100   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 12:27:53,429-Speed 3311.90 samples/sec   Loss 2.2721   LearningRate 0.0391   Epoch: 7   Global Step: 125110   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 12:27:56,546-Speed 3286.41 samples/sec   Loss 2.3630   LearningRate 0.0391   Epoch: 7   Global Step: 125120   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 12:27:59,689-Speed 3259.01 samples/sec   Loss 2.3150   LearningRate 0.0391   Epoch: 7   Global Step: 125130   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 12:28:02,805-Speed 3286.85 samples/sec   Loss 2.4114   LearningRate 0.0391   Epoch: 7   Global Step: 125140   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 12:28:05,883-Speed 3327.79 samples/sec   Loss 2.3296   LearningRate 0.0391   Epoch: 7   Global Step: 125150   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 12:28:08,963-Speed 3324.90 samples/sec   Loss 2.3480   LearningRate 0.0391   Epoch: 7   Global Step: 125160   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 12:28:12,073-Speed 3293.87 samples/sec   Loss 2.3065   LearningRate 0.0391   Epoch: 7   Global Step: 125170   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 12:28:15,195-Speed 3280.32 samples/sec   Loss 2.2907   LearningRate 0.0391   Epoch: 7   Global Step: 125180   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 12:28:18,291-Speed 3308.32 samples/sec   Loss 2.2558   LearningRate 0.0391   Epoch: 7   Global Step: 125190   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 12:28:21,515-Speed 3176.75 samples/sec   Loss 2.2659   LearningRate 0.0391   Epoch: 7   Global Step: 125200   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 12:28:24,697-Speed 3218.85 samples/sec   Loss 2.2849   LearningRate 0.0391   Epoch: 7   Global Step: 125210   Fp16 Grad Scale: 262144   Required: 22 hours
Training: 2022-04-11 12:28:27,778-Speed 3324.68 samples/sec   Loss 2.2604   LearningRate 0.0390   Epoch: 7   Global Step: 125220   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 12:28:30,936-Speed 3243.77 samples/sec   Loss 2.2950   LearningRate 0.0390   Epoch: 7   Global Step: 125230   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 12:28:34,025-Speed 3315.75 samples/sec   Loss 2.3573   LearningRate 0.0390   Epoch: 7   Global Step: 125240   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 12:28:37,129-Speed 3299.57 samples/sec   Loss 2.3267   LearningRate 0.0390   Epoch: 7   Global Step: 125250   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 12:28:40,256-Speed 3275.82 samples/sec   Loss 2.3179   LearningRate 0.0390   Epoch: 7   Global Step: 125260   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 12:28:43,370-Speed 3289.10 samples/sec   Loss 2.3364   LearningRate 0.0390   Epoch: 7   Global Step: 125270   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 12:28:46,534-Speed 3236.61 samples/sec   Loss 2.3078   LearningRate 0.0390   Epoch: 7   Global Step: 125280   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 12:28:49,724-Speed 3210.99 samples/sec   Loss 2.3119   LearningRate 0.0390   Epoch: 7   Global Step: 125290   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 12:28:52,932-Speed 3192.15 samples/sec   Loss 2.4204   LearningRate 0.0390   Epoch: 7   Global Step: 125300   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 12:28:56,008-Speed 3330.64 samples/sec   Loss 2.3152   LearningRate 0.0390   Epoch: 7   Global Step: 125310   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 12:28:59,130-Speed 3280.66 samples/sec   Loss 2.2396   LearningRate 0.0390   Epoch: 7   Global Step: 125320   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 12:29:02,239-Speed 3293.99 samples/sec   Loss 2.3085   LearningRate 0.0390   Epoch: 7   Global Step: 125330   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 12:29:05,431-Speed 3209.34 samples/sec   Loss 2.2638   LearningRate 0.0390   Epoch: 7   Global Step: 125340   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 12:29:08,668-Speed 3163.24 samples/sec   Loss 2.3704   LearningRate 0.0390   Epoch: 7   Global Step: 125350   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 12:29:11,838-Speed 3231.26 samples/sec   Loss 2.3088   LearningRate 0.0390   Epoch: 7   Global Step: 125360   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 12:29:15,000-Speed 3239.83 samples/sec   Loss 2.2893   LearningRate 0.0390   Epoch: 7   Global Step: 125370   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 12:29:18,181-Speed 3219.32 samples/sec   Loss 2.2875   LearningRate 0.0390   Epoch: 7   Global Step: 125380   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 12:29:21,286-Speed 3299.15 samples/sec   Loss 2.3162   LearningRate 0.0390   Epoch: 7   Global Step: 125390   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 12:29:24,394-Speed 3295.61 samples/sec   Loss 2.2515   LearningRate 0.0390   Epoch: 7   Global Step: 125400   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 12:29:27,484-Speed 3314.75 samples/sec   Loss 2.2578   LearningRate 0.0390   Epoch: 7   Global Step: 125410   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 12:29:30,570-Speed 3318.72 samples/sec   Loss 2.3231   LearningRate 0.0390   Epoch: 7   Global Step: 125420   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 12:29:33,725-Speed 3246.82 samples/sec   Loss 2.3867   LearningRate 0.0390   Epoch: 7   Global Step: 125430   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 12:29:36,800-Speed 3331.44 samples/sec   Loss 2.2505   LearningRate 0.0390   Epoch: 7   Global Step: 125440   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 12:29:39,893-Speed 3310.80 samples/sec   Loss 2.2617   LearningRate 0.0390   Epoch: 7   Global Step: 125450   Fp16 Grad Scale: 262144   Required: 22 hours
Training: 2022-04-11 12:29:43,039-Speed 3255.38 samples/sec   Loss 2.2422   LearningRate 0.0390   Epoch: 7   Global Step: 125460   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 12:29:46,117-Speed 3328.21 samples/sec   Loss 2.2651   LearningRate 0.0390   Epoch: 7   Global Step: 125470   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 12:29:49,203-Speed 3319.56 samples/sec   Loss 2.2238   LearningRate 0.0390   Epoch: 7   Global Step: 125480   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 12:29:52,278-Speed 3330.50 samples/sec   Loss 2.3076   LearningRate 0.0389   Epoch: 7   Global Step: 125490   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 12:29:55,354-Speed 3329.42 samples/sec   Loss 2.3629   LearningRate 0.0389   Epoch: 7   Global Step: 125500   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 12:29:58,432-Speed 3327.43 samples/sec   Loss 2.3218   LearningRate 0.0389   Epoch: 7   Global Step: 125510   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 12:30:01,510-Speed 3327.30 samples/sec   Loss 2.2380   LearningRate 0.0389   Epoch: 7   Global Step: 125520   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 12:30:04,586-Speed 3330.15 samples/sec   Loss 2.3199   LearningRate 0.0389   Epoch: 7   Global Step: 125530   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 12:30:07,683-Speed 3307.53 samples/sec   Loss 2.3324   LearningRate 0.0389   Epoch: 7   Global Step: 125540   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 12:30:10,778-Speed 3309.07 samples/sec   Loss 2.3186   LearningRate 0.0389   Epoch: 7   Global Step: 125550   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 12:30:13,986-Speed 3192.71 samples/sec   Loss 2.2872   LearningRate 0.0389   Epoch: 7   Global Step: 125560   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 12:30:17,077-Speed 3314.09 samples/sec   Loss 2.3230   LearningRate 0.0389   Epoch: 7   Global Step: 125570   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 12:30:20,211-Speed 3268.41 samples/sec   Loss 2.3284   LearningRate 0.0389   Epoch: 7   Global Step: 125580   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 12:30:23,292-Speed 3323.89 samples/sec   Loss 2.3230   LearningRate 0.0389   Epoch: 7   Global Step: 125590   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 12:30:26,367-Speed 3330.94 samples/sec   Loss 2.2897   LearningRate 0.0389   Epoch: 7   Global Step: 125600   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 12:30:29,442-Speed 3331.30 samples/sec   Loss 2.2762   LearningRate 0.0389   Epoch: 7   Global Step: 125610   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 12:30:32,520-Speed 3328.59 samples/sec   Loss 2.3041   LearningRate 0.0389   Epoch: 7   Global Step: 125620   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 12:30:35,598-Speed 3327.66 samples/sec   Loss 2.3862   LearningRate 0.0389   Epoch: 7   Global Step: 125630   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-11 12:30:38,701-Speed 3300.45 samples/sec   Loss 2.3424   LearningRate 0.0389   Epoch: 7   Global Step: 125640   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-11 12:30:41,786-Speed 3320.22 samples/sec   Loss 2.3288   LearningRate 0.0389   Epoch: 7   Global Step: 125650   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-11 12:30:44,879-Speed 3312.10 samples/sec   Loss 2.2395   LearningRate 0.0389   Epoch: 7   Global Step: 125660   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-11 12:30:47,985-Speed 3297.78 samples/sec   Loss 2.3756   LearningRate 0.0389   Epoch: 7   Global Step: 125670   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-11 12:30:51,124-Speed 3262.74 samples/sec   Loss 2.2956   LearningRate 0.0389   Epoch: 7   Global Step: 125680   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-11 12:30:54,210-Speed 3318.27 samples/sec   Loss 2.3170   LearningRate 0.0389   Epoch: 7   Global Step: 125690   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-11 12:30:57,286-Speed 3330.06 samples/sec   Loss 2.3046   LearningRate 0.0389   Epoch: 7   Global Step: 125700   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-11 12:31:00,380-Speed 3311.13 samples/sec   Loss 2.3090   LearningRate 0.0389   Epoch: 7   Global Step: 125710   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-11 12:31:03,450-Speed 3335.73 samples/sec   Loss 2.4172   LearningRate 0.0389   Epoch: 7   Global Step: 125720   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-11 12:31:06,553-Speed 3300.62 samples/sec   Loss 2.2501   LearningRate 0.0389   Epoch: 7   Global Step: 125730   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 12:31:09,632-Speed 3327.16 samples/sec   Loss 2.2544   LearningRate 0.0389   Epoch: 7   Global Step: 125740   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 12:31:12,703-Speed 3336.05 samples/sec   Loss 2.2830   LearningRate 0.0389   Epoch: 7   Global Step: 125750   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 12:31:15,775-Speed 3333.97 samples/sec   Loss 2.2713   LearningRate 0.0388   Epoch: 7   Global Step: 125760   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 12:31:18,849-Speed 3332.43 samples/sec   Loss 2.3268   LearningRate 0.0388   Epoch: 7   Global Step: 125770   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 12:31:21,976-Speed 3275.27 samples/sec   Loss 2.2909   LearningRate 0.0388   Epoch: 7   Global Step: 125780   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 12:31:25,059-Speed 3322.45 samples/sec   Loss 2.3039   LearningRate 0.0388   Epoch: 7   Global Step: 125790   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 12:31:28,247-Speed 3212.87 samples/sec   Loss 2.3162   LearningRate 0.0388   Epoch: 7   Global Step: 125800   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 12:31:31,325-Speed 3327.63 samples/sec   Loss 2.3139   LearningRate 0.0388   Epoch: 7   Global Step: 125810   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 12:31:34,460-Speed 3267.20 samples/sec   Loss 2.3142   LearningRate 0.0388   Epoch: 7   Global Step: 125820   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 12:31:37,582-Speed 3281.03 samples/sec   Loss 2.3078   LearningRate 0.0388   Epoch: 7   Global Step: 125830   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 12:31:40,697-Speed 3287.61 samples/sec   Loss 2.2021   LearningRate 0.0388   Epoch: 7   Global Step: 125840   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 12:31:43,842-Speed 3256.95 samples/sec   Loss 2.3332   LearningRate 0.0388   Epoch: 7   Global Step: 125850   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 12:31:46,994-Speed 3249.59 samples/sec   Loss 2.3511   LearningRate 0.0388   Epoch: 7   Global Step: 125860   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 12:31:50,165-Speed 3229.97 samples/sec   Loss 2.3250   LearningRate 0.0388   Epoch: 7   Global Step: 125870   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 12:31:53,295-Speed 3271.73 samples/sec   Loss 2.3022   LearningRate 0.0388   Epoch: 7   Global Step: 125880   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 12:31:56,399-Speed 3301.00 samples/sec   Loss 2.3755   LearningRate 0.0388   Epoch: 7   Global Step: 125890   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 12:31:59,578-Speed 3221.28 samples/sec   Loss 2.2721   LearningRate 0.0388   Epoch: 7   Global Step: 125900   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 12:32:02,653-Speed 3331.31 samples/sec   Loss 2.3506   LearningRate 0.0388   Epoch: 7   Global Step: 125910   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 12:32:05,734-Speed 3324.24 samples/sec   Loss 2.3839   LearningRate 0.0388   Epoch: 7   Global Step: 125920   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 12:32:08,845-Speed 3292.06 samples/sec   Loss 2.2703   LearningRate 0.0388   Epoch: 7   Global Step: 125930   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 12:32:11,929-Speed 3321.43 samples/sec   Loss 2.3577   LearningRate 0.0388   Epoch: 7   Global Step: 125940   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 12:32:15,013-Speed 3320.65 samples/sec   Loss 2.3562   LearningRate 0.0388   Epoch: 7   Global Step: 125950   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 12:32:18,091-Speed 3328.14 samples/sec   Loss 2.3010   LearningRate 0.0388   Epoch: 7   Global Step: 125960   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 12:32:21,334-Speed 3158.20 samples/sec   Loss 2.3369   LearningRate 0.0388   Epoch: 7   Global Step: 125970   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 12:32:24,525-Speed 3209.92 samples/sec   Loss 2.3382   LearningRate 0.0388   Epoch: 7   Global Step: 125980   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 12:32:27,653-Speed 3274.44 samples/sec   Loss 2.4058   LearningRate 0.0388   Epoch: 7   Global Step: 125990   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 12:32:30,809-Speed 3246.53 samples/sec   Loss 2.3567   LearningRate 0.0388   Epoch: 7   Global Step: 126000   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 12:33:15,334-[lfw][126000]XNorm: 24.490161
Training: 2022-04-11 12:33:15,334-[lfw][126000]Accuracy-Flip: 0.99800+-0.00296
Training: 2022-04-11 12:33:15,335-[lfw][126000]Accuracy-Highest: 0.99817
Training: 2022-04-11 12:34:06,758-[cfp_fp][126000]XNorm: 23.342157
Training: 2022-04-11 12:34:06,759-[cfp_fp][126000]Accuracy-Flip: 0.98686+-0.00530
Training: 2022-04-11 12:34:06,759-[cfp_fp][126000]Accuracy-Highest: 0.98700
Training: 2022-04-11 12:34:50,830-[agedb_30][126000]XNorm: 24.956585
Training: 2022-04-11 12:34:50,831-[agedb_30][126000]Accuracy-Flip: 0.98167+-0.00687
Training: 2022-04-11 12:34:50,831-[agedb_30][126000]Accuracy-Highest: 0.98317
Training: 2022-04-11 12:34:53,923-Speed 71.55 samples/sec   Loss 2.2814   LearningRate 0.0388   Epoch: 7   Global Step: 126010   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 12:34:57,054-Speed 3271.63 samples/sec   Loss 2.3237   LearningRate 0.0387   Epoch: 7   Global Step: 126020   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 12:35:00,134-Speed 3325.44 samples/sec   Loss 2.3215   LearningRate 0.0387   Epoch: 7   Global Step: 126030   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 12:35:03,273-Speed 3263.10 samples/sec   Loss 2.3130   LearningRate 0.0387   Epoch: 7   Global Step: 126040   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 12:35:06,352-Speed 3326.50 samples/sec   Loss 2.2949   LearningRate 0.0387   Epoch: 7   Global Step: 126050   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 12:35:09,423-Speed 3335.23 samples/sec   Loss 2.3281   LearningRate 0.0387   Epoch: 7   Global Step: 126060   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 12:35:12,506-Speed 3322.70 samples/sec   Loss 2.2988   LearningRate 0.0387   Epoch: 7   Global Step: 126070   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 12:35:15,566-Speed 3346.76 samples/sec   Loss 2.2352   LearningRate 0.0387   Epoch: 7   Global Step: 126080   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 12:35:18,650-Speed 3321.32 samples/sec   Loss 2.3356   LearningRate 0.0387   Epoch: 7   Global Step: 126090   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 12:35:21,723-Speed 3332.94 samples/sec   Loss 2.4086   LearningRate 0.0387   Epoch: 7   Global Step: 126100   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 12:35:24,829-Speed 3297.80 samples/sec   Loss 2.3043   LearningRate 0.0387   Epoch: 7   Global Step: 126110   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 12:35:27,975-Speed 3256.16 samples/sec   Loss 2.2837   LearningRate 0.0387   Epoch: 7   Global Step: 126120   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 12:35:31,098-Speed 3279.30 samples/sec   Loss 2.3017   LearningRate 0.0387   Epoch: 7   Global Step: 126130   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 12:35:34,217-Speed 3283.82 samples/sec   Loss 2.3305   LearningRate 0.0387   Epoch: 7   Global Step: 126140   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 12:35:37,346-Speed 3273.20 samples/sec   Loss 2.3052   LearningRate 0.0387   Epoch: 7   Global Step: 126150   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 12:35:40,448-Speed 3302.44 samples/sec   Loss 2.3036   LearningRate 0.0387   Epoch: 7   Global Step: 126160   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 12:35:43,550-Speed 3301.86 samples/sec   Loss 2.3294   LearningRate 0.0387   Epoch: 7   Global Step: 126170   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 12:35:46,665-Speed 3288.11 samples/sec   Loss 2.3207   LearningRate 0.0387   Epoch: 7   Global Step: 126180   Fp16 Grad Scale: 262144   Required: 22 hours
Training: 2022-04-11 12:35:49,776-Speed 3292.65 samples/sec   Loss 2.3377   LearningRate 0.0387   Epoch: 7   Global Step: 126190   Fp16 Grad Scale: 262144   Required: 22 hours
Training: 2022-04-11 12:35:52,873-Speed 3306.79 samples/sec   Loss 2.2922   LearningRate 0.0387   Epoch: 7   Global Step: 126200   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 12:35:55,967-Speed 3310.83 samples/sec   Loss 2.3234   LearningRate 0.0387   Epoch: 7   Global Step: 126210   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 12:35:59,074-Speed 3295.95 samples/sec   Loss 2.2869   LearningRate 0.0387   Epoch: 7   Global Step: 126220   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 12:36:02,170-Speed 3308.73 samples/sec   Loss 2.3628   LearningRate 0.0387   Epoch: 7   Global Step: 126230   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 12:36:05,247-Speed 3328.88 samples/sec   Loss 2.2543   LearningRate 0.0387   Epoch: 7   Global Step: 126240   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 12:36:08,344-Speed 3306.56 samples/sec   Loss 2.2283   LearningRate 0.0387   Epoch: 7   Global Step: 126250   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 12:36:11,537-Speed 3207.81 samples/sec   Loss 2.3010   LearningRate 0.0387   Epoch: 7   Global Step: 126260   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 12:36:14,626-Speed 3316.27 samples/sec   Loss 2.3363   LearningRate 0.0387   Epoch: 7   Global Step: 126270   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 12:36:17,706-Speed 3325.36 samples/sec   Loss 2.2719   LearningRate 0.0387   Epoch: 7   Global Step: 126280   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 12:36:20,791-Speed 3319.64 samples/sec   Loss 2.2776   LearningRate 0.0386   Epoch: 7   Global Step: 126290   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 12:36:23,875-Speed 3320.77 samples/sec   Loss 2.3362   LearningRate 0.0386   Epoch: 7   Global Step: 126300   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 12:36:26,985-Speed 3293.91 samples/sec   Loss 2.3535   LearningRate 0.0386   Epoch: 7   Global Step: 126310   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 12:36:30,099-Speed 3289.01 samples/sec   Loss 2.2535   LearningRate 0.0386   Epoch: 7   Global Step: 126320   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 12:36:33,238-Speed 3263.15 samples/sec   Loss 2.3265   LearningRate 0.0386   Epoch: 7   Global Step: 126330   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 12:36:36,358-Speed 3282.97 samples/sec   Loss 2.3387   LearningRate 0.0386   Epoch: 7   Global Step: 126340   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 12:36:39,488-Speed 3272.04 samples/sec   Loss 2.3131   LearningRate 0.0386   Epoch: 7   Global Step: 126350   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 12:36:42,675-Speed 3213.30 samples/sec   Loss 2.3243   LearningRate 0.0386   Epoch: 7   Global Step: 126360   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 12:36:45,933-Speed 3144.15 samples/sec   Loss 2.2691   LearningRate 0.0386   Epoch: 7   Global Step: 126370   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 12:36:49,125-Speed 3208.52 samples/sec   Loss 2.2715   LearningRate 0.0386   Epoch: 7   Global Step: 126380   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 12:36:52,215-Speed 3314.57 samples/sec   Loss 2.3328   LearningRate 0.0386   Epoch: 7   Global Step: 126390   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 12:36:55,355-Speed 3262.10 samples/sec   Loss 2.3701   LearningRate 0.0386   Epoch: 7   Global Step: 126400   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 12:36:58,457-Speed 3302.07 samples/sec   Loss 2.2808   LearningRate 0.0386   Epoch: 7   Global Step: 126410   Fp16 Grad Scale: 262144   Required: 22 hours
Training: 2022-04-11 12:37:01,527-Speed 3336.84 samples/sec   Loss 2.2587   LearningRate 0.0386   Epoch: 7   Global Step: 126420   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 12:37:04,603-Speed 3330.21 samples/sec   Loss 2.2925   LearningRate 0.0386   Epoch: 7   Global Step: 126430   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 12:37:07,738-Speed 3266.82 samples/sec   Loss 2.3568   LearningRate 0.0386   Epoch: 7   Global Step: 126440   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 12:37:10,818-Speed 3324.87 samples/sec   Loss 2.2845   LearningRate 0.0386   Epoch: 7   Global Step: 126450   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 12:37:13,903-Speed 3320.17 samples/sec   Loss 2.3096   LearningRate 0.0386   Epoch: 7   Global Step: 126460   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 12:37:16,992-Speed 3315.96 samples/sec   Loss 2.3570   LearningRate 0.0386   Epoch: 7   Global Step: 126470   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 12:37:20,104-Speed 3291.54 samples/sec   Loss 2.3424   LearningRate 0.0386   Epoch: 7   Global Step: 126480   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 12:37:23,186-Speed 3322.80 samples/sec   Loss 2.2609   LearningRate 0.0386   Epoch: 7   Global Step: 126490   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 12:37:26,272-Speed 3319.99 samples/sec   Loss 2.3185   LearningRate 0.0386   Epoch: 7   Global Step: 126500   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 12:37:29,351-Speed 3327.19 samples/sec   Loss 2.2876   LearningRate 0.0386   Epoch: 7   Global Step: 126510   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 12:37:32,460-Speed 3293.89 samples/sec   Loss 2.2862   LearningRate 0.0386   Epoch: 7   Global Step: 126520   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 12:37:35,696-Speed 3165.56 samples/sec   Loss 2.2735   LearningRate 0.0386   Epoch: 7   Global Step: 126530   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 12:37:38,772-Speed 3329.62 samples/sec   Loss 2.3767   LearningRate 0.0386   Epoch: 7   Global Step: 126540   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 12:37:41,855-Speed 3322.25 samples/sec   Loss 2.2571   LearningRate 0.0386   Epoch: 7   Global Step: 126550   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 12:37:45,011-Speed 3245.87 samples/sec   Loss 2.2569   LearningRate 0.0385   Epoch: 7   Global Step: 126560   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 12:37:48,100-Speed 3315.42 samples/sec   Loss 2.3829   LearningRate 0.0385   Epoch: 7   Global Step: 126570   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 12:37:51,193-Speed 3310.89 samples/sec   Loss 2.3824   LearningRate 0.0385   Epoch: 7   Global Step: 126580   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 12:37:54,305-Speed 3291.55 samples/sec   Loss 2.3615   LearningRate 0.0385   Epoch: 7   Global Step: 126590   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 12:37:57,520-Speed 3186.82 samples/sec   Loss 2.2972   LearningRate 0.0385   Epoch: 7   Global Step: 126600   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 12:38:00,609-Speed 3314.79 samples/sec   Loss 2.3111   LearningRate 0.0385   Epoch: 7   Global Step: 126610   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 12:38:03,691-Speed 3323.85 samples/sec   Loss 2.3189   LearningRate 0.0385   Epoch: 7   Global Step: 126620   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 12:38:06,816-Speed 3277.48 samples/sec   Loss 2.3017   LearningRate 0.0385   Epoch: 7   Global Step: 126630   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 12:38:09,933-Speed 3285.58 samples/sec   Loss 2.2588   LearningRate 0.0385   Epoch: 7   Global Step: 126640   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 12:38:13,052-Speed 3284.52 samples/sec   Loss 2.3645   LearningRate 0.0385   Epoch: 7   Global Step: 126650   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 12:38:16,169-Speed 3285.32 samples/sec   Loss 2.2707   LearningRate 0.0385   Epoch: 7   Global Step: 126660   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 12:38:19,333-Speed 3237.70 samples/sec   Loss 2.3107   LearningRate 0.0385   Epoch: 7   Global Step: 126670   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 12:38:22,521-Speed 3212.17 samples/sec   Loss 2.2609   LearningRate 0.0385   Epoch: 7   Global Step: 126680   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 12:38:25,611-Speed 3314.75 samples/sec   Loss 2.3016   LearningRate 0.0385   Epoch: 7   Global Step: 126690   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 12:38:28,700-Speed 3316.21 samples/sec   Loss 2.3513   LearningRate 0.0385   Epoch: 7   Global Step: 126700   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 12:38:31,848-Speed 3253.94 samples/sec   Loss 2.3470   LearningRate 0.0385   Epoch: 7   Global Step: 126710   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 12:38:35,017-Speed 3231.38 samples/sec   Loss 2.3307   LearningRate 0.0385   Epoch: 7   Global Step: 126720   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 12:38:38,096-Speed 3326.36 samples/sec   Loss 2.2389   LearningRate 0.0385   Epoch: 7   Global Step: 126730   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 12:38:41,186-Speed 3314.88 samples/sec   Loss 2.3317   LearningRate 0.0385   Epoch: 7   Global Step: 126740   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 12:38:44,272-Speed 3318.56 samples/sec   Loss 2.3515   LearningRate 0.0385   Epoch: 7   Global Step: 126750   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 12:38:47,419-Speed 3255.90 samples/sec   Loss 2.3411   LearningRate 0.0385   Epoch: 7   Global Step: 126760   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 12:38:50,521-Speed 3301.28 samples/sec   Loss 2.3318   LearningRate 0.0385   Epoch: 7   Global Step: 126770   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 12:38:53,622-Speed 3303.94 samples/sec   Loss 2.3198   LearningRate 0.0385   Epoch: 7   Global Step: 126780   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 12:38:56,750-Speed 3274.09 samples/sec   Loss 2.2560   LearningRate 0.0385   Epoch: 7   Global Step: 126790   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 12:38:59,937-Speed 3213.52 samples/sec   Loss 2.2825   LearningRate 0.0385   Epoch: 7   Global Step: 126800   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 12:39:03,121-Speed 3217.02 samples/sec   Loss 2.2706   LearningRate 0.0385   Epoch: 7   Global Step: 126810   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 12:39:06,240-Speed 3283.31 samples/sec   Loss 2.3191   LearningRate 0.0385   Epoch: 7   Global Step: 126820   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 12:39:09,426-Speed 3214.68 samples/sec   Loss 2.2712   LearningRate 0.0384   Epoch: 7   Global Step: 126830   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 12:39:12,541-Speed 3288.44 samples/sec   Loss 2.3051   LearningRate 0.0384   Epoch: 7   Global Step: 126840   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 12:39:15,621-Speed 3326.13 samples/sec   Loss 2.3910   LearningRate 0.0384   Epoch: 7   Global Step: 126850   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 12:39:18,699-Speed 3327.45 samples/sec   Loss 2.3025   LearningRate 0.0384   Epoch: 7   Global Step: 126860   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 12:39:21,829-Speed 3272.16 samples/sec   Loss 2.3372   LearningRate 0.0384   Epoch: 7   Global Step: 126870   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 12:39:25,090-Speed 3141.05 samples/sec   Loss 2.3017   LearningRate 0.0384   Epoch: 7   Global Step: 126880   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 12:39:28,288-Speed 3203.04 samples/sec   Loss 2.3041   LearningRate 0.0384   Epoch: 7   Global Step: 126890   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 12:39:31,407-Speed 3283.67 samples/sec   Loss 2.2649   LearningRate 0.0384   Epoch: 7   Global Step: 126900   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 12:39:34,486-Speed 3326.15 samples/sec   Loss 2.2995   LearningRate 0.0384   Epoch: 7   Global Step: 126910   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 12:39:37,596-Speed 3294.10 samples/sec   Loss 2.3050   LearningRate 0.0384   Epoch: 7   Global Step: 126920   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 12:39:40,746-Speed 3251.23 samples/sec   Loss 2.2357   LearningRate 0.0384   Epoch: 7   Global Step: 126930   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 12:39:43,856-Speed 3293.06 samples/sec   Loss 2.2753   LearningRate 0.0384   Epoch: 7   Global Step: 126940   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 12:39:46,953-Speed 3307.10 samples/sec   Loss 2.2839   LearningRate 0.0384   Epoch: 7   Global Step: 126950   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 12:39:50,045-Speed 3313.35 samples/sec   Loss 2.3685   LearningRate 0.0384   Epoch: 7   Global Step: 126960   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 12:39:53,125-Speed 3325.76 samples/sec   Loss 2.2616   LearningRate 0.0384   Epoch: 7   Global Step: 126970   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 12:39:56,202-Speed 3328.74 samples/sec   Loss 2.2833   LearningRate 0.0384   Epoch: 7   Global Step: 126980   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 12:39:59,315-Speed 3289.75 samples/sec   Loss 2.2686   LearningRate 0.0384   Epoch: 7   Global Step: 126990   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 12:40:02,536-Speed 3180.49 samples/sec   Loss 2.2473   LearningRate 0.0384   Epoch: 7   Global Step: 127000   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 12:40:05,652-Speed 3286.97 samples/sec   Loss 2.3241   LearningRate 0.0384   Epoch: 7   Global Step: 127010   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 12:40:08,784-Speed 3269.89 samples/sec   Loss 2.3111   LearningRate 0.0384   Epoch: 7   Global Step: 127020   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 12:40:11,908-Speed 3279.64 samples/sec   Loss 2.2360   LearningRate 0.0384   Epoch: 7   Global Step: 127030   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 12:40:14,989-Speed 3324.65 samples/sec   Loss 2.3297   LearningRate 0.0384   Epoch: 7   Global Step: 127040   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 12:40:18,078-Speed 3315.03 samples/sec   Loss 2.3595   LearningRate 0.0384   Epoch: 7   Global Step: 127050   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 12:40:21,240-Speed 3239.83 samples/sec   Loss 2.3234   LearningRate 0.0384   Epoch: 7   Global Step: 127060   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 12:40:24,355-Speed 3288.30 samples/sec   Loss 2.2626   LearningRate 0.0384   Epoch: 7   Global Step: 127070   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 12:40:27,491-Speed 3265.36 samples/sec   Loss 2.2905   LearningRate 0.0384   Epoch: 7   Global Step: 127080   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 12:40:30,603-Speed 3291.74 samples/sec   Loss 2.3437   LearningRate 0.0384   Epoch: 7   Global Step: 127090   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 12:40:33,787-Speed 3216.64 samples/sec   Loss 2.3204   LearningRate 0.0383   Epoch: 7   Global Step: 127100   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 12:40:36,957-Speed 3231.40 samples/sec   Loss 2.3280   LearningRate 0.0383   Epoch: 7   Global Step: 127110   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 12:40:40,084-Speed 3275.64 samples/sec   Loss 2.3019   LearningRate 0.0383   Epoch: 7   Global Step: 127120   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 12:40:43,174-Speed 3314.11 samples/sec   Loss 2.3445   LearningRate 0.0383   Epoch: 7   Global Step: 127130   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 12:40:46,277-Speed 3300.23 samples/sec   Loss 2.2721   LearningRate 0.0383   Epoch: 7   Global Step: 127140   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 12:40:49,415-Speed 3265.34 samples/sec   Loss 2.3096   LearningRate 0.0383   Epoch: 7   Global Step: 127150   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 12:40:52,587-Speed 3228.91 samples/sec   Loss 2.3681   LearningRate 0.0383   Epoch: 7   Global Step: 127160   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 12:40:55,713-Speed 3276.21 samples/sec   Loss 2.3420   LearningRate 0.0383   Epoch: 7   Global Step: 127170   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 12:40:58,810-Speed 3306.50 samples/sec   Loss 2.3614   LearningRate 0.0383   Epoch: 7   Global Step: 127180   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 12:41:01,921-Speed 3292.59 samples/sec   Loss 2.3550   LearningRate 0.0383   Epoch: 7   Global Step: 127190   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 12:41:05,014-Speed 3311.02 samples/sec   Loss 2.3139   LearningRate 0.0383   Epoch: 7   Global Step: 127200   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 12:41:08,122-Speed 3295.79 samples/sec   Loss 2.2902   LearningRate 0.0383   Epoch: 7   Global Step: 127210   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 12:41:11,211-Speed 3316.32 samples/sec   Loss 2.3500   LearningRate 0.0383   Epoch: 7   Global Step: 127220   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 12:41:14,338-Speed 3275.10 samples/sec   Loss 2.2861   LearningRate 0.0383   Epoch: 7   Global Step: 127230   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 12:41:17,434-Speed 3308.95 samples/sec   Loss 2.3363   LearningRate 0.0383   Epoch: 7   Global Step: 127240   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 12:41:20,646-Speed 3188.41 samples/sec   Loss 2.2535   LearningRate 0.0383   Epoch: 7   Global Step: 127250   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 12:41:23,858-Speed 3189.54 samples/sec   Loss 2.2715   LearningRate 0.0383   Epoch: 7   Global Step: 127260   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 12:41:27,071-Speed 3188.20 samples/sec   Loss 2.3053   LearningRate 0.0383   Epoch: 7   Global Step: 127270   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 12:41:30,158-Speed 3317.31 samples/sec   Loss 2.3182   LearningRate 0.0383   Epoch: 7   Global Step: 127280   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 12:41:33,277-Speed 3284.14 samples/sec   Loss 2.2143   LearningRate 0.0383   Epoch: 7   Global Step: 127290   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 12:41:36,370-Speed 3311.39 samples/sec   Loss 2.2409   LearningRate 0.0383   Epoch: 7   Global Step: 127300   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 12:41:39,485-Speed 3288.58 samples/sec   Loss 2.3187   LearningRate 0.0383   Epoch: 7   Global Step: 127310   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 12:41:42,600-Speed 3288.57 samples/sec   Loss 2.3872   LearningRate 0.0383   Epoch: 7   Global Step: 127320   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 12:41:45,743-Speed 3258.88 samples/sec   Loss 2.4132   LearningRate 0.0383   Epoch: 7   Global Step: 127330   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 12:41:48,839-Speed 3308.31 samples/sec   Loss 2.3078   LearningRate 0.0383   Epoch: 7   Global Step: 127340   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 12:41:51,924-Speed 3319.55 samples/sec   Loss 2.2781   LearningRate 0.0383   Epoch: 7   Global Step: 127350   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 12:41:55,022-Speed 3306.88 samples/sec   Loss 2.3241   LearningRate 0.0383   Epoch: 7   Global Step: 127360   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 12:41:58,107-Speed 3319.21 samples/sec   Loss 2.2520   LearningRate 0.0382   Epoch: 7   Global Step: 127370   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 12:42:01,201-Speed 3310.89 samples/sec   Loss 2.3418   LearningRate 0.0382   Epoch: 7   Global Step: 127380   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 12:42:04,314-Speed 3290.11 samples/sec   Loss 2.3424   LearningRate 0.0382   Epoch: 7   Global Step: 127390   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 12:42:07,458-Speed 3258.17 samples/sec   Loss 2.2987   LearningRate 0.0382   Epoch: 7   Global Step: 127400   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 12:42:10,538-Speed 3325.56 samples/sec   Loss 2.3324   LearningRate 0.0382   Epoch: 7   Global Step: 127410   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 12:42:13,628-Speed 3314.65 samples/sec   Loss 2.2618   LearningRate 0.0382   Epoch: 7   Global Step: 127420   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 12:42:16,825-Speed 3204.53 samples/sec   Loss 2.3343   LearningRate 0.0382   Epoch: 7   Global Step: 127430   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 12:42:20,011-Speed 3214.18 samples/sec   Loss 2.2893   LearningRate 0.0382   Epoch: 7   Global Step: 127440   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 12:42:23,098-Speed 3318.48 samples/sec   Loss 2.2673   LearningRate 0.0382   Epoch: 7   Global Step: 127450   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 12:42:26,299-Speed 3199.08 samples/sec   Loss 2.3559   LearningRate 0.0382   Epoch: 7   Global Step: 127460   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 12:42:29,538-Speed 3162.32 samples/sec   Loss 2.3092   LearningRate 0.0382   Epoch: 7   Global Step: 127470   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 12:42:32,638-Speed 3303.44 samples/sec   Loss 2.3909   LearningRate 0.0382   Epoch: 7   Global Step: 127480   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 12:42:35,740-Speed 3302.24 samples/sec   Loss 2.2840   LearningRate 0.0382   Epoch: 7   Global Step: 127490   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 12:42:38,838-Speed 3306.86 samples/sec   Loss 2.2903   LearningRate 0.0382   Epoch: 7   Global Step: 127500   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 12:42:41,933-Speed 3309.80 samples/sec   Loss 2.3070   LearningRate 0.0382   Epoch: 7   Global Step: 127510   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 12:42:45,010-Speed 3328.63 samples/sec   Loss 2.3159   LearningRate 0.0382   Epoch: 7   Global Step: 127520   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 12:42:48,089-Speed 3326.12 samples/sec   Loss 2.3403   LearningRate 0.0382   Epoch: 7   Global Step: 127530   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 12:42:51,181-Speed 3312.17 samples/sec   Loss 2.3111   LearningRate 0.0382   Epoch: 7   Global Step: 127540   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 12:42:54,262-Speed 3324.66 samples/sec   Loss 2.2371   LearningRate 0.0382   Epoch: 7   Global Step: 127550   Fp16 Grad Scale: 262144   Required: 22 hours
Training: 2022-04-11 12:42:57,350-Speed 3317.00 samples/sec   Loss 2.3053   LearningRate 0.0382   Epoch: 7   Global Step: 127560   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 12:43:00,446-Speed 3308.33 samples/sec   Loss 2.3631   LearningRate 0.0382   Epoch: 7   Global Step: 127570   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 12:43:03,547-Speed 3303.33 samples/sec   Loss 2.3151   LearningRate 0.0382   Epoch: 7   Global Step: 127580   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 12:43:06,632-Speed 3319.97 samples/sec   Loss 2.2217   LearningRate 0.0382   Epoch: 7   Global Step: 127590   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 12:43:09,716-Speed 3321.34 samples/sec   Loss 2.3397   LearningRate 0.0382   Epoch: 7   Global Step: 127600   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 12:43:12,968-Speed 3149.60 samples/sec   Loss 2.3564   LearningRate 0.0382   Epoch: 7   Global Step: 127610   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 12:43:16,151-Speed 3217.40 samples/sec   Loss 2.2846   LearningRate 0.0382   Epoch: 7   Global Step: 127620   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 12:43:19,250-Speed 3305.41 samples/sec   Loss 2.3374   LearningRate 0.0382   Epoch: 7   Global Step: 127630   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 12:43:22,378-Speed 3274.70 samples/sec   Loss 2.2616   LearningRate 0.0381   Epoch: 7   Global Step: 127640   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 12:43:25,461-Speed 3321.54 samples/sec   Loss 2.3155   LearningRate 0.0381   Epoch: 7   Global Step: 127650   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 12:43:28,561-Speed 3304.10 samples/sec   Loss 2.3249   LearningRate 0.0381   Epoch: 7   Global Step: 127660   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 12:43:31,656-Speed 3309.14 samples/sec   Loss 2.3747   LearningRate 0.0381   Epoch: 7   Global Step: 127670   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 12:43:34,781-Speed 3278.01 samples/sec   Loss 2.3078   LearningRate 0.0381   Epoch: 7   Global Step: 127680   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 12:43:37,860-Speed 3326.84 samples/sec   Loss 2.2675   LearningRate 0.0381   Epoch: 7   Global Step: 127690   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 12:43:40,938-Speed 3326.91 samples/sec   Loss 2.2684   LearningRate 0.0381   Epoch: 7   Global Step: 127700   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 12:43:44,027-Speed 3316.66 samples/sec   Loss 2.3109   LearningRate 0.0381   Epoch: 7   Global Step: 127710   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 12:43:47,138-Speed 3291.28 samples/sec   Loss 2.3453   LearningRate 0.0381   Epoch: 7   Global Step: 127720   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 12:43:50,234-Speed 3308.78 samples/sec   Loss 2.2844   LearningRate 0.0381   Epoch: 7   Global Step: 127730   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 12:43:53,317-Speed 3322.46 samples/sec   Loss 2.3364   LearningRate 0.0381   Epoch: 7   Global Step: 127740   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 12:43:56,394-Speed 3329.09 samples/sec   Loss 2.2454   LearningRate 0.0381   Epoch: 7   Global Step: 127750   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 12:43:59,495-Speed 3302.54 samples/sec   Loss 2.3315   LearningRate 0.0381   Epoch: 7   Global Step: 127760   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-11 12:44:02,585-Speed 3314.31 samples/sec   Loss 2.3399   LearningRate 0.0381   Epoch: 7   Global Step: 127770   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-11 12:44:05,689-Speed 3300.62 samples/sec   Loss 2.3008   LearningRate 0.0381   Epoch: 7   Global Step: 127780   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-11 12:44:08,796-Speed 3296.93 samples/sec   Loss 2.3024   LearningRate 0.0381   Epoch: 7   Global Step: 127790   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-11 12:44:11,878-Speed 3323.33 samples/sec   Loss 2.2730   LearningRate 0.0381   Epoch: 7   Global Step: 127800   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-11 12:44:14,983-Speed 3298.83 samples/sec   Loss 2.3498   LearningRate 0.0381   Epoch: 7   Global Step: 127810   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-11 12:44:18,075-Speed 3312.27 samples/sec   Loss 2.2748   LearningRate 0.0381   Epoch: 7   Global Step: 127820   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-11 12:44:21,202-Speed 3275.40 samples/sec   Loss 2.3071   LearningRate 0.0381   Epoch: 7   Global Step: 127830   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-11 12:44:24,291-Speed 3316.31 samples/sec   Loss 2.2974   LearningRate 0.0381   Epoch: 7   Global Step: 127840   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-11 12:44:27,372-Speed 3324.30 samples/sec   Loss 2.3445   LearningRate 0.0381   Epoch: 7   Global Step: 127850   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-11 12:44:30,517-Speed 3257.09 samples/sec   Loss 2.3692   LearningRate 0.0381   Epoch: 7   Global Step: 127860   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 12:44:33,698-Speed 3219.75 samples/sec   Loss 2.3146   LearningRate 0.0381   Epoch: 7   Global Step: 127870   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 12:44:36,820-Speed 3280.87 samples/sec   Loss 2.3095   LearningRate 0.0381   Epoch: 7   Global Step: 127880   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 12:44:39,958-Speed 3264.26 samples/sec   Loss 2.3429   LearningRate 0.0381   Epoch: 7   Global Step: 127890   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 12:44:43,160-Speed 3198.49 samples/sec   Loss 2.3300   LearningRate 0.0381   Epoch: 7   Global Step: 127900   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 12:44:46,301-Speed 3260.70 samples/sec   Loss 2.2752   LearningRate 0.0380   Epoch: 7   Global Step: 127910   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 12:44:49,399-Speed 3306.71 samples/sec   Loss 2.2987   LearningRate 0.0380   Epoch: 7   Global Step: 127920   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 12:44:52,528-Speed 3273.50 samples/sec   Loss 2.2735   LearningRate 0.0380   Epoch: 7   Global Step: 127930   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 12:44:55,652-Speed 3278.64 samples/sec   Loss 2.3434   LearningRate 0.0380   Epoch: 7   Global Step: 127940   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 12:44:58,772-Speed 3281.96 samples/sec   Loss 2.3039   LearningRate 0.0380   Epoch: 7   Global Step: 127950   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 12:45:01,909-Speed 3265.27 samples/sec   Loss 2.3556   LearningRate 0.0380   Epoch: 7   Global Step: 127960   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 12:45:05,005-Speed 3309.45 samples/sec   Loss 2.2373   LearningRate 0.0380   Epoch: 7   Global Step: 127970   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 12:45:08,125-Speed 3282.55 samples/sec   Loss 2.2864   LearningRate 0.0380   Epoch: 7   Global Step: 127980   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 12:45:11,207-Speed 3323.72 samples/sec   Loss 2.3154   LearningRate 0.0380   Epoch: 7   Global Step: 127990   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 12:45:14,366-Speed 3241.36 samples/sec   Loss 2.2882   LearningRate 0.0380   Epoch: 7   Global Step: 128000   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 12:45:58,471-[lfw][128000]XNorm: 22.783002
Training: 2022-04-11 12:45:58,472-[lfw][128000]Accuracy-Flip: 0.99800+-0.00267
Training: 2022-04-11 12:45:58,472-[lfw][128000]Accuracy-Highest: 0.99817
Training: 2022-04-11 12:46:49,654-[cfp_fp][128000]XNorm: 21.646355
Training: 2022-04-11 12:46:49,655-[cfp_fp][128000]Accuracy-Flip: 0.98514+-0.00528
Training: 2022-04-11 12:46:49,655-[cfp_fp][128000]Accuracy-Highest: 0.98700
Training: 2022-04-11 12:47:33,760-[agedb_30][128000]XNorm: 23.015101
Training: 2022-04-11 12:47:33,760-[agedb_30][128000]Accuracy-Flip: 0.98083+-0.00757
Training: 2022-04-11 12:47:33,761-[agedb_30][128000]Accuracy-Highest: 0.98317
Training: 2022-04-11 12:47:36,861-Speed 71.86 samples/sec   Loss 2.2405   LearningRate 0.0380   Epoch: 7   Global Step: 128010   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 12:47:39,928-Speed 3340.05 samples/sec   Loss 2.2806   LearningRate 0.0380   Epoch: 7   Global Step: 128020   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 12:47:43,017-Speed 3315.42 samples/sec   Loss 2.3201   LearningRate 0.0380   Epoch: 7   Global Step: 128030   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 12:47:46,097-Speed 3325.91 samples/sec   Loss 2.2737   LearningRate 0.0380   Epoch: 7   Global Step: 128040   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 12:47:49,159-Speed 3344.62 samples/sec   Loss 2.2926   LearningRate 0.0380   Epoch: 7   Global Step: 128050   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 12:47:52,254-Speed 3308.65 samples/sec   Loss 2.2409   LearningRate 0.0380   Epoch: 7   Global Step: 128060   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 12:47:55,347-Speed 3311.90 samples/sec   Loss 2.2394   LearningRate 0.0380   Epoch: 7   Global Step: 128070   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 12:47:58,472-Speed 3278.14 samples/sec   Loss 2.2523   LearningRate 0.0380   Epoch: 7   Global Step: 128080   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 12:48:01,540-Speed 3338.39 samples/sec   Loss 2.2652   LearningRate 0.0380   Epoch: 7   Global Step: 128090   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 12:48:04,614-Speed 3331.43 samples/sec   Loss 2.3531   LearningRate 0.0380   Epoch: 7   Global Step: 128100   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 12:48:07,690-Speed 3329.92 samples/sec   Loss 2.2453   LearningRate 0.0380   Epoch: 7   Global Step: 128110   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 12:48:10,763-Speed 3333.70 samples/sec   Loss 2.3166   LearningRate 0.0380   Epoch: 7   Global Step: 128120   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 12:48:13,832-Speed 3337.50 samples/sec   Loss 2.3744   LearningRate 0.0380   Epoch: 7   Global Step: 128130   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 12:48:16,933-Speed 3302.53 samples/sec   Loss 2.2638   LearningRate 0.0380   Epoch: 7   Global Step: 128140   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 12:48:20,089-Speed 3246.10 samples/sec   Loss 2.3212   LearningRate 0.0380   Epoch: 7   Global Step: 128150   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 12:48:23,184-Speed 3309.22 samples/sec   Loss 2.2908   LearningRate 0.0380   Epoch: 7   Global Step: 128160   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 12:48:26,284-Speed 3303.68 samples/sec   Loss 2.3222   LearningRate 0.0380   Epoch: 7   Global Step: 128170   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 12:48:29,394-Speed 3293.35 samples/sec   Loss 2.3045   LearningRate 0.0379   Epoch: 7   Global Step: 128180   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 12:48:32,474-Speed 3326.15 samples/sec   Loss 2.2562   LearningRate 0.0379   Epoch: 7   Global Step: 128190   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 12:48:35,555-Speed 3323.80 samples/sec   Loss 2.2556   LearningRate 0.0379   Epoch: 7   Global Step: 128200   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 12:48:38,659-Speed 3299.63 samples/sec   Loss 2.3561   LearningRate 0.0379   Epoch: 7   Global Step: 128210   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 12:48:41,774-Speed 3288.27 samples/sec   Loss 2.3312   LearningRate 0.0379   Epoch: 7   Global Step: 128220   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 12:48:44,946-Speed 3229.59 samples/sec   Loss 2.4279   LearningRate 0.0379   Epoch: 7   Global Step: 128230   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 12:48:48,099-Speed 3248.60 samples/sec   Loss 2.2954   LearningRate 0.0379   Epoch: 7   Global Step: 128240   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 12:48:51,281-Speed 3218.11 samples/sec   Loss 2.3505   LearningRate 0.0379   Epoch: 7   Global Step: 128250   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 12:48:54,495-Speed 3186.78 samples/sec   Loss 2.3403   LearningRate 0.0379   Epoch: 7   Global Step: 128260   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 12:48:57,571-Speed 3329.76 samples/sec   Loss 2.2998   LearningRate 0.0379   Epoch: 7   Global Step: 128270   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 12:49:00,687-Speed 3287.92 samples/sec   Loss 2.2698   LearningRate 0.0379   Epoch: 7   Global Step: 128280   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 12:49:03,797-Speed 3292.86 samples/sec   Loss 2.3331   LearningRate 0.0379   Epoch: 7   Global Step: 128290   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 12:49:06,887-Speed 3314.73 samples/sec   Loss 2.2963   LearningRate 0.0379   Epoch: 7   Global Step: 128300   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 12:49:09,964-Speed 3329.18 samples/sec   Loss 2.2268   LearningRate 0.0379   Epoch: 7   Global Step: 128310   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 12:49:13,066-Speed 3302.21 samples/sec   Loss 2.2826   LearningRate 0.0379   Epoch: 7   Global Step: 128320   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 12:49:16,161-Speed 3309.62 samples/sec   Loss 2.3340   LearningRate 0.0379   Epoch: 7   Global Step: 128330   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 12:49:19,287-Speed 3276.59 samples/sec   Loss 2.3449   LearningRate 0.0379   Epoch: 7   Global Step: 128340   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 12:49:22,378-Speed 3313.01 samples/sec   Loss 2.3262   LearningRate 0.0379   Epoch: 7   Global Step: 128350   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 12:49:25,470-Speed 3312.26 samples/sec   Loss 2.3565   LearningRate 0.0379   Epoch: 7   Global Step: 128360   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 12:49:28,558-Speed 3317.93 samples/sec   Loss 2.3461   LearningRate 0.0379   Epoch: 7   Global Step: 128370   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 12:49:31,650-Speed 3311.67 samples/sec   Loss 2.2749   LearningRate 0.0379   Epoch: 7   Global Step: 128380   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 12:49:34,732-Speed 3324.07 samples/sec   Loss 2.3268   LearningRate 0.0379   Epoch: 7   Global Step: 128390   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 12:49:37,841-Speed 3295.16 samples/sec   Loss 2.2880   LearningRate 0.0379   Epoch: 7   Global Step: 128400   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 12:49:40,907-Speed 3340.01 samples/sec   Loss 2.3074   LearningRate 0.0379   Epoch: 7   Global Step: 128410   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 12:49:43,984-Speed 3329.23 samples/sec   Loss 2.2864   LearningRate 0.0379   Epoch: 7   Global Step: 128420   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 12:49:47,079-Speed 3309.17 samples/sec   Loss 2.2926   LearningRate 0.0379   Epoch: 7   Global Step: 128430   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 12:49:50,178-Speed 3305.14 samples/sec   Loss 2.3065   LearningRate 0.0379   Epoch: 7   Global Step: 128440   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 12:49:53,267-Speed 3317.23 samples/sec   Loss 2.3677   LearningRate 0.0378   Epoch: 7   Global Step: 128450   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 12:49:56,345-Speed 3326.61 samples/sec   Loss 2.2709   LearningRate 0.0378   Epoch: 7   Global Step: 128460   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 12:49:59,559-Speed 3187.00 samples/sec   Loss 2.3046   LearningRate 0.0378   Epoch: 7   Global Step: 128470   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 12:50:02,662-Speed 3300.92 samples/sec   Loss 2.3096   LearningRate 0.0378   Epoch: 7   Global Step: 128480   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 12:50:05,735-Speed 3332.81 samples/sec   Loss 2.2965   LearningRate 0.0378   Epoch: 7   Global Step: 128490   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 12:50:08,829-Speed 3310.07 samples/sec   Loss 2.2768   LearningRate 0.0378   Epoch: 7   Global Step: 128500   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 12:50:11,923-Speed 3311.21 samples/sec   Loss 2.2988   LearningRate 0.0378   Epoch: 7   Global Step: 128510   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 12:50:15,043-Speed 3282.62 samples/sec   Loss 2.3100   LearningRate 0.0378   Epoch: 7   Global Step: 128520   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 12:50:18,183-Speed 3262.69 samples/sec   Loss 2.3085   LearningRate 0.0378   Epoch: 7   Global Step: 128530   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 12:50:21,262-Speed 3326.06 samples/sec   Loss 2.2429   LearningRate 0.0378   Epoch: 7   Global Step: 128540   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 12:50:24,343-Speed 3324.68 samples/sec   Loss 2.3201   LearningRate 0.0378   Epoch: 7   Global Step: 128550   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 12:50:27,443-Speed 3304.08 samples/sec   Loss 2.3262   LearningRate 0.0378   Epoch: 7   Global Step: 128560   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 12:50:30,528-Speed 3320.03 samples/sec   Loss 2.2781   LearningRate 0.0378   Epoch: 7   Global Step: 128570   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 12:50:33,632-Speed 3299.29 samples/sec   Loss 2.3447   LearningRate 0.0378   Epoch: 7   Global Step: 128580   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 12:50:36,711-Speed 3326.32 samples/sec   Loss 2.3536   LearningRate 0.0378   Epoch: 7   Global Step: 128590   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 12:50:39,793-Speed 3323.67 samples/sec   Loss 2.3311   LearningRate 0.0378   Epoch: 7   Global Step: 128600   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 12:50:42,869-Speed 3330.13 samples/sec   Loss 2.2208   LearningRate 0.0378   Epoch: 7   Global Step: 128610   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 12:50:45,997-Speed 3274.54 samples/sec   Loss 2.3145   LearningRate 0.0378   Epoch: 7   Global Step: 128620   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 12:50:49,102-Speed 3299.49 samples/sec   Loss 2.3150   LearningRate 0.0378   Epoch: 7   Global Step: 128630   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 12:50:52,248-Speed 3255.06 samples/sec   Loss 2.3509   LearningRate 0.0378   Epoch: 7   Global Step: 128640   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 12:50:55,446-Speed 3203.14 samples/sec   Loss 2.2626   LearningRate 0.0378   Epoch: 7   Global Step: 128650   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 12:50:58,630-Speed 3216.43 samples/sec   Loss 2.3425   LearningRate 0.0378   Epoch: 7   Global Step: 128660   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 12:51:01,718-Speed 3316.50 samples/sec   Loss 2.3127   LearningRate 0.0378   Epoch: 7   Global Step: 128670   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 12:51:04,835-Speed 3286.46 samples/sec   Loss 2.3428   LearningRate 0.0378   Epoch: 7   Global Step: 128680   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 12:51:07,917-Speed 3323.70 samples/sec   Loss 2.3110   LearningRate 0.0378   Epoch: 7   Global Step: 128690   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 12:51:11,001-Speed 3322.03 samples/sec   Loss 2.3008   LearningRate 0.0378   Epoch: 7   Global Step: 128700   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 12:51:14,204-Speed 3197.30 samples/sec   Loss 2.3420   LearningRate 0.0378   Epoch: 7   Global Step: 128710   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 12:51:17,352-Speed 3254.01 samples/sec   Loss 2.3434   LearningRate 0.0377   Epoch: 7   Global Step: 128720   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 12:51:20,447-Speed 3309.12 samples/sec   Loss 2.3213   LearningRate 0.0377   Epoch: 7   Global Step: 128730   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 12:51:23,534-Speed 3317.40 samples/sec   Loss 2.2778   LearningRate 0.0377   Epoch: 7   Global Step: 128740   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 12:51:26,696-Speed 3239.43 samples/sec   Loss 2.3102   LearningRate 0.0377   Epoch: 7   Global Step: 128750   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 12:51:29,816-Speed 3282.42 samples/sec   Loss 2.2791   LearningRate 0.0377   Epoch: 7   Global Step: 128760   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 12:51:32,987-Speed 3230.15 samples/sec   Loss 2.2509   LearningRate 0.0377   Epoch: 7   Global Step: 128770   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 12:51:36,055-Speed 3339.42 samples/sec   Loss 2.3069   LearningRate 0.0377   Epoch: 7   Global Step: 128780   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 12:51:39,141-Speed 3318.76 samples/sec   Loss 2.3006   LearningRate 0.0377   Epoch: 7   Global Step: 128790   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 12:51:42,221-Speed 3326.22 samples/sec   Loss 2.2925   LearningRate 0.0377   Epoch: 7   Global Step: 128800   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 12:51:45,297-Speed 3329.89 samples/sec   Loss 2.3472   LearningRate 0.0377   Epoch: 7   Global Step: 128810   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 12:51:48,377-Speed 3325.36 samples/sec   Loss 2.3270   LearningRate 0.0377   Epoch: 7   Global Step: 128820   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 12:51:51,468-Speed 3314.00 samples/sec   Loss 2.2717   LearningRate 0.0377   Epoch: 7   Global Step: 128830   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 12:51:54,554-Speed 3319.27 samples/sec   Loss 2.2447   LearningRate 0.0377   Epoch: 7   Global Step: 128840   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 12:51:57,636-Speed 3323.19 samples/sec   Loss 2.2691   LearningRate 0.0377   Epoch: 7   Global Step: 128850   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 12:52:00,743-Speed 3296.88 samples/sec   Loss 2.2861   LearningRate 0.0377   Epoch: 7   Global Step: 128860   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 12:52:03,843-Speed 3304.09 samples/sec   Loss 2.3027   LearningRate 0.0377   Epoch: 7   Global Step: 128870   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 12:52:06,978-Speed 3266.45 samples/sec   Loss 2.2887   LearningRate 0.0377   Epoch: 7   Global Step: 128880   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 12:52:10,055-Speed 3328.38 samples/sec   Loss 2.3238   LearningRate 0.0377   Epoch: 7   Global Step: 128890   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 12:52:13,195-Speed 3262.50 samples/sec   Loss 2.4145   LearningRate 0.0377   Epoch: 7   Global Step: 128900   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 12:52:16,409-Speed 3186.90 samples/sec   Loss 2.3926   LearningRate 0.0377   Epoch: 7   Global Step: 128910   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 12:52:19,651-Speed 3159.03 samples/sec   Loss 2.3234   LearningRate 0.0377   Epoch: 7   Global Step: 128920   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 12:52:22,928-Speed 3126.30 samples/sec   Loss 2.2996   LearningRate 0.0377   Epoch: 7   Global Step: 128930   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 12:52:26,114-Speed 3215.32 samples/sec   Loss 2.3030   LearningRate 0.0377   Epoch: 7   Global Step: 128940   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 12:52:29,304-Speed 3210.11 samples/sec   Loss 2.2199   LearningRate 0.0377   Epoch: 7   Global Step: 128950   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 12:52:32,389-Speed 3320.26 samples/sec   Loss 2.3435   LearningRate 0.0377   Epoch: 7   Global Step: 128960   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 12:52:35,513-Speed 3282.86 samples/sec   Loss 2.2586   LearningRate 0.0377   Epoch: 7   Global Step: 128970   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 12:52:38,589-Speed 3330.02 samples/sec   Loss 2.3328   LearningRate 0.0377   Epoch: 7   Global Step: 128980   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 12:52:41,677-Speed 3316.50 samples/sec   Loss 2.2259   LearningRate 0.0376   Epoch: 7   Global Step: 128990   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 12:52:44,815-Speed 3264.15 samples/sec   Loss 2.3278   LearningRate 0.0376   Epoch: 7   Global Step: 129000   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 12:52:47,898-Speed 3322.17 samples/sec   Loss 2.3217   LearningRate 0.0376   Epoch: 7   Global Step: 129010   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 12:52:50,981-Speed 3322.06 samples/sec   Loss 2.3103   LearningRate 0.0376   Epoch: 7   Global Step: 129020   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 12:52:54,064-Speed 3322.43 samples/sec   Loss 2.2838   LearningRate 0.0376   Epoch: 7   Global Step: 129030   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 12:52:57,160-Speed 3308.89 samples/sec   Loss 2.2554   LearningRate 0.0376   Epoch: 7   Global Step: 129040   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 12:53:00,304-Speed 3258.04 samples/sec   Loss 2.2960   LearningRate 0.0376   Epoch: 7   Global Step: 129050   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 12:53:03,386-Speed 3323.04 samples/sec   Loss 2.3017   LearningRate 0.0376   Epoch: 7   Global Step: 129060   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 12:53:06,469-Speed 3321.92 samples/sec   Loss 2.2294   LearningRate 0.0376   Epoch: 7   Global Step: 129070   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 12:53:09,557-Speed 3316.67 samples/sec   Loss 2.3502   LearningRate 0.0376   Epoch: 7   Global Step: 129080   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 12:53:12,626-Speed 3338.08 samples/sec   Loss 2.2391   LearningRate 0.0376   Epoch: 7   Global Step: 129090   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 12:53:15,709-Speed 3322.45 samples/sec   Loss 2.3160   LearningRate 0.0376   Epoch: 7   Global Step: 129100   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 12:53:18,785-Speed 3329.81 samples/sec   Loss 2.2939   LearningRate 0.0376   Epoch: 7   Global Step: 129110   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 12:53:21,864-Speed 3326.61 samples/sec   Loss 2.3063   LearningRate 0.0376   Epoch: 7   Global Step: 129120   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 12:53:24,950-Speed 3318.79 samples/sec   Loss 2.3166   LearningRate 0.0376   Epoch: 7   Global Step: 129130   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 12:53:28,032-Speed 3323.91 samples/sec   Loss 2.3462   LearningRate 0.0376   Epoch: 7   Global Step: 129140   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 12:53:31,233-Speed 3199.31 samples/sec   Loss 2.3226   LearningRate 0.0376   Epoch: 7   Global Step: 129150   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 12:53:34,359-Speed 3276.37 samples/sec   Loss 2.3108   LearningRate 0.0376   Epoch: 7   Global Step: 129160   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 12:53:37,482-Speed 3280.31 samples/sec   Loss 2.3116   LearningRate 0.0376   Epoch: 7   Global Step: 129170   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 12:53:40,572-Speed 3314.41 samples/sec   Loss 2.2910   LearningRate 0.0376   Epoch: 7   Global Step: 129180   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 12:53:43,684-Speed 3291.61 samples/sec   Loss 2.3087   LearningRate 0.0376   Epoch: 7   Global Step: 129190   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 12:53:46,935-Speed 3151.18 samples/sec   Loss 2.2917   LearningRate 0.0376   Epoch: 7   Global Step: 129200   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 12:53:50,023-Speed 3316.58 samples/sec   Loss 2.3424   LearningRate 0.0376   Epoch: 7   Global Step: 129210   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 12:53:53,169-Speed 3255.94 samples/sec   Loss 2.2851   LearningRate 0.0376   Epoch: 7   Global Step: 129220   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 12:53:56,263-Speed 3310.32 samples/sec   Loss 2.3049   LearningRate 0.0376   Epoch: 7   Global Step: 129230   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 12:53:59,394-Speed 3271.75 samples/sec   Loss 2.2771   LearningRate 0.0376   Epoch: 7   Global Step: 129240   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 12:54:02,505-Speed 3291.98 samples/sec   Loss 2.3507   LearningRate 0.0376   Epoch: 7   Global Step: 129250   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 12:54:05,662-Speed 3244.96 samples/sec   Loss 2.3267   LearningRate 0.0376   Epoch: 7   Global Step: 129260   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 12:54:08,790-Speed 3273.37 samples/sec   Loss 2.2378   LearningRate 0.0375   Epoch: 7   Global Step: 129270   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 12:54:11,973-Speed 3217.82 samples/sec   Loss 2.2770   LearningRate 0.0375   Epoch: 7   Global Step: 129280   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 12:54:15,193-Speed 3181.42 samples/sec   Loss 2.3423   LearningRate 0.0375   Epoch: 7   Global Step: 129290   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 12:54:18,424-Speed 3170.61 samples/sec   Loss 2.3278   LearningRate 0.0375   Epoch: 7   Global Step: 129300   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 12:54:21,597-Speed 3227.36 samples/sec   Loss 2.2958   LearningRate 0.0375   Epoch: 7   Global Step: 129310   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 12:54:24,853-Speed 3147.03 samples/sec   Loss 2.2677   LearningRate 0.0375   Epoch: 7   Global Step: 129320   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 12:54:27,953-Speed 3304.29 samples/sec   Loss 2.2943   LearningRate 0.0375   Epoch: 7   Global Step: 129330   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 12:54:31,021-Speed 3338.10 samples/sec   Loss 2.3456   LearningRate 0.0375   Epoch: 7   Global Step: 129340   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 12:54:34,122-Speed 3302.20 samples/sec   Loss 2.3103   LearningRate 0.0375   Epoch: 7   Global Step: 129350   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 12:54:37,212-Speed 3315.50 samples/sec   Loss 2.3117   LearningRate 0.0375   Epoch: 7   Global Step: 129360   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 12:54:40,333-Speed 3281.14 samples/sec   Loss 2.3378   LearningRate 0.0375   Epoch: 7   Global Step: 129370   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 12:54:43,443-Speed 3293.41 samples/sec   Loss 2.3533   LearningRate 0.0375   Epoch: 7   Global Step: 129380   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 12:54:46,542-Speed 3305.86 samples/sec   Loss 2.2097   LearningRate 0.0375   Epoch: 7   Global Step: 129390   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 12:54:49,691-Speed 3252.10 samples/sec   Loss 2.3014   LearningRate 0.0375   Epoch: 7   Global Step: 129400   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 12:54:52,770-Speed 3326.70 samples/sec   Loss 2.2637   LearningRate 0.0375   Epoch: 7   Global Step: 129410   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 12:54:55,851-Speed 3324.91 samples/sec   Loss 2.3145   LearningRate 0.0375   Epoch: 7   Global Step: 129420   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 12:54:58,928-Speed 3329.04 samples/sec   Loss 2.3127   LearningRate 0.0375   Epoch: 7   Global Step: 129430   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 12:55:02,016-Speed 3316.36 samples/sec   Loss 2.3106   LearningRate 0.0375   Epoch: 7   Global Step: 129440   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 12:55:05,095-Speed 3327.49 samples/sec   Loss 2.3333   LearningRate 0.0375   Epoch: 7   Global Step: 129450   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 12:55:08,208-Speed 3289.92 samples/sec   Loss 2.3574   LearningRate 0.0375   Epoch: 7   Global Step: 129460   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 12:55:11,311-Speed 3300.56 samples/sec   Loss 2.2916   LearningRate 0.0375   Epoch: 7   Global Step: 129470   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 12:55:14,575-Speed 3137.40 samples/sec   Loss 2.2771   LearningRate 0.0375   Epoch: 7   Global Step: 129480   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 12:55:17,654-Speed 3327.72 samples/sec   Loss 2.3388   LearningRate 0.0375   Epoch: 7   Global Step: 129490   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 12:55:20,731-Speed 3328.61 samples/sec   Loss 2.2922   LearningRate 0.0375   Epoch: 7   Global Step: 129500   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 12:55:23,814-Speed 3322.26 samples/sec   Loss 2.3403   LearningRate 0.0375   Epoch: 7   Global Step: 129510   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 12:55:26,932-Speed 3284.82 samples/sec   Loss 2.2668   LearningRate 0.0375   Epoch: 7   Global Step: 129520   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 12:55:29,996-Speed 3342.31 samples/sec   Loss 2.3219   LearningRate 0.0375   Epoch: 7   Global Step: 129530   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-11 12:55:33,081-Speed 3320.34 samples/sec   Loss 2.2323   LearningRate 0.0374   Epoch: 7   Global Step: 129540   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-11 12:55:36,171-Speed 3315.42 samples/sec   Loss 2.3331   LearningRate 0.0374   Epoch: 7   Global Step: 129550   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-11 12:55:39,270-Speed 3304.66 samples/sec   Loss 2.2381   LearningRate 0.0374   Epoch: 7   Global Step: 129560   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-11 12:55:42,360-Speed 3314.76 samples/sec   Loss 2.2304   LearningRate 0.0374   Epoch: 7   Global Step: 129570   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-11 12:55:45,449-Speed 3316.37 samples/sec   Loss 2.2858   LearningRate 0.0374   Epoch: 7   Global Step: 129580   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-11 12:55:48,533-Speed 3320.99 samples/sec   Loss 2.3075   LearningRate 0.0374   Epoch: 7   Global Step: 129590   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-11 12:55:51,624-Speed 3313.66 samples/sec   Loss 2.2972   LearningRate 0.0374   Epoch: 7   Global Step: 129600   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-11 12:55:54,819-Speed 3205.56 samples/sec   Loss 2.2332   LearningRate 0.0374   Epoch: 7   Global Step: 129610   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-11 12:55:57,923-Speed 3299.73 samples/sec   Loss 2.2450   LearningRate 0.0374   Epoch: 7   Global Step: 129620   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-11 12:56:01,004-Speed 3325.08 samples/sec   Loss 2.2530   LearningRate 0.0374   Epoch: 7   Global Step: 129630   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 12:56:04,124-Speed 3282.98 samples/sec   Loss 2.3599   LearningRate 0.0374   Epoch: 7   Global Step: 129640   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 12:56:07,204-Speed 3324.81 samples/sec   Loss 2.2888   LearningRate 0.0374   Epoch: 7   Global Step: 129650   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 12:56:10,360-Speed 3245.38 samples/sec   Loss 2.2671   LearningRate 0.0374   Epoch: 7   Global Step: 129660   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 12:56:13,519-Speed 3242.44 samples/sec   Loss 2.3303   LearningRate 0.0374   Epoch: 7   Global Step: 129670   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 12:56:16,610-Speed 3313.47 samples/sec   Loss 2.2731   LearningRate 0.0374   Epoch: 7   Global Step: 129680   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 12:56:19,700-Speed 3315.39 samples/sec   Loss 2.2982   LearningRate 0.0374   Epoch: 7   Global Step: 129690   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 12:56:22,799-Speed 3305.44 samples/sec   Loss 2.3174   LearningRate 0.0374   Epoch: 7   Global Step: 129700   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 12:56:25,893-Speed 3310.29 samples/sec   Loss 2.2854   LearningRate 0.0374   Epoch: 7   Global Step: 129710   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 12:56:28,971-Speed 3327.00 samples/sec   Loss 2.3212   LearningRate 0.0374   Epoch: 7   Global Step: 129720   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 12:56:32,060-Speed 3316.15 samples/sec   Loss 2.3144   LearningRate 0.0374   Epoch: 7   Global Step: 129730   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 12:56:35,203-Speed 3258.56 samples/sec   Loss 2.2844   LearningRate 0.0374   Epoch: 7   Global Step: 129740   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 12:56:38,376-Speed 3228.20 samples/sec   Loss 2.2846   LearningRate 0.0374   Epoch: 7   Global Step: 129750   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 12:56:41,457-Speed 3324.51 samples/sec   Loss 2.2495   LearningRate 0.0374   Epoch: 7   Global Step: 129760   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 12:56:44,537-Speed 3325.98 samples/sec   Loss 2.2759   LearningRate 0.0374   Epoch: 7   Global Step: 129770   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 12:56:47,630-Speed 3310.79 samples/sec   Loss 2.3234   LearningRate 0.0374   Epoch: 7   Global Step: 129780   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 12:56:50,723-Speed 3312.14 samples/sec   Loss 2.3367   LearningRate 0.0374   Epoch: 7   Global Step: 129790   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 12:56:53,812-Speed 3315.59 samples/sec   Loss 2.2897   LearningRate 0.0374   Epoch: 7   Global Step: 129800   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 12:56:57,001-Speed 3211.95 samples/sec   Loss 2.2996   LearningRate 0.0373   Epoch: 7   Global Step: 129810   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 12:57:00,189-Speed 3212.24 samples/sec   Loss 2.2705   LearningRate 0.0373   Epoch: 7   Global Step: 129820   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 12:57:03,319-Speed 3273.51 samples/sec   Loss 2.2772   LearningRate 0.0373   Epoch: 7   Global Step: 129830   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 12:57:06,413-Speed 3310.21 samples/sec   Loss 2.3109   LearningRate 0.0373   Epoch: 7   Global Step: 129840   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 12:57:09,508-Speed 3309.49 samples/sec   Loss 2.3262   LearningRate 0.0373   Epoch: 7   Global Step: 129850   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 12:57:12,589-Speed 3324.58 samples/sec   Loss 2.3286   LearningRate 0.0373   Epoch: 7   Global Step: 129860   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 12:57:15,752-Speed 3237.65 samples/sec   Loss 2.3141   LearningRate 0.0373   Epoch: 7   Global Step: 129870   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 12:57:18,861-Speed 3294.73 samples/sec   Loss 2.3091   LearningRate 0.0373   Epoch: 7   Global Step: 129880   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 12:57:22,002-Speed 3260.99 samples/sec   Loss 2.2990   LearningRate 0.0373   Epoch: 7   Global Step: 129890   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 12:57:25,170-Speed 3233.43 samples/sec   Loss 2.3289   LearningRate 0.0373   Epoch: 7   Global Step: 129900   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 12:57:28,320-Speed 3252.81 samples/sec   Loss 2.3083   LearningRate 0.0373   Epoch: 7   Global Step: 129910   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 12:57:31,420-Speed 3304.11 samples/sec   Loss 2.3151   LearningRate 0.0373   Epoch: 7   Global Step: 129920   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 12:57:34,541-Speed 3281.32 samples/sec   Loss 2.2695   LearningRate 0.0373   Epoch: 7   Global Step: 129930   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 12:57:37,642-Speed 3303.49 samples/sec   Loss 2.2274   LearningRate 0.0373   Epoch: 7   Global Step: 129940   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 12:57:40,803-Speed 3240.59 samples/sec   Loss 2.2985   LearningRate 0.0373   Epoch: 7   Global Step: 129950   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 12:57:43,880-Speed 3327.70 samples/sec   Loss 2.3409   LearningRate 0.0373   Epoch: 7   Global Step: 129960   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 12:57:46,966-Speed 3319.85 samples/sec   Loss 2.2901   LearningRate 0.0373   Epoch: 7   Global Step: 129970   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 12:57:50,046-Speed 3325.64 samples/sec   Loss 2.2170   LearningRate 0.0373   Epoch: 7   Global Step: 129980   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 12:57:53,130-Speed 3321.53 samples/sec   Loss 2.2654   LearningRate 0.0373   Epoch: 7   Global Step: 129990   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 12:57:56,209-Speed 3325.86 samples/sec   Loss 2.2763   LearningRate 0.0373   Epoch: 7   Global Step: 130000   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 12:58:40,143-[lfw][130000]XNorm: 22.359334
Training: 2022-04-11 12:58:40,144-[lfw][130000]Accuracy-Flip: 0.99717+-0.00299
Training: 2022-04-11 12:58:40,144-[lfw][130000]Accuracy-Highest: 0.99817
Training: 2022-04-11 12:59:31,126-[cfp_fp][130000]XNorm: 21.696087
Training: 2022-04-11 12:59:31,127-[cfp_fp][130000]Accuracy-Flip: 0.98686+-0.00567
Training: 2022-04-11 12:59:31,127-[cfp_fp][130000]Accuracy-Highest: 0.98700
Training: 2022-04-11 13:00:15,400-[agedb_30][130000]XNorm: 22.836366
Training: 2022-04-11 13:00:15,401-[agedb_30][130000]Accuracy-Flip: 0.98133+-0.00726
Training: 2022-04-11 13:00:15,401-[agedb_30][130000]Accuracy-Highest: 0.98317
Training: 2022-04-11 13:00:18,479-Speed 71.98 samples/sec   Loss 2.2746   LearningRate 0.0373   Epoch: 7   Global Step: 130010   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 13:00:21,570-Speed 3313.59 samples/sec   Loss 2.2300   LearningRate 0.0373   Epoch: 7   Global Step: 130020   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 13:00:24,642-Speed 3334.72 samples/sec   Loss 2.3166   LearningRate 0.0373   Epoch: 7   Global Step: 130030   Fp16 Grad Scale: 262144   Required: 22 hours
Training: 2022-04-11 13:00:27,691-Speed 3359.18 samples/sec   Loss 2.2677   LearningRate 0.0373   Epoch: 7   Global Step: 130040   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 13:00:30,756-Speed 3340.88 samples/sec   Loss 2.2290   LearningRate 0.0373   Epoch: 7   Global Step: 130050   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 13:00:33,834-Speed 3327.96 samples/sec   Loss 2.3319   LearningRate 0.0373   Epoch: 7   Global Step: 130060   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 13:00:36,920-Speed 3319.05 samples/sec   Loss 2.3344   LearningRate 0.0373   Epoch: 7   Global Step: 130070   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 13:00:40,005-Speed 3320.40 samples/sec   Loss 2.3608   LearningRate 0.0373   Epoch: 7   Global Step: 130080   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 13:00:43,081-Speed 3330.33 samples/sec   Loss 2.3082   LearningRate 0.0372   Epoch: 7   Global Step: 130090   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 13:00:46,200-Speed 3283.40 samples/sec   Loss 2.2103   LearningRate 0.0372   Epoch: 7   Global Step: 130100   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 13:00:49,345-Speed 3257.38 samples/sec   Loss 2.2966   LearningRate 0.0372   Epoch: 7   Global Step: 130110   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 13:00:52,584-Speed 3161.74 samples/sec   Loss 2.2500   LearningRate 0.0372   Epoch: 7   Global Step: 130120   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 13:00:55,746-Speed 3238.75 samples/sec   Loss 2.2951   LearningRate 0.0372   Epoch: 7   Global Step: 130130   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 13:00:58,993-Speed 3155.09 samples/sec   Loss 2.2769   LearningRate 0.0372   Epoch: 7   Global Step: 130140   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 13:01:02,104-Speed 3292.67 samples/sec   Loss 2.2421   LearningRate 0.0372   Epoch: 7   Global Step: 130150   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 13:01:05,202-Speed 3305.67 samples/sec   Loss 2.3078   LearningRate 0.0372   Epoch: 7   Global Step: 130160   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 13:01:08,308-Speed 3298.00 samples/sec   Loss 2.2657   LearningRate 0.0372   Epoch: 7   Global Step: 130170   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 13:01:11,561-Speed 3149.43 samples/sec   Loss 2.2161   LearningRate 0.0372   Epoch: 7   Global Step: 130180   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 13:01:14,654-Speed 3311.65 samples/sec   Loss 2.2965   LearningRate 0.0372   Epoch: 7   Global Step: 130190   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 13:01:17,808-Speed 3247.53 samples/sec   Loss 2.2857   LearningRate 0.0372   Epoch: 7   Global Step: 130200   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 13:01:20,942-Speed 3267.23 samples/sec   Loss 2.2625   LearningRate 0.0372   Epoch: 7   Global Step: 130210   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 13:01:24,099-Speed 3244.43 samples/sec   Loss 2.2328   LearningRate 0.0372   Epoch: 7   Global Step: 130220   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 13:01:27,240-Speed 3261.07 samples/sec   Loss 2.2643   LearningRate 0.0372   Epoch: 7   Global Step: 130230   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 13:01:30,336-Speed 3307.94 samples/sec   Loss 2.3346   LearningRate 0.0372   Epoch: 7   Global Step: 130240   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 13:01:33,424-Speed 3317.18 samples/sec   Loss 2.2982   LearningRate 0.0372   Epoch: 7   Global Step: 130250   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 13:01:36,508-Speed 3321.09 samples/sec   Loss 2.2404   LearningRate 0.0372   Epoch: 7   Global Step: 130260   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 13:01:39,591-Speed 3323.05 samples/sec   Loss 2.2478   LearningRate 0.0372   Epoch: 7   Global Step: 130270   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 13:01:42,699-Speed 3296.11 samples/sec   Loss 2.2758   LearningRate 0.0372   Epoch: 7   Global Step: 130280   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 13:01:45,764-Speed 3341.32 samples/sec   Loss 2.3112   LearningRate 0.0372   Epoch: 7   Global Step: 130290   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 13:01:48,881-Speed 3285.77 samples/sec   Loss 2.2470   LearningRate 0.0372   Epoch: 7   Global Step: 130300   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 13:01:51,973-Speed 3313.03 samples/sec   Loss 2.2657   LearningRate 0.0372   Epoch: 7   Global Step: 130310   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 13:01:55,065-Speed 3312.22 samples/sec   Loss 2.2708   LearningRate 0.0372   Epoch: 7   Global Step: 130320   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 13:01:58,160-Speed 3310.10 samples/sec   Loss 2.2600   LearningRate 0.0372   Epoch: 7   Global Step: 130330   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 13:02:01,235-Speed 3330.49 samples/sec   Loss 2.2875   LearningRate 0.0372   Epoch: 7   Global Step: 130340   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 13:02:04,342-Speed 3296.63 samples/sec   Loss 2.2704   LearningRate 0.0372   Epoch: 7   Global Step: 130350   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 13:02:07,447-Speed 3299.09 samples/sec   Loss 2.2583   LearningRate 0.0371   Epoch: 7   Global Step: 130360   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 13:02:10,590-Speed 3259.25 samples/sec   Loss 2.2930   LearningRate 0.0371   Epoch: 7   Global Step: 130370   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 13:02:13,701-Speed 3291.61 samples/sec   Loss 2.3073   LearningRate 0.0371   Epoch: 7   Global Step: 130380   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-11 13:02:16,907-Speed 3195.20 samples/sec   Loss 2.2523   LearningRate 0.0371   Epoch: 7   Global Step: 130390   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 13:02:20,060-Speed 3248.27 samples/sec   Loss 2.2612   LearningRate 0.0371   Epoch: 7   Global Step: 130400   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 13:02:23,158-Speed 3306.46 samples/sec   Loss 2.2846   LearningRate 0.0371   Epoch: 7   Global Step: 130410   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 13:02:26,299-Speed 3260.33 samples/sec   Loss 2.2814   LearningRate 0.0371   Epoch: 7   Global Step: 130420   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-11 13:02:29,367-Speed 3339.61 samples/sec   Loss 2.2341   LearningRate 0.0371   Epoch: 7   Global Step: 130430   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:02:32,439-Speed 3333.55 samples/sec   Loss 2.2716   LearningRate 0.0371   Epoch: 7   Global Step: 130440   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:02:35,583-Speed 3257.53 samples/sec   Loss 2.2992   LearningRate 0.0371   Epoch: 7   Global Step: 130450   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:02:38,669-Speed 3319.26 samples/sec   Loss 2.3472   LearningRate 0.0371   Epoch: 7   Global Step: 130460   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:02:41,763-Speed 3310.95 samples/sec   Loss 2.3347   LearningRate 0.0371   Epoch: 7   Global Step: 130470   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:02:44,856-Speed 3312.18 samples/sec   Loss 2.2859   LearningRate 0.0371   Epoch: 7   Global Step: 130480   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:02:47,921-Speed 3341.90 samples/sec   Loss 2.3934   LearningRate 0.0371   Epoch: 7   Global Step: 130490   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:02:51,041-Speed 3282.24 samples/sec   Loss 2.2866   LearningRate 0.0371   Epoch: 7   Global Step: 130500   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:02:54,164-Speed 3281.31 samples/sec   Loss 2.2517   LearningRate 0.0371   Epoch: 7   Global Step: 130510   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:02:57,240-Speed 3329.87 samples/sec   Loss 2.2943   LearningRate 0.0371   Epoch: 7   Global Step: 130520   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:03:00,322-Speed 3323.29 samples/sec   Loss 2.3032   LearningRate 0.0371   Epoch: 7   Global Step: 130530   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:03:03,418-Speed 3308.06 samples/sec   Loss 2.2594   LearningRate 0.0371   Epoch: 7   Global Step: 130540   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:03:06,602-Speed 3216.28 samples/sec   Loss 2.2600   LearningRate 0.0371   Epoch: 7   Global Step: 130550   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:03:09,852-Speed 3153.16 samples/sec   Loss 2.3066   LearningRate 0.0371   Epoch: 7   Global Step: 130560   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:03:13,087-Speed 3166.61 samples/sec   Loss 2.3165   LearningRate 0.0371   Epoch: 7   Global Step: 130570   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:03:16,155-Speed 3338.20 samples/sec   Loss 2.3208   LearningRate 0.0371   Epoch: 7   Global Step: 130580   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-11 13:03:19,275-Speed 3282.87 samples/sec   Loss 2.3310   LearningRate 0.0371   Epoch: 7   Global Step: 130590   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-11 13:03:22,464-Speed 3211.57 samples/sec   Loss 2.2930   LearningRate 0.0371   Epoch: 7   Global Step: 130600   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-11 13:03:25,576-Speed 3291.50 samples/sec   Loss 2.3128   LearningRate 0.0371   Epoch: 7   Global Step: 130610   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-11 13:03:28,715-Speed 3263.12 samples/sec   Loss 2.2458   LearningRate 0.0371   Epoch: 7   Global Step: 130620   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-11 13:03:31,805-Speed 3315.03 samples/sec   Loss 2.3219   LearningRate 0.0370   Epoch: 7   Global Step: 130630   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-11 13:03:34,896-Speed 3313.10 samples/sec   Loss 2.2819   LearningRate 0.0370   Epoch: 7   Global Step: 130640   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-11 13:03:37,983-Speed 3317.79 samples/sec   Loss 2.2596   LearningRate 0.0370   Epoch: 7   Global Step: 130650   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-11 13:03:41,074-Speed 3314.13 samples/sec   Loss 2.2540   LearningRate 0.0370   Epoch: 7   Global Step: 130660   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-11 13:03:44,165-Speed 3314.02 samples/sec   Loss 2.2767   LearningRate 0.0370   Epoch: 7   Global Step: 130670   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-11 13:03:47,328-Speed 3238.18 samples/sec   Loss 2.2871   LearningRate 0.0370   Epoch: 7   Global Step: 130680   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:03:50,516-Speed 3212.00 samples/sec   Loss 2.2670   LearningRate 0.0370   Epoch: 7   Global Step: 130690   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:03:53,647-Speed 3271.92 samples/sec   Loss 2.3814   LearningRate 0.0370   Epoch: 7   Global Step: 130700   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:03:56,762-Speed 3288.57 samples/sec   Loss 2.2807   LearningRate 0.0370   Epoch: 7   Global Step: 130710   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-11 13:03:59,855-Speed 3310.71 samples/sec   Loss 2.2829   LearningRate 0.0370   Epoch: 7   Global Step: 130720   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-11 13:04:02,944-Speed 3316.18 samples/sec   Loss 2.2871   LearningRate 0.0370   Epoch: 7   Global Step: 130730   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-11 13:04:06,093-Speed 3253.05 samples/sec   Loss 2.2297   LearningRate 0.0370   Epoch: 7   Global Step: 130740   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-11 13:04:09,199-Speed 3297.01 samples/sec   Loss 2.2986   LearningRate 0.0370   Epoch: 7   Global Step: 130750   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-11 13:04:12,353-Speed 3248.05 samples/sec   Loss 2.3218   LearningRate 0.0370   Epoch: 7   Global Step: 130760   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-11 13:04:15,467-Speed 3289.06 samples/sec   Loss 2.2891   LearningRate 0.0370   Epoch: 7   Global Step: 130770   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-11 13:04:18,645-Speed 3223.35 samples/sec   Loss 2.3183   LearningRate 0.0370   Epoch: 7   Global Step: 130780   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-11 13:04:21,726-Speed 3324.81 samples/sec   Loss 2.3091   LearningRate 0.0370   Epoch: 7   Global Step: 130790   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-11 13:04:24,801-Speed 3330.55 samples/sec   Loss 2.2956   LearningRate 0.0370   Epoch: 7   Global Step: 130800   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-11 13:04:27,881-Speed 3325.50 samples/sec   Loss 2.2947   LearningRate 0.0370   Epoch: 7   Global Step: 130810   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:04:30,960-Speed 3326.71 samples/sec   Loss 2.3068   LearningRate 0.0370   Epoch: 7   Global Step: 130820   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:04:34,106-Speed 3255.91 samples/sec   Loss 2.2477   LearningRate 0.0370   Epoch: 7   Global Step: 130830   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:04:37,200-Speed 3309.84 samples/sec   Loss 2.2822   LearningRate 0.0370   Epoch: 7   Global Step: 130840   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:04:40,283-Speed 3322.57 samples/sec   Loss 2.2875   LearningRate 0.0370   Epoch: 7   Global Step: 130850   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-11 13:04:43,360-Speed 3328.09 samples/sec   Loss 2.2274   LearningRate 0.0370   Epoch: 7   Global Step: 130860   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-11 13:04:46,450-Speed 3315.30 samples/sec   Loss 2.2530   LearningRate 0.0370   Epoch: 7   Global Step: 130870   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-11 13:04:49,541-Speed 3313.87 samples/sec   Loss 2.3040   LearningRate 0.0370   Epoch: 7   Global Step: 130880   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-11 13:04:52,616-Speed 3330.41 samples/sec   Loss 2.2880   LearningRate 0.0370   Epoch: 7   Global Step: 130890   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-11 13:04:55,696-Speed 3325.19 samples/sec   Loss 2.3298   LearningRate 0.0370   Epoch: 7   Global Step: 130900   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-11 13:04:58,780-Speed 3322.60 samples/sec   Loss 2.2278   LearningRate 0.0369   Epoch: 7   Global Step: 130910   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-11 13:05:01,853-Speed 3332.21 samples/sec   Loss 2.2607   LearningRate 0.0369   Epoch: 7   Global Step: 130920   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-11 13:05:04,947-Speed 3311.48 samples/sec   Loss 2.2611   LearningRate 0.0369   Epoch: 7   Global Step: 130930   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-11 13:05:08,066-Speed 3283.72 samples/sec   Loss 2.2492   LearningRate 0.0369   Epoch: 7   Global Step: 130940   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-11 13:05:11,229-Speed 3237.54 samples/sec   Loss 2.3446   LearningRate 0.0369   Epoch: 7   Global Step: 130950   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:05:14,392-Speed 3238.10 samples/sec   Loss 2.2948   LearningRate 0.0369   Epoch: 7   Global Step: 130960   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:05:17,495-Speed 3301.94 samples/sec   Loss 2.3100   LearningRate 0.0369   Epoch: 7   Global Step: 130970   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:05:20,577-Speed 3322.77 samples/sec   Loss 2.2899   LearningRate 0.0369   Epoch: 7   Global Step: 130980   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:05:23,659-Speed 3322.64 samples/sec   Loss 2.2665   LearningRate 0.0369   Epoch: 7   Global Step: 130990   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:05:26,752-Speed 3311.90 samples/sec   Loss 2.3068   LearningRate 0.0369   Epoch: 7   Global Step: 131000   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-11 13:05:29,946-Speed 3206.76 samples/sec   Loss 2.2244   LearningRate 0.0369   Epoch: 7   Global Step: 131010   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-11 13:05:33,075-Speed 3274.04 samples/sec   Loss 2.2861   LearningRate 0.0369   Epoch: 7   Global Step: 131020   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-11 13:05:36,173-Speed 3305.67 samples/sec   Loss 2.2261   LearningRate 0.0369   Epoch: 7   Global Step: 131030   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-11 13:05:39,260-Speed 3318.31 samples/sec   Loss 2.2361   LearningRate 0.0369   Epoch: 7   Global Step: 131040   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-11 13:05:42,363-Speed 3300.20 samples/sec   Loss 2.2682   LearningRate 0.0369   Epoch: 7   Global Step: 131050   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-11 13:05:45,462-Speed 3305.45 samples/sec   Loss 2.3172   LearningRate 0.0369   Epoch: 7   Global Step: 131060   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-11 13:05:48,589-Speed 3275.61 samples/sec   Loss 2.2724   LearningRate 0.0369   Epoch: 7   Global Step: 131070   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-11 13:05:51,669-Speed 3325.25 samples/sec   Loss 2.3306   LearningRate 0.0369   Epoch: 7   Global Step: 131080   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-11 13:05:54,822-Speed 3248.21 samples/sec   Loss 2.2615   LearningRate 0.0369   Epoch: 7   Global Step: 131090   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-11 13:05:57,955-Speed 3270.17 samples/sec   Loss 2.2253   LearningRate 0.0369   Epoch: 7   Global Step: 131100   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:06:01,035-Speed 3325.24 samples/sec   Loss 2.2521   LearningRate 0.0369   Epoch: 7   Global Step: 131110   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:06:04,186-Speed 3251.06 samples/sec   Loss 2.3077   LearningRate 0.0369   Epoch: 7   Global Step: 131120   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:06:07,280-Speed 3310.08 samples/sec   Loss 2.2726   LearningRate 0.0369   Epoch: 7   Global Step: 131130   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:06:10,369-Speed 3315.10 samples/sec   Loss 2.3023   LearningRate 0.0369   Epoch: 7   Global Step: 131140   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:06:13,545-Speed 3224.76 samples/sec   Loss 2.2061   LearningRate 0.0369   Epoch: 7   Global Step: 131150   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:06:16,643-Speed 3307.26 samples/sec   Loss 2.2932   LearningRate 0.0369   Epoch: 7   Global Step: 131160   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:06:19,732-Speed 3315.34 samples/sec   Loss 2.2209   LearningRate 0.0369   Epoch: 7   Global Step: 131170   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:06:22,824-Speed 3312.74 samples/sec   Loss 2.2747   LearningRate 0.0368   Epoch: 7   Global Step: 131180   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:06:25,979-Speed 3246.46 samples/sec   Loss 2.2197   LearningRate 0.0368   Epoch: 7   Global Step: 131190   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:06:29,200-Speed 3179.35 samples/sec   Loss 2.3228   LearningRate 0.0368   Epoch: 7   Global Step: 131200   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-11 13:06:32,311-Speed 3292.19 samples/sec   Loss 2.2991   LearningRate 0.0368   Epoch: 7   Global Step: 131210   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-11 13:06:35,420-Speed 3295.80 samples/sec   Loss 2.1743   LearningRate 0.0368   Epoch: 7   Global Step: 131220   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-11 13:06:38,573-Speed 3248.06 samples/sec   Loss 2.2747   LearningRate 0.0368   Epoch: 7   Global Step: 131230   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-11 13:06:41,775-Speed 3198.42 samples/sec   Loss 2.2684   LearningRate 0.0368   Epoch: 7   Global Step: 131240   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-11 13:06:44,869-Speed 3310.68 samples/sec   Loss 2.2533   LearningRate 0.0368   Epoch: 7   Global Step: 131250   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-11 13:06:47,967-Speed 3306.33 samples/sec   Loss 2.3341   LearningRate 0.0368   Epoch: 7   Global Step: 131260   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-11 13:06:51,060-Speed 3311.81 samples/sec   Loss 2.2502   LearningRate 0.0368   Epoch: 7   Global Step: 131270   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-11 13:06:54,227-Speed 3233.50 samples/sec   Loss 2.3397   LearningRate 0.0368   Epoch: 7   Global Step: 131280   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-11 13:06:57,302-Speed 3331.08 samples/sec   Loss 2.2655   LearningRate 0.0368   Epoch: 7   Global Step: 131290   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-11 13:07:00,486-Speed 3216.40 samples/sec   Loss 2.2650   LearningRate 0.0368   Epoch: 7   Global Step: 131300   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-11 13:07:03,598-Speed 3291.78 samples/sec   Loss 2.2521   LearningRate 0.0368   Epoch: 7   Global Step: 131310   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-11 13:07:06,687-Speed 3316.32 samples/sec   Loss 2.3568   LearningRate 0.0368   Epoch: 7   Global Step: 131320   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-11 13:07:09,771-Speed 3320.73 samples/sec   Loss 2.2828   LearningRate 0.0368   Epoch: 7   Global Step: 131330   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-11 13:07:12,863-Speed 3312.41 samples/sec   Loss 2.2728   LearningRate 0.0368   Epoch: 7   Global Step: 131340   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-11 13:07:15,942-Speed 3326.39 samples/sec   Loss 2.2349   LearningRate 0.0368   Epoch: 7   Global Step: 131350   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-11 13:07:19,028-Speed 3318.72 samples/sec   Loss 2.2746   LearningRate 0.0368   Epoch: 7   Global Step: 131360   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-11 13:07:22,107-Speed 3326.93 samples/sec   Loss 2.2853   LearningRate 0.0368   Epoch: 7   Global Step: 131370   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-11 13:07:25,194-Speed 3318.16 samples/sec   Loss 2.2428   LearningRate 0.0368   Epoch: 7   Global Step: 131380   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-11 13:07:28,376-Speed 3218.38 samples/sec   Loss 2.2199   LearningRate 0.0368   Epoch: 7   Global Step: 131390   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-11 13:07:31,482-Speed 3297.90 samples/sec   Loss 2.2374   LearningRate 0.0368   Epoch: 7   Global Step: 131400   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-11 13:07:34,638-Speed 3246.11 samples/sec   Loss 2.3198   LearningRate 0.0368   Epoch: 7   Global Step: 131410   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-11 13:07:37,715-Speed 3328.46 samples/sec   Loss 2.2501   LearningRate 0.0368   Epoch: 7   Global Step: 131420   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-11 13:07:40,790-Speed 3330.54 samples/sec   Loss 2.3408   LearningRate 0.0368   Epoch: 7   Global Step: 131430   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-11 13:07:43,868-Speed 3326.84 samples/sec   Loss 2.2281   LearningRate 0.0368   Epoch: 7   Global Step: 131440   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-11 13:07:46,967-Speed 3306.17 samples/sec   Loss 2.3381   LearningRate 0.0368   Epoch: 7   Global Step: 131450   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-11 13:07:50,139-Speed 3228.26 samples/sec   Loss 2.2749   LearningRate 0.0367   Epoch: 7   Global Step: 131460   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-11 13:07:53,370-Speed 3169.93 samples/sec   Loss 2.2844   LearningRate 0.0367   Epoch: 7   Global Step: 131470   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-11 13:07:56,508-Speed 3264.10 samples/sec   Loss 2.3283   LearningRate 0.0367   Epoch: 7   Global Step: 131480   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-11 13:07:59,622-Speed 3289.83 samples/sec   Loss 2.2503   LearningRate 0.0367   Epoch: 7   Global Step: 131490   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:08:02,707-Speed 3319.68 samples/sec   Loss 2.2272   LearningRate 0.0367   Epoch: 7   Global Step: 131500   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:08:05,784-Speed 3328.79 samples/sec   Loss 2.3457   LearningRate 0.0367   Epoch: 7   Global Step: 131510   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:08:08,929-Speed 3256.71 samples/sec   Loss 2.2911   LearningRate 0.0367   Epoch: 7   Global Step: 131520   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:08:12,163-Speed 3166.54 samples/sec   Loss 2.2744   LearningRate 0.0367   Epoch: 7   Global Step: 131530   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-11 13:08:15,305-Speed 3260.11 samples/sec   Loss 2.3196   LearningRate 0.0367   Epoch: 7   Global Step: 131540   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-11 13:08:18,387-Speed 3324.38 samples/sec   Loss 2.2903   LearningRate 0.0367   Epoch: 7   Global Step: 131550   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-11 13:08:21,487-Speed 3303.60 samples/sec   Loss 2.3043   LearningRate 0.0367   Epoch: 7   Global Step: 131560   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-11 13:08:24,674-Speed 3214.09 samples/sec   Loss 2.2985   LearningRate 0.0367   Epoch: 7   Global Step: 131570   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-11 13:08:27,773-Speed 3305.26 samples/sec   Loss 2.3248   LearningRate 0.0367   Epoch: 7   Global Step: 131580   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-11 13:08:30,850-Speed 3329.18 samples/sec   Loss 2.3022   LearningRate 0.0367   Epoch: 7   Global Step: 131590   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-11 13:08:33,924-Speed 3331.96 samples/sec   Loss 2.2452   LearningRate 0.0367   Epoch: 7   Global Step: 131600   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-11 13:08:37,009-Speed 3319.88 samples/sec   Loss 2.3239   LearningRate 0.0367   Epoch: 7   Global Step: 131610   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-11 13:08:40,097-Speed 3316.86 samples/sec   Loss 2.2650   LearningRate 0.0367   Epoch: 7   Global Step: 131620   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-11 13:08:43,312-Speed 3185.90 samples/sec   Loss 2.2854   LearningRate 0.0367   Epoch: 7   Global Step: 131630   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:08:46,390-Speed 3328.03 samples/sec   Loss 2.2746   LearningRate 0.0367   Epoch: 7   Global Step: 131640   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:08:49,525-Speed 3267.09 samples/sec   Loss 2.2642   LearningRate 0.0367   Epoch: 7   Global Step: 131650   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:08:52,733-Speed 3192.97 samples/sec   Loss 2.2587   LearningRate 0.0367   Epoch: 7   Global Step: 131660   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:08:55,820-Speed 3317.94 samples/sec   Loss 2.3437   LearningRate 0.0367   Epoch: 7   Global Step: 131670   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:08:59,007-Speed 3213.57 samples/sec   Loss 2.3228   LearningRate 0.0367   Epoch: 7   Global Step: 131680   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:09:02,076-Speed 3336.55 samples/sec   Loss 2.2788   LearningRate 0.0367   Epoch: 7   Global Step: 131690   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-11 13:09:05,174-Speed 3307.40 samples/sec   Loss 2.3065   LearningRate 0.0367   Epoch: 7   Global Step: 131700   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-11 13:09:08,436-Speed 3140.08 samples/sec   Loss 2.2795   LearningRate 0.0367   Epoch: 7   Global Step: 131710   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-11 13:09:11,596-Speed 3240.38 samples/sec   Loss 2.2653   LearningRate 0.0367   Epoch: 7   Global Step: 131720   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-11 13:09:14,738-Speed 3260.42 samples/sec   Loss 2.2890   LearningRate 0.0366   Epoch: 7   Global Step: 131730   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-11 13:09:17,828-Speed 3314.45 samples/sec   Loss 2.3063   LearningRate 0.0366   Epoch: 7   Global Step: 131740   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-11 13:09:20,926-Speed 3306.19 samples/sec   Loss 2.2736   LearningRate 0.0366   Epoch: 7   Global Step: 131750   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-11 13:09:24,001-Speed 3330.76 samples/sec   Loss 2.2749   LearningRate 0.0366   Epoch: 7   Global Step: 131760   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-11 13:09:27,096-Speed 3309.69 samples/sec   Loss 2.3086   LearningRate 0.0366   Epoch: 7   Global Step: 131770   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-11 13:09:30,190-Speed 3310.53 samples/sec   Loss 2.2923   LearningRate 0.0366   Epoch: 7   Global Step: 131780   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-11 13:09:33,323-Speed 3269.76 samples/sec   Loss 2.2162   LearningRate 0.0366   Epoch: 7   Global Step: 131790   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-11 13:09:36,469-Speed 3255.82 samples/sec   Loss 2.2366   LearningRate 0.0366   Epoch: 7   Global Step: 131800   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-11 13:09:39,664-Speed 3205.06 samples/sec   Loss 2.3128   LearningRate 0.0366   Epoch: 7   Global Step: 131810   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-11 13:09:42,772-Speed 3295.55 samples/sec   Loss 2.2151   LearningRate 0.0366   Epoch: 7   Global Step: 131820   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-11 13:09:45,869-Speed 3307.83 samples/sec   Loss 2.3121   LearningRate 0.0366   Epoch: 7   Global Step: 131830   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-11 13:09:49,069-Speed 3200.72 samples/sec   Loss 2.2634   LearningRate 0.0366   Epoch: 7   Global Step: 131840   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-11 13:09:52,197-Speed 3274.25 samples/sec   Loss 2.2371   LearningRate 0.0366   Epoch: 7   Global Step: 131850   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-11 13:09:55,422-Speed 3175.90 samples/sec   Loss 2.2552   LearningRate 0.0366   Epoch: 7   Global Step: 131860   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-11 13:09:58,513-Speed 3313.07 samples/sec   Loss 2.2222   LearningRate 0.0366   Epoch: 7   Global Step: 131870   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-11 13:10:01,672-Speed 3242.85 samples/sec   Loss 2.2850   LearningRate 0.0366   Epoch: 7   Global Step: 131880   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-11 13:10:04,787-Speed 3287.93 samples/sec   Loss 2.2877   LearningRate 0.0366   Epoch: 7   Global Step: 131890   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-11 13:10:07,887-Speed 3303.71 samples/sec   Loss 2.2920   LearningRate 0.0366   Epoch: 7   Global Step: 131900   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-11 13:10:10,984-Speed 3307.53 samples/sec   Loss 2.2853   LearningRate 0.0366   Epoch: 7   Global Step: 131910   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-11 13:10:14,170-Speed 3215.40 samples/sec   Loss 2.2802   LearningRate 0.0366   Epoch: 7   Global Step: 131920   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-11 13:10:17,263-Speed 3310.71 samples/sec   Loss 2.3228   LearningRate 0.0366   Epoch: 7   Global Step: 131930   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-11 13:10:20,459-Speed 3205.25 samples/sec   Loss 2.2975   LearningRate 0.0366   Epoch: 7   Global Step: 131940   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-11 13:10:23,708-Speed 3152.48 samples/sec   Loss 2.3211   LearningRate 0.0366   Epoch: 7   Global Step: 131950   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:10:26,847-Speed 3262.97 samples/sec   Loss 2.2621   LearningRate 0.0366   Epoch: 7   Global Step: 131960   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:10:29,954-Speed 3296.24 samples/sec   Loss 2.2654   LearningRate 0.0366   Epoch: 7   Global Step: 131970   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:10:33,065-Speed 3292.72 samples/sec   Loss 2.2746   LearningRate 0.0366   Epoch: 7   Global Step: 131980   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:10:36,145-Speed 3324.84 samples/sec   Loss 2.2732   LearningRate 0.0366   Epoch: 7   Global Step: 131990   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:10:39,225-Speed 3326.17 samples/sec   Loss 2.3383   LearningRate 0.0366   Epoch: 7   Global Step: 132000   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:11:22,876-[lfw][132000]XNorm: 23.173646
Training: 2022-04-11 13:11:22,877-[lfw][132000]Accuracy-Flip: 0.99750+-0.00291
Training: 2022-04-11 13:11:22,877-[lfw][132000]Accuracy-Highest: 0.99817
Training: 2022-04-11 13:12:13,529-[cfp_fp][132000]XNorm: 22.119523
Training: 2022-04-11 13:12:13,531-[cfp_fp][132000]Accuracy-Flip: 0.98571+-0.00553
Training: 2022-04-11 13:12:13,532-[cfp_fp][132000]Accuracy-Highest: 0.98700
Training: 2022-04-11 13:12:57,212-[agedb_30][132000]XNorm: 23.628599
Training: 2022-04-11 13:12:57,212-[agedb_30][132000]Accuracy-Flip: 0.98050+-0.00687
Training: 2022-04-11 13:12:57,213-[agedb_30][132000]Accuracy-Highest: 0.98317
Training: 2022-04-11 13:13:00,297-Speed 72.59 samples/sec   Loss 2.2171   LearningRate 0.0365   Epoch: 7   Global Step: 132010   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:13:03,387-Speed 3314.79 samples/sec   Loss 2.2581   LearningRate 0.0365   Epoch: 7   Global Step: 132020   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:13:06,476-Speed 3315.94 samples/sec   Loss 2.2309   LearningRate 0.0365   Epoch: 7   Global Step: 132030   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:13:09,544-Speed 3338.49 samples/sec   Loss 2.2018   LearningRate 0.0365   Epoch: 7   Global Step: 132040   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:13:12,638-Speed 3310.12 samples/sec   Loss 2.3166   LearningRate 0.0365   Epoch: 7   Global Step: 132050   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:13:15,708-Speed 3336.07 samples/sec   Loss 2.2911   LearningRate 0.0365   Epoch: 7   Global Step: 132060   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:13:18,799-Speed 3313.92 samples/sec   Loss 2.2654   LearningRate 0.0365   Epoch: 7   Global Step: 132070   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:13:21,877-Speed 3327.14 samples/sec   Loss 2.3061   LearningRate 0.0365   Epoch: 7   Global Step: 132080   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:13:24,957-Speed 3325.81 samples/sec   Loss 2.2950   LearningRate 0.0365   Epoch: 7   Global Step: 132090   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:13:28,108-Speed 3250.51 samples/sec   Loss 2.2997   LearningRate 0.0365   Epoch: 7   Global Step: 132100   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:13:31,353-Speed 3156.32 samples/sec   Loss 2.2592   LearningRate 0.0365   Epoch: 7   Global Step: 132110   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:13:34,575-Speed 3178.63 samples/sec   Loss 2.2221   LearningRate 0.0365   Epoch: 7   Global Step: 132120   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:13:37,680-Speed 3298.99 samples/sec   Loss 2.2853   LearningRate 0.0365   Epoch: 7   Global Step: 132130   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:13:40,782-Speed 3301.34 samples/sec   Loss 2.2968   LearningRate 0.0365   Epoch: 7   Global Step: 132140   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:13:43,961-Speed 3222.73 samples/sec   Loss 2.2361   LearningRate 0.0365   Epoch: 7   Global Step: 132150   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:13:47,144-Speed 3216.99 samples/sec   Loss 2.2495   LearningRate 0.0365   Epoch: 7   Global Step: 132160   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:13:50,226-Speed 3323.18 samples/sec   Loss 2.2980   LearningRate 0.0365   Epoch: 7   Global Step: 132170   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:13:53,306-Speed 3325.43 samples/sec   Loss 2.2830   LearningRate 0.0365   Epoch: 7   Global Step: 132180   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-11 13:13:56,412-Speed 3298.88 samples/sec   Loss 2.2584   LearningRate 0.0365   Epoch: 7   Global Step: 132190   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-11 13:13:59,560-Speed 3253.14 samples/sec   Loss 2.2525   LearningRate 0.0365   Epoch: 7   Global Step: 132200   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-11 13:14:02,656-Speed 3307.70 samples/sec   Loss 2.2365   LearningRate 0.0365   Epoch: 7   Global Step: 132210   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-11 13:14:05,767-Speed 3292.87 samples/sec   Loss 2.2490   LearningRate 0.0365   Epoch: 7   Global Step: 132220   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-11 13:14:08,901-Speed 3268.18 samples/sec   Loss 2.1837   LearningRate 0.0365   Epoch: 7   Global Step: 132230   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-11 13:14:12,047-Speed 3255.86 samples/sec   Loss 2.2272   LearningRate 0.0365   Epoch: 7   Global Step: 132240   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-11 13:14:15,217-Speed 3230.86 samples/sec   Loss 2.2321   LearningRate 0.0365   Epoch: 7   Global Step: 132250   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-11 13:14:18,317-Speed 3304.00 samples/sec   Loss 2.3151   LearningRate 0.0365   Epoch: 7   Global Step: 132260   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-11 13:14:21,402-Speed 3320.02 samples/sec   Loss 2.2145   LearningRate 0.0365   Epoch: 7   Global Step: 132270   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-11 13:14:24,480-Speed 3328.06 samples/sec   Loss 2.2452   LearningRate 0.0365   Epoch: 7   Global Step: 132280   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:14:27,574-Speed 3309.93 samples/sec   Loss 2.2795   LearningRate 0.0364   Epoch: 7   Global Step: 132290   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:14:30,653-Speed 3326.33 samples/sec   Loss 2.2836   LearningRate 0.0364   Epoch: 7   Global Step: 132300   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:14:33,736-Speed 3322.03 samples/sec   Loss 2.2524   LearningRate 0.0364   Epoch: 7   Global Step: 132310   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:14:36,914-Speed 3223.41 samples/sec   Loss 2.2904   LearningRate 0.0364   Epoch: 7   Global Step: 132320   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:14:40,136-Speed 3179.83 samples/sec   Loss 2.2985   LearningRate 0.0364   Epoch: 7   Global Step: 132330   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:14:43,284-Speed 3252.79 samples/sec   Loss 2.3096   LearningRate 0.0364   Epoch: 7   Global Step: 132340   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:14:46,374-Speed 3315.50 samples/sec   Loss 2.2752   LearningRate 0.0364   Epoch: 7   Global Step: 132350   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:14:49,474-Speed 3302.96 samples/sec   Loss 2.3216   LearningRate 0.0364   Epoch: 7   Global Step: 132360   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:14:52,597-Speed 3280.91 samples/sec   Loss 2.2315   LearningRate 0.0364   Epoch: 7   Global Step: 132370   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:14:55,678-Speed 3323.92 samples/sec   Loss 2.3032   LearningRate 0.0364   Epoch: 7   Global Step: 132380   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:14:58,770-Speed 3312.58 samples/sec   Loss 2.1735   LearningRate 0.0364   Epoch: 7   Global Step: 132390   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:15:01,877-Speed 3295.90 samples/sec   Loss 2.3165   LearningRate 0.0364   Epoch: 7   Global Step: 132400   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:15:04,996-Speed 3284.14 samples/sec   Loss 2.2481   LearningRate 0.0364   Epoch: 7   Global Step: 132410   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:15:08,122-Speed 3276.91 samples/sec   Loss 2.2268   LearningRate 0.0364   Epoch: 7   Global Step: 132420   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:15:11,233-Speed 3292.97 samples/sec   Loss 2.2533   LearningRate 0.0364   Epoch: 7   Global Step: 132430   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:15:14,348-Speed 3287.95 samples/sec   Loss 2.2673   LearningRate 0.0364   Epoch: 7   Global Step: 132440   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:15:17,466-Speed 3283.98 samples/sec   Loss 2.2804   LearningRate 0.0364   Epoch: 7   Global Step: 132450   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:15:20,613-Speed 3255.36 samples/sec   Loss 2.2426   LearningRate 0.0364   Epoch: 7   Global Step: 132460   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:15:23,738-Speed 3277.76 samples/sec   Loss 2.2610   LearningRate 0.0364   Epoch: 7   Global Step: 132470   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:15:26,839-Speed 3302.32 samples/sec   Loss 2.2513   LearningRate 0.0364   Epoch: 7   Global Step: 132480   Fp16 Grad Scale: 262144   Required: 21 hours
Training: 2022-04-11 13:15:29,938-Speed 3305.39 samples/sec   Loss 2.2708   LearningRate 0.0364   Epoch: 7   Global Step: 132490   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:15:33,038-Speed 3303.59 samples/sec   Loss 2.1951   LearningRate 0.0364   Epoch: 7   Global Step: 132500   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:15:36,157-Speed 3283.74 samples/sec   Loss 2.2653   LearningRate 0.0364   Epoch: 7   Global Step: 132510   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:15:39,290-Speed 3270.11 samples/sec   Loss 2.2831   LearningRate 0.0364   Epoch: 7   Global Step: 132520   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:15:42,385-Speed 3308.92 samples/sec   Loss 2.1912   LearningRate 0.0364   Epoch: 7   Global Step: 132530   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:15:45,594-Speed 3191.71 samples/sec   Loss 2.2226   LearningRate 0.0364   Epoch: 7   Global Step: 132540   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:15:48,847-Speed 3148.83 samples/sec   Loss 2.3331   LearningRate 0.0364   Epoch: 7   Global Step: 132550   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:15:52,020-Speed 3227.38 samples/sec   Loss 2.3131   LearningRate 0.0363   Epoch: 7   Global Step: 132560   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:15:55,205-Speed 3216.70 samples/sec   Loss 2.2944   LearningRate 0.0363   Epoch: 7   Global Step: 132570   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:15:58,350-Speed 3256.63 samples/sec   Loss 2.2339   LearningRate 0.0363   Epoch: 7   Global Step: 132580   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:16:01,562-Speed 3188.61 samples/sec   Loss 2.3040   LearningRate 0.0363   Epoch: 7   Global Step: 132590   Fp16 Grad Scale: 262144   Required: 21 hours
Training: 2022-04-11 13:16:04,646-Speed 3321.44 samples/sec   Loss 2.2319   LearningRate 0.0363   Epoch: 7   Global Step: 132600   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:16:07,757-Speed 3292.52 samples/sec   Loss 2.2818   LearningRate 0.0363   Epoch: 7   Global Step: 132610   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-11 13:16:10,843-Speed 3318.39 samples/sec   Loss 2.2813   LearningRate 0.0363   Epoch: 7   Global Step: 132620   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-11 13:16:13,943-Speed 3304.42 samples/sec   Loss 2.2357   LearningRate 0.0363   Epoch: 7   Global Step: 132630   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-11 13:16:17,084-Speed 3260.87 samples/sec   Loss 2.2242   LearningRate 0.0363   Epoch: 7   Global Step: 132640   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-11 13:16:20,202-Speed 3284.99 samples/sec   Loss 2.2374   LearningRate 0.0363   Epoch: 7   Global Step: 132650   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-11 13:16:23,317-Speed 3288.55 samples/sec   Loss 2.2629   LearningRate 0.0363   Epoch: 7   Global Step: 132660   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-11 13:16:26,531-Speed 3186.08 samples/sec   Loss 2.2605   LearningRate 0.0363   Epoch: 7   Global Step: 132670   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-11 13:16:29,673-Speed 3260.16 samples/sec   Loss 2.2544   LearningRate 0.0363   Epoch: 7   Global Step: 132680   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-11 13:16:32,822-Speed 3252.43 samples/sec   Loss 2.2753   LearningRate 0.0363   Epoch: 7   Global Step: 132690   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-11 13:16:36,003-Speed 3219.96 samples/sec   Loss 2.2697   LearningRate 0.0363   Epoch: 7   Global Step: 132700   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-11 13:16:39,084-Speed 3324.87 samples/sec   Loss 2.1881   LearningRate 0.0363   Epoch: 7   Global Step: 132710   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:16:42,170-Speed 3318.89 samples/sec   Loss 2.2637   LearningRate 0.0363   Epoch: 7   Global Step: 132720   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:16:45,265-Speed 3309.40 samples/sec   Loss 2.2743   LearningRate 0.0363   Epoch: 7   Global Step: 132730   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:16:48,376-Speed 3291.99 samples/sec   Loss 2.2361   LearningRate 0.0363   Epoch: 7   Global Step: 132740   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:16:51,486-Speed 3293.56 samples/sec   Loss 2.2717   LearningRate 0.0363   Epoch: 7   Global Step: 132750   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:16:54,649-Speed 3238.81 samples/sec   Loss 2.1776   LearningRate 0.0363   Epoch: 7   Global Step: 132760   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-11 13:16:57,775-Speed 3276.80 samples/sec   Loss 2.2499   LearningRate 0.0363   Epoch: 7   Global Step: 132770   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-11 13:17:00,928-Speed 3247.85 samples/sec   Loss 2.2696   LearningRate 0.0363   Epoch: 7   Global Step: 132780   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-11 13:17:04,049-Speed 3281.54 samples/sec   Loss 2.2443   LearningRate 0.0363   Epoch: 7   Global Step: 132790   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-11 13:17:07,147-Speed 3306.95 samples/sec   Loss 2.2494   LearningRate 0.0363   Epoch: 7   Global Step: 132800   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-11 13:17:10,274-Speed 3275.66 samples/sec   Loss 2.2443   LearningRate 0.0363   Epoch: 7   Global Step: 132810   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-11 13:17:13,484-Speed 3190.41 samples/sec   Loss 2.2648   LearningRate 0.0363   Epoch: 7   Global Step: 132820   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-11 13:17:16,578-Speed 3311.81 samples/sec   Loss 2.3007   LearningRate 0.0363   Epoch: 7   Global Step: 132830   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-11 13:17:19,666-Speed 3316.31 samples/sec   Loss 2.2142   LearningRate 0.0362   Epoch: 7   Global Step: 132840   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-11 13:17:22,751-Speed 3320.40 samples/sec   Loss 2.2994   LearningRate 0.0362   Epoch: 7   Global Step: 132850   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-11 13:17:25,839-Speed 3316.18 samples/sec   Loss 2.3140   LearningRate 0.0362   Epoch: 7   Global Step: 132860   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:17:28,964-Speed 3277.53 samples/sec   Loss 2.2120   LearningRate 0.0362   Epoch: 7   Global Step: 132870   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:17:32,080-Speed 3287.26 samples/sec   Loss 2.1898   LearningRate 0.0362   Epoch: 7   Global Step: 132880   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:17:35,169-Speed 3316.50 samples/sec   Loss 2.2177   LearningRate 0.0362   Epoch: 7   Global Step: 132890   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:17:38,261-Speed 3312.53 samples/sec   Loss 2.3106   LearningRate 0.0362   Epoch: 7   Global Step: 132900   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:17:41,355-Speed 3310.03 samples/sec   Loss 2.2638   LearningRate 0.0362   Epoch: 7   Global Step: 132910   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:17:44,462-Speed 3296.51 samples/sec   Loss 2.2408   LearningRate 0.0362   Epoch: 7   Global Step: 132920   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:17:47,556-Speed 3310.06 samples/sec   Loss 2.3178   LearningRate 0.0362   Epoch: 7   Global Step: 132930   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:17:50,646-Speed 3315.38 samples/sec   Loss 2.2673   LearningRate 0.0362   Epoch: 7   Global Step: 132940   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:17:53,738-Speed 3311.65 samples/sec   Loss 2.2546   LearningRate 0.0362   Epoch: 7   Global Step: 132950   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:17:56,832-Speed 3310.53 samples/sec   Loss 2.3168   LearningRate 0.0362   Epoch: 7   Global Step: 132960   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:17:59,961-Speed 3273.34 samples/sec   Loss 2.2412   LearningRate 0.0362   Epoch: 7   Global Step: 132970   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:18:03,036-Speed 3331.82 samples/sec   Loss 2.2054   LearningRate 0.0362   Epoch: 7   Global Step: 132980   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-11 13:18:06,126-Speed 3313.94 samples/sec   Loss 2.2982   LearningRate 0.0362   Epoch: 7   Global Step: 132990   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-11 13:18:09,220-Speed 3311.13 samples/sec   Loss 2.2667   LearningRate 0.0362   Epoch: 7   Global Step: 133000   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-11 13:18:12,304-Speed 3321.03 samples/sec   Loss 2.3009   LearningRate 0.0362   Epoch: 7   Global Step: 133010   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-11 13:18:15,391-Speed 3317.71 samples/sec   Loss 2.2124   LearningRate 0.0362   Epoch: 7   Global Step: 133020   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-11 13:18:18,495-Speed 3299.84 samples/sec   Loss 2.2885   LearningRate 0.0362   Epoch: 7   Global Step: 133030   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-11 13:18:21,582-Speed 3318.76 samples/sec   Loss 2.2425   LearningRate 0.0362   Epoch: 7   Global Step: 133040   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-11 13:18:24,710-Speed 3273.66 samples/sec   Loss 2.2472   LearningRate 0.0362   Epoch: 7   Global Step: 133050   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-11 13:18:27,832-Speed 3281.54 samples/sec   Loss 2.2391   LearningRate 0.0362   Epoch: 7   Global Step: 133060   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-11 13:18:30,967-Speed 3266.31 samples/sec   Loss 2.3012   LearningRate 0.0362   Epoch: 7   Global Step: 133070   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-11 13:18:34,072-Speed 3299.37 samples/sec   Loss 2.1435   LearningRate 0.0362   Epoch: 7   Global Step: 133080   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:18:37,233-Speed 3240.01 samples/sec   Loss 2.3379   LearningRate 0.0362   Epoch: 7   Global Step: 133090   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:18:40,353-Speed 3283.20 samples/sec   Loss 2.2983   LearningRate 0.0362   Epoch: 7   Global Step: 133100   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:18:43,472-Speed 3283.43 samples/sec   Loss 2.2723   LearningRate 0.0362   Epoch: 7   Global Step: 133110   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:18:46,609-Speed 3266.17 samples/sec   Loss 2.2963   LearningRate 0.0361   Epoch: 7   Global Step: 133120   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:18:49,778-Speed 3232.11 samples/sec   Loss 2.2997   LearningRate 0.0361   Epoch: 7   Global Step: 133130   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:18:52,869-Speed 3312.92 samples/sec   Loss 2.3111   LearningRate 0.0361   Epoch: 7   Global Step: 133140   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:18:56,072-Speed 3198.16 samples/sec   Loss 2.2874   LearningRate 0.0361   Epoch: 7   Global Step: 133150   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:18:59,342-Speed 3132.36 samples/sec   Loss 2.2424   LearningRate 0.0361   Epoch: 7   Global Step: 133160   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:19:02,501-Speed 3242.80 samples/sec   Loss 2.2324   LearningRate 0.0361   Epoch: 7   Global Step: 133170   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:19:05,598-Speed 3306.72 samples/sec   Loss 2.2449   LearningRate 0.0361   Epoch: 7   Global Step: 133180   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:19:08,782-Speed 3216.85 samples/sec   Loss 2.2454   LearningRate 0.0361   Epoch: 7   Global Step: 133190   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:19:11,916-Speed 3268.32 samples/sec   Loss 2.2649   LearningRate 0.0361   Epoch: 7   Global Step: 133200   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:19:15,107-Speed 3209.07 samples/sec   Loss 2.2315   LearningRate 0.0361   Epoch: 7   Global Step: 133210   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:19:18,293-Speed 3215.94 samples/sec   Loss 2.2719   LearningRate 0.0361   Epoch: 7   Global Step: 133220   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:19:21,391-Speed 3305.89 samples/sec   Loss 2.2596   LearningRate 0.0361   Epoch: 7   Global Step: 133230   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:19:24,494-Speed 3300.55 samples/sec   Loss 2.2929   LearningRate 0.0361   Epoch: 7   Global Step: 133240   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:19:27,606-Speed 3291.19 samples/sec   Loss 2.3158   LearningRate 0.0361   Epoch: 7   Global Step: 133250   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:19:30,704-Speed 3306.73 samples/sec   Loss 2.2546   LearningRate 0.0361   Epoch: 7   Global Step: 133260   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:19:33,815-Speed 3291.90 samples/sec   Loss 2.2381   LearningRate 0.0361   Epoch: 7   Global Step: 133270   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:19:36,894-Speed 3327.22 samples/sec   Loss 2.2418   LearningRate 0.0361   Epoch: 7   Global Step: 133280   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:19:39,985-Speed 3312.61 samples/sec   Loss 2.2676   LearningRate 0.0361   Epoch: 7   Global Step: 133290   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:19:43,136-Speed 3250.95 samples/sec   Loss 2.1963   LearningRate 0.0361   Epoch: 7   Global Step: 133300   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:19:46,254-Speed 3284.82 samples/sec   Loss 2.2589   LearningRate 0.0361   Epoch: 7   Global Step: 133310   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:19:49,364-Speed 3293.17 samples/sec   Loss 2.2396   LearningRate 0.0361   Epoch: 7   Global Step: 133320   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:19:52,467-Speed 3301.77 samples/sec   Loss 2.2511   LearningRate 0.0361   Epoch: 7   Global Step: 133330   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:19:55,561-Speed 3309.64 samples/sec   Loss 2.2364   LearningRate 0.0361   Epoch: 7   Global Step: 133340   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:19:58,675-Speed 3290.12 samples/sec   Loss 2.2301   LearningRate 0.0361   Epoch: 7   Global Step: 133350   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:20:01,761-Speed 3318.95 samples/sec   Loss 2.1890   LearningRate 0.0361   Epoch: 7   Global Step: 133360   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:20:04,862-Speed 3302.11 samples/sec   Loss 2.2531   LearningRate 0.0361   Epoch: 7   Global Step: 133370   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:20:07,977-Speed 3287.90 samples/sec   Loss 2.2314   LearningRate 0.0361   Epoch: 7   Global Step: 133380   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:20:11,072-Speed 3309.79 samples/sec   Loss 2.3112   LearningRate 0.0360   Epoch: 7   Global Step: 133390   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:20:14,157-Speed 3320.86 samples/sec   Loss 2.1955   LearningRate 0.0360   Epoch: 7   Global Step: 133400   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:20:17,293-Speed 3265.05 samples/sec   Loss 2.2738   LearningRate 0.0360   Epoch: 7   Global Step: 133410   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:20:20,380-Speed 3318.59 samples/sec   Loss 2.3431   LearningRate 0.0360   Epoch: 7   Global Step: 133420   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:20:23,467-Speed 3317.91 samples/sec   Loss 2.2768   LearningRate 0.0360   Epoch: 7   Global Step: 133430   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:20:26,637-Speed 3231.05 samples/sec   Loss 2.1844   LearningRate 0.0360   Epoch: 7   Global Step: 133440   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:20:29,729-Speed 3312.30 samples/sec   Loss 2.2684   LearningRate 0.0360   Epoch: 7   Global Step: 133450   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:20:32,835-Speed 3297.48 samples/sec   Loss 2.2116   LearningRate 0.0360   Epoch: 7   Global Step: 133460   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:20:35,921-Speed 3320.45 samples/sec   Loss 2.2776   LearningRate 0.0360   Epoch: 7   Global Step: 133470   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:20:39,013-Speed 3312.30 samples/sec   Loss 2.2154   LearningRate 0.0360   Epoch: 7   Global Step: 133480   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:20:42,147-Speed 3268.36 samples/sec   Loss 2.2593   LearningRate 0.0360   Epoch: 7   Global Step: 133490   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:20:45,238-Speed 3313.53 samples/sec   Loss 2.2459   LearningRate 0.0360   Epoch: 7   Global Step: 133500   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:20:48,330-Speed 3312.38 samples/sec   Loss 2.3430   LearningRate 0.0360   Epoch: 7   Global Step: 133510   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:20:53,668-Speed 1918.51 samples/sec   Loss 2.2983   LearningRate 0.0360   Epoch: 7   Global Step: 133520   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:21:32,788-Speed 261.77 samples/sec   Loss 2.0783   LearningRate 0.0360   Epoch: 8   Global Step: 133530   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:21:36,160-Speed 3037.87 samples/sec   Loss 1.7827   LearningRate 0.0360   Epoch: 8   Global Step: 133540   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:21:39,572-Speed 3001.35 samples/sec   Loss 1.7454   LearningRate 0.0360   Epoch: 8   Global Step: 133550   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:21:42,667-Speed 3309.77 samples/sec   Loss 1.7213   LearningRate 0.0360   Epoch: 8   Global Step: 133560   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:21:45,818-Speed 3250.75 samples/sec   Loss 1.6949   LearningRate 0.0360   Epoch: 8   Global Step: 133570   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:21:48,906-Speed 3316.59 samples/sec   Loss 1.7338   LearningRate 0.0360   Epoch: 8   Global Step: 133580   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:21:52,021-Speed 3288.47 samples/sec   Loss 1.6751   LearningRate 0.0360   Epoch: 8   Global Step: 133590   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:21:55,118-Speed 3307.48 samples/sec   Loss 1.7189   LearningRate 0.0360   Epoch: 8   Global Step: 133600   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:21:58,212-Speed 3309.71 samples/sec   Loss 1.7271   LearningRate 0.0360   Epoch: 8   Global Step: 133610   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:22:01,340-Speed 3274.60 samples/sec   Loss 1.7284   LearningRate 0.0360   Epoch: 8   Global Step: 133620   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:22:04,456-Speed 3286.91 samples/sec   Loss 1.7856   LearningRate 0.0360   Epoch: 8   Global Step: 133630   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:22:07,575-Speed 3283.38 samples/sec   Loss 1.6866   LearningRate 0.0360   Epoch: 8   Global Step: 133640   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:22:10,734-Speed 3242.91 samples/sec   Loss 1.6876   LearningRate 0.0360   Epoch: 8   Global Step: 133650   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:22:13,963-Speed 3171.05 samples/sec   Loss 1.7685   LearningRate 0.0360   Epoch: 8   Global Step: 133660   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:22:17,145-Speed 3219.87 samples/sec   Loss 1.7155   LearningRate 0.0359   Epoch: 8   Global Step: 133670   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:22:20,317-Speed 3229.14 samples/sec   Loss 1.6770   LearningRate 0.0359   Epoch: 8   Global Step: 133680   Fp16 Grad Scale: 262144   Required: 21 hours
Training: 2022-04-11 13:22:23,786-Speed 2952.19 samples/sec   Loss 1.7617   LearningRate 0.0359   Epoch: 8   Global Step: 133690   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:22:26,907-Speed 3281.18 samples/sec   Loss 1.7272   LearningRate 0.0359   Epoch: 8   Global Step: 133700   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:22:30,209-Speed 3102.23 samples/sec   Loss 1.6848   LearningRate 0.0359   Epoch: 8   Global Step: 133710   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:22:33,303-Speed 3310.12 samples/sec   Loss 1.7382   LearningRate 0.0359   Epoch: 8   Global Step: 133720   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:22:36,435-Speed 3270.03 samples/sec   Loss 1.6689   LearningRate 0.0359   Epoch: 8   Global Step: 133730   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:22:39,552-Speed 3285.90 samples/sec   Loss 1.7181   LearningRate 0.0359   Epoch: 8   Global Step: 133740   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:22:42,700-Speed 3254.42 samples/sec   Loss 1.7089   LearningRate 0.0359   Epoch: 8   Global Step: 133750   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-11 13:22:45,879-Speed 3222.04 samples/sec   Loss 1.6274   LearningRate 0.0359   Epoch: 8   Global Step: 133760   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-11 13:22:49,034-Speed 3245.99 samples/sec   Loss 1.7010   LearningRate 0.0359   Epoch: 8   Global Step: 133770   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-11 13:22:52,163-Speed 3273.42 samples/sec   Loss 1.6991   LearningRate 0.0359   Epoch: 8   Global Step: 133780   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-11 13:22:55,286-Speed 3279.36 samples/sec   Loss 1.6806   LearningRate 0.0359   Epoch: 8   Global Step: 133790   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-11 13:22:58,461-Speed 3226.81 samples/sec   Loss 1.6909   LearningRate 0.0359   Epoch: 8   Global Step: 133800   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-11 13:23:01,560-Speed 3304.47 samples/sec   Loss 1.6776   LearningRate 0.0359   Epoch: 8   Global Step: 133810   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-11 13:23:04,669-Speed 3294.27 samples/sec   Loss 1.7517   LearningRate 0.0359   Epoch: 8   Global Step: 133820   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-11 13:23:07,775-Speed 3297.89 samples/sec   Loss 1.7050   LearningRate 0.0359   Epoch: 8   Global Step: 133830   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-11 13:23:10,884-Speed 3294.37 samples/sec   Loss 1.6904   LearningRate 0.0359   Epoch: 8   Global Step: 133840   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-11 13:23:13,982-Speed 3306.28 samples/sec   Loss 1.7213   LearningRate 0.0359   Epoch: 8   Global Step: 133850   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:23:17,076-Speed 3310.73 samples/sec   Loss 1.6937   LearningRate 0.0359   Epoch: 8   Global Step: 133860   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:23:20,165-Speed 3314.98 samples/sec   Loss 1.6971   LearningRate 0.0359   Epoch: 8   Global Step: 133870   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:23:23,245-Speed 3325.92 samples/sec   Loss 1.6819   LearningRate 0.0359   Epoch: 8   Global Step: 133880   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:23:26,329-Speed 3320.79 samples/sec   Loss 1.7362   LearningRate 0.0359   Epoch: 8   Global Step: 133890   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:23:29,413-Speed 3320.78 samples/sec   Loss 1.7290   LearningRate 0.0359   Epoch: 8   Global Step: 133900   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:23:32,521-Speed 3295.99 samples/sec   Loss 1.6900   LearningRate 0.0359   Epoch: 8   Global Step: 133910   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:23:35,617-Speed 3307.85 samples/sec   Loss 1.7242   LearningRate 0.0359   Epoch: 8   Global Step: 133920   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:23:38,712-Speed 3309.69 samples/sec   Loss 1.7042   LearningRate 0.0359   Epoch: 8   Global Step: 133930   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:23:41,810-Speed 3306.02 samples/sec   Loss 1.7140   LearningRate 0.0359   Epoch: 8   Global Step: 133940   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:23:44,881-Speed 3335.57 samples/sec   Loss 1.7526   LearningRate 0.0358   Epoch: 8   Global Step: 133950   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:23:47,966-Speed 3319.70 samples/sec   Loss 1.7106   LearningRate 0.0358   Epoch: 8   Global Step: 133960   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:23:51,049-Speed 3321.87 samples/sec   Loss 1.6952   LearningRate 0.0358   Epoch: 8   Global Step: 133970   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:23:54,135-Speed 3319.15 samples/sec   Loss 1.7127   LearningRate 0.0358   Epoch: 8   Global Step: 133980   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:23:57,249-Speed 3289.10 samples/sec   Loss 1.7439   LearningRate 0.0358   Epoch: 8   Global Step: 133990   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:24:00,346-Speed 3307.64 samples/sec   Loss 1.7661   LearningRate 0.0358   Epoch: 8   Global Step: 134000   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:24:44,284-[lfw][134000]XNorm: 20.831489
Training: 2022-04-11 13:24:44,285-[lfw][134000]Accuracy-Flip: 0.99717+-0.00325
Training: 2022-04-11 13:24:44,285-[lfw][134000]Accuracy-Highest: 0.99817
Training: 2022-04-11 13:25:35,284-[cfp_fp][134000]XNorm: 20.640670
Training: 2022-04-11 13:25:35,284-[cfp_fp][134000]Accuracy-Flip: 0.98814+-0.00474
Training: 2022-04-11 13:25:35,285-[cfp_fp][134000]Accuracy-Highest: 0.98814
Training: 2022-04-11 13:26:19,137-[agedb_30][134000]XNorm: 21.327300
Training: 2022-04-11 13:26:19,137-[agedb_30][134000]Accuracy-Flip: 0.98150+-0.00773
Training: 2022-04-11 13:26:19,138-[agedb_30][134000]Accuracy-Highest: 0.98317
Training: 2022-04-11 13:26:22,211-Speed 72.18 samples/sec   Loss 1.7079   LearningRate 0.0358   Epoch: 8   Global Step: 134010   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:26:25,285-Speed 3331.36 samples/sec   Loss 1.7187   LearningRate 0.0358   Epoch: 8   Global Step: 134020   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:26:28,357-Speed 3335.13 samples/sec   Loss 1.6630   LearningRate 0.0358   Epoch: 8   Global Step: 134030   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:26:31,466-Speed 3293.57 samples/sec   Loss 1.6354   LearningRate 0.0358   Epoch: 8   Global Step: 134040   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:26:34,591-Speed 3278.44 samples/sec   Loss 1.7305   LearningRate 0.0358   Epoch: 8   Global Step: 134050   Fp16 Grad Scale: 262144   Required: 21 hours
Training: 2022-04-11 13:26:37,657-Speed 3340.52 samples/sec   Loss 1.6404   LearningRate 0.0358   Epoch: 8   Global Step: 134060   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:26:40,730-Speed 3334.10 samples/sec   Loss 1.7674   LearningRate 0.0358   Epoch: 8   Global Step: 134070   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:26:43,803-Speed 3332.14 samples/sec   Loss 1.7342   LearningRate 0.0358   Epoch: 8   Global Step: 134080   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:26:46,864-Speed 3345.66 samples/sec   Loss 1.7463   LearningRate 0.0358   Epoch: 8   Global Step: 134090   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-11 13:26:49,983-Speed 3283.86 samples/sec   Loss 1.7133   LearningRate 0.0358   Epoch: 8   Global Step: 134100   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-11 13:26:53,090-Speed 3296.34 samples/sec   Loss 1.7264   LearningRate 0.0358   Epoch: 8   Global Step: 134110   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-11 13:26:56,292-Speed 3199.40 samples/sec   Loss 1.7735   LearningRate 0.0358   Epoch: 8   Global Step: 134120   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-11 13:26:59,421-Speed 3272.97 samples/sec   Loss 1.6929   LearningRate 0.0358   Epoch: 8   Global Step: 134130   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-11 13:27:02,501-Speed 3325.56 samples/sec   Loss 1.7574   LearningRate 0.0358   Epoch: 8   Global Step: 134140   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-11 13:27:05,583-Speed 3324.23 samples/sec   Loss 1.7582   LearningRate 0.0358   Epoch: 8   Global Step: 134150   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-11 13:27:08,698-Speed 3287.31 samples/sec   Loss 1.6420   LearningRate 0.0358   Epoch: 8   Global Step: 134160   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-11 13:27:11,770-Speed 3333.84 samples/sec   Loss 1.7743   LearningRate 0.0358   Epoch: 8   Global Step: 134170   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-11 13:27:14,854-Speed 3321.62 samples/sec   Loss 1.7383   LearningRate 0.0358   Epoch: 8   Global Step: 134180   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-11 13:27:18,022-Speed 3232.82 samples/sec   Loss 1.7511   LearningRate 0.0358   Epoch: 8   Global Step: 134190   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:27:21,101-Speed 3326.51 samples/sec   Loss 1.7156   LearningRate 0.0358   Epoch: 8   Global Step: 134200   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:27:24,203-Speed 3301.99 samples/sec   Loss 1.7166   LearningRate 0.0358   Epoch: 8   Global Step: 134210   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:27:27,398-Speed 3205.67 samples/sec   Loss 1.7244   LearningRate 0.0358   Epoch: 8   Global Step: 134220   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:27:30,527-Speed 3272.94 samples/sec   Loss 1.7404   LearningRate 0.0357   Epoch: 8   Global Step: 134230   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:27:33,594-Speed 3340.25 samples/sec   Loss 1.7578   LearningRate 0.0357   Epoch: 8   Global Step: 134240   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-11 13:27:36,675-Speed 3324.56 samples/sec   Loss 1.7546   LearningRate 0.0357   Epoch: 8   Global Step: 134250   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-11 13:27:39,750-Speed 3330.23 samples/sec   Loss 1.7621   LearningRate 0.0357   Epoch: 8   Global Step: 134260   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-11 13:27:42,837-Speed 3318.01 samples/sec   Loss 1.6654   LearningRate 0.0357   Epoch: 8   Global Step: 134270   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-11 13:27:45,978-Speed 3261.10 samples/sec   Loss 1.7721   LearningRate 0.0357   Epoch: 8   Global Step: 134280   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-11 13:27:49,063-Speed 3319.15 samples/sec   Loss 1.7615   LearningRate 0.0357   Epoch: 8   Global Step: 134290   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-11 13:27:52,233-Speed 3231.37 samples/sec   Loss 1.7710   LearningRate 0.0357   Epoch: 8   Global Step: 134300   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-11 13:27:55,350-Speed 3286.38 samples/sec   Loss 1.7130   LearningRate 0.0357   Epoch: 8   Global Step: 134310   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-11 13:27:58,421-Speed 3335.34 samples/sec   Loss 1.7748   LearningRate 0.0357   Epoch: 8   Global Step: 134320   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-11 13:28:01,507-Speed 3318.31 samples/sec   Loss 1.7957   LearningRate 0.0357   Epoch: 8   Global Step: 134330   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-11 13:28:04,604-Speed 3308.02 samples/sec   Loss 1.7837   LearningRate 0.0357   Epoch: 8   Global Step: 134340   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:28:07,762-Speed 3243.09 samples/sec   Loss 1.7500   LearningRate 0.0357   Epoch: 8   Global Step: 134350   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:28:10,911-Speed 3252.39 samples/sec   Loss 1.7453   LearningRate 0.0357   Epoch: 8   Global Step: 134360   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:28:14,162-Speed 3149.90 samples/sec   Loss 1.7171   LearningRate 0.0357   Epoch: 8   Global Step: 134370   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:28:17,261-Speed 3305.24 samples/sec   Loss 1.7130   LearningRate 0.0357   Epoch: 8   Global Step: 134380   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:28:20,337-Speed 3330.02 samples/sec   Loss 1.7499   LearningRate 0.0357   Epoch: 8   Global Step: 134390   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:28:23,447-Speed 3293.48 samples/sec   Loss 1.7683   LearningRate 0.0357   Epoch: 8   Global Step: 134400   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:28:26,530-Speed 3322.80 samples/sec   Loss 1.7540   LearningRate 0.0357   Epoch: 8   Global Step: 134410   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:28:29,606-Speed 3329.71 samples/sec   Loss 1.7861   LearningRate 0.0357   Epoch: 8   Global Step: 134420   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:28:32,710-Speed 3299.20 samples/sec   Loss 1.7409   LearningRate 0.0357   Epoch: 8   Global Step: 134430   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:28:35,772-Speed 3344.87 samples/sec   Loss 1.7253   LearningRate 0.0357   Epoch: 8   Global Step: 134440   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-11 13:28:38,868-Speed 3308.67 samples/sec   Loss 1.7602   LearningRate 0.0357   Epoch: 8   Global Step: 134450   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-11 13:28:41,940-Speed 3333.16 samples/sec   Loss 1.7443   LearningRate 0.0357   Epoch: 8   Global Step: 134460   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-11 13:28:45,018-Speed 3328.63 samples/sec   Loss 1.7659   LearningRate 0.0357   Epoch: 8   Global Step: 134470   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-11 13:28:48,103-Speed 3319.63 samples/sec   Loss 1.7588   LearningRate 0.0357   Epoch: 8   Global Step: 134480   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-11 13:28:51,174-Speed 3335.36 samples/sec   Loss 1.8476   LearningRate 0.0357   Epoch: 8   Global Step: 134490   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-11 13:28:54,251-Speed 3328.51 samples/sec   Loss 1.7856   LearningRate 0.0357   Epoch: 8   Global Step: 134500   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-11 13:28:57,326-Speed 3330.91 samples/sec   Loss 1.8247   LearningRate 0.0356   Epoch: 8   Global Step: 134510   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-11 13:29:00,426-Speed 3303.96 samples/sec   Loss 1.7036   LearningRate 0.0356   Epoch: 8   Global Step: 134520   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-11 13:29:03,519-Speed 3312.26 samples/sec   Loss 1.7467   LearningRate 0.0356   Epoch: 8   Global Step: 134530   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-11 13:29:06,577-Speed 3348.55 samples/sec   Loss 1.8210   LearningRate 0.0356   Epoch: 8   Global Step: 134540   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-11 13:29:09,728-Speed 3250.90 samples/sec   Loss 1.8288   LearningRate 0.0356   Epoch: 8   Global Step: 134550   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-11 13:29:12,802-Speed 3331.72 samples/sec   Loss 1.8087   LearningRate 0.0356   Epoch: 8   Global Step: 134560   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-11 13:29:15,880-Speed 3328.19 samples/sec   Loss 1.7722   LearningRate 0.0356   Epoch: 8   Global Step: 134570   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-11 13:29:18,956-Speed 3329.58 samples/sec   Loss 1.7741   LearningRate 0.0356   Epoch: 8   Global Step: 134580   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-11 13:29:22,035-Speed 3326.99 samples/sec   Loss 1.7536   LearningRate 0.0356   Epoch: 8   Global Step: 134590   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-11 13:29:25,112-Speed 3327.78 samples/sec   Loss 1.7502   LearningRate 0.0356   Epoch: 8   Global Step: 134600   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-11 13:29:28,192-Speed 3325.80 samples/sec   Loss 1.7578   LearningRate 0.0356   Epoch: 8   Global Step: 134610   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-11 13:29:31,266-Speed 3332.31 samples/sec   Loss 1.7466   LearningRate 0.0356   Epoch: 8   Global Step: 134620   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-11 13:29:34,356-Speed 3317.44 samples/sec   Loss 1.8171   LearningRate 0.0356   Epoch: 8   Global Step: 134630   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-11 13:29:37,428-Speed 3334.20 samples/sec   Loss 1.8057   LearningRate 0.0356   Epoch: 8   Global Step: 134640   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-11 13:29:40,504-Speed 3330.18 samples/sec   Loss 1.7816   LearningRate 0.0356   Epoch: 8   Global Step: 134650   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-11 13:29:43,646-Speed 3258.80 samples/sec   Loss 1.7633   LearningRate 0.0356   Epoch: 8   Global Step: 134660   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-11 13:29:46,840-Speed 3207.98 samples/sec   Loss 1.8154   LearningRate 0.0356   Epoch: 8   Global Step: 134670   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-11 13:29:49,997-Speed 3243.89 samples/sec   Loss 1.7680   LearningRate 0.0356   Epoch: 8   Global Step: 134680   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-11 13:29:53,147-Speed 3251.22 samples/sec   Loss 1.7617   LearningRate 0.0356   Epoch: 8   Global Step: 134690   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-11 13:29:56,239-Speed 3313.37 samples/sec   Loss 1.8008   LearningRate 0.0356   Epoch: 8   Global Step: 134700   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-11 13:29:59,410-Speed 3230.20 samples/sec   Loss 1.8154   LearningRate 0.0356   Epoch: 8   Global Step: 134710   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-11 13:30:02,630-Speed 3181.18 samples/sec   Loss 1.7927   LearningRate 0.0356   Epoch: 8   Global Step: 134720   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-11 13:30:05,709-Speed 3326.70 samples/sec   Loss 1.7244   LearningRate 0.0356   Epoch: 8   Global Step: 134730   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-11 13:30:08,791-Speed 3323.07 samples/sec   Loss 1.7094   LearningRate 0.0356   Epoch: 8   Global Step: 134740   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:30:11,994-Speed 3197.30 samples/sec   Loss 1.7695   LearningRate 0.0356   Epoch: 8   Global Step: 134750   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:30:15,086-Speed 3312.93 samples/sec   Loss 1.7888   LearningRate 0.0356   Epoch: 8   Global Step: 134760   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:30:18,293-Speed 3193.50 samples/sec   Loss 1.7855   LearningRate 0.0356   Epoch: 8   Global Step: 134770   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:30:21,382-Speed 3315.79 samples/sec   Loss 1.8741   LearningRate 0.0356   Epoch: 8   Global Step: 134780   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:30:24,501-Speed 3283.20 samples/sec   Loss 1.8348   LearningRate 0.0355   Epoch: 8   Global Step: 134790   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:30:27,585-Speed 3321.78 samples/sec   Loss 1.7358   LearningRate 0.0355   Epoch: 8   Global Step: 134800   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:30:30,771-Speed 3214.71 samples/sec   Loss 1.8254   LearningRate 0.0355   Epoch: 8   Global Step: 134810   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:30:33,854-Speed 3322.33 samples/sec   Loss 1.8063   LearningRate 0.0355   Epoch: 8   Global Step: 134820   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:30:36,969-Speed 3288.24 samples/sec   Loss 1.7834   LearningRate 0.0355   Epoch: 8   Global Step: 134830   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:30:40,041-Speed 3334.92 samples/sec   Loss 1.8138   LearningRate 0.0355   Epoch: 8   Global Step: 134840   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:30:43,128-Speed 3317.55 samples/sec   Loss 1.9024   LearningRate 0.0355   Epoch: 8   Global Step: 134850   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:30:46,203-Speed 3329.93 samples/sec   Loss 1.7804   LearningRate 0.0355   Epoch: 8   Global Step: 134860   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:30:49,297-Speed 3311.12 samples/sec   Loss 1.7965   LearningRate 0.0355   Epoch: 8   Global Step: 134870   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:30:52,378-Speed 3324.09 samples/sec   Loss 1.7803   LearningRate 0.0355   Epoch: 8   Global Step: 134880   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:30:55,437-Speed 3348.05 samples/sec   Loss 1.7398   LearningRate 0.0355   Epoch: 8   Global Step: 134890   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-11 13:30:58,512-Speed 3331.76 samples/sec   Loss 1.7757   LearningRate 0.0355   Epoch: 8   Global Step: 134900   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-11 13:31:01,586-Speed 3331.34 samples/sec   Loss 1.7712   LearningRate 0.0355   Epoch: 8   Global Step: 134910   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-11 13:31:04,666-Speed 3325.51 samples/sec   Loss 1.7563   LearningRate 0.0355   Epoch: 8   Global Step: 134920   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-11 13:31:07,772-Speed 3297.93 samples/sec   Loss 1.8047   LearningRate 0.0355   Epoch: 8   Global Step: 134930   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-11 13:31:10,874-Speed 3302.08 samples/sec   Loss 1.7494   LearningRate 0.0355   Epoch: 8   Global Step: 134940   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-11 13:31:14,028-Speed 3248.02 samples/sec   Loss 1.8493   LearningRate 0.0355   Epoch: 8   Global Step: 134950   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-11 13:31:17,109-Speed 3324.01 samples/sec   Loss 1.8967   LearningRate 0.0355   Epoch: 8   Global Step: 134960   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-11 13:31:20,209-Speed 3304.10 samples/sec   Loss 1.7008   LearningRate 0.0355   Epoch: 8   Global Step: 134970   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-11 13:31:23,287-Speed 3327.72 samples/sec   Loss 1.7899   LearningRate 0.0355   Epoch: 8   Global Step: 134980   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-11 13:31:26,388-Speed 3303.43 samples/sec   Loss 1.8403   LearningRate 0.0355   Epoch: 8   Global Step: 134990   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:31:29,469-Speed 3323.71 samples/sec   Loss 1.8020   LearningRate 0.0355   Epoch: 8   Global Step: 135000   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:31:32,547-Speed 3327.50 samples/sec   Loss 1.8440   LearningRate 0.0355   Epoch: 8   Global Step: 135010   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:31:35,650-Speed 3301.40 samples/sec   Loss 1.7848   LearningRate 0.0355   Epoch: 8   Global Step: 135020   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:31:38,733-Speed 3321.46 samples/sec   Loss 1.7881   LearningRate 0.0355   Epoch: 8   Global Step: 135030   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-11 13:31:41,813-Speed 3326.03 samples/sec   Loss 1.7625   LearningRate 0.0355   Epoch: 8   Global Step: 135040   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-11 13:31:44,908-Speed 3309.18 samples/sec   Loss 1.7776   LearningRate 0.0355   Epoch: 8   Global Step: 135050   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-11 13:31:47,986-Speed 3327.29 samples/sec   Loss 1.7617   LearningRate 0.0355   Epoch: 8   Global Step: 135060   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-11 13:31:51,063-Speed 3329.24 samples/sec   Loss 1.7996   LearningRate 0.0354   Epoch: 8   Global Step: 135070   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-11 13:31:54,140-Speed 3328.10 samples/sec   Loss 1.7942   LearningRate 0.0354   Epoch: 8   Global Step: 135080   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-11 13:31:57,216-Speed 3330.36 samples/sec   Loss 1.8366   LearningRate 0.0354   Epoch: 8   Global Step: 135090   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-11 13:32:00,333-Speed 3286.05 samples/sec   Loss 1.7597   LearningRate 0.0354   Epoch: 8   Global Step: 135100   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-11 13:32:03,410-Speed 3328.36 samples/sec   Loss 1.8466   LearningRate 0.0354   Epoch: 8   Global Step: 135110   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-11 13:32:06,493-Speed 3322.56 samples/sec   Loss 1.8283   LearningRate 0.0354   Epoch: 8   Global Step: 135120   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-11 13:32:09,625-Speed 3270.26 samples/sec   Loss 1.8687   LearningRate 0.0354   Epoch: 8   Global Step: 135130   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:32:12,704-Speed 3326.19 samples/sec   Loss 1.7773   LearningRate 0.0354   Epoch: 8   Global Step: 135140   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:32:15,785-Speed 3324.72 samples/sec   Loss 1.7962   LearningRate 0.0354   Epoch: 8   Global Step: 135150   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:32:18,861-Speed 3329.67 samples/sec   Loss 1.8229   LearningRate 0.0354   Epoch: 8   Global Step: 135160   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:32:21,943-Speed 3323.13 samples/sec   Loss 1.8856   LearningRate 0.0354   Epoch: 8   Global Step: 135170   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:32:25,032-Speed 3315.69 samples/sec   Loss 1.7912   LearningRate 0.0354   Epoch: 8   Global Step: 135180   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:32:28,109-Speed 3329.25 samples/sec   Loss 1.8109   LearningRate 0.0354   Epoch: 8   Global Step: 135190   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:32:31,274-Speed 3235.87 samples/sec   Loss 1.8740   LearningRate 0.0354   Epoch: 8   Global Step: 135200   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-11 13:32:34,446-Speed 3228.96 samples/sec   Loss 1.8449   LearningRate 0.0354   Epoch: 8   Global Step: 135210   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-11 13:32:37,532-Speed 3319.34 samples/sec   Loss 1.8497   LearningRate 0.0354   Epoch: 8   Global Step: 135220   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-11 13:32:40,763-Speed 3169.20 samples/sec   Loss 1.8374   LearningRate 0.0354   Epoch: 8   Global Step: 135230   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-11 13:32:43,874-Speed 3292.35 samples/sec   Loss 1.8219   LearningRate 0.0354   Epoch: 8   Global Step: 135240   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-11 13:32:46,990-Speed 3287.63 samples/sec   Loss 1.8813   LearningRate 0.0354   Epoch: 8   Global Step: 135250   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-11 13:32:50,067-Speed 3329.06 samples/sec   Loss 1.8440   LearningRate 0.0354   Epoch: 8   Global Step: 135260   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-11 13:32:53,152-Speed 3319.80 samples/sec   Loss 1.8595   LearningRate 0.0354   Epoch: 8   Global Step: 135270   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-11 13:32:56,252-Speed 3304.32 samples/sec   Loss 1.8659   LearningRate 0.0354   Epoch: 8   Global Step: 135280   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-11 13:32:59,338-Speed 3318.86 samples/sec   Loss 1.8685   LearningRate 0.0354   Epoch: 8   Global Step: 135290   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-11 13:33:02,404-Speed 3341.11 samples/sec   Loss 1.8532   LearningRate 0.0354   Epoch: 8   Global Step: 135300   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-11 13:33:05,479-Speed 3330.24 samples/sec   Loss 1.8422   LearningRate 0.0354   Epoch: 8   Global Step: 135310   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-11 13:33:08,626-Speed 3254.92 samples/sec   Loss 1.8332   LearningRate 0.0354   Epoch: 8   Global Step: 135320   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-11 13:33:11,703-Speed 3328.90 samples/sec   Loss 1.8777   LearningRate 0.0354   Epoch: 8   Global Step: 135330   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-11 13:33:14,783-Speed 3324.92 samples/sec   Loss 1.8504   LearningRate 0.0354   Epoch: 8   Global Step: 135340   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-11 13:33:17,970-Speed 3214.83 samples/sec   Loss 1.8555   LearningRate 0.0353   Epoch: 8   Global Step: 135350   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-11 13:33:21,196-Speed 3174.40 samples/sec   Loss 1.8858   LearningRate 0.0353   Epoch: 8   Global Step: 135360   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-11 13:33:24,417-Speed 3179.62 samples/sec   Loss 1.8686   LearningRate 0.0353   Epoch: 8   Global Step: 135370   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-11 13:33:27,569-Speed 3249.51 samples/sec   Loss 1.8378   LearningRate 0.0353   Epoch: 8   Global Step: 135380   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-11 13:33:30,651-Speed 3323.59 samples/sec   Loss 1.8461   LearningRate 0.0353   Epoch: 8   Global Step: 135390   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-11 13:33:33,722-Speed 3334.40 samples/sec   Loss 1.7859   LearningRate 0.0353   Epoch: 8   Global Step: 135400   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:33:36,800-Speed 3328.12 samples/sec   Loss 1.8732   LearningRate 0.0353   Epoch: 8   Global Step: 135410   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:33:39,882-Speed 3323.06 samples/sec   Loss 1.8357   LearningRate 0.0353   Epoch: 8   Global Step: 135420   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:33:42,987-Speed 3299.07 samples/sec   Loss 1.8356   LearningRate 0.0353   Epoch: 8   Global Step: 135430   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:33:46,126-Speed 3262.79 samples/sec   Loss 1.8385   LearningRate 0.0353   Epoch: 8   Global Step: 135440   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:33:49,230-Speed 3300.11 samples/sec   Loss 1.8432   LearningRate 0.0353   Epoch: 8   Global Step: 135450   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:33:52,318-Speed 3316.61 samples/sec   Loss 1.8887   LearningRate 0.0353   Epoch: 8   Global Step: 135460   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:33:55,397-Speed 3326.53 samples/sec   Loss 1.8375   LearningRate 0.0353   Epoch: 8   Global Step: 135470   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:33:58,485-Speed 3316.54 samples/sec   Loss 1.9074   LearningRate 0.0353   Epoch: 8   Global Step: 135480   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:34:01,563-Speed 3327.66 samples/sec   Loss 1.8401   LearningRate 0.0353   Epoch: 8   Global Step: 135490   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:34:04,661-Speed 3306.51 samples/sec   Loss 1.7774   LearningRate 0.0353   Epoch: 8   Global Step: 135500   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:34:07,747-Speed 3318.91 samples/sec   Loss 1.8257   LearningRate 0.0353   Epoch: 8   Global Step: 135510   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:34:10,828-Speed 3324.54 samples/sec   Loss 1.7924   LearningRate 0.0353   Epoch: 8   Global Step: 135520   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:34:13,910-Speed 3323.16 samples/sec   Loss 1.8258   LearningRate 0.0353   Epoch: 8   Global Step: 135530   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:34:16,999-Speed 3315.95 samples/sec   Loss 1.8414   LearningRate 0.0353   Epoch: 8   Global Step: 135540   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:34:20,076-Speed 3328.92 samples/sec   Loss 1.7917   LearningRate 0.0353   Epoch: 8   Global Step: 135550   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:34:23,165-Speed 3315.17 samples/sec   Loss 1.8203   LearningRate 0.0353   Epoch: 8   Global Step: 135560   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:34:26,341-Speed 3224.69 samples/sec   Loss 1.8471   LearningRate 0.0353   Epoch: 8   Global Step: 135570   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:34:29,514-Speed 3227.87 samples/sec   Loss 1.8811   LearningRate 0.0353   Epoch: 8   Global Step: 135580   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:34:32,607-Speed 3311.96 samples/sec   Loss 1.8520   LearningRate 0.0353   Epoch: 8   Global Step: 135590   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:34:35,698-Speed 3313.22 samples/sec   Loss 1.8948   LearningRate 0.0353   Epoch: 8   Global Step: 135600   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:34:38,764-Speed 3340.51 samples/sec   Loss 1.8714   LearningRate 0.0353   Epoch: 8   Global Step: 135610   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-11 13:34:41,846-Speed 3323.11 samples/sec   Loss 1.8456   LearningRate 0.0353   Epoch: 8   Global Step: 135620   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-11 13:34:44,995-Speed 3252.81 samples/sec   Loss 1.8881   LearningRate 0.0352   Epoch: 8   Global Step: 135630   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-11 13:34:48,090-Speed 3309.30 samples/sec   Loss 1.9323   LearningRate 0.0352   Epoch: 8   Global Step: 135640   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-11 13:34:51,165-Speed 3331.68 samples/sec   Loss 1.8806   LearningRate 0.0352   Epoch: 8   Global Step: 135650   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-11 13:34:54,242-Speed 3328.18 samples/sec   Loss 1.8452   LearningRate 0.0352   Epoch: 8   Global Step: 135660   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-11 13:34:57,339-Speed 3307.51 samples/sec   Loss 1.8742   LearningRate 0.0352   Epoch: 8   Global Step: 135670   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-11 13:35:00,507-Speed 3232.57 samples/sec   Loss 1.8789   LearningRate 0.0352   Epoch: 8   Global Step: 135680   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-11 13:35:03,591-Speed 3321.28 samples/sec   Loss 1.9205   LearningRate 0.0352   Epoch: 8   Global Step: 135690   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-11 13:35:06,830-Speed 3162.12 samples/sec   Loss 1.8977   LearningRate 0.0352   Epoch: 8   Global Step: 135700   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-11 13:35:09,932-Speed 3301.85 samples/sec   Loss 1.8059   LearningRate 0.0352   Epoch: 8   Global Step: 135710   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-11 13:35:13,021-Speed 3316.50 samples/sec   Loss 1.8128   LearningRate 0.0352   Epoch: 8   Global Step: 135720   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-11 13:35:16,096-Speed 3330.53 samples/sec   Loss 1.8851   LearningRate 0.0352   Epoch: 8   Global Step: 135730   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-11 13:35:19,263-Speed 3234.15 samples/sec   Loss 1.8972   LearningRate 0.0352   Epoch: 8   Global Step: 135740   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-11 13:35:22,436-Speed 3228.00 samples/sec   Loss 1.8853   LearningRate 0.0352   Epoch: 8   Global Step: 135750   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-11 13:35:25,511-Speed 3331.00 samples/sec   Loss 1.8011   LearningRate 0.0352   Epoch: 8   Global Step: 135760   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-11 13:35:28,643-Speed 3269.42 samples/sec   Loss 1.8125   LearningRate 0.0352   Epoch: 8   Global Step: 135770   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-11 13:35:31,736-Speed 3311.97 samples/sec   Loss 1.7897   LearningRate 0.0352   Epoch: 8   Global Step: 135780   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-11 13:35:34,894-Speed 3242.85 samples/sec   Loss 1.8591   LearningRate 0.0352   Epoch: 8   Global Step: 135790   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-11 13:35:37,981-Speed 3318.22 samples/sec   Loss 1.8701   LearningRate 0.0352   Epoch: 8   Global Step: 135800   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-11 13:35:41,101-Speed 3283.39 samples/sec   Loss 1.8844   LearningRate 0.0352   Epoch: 8   Global Step: 135810   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:35:44,225-Speed 3278.11 samples/sec   Loss 1.8970   LearningRate 0.0352   Epoch: 8   Global Step: 135820   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:35:47,364-Speed 3262.47 samples/sec   Loss 1.8226   LearningRate 0.0352   Epoch: 8   Global Step: 135830   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:35:50,457-Speed 3311.80 samples/sec   Loss 1.8548   LearningRate 0.0352   Epoch: 8   Global Step: 135840   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:35:53,536-Speed 3326.57 samples/sec   Loss 1.8973   LearningRate 0.0352   Epoch: 8   Global Step: 135850   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:35:56,617-Speed 3324.55 samples/sec   Loss 1.8606   LearningRate 0.0352   Epoch: 8   Global Step: 135860   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:35:59,725-Speed 3295.34 samples/sec   Loss 1.8093   LearningRate 0.0352   Epoch: 8   Global Step: 135870   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:36:02,826-Speed 3302.81 samples/sec   Loss 1.8904   LearningRate 0.0352   Epoch: 8   Global Step: 135880   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:36:05,917-Speed 3313.47 samples/sec   Loss 1.8927   LearningRate 0.0352   Epoch: 8   Global Step: 135890   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:36:08,995-Speed 3327.06 samples/sec   Loss 1.8998   LearningRate 0.0352   Epoch: 8   Global Step: 135900   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:36:12,066-Speed 3335.65 samples/sec   Loss 1.9491   LearningRate 0.0351   Epoch: 8   Global Step: 135910   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:36:15,146-Speed 3325.47 samples/sec   Loss 1.9213   LearningRate 0.0351   Epoch: 8   Global Step: 135920   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:36:18,255-Speed 3294.74 samples/sec   Loss 1.8718   LearningRate 0.0351   Epoch: 8   Global Step: 135930   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:36:21,337-Speed 3322.73 samples/sec   Loss 1.8753   LearningRate 0.0351   Epoch: 8   Global Step: 135940   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:36:24,461-Speed 3278.52 samples/sec   Loss 1.9017   LearningRate 0.0351   Epoch: 8   Global Step: 135950   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:36:27,542-Speed 3325.25 samples/sec   Loss 1.9393   LearningRate 0.0351   Epoch: 8   Global Step: 135960   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:36:30,674-Speed 3269.81 samples/sec   Loss 1.8492   LearningRate 0.0351   Epoch: 8   Global Step: 135970   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:36:33,764-Speed 3314.41 samples/sec   Loss 1.8662   LearningRate 0.0351   Epoch: 8   Global Step: 135980   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:36:36,880-Speed 3287.53 samples/sec   Loss 1.9008   LearningRate 0.0351   Epoch: 8   Global Step: 135990   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:36:39,978-Speed 3305.78 samples/sec   Loss 1.8699   LearningRate 0.0351   Epoch: 8   Global Step: 136000   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:37:23,769-[lfw][136000]XNorm: 23.429183
Training: 2022-04-11 13:37:23,770-[lfw][136000]Accuracy-Flip: 0.99767+-0.00281
Training: 2022-04-11 13:37:23,770-[lfw][136000]Accuracy-Highest: 0.99817
Training: 2022-04-11 13:38:14,692-[cfp_fp][136000]XNorm: 23.164878
Training: 2022-04-11 13:38:14,693-[cfp_fp][136000]Accuracy-Flip: 0.98743+-0.00578
Training: 2022-04-11 13:38:14,693-[cfp_fp][136000]Accuracy-Highest: 0.98814
Training: 2022-04-11 13:38:58,334-[agedb_30][136000]XNorm: 23.647201
Training: 2022-04-11 13:38:58,334-[agedb_30][136000]Accuracy-Flip: 0.98267+-0.00680
Training: 2022-04-11 13:38:58,335-[agedb_30][136000]Accuracy-Highest: 0.98317
Training: 2022-04-11 13:39:01,418-Speed 72.40 samples/sec   Loss 1.8763   LearningRate 0.0351   Epoch: 8   Global Step: 136010   Fp16 Grad Scale: 262144   Required: 21 hours
Training: 2022-04-11 13:39:04,486-Speed 3338.06 samples/sec   Loss 1.9476   LearningRate 0.0351   Epoch: 8   Global Step: 136020   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:39:07,554-Speed 3338.10 samples/sec   Loss 1.8537   LearningRate 0.0351   Epoch: 8   Global Step: 136030   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:39:10,630-Speed 3330.61 samples/sec   Loss 1.8554   LearningRate 0.0351   Epoch: 8   Global Step: 136040   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:39:13,701-Speed 3335.29 samples/sec   Loss 1.8703   LearningRate 0.0351   Epoch: 8   Global Step: 136050   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:39:16,815-Speed 3288.74 samples/sec   Loss 1.8058   LearningRate 0.0351   Epoch: 8   Global Step: 136060   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:39:19,884-Speed 3337.44 samples/sec   Loss 1.9467   LearningRate 0.0351   Epoch: 8   Global Step: 136070   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:39:22,964-Speed 3325.63 samples/sec   Loss 1.9145   LearningRate 0.0351   Epoch: 8   Global Step: 136080   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:39:26,041-Speed 3329.12 samples/sec   Loss 1.8685   LearningRate 0.0351   Epoch: 8   Global Step: 136090   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:39:29,130-Speed 3315.28 samples/sec   Loss 1.8504   LearningRate 0.0351   Epoch: 8   Global Step: 136100   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:39:32,251-Speed 3281.66 samples/sec   Loss 1.8333   LearningRate 0.0351   Epoch: 8   Global Step: 136110   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:39:35,385-Speed 3268.60 samples/sec   Loss 1.8627   LearningRate 0.0351   Epoch: 8   Global Step: 136120   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:39:38,501-Speed 3286.11 samples/sec   Loss 1.9117   LearningRate 0.0351   Epoch: 8   Global Step: 136130   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:39:41,560-Speed 3349.29 samples/sec   Loss 1.8984   LearningRate 0.0351   Epoch: 8   Global Step: 136140   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-11 13:39:44,632-Speed 3334.27 samples/sec   Loss 1.8670   LearningRate 0.0351   Epoch: 8   Global Step: 136150   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-11 13:39:47,703-Speed 3334.53 samples/sec   Loss 1.8177   LearningRate 0.0351   Epoch: 8   Global Step: 136160   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-11 13:39:50,773-Speed 3336.63 samples/sec   Loss 1.8312   LearningRate 0.0351   Epoch: 8   Global Step: 136170   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-11 13:39:53,842-Speed 3337.65 samples/sec   Loss 1.9200   LearningRate 0.0351   Epoch: 8   Global Step: 136180   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-11 13:39:56,919-Speed 3329.00 samples/sec   Loss 1.8397   LearningRate 0.0350   Epoch: 8   Global Step: 136190   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-11 13:39:59,991-Speed 3333.81 samples/sec   Loss 1.9032   LearningRate 0.0350   Epoch: 8   Global Step: 136200   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-11 13:40:03,082-Speed 3313.20 samples/sec   Loss 1.8971   LearningRate 0.0350   Epoch: 8   Global Step: 136210   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-11 13:40:06,164-Speed 3323.62 samples/sec   Loss 1.8517   LearningRate 0.0350   Epoch: 8   Global Step: 136220   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-11 13:40:09,238-Speed 3332.15 samples/sec   Loss 1.9190   LearningRate 0.0350   Epoch: 8   Global Step: 136230   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-11 13:40:12,322-Speed 3320.64 samples/sec   Loss 1.9118   LearningRate 0.0350   Epoch: 8   Global Step: 136240   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-11 13:40:15,406-Speed 3321.33 samples/sec   Loss 1.8750   LearningRate 0.0350   Epoch: 8   Global Step: 136250   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-11 13:40:18,516-Speed 3293.81 samples/sec   Loss 1.8958   LearningRate 0.0350   Epoch: 8   Global Step: 136260   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-11 13:40:21,611-Speed 3308.61 samples/sec   Loss 1.8787   LearningRate 0.0350   Epoch: 8   Global Step: 136270   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-11 13:40:24,689-Speed 3328.17 samples/sec   Loss 1.8807   LearningRate 0.0350   Epoch: 8   Global Step: 136280   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-11 13:40:27,792-Speed 3300.96 samples/sec   Loss 1.8749   LearningRate 0.0350   Epoch: 8   Global Step: 136290   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-11 13:40:30,989-Speed 3203.01 samples/sec   Loss 1.9259   LearningRate 0.0350   Epoch: 8   Global Step: 136300   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-11 13:40:34,078-Speed 3316.37 samples/sec   Loss 1.9303   LearningRate 0.0350   Epoch: 8   Global Step: 136310   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-11 13:40:37,153-Speed 3329.94 samples/sec   Loss 1.8466   LearningRate 0.0350   Epoch: 8   Global Step: 136320   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-11 13:40:40,231-Speed 3328.88 samples/sec   Loss 1.8819   LearningRate 0.0350   Epoch: 8   Global Step: 136330   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-11 13:40:43,314-Speed 3321.36 samples/sec   Loss 1.8459   LearningRate 0.0350   Epoch: 8   Global Step: 136340   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:40:46,392-Speed 3327.68 samples/sec   Loss 1.8599   LearningRate 0.0350   Epoch: 8   Global Step: 136350   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:40:49,486-Speed 3310.66 samples/sec   Loss 1.8425   LearningRate 0.0350   Epoch: 8   Global Step: 136360   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:40:52,567-Speed 3324.74 samples/sec   Loss 1.8862   LearningRate 0.0350   Epoch: 8   Global Step: 136370   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:40:55,643-Speed 3329.29 samples/sec   Loss 1.8731   LearningRate 0.0350   Epoch: 8   Global Step: 136380   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:40:58,727-Speed 3320.67 samples/sec   Loss 1.9153   LearningRate 0.0350   Epoch: 8   Global Step: 136390   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:41:01,814-Speed 3317.97 samples/sec   Loss 1.8674   LearningRate 0.0350   Epoch: 8   Global Step: 136400   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:41:04,876-Speed 3345.60 samples/sec   Loss 1.9609   LearningRate 0.0350   Epoch: 8   Global Step: 136410   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-11 13:41:07,973-Speed 3306.97 samples/sec   Loss 1.9065   LearningRate 0.0350   Epoch: 8   Global Step: 136420   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-11 13:41:11,059-Speed 3319.06 samples/sec   Loss 1.8626   LearningRate 0.0350   Epoch: 8   Global Step: 136430   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-11 13:41:14,133-Speed 3331.72 samples/sec   Loss 1.8960   LearningRate 0.0350   Epoch: 8   Global Step: 136440   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-11 13:41:17,210-Speed 3329.27 samples/sec   Loss 1.8464   LearningRate 0.0350   Epoch: 8   Global Step: 136450   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-11 13:41:20,301-Speed 3313.77 samples/sec   Loss 1.9266   LearningRate 0.0350   Epoch: 8   Global Step: 136460   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-11 13:41:23,373-Speed 3333.55 samples/sec   Loss 1.8627   LearningRate 0.0350   Epoch: 8   Global Step: 136470   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-11 13:41:26,488-Speed 3288.57 samples/sec   Loss 1.9280   LearningRate 0.0349   Epoch: 8   Global Step: 136480   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-11 13:41:29,578-Speed 3314.41 samples/sec   Loss 1.9453   LearningRate 0.0349   Epoch: 8   Global Step: 136490   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-11 13:41:32,653-Speed 3330.42 samples/sec   Loss 1.9135   LearningRate 0.0349   Epoch: 8   Global Step: 136500   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-11 13:41:35,765-Speed 3292.04 samples/sec   Loss 1.8727   LearningRate 0.0349   Epoch: 8   Global Step: 136510   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:41:38,855-Speed 3314.81 samples/sec   Loss 1.9135   LearningRate 0.0349   Epoch: 8   Global Step: 136520   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:41:41,948-Speed 3311.15 samples/sec   Loss 1.9506   LearningRate 0.0349   Epoch: 8   Global Step: 136530   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:41:45,102-Speed 3247.66 samples/sec   Loss 1.8948   LearningRate 0.0349   Epoch: 8   Global Step: 136540   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:41:48,292-Speed 3210.71 samples/sec   Loss 1.8596   LearningRate 0.0349   Epoch: 8   Global Step: 136550   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:41:51,367-Speed 3330.03 samples/sec   Loss 1.9108   LearningRate 0.0349   Epoch: 8   Global Step: 136560   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:41:54,573-Speed 3194.85 samples/sec   Loss 1.9202   LearningRate 0.0349   Epoch: 8   Global Step: 136570   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:41:57,710-Speed 3264.81 samples/sec   Loss 1.9716   LearningRate 0.0349   Epoch: 8   Global Step: 136580   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:42:00,785-Speed 3331.32 samples/sec   Loss 1.9349   LearningRate 0.0349   Epoch: 8   Global Step: 136590   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:42:03,855-Speed 3336.22 samples/sec   Loss 1.8609   LearningRate 0.0349   Epoch: 8   Global Step: 136600   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-11 13:42:06,961-Speed 3297.89 samples/sec   Loss 1.8887   LearningRate 0.0349   Epoch: 8   Global Step: 136610   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-11 13:42:10,089-Speed 3274.69 samples/sec   Loss 1.9048   LearningRate 0.0349   Epoch: 8   Global Step: 136620   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-11 13:42:13,305-Speed 3184.91 samples/sec   Loss 1.9194   LearningRate 0.0349   Epoch: 8   Global Step: 136630   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-11 13:42:16,538-Speed 3167.66 samples/sec   Loss 1.9578   LearningRate 0.0349   Epoch: 8   Global Step: 136640   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-11 13:42:19,666-Speed 3274.42 samples/sec   Loss 1.9048   LearningRate 0.0349   Epoch: 8   Global Step: 136650   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-11 13:42:22,779-Speed 3290.50 samples/sec   Loss 1.8898   LearningRate 0.0349   Epoch: 8   Global Step: 136660   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-11 13:42:25,930-Speed 3250.15 samples/sec   Loss 1.9309   LearningRate 0.0349   Epoch: 8   Global Step: 136670   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-11 13:42:29,021-Speed 3313.70 samples/sec   Loss 1.9183   LearningRate 0.0349   Epoch: 8   Global Step: 136680   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-11 13:42:32,113-Speed 3312.70 samples/sec   Loss 1.9566   LearningRate 0.0349   Epoch: 8   Global Step: 136690   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-11 13:42:35,202-Speed 3315.31 samples/sec   Loss 1.8944   LearningRate 0.0349   Epoch: 8   Global Step: 136700   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:42:38,360-Speed 3243.33 samples/sec   Loss 1.9162   LearningRate 0.0349   Epoch: 8   Global Step: 136710   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:42:41,435-Speed 3331.05 samples/sec   Loss 1.8882   LearningRate 0.0349   Epoch: 8   Global Step: 136720   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:42:44,594-Speed 3241.91 samples/sec   Loss 1.8806   LearningRate 0.0349   Epoch: 8   Global Step: 136730   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:42:47,696-Speed 3302.70 samples/sec   Loss 1.9192   LearningRate 0.0349   Epoch: 8   Global Step: 136740   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:42:50,784-Speed 3316.60 samples/sec   Loss 1.8989   LearningRate 0.0349   Epoch: 8   Global Step: 136750   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:42:53,885-Speed 3302.33 samples/sec   Loss 1.8417   LearningRate 0.0348   Epoch: 8   Global Step: 136760   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:42:56,966-Speed 3324.07 samples/sec   Loss 1.9126   LearningRate 0.0348   Epoch: 8   Global Step: 136770   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:43:00,064-Speed 3306.53 samples/sec   Loss 1.9657   LearningRate 0.0348   Epoch: 8   Global Step: 136780   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:43:03,162-Speed 3306.92 samples/sec   Loss 1.8862   LearningRate 0.0348   Epoch: 8   Global Step: 136790   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:43:06,255-Speed 3311.14 samples/sec   Loss 1.8968   LearningRate 0.0348   Epoch: 8   Global Step: 136800   Fp16 Grad Scale: 262144   Required: 21 hours
Training: 2022-04-11 13:43:09,334-Speed 3326.04 samples/sec   Loss 1.9136   LearningRate 0.0348   Epoch: 8   Global Step: 136810   Fp16 Grad Scale: 262144   Required: 21 hours
Training: 2022-04-11 13:43:12,416-Speed 3323.53 samples/sec   Loss 1.9745   LearningRate 0.0348   Epoch: 8   Global Step: 136820   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:43:15,494-Speed 3327.34 samples/sec   Loss 1.8934   LearningRate 0.0348   Epoch: 8   Global Step: 136830   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:43:18,574-Speed 3326.29 samples/sec   Loss 1.9635   LearningRate 0.0348   Epoch: 8   Global Step: 136840   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:43:21,685-Speed 3291.74 samples/sec   Loss 1.9426   LearningRate 0.0348   Epoch: 8   Global Step: 136850   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:43:24,870-Speed 3215.99 samples/sec   Loss 1.9341   LearningRate 0.0348   Epoch: 8   Global Step: 136860   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:43:28,012-Speed 3259.81 samples/sec   Loss 1.9017   LearningRate 0.0348   Epoch: 8   Global Step: 136870   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-11 13:43:31,148-Speed 3266.54 samples/sec   Loss 1.9430   LearningRate 0.0348   Epoch: 8   Global Step: 136880   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-11 13:43:34,223-Speed 3330.66 samples/sec   Loss 1.8975   LearningRate 0.0348   Epoch: 8   Global Step: 136890   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-11 13:43:37,324-Speed 3302.93 samples/sec   Loss 1.9395   LearningRate 0.0348   Epoch: 8   Global Step: 136900   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-11 13:43:40,421-Speed 3308.35 samples/sec   Loss 1.9757   LearningRate 0.0348   Epoch: 8   Global Step: 136910   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-11 13:43:43,511-Speed 3313.96 samples/sec   Loss 2.0387   LearningRate 0.0348   Epoch: 8   Global Step: 136920   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-11 13:43:46,583-Speed 3334.14 samples/sec   Loss 1.8939   LearningRate 0.0348   Epoch: 8   Global Step: 136930   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-11 13:43:49,730-Speed 3254.45 samples/sec   Loss 1.9221   LearningRate 0.0348   Epoch: 8   Global Step: 136940   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-11 13:43:52,922-Speed 3209.20 samples/sec   Loss 1.8983   LearningRate 0.0348   Epoch: 8   Global Step: 136950   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-11 13:43:55,998-Speed 3329.83 samples/sec   Loss 1.9786   LearningRate 0.0348   Epoch: 8   Global Step: 136960   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-11 13:43:59,075-Speed 3328.63 samples/sec   Loss 1.9909   LearningRate 0.0348   Epoch: 8   Global Step: 136970   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:44:02,155-Speed 3325.88 samples/sec   Loss 1.8702   LearningRate 0.0348   Epoch: 8   Global Step: 136980   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-11 13:44:05,239-Speed 3321.07 samples/sec   Loss 1.8940   LearningRate 0.0348   Epoch: 8   Global Step: 136990   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-11 13:44:08,468-Speed 3171.98 samples/sec   Loss 1.9646   LearningRate 0.0348   Epoch: 8   Global Step: 137000   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-11 13:44:11,543-Speed 3330.58 samples/sec   Loss 2.0115   LearningRate 0.0348   Epoch: 8   Global Step: 137010   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-11 13:44:14,616-Speed 3333.36 samples/sec   Loss 1.9959   LearningRate 0.0348   Epoch: 8   Global Step: 137020   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-11 13:44:17,758-Speed 3259.48 samples/sec   Loss 1.9457   LearningRate 0.0348   Epoch: 8   Global Step: 137030   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-11 13:44:20,867-Speed 3293.95 samples/sec   Loss 1.8904   LearningRate 0.0347   Epoch: 8   Global Step: 137040   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-11 13:44:23,944-Speed 3330.21 samples/sec   Loss 1.9478   LearningRate 0.0347   Epoch: 8   Global Step: 137050   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-11 13:44:27,023-Speed 3326.86 samples/sec   Loss 1.9257   LearningRate 0.0347   Epoch: 8   Global Step: 137060   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-11 13:44:30,098-Speed 3331.14 samples/sec   Loss 1.9548   LearningRate 0.0347   Epoch: 8   Global Step: 137070   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-11 13:44:33,178-Speed 3325.34 samples/sec   Loss 1.9426   LearningRate 0.0347   Epoch: 8   Global Step: 137080   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:44:36,252-Speed 3331.67 samples/sec   Loss 1.9427   LearningRate 0.0347   Epoch: 8   Global Step: 137090   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:44:39,327-Speed 3331.18 samples/sec   Loss 1.8621   LearningRate 0.0347   Epoch: 8   Global Step: 137100   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:44:42,408-Speed 3324.38 samples/sec   Loss 1.9046   LearningRate 0.0347   Epoch: 8   Global Step: 137110   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:44:45,497-Speed 3315.61 samples/sec   Loss 1.9416   LearningRate 0.0347   Epoch: 8   Global Step: 137120   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:44:48,598-Speed 3302.70 samples/sec   Loss 1.9630   LearningRate 0.0347   Epoch: 8   Global Step: 137130   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:44:51,762-Speed 3237.53 samples/sec   Loss 1.9497   LearningRate 0.0347   Epoch: 8   Global Step: 137140   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:44:54,846-Speed 3320.78 samples/sec   Loss 1.9962   LearningRate 0.0347   Epoch: 8   Global Step: 137150   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:44:57,927-Speed 3325.19 samples/sec   Loss 1.9608   LearningRate 0.0347   Epoch: 8   Global Step: 137160   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:45:01,009-Speed 3322.42 samples/sec   Loss 1.9540   LearningRate 0.0347   Epoch: 8   Global Step: 137170   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:45:04,144-Speed 3267.50 samples/sec   Loss 1.9797   LearningRate 0.0347   Epoch: 8   Global Step: 137180   Fp16 Grad Scale: 262144   Required: 21 hours
Training: 2022-04-11 13:45:09,964-Speed 1759.64 samples/sec   Loss 1.9195   LearningRate 0.0347   Epoch: 8   Global Step: 137190   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:45:14,135-Speed 2456.09 samples/sec   Loss 1.9231   LearningRate 0.0347   Epoch: 8   Global Step: 137200   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:45:18,512-Speed 2339.60 samples/sec   Loss 1.9846   LearningRate 0.0347   Epoch: 8   Global Step: 137210   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-11 13:45:21,588-Speed 3330.36 samples/sec   Loss 1.9349   LearningRate 0.0347   Epoch: 8   Global Step: 137220   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-11 13:45:24,684-Speed 3307.50 samples/sec   Loss 1.9658   LearningRate 0.0347   Epoch: 8   Global Step: 137230   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-11 13:45:27,786-Speed 3302.92 samples/sec   Loss 1.9527   LearningRate 0.0347   Epoch: 8   Global Step: 137240   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-11 13:45:30,862-Speed 3329.32 samples/sec   Loss 1.9590   LearningRate 0.0347   Epoch: 8   Global Step: 137250   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-11 13:45:33,943-Speed 3324.40 samples/sec   Loss 1.8812   LearningRate 0.0347   Epoch: 8   Global Step: 137260   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-11 13:45:37,038-Speed 3308.78 samples/sec   Loss 1.9223   LearningRate 0.0347   Epoch: 8   Global Step: 137270   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-11 13:45:40,123-Speed 3321.18 samples/sec   Loss 1.9408   LearningRate 0.0347   Epoch: 8   Global Step: 137280   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-11 13:45:43,222-Speed 3305.31 samples/sec   Loss 1.9770   LearningRate 0.0347   Epoch: 8   Global Step: 137290   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-11 13:45:46,303-Speed 3323.46 samples/sec   Loss 1.9480   LearningRate 0.0347   Epoch: 8   Global Step: 137300   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-11 13:45:49,449-Speed 3255.74 samples/sec   Loss 1.9786   LearningRate 0.0347   Epoch: 8   Global Step: 137310   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:45:52,562-Speed 3290.95 samples/sec   Loss 1.9353   LearningRate 0.0346   Epoch: 8   Global Step: 137320   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:45:55,649-Speed 3317.07 samples/sec   Loss 1.9623   LearningRate 0.0346   Epoch: 8   Global Step: 137330   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:45:58,743-Speed 3310.28 samples/sec   Loss 1.9310   LearningRate 0.0346   Epoch: 8   Global Step: 137340   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:46:01,831-Speed 3316.93 samples/sec   Loss 2.0093   LearningRate 0.0346   Epoch: 8   Global Step: 137350   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:46:04,921-Speed 3315.51 samples/sec   Loss 2.0079   LearningRate 0.0346   Epoch: 8   Global Step: 137360   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:46:08,010-Speed 3315.73 samples/sec   Loss 1.9383   LearningRate 0.0346   Epoch: 8   Global Step: 137370   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:46:11,094-Speed 3321.21 samples/sec   Loss 1.8985   LearningRate 0.0346   Epoch: 8   Global Step: 137380   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:46:14,179-Speed 3319.71 samples/sec   Loss 1.9380   LearningRate 0.0346   Epoch: 8   Global Step: 137390   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:46:17,265-Speed 3319.19 samples/sec   Loss 1.9598   LearningRate 0.0346   Epoch: 8   Global Step: 137400   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:46:20,355-Speed 3315.32 samples/sec   Loss 1.8967   LearningRate 0.0346   Epoch: 8   Global Step: 137410   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:46:23,443-Speed 3317.24 samples/sec   Loss 1.9773   LearningRate 0.0346   Epoch: 8   Global Step: 137420   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:46:26,533-Speed 3314.85 samples/sec   Loss 1.8916   LearningRate 0.0346   Epoch: 8   Global Step: 137430   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:46:29,642-Speed 3293.53 samples/sec   Loss 1.9376   LearningRate 0.0346   Epoch: 8   Global Step: 137440   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:46:32,732-Speed 3315.25 samples/sec   Loss 1.9299   LearningRate 0.0346   Epoch: 8   Global Step: 137450   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:46:35,837-Speed 3298.51 samples/sec   Loss 1.9095   LearningRate 0.0346   Epoch: 8   Global Step: 137460   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:46:38,922-Speed 3320.35 samples/sec   Loss 1.9338   LearningRate 0.0346   Epoch: 8   Global Step: 137470   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:46:42,003-Speed 3324.86 samples/sec   Loss 1.9523   LearningRate 0.0346   Epoch: 8   Global Step: 137480   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:46:45,091-Speed 3316.82 samples/sec   Loss 1.9586   LearningRate 0.0346   Epoch: 8   Global Step: 137490   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:46:48,176-Speed 3319.73 samples/sec   Loss 1.9499   LearningRate 0.0346   Epoch: 8   Global Step: 137500   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:46:51,247-Speed 3335.24 samples/sec   Loss 1.9693   LearningRate 0.0346   Epoch: 8   Global Step: 137510   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:46:54,334-Speed 3317.82 samples/sec   Loss 1.9453   LearningRate 0.0346   Epoch: 8   Global Step: 137520   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:46:57,439-Speed 3299.00 samples/sec   Loss 1.9566   LearningRate 0.0346   Epoch: 8   Global Step: 137530   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:47:00,520-Speed 3324.27 samples/sec   Loss 1.9788   LearningRate 0.0346   Epoch: 8   Global Step: 137540   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:47:03,604-Speed 3321.29 samples/sec   Loss 1.9503   LearningRate 0.0346   Epoch: 8   Global Step: 137550   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:47:06,743-Speed 3262.90 samples/sec   Loss 1.9799   LearningRate 0.0346   Epoch: 8   Global Step: 137560   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:47:09,933-Speed 3211.48 samples/sec   Loss 1.9485   LearningRate 0.0346   Epoch: 8   Global Step: 137570   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:47:13,036-Speed 3300.44 samples/sec   Loss 1.9462   LearningRate 0.0346   Epoch: 8   Global Step: 137580   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:47:16,119-Speed 3322.01 samples/sec   Loss 1.9071   LearningRate 0.0346   Epoch: 8   Global Step: 137590   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:47:19,198-Speed 3326.51 samples/sec   Loss 1.9639   LearningRate 0.0346   Epoch: 8   Global Step: 137600   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:47:22,280-Speed 3323.49 samples/sec   Loss 1.9510   LearningRate 0.0345   Epoch: 8   Global Step: 137610   Fp16 Grad Scale: 262144   Required: 21 hours
Training: 2022-04-11 13:47:25,352-Speed 3334.36 samples/sec   Loss 1.9311   LearningRate 0.0345   Epoch: 8   Global Step: 137620   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:47:28,414-Speed 3344.42 samples/sec   Loss 1.9689   LearningRate 0.0345   Epoch: 8   Global Step: 137630   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-11 13:47:31,488-Speed 3331.99 samples/sec   Loss 1.9667   LearningRate 0.0345   Epoch: 8   Global Step: 137640   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-11 13:47:34,566-Speed 3328.32 samples/sec   Loss 1.8571   LearningRate 0.0345   Epoch: 8   Global Step: 137650   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-11 13:47:37,653-Speed 3317.94 samples/sec   Loss 1.8920   LearningRate 0.0345   Epoch: 8   Global Step: 137660   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-11 13:47:40,858-Speed 3196.00 samples/sec   Loss 1.9345   LearningRate 0.0345   Epoch: 8   Global Step: 137670   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-11 13:47:43,947-Speed 3315.56 samples/sec   Loss 1.9737   LearningRate 0.0345   Epoch: 8   Global Step: 137680   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-11 13:47:47,026-Speed 3325.78 samples/sec   Loss 2.0522   LearningRate 0.0345   Epoch: 8   Global Step: 137690   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-11 13:47:50,139-Speed 3290.04 samples/sec   Loss 2.0054   LearningRate 0.0345   Epoch: 8   Global Step: 137700   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-11 13:47:53,240-Speed 3303.44 samples/sec   Loss 1.9693   LearningRate 0.0345   Epoch: 8   Global Step: 137710   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-11 13:47:56,316-Speed 3329.69 samples/sec   Loss 1.9664   LearningRate 0.0345   Epoch: 8   Global Step: 137720   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-11 13:47:59,396-Speed 3326.01 samples/sec   Loss 1.9110   LearningRate 0.0345   Epoch: 8   Global Step: 137730   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:48:02,499-Speed 3301.17 samples/sec   Loss 1.9803   LearningRate 0.0345   Epoch: 8   Global Step: 137740   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:48:05,597-Speed 3306.09 samples/sec   Loss 1.9813   LearningRate 0.0345   Epoch: 8   Global Step: 137750   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:48:08,692-Speed 3308.85 samples/sec   Loss 1.9343   LearningRate 0.0345   Epoch: 8   Global Step: 137760   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:48:11,775-Speed 3322.28 samples/sec   Loss 1.9800   LearningRate 0.0345   Epoch: 8   Global Step: 137770   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:48:14,857-Speed 3322.84 samples/sec   Loss 1.9322   LearningRate 0.0345   Epoch: 8   Global Step: 137780   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:48:17,939-Speed 3324.10 samples/sec   Loss 1.9210   LearningRate 0.0345   Epoch: 8   Global Step: 137790   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:48:21,054-Speed 3288.09 samples/sec   Loss 1.9436   LearningRate 0.0345   Epoch: 8   Global Step: 137800   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:48:24,153-Speed 3305.11 samples/sec   Loss 1.9571   LearningRate 0.0345   Epoch: 8   Global Step: 137810   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:48:27,234-Speed 3324.37 samples/sec   Loss 1.9467   LearningRate 0.0345   Epoch: 8   Global Step: 137820   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:48:30,303-Speed 3337.71 samples/sec   Loss 1.9332   LearningRate 0.0345   Epoch: 8   Global Step: 137830   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:48:33,391-Speed 3316.54 samples/sec   Loss 1.9473   LearningRate 0.0345   Epoch: 8   Global Step: 137840   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:48:36,474-Speed 3321.70 samples/sec   Loss 1.9761   LearningRate 0.0345   Epoch: 8   Global Step: 137850   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:48:39,559-Speed 3321.01 samples/sec   Loss 1.9674   LearningRate 0.0345   Epoch: 8   Global Step: 137860   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:48:42,645-Speed 3318.29 samples/sec   Loss 1.9616   LearningRate 0.0345   Epoch: 8   Global Step: 137870   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:48:45,724-Speed 3326.52 samples/sec   Loss 1.9627   LearningRate 0.0345   Epoch: 8   Global Step: 137880   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:48:48,902-Speed 3223.50 samples/sec   Loss 1.9342   LearningRate 0.0344   Epoch: 8   Global Step: 137890   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:48:51,991-Speed 3315.65 samples/sec   Loss 1.9193   LearningRate 0.0344   Epoch: 8   Global Step: 137900   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:48:55,085-Speed 3311.15 samples/sec   Loss 1.9596   LearningRate 0.0344   Epoch: 8   Global Step: 137910   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:48:58,218-Speed 3268.99 samples/sec   Loss 1.9464   LearningRate 0.0344   Epoch: 8   Global Step: 137920   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:49:01,304-Speed 3318.87 samples/sec   Loss 1.9661   LearningRate 0.0344   Epoch: 8   Global Step: 137930   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-11 13:49:04,431-Speed 3275.49 samples/sec   Loss 1.9403   LearningRate 0.0344   Epoch: 8   Global Step: 137940   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-11 13:49:07,562-Speed 3270.59 samples/sec   Loss 1.9869   LearningRate 0.0344   Epoch: 8   Global Step: 137950   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-11 13:49:10,644-Speed 3324.26 samples/sec   Loss 1.9361   LearningRate 0.0344   Epoch: 8   Global Step: 137960   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-11 13:49:13,722-Speed 3327.00 samples/sec   Loss 1.9715   LearningRate 0.0344   Epoch: 8   Global Step: 137970   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-11 13:49:16,799-Speed 3328.74 samples/sec   Loss 1.9929   LearningRate 0.0344   Epoch: 8   Global Step: 137980   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-11 13:49:19,912-Speed 3290.78 samples/sec   Loss 1.9701   LearningRate 0.0344   Epoch: 8   Global Step: 137990   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-11 13:49:23,040-Speed 3274.62 samples/sec   Loss 1.9535   LearningRate 0.0344   Epoch: 8   Global Step: 138000   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-11 13:50:07,231-[lfw][138000]XNorm: 21.115000
Training: 2022-04-11 13:50:07,232-[lfw][138000]Accuracy-Flip: 0.99800+-0.00256
Training: 2022-04-11 13:50:07,232-[lfw][138000]Accuracy-Highest: 0.99817
Training: 2022-04-11 13:50:58,694-[cfp_fp][138000]XNorm: 20.823176
Training: 2022-04-11 13:50:58,695-[cfp_fp][138000]Accuracy-Flip: 0.98700+-0.00536
Training: 2022-04-11 13:50:58,695-[cfp_fp][138000]Accuracy-Highest: 0.98814
Training: 2022-04-11 13:51:42,295-[agedb_30][138000]XNorm: 21.850276
Training: 2022-04-11 13:51:42,296-[agedb_30][138000]Accuracy-Flip: 0.98167+-0.00792
Training: 2022-04-11 13:51:42,296-[agedb_30][138000]Accuracy-Highest: 0.98317
Training: 2022-04-11 13:51:45,397-Speed 71.93 samples/sec   Loss 1.9933   LearningRate 0.0344   Epoch: 8   Global Step: 138010   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-11 13:51:48,495-Speed 3306.37 samples/sec   Loss 2.0032   LearningRate 0.0344   Epoch: 8   Global Step: 138020   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-11 13:51:51,575-Speed 3325.86 samples/sec   Loss 1.9884   LearningRate 0.0344   Epoch: 8   Global Step: 138030   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:51:54,696-Speed 3281.97 samples/sec   Loss 2.0499   LearningRate 0.0344   Epoch: 8   Global Step: 138040   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:51:57,763-Speed 3338.56 samples/sec   Loss 1.9466   LearningRate 0.0344   Epoch: 8   Global Step: 138050   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:52:00,844-Speed 3324.68 samples/sec   Loss 2.0222   LearningRate 0.0344   Epoch: 8   Global Step: 138060   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:52:03,966-Speed 3281.05 samples/sec   Loss 1.9376   LearningRate 0.0344   Epoch: 8   Global Step: 138070   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:52:07,040-Speed 3331.78 samples/sec   Loss 1.9821   LearningRate 0.0344   Epoch: 8   Global Step: 138080   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:52:10,126-Speed 3319.00 samples/sec   Loss 2.0002   LearningRate 0.0344   Epoch: 8   Global Step: 138090   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:52:13,238-Speed 3290.92 samples/sec   Loss 1.9461   LearningRate 0.0344   Epoch: 8   Global Step: 138100   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:52:16,342-Speed 3299.58 samples/sec   Loss 1.9271   LearningRate 0.0344   Epoch: 8   Global Step: 138110   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:52:19,412-Speed 3336.29 samples/sec   Loss 1.9613   LearningRate 0.0344   Epoch: 8   Global Step: 138120   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-11 13:52:22,505-Speed 3312.29 samples/sec   Loss 1.9500   LearningRate 0.0344   Epoch: 8   Global Step: 138130   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-11 13:52:25,577-Speed 3334.17 samples/sec   Loss 1.9734   LearningRate 0.0344   Epoch: 8   Global Step: 138140   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-11 13:52:28,656-Speed 3325.94 samples/sec   Loss 1.9716   LearningRate 0.0344   Epoch: 8   Global Step: 138150   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-11 13:52:31,740-Speed 3321.69 samples/sec   Loss 1.9952   LearningRate 0.0344   Epoch: 8   Global Step: 138160   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-11 13:52:34,824-Speed 3320.68 samples/sec   Loss 1.9801   LearningRate 0.0344   Epoch: 8   Global Step: 138170   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-11 13:52:37,909-Speed 3320.02 samples/sec   Loss 1.9735   LearningRate 0.0343   Epoch: 8   Global Step: 138180   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-11 13:52:40,988-Speed 3326.21 samples/sec   Loss 2.0436   LearningRate 0.0343   Epoch: 8   Global Step: 138190   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-11 13:52:44,094-Speed 3298.45 samples/sec   Loss 1.9904   LearningRate 0.0343   Epoch: 8   Global Step: 138200   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-11 13:52:47,212-Speed 3284.43 samples/sec   Loss 1.9852   LearningRate 0.0343   Epoch: 8   Global Step: 138210   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-11 13:52:50,353-Speed 3260.96 samples/sec   Loss 1.9742   LearningRate 0.0343   Epoch: 8   Global Step: 138220   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:52:53,461-Speed 3295.19 samples/sec   Loss 1.9908   LearningRate 0.0343   Epoch: 8   Global Step: 138230   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:52:56,533-Speed 3334.73 samples/sec   Loss 1.9699   LearningRate 0.0343   Epoch: 8   Global Step: 138240   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:52:59,637-Speed 3300.01 samples/sec   Loss 2.0010   LearningRate 0.0343   Epoch: 8   Global Step: 138250   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:53:02,706-Speed 3337.42 samples/sec   Loss 1.9254   LearningRate 0.0343   Epoch: 8   Global Step: 138260   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:53:05,796-Speed 3313.87 samples/sec   Loss 2.0035   LearningRate 0.0343   Epoch: 8   Global Step: 138270   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:53:08,874-Speed 3328.03 samples/sec   Loss 2.0153   LearningRate 0.0343   Epoch: 8   Global Step: 138280   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:53:11,969-Speed 3309.51 samples/sec   Loss 1.9889   LearningRate 0.0343   Epoch: 8   Global Step: 138290   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:53:15,065-Speed 3308.61 samples/sec   Loss 1.9924   LearningRate 0.0343   Epoch: 8   Global Step: 138300   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:53:18,146-Speed 3324.20 samples/sec   Loss 2.0424   LearningRate 0.0343   Epoch: 8   Global Step: 138310   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:53:21,209-Speed 3343.95 samples/sec   Loss 2.0281   LearningRate 0.0343   Epoch: 8   Global Step: 138320   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:53:24,294-Speed 3319.96 samples/sec   Loss 2.0433   LearningRate 0.0343   Epoch: 8   Global Step: 138330   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:53:27,371-Speed 3328.62 samples/sec   Loss 1.9856   LearningRate 0.0343   Epoch: 8   Global Step: 138340   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:53:30,458-Speed 3318.00 samples/sec   Loss 2.0575   LearningRate 0.0343   Epoch: 8   Global Step: 138350   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:53:33,550-Speed 3312.75 samples/sec   Loss 1.9822   LearningRate 0.0343   Epoch: 8   Global Step: 138360   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:53:36,631-Speed 3323.78 samples/sec   Loss 1.9296   LearningRate 0.0343   Epoch: 8   Global Step: 138370   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-11 13:53:39,715-Speed 3321.64 samples/sec   Loss 1.9811   LearningRate 0.0343   Epoch: 8   Global Step: 138380   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-11 13:53:42,805-Speed 3314.61 samples/sec   Loss 1.9530   LearningRate 0.0343   Epoch: 8   Global Step: 138390   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-11 13:53:45,902-Speed 3306.90 samples/sec   Loss 2.0019   LearningRate 0.0343   Epoch: 8   Global Step: 138400   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-11 13:53:48,979-Speed 3328.77 samples/sec   Loss 1.9908   LearningRate 0.0343   Epoch: 8   Global Step: 138410   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-11 13:53:52,069-Speed 3314.91 samples/sec   Loss 1.9456   LearningRate 0.0343   Epoch: 8   Global Step: 138420   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-11 13:53:55,144-Speed 3330.22 samples/sec   Loss 1.9939   LearningRate 0.0343   Epoch: 8   Global Step: 138430   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-11 13:53:58,244-Speed 3304.02 samples/sec   Loss 1.9779   LearningRate 0.0343   Epoch: 8   Global Step: 138440   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-11 13:54:01,363-Speed 3284.69 samples/sec   Loss 2.0001   LearningRate 0.0343   Epoch: 8   Global Step: 138450   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-11 13:54:04,462-Speed 3304.60 samples/sec   Loss 2.0069   LearningRate 0.0342   Epoch: 8   Global Step: 138460   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-11 13:54:07,538-Speed 3330.11 samples/sec   Loss 1.9706   LearningRate 0.0342   Epoch: 8   Global Step: 138470   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:54:10,604-Speed 3340.21 samples/sec   Loss 1.9046   LearningRate 0.0342   Epoch: 8   Global Step: 138480   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-11 13:54:13,697-Speed 3312.09 samples/sec   Loss 1.9921   LearningRate 0.0342   Epoch: 8   Global Step: 138490   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-11 13:54:16,777-Speed 3325.21 samples/sec   Loss 1.9828   LearningRate 0.0342   Epoch: 8   Global Step: 138500   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-11 13:54:19,896-Speed 3284.04 samples/sec   Loss 2.0384   LearningRate 0.0342   Epoch: 8   Global Step: 138510   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-11 13:54:22,977-Speed 3324.17 samples/sec   Loss 1.9714   LearningRate 0.0342   Epoch: 8   Global Step: 138520   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-11 13:54:26,077-Speed 3304.30 samples/sec   Loss 2.0120   LearningRate 0.0342   Epoch: 8   Global Step: 138530   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-11 13:54:29,158-Speed 3323.79 samples/sec   Loss 2.0205   LearningRate 0.0342   Epoch: 8   Global Step: 138540   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-11 13:54:32,253-Speed 3309.94 samples/sec   Loss 1.9654   LearningRate 0.0342   Epoch: 8   Global Step: 138550   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-11 13:54:35,329-Speed 3329.92 samples/sec   Loss 1.9752   LearningRate 0.0342   Epoch: 8   Global Step: 138560   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-11 13:54:38,403-Speed 3331.69 samples/sec   Loss 1.9900   LearningRate 0.0342   Epoch: 8   Global Step: 138570   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-11 13:54:41,479-Speed 3329.44 samples/sec   Loss 2.0328   LearningRate 0.0342   Epoch: 8   Global Step: 138580   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:54:44,558-Speed 3326.86 samples/sec   Loss 1.9485   LearningRate 0.0342   Epoch: 8   Global Step: 138590   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:54:47,643-Speed 3319.22 samples/sec   Loss 2.0194   LearningRate 0.0342   Epoch: 8   Global Step: 138600   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:54:50,719-Speed 3329.99 samples/sec   Loss 1.9703   LearningRate 0.0342   Epoch: 8   Global Step: 138610   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:54:53,802-Speed 3322.63 samples/sec   Loss 1.9256   LearningRate 0.0342   Epoch: 8   Global Step: 138620   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:54:56,889-Speed 3318.18 samples/sec   Loss 1.9578   LearningRate 0.0342   Epoch: 8   Global Step: 138630   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:54:59,971-Speed 3323.38 samples/sec   Loss 1.9619   LearningRate 0.0342   Epoch: 8   Global Step: 138640   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:55:03,050-Speed 3325.91 samples/sec   Loss 1.9375   LearningRate 0.0342   Epoch: 8   Global Step: 138650   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:55:06,153-Speed 3300.66 samples/sec   Loss 1.9994   LearningRate 0.0342   Epoch: 8   Global Step: 138660   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:55:09,298-Speed 3257.05 samples/sec   Loss 1.9553   LearningRate 0.0342   Epoch: 8   Global Step: 138670   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:55:12,386-Speed 3316.50 samples/sec   Loss 1.9775   LearningRate 0.0342   Epoch: 8   Global Step: 138680   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:55:15,470-Speed 3321.31 samples/sec   Loss 2.0492   LearningRate 0.0342   Epoch: 8   Global Step: 138690   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:55:18,569-Speed 3304.81 samples/sec   Loss 2.0581   LearningRate 0.0342   Epoch: 8   Global Step: 138700   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:55:21,664-Speed 3310.69 samples/sec   Loss 2.0247   LearningRate 0.0342   Epoch: 8   Global Step: 138710   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:55:24,737-Speed 3333.07 samples/sec   Loss 2.0117   LearningRate 0.0342   Epoch: 8   Global Step: 138720   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:55:27,816-Speed 3326.93 samples/sec   Loss 2.0472   LearningRate 0.0342   Epoch: 8   Global Step: 138730   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:55:30,912-Speed 3308.19 samples/sec   Loss 1.9378   LearningRate 0.0342   Epoch: 8   Global Step: 138740   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:55:34,019-Speed 3296.83 samples/sec   Loss 2.0025   LearningRate 0.0341   Epoch: 8   Global Step: 138750   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:55:37,096-Speed 3328.03 samples/sec   Loss 1.9803   LearningRate 0.0341   Epoch: 8   Global Step: 138760   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-11 13:55:40,173-Speed 3328.82 samples/sec   Loss 2.0056   LearningRate 0.0341   Epoch: 8   Global Step: 138770   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-11 13:55:43,265-Speed 3312.09 samples/sec   Loss 1.9492   LearningRate 0.0341   Epoch: 8   Global Step: 138780   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-11 13:55:46,350-Speed 3321.14 samples/sec   Loss 2.0117   LearningRate 0.0341   Epoch: 8   Global Step: 138790   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-11 13:55:49,437-Speed 3317.25 samples/sec   Loss 1.9976   LearningRate 0.0341   Epoch: 8   Global Step: 138800   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-11 13:55:52,512-Speed 3331.36 samples/sec   Loss 1.9538   LearningRate 0.0341   Epoch: 8   Global Step: 138810   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-11 13:55:55,594-Speed 3323.58 samples/sec   Loss 2.0363   LearningRate 0.0341   Epoch: 8   Global Step: 138820   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-11 13:55:58,682-Speed 3316.95 samples/sec   Loss 1.9508   LearningRate 0.0341   Epoch: 8   Global Step: 138830   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-11 13:56:01,765-Speed 3321.38 samples/sec   Loss 1.9942   LearningRate 0.0341   Epoch: 8   Global Step: 138840   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-11 13:56:04,847-Speed 3323.46 samples/sec   Loss 1.9988   LearningRate 0.0341   Epoch: 8   Global Step: 138850   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-11 13:56:07,945-Speed 3305.88 samples/sec   Loss 1.9931   LearningRate 0.0341   Epoch: 8   Global Step: 138860   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:56:11,030-Speed 3320.34 samples/sec   Loss 2.0289   LearningRate 0.0341   Epoch: 8   Global Step: 138870   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:56:14,115-Speed 3320.27 samples/sec   Loss 1.9940   LearningRate 0.0341   Epoch: 8   Global Step: 138880   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:56:17,200-Speed 3320.56 samples/sec   Loss 1.9709   LearningRate 0.0341   Epoch: 8   Global Step: 138890   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:56:20,299-Speed 3304.32 samples/sec   Loss 1.9334   LearningRate 0.0341   Epoch: 8   Global Step: 138900   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:56:23,395-Speed 3308.61 samples/sec   Loss 1.9916   LearningRate 0.0341   Epoch: 8   Global Step: 138910   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:56:26,490-Speed 3308.96 samples/sec   Loss 1.9473   LearningRate 0.0341   Epoch: 8   Global Step: 138920   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:56:29,636-Speed 3255.51 samples/sec   Loss 1.9413   LearningRate 0.0341   Epoch: 8   Global Step: 138930   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:56:32,729-Speed 3311.80 samples/sec   Loss 2.0382   LearningRate 0.0341   Epoch: 8   Global Step: 138940   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:56:35,838-Speed 3294.15 samples/sec   Loss 1.9088   LearningRate 0.0341   Epoch: 8   Global Step: 138950   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:56:38,970-Speed 3270.15 samples/sec   Loss 2.0246   LearningRate 0.0341   Epoch: 8   Global Step: 138960   Fp16 Grad Scale: 262144   Required: 21 hours
Training: 2022-04-11 13:56:42,115-Speed 3256.43 samples/sec   Loss 1.9817   LearningRate 0.0341   Epoch: 8   Global Step: 138970   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:56:45,197-Speed 3323.71 samples/sec   Loss 1.9975   LearningRate 0.0341   Epoch: 8   Global Step: 138980   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:56:48,273-Speed 3330.53 samples/sec   Loss 1.9824   LearningRate 0.0341   Epoch: 8   Global Step: 138990   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:56:51,344-Speed 3335.09 samples/sec   Loss 2.0193   LearningRate 0.0341   Epoch: 8   Global Step: 139000   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:56:54,469-Speed 3277.48 samples/sec   Loss 1.9880   LearningRate 0.0341   Epoch: 8   Global Step: 139010   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:56:57,607-Speed 3264.40 samples/sec   Loss 1.9703   LearningRate 0.0341   Epoch: 8   Global Step: 139020   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:57:00,747-Speed 3261.63 samples/sec   Loss 1.9440   LearningRate 0.0340   Epoch: 8   Global Step: 139030   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:57:03,838-Speed 3312.90 samples/sec   Loss 2.0090   LearningRate 0.0340   Epoch: 8   Global Step: 139040   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:57:06,913-Speed 3331.04 samples/sec   Loss 2.0706   LearningRate 0.0340   Epoch: 8   Global Step: 139050   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:57:10,005-Speed 3312.64 samples/sec   Loss 2.0014   LearningRate 0.0340   Epoch: 8   Global Step: 139060   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:57:13,071-Speed 3341.37 samples/sec   Loss 1.9772   LearningRate 0.0340   Epoch: 8   Global Step: 139070   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:57:16,157-Speed 3318.34 samples/sec   Loss 1.9951   LearningRate 0.0340   Epoch: 8   Global Step: 139080   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:57:19,237-Speed 3325.30 samples/sec   Loss 1.9814   LearningRate 0.0340   Epoch: 8   Global Step: 139090   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:57:22,312-Speed 3331.06 samples/sec   Loss 2.0336   LearningRate 0.0340   Epoch: 8   Global Step: 139100   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:57:25,388-Speed 3329.74 samples/sec   Loss 1.9551   LearningRate 0.0340   Epoch: 8   Global Step: 139110   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:57:28,479-Speed 3313.66 samples/sec   Loss 2.0449   LearningRate 0.0340   Epoch: 8   Global Step: 139120   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:57:31,561-Speed 3323.01 samples/sec   Loss 1.9698   LearningRate 0.0340   Epoch: 8   Global Step: 139130   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:57:34,640-Speed 3327.19 samples/sec   Loss 1.9747   LearningRate 0.0340   Epoch: 8   Global Step: 139140   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:57:37,745-Speed 3298.70 samples/sec   Loss 1.9730   LearningRate 0.0340   Epoch: 8   Global Step: 139150   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:57:40,825-Speed 3324.69 samples/sec   Loss 1.9330   LearningRate 0.0340   Epoch: 8   Global Step: 139160   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:57:43,883-Speed 3350.53 samples/sec   Loss 1.9829   LearningRate 0.0340   Epoch: 8   Global Step: 139170   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-11 13:57:46,972-Speed 3315.29 samples/sec   Loss 1.9929   LearningRate 0.0340   Epoch: 8   Global Step: 139180   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-11 13:57:50,055-Speed 3322.44 samples/sec   Loss 2.0166   LearningRate 0.0340   Epoch: 8   Global Step: 139190   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-11 13:57:53,141-Speed 3318.94 samples/sec   Loss 1.9399   LearningRate 0.0340   Epoch: 8   Global Step: 139200   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-11 13:57:56,226-Speed 3319.41 samples/sec   Loss 2.0029   LearningRate 0.0340   Epoch: 8   Global Step: 139210   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-11 13:57:59,333-Speed 3296.37 samples/sec   Loss 2.0252   LearningRate 0.0340   Epoch: 8   Global Step: 139220   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-11 13:58:02,432-Speed 3305.74 samples/sec   Loss 2.0057   LearningRate 0.0340   Epoch: 8   Global Step: 139230   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-11 13:58:05,528-Speed 3308.29 samples/sec   Loss 2.0647   LearningRate 0.0340   Epoch: 8   Global Step: 139240   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-11 13:58:08,606-Speed 3327.38 samples/sec   Loss 1.9869   LearningRate 0.0340   Epoch: 8   Global Step: 139250   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-11 13:58:11,708-Speed 3301.43 samples/sec   Loss 1.9656   LearningRate 0.0340   Epoch: 8   Global Step: 139260   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-11 13:58:14,801-Speed 3311.99 samples/sec   Loss 2.0390   LearningRate 0.0340   Epoch: 8   Global Step: 139270   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:58:17,880-Speed 3326.55 samples/sec   Loss 2.0209   LearningRate 0.0340   Epoch: 8   Global Step: 139280   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:58:20,956-Speed 3329.82 samples/sec   Loss 1.9862   LearningRate 0.0340   Epoch: 8   Global Step: 139290   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:58:24,037-Speed 3324.21 samples/sec   Loss 2.0067   LearningRate 0.0340   Epoch: 8   Global Step: 139300   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:58:27,120-Speed 3322.26 samples/sec   Loss 1.9882   LearningRate 0.0340   Epoch: 8   Global Step: 139310   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:58:30,203-Speed 3322.20 samples/sec   Loss 1.9892   LearningRate 0.0339   Epoch: 8   Global Step: 139320   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:58:33,286-Speed 3322.72 samples/sec   Loss 1.9446   LearningRate 0.0339   Epoch: 8   Global Step: 139330   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:58:36,403-Speed 3285.91 samples/sec   Loss 1.9457   LearningRate 0.0339   Epoch: 8   Global Step: 139340   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:58:39,501-Speed 3305.44 samples/sec   Loss 2.0547   LearningRate 0.0339   Epoch: 8   Global Step: 139350   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:58:42,591-Speed 3315.57 samples/sec   Loss 1.9753   LearningRate 0.0339   Epoch: 8   Global Step: 139360   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:58:45,675-Speed 3320.90 samples/sec   Loss 1.9754   LearningRate 0.0339   Epoch: 8   Global Step: 139370   Fp16 Grad Scale: 262144   Required: 21 hours
Training: 2022-04-11 13:58:48,748-Speed 3333.16 samples/sec   Loss 1.9595   LearningRate 0.0339   Epoch: 8   Global Step: 139380   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:58:51,826-Speed 3326.61 samples/sec   Loss 1.9569   LearningRate 0.0339   Epoch: 8   Global Step: 139390   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:58:54,900-Speed 3333.41 samples/sec   Loss 2.0420   LearningRate 0.0339   Epoch: 8   Global Step: 139400   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-11 13:58:57,987-Speed 3317.95 samples/sec   Loss 1.9914   LearningRate 0.0339   Epoch: 8   Global Step: 139410   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-11 13:59:01,077-Speed 3314.44 samples/sec   Loss 1.9895   LearningRate 0.0339   Epoch: 8   Global Step: 139420   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-11 13:59:04,152-Speed 3330.59 samples/sec   Loss 2.0045   LearningRate 0.0339   Epoch: 8   Global Step: 139430   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-11 13:59:07,231-Speed 3326.81 samples/sec   Loss 1.9654   LearningRate 0.0339   Epoch: 8   Global Step: 139440   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-11 13:59:10,305-Speed 3331.49 samples/sec   Loss 2.0129   LearningRate 0.0339   Epoch: 8   Global Step: 139450   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-11 13:59:13,386-Speed 3324.89 samples/sec   Loss 2.0625   LearningRate 0.0339   Epoch: 8   Global Step: 139460   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-11 13:59:16,472-Speed 3319.20 samples/sec   Loss 2.0569   LearningRate 0.0339   Epoch: 8   Global Step: 139470   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-11 13:59:19,553-Speed 3324.49 samples/sec   Loss 2.0649   LearningRate 0.0339   Epoch: 8   Global Step: 139480   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-11 13:59:22,636-Speed 3322.33 samples/sec   Loss 2.0072   LearningRate 0.0339   Epoch: 8   Global Step: 139490   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-11 13:59:25,711-Speed 3330.66 samples/sec   Loss 2.0428   LearningRate 0.0339   Epoch: 8   Global Step: 139500   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:59:28,846-Speed 3267.63 samples/sec   Loss 1.9723   LearningRate 0.0339   Epoch: 8   Global Step: 139510   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:59:32,030-Speed 3216.95 samples/sec   Loss 1.9918   LearningRate 0.0339   Epoch: 8   Global Step: 139520   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:59:35,229-Speed 3200.88 samples/sec   Loss 2.0063   LearningRate 0.0339   Epoch: 8   Global Step: 139530   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:59:38,320-Speed 3314.11 samples/sec   Loss 1.9877   LearningRate 0.0339   Epoch: 8   Global Step: 139540   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:59:41,394-Speed 3331.33 samples/sec   Loss 2.0747   LearningRate 0.0339   Epoch: 8   Global Step: 139550   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:59:44,482-Speed 3316.98 samples/sec   Loss 2.0101   LearningRate 0.0339   Epoch: 8   Global Step: 139560   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:59:47,558-Speed 3329.70 samples/sec   Loss 2.0776   LearningRate 0.0339   Epoch: 8   Global Step: 139570   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:59:50,642-Speed 3321.20 samples/sec   Loss 2.0462   LearningRate 0.0339   Epoch: 8   Global Step: 139580   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:59:53,724-Speed 3324.03 samples/sec   Loss 1.9903   LearningRate 0.0339   Epoch: 8   Global Step: 139590   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:59:56,795-Speed 3334.99 samples/sec   Loss 2.0279   LearningRate 0.0339   Epoch: 8   Global Step: 139600   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 13:59:59,871-Speed 3330.29 samples/sec   Loss 1.9488   LearningRate 0.0338   Epoch: 8   Global Step: 139610   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 14:00:02,953-Speed 3323.51 samples/sec   Loss 1.9930   LearningRate 0.0338   Epoch: 8   Global Step: 139620   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 14:00:06,031-Speed 3326.95 samples/sec   Loss 2.0149   LearningRate 0.0338   Epoch: 8   Global Step: 139630   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 14:00:09,183-Speed 3249.12 samples/sec   Loss 1.9897   LearningRate 0.0338   Epoch: 8   Global Step: 139640   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 14:00:12,289-Speed 3297.98 samples/sec   Loss 2.0415   LearningRate 0.0338   Epoch: 8   Global Step: 139650   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 14:00:15,407-Speed 3285.30 samples/sec   Loss 1.9818   LearningRate 0.0338   Epoch: 8   Global Step: 139660   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 14:00:18,530-Speed 3279.08 samples/sec   Loss 1.9684   LearningRate 0.0338   Epoch: 8   Global Step: 139670   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 14:00:21,631-Speed 3303.60 samples/sec   Loss 1.9845   LearningRate 0.0338   Epoch: 8   Global Step: 139680   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 14:00:24,705-Speed 3331.52 samples/sec   Loss 2.0174   LearningRate 0.0338   Epoch: 8   Global Step: 139690   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 14:00:27,781-Speed 3330.32 samples/sec   Loss 1.9978   LearningRate 0.0338   Epoch: 8   Global Step: 139700   Fp16 Grad Scale: 262144   Required: 21 hours
Training: 2022-04-11 14:00:30,846-Speed 3341.24 samples/sec   Loss 1.9682   LearningRate 0.0338   Epoch: 8   Global Step: 139710   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:00:33,925-Speed 3326.18 samples/sec   Loss 2.0350   LearningRate 0.0338   Epoch: 8   Global Step: 139720   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:00:37,014-Speed 3316.25 samples/sec   Loss 1.9924   LearningRate 0.0338   Epoch: 8   Global Step: 139730   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:00:40,101-Speed 3318.12 samples/sec   Loss 1.9993   LearningRate 0.0338   Epoch: 8   Global Step: 139740   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:00:43,190-Speed 3315.08 samples/sec   Loss 2.0101   LearningRate 0.0338   Epoch: 8   Global Step: 139750   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:00:46,282-Speed 3312.53 samples/sec   Loss 2.0126   LearningRate 0.0338   Epoch: 8   Global Step: 139760   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:00:49,357-Speed 3331.52 samples/sec   Loss 2.0227   LearningRate 0.0338   Epoch: 8   Global Step: 139770   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:00:52,441-Speed 3320.64 samples/sec   Loss 2.0172   LearningRate 0.0338   Epoch: 8   Global Step: 139780   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:00:55,545-Speed 3299.96 samples/sec   Loss 1.9865   LearningRate 0.0338   Epoch: 8   Global Step: 139790   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:00:58,637-Speed 3312.29 samples/sec   Loss 2.0197   LearningRate 0.0338   Epoch: 8   Global Step: 139800   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:01:01,704-Speed 3339.83 samples/sec   Loss 2.0515   LearningRate 0.0338   Epoch: 8   Global Step: 139810   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:01:04,784-Speed 3325.74 samples/sec   Loss 1.9935   LearningRate 0.0338   Epoch: 8   Global Step: 139820   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:01:07,868-Speed 3320.15 samples/sec   Loss 1.9851   LearningRate 0.0338   Epoch: 8   Global Step: 139830   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 14:01:10,975-Speed 3296.68 samples/sec   Loss 2.0010   LearningRate 0.0338   Epoch: 8   Global Step: 139840   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 14:01:14,055-Speed 3325.89 samples/sec   Loss 2.0091   LearningRate 0.0338   Epoch: 8   Global Step: 139850   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 14:01:17,154-Speed 3305.31 samples/sec   Loss 2.0774   LearningRate 0.0338   Epoch: 8   Global Step: 139860   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 14:01:20,233-Speed 3326.94 samples/sec   Loss 2.0195   LearningRate 0.0338   Epoch: 8   Global Step: 139870   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 14:01:23,312-Speed 3326.02 samples/sec   Loss 1.9813   LearningRate 0.0338   Epoch: 8   Global Step: 139880   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 14:01:26,391-Speed 3326.14 samples/sec   Loss 2.0160   LearningRate 0.0337   Epoch: 8   Global Step: 139890   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 14:01:29,475-Speed 3321.91 samples/sec   Loss 2.0069   LearningRate 0.0337   Epoch: 8   Global Step: 139900   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 14:01:32,553-Speed 3326.91 samples/sec   Loss 1.9718   LearningRate 0.0337   Epoch: 8   Global Step: 139910   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 14:01:35,631-Speed 3327.34 samples/sec   Loss 1.9735   LearningRate 0.0337   Epoch: 8   Global Step: 139920   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 14:01:38,714-Speed 3322.66 samples/sec   Loss 1.9849   LearningRate 0.0337   Epoch: 8   Global Step: 139930   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:01:41,836-Speed 3281.30 samples/sec   Loss 2.0083   LearningRate 0.0337   Epoch: 8   Global Step: 139940   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:01:44,947-Speed 3292.55 samples/sec   Loss 2.0277   LearningRate 0.0337   Epoch: 8   Global Step: 139950   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:01:48,056-Speed 3294.37 samples/sec   Loss 2.0095   LearningRate 0.0337   Epoch: 8   Global Step: 139960   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:01:51,147-Speed 3313.61 samples/sec   Loss 2.0838   LearningRate 0.0337   Epoch: 8   Global Step: 139970   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:01:54,235-Speed 3316.85 samples/sec   Loss 1.9919   LearningRate 0.0337   Epoch: 8   Global Step: 139980   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:01:57,311-Speed 3329.39 samples/sec   Loss 1.9756   LearningRate 0.0337   Epoch: 8   Global Step: 139990   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:02:00,479-Speed 3232.85 samples/sec   Loss 2.0156   LearningRate 0.0337   Epoch: 8   Global Step: 140000   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:02:45,022-[lfw][140000]XNorm: 22.855251
Training: 2022-04-11 14:02:45,022-[lfw][140000]Accuracy-Flip: 0.99783+-0.00289
Training: 2022-04-11 14:02:45,023-[lfw][140000]Accuracy-Highest: 0.99817
Training: 2022-04-11 14:03:36,631-[cfp_fp][140000]XNorm: 22.145720
Training: 2022-04-11 14:03:36,632-[cfp_fp][140000]Accuracy-Flip: 0.98800+-0.00539
Training: 2022-04-11 14:03:36,632-[cfp_fp][140000]Accuracy-Highest: 0.98814
Training: 2022-04-11 14:04:21,010-[agedb_30][140000]XNorm: 23.233909
Training: 2022-04-11 14:04:21,011-[agedb_30][140000]Accuracy-Flip: 0.98133+-0.00614
Training: 2022-04-11 14:04:21,011-[agedb_30][140000]Accuracy-Highest: 0.98317
Training: 2022-04-11 14:04:24,106-Speed 71.30 samples/sec   Loss 2.0207   LearningRate 0.0337   Epoch: 8   Global Step: 140010   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 14:04:27,203-Speed 3306.76 samples/sec   Loss 2.0603   LearningRate 0.0337   Epoch: 8   Global Step: 140020   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 14:04:30,282-Speed 3326.51 samples/sec   Loss 1.9810   LearningRate 0.0337   Epoch: 8   Global Step: 140030   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 14:04:33,347-Speed 3341.70 samples/sec   Loss 1.9925   LearningRate 0.0337   Epoch: 8   Global Step: 140040   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 14:04:36,422-Speed 3330.69 samples/sec   Loss 2.0281   LearningRate 0.0337   Epoch: 8   Global Step: 140050   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 14:04:39,509-Speed 3318.62 samples/sec   Loss 2.0113   LearningRate 0.0337   Epoch: 8   Global Step: 140060   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 14:04:42,574-Speed 3341.64 samples/sec   Loss 1.9876   LearningRate 0.0337   Epoch: 8   Global Step: 140070   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 14:04:45,659-Speed 3320.25 samples/sec   Loss 2.0171   LearningRate 0.0337   Epoch: 8   Global Step: 140080   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 14:04:48,749-Speed 3314.44 samples/sec   Loss 2.0388   LearningRate 0.0337   Epoch: 8   Global Step: 140090   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 14:04:51,822-Speed 3333.30 samples/sec   Loss 2.0413   LearningRate 0.0337   Epoch: 8   Global Step: 140100   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-11 14:04:54,898-Speed 3329.01 samples/sec   Loss 2.0745   LearningRate 0.0337   Epoch: 8   Global Step: 140110   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:04:57,964-Speed 3340.82 samples/sec   Loss 1.9627   LearningRate 0.0337   Epoch: 8   Global Step: 140120   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:05:01,023-Speed 3348.09 samples/sec   Loss 2.0125   LearningRate 0.0337   Epoch: 8   Global Step: 140130   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:05:04,081-Speed 3350.49 samples/sec   Loss 2.0444   LearningRate 0.0337   Epoch: 8   Global Step: 140140   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 14:05:07,151-Speed 3336.25 samples/sec   Loss 2.0298   LearningRate 0.0337   Epoch: 8   Global Step: 140150   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 14:05:10,223-Speed 3333.80 samples/sec   Loss 2.0652   LearningRate 0.0337   Epoch: 8   Global Step: 140160   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 14:05:13,311-Speed 3316.90 samples/sec   Loss 2.0885   LearningRate 0.0337   Epoch: 8   Global Step: 140170   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 14:05:16,385-Speed 3331.61 samples/sec   Loss 2.0638   LearningRate 0.0336   Epoch: 8   Global Step: 140180   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 14:05:19,455-Speed 3335.92 samples/sec   Loss 2.0024   LearningRate 0.0336   Epoch: 8   Global Step: 140190   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 14:05:22,529-Speed 3331.69 samples/sec   Loss 2.0373   LearningRate 0.0336   Epoch: 8   Global Step: 140200   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 14:05:25,623-Speed 3310.90 samples/sec   Loss 2.0252   LearningRate 0.0336   Epoch: 8   Global Step: 140210   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 14:05:28,773-Speed 3251.65 samples/sec   Loss 2.1126   LearningRate 0.0336   Epoch: 8   Global Step: 140220   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 14:05:31,979-Speed 3194.72 samples/sec   Loss 2.0676   LearningRate 0.0336   Epoch: 8   Global Step: 140230   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 14:05:35,053-Speed 3332.78 samples/sec   Loss 2.0580   LearningRate 0.0336   Epoch: 8   Global Step: 140240   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:05:38,139-Speed 3318.25 samples/sec   Loss 2.1144   LearningRate 0.0336   Epoch: 8   Global Step: 140250   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:05:41,230-Speed 3313.06 samples/sec   Loss 2.0289   LearningRate 0.0336   Epoch: 8   Global Step: 140260   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:05:44,301-Speed 3335.00 samples/sec   Loss 2.0349   LearningRate 0.0336   Epoch: 8   Global Step: 140270   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:05:47,373-Speed 3334.93 samples/sec   Loss 2.0711   LearningRate 0.0336   Epoch: 8   Global Step: 140280   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:05:50,452-Speed 3326.51 samples/sec   Loss 2.0132   LearningRate 0.0336   Epoch: 8   Global Step: 140290   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:05:53,600-Speed 3253.07 samples/sec   Loss 2.0803   LearningRate 0.0336   Epoch: 8   Global Step: 140300   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:05:56,674-Speed 3332.51 samples/sec   Loss 2.0135   LearningRate 0.0336   Epoch: 8   Global Step: 140310   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:05:59,755-Speed 3324.16 samples/sec   Loss 2.0724   LearningRate 0.0336   Epoch: 8   Global Step: 140320   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:06:02,833-Speed 3327.91 samples/sec   Loss 2.0956   LearningRate 0.0336   Epoch: 8   Global Step: 140330   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:06:05,923-Speed 3314.89 samples/sec   Loss 1.9990   LearningRate 0.0336   Epoch: 8   Global Step: 140340   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:06:09,143-Speed 3180.13 samples/sec   Loss 2.0422   LearningRate 0.0336   Epoch: 8   Global Step: 140350   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:06:12,234-Speed 3313.47 samples/sec   Loss 2.0564   LearningRate 0.0336   Epoch: 8   Global Step: 140360   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:06:15,355-Speed 3281.52 samples/sec   Loss 1.9873   LearningRate 0.0336   Epoch: 8   Global Step: 140370   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:06:18,456-Speed 3303.03 samples/sec   Loss 2.1087   LearningRate 0.0336   Epoch: 8   Global Step: 140380   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:06:21,536-Speed 3325.78 samples/sec   Loss 2.0482   LearningRate 0.0336   Epoch: 8   Global Step: 140390   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:06:24,692-Speed 3245.48 samples/sec   Loss 1.9924   LearningRate 0.0336   Epoch: 8   Global Step: 140400   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:06:27,800-Speed 3295.56 samples/sec   Loss 2.0622   LearningRate 0.0336   Epoch: 8   Global Step: 140410   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:06:30,895-Speed 3309.45 samples/sec   Loss 2.0611   LearningRate 0.0336   Epoch: 8   Global Step: 140420   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:06:34,110-Speed 3186.17 samples/sec   Loss 2.0199   LearningRate 0.0336   Epoch: 8   Global Step: 140430   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:06:37,275-Speed 3236.04 samples/sec   Loss 2.0843   LearningRate 0.0336   Epoch: 8   Global Step: 140440   Fp16 Grad Scale: 262144   Required: 20 hours
Training: 2022-04-11 14:06:40,336-Speed 3345.94 samples/sec   Loss 2.0932   LearningRate 0.0336   Epoch: 8   Global Step: 140450   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:06:43,419-Speed 3321.61 samples/sec   Loss 2.0194   LearningRate 0.0336   Epoch: 8   Global Step: 140460   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:06:46,499-Speed 3325.33 samples/sec   Loss 2.0461   LearningRate 0.0335   Epoch: 8   Global Step: 140470   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:06:49,573-Speed 3331.99 samples/sec   Loss 2.0947   LearningRate 0.0335   Epoch: 8   Global Step: 140480   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:06:52,658-Speed 3320.08 samples/sec   Loss 2.0256   LearningRate 0.0335   Epoch: 8   Global Step: 140490   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:06:55,730-Speed 3334.13 samples/sec   Loss 2.0397   LearningRate 0.0335   Epoch: 8   Global Step: 140500   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:06:58,808-Speed 3328.37 samples/sec   Loss 2.0262   LearningRate 0.0335   Epoch: 8   Global Step: 140510   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:07:01,881-Speed 3332.96 samples/sec   Loss 1.9554   LearningRate 0.0335   Epoch: 8   Global Step: 140520   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:07:04,952-Speed 3334.98 samples/sec   Loss 2.1137   LearningRate 0.0335   Epoch: 8   Global Step: 140530   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:07:08,022-Speed 3336.29 samples/sec   Loss 2.0768   LearningRate 0.0335   Epoch: 8   Global Step: 140540   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 14:07:11,128-Speed 3297.11 samples/sec   Loss 1.9901   LearningRate 0.0335   Epoch: 8   Global Step: 140550   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 14:07:14,206-Speed 3327.97 samples/sec   Loss 2.0184   LearningRate 0.0335   Epoch: 8   Global Step: 140560   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 14:07:17,289-Speed 3322.55 samples/sec   Loss 2.0243   LearningRate 0.0335   Epoch: 8   Global Step: 140570   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 14:07:20,361-Speed 3334.65 samples/sec   Loss 2.0480   LearningRate 0.0335   Epoch: 8   Global Step: 140580   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 14:07:23,429-Speed 3338.20 samples/sec   Loss 2.1023   LearningRate 0.0335   Epoch: 8   Global Step: 140590   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 14:07:26,508-Speed 3326.38 samples/sec   Loss 2.0048   LearningRate 0.0335   Epoch: 8   Global Step: 140600   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 14:07:29,583-Speed 3332.09 samples/sec   Loss 2.0091   LearningRate 0.0335   Epoch: 8   Global Step: 140610   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 14:07:32,670-Speed 3317.90 samples/sec   Loss 2.0839   LearningRate 0.0335   Epoch: 8   Global Step: 140620   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 14:07:35,758-Speed 3316.33 samples/sec   Loss 2.0573   LearningRate 0.0335   Epoch: 8   Global Step: 140630   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 14:07:38,842-Speed 3320.66 samples/sec   Loss 2.0867   LearningRate 0.0335   Epoch: 8   Global Step: 140640   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:07:41,910-Speed 3338.45 samples/sec   Loss 2.0352   LearningRate 0.0335   Epoch: 8   Global Step: 140650   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:07:45,001-Speed 3313.51 samples/sec   Loss 2.0109   LearningRate 0.0335   Epoch: 8   Global Step: 140660   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:07:48,080-Speed 3326.88 samples/sec   Loss 2.0305   LearningRate 0.0335   Epoch: 8   Global Step: 140670   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:07:51,147-Speed 3339.72 samples/sec   Loss 2.0727   LearningRate 0.0335   Epoch: 8   Global Step: 140680   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 14:07:54,242-Speed 3309.20 samples/sec   Loss 1.9863   LearningRate 0.0335   Epoch: 8   Global Step: 140690   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 14:07:57,328-Speed 3318.86 samples/sec   Loss 2.0148   LearningRate 0.0335   Epoch: 8   Global Step: 140700   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 14:08:00,410-Speed 3323.47 samples/sec   Loss 2.0298   LearningRate 0.0335   Epoch: 8   Global Step: 140710   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 14:08:03,490-Speed 3325.17 samples/sec   Loss 1.9672   LearningRate 0.0335   Epoch: 8   Global Step: 140720   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 14:08:06,685-Speed 3205.62 samples/sec   Loss 1.9797   LearningRate 0.0335   Epoch: 8   Global Step: 140730   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 14:08:09,798-Speed 3290.66 samples/sec   Loss 2.0947   LearningRate 0.0335   Epoch: 8   Global Step: 140740   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 14:08:12,871-Speed 3333.28 samples/sec   Loss 2.0562   LearningRate 0.0335   Epoch: 8   Global Step: 140750   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 14:08:15,961-Speed 3314.44 samples/sec   Loss 2.1233   LearningRate 0.0334   Epoch: 8   Global Step: 140760   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 14:08:19,037-Speed 3331.58 samples/sec   Loss 1.9535   LearningRate 0.0334   Epoch: 8   Global Step: 140770   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 14:08:22,091-Speed 3353.45 samples/sec   Loss 2.0319   LearningRate 0.0334   Epoch: 8   Global Step: 140780   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 14:08:25,167-Speed 3329.38 samples/sec   Loss 2.0978   LearningRate 0.0334   Epoch: 8   Global Step: 140790   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 14:08:28,263-Speed 3308.59 samples/sec   Loss 2.0402   LearningRate 0.0334   Epoch: 8   Global Step: 140800   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 14:08:31,337-Speed 3331.94 samples/sec   Loss 1.9571   LearningRate 0.0334   Epoch: 8   Global Step: 140810   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 14:08:34,425-Speed 3316.79 samples/sec   Loss 1.9708   LearningRate 0.0334   Epoch: 8   Global Step: 140820   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 14:08:37,498-Speed 3334.00 samples/sec   Loss 1.9880   LearningRate 0.0334   Epoch: 8   Global Step: 140830   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 14:08:40,575-Speed 3328.55 samples/sec   Loss 2.0368   LearningRate 0.0334   Epoch: 8   Global Step: 140840   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 14:08:43,645-Speed 3336.46 samples/sec   Loss 2.0319   LearningRate 0.0334   Epoch: 8   Global Step: 140850   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 14:08:46,715-Speed 3335.85 samples/sec   Loss 2.0086   LearningRate 0.0334   Epoch: 8   Global Step: 140860   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 14:08:49,784-Speed 3336.76 samples/sec   Loss 2.0832   LearningRate 0.0334   Epoch: 8   Global Step: 140870   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 14:08:52,863-Speed 3327.49 samples/sec   Loss 2.0807   LearningRate 0.0334   Epoch: 8   Global Step: 140880   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:08:55,949-Speed 3317.87 samples/sec   Loss 2.0357   LearningRate 0.0334   Epoch: 8   Global Step: 140890   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:08:59,060-Speed 3293.05 samples/sec   Loss 1.9857   LearningRate 0.0334   Epoch: 8   Global Step: 140900   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:09:02,149-Speed 3314.84 samples/sec   Loss 1.9939   LearningRate 0.0334   Epoch: 8   Global Step: 140910   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:09:05,235-Speed 3319.92 samples/sec   Loss 1.9867   LearningRate 0.0334   Epoch: 8   Global Step: 140920   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:09:08,333-Speed 3306.51 samples/sec   Loss 2.0056   LearningRate 0.0334   Epoch: 8   Global Step: 140930   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:09:11,404-Speed 3334.29 samples/sec   Loss 2.0275   LearningRate 0.0334   Epoch: 8   Global Step: 140940   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:09:14,476-Speed 3334.91 samples/sec   Loss 2.0088   LearningRate 0.0334   Epoch: 8   Global Step: 140950   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:09:17,556-Speed 3324.58 samples/sec   Loss 1.9766   LearningRate 0.0334   Epoch: 8   Global Step: 140960   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 14:09:20,637-Speed 3324.67 samples/sec   Loss 2.0045   LearningRate 0.0334   Epoch: 8   Global Step: 140970   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 14:09:23,732-Speed 3308.79 samples/sec   Loss 2.0675   LearningRate 0.0334   Epoch: 8   Global Step: 140980   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 14:09:26,828-Speed 3309.27 samples/sec   Loss 2.0591   LearningRate 0.0334   Epoch: 8   Global Step: 140990   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 14:09:29,909-Speed 3324.62 samples/sec   Loss 2.0619   LearningRate 0.0334   Epoch: 8   Global Step: 141000   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 14:09:33,011-Speed 3300.91 samples/sec   Loss 2.0388   LearningRate 0.0334   Epoch: 8   Global Step: 141010   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 14:09:36,122-Speed 3293.13 samples/sec   Loss 2.0483   LearningRate 0.0334   Epoch: 8   Global Step: 141020   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 14:09:39,206-Speed 3321.17 samples/sec   Loss 1.9909   LearningRate 0.0334   Epoch: 8   Global Step: 141030   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-11 14:09:42,299-Speed 3310.86 samples/sec   Loss 2.0759   LearningRate 0.0334   Epoch: 8   Global Step: 141040   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-11 14:09:45,377-Speed 3328.22 samples/sec   Loss 2.0485   LearningRate 0.0333   Epoch: 8   Global Step: 141050   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-11 14:09:48,507-Speed 3271.92 samples/sec   Loss 2.0197   LearningRate 0.0333   Epoch: 8   Global Step: 141060   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-11 14:09:51,585-Speed 3327.60 samples/sec   Loss 2.0280   LearningRate 0.0333   Epoch: 8   Global Step: 141070   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-11 14:09:54,656-Speed 3334.63 samples/sec   Loss 2.0168   LearningRate 0.0333   Epoch: 8   Global Step: 141080   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-11 14:09:57,758-Speed 3303.16 samples/sec   Loss 2.0172   LearningRate 0.0333   Epoch: 8   Global Step: 141090   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-11 14:10:00,833-Speed 3330.87 samples/sec   Loss 2.0303   LearningRate 0.0333   Epoch: 8   Global Step: 141100   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-11 14:10:03,964-Speed 3271.04 samples/sec   Loss 2.0250   LearningRate 0.0333   Epoch: 8   Global Step: 141110   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-11 14:10:07,096-Speed 3269.47 samples/sec   Loss 1.9928   LearningRate 0.0333   Epoch: 8   Global Step: 141120   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-11 14:10:10,238-Speed 3260.66 samples/sec   Loss 2.0349   LearningRate 0.0333   Epoch: 8   Global Step: 141130   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 14:10:13,335-Speed 3307.00 samples/sec   Loss 2.0528   LearningRate 0.0333   Epoch: 8   Global Step: 141140   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 14:10:16,418-Speed 3321.86 samples/sec   Loss 2.0593   LearningRate 0.0333   Epoch: 8   Global Step: 141150   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 14:10:19,535-Speed 3286.41 samples/sec   Loss 2.0711   LearningRate 0.0333   Epoch: 8   Global Step: 141160   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 14:10:22,611-Speed 3329.66 samples/sec   Loss 2.0974   LearningRate 0.0333   Epoch: 8   Global Step: 141170   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 14:10:25,685-Speed 3332.37 samples/sec   Loss 2.0297   LearningRate 0.0333   Epoch: 8   Global Step: 141180   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 14:10:28,761-Speed 3329.13 samples/sec   Loss 2.0867   LearningRate 0.0333   Epoch: 8   Global Step: 141190   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 14:10:31,835-Speed 3332.29 samples/sec   Loss 1.9971   LearningRate 0.0333   Epoch: 8   Global Step: 141200   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 14:10:35,070-Speed 3166.24 samples/sec   Loss 1.9970   LearningRate 0.0333   Epoch: 8   Global Step: 141210   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 14:10:38,146-Speed 3329.57 samples/sec   Loss 1.9979   LearningRate 0.0333   Epoch: 8   Global Step: 141220   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 14:10:41,225-Speed 3326.95 samples/sec   Loss 1.9961   LearningRate 0.0333   Epoch: 8   Global Step: 141230   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:10:44,323-Speed 3306.34 samples/sec   Loss 2.0822   LearningRate 0.0333   Epoch: 8   Global Step: 141240   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:10:47,400-Speed 3328.10 samples/sec   Loss 2.0661   LearningRate 0.0333   Epoch: 8   Global Step: 141250   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:10:50,482-Speed 3323.88 samples/sec   Loss 2.0723   LearningRate 0.0333   Epoch: 8   Global Step: 141260   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:10:53,628-Speed 3255.57 samples/sec   Loss 2.0395   LearningRate 0.0333   Epoch: 8   Global Step: 141270   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:10:56,706-Speed 3327.81 samples/sec   Loss 2.0337   LearningRate 0.0333   Epoch: 8   Global Step: 141280   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:10:59,794-Speed 3316.81 samples/sec   Loss 2.0722   LearningRate 0.0333   Epoch: 8   Global Step: 141290   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:11:02,905-Speed 3291.80 samples/sec   Loss 2.0344   LearningRate 0.0333   Epoch: 8   Global Step: 141300   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:11:05,986-Speed 3324.03 samples/sec   Loss 2.0451   LearningRate 0.0333   Epoch: 8   Global Step: 141310   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:11:09,068-Speed 3323.33 samples/sec   Loss 2.0019   LearningRate 0.0333   Epoch: 8   Global Step: 141320   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:11:12,156-Speed 3316.72 samples/sec   Loss 2.0218   LearningRate 0.0332   Epoch: 8   Global Step: 141330   Fp16 Grad Scale: 262144   Required: 20 hours
Training: 2022-04-11 14:11:15,227-Speed 3336.20 samples/sec   Loss 2.0889   LearningRate 0.0332   Epoch: 8   Global Step: 141340   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:11:18,315-Speed 3316.30 samples/sec   Loss 2.0361   LearningRate 0.0332   Epoch: 8   Global Step: 141350   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:11:21,393-Speed 3328.10 samples/sec   Loss 1.9845   LearningRate 0.0332   Epoch: 8   Global Step: 141360   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:11:24,483-Speed 3314.23 samples/sec   Loss 2.0135   LearningRate 0.0332   Epoch: 8   Global Step: 141370   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:11:27,570-Speed 3318.45 samples/sec   Loss 2.0699   LearningRate 0.0332   Epoch: 8   Global Step: 141380   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:11:30,659-Speed 3315.90 samples/sec   Loss 1.9929   LearningRate 0.0332   Epoch: 8   Global Step: 141390   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:11:33,741-Speed 3323.13 samples/sec   Loss 2.0143   LearningRate 0.0332   Epoch: 8   Global Step: 141400   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:11:36,826-Speed 3319.53 samples/sec   Loss 2.0148   LearningRate 0.0332   Epoch: 8   Global Step: 141410   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:11:39,907-Speed 3324.98 samples/sec   Loss 2.0092   LearningRate 0.0332   Epoch: 8   Global Step: 141420   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:11:42,991-Speed 3320.57 samples/sec   Loss 2.0452   LearningRate 0.0332   Epoch: 8   Global Step: 141430   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:11:46,059-Speed 3339.00 samples/sec   Loss 2.0654   LearningRate 0.0332   Epoch: 8   Global Step: 141440   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:11:49,156-Speed 3307.57 samples/sec   Loss 2.0520   LearningRate 0.0332   Epoch: 8   Global Step: 141450   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:11:52,248-Speed 3311.81 samples/sec   Loss 1.9407   LearningRate 0.0332   Epoch: 8   Global Step: 141460   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:11:55,333-Speed 3320.98 samples/sec   Loss 2.0540   LearningRate 0.0332   Epoch: 8   Global Step: 141470   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:11:58,438-Speed 3298.68 samples/sec   Loss 2.0496   LearningRate 0.0332   Epoch: 8   Global Step: 141480   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:12:01,603-Speed 3236.00 samples/sec   Loss 2.0616   LearningRate 0.0332   Epoch: 8   Global Step: 141490   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:12:04,682-Speed 3326.34 samples/sec   Loss 2.0212   LearningRate 0.0332   Epoch: 8   Global Step: 141500   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:12:07,780-Speed 3305.50 samples/sec   Loss 2.0613   LearningRate 0.0332   Epoch: 8   Global Step: 141510   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:12:10,854-Speed 3332.27 samples/sec   Loss 2.0654   LearningRate 0.0332   Epoch: 8   Global Step: 141520   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:12:13,935-Speed 3325.35 samples/sec   Loss 2.0662   LearningRate 0.0332   Epoch: 8   Global Step: 141530   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:12:17,006-Speed 3335.12 samples/sec   Loss 2.0732   LearningRate 0.0332   Epoch: 8   Global Step: 141540   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:12:20,085-Speed 3326.14 samples/sec   Loss 2.0799   LearningRate 0.0332   Epoch: 8   Global Step: 141550   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:12:23,165-Speed 3325.19 samples/sec   Loss 2.0564   LearningRate 0.0332   Epoch: 8   Global Step: 141560   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:12:26,236-Speed 3335.64 samples/sec   Loss 2.1029   LearningRate 0.0332   Epoch: 8   Global Step: 141570   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-11 14:12:29,313-Speed 3328.84 samples/sec   Loss 2.0197   LearningRate 0.0332   Epoch: 8   Global Step: 141580   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-11 14:12:32,401-Speed 3316.68 samples/sec   Loss 2.1558   LearningRate 0.0332   Epoch: 8   Global Step: 141590   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-11 14:12:35,490-Speed 3315.27 samples/sec   Loss 2.0700   LearningRate 0.0332   Epoch: 8   Global Step: 141600   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-11 14:12:38,567-Speed 3330.16 samples/sec   Loss 2.0078   LearningRate 0.0332   Epoch: 8   Global Step: 141610   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-11 14:12:41,643-Speed 3329.37 samples/sec   Loss 2.0582   LearningRate 0.0331   Epoch: 8   Global Step: 141620   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-11 14:12:44,725-Speed 3324.37 samples/sec   Loss 2.0025   LearningRate 0.0331   Epoch: 8   Global Step: 141630   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-11 14:12:47,799-Speed 3331.06 samples/sec   Loss 2.0974   LearningRate 0.0331   Epoch: 8   Global Step: 141640   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-11 14:12:50,951-Speed 3249.48 samples/sec   Loss 2.0085   LearningRate 0.0331   Epoch: 8   Global Step: 141650   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-11 14:12:54,198-Speed 3154.83 samples/sec   Loss 2.0265   LearningRate 0.0331   Epoch: 8   Global Step: 141660   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-11 14:12:57,316-Speed 3284.94 samples/sec   Loss 2.0723   LearningRate 0.0331   Epoch: 8   Global Step: 141670   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 14:13:00,404-Speed 3316.10 samples/sec   Loss 2.0199   LearningRate 0.0331   Epoch: 8   Global Step: 141680   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 14:13:03,505-Speed 3303.79 samples/sec   Loss 2.0303   LearningRate 0.0331   Epoch: 8   Global Step: 141690   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 14:13:06,594-Speed 3315.28 samples/sec   Loss 1.9650   LearningRate 0.0331   Epoch: 8   Global Step: 141700   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 14:13:09,675-Speed 3324.97 samples/sec   Loss 2.0285   LearningRate 0.0331   Epoch: 8   Global Step: 141710   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 14:13:12,768-Speed 3311.43 samples/sec   Loss 1.9747   LearningRate 0.0331   Epoch: 8   Global Step: 141720   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 14:13:15,870-Speed 3301.96 samples/sec   Loss 1.9408   LearningRate 0.0331   Epoch: 8   Global Step: 141730   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 14:13:18,950-Speed 3324.64 samples/sec   Loss 2.0707   LearningRate 0.0331   Epoch: 8   Global Step: 141740   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 14:13:22,033-Speed 3322.83 samples/sec   Loss 2.1314   LearningRate 0.0331   Epoch: 8   Global Step: 141750   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 14:13:25,108-Speed 3330.06 samples/sec   Loss 2.0009   LearningRate 0.0331   Epoch: 8   Global Step: 141760   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 14:13:28,214-Speed 3297.97 samples/sec   Loss 2.0531   LearningRate 0.0331   Epoch: 8   Global Step: 141770   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:13:31,311-Speed 3307.89 samples/sec   Loss 2.0909   LearningRate 0.0331   Epoch: 8   Global Step: 141780   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:13:34,385-Speed 3332.36 samples/sec   Loss 2.0363   LearningRate 0.0331   Epoch: 8   Global Step: 141790   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:13:37,461-Speed 3329.10 samples/sec   Loss 2.0742   LearningRate 0.0331   Epoch: 8   Global Step: 141800   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:13:40,557-Speed 3308.23 samples/sec   Loss 2.0776   LearningRate 0.0331   Epoch: 8   Global Step: 141810   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:13:43,633-Speed 3329.63 samples/sec   Loss 2.0213   LearningRate 0.0331   Epoch: 8   Global Step: 141820   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:13:46,718-Speed 3320.34 samples/sec   Loss 2.0056   LearningRate 0.0331   Epoch: 8   Global Step: 141830   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:13:49,838-Speed 3283.17 samples/sec   Loss 2.0659   LearningRate 0.0331   Epoch: 8   Global Step: 141840   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:13:52,969-Speed 3271.25 samples/sec   Loss 2.0279   LearningRate 0.0331   Epoch: 8   Global Step: 141850   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:13:56,049-Speed 3325.41 samples/sec   Loss 2.0371   LearningRate 0.0331   Epoch: 8   Global Step: 141860   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:13:59,118-Speed 3337.43 samples/sec   Loss 2.0247   LearningRate 0.0331   Epoch: 8   Global Step: 141870   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:14:02,207-Speed 3315.93 samples/sec   Loss 2.0514   LearningRate 0.0331   Epoch: 8   Global Step: 141880   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:14:05,301-Speed 3310.00 samples/sec   Loss 2.1067   LearningRate 0.0331   Epoch: 8   Global Step: 141890   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:14:08,382-Speed 3324.42 samples/sec   Loss 2.0563   LearningRate 0.0331   Epoch: 8   Global Step: 141900   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:14:11,472-Speed 3314.55 samples/sec   Loss 2.1224   LearningRate 0.0330   Epoch: 8   Global Step: 141910   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:14:14,548-Speed 3329.96 samples/sec   Loss 2.0526   LearningRate 0.0330   Epoch: 8   Global Step: 141920   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:14:17,674-Speed 3275.80 samples/sec   Loss 2.0788   LearningRate 0.0330   Epoch: 8   Global Step: 141930   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:14:20,757-Speed 3323.21 samples/sec   Loss 2.0554   LearningRate 0.0330   Epoch: 8   Global Step: 141940   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:14:23,834-Speed 3328.22 samples/sec   Loss 2.0387   LearningRate 0.0330   Epoch: 8   Global Step: 141950   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 14:14:26,913-Speed 3327.82 samples/sec   Loss 2.0545   LearningRate 0.0330   Epoch: 8   Global Step: 141960   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 14:14:29,991-Speed 3326.52 samples/sec   Loss 1.9963   LearningRate 0.0330   Epoch: 8   Global Step: 141970   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 14:14:33,084-Speed 3311.54 samples/sec   Loss 2.0722   LearningRate 0.0330   Epoch: 8   Global Step: 141980   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 14:14:36,164-Speed 3326.11 samples/sec   Loss 2.0631   LearningRate 0.0330   Epoch: 8   Global Step: 141990   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 14:14:39,258-Speed 3310.32 samples/sec   Loss 2.1019   LearningRate 0.0330   Epoch: 8   Global Step: 142000   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 14:15:23,496-[lfw][142000]XNorm: 23.913055
Training: 2022-04-11 14:15:23,496-[lfw][142000]Accuracy-Flip: 0.99817+-0.00273
Training: 2022-04-11 14:15:23,497-[lfw][142000]Accuracy-Highest: 0.99817
Training: 2022-04-11 14:16:14,520-[cfp_fp][142000]XNorm: 23.087925
Training: 2022-04-11 14:16:14,521-[cfp_fp][142000]Accuracy-Flip: 0.98586+-0.00416
Training: 2022-04-11 14:16:14,521-[cfp_fp][142000]Accuracy-Highest: 0.98814
Training: 2022-04-11 14:16:58,390-[agedb_30][142000]XNorm: 24.038849
Training: 2022-04-11 14:16:58,391-[agedb_30][142000]Accuracy-Flip: 0.98133+-0.00636
Training: 2022-04-11 14:16:58,391-[agedb_30][142000]Accuracy-Highest: 0.98317
Training: 2022-04-11 14:17:01,475-Speed 72.00 samples/sec   Loss 2.0326   LearningRate 0.0330   Epoch: 8   Global Step: 142010   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 14:17:04,557-Speed 3323.84 samples/sec   Loss 2.0425   LearningRate 0.0330   Epoch: 8   Global Step: 142020   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 14:17:07,639-Speed 3322.84 samples/sec   Loss 2.0541   LearningRate 0.0330   Epoch: 8   Global Step: 142030   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 14:17:10,706-Speed 3340.02 samples/sec   Loss 2.1192   LearningRate 0.0330   Epoch: 8   Global Step: 142040   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 14:17:13,913-Speed 3193.62 samples/sec   Loss 2.0578   LearningRate 0.0330   Epoch: 8   Global Step: 142050   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:17:16,995-Speed 3322.67 samples/sec   Loss 2.0680   LearningRate 0.0330   Epoch: 8   Global Step: 142060   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:17:20,081-Speed 3319.31 samples/sec   Loss 2.1115   LearningRate 0.0330   Epoch: 8   Global Step: 142070   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:17:23,181-Speed 3304.10 samples/sec   Loss 2.0472   LearningRate 0.0330   Epoch: 8   Global Step: 142080   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:17:26,257-Speed 3329.35 samples/sec   Loss 2.0330   LearningRate 0.0330   Epoch: 8   Global Step: 142090   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:17:29,325-Speed 3338.80 samples/sec   Loss 2.1118   LearningRate 0.0330   Epoch: 8   Global Step: 142100   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:17:32,434-Speed 3294.72 samples/sec   Loss 2.0664   LearningRate 0.0330   Epoch: 8   Global Step: 142110   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:17:35,566-Speed 3270.17 samples/sec   Loss 2.0712   LearningRate 0.0330   Epoch: 8   Global Step: 142120   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:17:38,638-Speed 3334.00 samples/sec   Loss 2.0326   LearningRate 0.0330   Epoch: 8   Global Step: 142130   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:17:41,715-Speed 3328.89 samples/sec   Loss 1.9755   LearningRate 0.0330   Epoch: 8   Global Step: 142140   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:17:44,800-Speed 3319.99 samples/sec   Loss 2.0401   LearningRate 0.0330   Epoch: 8   Global Step: 142150   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:17:47,885-Speed 3320.03 samples/sec   Loss 2.0836   LearningRate 0.0330   Epoch: 8   Global Step: 142160   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:17:50,963-Speed 3326.76 samples/sec   Loss 2.0683   LearningRate 0.0330   Epoch: 8   Global Step: 142170   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:17:54,039-Speed 3330.33 samples/sec   Loss 1.9732   LearningRate 0.0330   Epoch: 8   Global Step: 142180   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:17:57,133-Speed 3310.10 samples/sec   Loss 2.0484   LearningRate 0.0330   Epoch: 8   Global Step: 142190   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:18:00,209-Speed 3330.65 samples/sec   Loss 2.0030   LearningRate 0.0330   Epoch: 8   Global Step: 142200   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:18:03,282-Speed 3332.60 samples/sec   Loss 2.0300   LearningRate 0.0329   Epoch: 8   Global Step: 142210   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:18:06,355-Speed 3332.56 samples/sec   Loss 2.0560   LearningRate 0.0329   Epoch: 8   Global Step: 142220   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:18:09,445-Speed 3315.41 samples/sec   Loss 2.0561   LearningRate 0.0329   Epoch: 8   Global Step: 142230   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:18:12,590-Speed 3256.67 samples/sec   Loss 2.0561   LearningRate 0.0329   Epoch: 8   Global Step: 142240   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:18:15,675-Speed 3319.72 samples/sec   Loss 2.0178   LearningRate 0.0329   Epoch: 8   Global Step: 142250   Fp16 Grad Scale: 262144   Required: 20 hours
Training: 2022-04-11 14:18:18,740-Speed 3341.57 samples/sec   Loss 2.0831   LearningRate 0.0329   Epoch: 8   Global Step: 142260   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:18:21,815-Speed 3331.02 samples/sec   Loss 2.0022   LearningRate 0.0329   Epoch: 8   Global Step: 142270   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:18:24,890-Speed 3331.66 samples/sec   Loss 2.0929   LearningRate 0.0329   Epoch: 8   Global Step: 142280   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:18:27,969-Speed 3326.37 samples/sec   Loss 2.0677   LearningRate 0.0329   Epoch: 8   Global Step: 142290   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:18:31,059-Speed 3313.97 samples/sec   Loss 2.0586   LearningRate 0.0329   Epoch: 8   Global Step: 142300   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:18:34,135-Speed 3330.26 samples/sec   Loss 2.0567   LearningRate 0.0329   Epoch: 8   Global Step: 142310   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:18:37,296-Speed 3240.32 samples/sec   Loss 2.0829   LearningRate 0.0329   Epoch: 8   Global Step: 142320   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:18:40,474-Speed 3222.77 samples/sec   Loss 2.0270   LearningRate 0.0329   Epoch: 8   Global Step: 142330   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:18:43,608-Speed 3267.73 samples/sec   Loss 2.0702   LearningRate 0.0329   Epoch: 8   Global Step: 142340   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:18:46,740-Speed 3270.52 samples/sec   Loss 2.0865   LearningRate 0.0329   Epoch: 8   Global Step: 142350   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:18:49,833-Speed 3312.01 samples/sec   Loss 2.0798   LearningRate 0.0329   Epoch: 8   Global Step: 142360   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:18:52,919-Speed 3318.14 samples/sec   Loss 2.0259   LearningRate 0.0329   Epoch: 8   Global Step: 142370   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:18:55,995-Speed 3329.79 samples/sec   Loss 2.0358   LearningRate 0.0329   Epoch: 8   Global Step: 142380   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 14:18:59,067-Speed 3334.31 samples/sec   Loss 2.0316   LearningRate 0.0329   Epoch: 8   Global Step: 142390   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 14:19:02,150-Speed 3322.04 samples/sec   Loss 2.0220   LearningRate 0.0329   Epoch: 8   Global Step: 142400   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 14:19:05,220-Speed 3336.48 samples/sec   Loss 2.0103   LearningRate 0.0329   Epoch: 8   Global Step: 142410   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 14:19:08,303-Speed 3321.91 samples/sec   Loss 2.0950   LearningRate 0.0329   Epoch: 8   Global Step: 142420   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 14:19:11,379-Speed 3330.32 samples/sec   Loss 2.0153   LearningRate 0.0329   Epoch: 8   Global Step: 142430   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 14:19:14,479-Speed 3303.89 samples/sec   Loss 2.0828   LearningRate 0.0329   Epoch: 8   Global Step: 142440   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 14:19:17,589-Speed 3293.87 samples/sec   Loss 2.0258   LearningRate 0.0329   Epoch: 8   Global Step: 142450   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 14:19:20,672-Speed 3321.46 samples/sec   Loss 2.1014   LearningRate 0.0329   Epoch: 8   Global Step: 142460   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 14:19:23,750-Speed 3327.69 samples/sec   Loss 2.0396   LearningRate 0.0329   Epoch: 8   Global Step: 142470   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 14:19:26,843-Speed 3312.05 samples/sec   Loss 2.0748   LearningRate 0.0329   Epoch: 8   Global Step: 142480   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:19:29,918-Speed 3329.94 samples/sec   Loss 2.1082   LearningRate 0.0329   Epoch: 8   Global Step: 142490   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:19:33,011-Speed 3311.96 samples/sec   Loss 2.0557   LearningRate 0.0328   Epoch: 8   Global Step: 142500   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:19:36,085-Speed 3331.94 samples/sec   Loss 2.0298   LearningRate 0.0328   Epoch: 8   Global Step: 142510   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:19:39,181-Speed 3307.97 samples/sec   Loss 2.0652   LearningRate 0.0328   Epoch: 8   Global Step: 142520   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:19:42,261-Speed 3326.35 samples/sec   Loss 2.0020   LearningRate 0.0328   Epoch: 8   Global Step: 142530   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:19:45,349-Speed 3316.56 samples/sec   Loss 2.0516   LearningRate 0.0328   Epoch: 8   Global Step: 142540   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:19:48,451-Speed 3301.33 samples/sec   Loss 2.0458   LearningRate 0.0328   Epoch: 8   Global Step: 142550   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:19:51,524-Speed 3333.67 samples/sec   Loss 2.0040   LearningRate 0.0328   Epoch: 8   Global Step: 142560   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:19:54,601-Speed 3328.25 samples/sec   Loss 2.0487   LearningRate 0.0328   Epoch: 8   Global Step: 142570   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:19:57,726-Speed 3277.23 samples/sec   Loss 2.0701   LearningRate 0.0328   Epoch: 8   Global Step: 142580   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:20:00,846-Speed 3283.03 samples/sec   Loss 2.0394   LearningRate 0.0328   Epoch: 8   Global Step: 142590   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 14:20:03,982-Speed 3265.91 samples/sec   Loss 2.0525   LearningRate 0.0328   Epoch: 8   Global Step: 142600   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 14:20:07,061-Speed 3327.29 samples/sec   Loss 2.0300   LearningRate 0.0328   Epoch: 8   Global Step: 142610   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 14:20:10,142-Speed 3323.98 samples/sec   Loss 2.0042   LearningRate 0.0328   Epoch: 8   Global Step: 142620   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 14:20:13,217-Speed 3331.39 samples/sec   Loss 2.0726   LearningRate 0.0328   Epoch: 8   Global Step: 142630   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 14:20:16,315-Speed 3306.18 samples/sec   Loss 2.0590   LearningRate 0.0328   Epoch: 8   Global Step: 142640   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 14:20:19,390-Speed 3331.18 samples/sec   Loss 2.0205   LearningRate 0.0328   Epoch: 8   Global Step: 142650   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 14:20:22,473-Speed 3321.29 samples/sec   Loss 2.0844   LearningRate 0.0328   Epoch: 8   Global Step: 142660   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 14:20:25,557-Speed 3321.44 samples/sec   Loss 2.1102   LearningRate 0.0328   Epoch: 8   Global Step: 142670   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 14:20:28,630-Speed 3333.24 samples/sec   Loss 2.0622   LearningRate 0.0328   Epoch: 8   Global Step: 142680   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 14:20:31,721-Speed 3312.78 samples/sec   Loss 2.0446   LearningRate 0.0328   Epoch: 8   Global Step: 142690   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:20:34,806-Speed 3321.17 samples/sec   Loss 2.0692   LearningRate 0.0328   Epoch: 8   Global Step: 142700   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:20:37,880-Speed 3332.19 samples/sec   Loss 2.0399   LearningRate 0.0328   Epoch: 8   Global Step: 142710   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:20:40,961-Speed 3323.57 samples/sec   Loss 2.0601   LearningRate 0.0328   Epoch: 8   Global Step: 142720   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:20:44,039-Speed 3328.27 samples/sec   Loss 1.9995   LearningRate 0.0328   Epoch: 8   Global Step: 142730   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:20:47,125-Speed 3319.53 samples/sec   Loss 2.0034   LearningRate 0.0328   Epoch: 8   Global Step: 142740   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:20:50,198-Speed 3332.13 samples/sec   Loss 2.0901   LearningRate 0.0328   Epoch: 8   Global Step: 142750   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:20:53,274-Speed 3330.36 samples/sec   Loss 2.0637   LearningRate 0.0328   Epoch: 8   Global Step: 142760   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:20:56,351-Speed 3328.74 samples/sec   Loss 2.0155   LearningRate 0.0328   Epoch: 8   Global Step: 142770   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:20:59,526-Speed 3225.65 samples/sec   Loss 2.1292   LearningRate 0.0328   Epoch: 8   Global Step: 142780   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:21:02,778-Speed 3149.99 samples/sec   Loss 2.0118   LearningRate 0.0327   Epoch: 8   Global Step: 142790   Fp16 Grad Scale: 262144   Required: 20 hours
Training: 2022-04-11 14:21:05,843-Speed 3341.90 samples/sec   Loss 2.0272   LearningRate 0.0327   Epoch: 8   Global Step: 142800   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:21:08,919-Speed 3330.00 samples/sec   Loss 2.0425   LearningRate 0.0327   Epoch: 8   Global Step: 142810   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:21:11,992-Speed 3332.28 samples/sec   Loss 2.0853   LearningRate 0.0327   Epoch: 8   Global Step: 142820   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:21:15,068-Speed 3330.25 samples/sec   Loss 2.0139   LearningRate 0.0327   Epoch: 8   Global Step: 142830   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:21:18,153-Speed 3319.62 samples/sec   Loss 2.0604   LearningRate 0.0327   Epoch: 8   Global Step: 142840   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:21:21,230-Speed 3329.23 samples/sec   Loss 2.0864   LearningRate 0.0327   Epoch: 8   Global Step: 142850   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:21:24,315-Speed 3320.00 samples/sec   Loss 2.0273   LearningRate 0.0327   Epoch: 8   Global Step: 142860   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:21:27,404-Speed 3315.34 samples/sec   Loss 2.0622   LearningRate 0.0327   Epoch: 8   Global Step: 142870   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:21:30,481-Speed 3328.83 samples/sec   Loss 2.0016   LearningRate 0.0327   Epoch: 8   Global Step: 142880   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:21:33,557-Speed 3329.89 samples/sec   Loss 2.0700   LearningRate 0.0327   Epoch: 8   Global Step: 142890   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:21:36,632-Speed 3330.62 samples/sec   Loss 2.0426   LearningRate 0.0327   Epoch: 8   Global Step: 142900   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:21:39,707-Speed 3331.33 samples/sec   Loss 2.0660   LearningRate 0.0327   Epoch: 8   Global Step: 142910   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:21:42,776-Speed 3337.48 samples/sec   Loss 2.0388   LearningRate 0.0327   Epoch: 8   Global Step: 142920   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 14:21:45,850-Speed 3332.11 samples/sec   Loss 2.0706   LearningRate 0.0327   Epoch: 8   Global Step: 142930   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 14:21:48,924-Speed 3331.84 samples/sec   Loss 2.1086   LearningRate 0.0327   Epoch: 8   Global Step: 142940   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 14:21:52,008-Speed 3320.69 samples/sec   Loss 2.0435   LearningRate 0.0327   Epoch: 8   Global Step: 142950   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 14:21:55,084-Speed 3329.54 samples/sec   Loss 2.1011   LearningRate 0.0327   Epoch: 8   Global Step: 142960   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 14:21:58,217-Speed 3269.68 samples/sec   Loss 2.0136   LearningRate 0.0327   Epoch: 8   Global Step: 142970   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 14:22:01,306-Speed 3315.55 samples/sec   Loss 2.0181   LearningRate 0.0327   Epoch: 8   Global Step: 142980   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 14:22:04,412-Speed 3298.21 samples/sec   Loss 2.0553   LearningRate 0.0327   Epoch: 8   Global Step: 142990   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 14:22:07,504-Speed 3312.29 samples/sec   Loss 2.0178   LearningRate 0.0327   Epoch: 8   Global Step: 143000   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 14:22:10,580-Speed 3329.48 samples/sec   Loss 2.0382   LearningRate 0.0327   Epoch: 8   Global Step: 143010   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 14:22:13,659-Speed 3326.35 samples/sec   Loss 2.0505   LearningRate 0.0327   Epoch: 8   Global Step: 143020   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:22:16,747-Speed 3316.90 samples/sec   Loss 2.1149   LearningRate 0.0327   Epoch: 8   Global Step: 143030   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 14:22:19,838-Speed 3313.61 samples/sec   Loss 2.0057   LearningRate 0.0327   Epoch: 8   Global Step: 143040   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 14:22:22,934-Speed 3308.98 samples/sec   Loss 2.0162   LearningRate 0.0327   Epoch: 8   Global Step: 143050   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 14:22:26,029-Speed 3309.78 samples/sec   Loss 2.0782   LearningRate 0.0327   Epoch: 8   Global Step: 143060   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 14:22:29,162-Speed 3268.55 samples/sec   Loss 1.9737   LearningRate 0.0327   Epoch: 8   Global Step: 143070   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 14:22:32,237-Speed 3330.93 samples/sec   Loss 2.0055   LearningRate 0.0326   Epoch: 8   Global Step: 143080   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 14:22:35,314-Speed 3328.26 samples/sec   Loss 2.0823   LearningRate 0.0326   Epoch: 8   Global Step: 143090   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 14:22:38,402-Speed 3317.44 samples/sec   Loss 2.0926   LearningRate 0.0326   Epoch: 8   Global Step: 143100   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 14:22:41,477-Speed 3330.46 samples/sec   Loss 1.9902   LearningRate 0.0326   Epoch: 8   Global Step: 143110   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 14:22:44,552-Speed 3330.53 samples/sec   Loss 2.0767   LearningRate 0.0326   Epoch: 8   Global Step: 143120   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 14:22:47,633-Speed 3324.86 samples/sec   Loss 2.1245   LearningRate 0.0326   Epoch: 8   Global Step: 143130   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:22:50,734-Speed 3303.41 samples/sec   Loss 2.0631   LearningRate 0.0326   Epoch: 8   Global Step: 143140   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:22:53,816-Speed 3322.77 samples/sec   Loss 2.0278   LearningRate 0.0326   Epoch: 8   Global Step: 143150   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:22:56,890-Speed 3332.12 samples/sec   Loss 2.0250   LearningRate 0.0326   Epoch: 8   Global Step: 143160   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:22:59,994-Speed 3300.49 samples/sec   Loss 2.0305   LearningRate 0.0326   Epoch: 8   Global Step: 143170   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 14:23:03,084-Speed 3313.80 samples/sec   Loss 1.9974   LearningRate 0.0326   Epoch: 8   Global Step: 143180   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 14:23:06,158-Speed 3332.52 samples/sec   Loss 1.9763   LearningRate 0.0326   Epoch: 8   Global Step: 143190   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 14:23:09,236-Speed 3327.11 samples/sec   Loss 2.1256   LearningRate 0.0326   Epoch: 8   Global Step: 143200   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 14:23:12,315-Speed 3326.53 samples/sec   Loss 2.0946   LearningRate 0.0326   Epoch: 8   Global Step: 143210   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 14:23:15,392-Speed 3329.23 samples/sec   Loss 2.1263   LearningRate 0.0326   Epoch: 8   Global Step: 143220   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 14:23:18,466-Speed 3332.53 samples/sec   Loss 2.0702   LearningRate 0.0326   Epoch: 8   Global Step: 143230   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 14:23:21,540-Speed 3331.52 samples/sec   Loss 2.0256   LearningRate 0.0326   Epoch: 8   Global Step: 143240   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 14:23:24,620-Speed 3325.74 samples/sec   Loss 2.1060   LearningRate 0.0326   Epoch: 8   Global Step: 143250   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 14:23:27,697-Speed 3328.87 samples/sec   Loss 2.0623   LearningRate 0.0326   Epoch: 8   Global Step: 143260   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 14:23:30,779-Speed 3322.49 samples/sec   Loss 2.0003   LearningRate 0.0326   Epoch: 8   Global Step: 143270   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:23:33,896-Speed 3286.14 samples/sec   Loss 2.0764   LearningRate 0.0326   Epoch: 8   Global Step: 143280   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:23:36,977-Speed 3324.25 samples/sec   Loss 2.0241   LearningRate 0.0326   Epoch: 8   Global Step: 143290   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:23:40,041-Speed 3342.85 samples/sec   Loss 2.0835   LearningRate 0.0326   Epoch: 8   Global Step: 143300   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 14:23:43,134-Speed 3311.40 samples/sec   Loss 2.0699   LearningRate 0.0326   Epoch: 8   Global Step: 143310   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 14:23:46,207-Speed 3333.13 samples/sec   Loss 2.0269   LearningRate 0.0326   Epoch: 8   Global Step: 143320   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 14:23:49,384-Speed 3224.30 samples/sec   Loss 2.0685   LearningRate 0.0326   Epoch: 8   Global Step: 143330   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 14:23:52,477-Speed 3311.52 samples/sec   Loss 2.1070   LearningRate 0.0326   Epoch: 8   Global Step: 143340   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 14:23:55,610-Speed 3269.20 samples/sec   Loss 2.0394   LearningRate 0.0326   Epoch: 8   Global Step: 143350   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 14:23:58,727-Speed 3285.62 samples/sec   Loss 2.0368   LearningRate 0.0326   Epoch: 8   Global Step: 143360   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 14:24:01,829-Speed 3302.07 samples/sec   Loss 2.0704   LearningRate 0.0325   Epoch: 8   Global Step: 143370   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 14:24:04,924-Speed 3308.86 samples/sec   Loss 2.0321   LearningRate 0.0325   Epoch: 8   Global Step: 143380   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 14:24:08,000-Speed 3330.05 samples/sec   Loss 2.0478   LearningRate 0.0325   Epoch: 8   Global Step: 143390   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 14:24:11,084-Speed 3321.43 samples/sec   Loss 2.0239   LearningRate 0.0325   Epoch: 8   Global Step: 143400   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:24:14,180-Speed 3308.91 samples/sec   Loss 2.0019   LearningRate 0.0325   Epoch: 8   Global Step: 143410   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:24:17,274-Speed 3310.05 samples/sec   Loss 2.0291   LearningRate 0.0325   Epoch: 8   Global Step: 143420   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:24:20,352-Speed 3326.94 samples/sec   Loss 2.0291   LearningRate 0.0325   Epoch: 8   Global Step: 143430   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:24:23,438-Speed 3319.76 samples/sec   Loss 2.1494   LearningRate 0.0325   Epoch: 8   Global Step: 143440   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:24:26,592-Speed 3247.28 samples/sec   Loss 2.0981   LearningRate 0.0325   Epoch: 8   Global Step: 143450   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:24:29,716-Speed 3278.57 samples/sec   Loss 2.0770   LearningRate 0.0325   Epoch: 8   Global Step: 143460   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:24:32,907-Speed 3209.32 samples/sec   Loss 2.0392   LearningRate 0.0325   Epoch: 8   Global Step: 143470   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:24:36,065-Speed 3243.61 samples/sec   Loss 2.0170   LearningRate 0.0325   Epoch: 8   Global Step: 143480   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:24:39,233-Speed 3233.27 samples/sec   Loss 2.0357   LearningRate 0.0325   Epoch: 8   Global Step: 143490   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:24:42,300-Speed 3339.35 samples/sec   Loss 2.0213   LearningRate 0.0325   Epoch: 8   Global Step: 143500   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:24:45,549-Speed 3152.22 samples/sec   Loss 2.0834   LearningRate 0.0325   Epoch: 8   Global Step: 143510   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:24:48,768-Speed 3182.39 samples/sec   Loss 2.0270   LearningRate 0.0325   Epoch: 8   Global Step: 143520   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:24:51,849-Speed 3324.68 samples/sec   Loss 2.0633   LearningRate 0.0325   Epoch: 8   Global Step: 143530   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:24:54,940-Speed 3312.78 samples/sec   Loss 2.1205   LearningRate 0.0325   Epoch: 8   Global Step: 143540   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:24:58,055-Speed 3288.15 samples/sec   Loss 2.0888   LearningRate 0.0325   Epoch: 8   Global Step: 143550   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:25:01,130-Speed 3331.02 samples/sec   Loss 2.0141   LearningRate 0.0325   Epoch: 8   Global Step: 143560   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 14:25:04,207-Speed 3328.41 samples/sec   Loss 2.0469   LearningRate 0.0325   Epoch: 8   Global Step: 143570   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 14:25:07,320-Speed 3290.52 samples/sec   Loss 2.0678   LearningRate 0.0325   Epoch: 8   Global Step: 143580   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 14:25:10,397-Speed 3329.20 samples/sec   Loss 2.0497   LearningRate 0.0325   Epoch: 8   Global Step: 143590   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 14:25:13,487-Speed 3315.30 samples/sec   Loss 2.0283   LearningRate 0.0325   Epoch: 8   Global Step: 143600   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 14:25:16,575-Speed 3316.24 samples/sec   Loss 2.0832   LearningRate 0.0325   Epoch: 8   Global Step: 143610   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 14:25:19,659-Speed 3321.20 samples/sec   Loss 2.0854   LearningRate 0.0325   Epoch: 8   Global Step: 143620   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 14:25:22,743-Speed 3321.12 samples/sec   Loss 2.0693   LearningRate 0.0325   Epoch: 8   Global Step: 143630   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 14:25:25,914-Speed 3229.90 samples/sec   Loss 2.0592   LearningRate 0.0325   Epoch: 8   Global Step: 143640   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 14:25:29,069-Speed 3246.63 samples/sec   Loss 2.0630   LearningRate 0.0325   Epoch: 8   Global Step: 143650   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 14:25:32,218-Speed 3252.88 samples/sec   Loss 2.0836   LearningRate 0.0324   Epoch: 8   Global Step: 143660   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:25:35,345-Speed 3275.50 samples/sec   Loss 2.0498   LearningRate 0.0324   Epoch: 8   Global Step: 143670   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:25:38,425-Speed 3325.01 samples/sec   Loss 2.1007   LearningRate 0.0324   Epoch: 8   Global Step: 143680   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:25:41,546-Speed 3282.26 samples/sec   Loss 2.0324   LearningRate 0.0324   Epoch: 8   Global Step: 143690   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:25:44,632-Speed 3319.52 samples/sec   Loss 2.1177   LearningRate 0.0324   Epoch: 8   Global Step: 143700   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 14:25:47,708-Speed 3329.29 samples/sec   Loss 2.0458   LearningRate 0.0324   Epoch: 8   Global Step: 143710   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 14:25:50,789-Speed 3325.17 samples/sec   Loss 2.1172   LearningRate 0.0324   Epoch: 8   Global Step: 143720   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 14:25:53,867-Speed 3327.29 samples/sec   Loss 2.0097   LearningRate 0.0324   Epoch: 8   Global Step: 143730   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 14:25:56,957-Speed 3314.58 samples/sec   Loss 2.0811   LearningRate 0.0324   Epoch: 8   Global Step: 143740   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 14:26:00,040-Speed 3321.63 samples/sec   Loss 2.1131   LearningRate 0.0324   Epoch: 8   Global Step: 143750   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 14:26:03,121-Speed 3325.28 samples/sec   Loss 2.1107   LearningRate 0.0324   Epoch: 8   Global Step: 143760   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 14:26:06,199-Speed 3326.93 samples/sec   Loss 2.0698   LearningRate 0.0324   Epoch: 8   Global Step: 143770   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 14:26:09,287-Speed 3317.17 samples/sec   Loss 2.0183   LearningRate 0.0324   Epoch: 8   Global Step: 143780   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 14:26:12,395-Speed 3295.25 samples/sec   Loss 2.0990   LearningRate 0.0324   Epoch: 8   Global Step: 143790   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 14:26:15,492-Speed 3307.72 samples/sec   Loss 2.0223   LearningRate 0.0324   Epoch: 8   Global Step: 143800   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:26:18,567-Speed 3330.93 samples/sec   Loss 2.0634   LearningRate 0.0324   Epoch: 8   Global Step: 143810   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:26:21,649-Speed 3323.17 samples/sec   Loss 1.9925   LearningRate 0.0324   Epoch: 8   Global Step: 143820   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:26:24,743-Speed 3309.78 samples/sec   Loss 2.0784   LearningRate 0.0324   Epoch: 8   Global Step: 143830   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:26:27,929-Speed 3215.31 samples/sec   Loss 2.0215   LearningRate 0.0324   Epoch: 8   Global Step: 143840   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:26:31,029-Speed 3304.62 samples/sec   Loss 2.0342   LearningRate 0.0324   Epoch: 8   Global Step: 143850   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:26:34,141-Speed 3291.60 samples/sec   Loss 2.0482   LearningRate 0.0324   Epoch: 8   Global Step: 143860   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:26:37,202-Speed 3345.79 samples/sec   Loss 2.0526   LearningRate 0.0324   Epoch: 8   Global Step: 143870   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 14:26:40,277-Speed 3330.75 samples/sec   Loss 2.0248   LearningRate 0.0324   Epoch: 8   Global Step: 143880   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 14:26:43,354-Speed 3328.70 samples/sec   Loss 2.0262   LearningRate 0.0324   Epoch: 8   Global Step: 143890   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 14:26:46,430-Speed 3329.21 samples/sec   Loss 2.0314   LearningRate 0.0324   Epoch: 8   Global Step: 143900   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 14:26:49,505-Speed 3331.35 samples/sec   Loss 1.9985   LearningRate 0.0324   Epoch: 8   Global Step: 143910   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 14:26:52,610-Speed 3298.34 samples/sec   Loss 2.0107   LearningRate 0.0324   Epoch: 8   Global Step: 143920   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 14:26:55,759-Speed 3252.24 samples/sec   Loss 2.0670   LearningRate 0.0324   Epoch: 8   Global Step: 143930   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 14:26:58,860-Speed 3304.29 samples/sec   Loss 1.9939   LearningRate 0.0324   Epoch: 8   Global Step: 143940   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 14:27:01,947-Speed 3317.89 samples/sec   Loss 2.0170   LearningRate 0.0324   Epoch: 8   Global Step: 143950   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 14:27:05,181-Speed 3166.97 samples/sec   Loss 2.0380   LearningRate 0.0323   Epoch: 8   Global Step: 143960   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 14:27:08,295-Speed 3288.66 samples/sec   Loss 2.0228   LearningRate 0.0323   Epoch: 8   Global Step: 143970   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:27:11,377-Speed 3323.25 samples/sec   Loss 2.0583   LearningRate 0.0323   Epoch: 8   Global Step: 143980   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:27:14,463-Speed 3319.72 samples/sec   Loss 2.0825   LearningRate 0.0323   Epoch: 8   Global Step: 143990   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:27:17,574-Speed 3292.27 samples/sec   Loss 2.0691   LearningRate 0.0323   Epoch: 8   Global Step: 144000   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:28:01,931-[lfw][144000]XNorm: 21.881411
Training: 2022-04-11 14:28:01,931-[lfw][144000]Accuracy-Flip: 0.99817+-0.00273
Training: 2022-04-11 14:28:01,932-[lfw][144000]Accuracy-Highest: 0.99817
Training: 2022-04-11 14:28:53,360-[cfp_fp][144000]XNorm: 21.085438
Training: 2022-04-11 14:28:53,360-[cfp_fp][144000]Accuracy-Flip: 0.98686+-0.00571
Training: 2022-04-11 14:28:53,361-[cfp_fp][144000]Accuracy-Highest: 0.98814
Training: 2022-04-11 14:29:37,547-[agedb_30][144000]XNorm: 21.916345
Training: 2022-04-11 14:29:37,547-[agedb_30][144000]Accuracy-Flip: 0.98133+-0.00618
Training: 2022-04-11 14:29:37,548-[agedb_30][144000]Accuracy-Highest: 0.98317
Training: 2022-04-11 14:29:40,635-Speed 71.58 samples/sec   Loss 2.0614   LearningRate 0.0323   Epoch: 8   Global Step: 144010   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:29:43,771-Speed 3266.47 samples/sec   Loss 2.0871   LearningRate 0.0323   Epoch: 8   Global Step: 144020   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:29:47,012-Speed 3159.73 samples/sec   Loss 2.0774   LearningRate 0.0323   Epoch: 8   Global Step: 144030   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:29:50,095-Speed 3322.43 samples/sec   Loss 2.0838   LearningRate 0.0323   Epoch: 8   Global Step: 144040   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:29:53,174-Speed 3327.20 samples/sec   Loss 1.9836   LearningRate 0.0323   Epoch: 8   Global Step: 144050   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:29:56,262-Speed 3316.98 samples/sec   Loss 2.0416   LearningRate 0.0323   Epoch: 8   Global Step: 144060   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:29:59,334-Speed 3333.66 samples/sec   Loss 2.0756   LearningRate 0.0323   Epoch: 8   Global Step: 144070   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:30:02,409-Speed 3330.72 samples/sec   Loss 2.0707   LearningRate 0.0323   Epoch: 8   Global Step: 144080   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:30:05,482-Speed 3333.20 samples/sec   Loss 1.9893   LearningRate 0.0323   Epoch: 8   Global Step: 144090   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:30:08,658-Speed 3225.10 samples/sec   Loss 2.0017   LearningRate 0.0323   Epoch: 8   Global Step: 144100   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:30:11,785-Speed 3274.73 samples/sec   Loss 2.1249   LearningRate 0.0323   Epoch: 8   Global Step: 144110   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:30:14,862-Speed 3329.28 samples/sec   Loss 2.0566   LearningRate 0.0323   Epoch: 8   Global Step: 144120   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:30:17,948-Speed 3318.82 samples/sec   Loss 2.0111   LearningRate 0.0323   Epoch: 8   Global Step: 144130   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:30:21,049-Speed 3303.32 samples/sec   Loss 2.1131   LearningRate 0.0323   Epoch: 8   Global Step: 144140   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:30:24,138-Speed 3316.13 samples/sec   Loss 2.0843   LearningRate 0.0323   Epoch: 8   Global Step: 144150   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:30:27,232-Speed 3309.80 samples/sec   Loss 2.0892   LearningRate 0.0323   Epoch: 8   Global Step: 144160   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:30:30,344-Speed 3291.26 samples/sec   Loss 2.0496   LearningRate 0.0323   Epoch: 8   Global Step: 144170   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:30:33,418-Speed 3331.47 samples/sec   Loss 2.0927   LearningRate 0.0323   Epoch: 8   Global Step: 144180   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:30:36,503-Speed 3321.06 samples/sec   Loss 2.1068   LearningRate 0.0323   Epoch: 8   Global Step: 144190   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:30:39,584-Speed 3323.48 samples/sec   Loss 2.0728   LearningRate 0.0323   Epoch: 8   Global Step: 144200   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:30:42,662-Speed 3328.43 samples/sec   Loss 2.0538   LearningRate 0.0323   Epoch: 8   Global Step: 144210   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:30:45,748-Speed 3318.20 samples/sec   Loss 1.9906   LearningRate 0.0323   Epoch: 8   Global Step: 144220   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:30:48,845-Speed 3308.44 samples/sec   Loss 1.9832   LearningRate 0.0323   Epoch: 8   Global Step: 144230   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:30:51,942-Speed 3307.15 samples/sec   Loss 2.0165   LearningRate 0.0323   Epoch: 8   Global Step: 144240   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:30:55,021-Speed 3326.05 samples/sec   Loss 2.0587   LearningRate 0.0322   Epoch: 8   Global Step: 144250   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:30:58,093-Speed 3334.54 samples/sec   Loss 2.0293   LearningRate 0.0322   Epoch: 8   Global Step: 144260   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:31:01,192-Speed 3304.26 samples/sec   Loss 2.0742   LearningRate 0.0322   Epoch: 8   Global Step: 144270   Fp16 Grad Scale: 262144   Required: 20 hours
Training: 2022-04-11 14:31:04,259-Speed 3340.28 samples/sec   Loss 2.0093   LearningRate 0.0322   Epoch: 8   Global Step: 144280   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:31:07,322-Speed 3343.45 samples/sec   Loss 2.0623   LearningRate 0.0322   Epoch: 8   Global Step: 144290   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 14:31:10,410-Speed 3316.75 samples/sec   Loss 2.0724   LearningRate 0.0322   Epoch: 8   Global Step: 144300   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 14:31:13,648-Speed 3163.62 samples/sec   Loss 2.0278   LearningRate 0.0322   Epoch: 8   Global Step: 144310   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 14:31:16,790-Speed 3259.69 samples/sec   Loss 2.0787   LearningRate 0.0322   Epoch: 8   Global Step: 144320   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 14:31:19,894-Speed 3299.69 samples/sec   Loss 2.0640   LearningRate 0.0322   Epoch: 8   Global Step: 144330   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 14:31:22,994-Speed 3303.88 samples/sec   Loss 2.1048   LearningRate 0.0322   Epoch: 8   Global Step: 144340   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 14:31:26,077-Speed 3322.72 samples/sec   Loss 2.0785   LearningRate 0.0322   Epoch: 8   Global Step: 144350   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 14:31:29,208-Speed 3270.65 samples/sec   Loss 2.0497   LearningRate 0.0322   Epoch: 8   Global Step: 144360   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 14:31:32,324-Speed 3287.90 samples/sec   Loss 2.0225   LearningRate 0.0322   Epoch: 8   Global Step: 144370   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 14:31:35,408-Speed 3320.14 samples/sec   Loss 2.0710   LearningRate 0.0322   Epoch: 8   Global Step: 144380   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 14:31:38,496-Speed 3317.55 samples/sec   Loss 2.0425   LearningRate 0.0322   Epoch: 8   Global Step: 144390   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:31:41,587-Speed 3313.53 samples/sec   Loss 2.0578   LearningRate 0.0322   Epoch: 8   Global Step: 144400   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:31:44,660-Speed 3332.57 samples/sec   Loss 2.1251   LearningRate 0.0322   Epoch: 8   Global Step: 144410   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 14:31:47,730-Speed 3336.16 samples/sec   Loss 2.0601   LearningRate 0.0322   Epoch: 8   Global Step: 144420   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 14:31:50,809-Speed 3326.93 samples/sec   Loss 2.0198   LearningRate 0.0322   Epoch: 8   Global Step: 144430   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 14:31:53,880-Speed 3334.78 samples/sec   Loss 2.0372   LearningRate 0.0322   Epoch: 8   Global Step: 144440   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 14:31:56,951-Speed 3336.06 samples/sec   Loss 2.0358   LearningRate 0.0322   Epoch: 8   Global Step: 144450   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 14:32:00,033-Speed 3322.84 samples/sec   Loss 2.0365   LearningRate 0.0322   Epoch: 8   Global Step: 144460   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 14:32:03,115-Speed 3323.01 samples/sec   Loss 2.0738   LearningRate 0.0322   Epoch: 8   Global Step: 144470   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 14:32:06,261-Speed 3255.73 samples/sec   Loss 2.0915   LearningRate 0.0322   Epoch: 8   Global Step: 144480   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 14:32:09,391-Speed 3272.54 samples/sec   Loss 2.0470   LearningRate 0.0322   Epoch: 8   Global Step: 144490   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 14:32:12,467-Speed 3329.72 samples/sec   Loss 2.0808   LearningRate 0.0322   Epoch: 8   Global Step: 144500   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 14:32:15,552-Speed 3320.36 samples/sec   Loss 2.0514   LearningRate 0.0322   Epoch: 8   Global Step: 144510   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:32:18,635-Speed 3322.25 samples/sec   Loss 1.9891   LearningRate 0.0322   Epoch: 8   Global Step: 144520   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:32:21,708-Speed 3333.48 samples/sec   Loss 2.0569   LearningRate 0.0322   Epoch: 8   Global Step: 144530   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:32:24,790-Speed 3322.28 samples/sec   Loss 2.0900   LearningRate 0.0322   Epoch: 8   Global Step: 144540   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:32:27,875-Speed 3320.34 samples/sec   Loss 2.1297   LearningRate 0.0321   Epoch: 8   Global Step: 144550   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:32:30,963-Speed 3317.15 samples/sec   Loss 2.0419   LearningRate 0.0321   Epoch: 8   Global Step: 144560   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:32:34,044-Speed 3324.56 samples/sec   Loss 2.0511   LearningRate 0.0321   Epoch: 8   Global Step: 144570   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:32:37,122-Speed 3327.49 samples/sec   Loss 2.0718   LearningRate 0.0321   Epoch: 8   Global Step: 144580   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:32:40,206-Speed 3321.01 samples/sec   Loss 2.0098   LearningRate 0.0321   Epoch: 8   Global Step: 144590   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:32:43,268-Speed 3344.79 samples/sec   Loss 2.0236   LearningRate 0.0321   Epoch: 8   Global Step: 144600   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 14:32:46,357-Speed 3316.13 samples/sec   Loss 2.0176   LearningRate 0.0321   Epoch: 8   Global Step: 144610   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 14:32:49,446-Speed 3315.74 samples/sec   Loss 2.0190   LearningRate 0.0321   Epoch: 8   Global Step: 144620   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 14:32:52,531-Speed 3320.18 samples/sec   Loss 2.0368   LearningRate 0.0321   Epoch: 8   Global Step: 144630   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 14:32:55,600-Speed 3337.24 samples/sec   Loss 2.0872   LearningRate 0.0321   Epoch: 8   Global Step: 144640   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 14:32:58,680-Speed 3324.45 samples/sec   Loss 2.0615   LearningRate 0.0321   Epoch: 8   Global Step: 144650   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 14:33:01,773-Speed 3312.61 samples/sec   Loss 2.0011   LearningRate 0.0321   Epoch: 8   Global Step: 144660   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 14:33:04,857-Speed 3320.66 samples/sec   Loss 2.0451   LearningRate 0.0321   Epoch: 8   Global Step: 144670   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 14:33:07,935-Speed 3328.54 samples/sec   Loss 2.0418   LearningRate 0.0321   Epoch: 8   Global Step: 144680   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 14:33:11,016-Speed 3323.77 samples/sec   Loss 2.0801   LearningRate 0.0321   Epoch: 8   Global Step: 144690   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 14:33:14,078-Speed 3345.02 samples/sec   Loss 2.0932   LearningRate 0.0321   Epoch: 8   Global Step: 144700   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 14:33:17,157-Speed 3326.75 samples/sec   Loss 2.0505   LearningRate 0.0321   Epoch: 8   Global Step: 144710   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 14:33:20,246-Speed 3315.20 samples/sec   Loss 2.0473   LearningRate 0.0321   Epoch: 8   Global Step: 144720   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 14:33:23,334-Speed 3316.80 samples/sec   Loss 2.0187   LearningRate 0.0321   Epoch: 8   Global Step: 144730   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 14:33:26,418-Speed 3320.82 samples/sec   Loss 1.9728   LearningRate 0.0321   Epoch: 8   Global Step: 144740   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 14:33:29,500-Speed 3324.38 samples/sec   Loss 2.0910   LearningRate 0.0321   Epoch: 8   Global Step: 144750   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 14:33:32,584-Speed 3320.80 samples/sec   Loss 2.0596   LearningRate 0.0321   Epoch: 8   Global Step: 144760   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 14:33:35,660-Speed 3329.86 samples/sec   Loss 2.0668   LearningRate 0.0321   Epoch: 8   Global Step: 144770   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 14:33:38,756-Speed 3308.68 samples/sec   Loss 2.0888   LearningRate 0.0321   Epoch: 8   Global Step: 144780   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 14:33:41,841-Speed 3319.32 samples/sec   Loss 2.1003   LearningRate 0.0321   Epoch: 8   Global Step: 144790   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 14:33:44,960-Speed 3283.48 samples/sec   Loss 2.0241   LearningRate 0.0321   Epoch: 8   Global Step: 144800   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:33:48,070-Speed 3293.87 samples/sec   Loss 2.0549   LearningRate 0.0321   Epoch: 8   Global Step: 144810   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:33:51,200-Speed 3272.62 samples/sec   Loss 2.0158   LearningRate 0.0321   Epoch: 8   Global Step: 144820   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:33:54,401-Speed 3199.48 samples/sec   Loss 2.0247   LearningRate 0.0321   Epoch: 8   Global Step: 144830   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:33:57,535-Speed 3268.33 samples/sec   Loss 2.0607   LearningRate 0.0320   Epoch: 8   Global Step: 144840   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:34:00,648-Speed 3290.82 samples/sec   Loss 2.0925   LearningRate 0.0320   Epoch: 8   Global Step: 144850   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 14:34:03,725-Speed 3329.11 samples/sec   Loss 2.0674   LearningRate 0.0320   Epoch: 8   Global Step: 144860   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 14:34:06,804-Speed 3325.80 samples/sec   Loss 2.0893   LearningRate 0.0320   Epoch: 8   Global Step: 144870   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 14:34:09,911-Speed 3297.30 samples/sec   Loss 2.0695   LearningRate 0.0320   Epoch: 8   Global Step: 144880   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 14:34:12,983-Speed 3333.92 samples/sec   Loss 2.1136   LearningRate 0.0320   Epoch: 8   Global Step: 144890   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 14:34:16,057-Speed 3331.30 samples/sec   Loss 2.0696   LearningRate 0.0320   Epoch: 8   Global Step: 144900   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 14:34:19,174-Speed 3286.98 samples/sec   Loss 2.1125   LearningRate 0.0320   Epoch: 8   Global Step: 144910   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 14:34:22,318-Speed 3257.58 samples/sec   Loss 2.0254   LearningRate 0.0320   Epoch: 8   Global Step: 144920   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 14:34:25,403-Speed 3320.09 samples/sec   Loss 2.0113   LearningRate 0.0320   Epoch: 8   Global Step: 144930   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 14:34:28,499-Speed 3307.17 samples/sec   Loss 2.0851   LearningRate 0.0320   Epoch: 8   Global Step: 144940   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 14:34:31,619-Speed 3283.76 samples/sec   Loss 2.0581   LearningRate 0.0320   Epoch: 8   Global Step: 144950   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:34:34,716-Speed 3306.96 samples/sec   Loss 2.0775   LearningRate 0.0320   Epoch: 8   Global Step: 144960   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:34:37,841-Speed 3277.90 samples/sec   Loss 2.1039   LearningRate 0.0320   Epoch: 8   Global Step: 144970   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:34:40,929-Speed 3316.72 samples/sec   Loss 2.0203   LearningRate 0.0320   Epoch: 8   Global Step: 144980   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:34:44,004-Speed 3331.49 samples/sec   Loss 2.0345   LearningRate 0.0320   Epoch: 8   Global Step: 144990   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:34:47,093-Speed 3315.23 samples/sec   Loss 1.9804   LearningRate 0.0320   Epoch: 8   Global Step: 145000   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:34:50,170-Speed 3328.75 samples/sec   Loss 2.0278   LearningRate 0.0320   Epoch: 8   Global Step: 145010   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:34:53,298-Speed 3274.71 samples/sec   Loss 2.0535   LearningRate 0.0320   Epoch: 8   Global Step: 145020   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:34:56,380-Speed 3322.80 samples/sec   Loss 2.0595   LearningRate 0.0320   Epoch: 8   Global Step: 145030   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:34:59,453-Speed 3333.38 samples/sec   Loss 2.0464   LearningRate 0.0320   Epoch: 8   Global Step: 145040   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:35:02,536-Speed 3322.05 samples/sec   Loss 2.0745   LearningRate 0.0320   Epoch: 8   Global Step: 145050   Fp16 Grad Scale: 262144   Required: 20 hours
Training: 2022-04-11 14:35:05,614-Speed 3327.85 samples/sec   Loss 2.0784   LearningRate 0.0320   Epoch: 8   Global Step: 145060   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:35:08,704-Speed 3320.48 samples/sec   Loss 2.0613   LearningRate 0.0320   Epoch: 8   Global Step: 145070   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:35:11,787-Speed 3322.36 samples/sec   Loss 2.0002   LearningRate 0.0320   Epoch: 8   Global Step: 145080   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:35:14,872-Speed 3320.14 samples/sec   Loss 2.0280   LearningRate 0.0320   Epoch: 8   Global Step: 145090   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:35:17,958-Speed 3319.08 samples/sec   Loss 2.1120   LearningRate 0.0320   Epoch: 8   Global Step: 145100   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:35:21,036-Speed 3327.22 samples/sec   Loss 2.0977   LearningRate 0.0320   Epoch: 8   Global Step: 145110   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 14:35:24,121-Speed 3320.78 samples/sec   Loss 2.1107   LearningRate 0.0320   Epoch: 8   Global Step: 145120   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 14:35:27,200-Speed 3326.51 samples/sec   Loss 2.0994   LearningRate 0.0320   Epoch: 8   Global Step: 145130   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 14:35:30,278-Speed 3327.54 samples/sec   Loss 2.0550   LearningRate 0.0319   Epoch: 8   Global Step: 145140   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 14:35:33,372-Speed 3309.42 samples/sec   Loss 1.9937   LearningRate 0.0319   Epoch: 8   Global Step: 145150   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 14:35:36,465-Speed 3312.34 samples/sec   Loss 2.0394   LearningRate 0.0319   Epoch: 8   Global Step: 145160   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 14:35:39,544-Speed 3326.21 samples/sec   Loss 2.0070   LearningRate 0.0319   Epoch: 8   Global Step: 145170   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 14:35:42,626-Speed 3324.74 samples/sec   Loss 2.0375   LearningRate 0.0319   Epoch: 8   Global Step: 145180   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 14:35:45,705-Speed 3326.80 samples/sec   Loss 1.9951   LearningRate 0.0319   Epoch: 8   Global Step: 145190   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 14:35:48,789-Speed 3320.54 samples/sec   Loss 2.0103   LearningRate 0.0319   Epoch: 8   Global Step: 145200   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 14:35:51,865-Speed 3329.28 samples/sec   Loss 2.0240   LearningRate 0.0319   Epoch: 8   Global Step: 145210   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:35:54,944-Speed 3326.71 samples/sec   Loss 2.0393   LearningRate 0.0319   Epoch: 8   Global Step: 145220   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:35:58,020-Speed 3329.94 samples/sec   Loss 2.1001   LearningRate 0.0319   Epoch: 8   Global Step: 145230   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:36:01,099-Speed 3326.62 samples/sec   Loss 2.1346   LearningRate 0.0319   Epoch: 8   Global Step: 145240   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:36:04,187-Speed 3317.04 samples/sec   Loss 2.1145   LearningRate 0.0319   Epoch: 8   Global Step: 145250   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:36:07,253-Speed 3340.21 samples/sec   Loss 2.0251   LearningRate 0.0319   Epoch: 8   Global Step: 145260   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 14:36:10,336-Speed 3322.78 samples/sec   Loss 2.0586   LearningRate 0.0319   Epoch: 8   Global Step: 145270   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 14:36:13,419-Speed 3322.04 samples/sec   Loss 2.0408   LearningRate 0.0319   Epoch: 8   Global Step: 145280   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 14:36:16,514-Speed 3309.35 samples/sec   Loss 2.0151   LearningRate 0.0319   Epoch: 8   Global Step: 145290   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 14:36:19,590-Speed 3329.36 samples/sec   Loss 2.0436   LearningRate 0.0319   Epoch: 8   Global Step: 145300   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 14:36:22,670-Speed 3325.93 samples/sec   Loss 2.0330   LearningRate 0.0319   Epoch: 8   Global Step: 145310   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 14:36:25,745-Speed 3329.81 samples/sec   Loss 2.0062   LearningRate 0.0319   Epoch: 8   Global Step: 145320   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 14:36:28,832-Speed 3318.76 samples/sec   Loss 2.0927   LearningRate 0.0319   Epoch: 8   Global Step: 145330   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 14:36:31,911-Speed 3326.61 samples/sec   Loss 2.0103   LearningRate 0.0319   Epoch: 8   Global Step: 145340   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 14:36:34,999-Speed 3316.24 samples/sec   Loss 2.0231   LearningRate 0.0319   Epoch: 8   Global Step: 145350   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 14:36:38,126-Speed 3276.23 samples/sec   Loss 2.0221   LearningRate 0.0319   Epoch: 8   Global Step: 145360   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:36:41,204-Speed 3327.66 samples/sec   Loss 2.0252   LearningRate 0.0319   Epoch: 8   Global Step: 145370   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:36:44,282-Speed 3327.77 samples/sec   Loss 2.0608   LearningRate 0.0319   Epoch: 8   Global Step: 145380   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:36:47,369-Speed 3317.80 samples/sec   Loss 1.9725   LearningRate 0.0319   Epoch: 8   Global Step: 145390   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:36:50,453-Speed 3320.70 samples/sec   Loss 2.0232   LearningRate 0.0319   Epoch: 8   Global Step: 145400   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:36:53,531-Speed 3327.56 samples/sec   Loss 2.0661   LearningRate 0.0319   Epoch: 8   Global Step: 145410   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:36:56,605-Speed 3331.43 samples/sec   Loss 2.0133   LearningRate 0.0319   Epoch: 8   Global Step: 145420   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:36:59,687-Speed 3324.29 samples/sec   Loss 2.0430   LearningRate 0.0318   Epoch: 8   Global Step: 145430   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:37:02,773-Speed 3319.30 samples/sec   Loss 2.0504   LearningRate 0.0318   Epoch: 8   Global Step: 145440   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:37:05,943-Speed 3230.41 samples/sec   Loss 2.0285   LearningRate 0.0318   Epoch: 8   Global Step: 145450   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:37:09,060-Speed 3286.42 samples/sec   Loss 2.0031   LearningRate 0.0318   Epoch: 8   Global Step: 145460   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:37:12,136-Speed 3329.53 samples/sec   Loss 2.0560   LearningRate 0.0318   Epoch: 8   Global Step: 145470   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:37:15,215-Speed 3326.11 samples/sec   Loss 2.0522   LearningRate 0.0318   Epoch: 8   Global Step: 145480   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:37:18,302-Speed 3318.78 samples/sec   Loss 2.0037   LearningRate 0.0318   Epoch: 8   Global Step: 145490   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:37:21,401-Speed 3304.84 samples/sec   Loss 2.0272   LearningRate 0.0318   Epoch: 8   Global Step: 145500   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:37:24,569-Speed 3232.60 samples/sec   Loss 2.0872   LearningRate 0.0318   Epoch: 8   Global Step: 145510   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:37:27,652-Speed 3322.07 samples/sec   Loss 2.0488   LearningRate 0.0318   Epoch: 8   Global Step: 145520   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:37:30,730-Speed 3327.76 samples/sec   Loss 2.0939   LearningRate 0.0318   Epoch: 8   Global Step: 145530   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:37:33,873-Speed 3258.83 samples/sec   Loss 2.0758   LearningRate 0.0318   Epoch: 8   Global Step: 145540   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:37:36,996-Speed 3279.64 samples/sec   Loss 2.0564   LearningRate 0.0318   Epoch: 8   Global Step: 145550   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:37:40,077-Speed 3324.27 samples/sec   Loss 2.0904   LearningRate 0.0318   Epoch: 8   Global Step: 145560   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:37:43,155-Speed 3328.01 samples/sec   Loss 2.0513   LearningRate 0.0318   Epoch: 8   Global Step: 145570   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:37:46,233-Speed 3328.12 samples/sec   Loss 2.0842   LearningRate 0.0318   Epoch: 8   Global Step: 145580   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:37:49,352-Speed 3283.62 samples/sec   Loss 2.0364   LearningRate 0.0318   Epoch: 8   Global Step: 145590   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:37:52,454-Speed 3301.47 samples/sec   Loss 2.0238   LearningRate 0.0318   Epoch: 8   Global Step: 145600   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:37:55,531-Speed 3328.84 samples/sec   Loss 2.0236   LearningRate 0.0318   Epoch: 8   Global Step: 145610   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 14:37:58,612-Speed 3324.38 samples/sec   Loss 2.0971   LearningRate 0.0318   Epoch: 8   Global Step: 145620   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 14:38:01,688-Speed 3330.75 samples/sec   Loss 1.9879   LearningRate 0.0318   Epoch: 8   Global Step: 145630   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 14:38:04,780-Speed 3311.65 samples/sec   Loss 2.1141   LearningRate 0.0318   Epoch: 8   Global Step: 145640   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 14:38:07,857-Speed 3328.64 samples/sec   Loss 2.0394   LearningRate 0.0318   Epoch: 8   Global Step: 145650   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 14:38:11,082-Speed 3176.57 samples/sec   Loss 2.0569   LearningRate 0.0318   Epoch: 8   Global Step: 145660   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 14:38:14,249-Speed 3233.90 samples/sec   Loss 2.0839   LearningRate 0.0318   Epoch: 8   Global Step: 145670   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 14:38:17,409-Speed 3240.92 samples/sec   Loss 1.9879   LearningRate 0.0318   Epoch: 8   Global Step: 145680   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 14:38:20,518-Speed 3293.95 samples/sec   Loss 2.0851   LearningRate 0.0318   Epoch: 8   Global Step: 145690   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 14:38:23,692-Speed 3227.27 samples/sec   Loss 2.0930   LearningRate 0.0318   Epoch: 8   Global Step: 145700   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 14:38:26,795-Speed 3301.46 samples/sec   Loss 2.0402   LearningRate 0.0318   Epoch: 8   Global Step: 145710   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:38:29,874-Speed 3325.83 samples/sec   Loss 2.0001   LearningRate 0.0318   Epoch: 8   Global Step: 145720   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:38:32,955-Speed 3324.69 samples/sec   Loss 2.0434   LearningRate 0.0317   Epoch: 8   Global Step: 145730   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:38:36,041-Speed 3318.55 samples/sec   Loss 1.9986   LearningRate 0.0317   Epoch: 8   Global Step: 145740   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:38:39,119-Speed 3328.17 samples/sec   Loss 2.0145   LearningRate 0.0317   Epoch: 8   Global Step: 145750   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:38:42,283-Speed 3237.40 samples/sec   Loss 2.0300   LearningRate 0.0317   Epoch: 8   Global Step: 145760   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 14:38:45,429-Speed 3255.16 samples/sec   Loss 2.0427   LearningRate 0.0317   Epoch: 8   Global Step: 145770   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-11 14:38:48,516-Speed 3318.39 samples/sec   Loss 2.0017   LearningRate 0.0317   Epoch: 8   Global Step: 145780   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-11 14:38:51,590-Speed 3331.73 samples/sec   Loss 2.0452   LearningRate 0.0317   Epoch: 8   Global Step: 145790   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-11 14:38:54,670-Speed 3325.78 samples/sec   Loss 2.0090   LearningRate 0.0317   Epoch: 8   Global Step: 145800   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-11 14:38:57,746-Speed 3329.52 samples/sec   Loss 2.0202   LearningRate 0.0317   Epoch: 8   Global Step: 145810   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-11 14:39:00,830-Speed 3321.27 samples/sec   Loss 2.0549   LearningRate 0.0317   Epoch: 8   Global Step: 145820   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-11 14:39:03,919-Speed 3315.68 samples/sec   Loss 2.1145   LearningRate 0.0317   Epoch: 8   Global Step: 145830   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-11 14:39:06,993-Speed 3331.60 samples/sec   Loss 2.0241   LearningRate 0.0317   Epoch: 8   Global Step: 145840   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-11 14:39:10,075-Speed 3323.38 samples/sec   Loss 2.0300   LearningRate 0.0317   Epoch: 8   Global Step: 145850   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-11 14:39:13,147-Speed 3334.46 samples/sec   Loss 1.9783   LearningRate 0.0317   Epoch: 8   Global Step: 145860   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-11 14:39:16,226-Speed 3327.01 samples/sec   Loss 2.0797   LearningRate 0.0317   Epoch: 8   Global Step: 145870   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 14:39:19,302-Speed 3329.55 samples/sec   Loss 2.0606   LearningRate 0.0317   Epoch: 8   Global Step: 145880   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 14:39:22,381-Speed 3326.52 samples/sec   Loss 2.0772   LearningRate 0.0317   Epoch: 8   Global Step: 145890   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 14:39:25,458-Speed 3328.60 samples/sec   Loss 2.1421   LearningRate 0.0317   Epoch: 8   Global Step: 145900   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 14:39:28,536-Speed 3327.23 samples/sec   Loss 2.0282   LearningRate 0.0317   Epoch: 8   Global Step: 145910   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 14:39:31,611-Speed 3330.95 samples/sec   Loss 2.0675   LearningRate 0.0317   Epoch: 8   Global Step: 145920   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 14:39:34,696-Speed 3320.33 samples/sec   Loss 1.9941   LearningRate 0.0317   Epoch: 8   Global Step: 145930   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 14:39:37,773-Speed 3328.42 samples/sec   Loss 2.0511   LearningRate 0.0317   Epoch: 8   Global Step: 145940   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 14:39:40,919-Speed 3255.34 samples/sec   Loss 2.0651   LearningRate 0.0317   Epoch: 8   Global Step: 145950   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 14:39:44,100-Speed 3219.84 samples/sec   Loss 2.0492   LearningRate 0.0317   Epoch: 8   Global Step: 145960   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 14:39:47,192-Speed 3313.75 samples/sec   Loss 2.0808   LearningRate 0.0317   Epoch: 8   Global Step: 145970   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:39:50,270-Speed 3326.82 samples/sec   Loss 2.0875   LearningRate 0.0317   Epoch: 8   Global Step: 145980   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:39:53,349-Speed 3326.56 samples/sec   Loss 2.0712   LearningRate 0.0317   Epoch: 8   Global Step: 145990   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:39:56,435-Speed 3318.41 samples/sec   Loss 2.0595   LearningRate 0.0317   Epoch: 8   Global Step: 146000   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:40:40,414-[lfw][146000]XNorm: 22.527641
Training: 2022-04-11 14:40:40,415-[lfw][146000]Accuracy-Flip: 0.99817+-0.00273
Training: 2022-04-11 14:40:40,415-[lfw][146000]Accuracy-Highest: 0.99817
Training: 2022-04-11 14:41:31,342-[cfp_fp][146000]XNorm: 21.715498
Training: 2022-04-11 14:41:31,342-[cfp_fp][146000]Accuracy-Flip: 0.98800+-0.00630
Training: 2022-04-11 14:41:31,343-[cfp_fp][146000]Accuracy-Highest: 0.98814
Training: 2022-04-11 14:42:15,170-[agedb_30][146000]XNorm: 22.704439
Training: 2022-04-11 14:42:15,170-[agedb_30][146000]Accuracy-Flip: 0.98183+-0.00717
Training: 2022-04-11 14:42:15,171-[agedb_30][146000]Accuracy-Highest: 0.98317
Training: 2022-04-11 14:42:18,272-Speed 72.20 samples/sec   Loss 2.0869   LearningRate 0.0317   Epoch: 8   Global Step: 146010   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:42:21,337-Speed 3341.60 samples/sec   Loss 2.1070   LearningRate 0.0316   Epoch: 8   Global Step: 146020   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:42:24,407-Speed 3336.04 samples/sec   Loss 2.0444   LearningRate 0.0316   Epoch: 8   Global Step: 146030   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 14:42:27,506-Speed 3305.12 samples/sec   Loss 2.1637   LearningRate 0.0316   Epoch: 8   Global Step: 146040   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 14:42:30,586-Speed 3324.87 samples/sec   Loss 2.0476   LearningRate 0.0316   Epoch: 8   Global Step: 146050   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 14:42:33,657-Speed 3335.99 samples/sec   Loss 2.0598   LearningRate 0.0316   Epoch: 8   Global Step: 146060   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 14:42:36,823-Speed 3234.93 samples/sec   Loss 2.0078   LearningRate 0.0316   Epoch: 8   Global Step: 146070   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 14:42:39,947-Speed 3278.31 samples/sec   Loss 2.0460   LearningRate 0.0316   Epoch: 8   Global Step: 146080   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 14:42:43,036-Speed 3315.21 samples/sec   Loss 2.0214   LearningRate 0.0316   Epoch: 8   Global Step: 146090   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 14:42:46,110-Speed 3332.92 samples/sec   Loss 2.0431   LearningRate 0.0316   Epoch: 8   Global Step: 146100   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 14:42:49,179-Speed 3336.77 samples/sec   Loss 2.0785   LearningRate 0.0316   Epoch: 8   Global Step: 146110   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 14:42:52,251-Speed 3333.88 samples/sec   Loss 2.0245   LearningRate 0.0316   Epoch: 8   Global Step: 146120   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 14:42:55,329-Speed 3327.67 samples/sec   Loss 2.0494   LearningRate 0.0316   Epoch: 8   Global Step: 146130   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:42:58,435-Speed 3298.50 samples/sec   Loss 2.0718   LearningRate 0.0316   Epoch: 8   Global Step: 146140   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:43:01,514-Speed 3325.56 samples/sec   Loss 2.0498   LearningRate 0.0316   Epoch: 8   Global Step: 146150   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:43:04,631-Speed 3286.13 samples/sec   Loss 2.0764   LearningRate 0.0316   Epoch: 8   Global Step: 146160   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:43:07,717-Speed 3318.82 samples/sec   Loss 2.0202   LearningRate 0.0316   Epoch: 8   Global Step: 146170   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:43:10,810-Speed 3311.62 samples/sec   Loss 2.0963   LearningRate 0.0316   Epoch: 8   Global Step: 146180   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:43:13,885-Speed 3331.20 samples/sec   Loss 2.0757   LearningRate 0.0316   Epoch: 8   Global Step: 146190   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 14:43:16,974-Speed 3316.14 samples/sec   Loss 2.0661   LearningRate 0.0316   Epoch: 8   Global Step: 146200   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 14:43:20,046-Speed 3333.61 samples/sec   Loss 2.0445   LearningRate 0.0316   Epoch: 8   Global Step: 146210   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 14:43:23,120-Speed 3331.96 samples/sec   Loss 2.0850   LearningRate 0.0316   Epoch: 8   Global Step: 146220   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 14:43:26,245-Speed 3277.83 samples/sec   Loss 2.0283   LearningRate 0.0316   Epoch: 8   Global Step: 146230   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 14:43:29,316-Speed 3335.19 samples/sec   Loss 1.9943   LearningRate 0.0316   Epoch: 8   Global Step: 146240   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 14:43:32,411-Speed 3309.05 samples/sec   Loss 2.0338   LearningRate 0.0316   Epoch: 8   Global Step: 146250   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 14:43:35,490-Speed 3326.66 samples/sec   Loss 2.0852   LearningRate 0.0316   Epoch: 8   Global Step: 146260   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 14:43:38,659-Speed 3231.63 samples/sec   Loss 2.0664   LearningRate 0.0316   Epoch: 8   Global Step: 146270   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 14:43:41,922-Speed 3138.99 samples/sec   Loss 2.1014   LearningRate 0.0316   Epoch: 8   Global Step: 146280   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 14:43:45,010-Speed 3317.30 samples/sec   Loss 2.0438   LearningRate 0.0316   Epoch: 8   Global Step: 146290   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:43:48,081-Speed 3334.78 samples/sec   Loss 2.1051   LearningRate 0.0316   Epoch: 8   Global Step: 146300   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:43:51,238-Speed 3244.44 samples/sec   Loss 2.1128   LearningRate 0.0316   Epoch: 8   Global Step: 146310   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:43:54,329-Speed 3313.12 samples/sec   Loss 2.0587   LearningRate 0.0315   Epoch: 8   Global Step: 146320   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:43:57,412-Speed 3322.16 samples/sec   Loss 2.0413   LearningRate 0.0315   Epoch: 8   Global Step: 146330   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:44:00,499-Speed 3318.24 samples/sec   Loss 2.0114   LearningRate 0.0315   Epoch: 8   Global Step: 146340   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:44:03,589-Speed 3315.18 samples/sec   Loss 1.9652   LearningRate 0.0315   Epoch: 8   Global Step: 146350   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 14:44:06,668-Speed 3326.00 samples/sec   Loss 2.0247   LearningRate 0.0315   Epoch: 8   Global Step: 146360   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 14:44:09,745-Speed 3328.63 samples/sec   Loss 2.0934   LearningRate 0.0315   Epoch: 8   Global Step: 146370   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 14:44:12,832-Speed 3318.93 samples/sec   Loss 1.9465   LearningRate 0.0315   Epoch: 8   Global Step: 146380   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 14:44:15,906-Speed 3331.72 samples/sec   Loss 2.0334   LearningRate 0.0315   Epoch: 8   Global Step: 146390   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 14:44:18,981-Speed 3330.10 samples/sec   Loss 2.0516   LearningRate 0.0315   Epoch: 8   Global Step: 146400   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 14:44:22,059-Speed 3327.77 samples/sec   Loss 2.1341   LearningRate 0.0315   Epoch: 8   Global Step: 146410   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 14:44:25,284-Speed 3176.22 samples/sec   Loss 2.0773   LearningRate 0.0315   Epoch: 8   Global Step: 146420   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 14:44:28,361-Speed 3329.04 samples/sec   Loss 2.1208   LearningRate 0.0315   Epoch: 8   Global Step: 146430   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 14:44:31,437-Speed 3329.56 samples/sec   Loss 2.0617   LearningRate 0.0315   Epoch: 8   Global Step: 146440   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 14:44:34,508-Speed 3334.76 samples/sec   Loss 2.0353   LearningRate 0.0315   Epoch: 8   Global Step: 146450   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:44:37,579-Speed 3336.12 samples/sec   Loss 2.0059   LearningRate 0.0315   Epoch: 8   Global Step: 146460   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:44:40,656-Speed 3328.76 samples/sec   Loss 2.0019   LearningRate 0.0315   Epoch: 8   Global Step: 146470   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:44:43,725-Speed 3337.08 samples/sec   Loss 2.0287   LearningRate 0.0315   Epoch: 8   Global Step: 146480   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:44:46,791-Speed 3340.80 samples/sec   Loss 2.0315   LearningRate 0.0315   Epoch: 8   Global Step: 146490   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 14:44:49,916-Speed 3277.17 samples/sec   Loss 2.0661   LearningRate 0.0315   Epoch: 8   Global Step: 146500   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 14:44:52,991-Speed 3330.71 samples/sec   Loss 2.0585   LearningRate 0.0315   Epoch: 8   Global Step: 146510   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 14:44:56,069-Speed 3327.37 samples/sec   Loss 2.0399   LearningRate 0.0315   Epoch: 8   Global Step: 146520   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 14:44:59,145-Speed 3329.59 samples/sec   Loss 2.0446   LearningRate 0.0315   Epoch: 8   Global Step: 146530   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 14:45:02,214-Speed 3337.59 samples/sec   Loss 2.0795   LearningRate 0.0315   Epoch: 8   Global Step: 146540   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 14:45:05,288-Speed 3332.15 samples/sec   Loss 2.0552   LearningRate 0.0315   Epoch: 8   Global Step: 146550   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 14:45:08,374-Speed 3318.67 samples/sec   Loss 2.0898   LearningRate 0.0315   Epoch: 8   Global Step: 146560   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 14:45:11,442-Speed 3338.53 samples/sec   Loss 2.0739   LearningRate 0.0315   Epoch: 8   Global Step: 146570   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 14:45:14,533-Speed 3314.03 samples/sec   Loss 2.0612   LearningRate 0.0315   Epoch: 8   Global Step: 146580   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 14:45:17,609-Speed 3330.11 samples/sec   Loss 2.0598   LearningRate 0.0315   Epoch: 8   Global Step: 146590   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:45:20,678-Speed 3336.61 samples/sec   Loss 2.0282   LearningRate 0.0315   Epoch: 8   Global Step: 146600   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:45:23,748-Speed 3337.00 samples/sec   Loss 2.0560   LearningRate 0.0315   Epoch: 8   Global Step: 146610   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 14:45:26,821-Speed 3332.67 samples/sec   Loss 2.0125   LearningRate 0.0314   Epoch: 8   Global Step: 146620   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 14:45:29,914-Speed 3312.33 samples/sec   Loss 2.0505   LearningRate 0.0314   Epoch: 8   Global Step: 146630   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 14:45:32,989-Speed 3330.78 samples/sec   Loss 2.1428   LearningRate 0.0314   Epoch: 8   Global Step: 146640   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 14:45:36,091-Speed 3300.99 samples/sec   Loss 2.1145   LearningRate 0.0314   Epoch: 8   Global Step: 146650   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 14:45:39,160-Speed 3337.82 samples/sec   Loss 2.1012   LearningRate 0.0314   Epoch: 8   Global Step: 146660   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 14:45:42,234-Speed 3332.21 samples/sec   Loss 2.0943   LearningRate 0.0314   Epoch: 8   Global Step: 146670   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 14:45:45,338-Speed 3299.44 samples/sec   Loss 2.1330   LearningRate 0.0314   Epoch: 8   Global Step: 146680   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 14:45:48,422-Speed 3321.07 samples/sec   Loss 2.0085   LearningRate 0.0314   Epoch: 8   Global Step: 146690   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 14:45:51,493-Speed 3334.75 samples/sec   Loss 2.0430   LearningRate 0.0314   Epoch: 8   Global Step: 146700   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 14:45:54,574-Speed 3325.33 samples/sec   Loss 2.1080   LearningRate 0.0314   Epoch: 8   Global Step: 146710   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:45:57,667-Speed 3310.78 samples/sec   Loss 2.0759   LearningRate 0.0314   Epoch: 8   Global Step: 146720   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:46:00,737-Speed 3336.10 samples/sec   Loss 2.0822   LearningRate 0.0314   Epoch: 8   Global Step: 146730   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:46:03,819-Speed 3323.27 samples/sec   Loss 2.0354   LearningRate 0.0314   Epoch: 8   Global Step: 146740   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:46:06,900-Speed 3325.00 samples/sec   Loss 2.0066   LearningRate 0.0314   Epoch: 8   Global Step: 146750   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:46:09,976-Speed 3328.92 samples/sec   Loss 2.0068   LearningRate 0.0314   Epoch: 8   Global Step: 146760   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:46:13,072-Speed 3308.50 samples/sec   Loss 2.0836   LearningRate 0.0314   Epoch: 8   Global Step: 146770   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:46:16,151-Speed 3326.93 samples/sec   Loss 2.0514   LearningRate 0.0314   Epoch: 8   Global Step: 146780   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:46:19,326-Speed 3225.98 samples/sec   Loss 2.0525   LearningRate 0.0314   Epoch: 8   Global Step: 146790   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:46:22,491-Speed 3235.90 samples/sec   Loss 2.0021   LearningRate 0.0314   Epoch: 8   Global Step: 146800   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:46:25,557-Speed 3341.01 samples/sec   Loss 2.0062   LearningRate 0.0314   Epoch: 8   Global Step: 146810   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:46:28,629-Speed 3333.99 samples/sec   Loss 1.9810   LearningRate 0.0314   Epoch: 8   Global Step: 146820   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:46:31,700-Speed 3335.38 samples/sec   Loss 2.0968   LearningRate 0.0314   Epoch: 8   Global Step: 146830   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:46:34,772-Speed 3333.96 samples/sec   Loss 2.0488   LearningRate 0.0314   Epoch: 8   Global Step: 146840   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:46:37,844-Speed 3334.01 samples/sec   Loss 1.9980   LearningRate 0.0314   Epoch: 8   Global Step: 146850   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:46:40,909-Speed 3341.70 samples/sec   Loss 1.9969   LearningRate 0.0314   Epoch: 8   Global Step: 146860   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 14:46:44,008-Speed 3305.09 samples/sec   Loss 2.0261   LearningRate 0.0314   Epoch: 8   Global Step: 146870   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 14:46:47,083-Speed 3330.87 samples/sec   Loss 2.0821   LearningRate 0.0314   Epoch: 8   Global Step: 146880   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 14:46:50,185-Speed 3301.21 samples/sec   Loss 2.0348   LearningRate 0.0314   Epoch: 8   Global Step: 146890   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 14:46:53,319-Speed 3268.65 samples/sec   Loss 2.0827   LearningRate 0.0314   Epoch: 8   Global Step: 146900   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 14:46:56,402-Speed 3322.25 samples/sec   Loss 2.0881   LearningRate 0.0314   Epoch: 8   Global Step: 146910   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 14:46:59,475-Speed 3333.01 samples/sec   Loss 2.0373   LearningRate 0.0313   Epoch: 8   Global Step: 146920   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 14:47:02,553-Speed 3327.17 samples/sec   Loss 2.0568   LearningRate 0.0313   Epoch: 8   Global Step: 146930   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 14:47:05,622-Speed 3337.80 samples/sec   Loss 2.1263   LearningRate 0.0313   Epoch: 8   Global Step: 146940   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 14:47:08,700-Speed 3326.79 samples/sec   Loss 2.0154   LearningRate 0.0313   Epoch: 8   Global Step: 146950   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 14:47:11,788-Speed 3317.86 samples/sec   Loss 2.0377   LearningRate 0.0313   Epoch: 8   Global Step: 146960   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:47:14,857-Speed 3337.48 samples/sec   Loss 2.0607   LearningRate 0.0313   Epoch: 8   Global Step: 146970   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:47:17,932-Speed 3330.70 samples/sec   Loss 2.0932   LearningRate 0.0313   Epoch: 8   Global Step: 146980   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:47:21,007-Speed 3330.58 samples/sec   Loss 2.0747   LearningRate 0.0313   Epoch: 8   Global Step: 146990   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:47:24,088-Speed 3324.19 samples/sec   Loss 2.0102   LearningRate 0.0313   Epoch: 8   Global Step: 147000   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:47:27,219-Speed 3270.91 samples/sec   Loss 2.0437   LearningRate 0.0313   Epoch: 8   Global Step: 147010   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:47:30,293-Speed 3331.55 samples/sec   Loss 2.0157   LearningRate 0.0313   Epoch: 8   Global Step: 147020   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:47:33,376-Speed 3322.23 samples/sec   Loss 2.0207   LearningRate 0.0313   Epoch: 8   Global Step: 147030   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:47:36,473-Speed 3307.86 samples/sec   Loss 2.0527   LearningRate 0.0313   Epoch: 8   Global Step: 147040   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:47:39,550-Speed 3328.57 samples/sec   Loss 2.0451   LearningRate 0.0313   Epoch: 8   Global Step: 147050   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:47:42,627-Speed 3329.23 samples/sec   Loss 2.0539   LearningRate 0.0313   Epoch: 8   Global Step: 147060   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:47:45,715-Speed 3316.85 samples/sec   Loss 2.0826   LearningRate 0.0313   Epoch: 8   Global Step: 147070   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:47:48,819-Speed 3299.12 samples/sec   Loss 1.9947   LearningRate 0.0313   Epoch: 8   Global Step: 147080   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:47:51,900-Speed 3325.25 samples/sec   Loss 2.0666   LearningRate 0.0313   Epoch: 8   Global Step: 147090   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 14:47:54,977-Speed 3328.32 samples/sec   Loss 2.0673   LearningRate 0.0313   Epoch: 8   Global Step: 147100   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 14:47:58,056-Speed 3326.35 samples/sec   Loss 2.0351   LearningRate 0.0313   Epoch: 8   Global Step: 147110   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 14:48:01,140-Speed 3320.35 samples/sec   Loss 2.0255   LearningRate 0.0313   Epoch: 8   Global Step: 147120   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 14:48:04,225-Speed 3320.66 samples/sec   Loss 2.0734   LearningRate 0.0313   Epoch: 8   Global Step: 147130   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 14:48:07,315-Speed 3314.80 samples/sec   Loss 2.0253   LearningRate 0.0313   Epoch: 8   Global Step: 147140   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 14:48:10,386-Speed 3335.17 samples/sec   Loss 1.9968   LearningRate 0.0313   Epoch: 8   Global Step: 147150   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 14:48:13,509-Speed 3279.49 samples/sec   Loss 2.0208   LearningRate 0.0313   Epoch: 8   Global Step: 147160   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 14:48:16,581-Speed 3334.78 samples/sec   Loss 2.0506   LearningRate 0.0313   Epoch: 8   Global Step: 147170   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 14:48:19,657-Speed 3329.60 samples/sec   Loss 2.0246   LearningRate 0.0313   Epoch: 8   Global Step: 147180   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 14:48:22,734-Speed 3328.64 samples/sec   Loss 2.0884   LearningRate 0.0313   Epoch: 8   Global Step: 147190   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:48:25,813-Speed 3326.04 samples/sec   Loss 2.0167   LearningRate 0.0313   Epoch: 8   Global Step: 147200   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:48:28,898-Speed 3320.24 samples/sec   Loss 2.0387   LearningRate 0.0312   Epoch: 8   Global Step: 147210   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:48:31,986-Speed 3316.78 samples/sec   Loss 2.0741   LearningRate 0.0312   Epoch: 8   Global Step: 147220   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:48:35,064-Speed 3328.40 samples/sec   Loss 2.0754   LearningRate 0.0312   Epoch: 8   Global Step: 147230   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:48:38,151-Speed 3317.54 samples/sec   Loss 2.0665   LearningRate 0.0312   Epoch: 8   Global Step: 147240   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:48:41,241-Speed 3314.27 samples/sec   Loss 2.1209   LearningRate 0.0312   Epoch: 8   Global Step: 147250   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:48:44,311-Speed 3336.32 samples/sec   Loss 2.0476   LearningRate 0.0312   Epoch: 8   Global Step: 147260   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 14:48:47,388-Speed 3329.17 samples/sec   Loss 2.1212   LearningRate 0.0312   Epoch: 8   Global Step: 147270   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 14:48:50,463-Speed 3330.61 samples/sec   Loss 2.0786   LearningRate 0.0312   Epoch: 8   Global Step: 147280   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 14:48:53,549-Speed 3318.54 samples/sec   Loss 2.0627   LearningRate 0.0312   Epoch: 8   Global Step: 147290   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 14:48:56,626-Speed 3329.05 samples/sec   Loss 2.0542   LearningRate 0.0312   Epoch: 8   Global Step: 147300   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 14:48:59,731-Speed 3299.04 samples/sec   Loss 1.9436   LearningRate 0.0312   Epoch: 8   Global Step: 147310   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 14:49:02,815-Speed 3320.79 samples/sec   Loss 2.1083   LearningRate 0.0312   Epoch: 8   Global Step: 147320   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 14:49:05,897-Speed 3323.86 samples/sec   Loss 2.0356   LearningRate 0.0312   Epoch: 8   Global Step: 147330   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 14:49:08,973-Speed 3329.29 samples/sec   Loss 2.0538   LearningRate 0.0312   Epoch: 8   Global Step: 147340   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 14:49:12,053-Speed 3325.40 samples/sec   Loss 2.0022   LearningRate 0.0312   Epoch: 8   Global Step: 147350   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 14:49:15,148-Speed 3309.43 samples/sec   Loss 2.0171   LearningRate 0.0312   Epoch: 8   Global Step: 147360   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:49:18,253-Speed 3298.34 samples/sec   Loss 2.0396   LearningRate 0.0312   Epoch: 8   Global Step: 147370   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:49:21,345-Speed 3312.41 samples/sec   Loss 2.0729   LearningRate 0.0312   Epoch: 8   Global Step: 147380   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:49:24,433-Speed 3316.86 samples/sec   Loss 2.1010   LearningRate 0.0312   Epoch: 8   Global Step: 147390   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:49:27,514-Speed 3325.21 samples/sec   Loss 2.0026   LearningRate 0.0312   Epoch: 8   Global Step: 147400   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:49:30,586-Speed 3333.57 samples/sec   Loss 2.0542   LearningRate 0.0312   Epoch: 8   Global Step: 147410   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:49:33,659-Speed 3333.64 samples/sec   Loss 2.0545   LearningRate 0.0312   Epoch: 8   Global Step: 147420   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:49:36,736-Speed 3328.69 samples/sec   Loss 2.0749   LearningRate 0.0312   Epoch: 8   Global Step: 147430   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:49:39,827-Speed 3313.89 samples/sec   Loss 2.0414   LearningRate 0.0312   Epoch: 8   Global Step: 147440   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:49:42,905-Speed 3326.58 samples/sec   Loss 2.0880   LearningRate 0.0312   Epoch: 8   Global Step: 147450   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:49:45,971-Speed 3340.91 samples/sec   Loss 2.0573   LearningRate 0.0312   Epoch: 8   Global Step: 147460   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:49:49,035-Speed 3342.32 samples/sec   Loss 2.0214   LearningRate 0.0312   Epoch: 8   Global Step: 147470   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 14:49:52,121-Speed 3318.94 samples/sec   Loss 2.0199   LearningRate 0.0312   Epoch: 8   Global Step: 147480   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 14:49:55,224-Speed 3301.90 samples/sec   Loss 2.0519   LearningRate 0.0312   Epoch: 8   Global Step: 147490   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 14:49:58,364-Speed 3261.07 samples/sec   Loss 2.0388   LearningRate 0.0312   Epoch: 8   Global Step: 147500   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 14:50:01,451-Speed 3317.73 samples/sec   Loss 1.9403   LearningRate 0.0311   Epoch: 8   Global Step: 147510   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 14:50:04,539-Speed 3317.63 samples/sec   Loss 2.0569   LearningRate 0.0311   Epoch: 8   Global Step: 147520   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 14:50:07,649-Speed 3293.32 samples/sec   Loss 1.9911   LearningRate 0.0311   Epoch: 8   Global Step: 147530   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 14:50:10,727-Speed 3327.51 samples/sec   Loss 2.0801   LearningRate 0.0311   Epoch: 8   Global Step: 147540   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 14:50:13,809-Speed 3322.52 samples/sec   Loss 2.0463   LearningRate 0.0311   Epoch: 8   Global Step: 147550   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 14:50:16,900-Speed 3314.20 samples/sec   Loss 2.0845   LearningRate 0.0311   Epoch: 8   Global Step: 147560   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 14:50:19,980-Speed 3325.10 samples/sec   Loss 2.0370   LearningRate 0.0311   Epoch: 8   Global Step: 147570   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:50:23,089-Speed 3294.45 samples/sec   Loss 2.1298   LearningRate 0.0311   Epoch: 8   Global Step: 147580   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 14:50:26,165-Speed 3330.49 samples/sec   Loss 2.0335   LearningRate 0.0311   Epoch: 8   Global Step: 147590   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 14:50:29,247-Speed 3323.12 samples/sec   Loss 2.0904   LearningRate 0.0311   Epoch: 8   Global Step: 147600   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 14:50:32,332-Speed 3319.39 samples/sec   Loss 2.1127   LearningRate 0.0311   Epoch: 8   Global Step: 147610   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 14:50:35,410-Speed 3327.63 samples/sec   Loss 2.0284   LearningRate 0.0311   Epoch: 8   Global Step: 147620   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 14:50:38,498-Speed 3317.53 samples/sec   Loss 2.0194   LearningRate 0.0311   Epoch: 8   Global Step: 147630   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 14:50:41,582-Speed 3320.54 samples/sec   Loss 2.0472   LearningRate 0.0311   Epoch: 8   Global Step: 147640   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 14:50:44,667-Speed 3319.67 samples/sec   Loss 2.0686   LearningRate 0.0311   Epoch: 8   Global Step: 147650   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 14:50:47,763-Speed 3308.79 samples/sec   Loss 2.0092   LearningRate 0.0311   Epoch: 8   Global Step: 147660   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 14:50:50,842-Speed 3326.75 samples/sec   Loss 2.0083   LearningRate 0.0311   Epoch: 8   Global Step: 147670   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 14:50:53,913-Speed 3335.47 samples/sec   Loss 2.0516   LearningRate 0.0311   Epoch: 8   Global Step: 147680   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 14:50:56,993-Speed 3324.95 samples/sec   Loss 2.0802   LearningRate 0.0311   Epoch: 8   Global Step: 147690   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 14:51:00,098-Speed 3298.95 samples/sec   Loss 2.0461   LearningRate 0.0311   Epoch: 8   Global Step: 147700   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 14:51:03,198-Speed 3303.61 samples/sec   Loss 1.9729   LearningRate 0.0311   Epoch: 8   Global Step: 147710   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 14:51:06,281-Speed 3322.69 samples/sec   Loss 2.0129   LearningRate 0.0311   Epoch: 8   Global Step: 147720   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 14:51:09,360-Speed 3326.77 samples/sec   Loss 2.0150   LearningRate 0.0311   Epoch: 8   Global Step: 147730   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 14:51:12,448-Speed 3316.09 samples/sec   Loss 2.0085   LearningRate 0.0311   Epoch: 8   Global Step: 147740   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 14:51:15,543-Speed 3309.69 samples/sec   Loss 2.0627   LearningRate 0.0311   Epoch: 8   Global Step: 147750   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 14:51:18,631-Speed 3317.09 samples/sec   Loss 2.0306   LearningRate 0.0311   Epoch: 8   Global Step: 147760   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 14:51:21,792-Speed 3239.82 samples/sec   Loss 2.0366   LearningRate 0.0311   Epoch: 8   Global Step: 147770   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 14:51:24,872-Speed 3325.65 samples/sec   Loss 2.0431   LearningRate 0.0311   Epoch: 8   Global Step: 147780   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:51:27,961-Speed 3315.89 samples/sec   Loss 2.0739   LearningRate 0.0311   Epoch: 8   Global Step: 147790   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:51:31,039-Speed 3327.29 samples/sec   Loss 2.0286   LearningRate 0.0311   Epoch: 8   Global Step: 147800   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:51:34,119-Speed 3325.88 samples/sec   Loss 2.0367   LearningRate 0.0310   Epoch: 8   Global Step: 147810   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:51:37,203-Speed 3321.30 samples/sec   Loss 2.0113   LearningRate 0.0310   Epoch: 8   Global Step: 147820   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:51:40,282-Speed 3326.55 samples/sec   Loss 2.0467   LearningRate 0.0310   Epoch: 8   Global Step: 147830   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:51:43,363-Speed 3323.70 samples/sec   Loss 2.0217   LearningRate 0.0310   Epoch: 8   Global Step: 147840   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:51:46,439-Speed 3330.26 samples/sec   Loss 2.0367   LearningRate 0.0310   Epoch: 8   Global Step: 147850   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:51:49,525-Speed 3319.99 samples/sec   Loss 2.0869   LearningRate 0.0310   Epoch: 8   Global Step: 147860   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:51:52,621-Speed 3307.69 samples/sec   Loss 2.0162   LearningRate 0.0310   Epoch: 8   Global Step: 147870   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:51:55,684-Speed 3343.95 samples/sec   Loss 2.0466   LearningRate 0.0310   Epoch: 8   Global Step: 147880   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:51:58,760-Speed 3329.09 samples/sec   Loss 1.9899   LearningRate 0.0310   Epoch: 8   Global Step: 147890   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:52:01,849-Speed 3316.02 samples/sec   Loss 2.0554   LearningRate 0.0310   Epoch: 8   Global Step: 147900   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:52:04,928-Speed 3326.44 samples/sec   Loss 2.0326   LearningRate 0.0310   Epoch: 8   Global Step: 147910   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:52:08,018-Speed 3314.96 samples/sec   Loss 2.0810   LearningRate 0.0310   Epoch: 8   Global Step: 147920   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:52:11,101-Speed 3322.25 samples/sec   Loss 2.0592   LearningRate 0.0310   Epoch: 8   Global Step: 147930   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:52:14,183-Speed 3322.84 samples/sec   Loss 1.9841   LearningRate 0.0310   Epoch: 8   Global Step: 147940   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:52:17,287-Speed 3300.16 samples/sec   Loss 2.0366   LearningRate 0.0310   Epoch: 8   Global Step: 147950   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:52:20,361-Speed 3332.54 samples/sec   Loss 2.0153   LearningRate 0.0310   Epoch: 8   Global Step: 147960   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 14:52:23,448-Speed 3317.63 samples/sec   Loss 2.0127   LearningRate 0.0310   Epoch: 8   Global Step: 147970   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 14:52:26,524-Speed 3329.28 samples/sec   Loss 1.9916   LearningRate 0.0310   Epoch: 8   Global Step: 147980   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 14:52:29,657-Speed 3269.55 samples/sec   Loss 2.0120   LearningRate 0.0310   Epoch: 8   Global Step: 147990   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 14:52:32,846-Speed 3211.67 samples/sec   Loss 2.0431   LearningRate 0.0310   Epoch: 8   Global Step: 148000   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 14:53:17,249-[lfw][148000]XNorm: 21.490949
Training: 2022-04-11 14:53:17,250-[lfw][148000]Accuracy-Flip: 0.99800+-0.00287
Training: 2022-04-11 14:53:17,250-[lfw][148000]Accuracy-Highest: 0.99817
Training: 2022-04-11 14:54:08,579-[cfp_fp][148000]XNorm: 20.830298
Training: 2022-04-11 14:54:08,579-[cfp_fp][148000]Accuracy-Flip: 0.98814+-0.00495
Training: 2022-04-11 14:54:08,580-[cfp_fp][148000]Accuracy-Highest: 0.98814
Training: 2022-04-11 14:54:52,731-[agedb_30][148000]XNorm: 21.667683
Training: 2022-04-11 14:54:52,731-[agedb_30][148000]Accuracy-Flip: 0.98283+-0.00679
Training: 2022-04-11 14:54:52,732-[agedb_30][148000]Accuracy-Highest: 0.98317
Training: 2022-04-11 14:54:55,819-Speed 71.62 samples/sec   Loss 1.9786   LearningRate 0.0310   Epoch: 8   Global Step: 148010   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 14:54:58,880-Speed 3346.40 samples/sec   Loss 2.0399   LearningRate 0.0310   Epoch: 8   Global Step: 148020   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 14:55:01,958-Speed 3328.08 samples/sec   Loss 2.0168   LearningRate 0.0310   Epoch: 8   Global Step: 148030   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 14:55:05,023-Speed 3341.08 samples/sec   Loss 2.0809   LearningRate 0.0310   Epoch: 8   Global Step: 148040   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 14:55:08,108-Speed 3319.97 samples/sec   Loss 2.0251   LearningRate 0.0310   Epoch: 8   Global Step: 148050   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 14:55:11,178-Speed 3336.61 samples/sec   Loss 2.0496   LearningRate 0.0310   Epoch: 8   Global Step: 148060   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:55:14,248-Speed 3336.48 samples/sec   Loss 1.9841   LearningRate 0.0310   Epoch: 8   Global Step: 148070   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:55:17,317-Speed 3336.93 samples/sec   Loss 2.0156   LearningRate 0.0310   Epoch: 8   Global Step: 148080   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:55:20,395-Speed 3327.38 samples/sec   Loss 2.0567   LearningRate 0.0310   Epoch: 8   Global Step: 148090   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:55:23,467-Speed 3334.84 samples/sec   Loss 2.1223   LearningRate 0.0310   Epoch: 8   Global Step: 148100   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:55:26,539-Speed 3333.67 samples/sec   Loss 2.0144   LearningRate 0.0309   Epoch: 8   Global Step: 148110   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:55:29,609-Speed 3336.63 samples/sec   Loss 1.9296   LearningRate 0.0309   Epoch: 8   Global Step: 148120   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 14:55:32,696-Speed 3317.48 samples/sec   Loss 2.0890   LearningRate 0.0309   Epoch: 8   Global Step: 148130   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 14:55:35,770-Speed 3332.13 samples/sec   Loss 2.0438   LearningRate 0.0309   Epoch: 8   Global Step: 148140   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 14:55:38,852-Speed 3324.10 samples/sec   Loss 2.0225   LearningRate 0.0309   Epoch: 8   Global Step: 148150   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 14:55:41,940-Speed 3316.38 samples/sec   Loss 2.0045   LearningRate 0.0309   Epoch: 8   Global Step: 148160   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 14:55:45,027-Speed 3317.67 samples/sec   Loss 2.0634   LearningRate 0.0309   Epoch: 8   Global Step: 148170   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 14:55:48,127-Speed 3304.21 samples/sec   Loss 2.0341   LearningRate 0.0309   Epoch: 8   Global Step: 148180   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 14:55:51,224-Speed 3306.99 samples/sec   Loss 2.0715   LearningRate 0.0309   Epoch: 8   Global Step: 148190   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 14:55:54,296-Speed 3334.66 samples/sec   Loss 2.0701   LearningRate 0.0309   Epoch: 8   Global Step: 148200   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 14:55:57,385-Speed 3315.65 samples/sec   Loss 1.9885   LearningRate 0.0309   Epoch: 8   Global Step: 148210   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 14:56:00,457-Speed 3334.18 samples/sec   Loss 2.0790   LearningRate 0.0309   Epoch: 8   Global Step: 148220   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:56:03,536-Speed 3325.83 samples/sec   Loss 1.9908   LearningRate 0.0309   Epoch: 8   Global Step: 148230   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:56:06,607-Speed 3336.42 samples/sec   Loss 2.0684   LearningRate 0.0309   Epoch: 8   Global Step: 148240   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:56:09,679-Speed 3333.02 samples/sec   Loss 2.0605   LearningRate 0.0309   Epoch: 8   Global Step: 148250   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:56:12,749-Speed 3336.61 samples/sec   Loss 2.0273   LearningRate 0.0309   Epoch: 8   Global Step: 148260   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:56:15,820-Speed 3335.47 samples/sec   Loss 2.0865   LearningRate 0.0309   Epoch: 8   Global Step: 148270   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:56:18,915-Speed 3308.70 samples/sec   Loss 1.9915   LearningRate 0.0309   Epoch: 8   Global Step: 148280   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:56:21,987-Speed 3334.18 samples/sec   Loss 2.0772   LearningRate 0.0309   Epoch: 8   Global Step: 148290   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:56:25,058-Speed 3335.80 samples/sec   Loss 1.9888   LearningRate 0.0309   Epoch: 8   Global Step: 148300   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:56:28,148-Speed 3314.38 samples/sec   Loss 2.0520   LearningRate 0.0309   Epoch: 8   Global Step: 148310   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:56:31,276-Speed 3274.67 samples/sec   Loss 2.0135   LearningRate 0.0309   Epoch: 8   Global Step: 148320   Fp16 Grad Scale: 262144   Required: 20 hours
Training: 2022-04-11 14:56:34,333-Speed 3349.91 samples/sec   Loss 2.0789   LearningRate 0.0309   Epoch: 8   Global Step: 148330   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 14:56:37,436-Speed 3301.85 samples/sec   Loss 2.0255   LearningRate 0.0309   Epoch: 8   Global Step: 148340   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 14:56:40,541-Speed 3298.53 samples/sec   Loss 2.0430   LearningRate 0.0309   Epoch: 8   Global Step: 148350   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 14:56:43,636-Speed 3308.57 samples/sec   Loss 2.0637   LearningRate 0.0309   Epoch: 8   Global Step: 148360   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 14:56:46,751-Speed 3288.42 samples/sec   Loss 1.9999   LearningRate 0.0309   Epoch: 8   Global Step: 148370   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 14:56:49,870-Speed 3284.07 samples/sec   Loss 2.0780   LearningRate 0.0309   Epoch: 8   Global Step: 148380   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 14:56:53,033-Speed 3237.77 samples/sec   Loss 2.1053   LearningRate 0.0309   Epoch: 8   Global Step: 148390   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 14:56:56,138-Speed 3299.04 samples/sec   Loss 1.9978   LearningRate 0.0309   Epoch: 8   Global Step: 148400   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 14:56:59,209-Speed 3335.36 samples/sec   Loss 1.9895   LearningRate 0.0308   Epoch: 8   Global Step: 148410   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 14:57:02,316-Speed 3296.42 samples/sec   Loss 2.0144   LearningRate 0.0308   Epoch: 8   Global Step: 148420   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 14:57:05,403-Speed 3318.50 samples/sec   Loss 2.0571   LearningRate 0.0308   Epoch: 8   Global Step: 148430   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:57:08,473-Speed 3335.57 samples/sec   Loss 2.0460   LearningRate 0.0308   Epoch: 8   Global Step: 148440   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:57:11,580-Speed 3297.10 samples/sec   Loss 2.0391   LearningRate 0.0308   Epoch: 8   Global Step: 148450   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:57:14,655-Speed 3330.91 samples/sec   Loss 2.0159   LearningRate 0.0308   Epoch: 8   Global Step: 148460   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:57:17,733-Speed 3326.85 samples/sec   Loss 2.0925   LearningRate 0.0308   Epoch: 8   Global Step: 148470   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:57:20,836-Speed 3300.82 samples/sec   Loss 2.0257   LearningRate 0.0308   Epoch: 8   Global Step: 148480   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:57:23,932-Speed 3308.47 samples/sec   Loss 2.0830   LearningRate 0.0308   Epoch: 8   Global Step: 148490   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:57:27,016-Speed 3321.21 samples/sec   Loss 2.0259   LearningRate 0.0308   Epoch: 8   Global Step: 148500   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:57:30,095-Speed 3326.70 samples/sec   Loss 2.0053   LearningRate 0.0308   Epoch: 8   Global Step: 148510   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:57:33,176-Speed 3324.47 samples/sec   Loss 2.0814   LearningRate 0.0308   Epoch: 8   Global Step: 148520   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:57:36,252-Speed 3330.31 samples/sec   Loss 2.0345   LearningRate 0.0308   Epoch: 8   Global Step: 148530   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:57:39,332-Speed 3324.74 samples/sec   Loss 2.1077   LearningRate 0.0308   Epoch: 8   Global Step: 148540   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 14:57:42,409-Speed 3328.26 samples/sec   Loss 2.0205   LearningRate 0.0308   Epoch: 8   Global Step: 148550   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 14:57:45,484-Speed 3330.79 samples/sec   Loss 2.0377   LearningRate 0.0308   Epoch: 8   Global Step: 148560   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 14:57:48,557-Speed 3333.05 samples/sec   Loss 1.9562   LearningRate 0.0308   Epoch: 8   Global Step: 148570   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 14:57:51,739-Speed 3218.76 samples/sec   Loss 2.1073   LearningRate 0.0308   Epoch: 8   Global Step: 148580   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 14:57:54,986-Speed 3154.56 samples/sec   Loss 2.1033   LearningRate 0.0308   Epoch: 8   Global Step: 148590   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 14:57:58,099-Speed 3290.60 samples/sec   Loss 2.0062   LearningRate 0.0308   Epoch: 8   Global Step: 148600   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 14:58:01,183-Speed 3321.95 samples/sec   Loss 2.0295   LearningRate 0.0308   Epoch: 8   Global Step: 148610   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 14:58:04,288-Speed 3297.83 samples/sec   Loss 2.0523   LearningRate 0.0308   Epoch: 8   Global Step: 148620   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 14:58:07,363-Speed 3330.94 samples/sec   Loss 2.0169   LearningRate 0.0308   Epoch: 8   Global Step: 148630   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 14:58:10,450-Speed 3318.38 samples/sec   Loss 2.1267   LearningRate 0.0308   Epoch: 8   Global Step: 148640   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:58:13,544-Speed 3309.66 samples/sec   Loss 1.9838   LearningRate 0.0308   Epoch: 8   Global Step: 148650   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:58:16,616-Speed 3334.27 samples/sec   Loss 2.0743   LearningRate 0.0308   Epoch: 8   Global Step: 148660   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:58:19,726-Speed 3293.35 samples/sec   Loss 2.0886   LearningRate 0.0308   Epoch: 8   Global Step: 148670   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:58:22,809-Speed 3323.08 samples/sec   Loss 2.0756   LearningRate 0.0308   Epoch: 8   Global Step: 148680   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:58:25,879-Speed 3336.07 samples/sec   Loss 1.9950   LearningRate 0.0308   Epoch: 8   Global Step: 148690   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:58:28,949-Speed 3336.63 samples/sec   Loss 2.0043   LearningRate 0.0308   Epoch: 8   Global Step: 148700   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:58:32,022-Speed 3332.40 samples/sec   Loss 2.0760   LearningRate 0.0307   Epoch: 8   Global Step: 148710   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:58:35,106-Speed 3320.86 samples/sec   Loss 1.9368   LearningRate 0.0307   Epoch: 8   Global Step: 148720   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:58:38,190-Speed 3321.19 samples/sec   Loss 1.9785   LearningRate 0.0307   Epoch: 8   Global Step: 148730   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:58:41,252-Speed 3345.43 samples/sec   Loss 1.9716   LearningRate 0.0307   Epoch: 8   Global Step: 148740   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:58:44,366-Speed 3288.60 samples/sec   Loss 2.0294   LearningRate 0.0307   Epoch: 8   Global Step: 148750   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:58:47,444-Speed 3328.74 samples/sec   Loss 2.0391   LearningRate 0.0307   Epoch: 8   Global Step: 148760   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:58:50,520-Speed 3329.92 samples/sec   Loss 1.9906   LearningRate 0.0307   Epoch: 8   Global Step: 148770   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:58:53,604-Speed 3320.56 samples/sec   Loss 2.0282   LearningRate 0.0307   Epoch: 8   Global Step: 148780   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:58:56,681-Speed 3329.40 samples/sec   Loss 2.0277   LearningRate 0.0307   Epoch: 8   Global Step: 148790   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:58:59,780-Speed 3305.10 samples/sec   Loss 2.0219   LearningRate 0.0307   Epoch: 8   Global Step: 148800   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:59:02,872-Speed 3312.05 samples/sec   Loss 2.0091   LearningRate 0.0307   Epoch: 8   Global Step: 148810   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:59:05,973-Speed 3303.43 samples/sec   Loss 2.0429   LearningRate 0.0307   Epoch: 8   Global Step: 148820   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:59:09,070-Speed 3306.69 samples/sec   Loss 2.0715   LearningRate 0.0307   Epoch: 8   Global Step: 148830   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:59:12,129-Speed 3347.95 samples/sec   Loss 2.0379   LearningRate 0.0307   Epoch: 8   Global Step: 148840   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:59:15,210-Speed 3324.71 samples/sec   Loss 2.1013   LearningRate 0.0307   Epoch: 8   Global Step: 148850   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:59:18,286-Speed 3329.80 samples/sec   Loss 2.0169   LearningRate 0.0307   Epoch: 8   Global Step: 148860   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:59:21,360-Speed 3332.50 samples/sec   Loss 1.9948   LearningRate 0.0307   Epoch: 8   Global Step: 148870   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:59:24,509-Speed 3252.16 samples/sec   Loss 2.0076   LearningRate 0.0307   Epoch: 8   Global Step: 148880   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:59:27,736-Speed 3174.01 samples/sec   Loss 2.0141   LearningRate 0.0307   Epoch: 8   Global Step: 148890   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:59:30,815-Speed 3326.89 samples/sec   Loss 2.0299   LearningRate 0.0307   Epoch: 8   Global Step: 148900   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 14:59:33,910-Speed 3308.58 samples/sec   Loss 1.9946   LearningRate 0.0307   Epoch: 8   Global Step: 148910   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 14:59:36,982-Speed 3334.42 samples/sec   Loss 2.0301   LearningRate 0.0307   Epoch: 8   Global Step: 148920   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 14:59:40,066-Speed 3321.77 samples/sec   Loss 2.0185   LearningRate 0.0307   Epoch: 8   Global Step: 148930   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 14:59:43,142-Speed 3330.07 samples/sec   Loss 2.0027   LearningRate 0.0307   Epoch: 8   Global Step: 148940   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 14:59:46,276-Speed 3268.34 samples/sec   Loss 1.9951   LearningRate 0.0307   Epoch: 8   Global Step: 148950   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 14:59:49,359-Speed 3322.34 samples/sec   Loss 1.9583   LearningRate 0.0307   Epoch: 8   Global Step: 148960   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 14:59:52,433-Speed 3331.85 samples/sec   Loss 2.0325   LearningRate 0.0307   Epoch: 8   Global Step: 148970   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 14:59:55,512-Speed 3326.42 samples/sec   Loss 2.0392   LearningRate 0.0307   Epoch: 8   Global Step: 148980   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 14:59:58,600-Speed 3316.81 samples/sec   Loss 2.0359   LearningRate 0.0307   Epoch: 8   Global Step: 148990   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 15:00:01,697-Speed 3307.17 samples/sec   Loss 1.9940   LearningRate 0.0307   Epoch: 8   Global Step: 149000   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 15:00:04,773-Speed 3328.90 samples/sec   Loss 2.0414   LearningRate 0.0306   Epoch: 8   Global Step: 149010   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 15:00:07,853-Speed 3326.33 samples/sec   Loss 1.9355   LearningRate 0.0306   Epoch: 8   Global Step: 149020   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 15:00:10,933-Speed 3325.13 samples/sec   Loss 2.0255   LearningRate 0.0306   Epoch: 8   Global Step: 149030   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 15:00:14,021-Speed 3316.72 samples/sec   Loss 2.0423   LearningRate 0.0306   Epoch: 8   Global Step: 149040   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 15:00:17,122-Speed 3303.39 samples/sec   Loss 2.0686   LearningRate 0.0306   Epoch: 8   Global Step: 149050   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 15:00:20,193-Speed 3335.19 samples/sec   Loss 2.0066   LearningRate 0.0306   Epoch: 8   Global Step: 149060   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 15:00:23,269-Speed 3329.66 samples/sec   Loss 2.0286   LearningRate 0.0306   Epoch: 8   Global Step: 149070   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 15:00:26,347-Speed 3327.87 samples/sec   Loss 2.0139   LearningRate 0.0306   Epoch: 8   Global Step: 149080   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 15:00:29,429-Speed 3323.65 samples/sec   Loss 1.9894   LearningRate 0.0306   Epoch: 8   Global Step: 149090   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 15:00:32,505-Speed 3329.07 samples/sec   Loss 2.0639   LearningRate 0.0306   Epoch: 8   Global Step: 149100   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 15:00:35,672-Speed 3234.19 samples/sec   Loss 2.0810   LearningRate 0.0306   Epoch: 8   Global Step: 149110   Fp16 Grad Scale: 262144   Required: 20 hours
Training: 2022-04-11 15:00:38,746-Speed 3332.34 samples/sec   Loss 2.0411   LearningRate 0.0306   Epoch: 8   Global Step: 149120   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 15:00:41,895-Speed 3252.86 samples/sec   Loss 2.0499   LearningRate 0.0306   Epoch: 8   Global Step: 149130   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 15:00:45,031-Speed 3265.72 samples/sec   Loss 2.0499   LearningRate 0.0306   Epoch: 8   Global Step: 149140   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 15:00:48,116-Speed 3325.09 samples/sec   Loss 2.0150   LearningRate 0.0306   Epoch: 8   Global Step: 149150   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 15:00:51,292-Speed 3224.99 samples/sec   Loss 2.0323   LearningRate 0.0306   Epoch: 8   Global Step: 149160   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 15:00:54,383-Speed 3312.72 samples/sec   Loss 2.0179   LearningRate 0.0306   Epoch: 8   Global Step: 149170   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 15:00:57,470-Speed 3318.24 samples/sec   Loss 2.0204   LearningRate 0.0306   Epoch: 8   Global Step: 149180   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 15:01:00,564-Speed 3310.57 samples/sec   Loss 1.9679   LearningRate 0.0306   Epoch: 8   Global Step: 149190   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 15:01:03,642-Speed 3327.90 samples/sec   Loss 2.0019   LearningRate 0.0306   Epoch: 8   Global Step: 149200   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 15:01:06,745-Speed 3300.44 samples/sec   Loss 2.0507   LearningRate 0.0306   Epoch: 8   Global Step: 149210   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 15:01:09,820-Speed 3331.16 samples/sec   Loss 2.0451   LearningRate 0.0306   Epoch: 8   Global Step: 149220   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-11 15:01:12,917-Speed 3306.81 samples/sec   Loss 2.0684   LearningRate 0.0306   Epoch: 8   Global Step: 149230   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-11 15:01:16,099-Speed 3219.67 samples/sec   Loss 2.0413   LearningRate 0.0306   Epoch: 8   Global Step: 149240   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-11 15:01:19,258-Speed 3241.46 samples/sec   Loss 1.9586   LearningRate 0.0306   Epoch: 8   Global Step: 149250   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-11 15:01:22,338-Speed 3325.25 samples/sec   Loss 1.9697   LearningRate 0.0306   Epoch: 8   Global Step: 149260   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-11 15:01:25,413-Speed 3330.68 samples/sec   Loss 2.0080   LearningRate 0.0306   Epoch: 8   Global Step: 149270   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-11 15:01:28,489-Speed 3330.99 samples/sec   Loss 2.0365   LearningRate 0.0306   Epoch: 8   Global Step: 149280   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-11 15:01:31,562-Speed 3333.10 samples/sec   Loss 2.0319   LearningRate 0.0306   Epoch: 8   Global Step: 149290   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-11 15:01:34,625-Speed 3343.44 samples/sec   Loss 2.0867   LearningRate 0.0306   Epoch: 8   Global Step: 149300   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 15:01:37,696-Speed 3335.45 samples/sec   Loss 2.0769   LearningRate 0.0306   Epoch: 8   Global Step: 149310   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 15:01:40,784-Speed 3316.93 samples/sec   Loss 2.0183   LearningRate 0.0305   Epoch: 8   Global Step: 149320   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 15:01:43,871-Speed 3317.31 samples/sec   Loss 1.9777   LearningRate 0.0305   Epoch: 8   Global Step: 149330   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 15:01:46,953-Speed 3323.82 samples/sec   Loss 2.1151   LearningRate 0.0305   Epoch: 8   Global Step: 149340   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 15:01:50,034-Speed 3324.35 samples/sec   Loss 2.0511   LearningRate 0.0305   Epoch: 8   Global Step: 149350   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 15:01:53,113-Speed 3326.56 samples/sec   Loss 2.0654   LearningRate 0.0305   Epoch: 8   Global Step: 149360   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 15:01:56,183-Speed 3336.61 samples/sec   Loss 2.0183   LearningRate 0.0305   Epoch: 8   Global Step: 149370   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 15:01:59,291-Speed 3294.86 samples/sec   Loss 2.0613   LearningRate 0.0305   Epoch: 8   Global Step: 149380   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 15:02:02,365-Speed 3332.08 samples/sec   Loss 2.0223   LearningRate 0.0305   Epoch: 8   Global Step: 149390   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 15:02:05,438-Speed 3333.02 samples/sec   Loss 1.9925   LearningRate 0.0305   Epoch: 8   Global Step: 149400   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-11 15:02:08,516-Speed 3327.39 samples/sec   Loss 2.0071   LearningRate 0.0305   Epoch: 8   Global Step: 149410   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-11 15:02:11,590-Speed 3331.78 samples/sec   Loss 1.9891   LearningRate 0.0305   Epoch: 8   Global Step: 149420   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-11 15:02:14,662-Speed 3334.97 samples/sec   Loss 2.0484   LearningRate 0.0305   Epoch: 8   Global Step: 149430   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-11 15:02:17,738-Speed 3329.45 samples/sec   Loss 2.0437   LearningRate 0.0305   Epoch: 8   Global Step: 149440   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-11 15:02:20,834-Speed 3308.90 samples/sec   Loss 1.9947   LearningRate 0.0305   Epoch: 8   Global Step: 149450   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-11 15:02:23,917-Speed 3322.08 samples/sec   Loss 2.0130   LearningRate 0.0305   Epoch: 8   Global Step: 149460   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-11 15:02:26,997-Speed 3325.39 samples/sec   Loss 2.0108   LearningRate 0.0305   Epoch: 8   Global Step: 149470   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-11 15:02:30,070-Speed 3332.95 samples/sec   Loss 2.0165   LearningRate 0.0305   Epoch: 8   Global Step: 149480   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-11 15:02:33,154-Speed 3321.12 samples/sec   Loss 2.0485   LearningRate 0.0305   Epoch: 8   Global Step: 149490   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 15:02:36,240-Speed 3318.53 samples/sec   Loss 1.9934   LearningRate 0.0305   Epoch: 8   Global Step: 149500   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 15:02:39,318-Speed 3327.65 samples/sec   Loss 2.0539   LearningRate 0.0305   Epoch: 8   Global Step: 149510   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 15:02:42,396-Speed 3327.63 samples/sec   Loss 2.0070   LearningRate 0.0305   Epoch: 8   Global Step: 149520   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 15:02:45,468-Speed 3334.12 samples/sec   Loss 2.0559   LearningRate 0.0305   Epoch: 8   Global Step: 149530   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 15:02:48,541-Speed 3333.04 samples/sec   Loss 1.9762   LearningRate 0.0305   Epoch: 8   Global Step: 149540   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 15:02:51,620-Speed 3326.65 samples/sec   Loss 1.9817   LearningRate 0.0305   Epoch: 8   Global Step: 149550   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 15:02:54,694-Speed 3332.01 samples/sec   Loss 1.9896   LearningRate 0.0305   Epoch: 8   Global Step: 149560   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 15:02:57,805-Speed 3293.05 samples/sec   Loss 1.9570   LearningRate 0.0305   Epoch: 8   Global Step: 149570   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 15:03:00,880-Speed 3330.61 samples/sec   Loss 2.0249   LearningRate 0.0305   Epoch: 8   Global Step: 149580   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 15:03:03,972-Speed 3312.62 samples/sec   Loss 2.0235   LearningRate 0.0305   Epoch: 8   Global Step: 149590   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-11 15:03:07,056-Speed 3320.35 samples/sec   Loss 1.9963   LearningRate 0.0305   Epoch: 8   Global Step: 149600   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-11 15:03:10,924-Speed 2648.01 samples/sec   Loss 2.0222   LearningRate 0.0305   Epoch: 8   Global Step: 149610   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-11 15:03:14,023-Speed 3305.05 samples/sec   Loss 1.9670   LearningRate 0.0304   Epoch: 8   Global Step: 149620   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-11 15:03:17,150-Speed 3275.94 samples/sec   Loss 2.0661   LearningRate 0.0304   Epoch: 8   Global Step: 149630   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-11 15:03:20,282-Speed 3270.32 samples/sec   Loss 1.9980   LearningRate 0.0304   Epoch: 8   Global Step: 149640   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-11 15:03:23,353-Speed 3334.94 samples/sec   Loss 1.9615   LearningRate 0.0304   Epoch: 8   Global Step: 149650   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-11 15:03:26,442-Speed 3316.08 samples/sec   Loss 2.0443   LearningRate 0.0304   Epoch: 8   Global Step: 149660   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-11 15:03:29,530-Speed 3317.19 samples/sec   Loss 1.9849   LearningRate 0.0304   Epoch: 8   Global Step: 149670   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-11 15:03:32,658-Speed 3273.64 samples/sec   Loss 2.0525   LearningRate 0.0304   Epoch: 8   Global Step: 149680   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-11 15:03:35,756-Speed 3306.00 samples/sec   Loss 1.9724   LearningRate 0.0304   Epoch: 8   Global Step: 149690   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-11 15:03:38,835-Speed 3327.06 samples/sec   Loss 1.9635   LearningRate 0.0304   Epoch: 8   Global Step: 149700   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-11 15:03:41,945-Speed 3292.95 samples/sec   Loss 2.0726   LearningRate 0.0304   Epoch: 8   Global Step: 149710   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-11 15:03:45,029-Speed 3320.85 samples/sec   Loss 2.0289   LearningRate 0.0304   Epoch: 8   Global Step: 149720   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-11 15:03:48,131-Speed 3302.31 samples/sec   Loss 2.0143   LearningRate 0.0304   Epoch: 8   Global Step: 149730   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-11 15:03:51,206-Speed 3331.42 samples/sec   Loss 2.0181   LearningRate 0.0304   Epoch: 8   Global Step: 149740   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-11 15:03:54,304-Speed 3306.04 samples/sec   Loss 1.9807   LearningRate 0.0304   Epoch: 8   Global Step: 149750   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-11 15:03:57,382-Speed 3327.56 samples/sec   Loss 2.0221   LearningRate 0.0304   Epoch: 8   Global Step: 149760   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-11 15:04:00,516-Speed 3267.29 samples/sec   Loss 1.9835   LearningRate 0.0304   Epoch: 8   Global Step: 149770   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-11 15:04:03,616-Speed 3304.82 samples/sec   Loss 1.9895   LearningRate 0.0304   Epoch: 8   Global Step: 149780   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-11 15:04:06,691-Speed 3330.07 samples/sec   Loss 1.9980   LearningRate 0.0304   Epoch: 8   Global Step: 149790   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-11 15:04:09,797-Speed 3297.60 samples/sec   Loss 2.0135   LearningRate 0.0304   Epoch: 8   Global Step: 149800   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-11 15:04:12,872-Speed 3330.70 samples/sec   Loss 2.0770   LearningRate 0.0304   Epoch: 8   Global Step: 149810   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 15:04:15,958-Speed 3319.79 samples/sec   Loss 2.0034   LearningRate 0.0304   Epoch: 8   Global Step: 149820   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 15:04:19,037-Speed 3326.52 samples/sec   Loss 2.0595   LearningRate 0.0304   Epoch: 8   Global Step: 149830   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 15:04:22,120-Speed 3322.40 samples/sec   Loss 2.0852   LearningRate 0.0304   Epoch: 8   Global Step: 149840   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 15:04:25,199-Speed 3326.24 samples/sec   Loss 2.0053   LearningRate 0.0304   Epoch: 8   Global Step: 149850   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 15:04:28,297-Speed 3305.98 samples/sec   Loss 2.0499   LearningRate 0.0304   Epoch: 8   Global Step: 149860   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 15:04:31,372-Speed 3330.93 samples/sec   Loss 2.0672   LearningRate 0.0304   Epoch: 8   Global Step: 149870   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 15:04:34,450-Speed 3327.66 samples/sec   Loss 1.9778   LearningRate 0.0304   Epoch: 8   Global Step: 149880   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 15:04:37,544-Speed 3309.98 samples/sec   Loss 2.0202   LearningRate 0.0304   Epoch: 8   Global Step: 149890   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 15:04:40,627-Speed 3323.35 samples/sec   Loss 1.9975   LearningRate 0.0304   Epoch: 8   Global Step: 149900   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 15:04:43,704-Speed 3328.53 samples/sec   Loss 2.0111   LearningRate 0.0304   Epoch: 8   Global Step: 149910   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-11 15:04:46,933-Speed 3172.13 samples/sec   Loss 1.9667   LearningRate 0.0303   Epoch: 8   Global Step: 149920   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-11 15:04:50,031-Speed 3305.95 samples/sec   Loss 1.9874   LearningRate 0.0303   Epoch: 8   Global Step: 149930   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-11 15:04:53,102-Speed 3334.73 samples/sec   Loss 1.9954   LearningRate 0.0303   Epoch: 8   Global Step: 149940   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-11 15:04:56,198-Speed 3309.40 samples/sec   Loss 2.0439   LearningRate 0.0303   Epoch: 8   Global Step: 149950   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-11 15:04:59,302-Speed 3299.21 samples/sec   Loss 1.9772   LearningRate 0.0303   Epoch: 8   Global Step: 149960   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-11 15:05:02,380-Speed 3327.59 samples/sec   Loss 1.9956   LearningRate 0.0303   Epoch: 8   Global Step: 149970   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-11 15:05:05,497-Speed 3285.35 samples/sec   Loss 2.0385   LearningRate 0.0303   Epoch: 8   Global Step: 149980   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-11 15:05:08,769-Speed 3130.95 samples/sec   Loss 1.9951   LearningRate 0.0303   Epoch: 8   Global Step: 149990   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-11 15:05:11,885-Speed 3287.03 samples/sec   Loss 2.0096   LearningRate 0.0303   Epoch: 8   Global Step: 150000   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-11 15:05:56,073-[lfw][150000]XNorm: 19.795810
Training: 2022-04-11 15:05:56,074-[lfw][150000]Accuracy-Flip: 0.99817+-0.00263
Training: 2022-04-11 15:05:56,074-[lfw][150000]Accuracy-Highest: 0.99817
Training: 2022-04-11 15:06:46,885-[cfp_fp][150000]XNorm: 19.719060
Training: 2022-04-11 15:06:46,885-[cfp_fp][150000]Accuracy-Flip: 0.98843+-0.00611
Training: 2022-04-11 15:06:46,886-[cfp_fp][150000]Accuracy-Highest: 0.98843
Training: 2022-04-11 15:07:30,545-[agedb_30][150000]XNorm: 20.929331
Training: 2022-04-11 15:07:30,545-[agedb_30][150000]Accuracy-Flip: 0.98283+-0.00563
Training: 2022-04-11 15:07:30,546-[agedb_30][150000]Accuracy-Highest: 0.98317
Training: 2022-04-11 15:07:33,614-Speed 72.25 samples/sec   Loss 2.0080   LearningRate 0.0303   Epoch: 8   Global Step: 150010   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-11 15:07:36,681-Speed 3338.66 samples/sec   Loss 1.9763   LearningRate 0.0303   Epoch: 8   Global Step: 150020   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-11 15:07:39,773-Speed 3312.68 samples/sec   Loss 2.0418   LearningRate 0.0303   Epoch: 8   Global Step: 150030   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-11 15:07:42,850-Speed 3329.54 samples/sec   Loss 1.9524   LearningRate 0.0303   Epoch: 8   Global Step: 150040   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 15:07:45,969-Speed 3283.37 samples/sec   Loss 2.0883   LearningRate 0.0303   Epoch: 8   Global Step: 150050   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 15:07:49,038-Speed 3337.97 samples/sec   Loss 1.9962   LearningRate 0.0303   Epoch: 8   Global Step: 150060   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 15:07:52,123-Speed 3320.62 samples/sec   Loss 1.9992   LearningRate 0.0303   Epoch: 8   Global Step: 150070   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 15:07:55,206-Speed 3321.30 samples/sec   Loss 2.0299   LearningRate 0.0303   Epoch: 8   Global Step: 150080   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 15:07:58,304-Speed 3307.00 samples/sec   Loss 2.0290   LearningRate 0.0303   Epoch: 8   Global Step: 150090   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 15:08:01,371-Speed 3339.35 samples/sec   Loss 2.0042   LearningRate 0.0303   Epoch: 8   Global Step: 150100   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 15:08:04,440-Speed 3337.25 samples/sec   Loss 2.0823   LearningRate 0.0303   Epoch: 8   Global Step: 150110   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 15:08:07,544-Speed 3300.14 samples/sec   Loss 1.9713   LearningRate 0.0303   Epoch: 8   Global Step: 150120   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 15:08:10,634-Speed 3314.31 samples/sec   Loss 1.9211   LearningRate 0.0303   Epoch: 8   Global Step: 150130   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 15:08:13,703-Speed 3338.28 samples/sec   Loss 1.9806   LearningRate 0.0303   Epoch: 8   Global Step: 150140   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-11 15:08:16,772-Speed 3337.05 samples/sec   Loss 2.0536   LearningRate 0.0303   Epoch: 8   Global Step: 150150   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-11 15:08:19,841-Speed 3337.46 samples/sec   Loss 2.0018   LearningRate 0.0303   Epoch: 8   Global Step: 150160   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-11 15:08:22,912-Speed 3334.25 samples/sec   Loss 2.0409   LearningRate 0.0303   Epoch: 8   Global Step: 150170   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-11 15:08:25,984-Speed 3334.11 samples/sec   Loss 2.0422   LearningRate 0.0303   Epoch: 8   Global Step: 150180   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-11 15:08:29,054-Speed 3337.20 samples/sec   Loss 2.0420   LearningRate 0.0303   Epoch: 8   Global Step: 150190   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-11 15:08:32,160-Speed 3297.59 samples/sec   Loss 1.9937   LearningRate 0.0303   Epoch: 8   Global Step: 150200   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-11 15:08:35,756-Speed 2847.74 samples/sec   Loss 1.9817   LearningRate 0.0303   Epoch: 8   Global Step: 150210   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-11 15:09:05,545-Speed 343.77 samples/sec   Loss 1.9321   LearningRate 0.0302   Epoch: 9   Global Step: 150220   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-11 15:09:09,123-Speed 2862.81 samples/sec   Loss 1.5180   LearningRate 0.0302   Epoch: 9   Global Step: 150230   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-11 15:09:13,020-Speed 2628.02 samples/sec   Loss 1.5091   LearningRate 0.0302   Epoch: 9   Global Step: 150240   Fp16 Grad Scale: 262144   Required: 19 hours
Training: 2022-04-11 15:09:16,155-Speed 3267.81 samples/sec   Loss 1.5456   LearningRate 0.0302   Epoch: 9   Global Step: 150250   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-11 15:09:19,286-Speed 3271.17 samples/sec   Loss 1.4570   LearningRate 0.0302   Epoch: 9   Global Step: 150260   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-11 15:09:22,427-Speed 3260.57 samples/sec   Loss 1.4245   LearningRate 0.0302   Epoch: 9   Global Step: 150270   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-11 15:09:25,513-Speed 3319.19 samples/sec   Loss 1.4874   LearningRate 0.0302   Epoch: 9   Global Step: 150280   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-11 15:09:28,650-Speed 3264.68 samples/sec   Loss 1.5332   LearningRate 0.0302   Epoch: 9   Global Step: 150290   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-11 15:09:31,746-Speed 3308.74 samples/sec   Loss 1.5305   LearningRate 0.0302   Epoch: 9   Global Step: 150300   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-11 15:09:34,837-Speed 3313.29 samples/sec   Loss 1.5443   LearningRate 0.0302   Epoch: 9   Global Step: 150310   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-11 15:09:38,150-Speed 3091.76 samples/sec   Loss 1.5086   LearningRate 0.0302   Epoch: 9   Global Step: 150320   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-11 15:09:41,352-Speed 3199.30 samples/sec   Loss 1.4720   LearningRate 0.0302   Epoch: 9   Global Step: 150330   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-11 15:09:44,520-Speed 3233.72 samples/sec   Loss 1.4577   LearningRate 0.0302   Epoch: 9   Global Step: 150340   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-11 15:09:47,590-Speed 3335.44 samples/sec   Loss 1.4814   LearningRate 0.0302   Epoch: 9   Global Step: 150350   Fp16 Grad Scale: 262144   Required: 19 hours
Training: 2022-04-11 15:09:50,662-Speed 3334.74 samples/sec   Loss 1.5408   LearningRate 0.0302   Epoch: 9   Global Step: 150360   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-11 15:09:53,748-Speed 3319.43 samples/sec   Loss 1.4665   LearningRate 0.0302   Epoch: 9   Global Step: 150370   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-11 15:09:57,129-Speed 3029.16 samples/sec   Loss 1.4975   LearningRate 0.0302   Epoch: 9   Global Step: 150380   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-11 15:10:00,237-Speed 3295.66 samples/sec   Loss 1.4806   LearningRate 0.0302   Epoch: 9   Global Step: 150390   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-11 15:10:04,225-Speed 2567.85 samples/sec   Loss 1.4414   LearningRate 0.0302   Epoch: 9   Global Step: 150400   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-11 15:10:07,332-Speed 3297.00 samples/sec   Loss 1.4413   LearningRate 0.0302   Epoch: 9   Global Step: 150410   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 15:10:10,413-Speed 3323.98 samples/sec   Loss 1.4826   LearningRate 0.0302   Epoch: 9   Global Step: 150420   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 15:10:13,503-Speed 3314.98 samples/sec   Loss 1.5350   LearningRate 0.0302   Epoch: 9   Global Step: 150430   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 15:10:16,605-Speed 3302.73 samples/sec   Loss 1.5054   LearningRate 0.0302   Epoch: 9   Global Step: 150440   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 15:10:19,706-Speed 3302.98 samples/sec   Loss 1.5186   LearningRate 0.0302   Epoch: 9   Global Step: 150450   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 15:10:22,799-Speed 3310.69 samples/sec   Loss 1.5438   LearningRate 0.0302   Epoch: 9   Global Step: 150460   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 15:10:25,911-Speed 3291.98 samples/sec   Loss 1.4999   LearningRate 0.0302   Epoch: 9   Global Step: 150470   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 15:10:29,086-Speed 3225.90 samples/sec   Loss 1.4983   LearningRate 0.0302   Epoch: 9   Global Step: 150480   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 15:10:32,173-Speed 3318.09 samples/sec   Loss 1.4345   LearningRate 0.0302   Epoch: 9   Global Step: 150490   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 15:10:35,255-Speed 3322.28 samples/sec   Loss 1.4546   LearningRate 0.0302   Epoch: 9   Global Step: 150500   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 15:10:38,335-Speed 3325.67 samples/sec   Loss 1.4953   LearningRate 0.0302   Epoch: 9   Global Step: 150510   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-11 15:10:41,422-Speed 3317.97 samples/sec   Loss 1.5284   LearningRate 0.0302   Epoch: 9   Global Step: 150520   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-11 15:10:44,506-Speed 3322.14 samples/sec   Loss 1.4782   LearningRate 0.0301   Epoch: 9   Global Step: 150530   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-11 15:10:47,609-Speed 3299.85 samples/sec   Loss 1.4804   LearningRate 0.0301   Epoch: 9   Global Step: 150540   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-11 15:10:50,738-Speed 3274.40 samples/sec   Loss 1.4630   LearningRate 0.0301   Epoch: 9   Global Step: 150550   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-11 15:10:53,894-Speed 3244.77 samples/sec   Loss 1.4800   LearningRate 0.0301   Epoch: 9   Global Step: 150560   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-11 15:10:56,967-Speed 3332.86 samples/sec   Loss 1.4769   LearningRate 0.0301   Epoch: 9   Global Step: 150570   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 15:11:00,053-Speed 3318.72 samples/sec   Loss 1.4880   LearningRate 0.0301   Epoch: 9   Global Step: 150580   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 15:11:03,157-Speed 3300.30 samples/sec   Loss 1.4773   LearningRate 0.0301   Epoch: 9   Global Step: 150590   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 15:11:06,243-Speed 3319.00 samples/sec   Loss 1.4948   LearningRate 0.0301   Epoch: 9   Global Step: 150600   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 15:11:09,360-Speed 3286.03 samples/sec   Loss 1.5152   LearningRate 0.0301   Epoch: 9   Global Step: 150610   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 15:11:12,505-Speed 3256.33 samples/sec   Loss 1.5318   LearningRate 0.0301   Epoch: 9   Global Step: 150620   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 15:11:15,585-Speed 3326.09 samples/sec   Loss 1.4839   LearningRate 0.0301   Epoch: 9   Global Step: 150630   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 15:11:18,660-Speed 3330.16 samples/sec   Loss 1.5548   LearningRate 0.0301   Epoch: 9   Global Step: 150640   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 15:11:21,738-Speed 3327.58 samples/sec   Loss 1.5090   LearningRate 0.0301   Epoch: 9   Global Step: 150650   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 15:11:24,823-Speed 3320.04 samples/sec   Loss 1.4618   LearningRate 0.0301   Epoch: 9   Global Step: 150660   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 15:11:27,929-Speed 3298.40 samples/sec   Loss 1.5342   LearningRate 0.0301   Epoch: 9   Global Step: 150670   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-11 15:11:31,006-Speed 3328.19 samples/sec   Loss 1.4809   LearningRate 0.0301   Epoch: 9   Global Step: 150680   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-11 15:11:34,083-Speed 3329.03 samples/sec   Loss 1.4647   LearningRate 0.0301   Epoch: 9   Global Step: 150690   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 15:11:37,164-Speed 3324.79 samples/sec   Loss 1.4760   LearningRate 0.0301   Epoch: 9   Global Step: 150700   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 15:11:40,263-Speed 3304.22 samples/sec   Loss 1.5327   LearningRate 0.0301   Epoch: 9   Global Step: 150710   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 15:11:43,347-Speed 3322.10 samples/sec   Loss 1.4833   LearningRate 0.0301   Epoch: 9   Global Step: 150720   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 15:11:46,430-Speed 3321.89 samples/sec   Loss 1.5129   LearningRate 0.0301   Epoch: 9   Global Step: 150730   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 15:11:49,524-Speed 3310.55 samples/sec   Loss 1.5858   LearningRate 0.0301   Epoch: 9   Global Step: 150740   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 15:11:52,600-Speed 3329.93 samples/sec   Loss 1.5464   LearningRate 0.0301   Epoch: 9   Global Step: 150750   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 15:11:55,726-Speed 3276.82 samples/sec   Loss 1.5030   LearningRate 0.0301   Epoch: 9   Global Step: 150760   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 15:11:58,801-Speed 3330.33 samples/sec   Loss 1.4945   LearningRate 0.0301   Epoch: 9   Global Step: 150770   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 15:12:01,875-Speed 3331.90 samples/sec   Loss 1.4931   LearningRate 0.0301   Epoch: 9   Global Step: 150780   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 15:12:04,952-Speed 3328.84 samples/sec   Loss 1.5233   LearningRate 0.0301   Epoch: 9   Global Step: 150790   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-11 15:12:08,102-Speed 3251.85 samples/sec   Loss 1.5240   LearningRate 0.0301   Epoch: 9   Global Step: 150800   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-11 15:12:11,183-Speed 3324.13 samples/sec   Loss 1.4795   LearningRate 0.0301   Epoch: 9   Global Step: 150810   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-11 15:12:14,296-Speed 3289.77 samples/sec   Loss 1.5436   LearningRate 0.0301   Epoch: 9   Global Step: 150820   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-11 15:12:17,382-Speed 3318.80 samples/sec   Loss 1.5532   LearningRate 0.0300   Epoch: 9   Global Step: 150830   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-11 15:12:20,457-Speed 3330.74 samples/sec   Loss 1.5379   LearningRate 0.0300   Epoch: 9   Global Step: 150840   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-11 15:12:23,569-Speed 3291.88 samples/sec   Loss 1.5023   LearningRate 0.0300   Epoch: 9   Global Step: 150850   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-11 15:12:26,671-Speed 3301.70 samples/sec   Loss 1.5446   LearningRate 0.0300   Epoch: 9   Global Step: 150860   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-11 15:12:29,812-Speed 3261.51 samples/sec   Loss 1.5664   LearningRate 0.0300   Epoch: 9   Global Step: 150870   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-11 15:12:32,899-Speed 3317.64 samples/sec   Loss 1.5568   LearningRate 0.0300   Epoch: 9   Global Step: 150880   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-11 15:12:35,997-Speed 3305.79 samples/sec   Loss 1.5430   LearningRate 0.0300   Epoch: 9   Global Step: 150890   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-11 15:12:39,083-Speed 3319.66 samples/sec   Loss 1.5177   LearningRate 0.0300   Epoch: 9   Global Step: 150900   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-11 15:12:42,233-Speed 3250.50 samples/sec   Loss 1.5905   LearningRate 0.0300   Epoch: 9   Global Step: 150910   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-11 15:12:45,314-Speed 3324.72 samples/sec   Loss 1.5342   LearningRate 0.0300   Epoch: 9   Global Step: 150920   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-11 15:12:48,418-Speed 3300.35 samples/sec   Loss 1.4721   LearningRate 0.0300   Epoch: 9   Global Step: 150930   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-11 15:12:51,564-Speed 3255.85 samples/sec   Loss 1.5713   LearningRate 0.0300   Epoch: 9   Global Step: 150940   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-11 15:12:54,662-Speed 3309.55 samples/sec   Loss 1.5083   LearningRate 0.0300   Epoch: 9   Global Step: 150950   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-11 15:12:57,742-Speed 3325.10 samples/sec   Loss 1.5943   LearningRate 0.0300   Epoch: 9   Global Step: 150960   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-11 15:13:00,844-Speed 3302.40 samples/sec   Loss 1.5476   LearningRate 0.0300   Epoch: 9   Global Step: 150970   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-11 15:13:04,011-Speed 3233.99 samples/sec   Loss 1.5012   LearningRate 0.0300   Epoch: 9   Global Step: 150980   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-11 15:13:07,120-Speed 3293.83 samples/sec   Loss 1.5269   LearningRate 0.0300   Epoch: 9   Global Step: 150990   Fp16 Grad Scale: 262144   Required: 19 hours
Training: 2022-04-11 15:13:10,206-Speed 3319.43 samples/sec   Loss 1.5377   LearningRate 0.0300   Epoch: 9   Global Step: 151000   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-11 15:13:13,314-Speed 3295.01 samples/sec   Loss 1.5546   LearningRate 0.0300   Epoch: 9   Global Step: 151010   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-11 15:13:16,386-Speed 3333.98 samples/sec   Loss 1.5529   LearningRate 0.0300   Epoch: 9   Global Step: 151020   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-11 15:13:19,470-Speed 3322.35 samples/sec   Loss 1.4851   LearningRate 0.0300   Epoch: 9   Global Step: 151030   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-11 15:13:22,547-Speed 3328.45 samples/sec   Loss 1.5142   LearningRate 0.0300   Epoch: 9   Global Step: 151040   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-11 15:13:25,626-Speed 3326.83 samples/sec   Loss 1.5100   LearningRate 0.0300   Epoch: 9   Global Step: 151050   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-11 15:13:28,705-Speed 3325.66 samples/sec   Loss 1.5140   LearningRate 0.0300   Epoch: 9   Global Step: 151060   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 15:13:31,785-Speed 3325.83 samples/sec   Loss 1.5213   LearningRate 0.0300   Epoch: 9   Global Step: 151070   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 15:13:34,866-Speed 3323.90 samples/sec   Loss 1.5833   LearningRate 0.0300   Epoch: 9   Global Step: 151080   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 15:13:37,953-Speed 3318.17 samples/sec   Loss 1.5125   LearningRate 0.0300   Epoch: 9   Global Step: 151090   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 15:13:41,035-Speed 3323.62 samples/sec   Loss 1.5350   LearningRate 0.0300   Epoch: 9   Global Step: 151100   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 15:13:44,220-Speed 3216.34 samples/sec   Loss 1.4987   LearningRate 0.0300   Epoch: 9   Global Step: 151110   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 15:13:47,371-Speed 3250.55 samples/sec   Loss 1.4561   LearningRate 0.0300   Epoch: 9   Global Step: 151120   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 15:13:50,449-Speed 3327.38 samples/sec   Loss 1.5474   LearningRate 0.0300   Epoch: 9   Global Step: 151130   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 15:13:53,562-Speed 3290.40 samples/sec   Loss 1.5432   LearningRate 0.0299   Epoch: 9   Global Step: 151140   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 15:13:56,676-Speed 3288.53 samples/sec   Loss 1.5463   LearningRate 0.0299   Epoch: 9   Global Step: 151150   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 15:13:59,827-Speed 3251.16 samples/sec   Loss 1.5452   LearningRate 0.0299   Epoch: 9   Global Step: 151160   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-11 15:14:02,993-Speed 3234.51 samples/sec   Loss 1.5160   LearningRate 0.0299   Epoch: 9   Global Step: 151170   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-11 15:14:06,100-Speed 3296.52 samples/sec   Loss 1.5657   LearningRate 0.0299   Epoch: 9   Global Step: 151180   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-11 15:14:09,199-Speed 3305.43 samples/sec   Loss 1.5455   LearningRate 0.0299   Epoch: 9   Global Step: 151190   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-11 15:14:12,295-Speed 3308.07 samples/sec   Loss 1.5289   LearningRate 0.0299   Epoch: 9   Global Step: 151200   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-11 15:14:15,392-Speed 3307.04 samples/sec   Loss 1.5303   LearningRate 0.0299   Epoch: 9   Global Step: 151210   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-11 15:14:18,481-Speed 3316.75 samples/sec   Loss 1.6223   LearningRate 0.0299   Epoch: 9   Global Step: 151220   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-11 15:14:21,597-Speed 3286.47 samples/sec   Loss 1.5594   LearningRate 0.0299   Epoch: 9   Global Step: 151230   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-11 15:14:24,670-Speed 3333.27 samples/sec   Loss 1.5120   LearningRate 0.0299   Epoch: 9   Global Step: 151240   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-11 15:14:27,850-Speed 3221.70 samples/sec   Loss 1.5523   LearningRate 0.0299   Epoch: 9   Global Step: 151250   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 15:14:30,941-Speed 3313.60 samples/sec   Loss 1.5810   LearningRate 0.0299   Epoch: 9   Global Step: 151260   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 15:14:34,037-Speed 3308.26 samples/sec   Loss 1.4985   LearningRate 0.0299   Epoch: 9   Global Step: 151270   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 15:14:37,138-Speed 3302.37 samples/sec   Loss 1.5186   LearningRate 0.0299   Epoch: 9   Global Step: 151280   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 15:14:40,230-Speed 3313.28 samples/sec   Loss 1.5414   LearningRate 0.0299   Epoch: 9   Global Step: 151290   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 15:14:43,385-Speed 3246.05 samples/sec   Loss 1.5975   LearningRate 0.0299   Epoch: 9   Global Step: 151300   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 15:14:46,510-Speed 3277.41 samples/sec   Loss 1.5285   LearningRate 0.0299   Epoch: 9   Global Step: 151310   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 15:14:49,603-Speed 3311.28 samples/sec   Loss 1.5232   LearningRate 0.0299   Epoch: 9   Global Step: 151320   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 15:14:52,680-Speed 3328.79 samples/sec   Loss 1.5667   LearningRate 0.0299   Epoch: 9   Global Step: 151330   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 15:14:55,767-Speed 3318.56 samples/sec   Loss 1.5116   LearningRate 0.0299   Epoch: 9   Global Step: 151340   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 15:14:58,845-Speed 3327.40 samples/sec   Loss 1.5005   LearningRate 0.0299   Epoch: 9   Global Step: 151350   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-11 15:15:01,963-Speed 3284.88 samples/sec   Loss 1.5962   LearningRate 0.0299   Epoch: 9   Global Step: 151360   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-11 15:15:05,060-Speed 3307.29 samples/sec   Loss 1.5773   LearningRate 0.0299   Epoch: 9   Global Step: 151370   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-11 15:15:08,136-Speed 3329.91 samples/sec   Loss 1.5105   LearningRate 0.0299   Epoch: 9   Global Step: 151380   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-11 15:15:11,234-Speed 3305.78 samples/sec   Loss 1.5469   LearningRate 0.0299   Epoch: 9   Global Step: 151390   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-11 15:15:14,312-Speed 3327.61 samples/sec   Loss 1.5324   LearningRate 0.0299   Epoch: 9   Global Step: 151400   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-11 15:15:17,381-Speed 3337.40 samples/sec   Loss 1.5487   LearningRate 0.0299   Epoch: 9   Global Step: 151410   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 15:15:20,473-Speed 3312.74 samples/sec   Loss 1.5467   LearningRate 0.0299   Epoch: 9   Global Step: 151420   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 15:15:23,557-Speed 3320.87 samples/sec   Loss 1.5509   LearningRate 0.0299   Epoch: 9   Global Step: 151430   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 15:15:26,662-Speed 3298.06 samples/sec   Loss 1.5584   LearningRate 0.0298   Epoch: 9   Global Step: 151440   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 15:15:29,740-Speed 3327.83 samples/sec   Loss 1.5452   LearningRate 0.0298   Epoch: 9   Global Step: 151450   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 15:15:32,828-Speed 3317.73 samples/sec   Loss 1.5561   LearningRate 0.0298   Epoch: 9   Global Step: 151460   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 15:15:35,912-Speed 3321.02 samples/sec   Loss 1.5651   LearningRate 0.0298   Epoch: 9   Global Step: 151470   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 15:15:38,997-Speed 3319.44 samples/sec   Loss 1.5361   LearningRate 0.0298   Epoch: 9   Global Step: 151480   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 15:15:42,192-Speed 3206.68 samples/sec   Loss 1.5570   LearningRate 0.0298   Epoch: 9   Global Step: 151490   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 15:15:45,339-Speed 3254.12 samples/sec   Loss 1.5837   LearningRate 0.0298   Epoch: 9   Global Step: 151500   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 15:15:48,452-Speed 3290.30 samples/sec   Loss 1.5303   LearningRate 0.0298   Epoch: 9   Global Step: 151510   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-11 15:15:51,542-Speed 3314.54 samples/sec   Loss 1.4840   LearningRate 0.0298   Epoch: 9   Global Step: 151520   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-11 15:15:54,623-Speed 3324.32 samples/sec   Loss 1.5730   LearningRate 0.0298   Epoch: 9   Global Step: 151530   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-11 15:15:57,701-Speed 3327.48 samples/sec   Loss 1.5967   LearningRate 0.0298   Epoch: 9   Global Step: 151540   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-11 15:16:00,783-Speed 3323.86 samples/sec   Loss 1.5577   LearningRate 0.0298   Epoch: 9   Global Step: 151550   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-11 15:16:03,863-Speed 3325.37 samples/sec   Loss 1.6060   LearningRate 0.0298   Epoch: 9   Global Step: 151560   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-11 15:16:06,942-Speed 3326.13 samples/sec   Loss 1.5839   LearningRate 0.0298   Epoch: 9   Global Step: 151570   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-11 15:16:10,017-Speed 3330.99 samples/sec   Loss 1.5343   LearningRate 0.0298   Epoch: 9   Global Step: 151580   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-11 15:16:13,120-Speed 3300.82 samples/sec   Loss 1.5304   LearningRate 0.0298   Epoch: 9   Global Step: 151590   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-11 15:16:16,196-Speed 3329.53 samples/sec   Loss 1.5488   LearningRate 0.0298   Epoch: 9   Global Step: 151600   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-11 15:16:19,261-Speed 3341.38 samples/sec   Loss 1.5837   LearningRate 0.0298   Epoch: 9   Global Step: 151610   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-11 15:16:22,360-Speed 3305.25 samples/sec   Loss 1.5645   LearningRate 0.0298   Epoch: 9   Global Step: 151620   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-11 15:16:25,433-Speed 3332.30 samples/sec   Loss 1.5771   LearningRate 0.0298   Epoch: 9   Global Step: 151630   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-11 15:16:28,516-Speed 3322.44 samples/sec   Loss 1.5082   LearningRate 0.0298   Epoch: 9   Global Step: 151640   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-11 15:16:31,596-Speed 3325.92 samples/sec   Loss 1.5682   LearningRate 0.0298   Epoch: 9   Global Step: 151650   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-11 15:16:34,702-Speed 3298.13 samples/sec   Loss 1.5621   LearningRate 0.0298   Epoch: 9   Global Step: 151660   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-11 15:16:37,781-Speed 3326.24 samples/sec   Loss 1.5711   LearningRate 0.0298   Epoch: 9   Global Step: 151670   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-11 15:16:40,862-Speed 3324.38 samples/sec   Loss 1.5694   LearningRate 0.0298   Epoch: 9   Global Step: 151680   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-11 15:16:43,946-Speed 3321.21 samples/sec   Loss 1.5970   LearningRate 0.0298   Epoch: 9   Global Step: 151690   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-11 15:16:47,153-Speed 3192.81 samples/sec   Loss 1.5367   LearningRate 0.0298   Epoch: 9   Global Step: 151700   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-11 15:16:50,224-Speed 3336.02 samples/sec   Loss 1.5561   LearningRate 0.0298   Epoch: 9   Global Step: 151710   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-11 15:16:53,291-Speed 3339.55 samples/sec   Loss 1.5472   LearningRate 0.0298   Epoch: 9   Global Step: 151720   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 15:16:56,370-Speed 3326.24 samples/sec   Loss 1.5655   LearningRate 0.0298   Epoch: 9   Global Step: 151730   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 15:16:59,457-Speed 3318.50 samples/sec   Loss 1.5923   LearningRate 0.0298   Epoch: 9   Global Step: 151740   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 15:17:02,554-Speed 3307.06 samples/sec   Loss 1.5916   LearningRate 0.0297   Epoch: 9   Global Step: 151750   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 15:17:05,654-Speed 3303.97 samples/sec   Loss 1.6243   LearningRate 0.0297   Epoch: 9   Global Step: 151760   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 15:17:08,743-Speed 3316.14 samples/sec   Loss 1.6094   LearningRate 0.0297   Epoch: 9   Global Step: 151770   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 15:17:11,831-Speed 3316.54 samples/sec   Loss 1.5166   LearningRate 0.0297   Epoch: 9   Global Step: 151780   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 15:17:14,916-Speed 3319.75 samples/sec   Loss 1.5703   LearningRate 0.0297   Epoch: 9   Global Step: 151790   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 15:17:17,991-Speed 3330.55 samples/sec   Loss 1.5732   LearningRate 0.0297   Epoch: 9   Global Step: 151800   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 15:17:21,069-Speed 3328.72 samples/sec   Loss 1.6064   LearningRate 0.0297   Epoch: 9   Global Step: 151810   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 15:17:24,199-Speed 3272.61 samples/sec   Loss 1.5884   LearningRate 0.0297   Epoch: 9   Global Step: 151820   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-11 15:17:27,275-Speed 3328.63 samples/sec   Loss 1.5435   LearningRate 0.0297   Epoch: 9   Global Step: 151830   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-11 15:17:30,364-Speed 3316.09 samples/sec   Loss 1.5921   LearningRate 0.0297   Epoch: 9   Global Step: 151840   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-11 15:17:33,444-Speed 3325.68 samples/sec   Loss 1.5893   LearningRate 0.0297   Epoch: 9   Global Step: 151850   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-11 15:17:36,525-Speed 3324.75 samples/sec   Loss 1.5531   LearningRate 0.0297   Epoch: 9   Global Step: 151860   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-11 15:17:39,605-Speed 3324.62 samples/sec   Loss 1.5742   LearningRate 0.0297   Epoch: 9   Global Step: 151870   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-11 15:17:42,695-Speed 3314.71 samples/sec   Loss 1.5465   LearningRate 0.0297   Epoch: 9   Global Step: 151880   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 15:17:45,788-Speed 3311.55 samples/sec   Loss 1.5789   LearningRate 0.0297   Epoch: 9   Global Step: 151890   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 15:17:48,881-Speed 3311.72 samples/sec   Loss 1.5953   LearningRate 0.0297   Epoch: 9   Global Step: 151900   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 15:17:51,965-Speed 3320.94 samples/sec   Loss 1.5902   LearningRate 0.0297   Epoch: 9   Global Step: 151910   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 15:17:55,129-Speed 3237.96 samples/sec   Loss 1.5736   LearningRate 0.0297   Epoch: 9   Global Step: 151920   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 15:17:58,387-Speed 3142.90 samples/sec   Loss 1.5877   LearningRate 0.0297   Epoch: 9   Global Step: 151930   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 15:18:01,468-Speed 3324.54 samples/sec   Loss 1.6008   LearningRate 0.0297   Epoch: 9   Global Step: 151940   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-11 15:18:04,614-Speed 3255.50 samples/sec   Loss 1.5455   LearningRate 0.0297   Epoch: 9   Global Step: 151950   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-11 15:18:07,694-Speed 3325.38 samples/sec   Loss 1.6052   LearningRate 0.0297   Epoch: 9   Global Step: 151960   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-11 15:18:10,780-Speed 3319.79 samples/sec   Loss 1.6215   LearningRate 0.0297   Epoch: 9   Global Step: 151970   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-11 15:18:13,874-Speed 3310.04 samples/sec   Loss 1.6317   LearningRate 0.0297   Epoch: 9   Global Step: 151980   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-11 15:18:16,955-Speed 3325.23 samples/sec   Loss 1.5277   LearningRate 0.0297   Epoch: 9   Global Step: 151990   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-11 15:18:20,041-Speed 3319.09 samples/sec   Loss 1.5686   LearningRate 0.0297   Epoch: 9   Global Step: 152000   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-11 15:19:03,658-[lfw][152000]XNorm: 21.103845
Training: 2022-04-11 15:19:03,659-[lfw][152000]Accuracy-Flip: 0.99767+-0.00281
Training: 2022-04-11 15:19:03,659-[lfw][152000]Accuracy-Highest: 0.99817
Training: 2022-04-11 15:19:54,438-[cfp_fp][152000]XNorm: 21.201568
Training: 2022-04-11 15:19:54,439-[cfp_fp][152000]Accuracy-Flip: 0.98671+-0.00579
Training: 2022-04-11 15:19:54,439-[cfp_fp][152000]Accuracy-Highest: 0.98843
Training: 2022-04-11 15:20:38,193-[agedb_30][152000]XNorm: 21.804531
Training: 2022-04-11 15:20:38,194-[agedb_30][152000]Accuracy-Flip: 0.98167+-0.00679
Training: 2022-04-11 15:20:38,194-[agedb_30][152000]Accuracy-Highest: 0.98317
Training: 2022-04-11 15:20:41,313-Speed 72.48 samples/sec   Loss 1.5303   LearningRate 0.0297   Epoch: 9   Global Step: 152010   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-11 15:20:44,515-Speed 3199.10 samples/sec   Loss 1.6173   LearningRate 0.0297   Epoch: 9   Global Step: 152020   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-11 15:20:47,612-Speed 3307.75 samples/sec   Loss 1.5309   LearningRate 0.0297   Epoch: 9   Global Step: 152030   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-11 15:20:50,755-Speed 3258.03 samples/sec   Loss 1.5796   LearningRate 0.0297   Epoch: 9   Global Step: 152040   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 15:20:53,833-Speed 3327.27 samples/sec   Loss 1.5795   LearningRate 0.0296   Epoch: 9   Global Step: 152050   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 15:20:56,929-Speed 3308.35 samples/sec   Loss 1.6607   LearningRate 0.0296   Epoch: 9   Global Step: 152060   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 15:21:00,047-Speed 3284.56 samples/sec   Loss 1.6043   LearningRate 0.0296   Epoch: 9   Global Step: 152070   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 15:21:03,311-Speed 3138.32 samples/sec   Loss 1.6452   LearningRate 0.0296   Epoch: 9   Global Step: 152080   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 15:21:06,478-Speed 3234.66 samples/sec   Loss 1.6477   LearningRate 0.0296   Epoch: 9   Global Step: 152090   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 15:21:09,723-Speed 3156.87 samples/sec   Loss 1.5907   LearningRate 0.0296   Epoch: 9   Global Step: 152100   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 15:21:12,869-Speed 3255.56 samples/sec   Loss 1.6056   LearningRate 0.0296   Epoch: 9   Global Step: 152110   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 15:21:15,975-Speed 3297.10 samples/sec   Loss 1.5960   LearningRate 0.0296   Epoch: 9   Global Step: 152120   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 15:21:19,226-Speed 3150.06 samples/sec   Loss 1.5606   LearningRate 0.0296   Epoch: 9   Global Step: 152130   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 15:21:22,389-Speed 3238.25 samples/sec   Loss 1.6031   LearningRate 0.0296   Epoch: 9   Global Step: 152140   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-11 15:21:25,482-Speed 3312.16 samples/sec   Loss 1.5962   LearningRate 0.0296   Epoch: 9   Global Step: 152150   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-11 15:21:28,625-Speed 3257.93 samples/sec   Loss 1.6365   LearningRate 0.0296   Epoch: 9   Global Step: 152160   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-11 15:21:31,723-Speed 3306.63 samples/sec   Loss 1.5914   LearningRate 0.0296   Epoch: 9   Global Step: 152170   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-11 15:21:34,812-Speed 3315.74 samples/sec   Loss 1.6165   LearningRate 0.0296   Epoch: 9   Global Step: 152180   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-11 15:21:37,949-Speed 3265.18 samples/sec   Loss 1.5962   LearningRate 0.0296   Epoch: 9   Global Step: 152190   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-11 15:21:41,036-Speed 3317.80 samples/sec   Loss 1.5899   LearningRate 0.0296   Epoch: 9   Global Step: 152200   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-11 15:21:44,098-Speed 3344.87 samples/sec   Loss 1.6251   LearningRate 0.0296   Epoch: 9   Global Step: 152210   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 15:21:47,229-Speed 3271.71 samples/sec   Loss 1.6004   LearningRate 0.0296   Epoch: 9   Global Step: 152220   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 15:21:50,307-Speed 3327.27 samples/sec   Loss 1.6134   LearningRate 0.0296   Epoch: 9   Global Step: 152230   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 15:21:53,398-Speed 3313.95 samples/sec   Loss 1.6647   LearningRate 0.0296   Epoch: 9   Global Step: 152240   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 15:21:56,501-Speed 3300.75 samples/sec   Loss 1.6044   LearningRate 0.0296   Epoch: 9   Global Step: 152250   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 15:21:59,588-Speed 3317.68 samples/sec   Loss 1.6235   LearningRate 0.0296   Epoch: 9   Global Step: 152260   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 15:22:02,699-Speed 3292.84 samples/sec   Loss 1.6710   LearningRate 0.0296   Epoch: 9   Global Step: 152270   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 15:22:05,782-Speed 3322.11 samples/sec   Loss 1.6553   LearningRate 0.0296   Epoch: 9   Global Step: 152280   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 15:22:08,877-Speed 3309.38 samples/sec   Loss 1.6204   LearningRate 0.0296   Epoch: 9   Global Step: 152290   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 15:22:11,965-Speed 3316.48 samples/sec   Loss 1.5965   LearningRate 0.0296   Epoch: 9   Global Step: 152300   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 15:22:15,049-Speed 3321.22 samples/sec   Loss 1.6088   LearningRate 0.0296   Epoch: 9   Global Step: 152310   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-11 15:22:18,181-Speed 3270.38 samples/sec   Loss 1.6251   LearningRate 0.0296   Epoch: 9   Global Step: 152320   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-11 15:22:21,261-Speed 3325.03 samples/sec   Loss 1.6259   LearningRate 0.0296   Epoch: 9   Global Step: 152330   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-11 15:22:24,400-Speed 3263.37 samples/sec   Loss 1.5655   LearningRate 0.0296   Epoch: 9   Global Step: 152340   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-11 15:22:27,487-Speed 3318.38 samples/sec   Loss 1.5753   LearningRate 0.0296   Epoch: 9   Global Step: 152350   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-11 15:22:30,666-Speed 3221.07 samples/sec   Loss 1.6304   LearningRate 0.0295   Epoch: 9   Global Step: 152360   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-11 15:22:33,821-Speed 3247.28 samples/sec   Loss 1.6561   LearningRate 0.0295   Epoch: 9   Global Step: 152370   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-11 15:22:37,026-Speed 3194.90 samples/sec   Loss 1.6307   LearningRate 0.0295   Epoch: 9   Global Step: 152380   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-11 15:22:40,156-Speed 3272.63 samples/sec   Loss 1.6214   LearningRate 0.0295   Epoch: 9   Global Step: 152390   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-11 15:22:43,229-Speed 3332.69 samples/sec   Loss 1.6115   LearningRate 0.0295   Epoch: 9   Global Step: 152400   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-11 15:22:46,302-Speed 3333.47 samples/sec   Loss 1.6319   LearningRate 0.0295   Epoch: 9   Global Step: 152410   Fp16 Grad Scale: 262144   Required: 19 hours
Training: 2022-04-11 15:22:49,376-Speed 3331.90 samples/sec   Loss 1.5892   LearningRate 0.0295   Epoch: 9   Global Step: 152420   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-11 15:22:52,451-Speed 3330.70 samples/sec   Loss 1.6180   LearningRate 0.0295   Epoch: 9   Global Step: 152430   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-11 15:22:55,529-Speed 3327.85 samples/sec   Loss 1.6083   LearningRate 0.0295   Epoch: 9   Global Step: 152440   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-11 15:22:58,602-Speed 3332.90 samples/sec   Loss 1.6490   LearningRate 0.0295   Epoch: 9   Global Step: 152450   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-11 15:23:01,688-Speed 3319.77 samples/sec   Loss 1.6366   LearningRate 0.0295   Epoch: 9   Global Step: 152460   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 15:23:04,762-Speed 3331.89 samples/sec   Loss 1.6574   LearningRate 0.0295   Epoch: 9   Global Step: 152470   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 15:23:07,836-Speed 3332.61 samples/sec   Loss 1.6015   LearningRate 0.0295   Epoch: 9   Global Step: 152480   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 15:23:10,913-Speed 3328.07 samples/sec   Loss 1.6798   LearningRate 0.0295   Epoch: 9   Global Step: 152490   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 15:23:13,987-Speed 3331.38 samples/sec   Loss 1.5807   LearningRate 0.0295   Epoch: 9   Global Step: 152500   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 15:23:17,129-Speed 3259.98 samples/sec   Loss 1.5639   LearningRate 0.0295   Epoch: 9   Global Step: 152510   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 15:23:20,233-Speed 3300.44 samples/sec   Loss 1.6085   LearningRate 0.0295   Epoch: 9   Global Step: 152520   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 15:23:23,323-Speed 3314.44 samples/sec   Loss 1.6050   LearningRate 0.0295   Epoch: 9   Global Step: 152530   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 15:23:26,430-Speed 3296.38 samples/sec   Loss 1.6161   LearningRate 0.0295   Epoch: 9   Global Step: 152540   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 15:23:29,504-Speed 3332.50 samples/sec   Loss 1.6259   LearningRate 0.0295   Epoch: 9   Global Step: 152550   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 15:23:32,606-Speed 3301.36 samples/sec   Loss 1.5876   LearningRate 0.0295   Epoch: 9   Global Step: 152560   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-11 15:23:35,688-Speed 3323.64 samples/sec   Loss 1.5832   LearningRate 0.0295   Epoch: 9   Global Step: 152570   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-11 15:23:38,789-Speed 3302.76 samples/sec   Loss 1.6472   LearningRate 0.0295   Epoch: 9   Global Step: 152580   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-11 15:23:41,989-Speed 3201.05 samples/sec   Loss 1.6274   LearningRate 0.0295   Epoch: 9   Global Step: 152590   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-11 15:23:45,147-Speed 3243.48 samples/sec   Loss 1.6551   LearningRate 0.0295   Epoch: 9   Global Step: 152600   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-11 15:23:48,229-Speed 3322.76 samples/sec   Loss 1.6411   LearningRate 0.0295   Epoch: 9   Global Step: 152610   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-11 15:23:51,307-Speed 3327.67 samples/sec   Loss 1.6353   LearningRate 0.0295   Epoch: 9   Global Step: 152620   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-11 15:23:54,393-Speed 3318.88 samples/sec   Loss 1.6686   LearningRate 0.0295   Epoch: 9   Global Step: 152630   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-11 15:23:57,469-Speed 3329.49 samples/sec   Loss 1.6347   LearningRate 0.0295   Epoch: 9   Global Step: 152640   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-11 15:24:00,562-Speed 3311.06 samples/sec   Loss 1.5834   LearningRate 0.0295   Epoch: 9   Global Step: 152650   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-11 15:24:03,625-Speed 3344.40 samples/sec   Loss 1.6675   LearningRate 0.0295   Epoch: 9   Global Step: 152660   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 15:24:06,710-Speed 3319.58 samples/sec   Loss 1.6665   LearningRate 0.0294   Epoch: 9   Global Step: 152670   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 15:24:09,805-Speed 3309.45 samples/sec   Loss 1.6062   LearningRate 0.0294   Epoch: 9   Global Step: 152680   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 15:24:12,909-Speed 3300.31 samples/sec   Loss 1.6171   LearningRate 0.0294   Epoch: 9   Global Step: 152690   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 15:24:15,984-Speed 3330.31 samples/sec   Loss 1.6405   LearningRate 0.0294   Epoch: 9   Global Step: 152700   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 15:24:19,154-Speed 3231.13 samples/sec   Loss 1.6362   LearningRate 0.0294   Epoch: 9   Global Step: 152710   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 15:24:22,228-Speed 3332.86 samples/sec   Loss 1.6809   LearningRate 0.0294   Epoch: 9   Global Step: 152720   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 15:24:25,304-Speed 3328.81 samples/sec   Loss 1.6504   LearningRate 0.0294   Epoch: 9   Global Step: 152730   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 15:24:28,385-Speed 3324.72 samples/sec   Loss 1.6317   LearningRate 0.0294   Epoch: 9   Global Step: 152740   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 15:24:31,471-Speed 3319.31 samples/sec   Loss 1.6681   LearningRate 0.0294   Epoch: 9   Global Step: 152750   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 15:24:34,601-Speed 3271.82 samples/sec   Loss 1.7257   LearningRate 0.0294   Epoch: 9   Global Step: 152760   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-11 15:24:37,695-Speed 3310.79 samples/sec   Loss 1.7035   LearningRate 0.0294   Epoch: 9   Global Step: 152770   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-11 15:24:40,774-Speed 3326.07 samples/sec   Loss 1.6218   LearningRate 0.0294   Epoch: 9   Global Step: 152780   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-11 15:24:43,849-Speed 3331.30 samples/sec   Loss 1.6893   LearningRate 0.0294   Epoch: 9   Global Step: 152790   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-11 15:24:46,913-Speed 3342.33 samples/sec   Loss 1.6027   LearningRate 0.0294   Epoch: 9   Global Step: 152800   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 15:24:49,990-Speed 3328.80 samples/sec   Loss 1.6590   LearningRate 0.0294   Epoch: 9   Global Step: 152810   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 15:24:53,088-Speed 3306.54 samples/sec   Loss 1.6314   LearningRate 0.0294   Epoch: 9   Global Step: 152820   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 15:24:56,182-Speed 3309.61 samples/sec   Loss 1.6180   LearningRate 0.0294   Epoch: 9   Global Step: 152830   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 15:24:59,269-Speed 3318.35 samples/sec   Loss 1.6431   LearningRate 0.0294   Epoch: 9   Global Step: 152840   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 15:25:02,350-Speed 3323.77 samples/sec   Loss 1.6118   LearningRate 0.0294   Epoch: 9   Global Step: 152850   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 15:25:05,443-Speed 3312.13 samples/sec   Loss 1.6513   LearningRate 0.0294   Epoch: 9   Global Step: 152860   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 15:25:08,544-Speed 3303.08 samples/sec   Loss 1.5968   LearningRate 0.0294   Epoch: 9   Global Step: 152870   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 15:25:11,619-Speed 3331.28 samples/sec   Loss 1.6046   LearningRate 0.0294   Epoch: 9   Global Step: 152880   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 15:25:14,698-Speed 3325.75 samples/sec   Loss 1.6556   LearningRate 0.0294   Epoch: 9   Global Step: 152890   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 15:25:17,784-Speed 3319.23 samples/sec   Loss 1.6180   LearningRate 0.0294   Epoch: 9   Global Step: 152900   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-11 15:25:20,910-Speed 3276.16 samples/sec   Loss 1.5895   LearningRate 0.0294   Epoch: 9   Global Step: 152910   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 15:25:24,013-Speed 3300.84 samples/sec   Loss 1.6304   LearningRate 0.0294   Epoch: 9   Global Step: 152920   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 15:25:27,094-Speed 3324.41 samples/sec   Loss 1.6463   LearningRate 0.0294   Epoch: 9   Global Step: 152930   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 15:25:30,171-Speed 3328.59 samples/sec   Loss 1.6858   LearningRate 0.0294   Epoch: 9   Global Step: 152940   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 15:25:33,253-Speed 3323.84 samples/sec   Loss 1.6345   LearningRate 0.0294   Epoch: 9   Global Step: 152950   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 15:25:36,339-Speed 3319.32 samples/sec   Loss 1.6024   LearningRate 0.0294   Epoch: 9   Global Step: 152960   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 15:25:39,414-Speed 3330.67 samples/sec   Loss 1.6261   LearningRate 0.0294   Epoch: 9   Global Step: 152970   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 15:25:42,613-Speed 3201.64 samples/sec   Loss 1.6070   LearningRate 0.0293   Epoch: 9   Global Step: 152980   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 15:25:45,685-Speed 3333.34 samples/sec   Loss 1.6787   LearningRate 0.0293   Epoch: 9   Global Step: 152990   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 15:25:48,761-Speed 3329.59 samples/sec   Loss 1.6815   LearningRate 0.0293   Epoch: 9   Global Step: 153000   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 15:25:51,859-Speed 3306.82 samples/sec   Loss 1.6569   LearningRate 0.0293   Epoch: 9   Global Step: 153010   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-11 15:25:54,979-Speed 3282.03 samples/sec   Loss 1.6315   LearningRate 0.0293   Epoch: 9   Global Step: 153020   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 15:25:58,063-Speed 3321.06 samples/sec   Loss 1.7248   LearningRate 0.0293   Epoch: 9   Global Step: 153030   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 15:26:01,157-Speed 3311.25 samples/sec   Loss 1.7137   LearningRate 0.0293   Epoch: 9   Global Step: 153040   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 15:26:04,239-Speed 3323.46 samples/sec   Loss 1.6708   LearningRate 0.0293   Epoch: 9   Global Step: 153050   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 15:26:07,328-Speed 3315.96 samples/sec   Loss 1.5838   LearningRate 0.0293   Epoch: 9   Global Step: 153060   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 15:26:10,411-Speed 3322.47 samples/sec   Loss 1.6129   LearningRate 0.0293   Epoch: 9   Global Step: 153070   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 15:26:13,482-Speed 3334.64 samples/sec   Loss 1.7274   LearningRate 0.0293   Epoch: 9   Global Step: 153080   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 15:26:16,556-Speed 3332.55 samples/sec   Loss 1.6566   LearningRate 0.0293   Epoch: 9   Global Step: 153090   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 15:26:19,628-Speed 3334.34 samples/sec   Loss 1.6813   LearningRate 0.0293   Epoch: 9   Global Step: 153100   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 15:26:22,700-Speed 3333.25 samples/sec   Loss 1.6258   LearningRate 0.0293   Epoch: 9   Global Step: 153110   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 15:26:25,873-Speed 3228.06 samples/sec   Loss 1.6633   LearningRate 0.0293   Epoch: 9   Global Step: 153120   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-11 15:26:28,993-Speed 3283.09 samples/sec   Loss 1.6751   LearningRate 0.0293   Epoch: 9   Global Step: 153130   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-11 15:26:32,093-Speed 3304.40 samples/sec   Loss 1.6559   LearningRate 0.0293   Epoch: 9   Global Step: 153140   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-11 15:26:35,198-Speed 3299.11 samples/sec   Loss 1.6528   LearningRate 0.0293   Epoch: 9   Global Step: 153150   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-11 15:26:38,300-Speed 3301.98 samples/sec   Loss 1.6719   LearningRate 0.0293   Epoch: 9   Global Step: 153160   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-11 15:26:41,360-Speed 3346.40 samples/sec   Loss 1.6808   LearningRate 0.0293   Epoch: 9   Global Step: 153170   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 15:26:44,437-Speed 3329.51 samples/sec   Loss 1.6553   LearningRate 0.0293   Epoch: 9   Global Step: 153180   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 15:26:47,512-Speed 3330.76 samples/sec   Loss 1.6685   LearningRate 0.0293   Epoch: 9   Global Step: 153190   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 15:26:50,589-Speed 3327.68 samples/sec   Loss 1.6601   LearningRate 0.0293   Epoch: 9   Global Step: 153200   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 15:26:53,664-Speed 3331.04 samples/sec   Loss 1.6819   LearningRate 0.0293   Epoch: 9   Global Step: 153210   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 15:26:56,740-Speed 3330.23 samples/sec   Loss 1.6888   LearningRate 0.0293   Epoch: 9   Global Step: 153220   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 15:26:59,822-Speed 3323.42 samples/sec   Loss 1.6727   LearningRate 0.0293   Epoch: 9   Global Step: 153230   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 15:27:02,902-Speed 3325.46 samples/sec   Loss 1.6719   LearningRate 0.0293   Epoch: 9   Global Step: 153240   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 15:27:05,976-Speed 3331.61 samples/sec   Loss 1.7444   LearningRate 0.0293   Epoch: 9   Global Step: 153250   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 15:27:09,052-Speed 3329.25 samples/sec   Loss 1.7108   LearningRate 0.0293   Epoch: 9   Global Step: 153260   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 15:27:12,132-Speed 3325.95 samples/sec   Loss 1.6605   LearningRate 0.0293   Epoch: 9   Global Step: 153270   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-11 15:27:15,234-Speed 3301.65 samples/sec   Loss 1.6917   LearningRate 0.0292   Epoch: 9   Global Step: 153280   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-11 15:27:18,314-Speed 3326.87 samples/sec   Loss 1.6545   LearningRate 0.0292   Epoch: 9   Global Step: 153290   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-11 15:27:21,413-Speed 3304.30 samples/sec   Loss 1.6145   LearningRate 0.0292   Epoch: 9   Global Step: 153300   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-11 15:27:24,521-Speed 3296.07 samples/sec   Loss 1.6708   LearningRate 0.0292   Epoch: 9   Global Step: 153310   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-11 15:27:27,598-Speed 3328.68 samples/sec   Loss 1.6638   LearningRate 0.0292   Epoch: 9   Global Step: 153320   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-11 15:27:30,682-Speed 3321.42 samples/sec   Loss 1.6567   LearningRate 0.0292   Epoch: 9   Global Step: 153330   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-11 15:27:33,911-Speed 3171.82 samples/sec   Loss 1.6196   LearningRate 0.0292   Epoch: 9   Global Step: 153340   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-11 15:27:37,000-Speed 3315.04 samples/sec   Loss 1.7157   LearningRate 0.0292   Epoch: 9   Global Step: 153350   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 15:27:40,079-Speed 3326.86 samples/sec   Loss 1.6348   LearningRate 0.0292   Epoch: 9   Global Step: 153360   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 15:27:43,158-Speed 3326.79 samples/sec   Loss 1.6659   LearningRate 0.0292   Epoch: 9   Global Step: 153370   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 15:27:46,326-Speed 3233.03 samples/sec   Loss 1.6769   LearningRate 0.0292   Epoch: 9   Global Step: 153380   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 15:27:49,461-Speed 3266.97 samples/sec   Loss 1.6309   LearningRate 0.0292   Epoch: 9   Global Step: 153390   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 15:27:52,541-Speed 3326.10 samples/sec   Loss 1.6621   LearningRate 0.0292   Epoch: 9   Global Step: 153400   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 15:27:55,618-Speed 3328.29 samples/sec   Loss 1.6437   LearningRate 0.0292   Epoch: 9   Global Step: 153410   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 15:27:58,714-Speed 3307.96 samples/sec   Loss 1.6416   LearningRate 0.0292   Epoch: 9   Global Step: 153420   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 15:28:01,801-Speed 3318.10 samples/sec   Loss 1.6879   LearningRate 0.0292   Epoch: 9   Global Step: 153430   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 15:28:04,896-Speed 3309.18 samples/sec   Loss 1.7163   LearningRate 0.0292   Epoch: 9   Global Step: 153440   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 15:28:07,998-Speed 3301.87 samples/sec   Loss 1.6818   LearningRate 0.0292   Epoch: 9   Global Step: 153450   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-11 15:28:11,079-Speed 3323.75 samples/sec   Loss 1.6630   LearningRate 0.0292   Epoch: 9   Global Step: 153460   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-11 15:28:14,207-Speed 3275.79 samples/sec   Loss 1.6949   LearningRate 0.0292   Epoch: 9   Global Step: 153470   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-11 15:28:17,334-Speed 3276.10 samples/sec   Loss 1.6551   LearningRate 0.0292   Epoch: 9   Global Step: 153480   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-11 15:28:20,432-Speed 3306.00 samples/sec   Loss 1.6901   LearningRate 0.0292   Epoch: 9   Global Step: 153490   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-11 15:28:23,561-Speed 3273.36 samples/sec   Loss 1.6281   LearningRate 0.0292   Epoch: 9   Global Step: 153500   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-11 15:28:26,642-Speed 3324.69 samples/sec   Loss 1.7384   LearningRate 0.0292   Epoch: 9   Global Step: 153510   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-11 15:28:29,732-Speed 3313.97 samples/sec   Loss 1.6512   LearningRate 0.0292   Epoch: 9   Global Step: 153520   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-11 15:28:32,822-Speed 3315.40 samples/sec   Loss 1.6399   LearningRate 0.0292   Epoch: 9   Global Step: 153530   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-11 15:28:37,743-Speed 2081.02 samples/sec   Loss 1.6906   LearningRate 0.0292   Epoch: 9   Global Step: 153540   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-11 15:28:40,827-Speed 3320.30 samples/sec   Loss 1.6914   LearningRate 0.0292   Epoch: 9   Global Step: 153550   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-11 15:28:46,518-Speed 1799.89 samples/sec   Loss 1.6874   LearningRate 0.0292   Epoch: 9   Global Step: 153560   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-11 15:28:49,593-Speed 3330.26 samples/sec   Loss 1.6361   LearningRate 0.0292   Epoch: 9   Global Step: 153570   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-11 15:28:52,668-Speed 3331.51 samples/sec   Loss 1.6765   LearningRate 0.0292   Epoch: 9   Global Step: 153580   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-11 15:28:55,748-Speed 3325.49 samples/sec   Loss 1.6270   LearningRate 0.0291   Epoch: 9   Global Step: 153590   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-11 15:28:58,828-Speed 3325.35 samples/sec   Loss 1.6493   LearningRate 0.0291   Epoch: 9   Global Step: 153600   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-11 15:29:01,934-Speed 3297.14 samples/sec   Loss 1.6907   LearningRate 0.0291   Epoch: 9   Global Step: 153610   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-11 15:29:05,037-Speed 3301.49 samples/sec   Loss 1.6391   LearningRate 0.0291   Epoch: 9   Global Step: 153620   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 15:29:08,126-Speed 3315.27 samples/sec   Loss 1.7496   LearningRate 0.0291   Epoch: 9   Global Step: 153630   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-04-11 15:29:11,239-Speed 3290.82 samples/sec   Loss 1.7085   LearningRate 0.0291   Epoch: 9   Global Step: 153640   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-04-11 15:29:14,319-Speed 3325.04 samples/sec   Loss 1.6763   LearningRate 0.0291   Epoch: 9   Global Step: 153650   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-04-11 15:29:17,521-Speed 3198.68 samples/sec   Loss 1.7177   LearningRate 0.0291   Epoch: 9   Global Step: 153660   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-04-11 15:29:20,769-Speed 3154.08 samples/sec   Loss 1.6855   LearningRate 0.0291   Epoch: 9   Global Step: 153670   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-04-11 15:29:23,844-Speed 3330.00 samples/sec   Loss 1.6507   LearningRate 0.0291   Epoch: 9   Global Step: 153680   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-04-11 15:29:26,979-Speed 3267.65 samples/sec   Loss 1.6993   LearningRate 0.0291   Epoch: 9   Global Step: 153690   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-04-11 15:29:30,078-Speed 3304.45 samples/sec   Loss 1.6997   LearningRate 0.0291   Epoch: 9   Global Step: 153700   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-04-11 15:29:33,199-Speed 3282.22 samples/sec   Loss 1.7009   LearningRate 0.0291   Epoch: 9   Global Step: 153710   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-04-11 15:29:36,382-Speed 3217.23 samples/sec   Loss 1.6613   LearningRate 0.0291   Epoch: 9   Global Step: 153720   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-04-11 15:29:39,500-Speed 3285.54 samples/sec   Loss 1.6891   LearningRate 0.0291   Epoch: 9   Global Step: 153730   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-11 15:29:42,595-Speed 3309.18 samples/sec   Loss 1.7013   LearningRate 0.0291   Epoch: 9   Global Step: 153740   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-11 15:29:45,682-Speed 3317.63 samples/sec   Loss 1.6620   LearningRate 0.0291   Epoch: 9   Global Step: 153750   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-11 15:29:48,782-Speed 3304.02 samples/sec   Loss 1.6416   LearningRate 0.0291   Epoch: 9   Global Step: 153760   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-11 15:29:51,864-Speed 3323.48 samples/sec   Loss 1.6428   LearningRate 0.0291   Epoch: 9   Global Step: 153770   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-11 15:29:55,039-Speed 3225.44 samples/sec   Loss 1.6724   LearningRate 0.0291   Epoch: 9   Global Step: 153780   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-11 15:29:58,117-Speed 3328.40 samples/sec   Loss 1.7051   LearningRate 0.0291   Epoch: 9   Global Step: 153790   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-11 15:30:01,253-Speed 3265.72 samples/sec   Loss 1.7031   LearningRate 0.0291   Epoch: 9   Global Step: 153800   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-11 15:30:04,369-Speed 3286.72 samples/sec   Loss 1.7378   LearningRate 0.0291   Epoch: 9   Global Step: 153810   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-11 15:30:07,457-Speed 3317.12 samples/sec   Loss 1.6692   LearningRate 0.0291   Epoch: 9   Global Step: 153820   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-11 15:30:10,535-Speed 3327.76 samples/sec   Loss 1.6471   LearningRate 0.0291   Epoch: 9   Global Step: 153830   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 15:30:13,627-Speed 3312.46 samples/sec   Loss 1.6779   LearningRate 0.0291   Epoch: 9   Global Step: 153840   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 15:30:16,726-Speed 3304.44 samples/sec   Loss 1.6973   LearningRate 0.0291   Epoch: 9   Global Step: 153850   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 15:30:19,816-Speed 3316.34 samples/sec   Loss 1.6847   LearningRate 0.0291   Epoch: 9   Global Step: 153860   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 15:30:22,904-Speed 3317.16 samples/sec   Loss 1.6722   LearningRate 0.0291   Epoch: 9   Global Step: 153870   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 15:30:25,981-Speed 3328.29 samples/sec   Loss 1.7010   LearningRate 0.0291   Epoch: 9   Global Step: 153880   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 15:30:29,068-Speed 3317.70 samples/sec   Loss 1.7199   LearningRate 0.0291   Epoch: 9   Global Step: 153890   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 15:30:32,260-Speed 3208.98 samples/sec   Loss 1.7153   LearningRate 0.0290   Epoch: 9   Global Step: 153900   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 15:30:35,453-Speed 3208.47 samples/sec   Loss 1.6863   LearningRate 0.0290   Epoch: 9   Global Step: 153910   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 15:30:38,634-Speed 3219.67 samples/sec   Loss 1.7285   LearningRate 0.0290   Epoch: 9   Global Step: 153920   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 15:30:41,741-Speed 3296.83 samples/sec   Loss 1.6400   LearningRate 0.0290   Epoch: 9   Global Step: 153930   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-11 15:30:44,822-Speed 3323.73 samples/sec   Loss 1.7003   LearningRate 0.0290   Epoch: 9   Global Step: 153940   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-11 15:30:47,983-Speed 3241.03 samples/sec   Loss 1.6526   LearningRate 0.0290   Epoch: 9   Global Step: 153950   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-11 15:30:51,122-Speed 3262.85 samples/sec   Loss 1.6872   LearningRate 0.0290   Epoch: 9   Global Step: 153960   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-11 15:30:54,266-Speed 3256.76 samples/sec   Loss 1.6590   LearningRate 0.0290   Epoch: 9   Global Step: 153970   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-11 15:30:57,414-Speed 3253.74 samples/sec   Loss 1.6147   LearningRate 0.0290   Epoch: 9   Global Step: 153980   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-11 15:31:00,519-Speed 3299.47 samples/sec   Loss 1.6804   LearningRate 0.0290   Epoch: 9   Global Step: 153990   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-11 15:31:03,609-Speed 3314.15 samples/sec   Loss 1.6220   LearningRate 0.0290   Epoch: 9   Global Step: 154000   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-11 15:31:47,888-[lfw][154000]XNorm: 22.941242
Training: 2022-04-11 15:31:47,888-[lfw][154000]Accuracy-Flip: 0.99783+-0.00289
Training: 2022-04-11 15:31:47,889-[lfw][154000]Accuracy-Highest: 0.99817
Training: 2022-04-11 15:32:39,362-[cfp_fp][154000]XNorm: 22.107368
Training: 2022-04-11 15:32:39,363-[cfp_fp][154000]Accuracy-Flip: 0.98814+-0.00527
Training: 2022-04-11 15:32:39,363-[cfp_fp][154000]Accuracy-Highest: 0.98843
Training: 2022-04-11 15:33:23,676-[agedb_30][154000]XNorm: 23.299216
Training: 2022-04-11 15:33:23,677-[agedb_30][154000]Accuracy-Flip: 0.98367+-0.00674
Training: 2022-04-11 15:33:23,677-[agedb_30][154000]Accuracy-Highest: 0.98367
Training: 2022-04-11 15:33:26,771-Speed 71.53 samples/sec   Loss 1.7467   LearningRate 0.0290   Epoch: 9   Global Step: 154010   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-11 15:33:29,832-Speed 3346.01 samples/sec   Loss 1.7330   LearningRate 0.0290   Epoch: 9   Global Step: 154020   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-11 15:33:32,884-Speed 3355.96 samples/sec   Loss 1.6761   LearningRate 0.0290   Epoch: 9   Global Step: 154030   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-11 15:33:35,961-Speed 3328.07 samples/sec   Loss 1.6931   LearningRate 0.0290   Epoch: 9   Global Step: 154040   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-11 15:33:39,030-Speed 3337.74 samples/sec   Loss 1.7553   LearningRate 0.0290   Epoch: 9   Global Step: 154050   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-11 15:33:42,104-Speed 3331.06 samples/sec   Loss 1.6873   LearningRate 0.0290   Epoch: 9   Global Step: 154060   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 15:33:45,188-Speed 3322.15 samples/sec   Loss 1.6954   LearningRate 0.0290   Epoch: 9   Global Step: 154070   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 15:33:48,268-Speed 3325.31 samples/sec   Loss 1.6667   LearningRate 0.0290   Epoch: 9   Global Step: 154080   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 15:33:51,364-Speed 3307.86 samples/sec   Loss 1.6870   LearningRate 0.0290   Epoch: 9   Global Step: 154090   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 15:33:54,449-Speed 3319.72 samples/sec   Loss 1.6786   LearningRate 0.0290   Epoch: 9   Global Step: 154100   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 15:33:57,533-Speed 3321.66 samples/sec   Loss 1.7189   LearningRate 0.0290   Epoch: 9   Global Step: 154110   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 15:34:00,620-Speed 3317.91 samples/sec   Loss 1.7334   LearningRate 0.0290   Epoch: 9   Global Step: 154120   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 15:34:03,697-Speed 3328.68 samples/sec   Loss 1.7451   LearningRate 0.0290   Epoch: 9   Global Step: 154130   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 15:34:06,789-Speed 3312.25 samples/sec   Loss 1.6696   LearningRate 0.0290   Epoch: 9   Global Step: 154140   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 15:34:09,931-Speed 3259.91 samples/sec   Loss 1.6734   LearningRate 0.0290   Epoch: 9   Global Step: 154150   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 15:34:13,203-Speed 3129.79 samples/sec   Loss 1.6371   LearningRate 0.0290   Epoch: 9   Global Step: 154160   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-11 15:34:16,473-Speed 3132.91 samples/sec   Loss 1.6491   LearningRate 0.0290   Epoch: 9   Global Step: 154170   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-11 15:34:19,729-Speed 3145.33 samples/sec   Loss 1.7380   LearningRate 0.0290   Epoch: 9   Global Step: 154180   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-11 15:34:22,804-Speed 3331.44 samples/sec   Loss 1.7262   LearningRate 0.0290   Epoch: 9   Global Step: 154190   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-11 15:34:25,998-Speed 3206.61 samples/sec   Loss 1.7065   LearningRate 0.0290   Epoch: 9   Global Step: 154200   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-11 15:34:29,095-Speed 3307.01 samples/sec   Loss 1.6858   LearningRate 0.0289   Epoch: 9   Global Step: 154210   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-11 15:34:32,173-Speed 3327.42 samples/sec   Loss 1.6729   LearningRate 0.0289   Epoch: 9   Global Step: 154220   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-11 15:34:35,254-Speed 3324.39 samples/sec   Loss 1.6945   LearningRate 0.0289   Epoch: 9   Global Step: 154230   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-11 15:34:38,373-Speed 3284.11 samples/sec   Loss 1.7309   LearningRate 0.0289   Epoch: 9   Global Step: 154240   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-11 15:34:41,488-Speed 3287.06 samples/sec   Loss 1.7049   LearningRate 0.0289   Epoch: 9   Global Step: 154250   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-11 15:34:44,617-Speed 3274.36 samples/sec   Loss 1.6585   LearningRate 0.0289   Epoch: 9   Global Step: 154260   Fp16 Grad Scale: 262144   Required: 19 hours
Training: 2022-04-11 15:34:47,721-Speed 3299.53 samples/sec   Loss 1.7217   LearningRate 0.0289   Epoch: 9   Global Step: 154270   Fp16 Grad Scale: 262144   Required: 19 hours
Training: 2022-04-11 15:34:50,795-Speed 3331.95 samples/sec   Loss 1.6906   LearningRate 0.0289   Epoch: 9   Global Step: 154280   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-11 15:34:53,868-Speed 3332.36 samples/sec   Loss 1.6742   LearningRate 0.0289   Epoch: 9   Global Step: 154290   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 15:34:56,958-Speed 3314.95 samples/sec   Loss 1.7166   LearningRate 0.0289   Epoch: 9   Global Step: 154300   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 15:35:00,052-Speed 3310.52 samples/sec   Loss 1.7309   LearningRate 0.0289   Epoch: 9   Global Step: 154310   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 15:35:03,141-Speed 3315.24 samples/sec   Loss 1.7141   LearningRate 0.0289   Epoch: 9   Global Step: 154320   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 15:35:06,234-Speed 3311.78 samples/sec   Loss 1.7121   LearningRate 0.0289   Epoch: 9   Global Step: 154330   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 15:35:09,362-Speed 3275.18 samples/sec   Loss 1.6602   LearningRate 0.0289   Epoch: 9   Global Step: 154340   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 15:35:12,446-Speed 3321.19 samples/sec   Loss 1.7079   LearningRate 0.0289   Epoch: 9   Global Step: 154350   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 15:35:15,525-Speed 3326.27 samples/sec   Loss 1.6879   LearningRate 0.0289   Epoch: 9   Global Step: 154360   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 15:35:18,646-Speed 3281.24 samples/sec   Loss 1.7011   LearningRate 0.0289   Epoch: 9   Global Step: 154370   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 15:35:21,735-Speed 3316.17 samples/sec   Loss 1.6540   LearningRate 0.0289   Epoch: 9   Global Step: 154380   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 15:35:24,819-Speed 3320.67 samples/sec   Loss 1.7520   LearningRate 0.0289   Epoch: 9   Global Step: 154390   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-11 15:35:27,907-Speed 3316.16 samples/sec   Loss 1.6620   LearningRate 0.0289   Epoch: 9   Global Step: 154400   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-11 15:35:31,036-Speed 3273.32 samples/sec   Loss 1.6937   LearningRate 0.0289   Epoch: 9   Global Step: 154410   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-11 15:35:34,246-Speed 3190.69 samples/sec   Loss 1.7010   LearningRate 0.0289   Epoch: 9   Global Step: 154420   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-11 15:35:37,339-Speed 3312.64 samples/sec   Loss 1.7011   LearningRate 0.0289   Epoch: 9   Global Step: 154430   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-11 15:35:40,416-Speed 3328.65 samples/sec   Loss 1.6936   LearningRate 0.0289   Epoch: 9   Global Step: 154440   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-11 15:35:43,505-Speed 3315.68 samples/sec   Loss 1.6557   LearningRate 0.0289   Epoch: 9   Global Step: 154450   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-11 15:35:46,596-Speed 3313.09 samples/sec   Loss 1.7433   LearningRate 0.0289   Epoch: 9   Global Step: 154460   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-11 15:35:49,669-Speed 3332.94 samples/sec   Loss 1.7254   LearningRate 0.0289   Epoch: 9   Global Step: 154470   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 15:35:52,780-Speed 3292.67 samples/sec   Loss 1.6757   LearningRate 0.0289   Epoch: 9   Global Step: 154480   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 15:35:55,859-Speed 3325.83 samples/sec   Loss 1.6673   LearningRate 0.0289   Epoch: 9   Global Step: 154490   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 15:35:58,946-Speed 3317.94 samples/sec   Loss 1.7123   LearningRate 0.0289   Epoch: 9   Global Step: 154500   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 15:36:02,026-Speed 3325.36 samples/sec   Loss 1.7271   LearningRate 0.0289   Epoch: 9   Global Step: 154510   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 15:36:05,121-Speed 3310.14 samples/sec   Loss 1.6901   LearningRate 0.0288   Epoch: 9   Global Step: 154520   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 15:36:08,209-Speed 3316.09 samples/sec   Loss 1.6532   LearningRate 0.0288   Epoch: 9   Global Step: 154530   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 15:36:11,328-Speed 3284.53 samples/sec   Loss 1.7430   LearningRate 0.0288   Epoch: 9   Global Step: 154540   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 15:36:14,416-Speed 3315.86 samples/sec   Loss 1.6964   LearningRate 0.0288   Epoch: 9   Global Step: 154550   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 15:36:17,603-Speed 3214.59 samples/sec   Loss 1.6833   LearningRate 0.0288   Epoch: 9   Global Step: 154560   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 15:36:20,720-Speed 3286.03 samples/sec   Loss 1.7228   LearningRate 0.0288   Epoch: 9   Global Step: 154570   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-11 15:36:23,854-Speed 3267.88 samples/sec   Loss 1.7625   LearningRate 0.0288   Epoch: 9   Global Step: 154580   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-11 15:36:27,076-Speed 3178.15 samples/sec   Loss 1.7193   LearningRate 0.0288   Epoch: 9   Global Step: 154590   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-11 15:36:30,216-Speed 3262.07 samples/sec   Loss 1.7190   LearningRate 0.0288   Epoch: 9   Global Step: 154600   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-11 15:36:33,302-Speed 3319.03 samples/sec   Loss 1.7140   LearningRate 0.0288   Epoch: 9   Global Step: 154610   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-11 15:36:36,427-Speed 3278.46 samples/sec   Loss 1.7133   LearningRate 0.0288   Epoch: 9   Global Step: 154620   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-11 15:36:39,603-Speed 3224.22 samples/sec   Loss 1.7535   LearningRate 0.0288   Epoch: 9   Global Step: 154630   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-11 15:36:42,737-Speed 3268.55 samples/sec   Loss 1.7615   LearningRate 0.0288   Epoch: 9   Global Step: 154640   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-11 15:36:45,936-Speed 3201.63 samples/sec   Loss 1.6983   LearningRate 0.0288   Epoch: 9   Global Step: 154650   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-11 15:36:49,055-Speed 3283.22 samples/sec   Loss 1.7438   LearningRate 0.0288   Epoch: 9   Global Step: 154660   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-11 15:36:52,191-Speed 3266.48 samples/sec   Loss 1.7571   LearningRate 0.0288   Epoch: 9   Global Step: 154670   Fp16 Grad Scale: 262144   Required: 19 hours
Training: 2022-04-11 15:36:55,381-Speed 3210.36 samples/sec   Loss 1.7027   LearningRate 0.0288   Epoch: 9   Global Step: 154680   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-11 15:36:58,554-Speed 3228.74 samples/sec   Loss 1.7562   LearningRate 0.0288   Epoch: 9   Global Step: 154690   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-11 15:37:01,635-Speed 3324.21 samples/sec   Loss 1.7206   LearningRate 0.0288   Epoch: 9   Global Step: 154700   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 15:37:04,728-Speed 3311.78 samples/sec   Loss 1.7413   LearningRate 0.0288   Epoch: 9   Global Step: 154710   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 15:37:07,816-Speed 3316.29 samples/sec   Loss 1.7507   LearningRate 0.0288   Epoch: 9   Global Step: 154720   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 15:37:10,925-Speed 3294.38 samples/sec   Loss 1.6588   LearningRate 0.0288   Epoch: 9   Global Step: 154730   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 15:37:14,001-Speed 3329.28 samples/sec   Loss 1.7003   LearningRate 0.0288   Epoch: 9   Global Step: 154740   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 15:37:17,066-Speed 3341.93 samples/sec   Loss 1.7799   LearningRate 0.0288   Epoch: 9   Global Step: 154750   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-11 15:37:20,143-Speed 3328.76 samples/sec   Loss 1.7338   LearningRate 0.0288   Epoch: 9   Global Step: 154760   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-11 15:37:23,233-Speed 3315.47 samples/sec   Loss 1.6895   LearningRate 0.0288   Epoch: 9   Global Step: 154770   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-11 15:37:26,341-Speed 3294.78 samples/sec   Loss 1.7442   LearningRate 0.0288   Epoch: 9   Global Step: 154780   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-11 15:37:29,436-Speed 3310.17 samples/sec   Loss 1.7929   LearningRate 0.0288   Epoch: 9   Global Step: 154790   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-11 15:37:32,513-Speed 3328.54 samples/sec   Loss 1.7233   LearningRate 0.0288   Epoch: 9   Global Step: 154800   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-11 15:37:35,594-Speed 3324.04 samples/sec   Loss 1.7162   LearningRate 0.0288   Epoch: 9   Global Step: 154810   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-11 15:37:38,816-Speed 3178.78 samples/sec   Loss 1.7333   LearningRate 0.0288   Epoch: 9   Global Step: 154820   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-11 15:37:41,912-Speed 3308.46 samples/sec   Loss 1.7155   LearningRate 0.0287   Epoch: 9   Global Step: 154830   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-11 15:37:44,991-Speed 3325.58 samples/sec   Loss 1.6952   LearningRate 0.0287   Epoch: 9   Global Step: 154840   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-11 15:37:48,108-Speed 3286.57 samples/sec   Loss 1.7613   LearningRate 0.0287   Epoch: 9   Global Step: 154850   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 15:37:51,195-Speed 3317.70 samples/sec   Loss 1.6888   LearningRate 0.0287   Epoch: 9   Global Step: 154860   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 15:37:54,270-Speed 3330.95 samples/sec   Loss 1.7574   LearningRate 0.0287   Epoch: 9   Global Step: 154870   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 15:37:57,359-Speed 3315.53 samples/sec   Loss 1.7180   LearningRate 0.0287   Epoch: 9   Global Step: 154880   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 15:38:00,441-Speed 3323.13 samples/sec   Loss 1.7010   LearningRate 0.0287   Epoch: 9   Global Step: 154890   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 15:38:03,535-Speed 3310.81 samples/sec   Loss 1.7737   LearningRate 0.0287   Epoch: 9   Global Step: 154900   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 15:38:06,619-Speed 3320.49 samples/sec   Loss 1.7099   LearningRate 0.0287   Epoch: 9   Global Step: 154910   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 15:38:09,713-Speed 3310.45 samples/sec   Loss 1.7487   LearningRate 0.0287   Epoch: 9   Global Step: 154920   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 15:38:12,809-Speed 3308.92 samples/sec   Loss 1.7169   LearningRate 0.0287   Epoch: 9   Global Step: 154930   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 15:38:15,888-Speed 3326.26 samples/sec   Loss 1.7213   LearningRate 0.0287   Epoch: 9   Global Step: 154940   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 15:38:18,973-Speed 3321.00 samples/sec   Loss 1.8087   LearningRate 0.0287   Epoch: 9   Global Step: 154950   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 15:38:22,094-Speed 3281.70 samples/sec   Loss 1.7326   LearningRate 0.0287   Epoch: 9   Global Step: 154960   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 15:38:25,185-Speed 3312.79 samples/sec   Loss 1.6612   LearningRate 0.0287   Epoch: 9   Global Step: 154970   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 15:38:28,266-Speed 3324.45 samples/sec   Loss 1.7111   LearningRate 0.0287   Epoch: 9   Global Step: 154980   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 15:38:31,343-Speed 3329.04 samples/sec   Loss 1.7241   LearningRate 0.0287   Epoch: 9   Global Step: 154990   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 15:38:34,431-Speed 3316.16 samples/sec   Loss 1.7852   LearningRate 0.0287   Epoch: 9   Global Step: 155000   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 15:38:37,510-Speed 3326.43 samples/sec   Loss 1.7418   LearningRate 0.0287   Epoch: 9   Global Step: 155010   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 15:38:40,598-Speed 3317.31 samples/sec   Loss 1.6999   LearningRate 0.0287   Epoch: 9   Global Step: 155020   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 15:38:43,676-Speed 3327.82 samples/sec   Loss 1.7123   LearningRate 0.0287   Epoch: 9   Global Step: 155030   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 15:38:46,754-Speed 3328.16 samples/sec   Loss 1.7150   LearningRate 0.0287   Epoch: 9   Global Step: 155040   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 15:38:49,818-Speed 3342.17 samples/sec   Loss 1.7710   LearningRate 0.0287   Epoch: 9   Global Step: 155050   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 15:38:52,900-Speed 3323.61 samples/sec   Loss 1.7314   LearningRate 0.0287   Epoch: 9   Global Step: 155060   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 15:38:55,992-Speed 3311.68 samples/sec   Loss 1.6968   LearningRate 0.0287   Epoch: 9   Global Step: 155070   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 15:38:59,098-Speed 3298.23 samples/sec   Loss 1.7033   LearningRate 0.0287   Epoch: 9   Global Step: 155080   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 15:39:02,290-Speed 3208.50 samples/sec   Loss 1.7340   LearningRate 0.0287   Epoch: 9   Global Step: 155090   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 15:39:05,380-Speed 3314.38 samples/sec   Loss 1.7508   LearningRate 0.0287   Epoch: 9   Global Step: 155100   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 15:39:08,478-Speed 3305.65 samples/sec   Loss 1.6937   LearningRate 0.0287   Epoch: 9   Global Step: 155110   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 15:39:11,569-Speed 3314.46 samples/sec   Loss 1.7026   LearningRate 0.0287   Epoch: 9   Global Step: 155120   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 15:39:14,709-Speed 3261.86 samples/sec   Loss 1.7724   LearningRate 0.0287   Epoch: 9   Global Step: 155130   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 15:39:17,878-Speed 3231.86 samples/sec   Loss 1.7984   LearningRate 0.0287   Epoch: 9   Global Step: 155140   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 15:39:20,960-Speed 3323.70 samples/sec   Loss 1.7544   LearningRate 0.0286   Epoch: 9   Global Step: 155150   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-11 15:39:24,050-Speed 3314.71 samples/sec   Loss 1.7340   LearningRate 0.0286   Epoch: 9   Global Step: 155160   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-11 15:39:27,136-Speed 3319.10 samples/sec   Loss 1.6751   LearningRate 0.0286   Epoch: 9   Global Step: 155170   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-11 15:39:30,269-Speed 3269.06 samples/sec   Loss 1.7648   LearningRate 0.0286   Epoch: 9   Global Step: 155180   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-11 15:39:33,361-Speed 3312.13 samples/sec   Loss 1.6978   LearningRate 0.0286   Epoch: 9   Global Step: 155190   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-11 15:39:36,441-Speed 3325.82 samples/sec   Loss 1.7795   LearningRate 0.0286   Epoch: 9   Global Step: 155200   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-11 15:39:39,521-Speed 3325.68 samples/sec   Loss 1.7832   LearningRate 0.0286   Epoch: 9   Global Step: 155210   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-11 15:39:42,629-Speed 3295.61 samples/sec   Loss 1.6949   LearningRate 0.0286   Epoch: 9   Global Step: 155220   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-11 15:39:45,704-Speed 3330.58 samples/sec   Loss 1.7256   LearningRate 0.0286   Epoch: 9   Global Step: 155230   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-11 15:39:48,813-Speed 3294.11 samples/sec   Loss 1.7081   LearningRate 0.0286   Epoch: 9   Global Step: 155240   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-11 15:39:51,935-Speed 3280.94 samples/sec   Loss 1.6788   LearningRate 0.0286   Epoch: 9   Global Step: 155250   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-11 15:39:55,025-Speed 3314.05 samples/sec   Loss 1.7393   LearningRate 0.0286   Epoch: 9   Global Step: 155260   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-11 15:39:58,113-Speed 3316.82 samples/sec   Loss 1.7759   LearningRate 0.0286   Epoch: 9   Global Step: 155270   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-11 15:40:01,195-Speed 3323.50 samples/sec   Loss 1.7136   LearningRate 0.0286   Epoch: 9   Global Step: 155280   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-11 15:40:04,279-Speed 3321.12 samples/sec   Loss 1.7545   LearningRate 0.0286   Epoch: 9   Global Step: 155290   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-11 15:40:07,358-Speed 3326.68 samples/sec   Loss 1.7111   LearningRate 0.0286   Epoch: 9   Global Step: 155300   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-11 15:40:10,437-Speed 3326.97 samples/sec   Loss 1.8036   LearningRate 0.0286   Epoch: 9   Global Step: 155310   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-11 15:40:13,518-Speed 3323.76 samples/sec   Loss 1.7172   LearningRate 0.0286   Epoch: 9   Global Step: 155320   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-11 15:40:16,597-Speed 3327.47 samples/sec   Loss 1.7113   LearningRate 0.0286   Epoch: 9   Global Step: 155330   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-11 15:40:19,709-Speed 3291.06 samples/sec   Loss 1.7511   LearningRate 0.0286   Epoch: 9   Global Step: 155340   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-11 15:40:22,800-Speed 3312.76 samples/sec   Loss 1.7457   LearningRate 0.0286   Epoch: 9   Global Step: 155350   Fp16 Grad Scale: 262144   Required: 19 hours
Training: 2022-04-11 15:40:25,873-Speed 3333.03 samples/sec   Loss 1.7440   LearningRate 0.0286   Epoch: 9   Global Step: 155360   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-11 15:40:28,954-Speed 3324.64 samples/sec   Loss 1.7540   LearningRate 0.0286   Epoch: 9   Global Step: 155370   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-11 15:40:32,046-Speed 3311.91 samples/sec   Loss 1.7237   LearningRate 0.0286   Epoch: 9   Global Step: 155380   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-11 15:40:35,135-Speed 3316.47 samples/sec   Loss 1.7791   LearningRate 0.0286   Epoch: 9   Global Step: 155390   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-11 15:40:38,276-Speed 3261.41 samples/sec   Loss 1.6676   LearningRate 0.0286   Epoch: 9   Global Step: 155400   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-11 15:40:41,390-Speed 3288.46 samples/sec   Loss 1.8002   LearningRate 0.0286   Epoch: 9   Global Step: 155410   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-11 15:40:44,498-Speed 3296.25 samples/sec   Loss 1.7722   LearningRate 0.0286   Epoch: 9   Global Step: 155420   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-11 15:40:47,585-Speed 3317.81 samples/sec   Loss 1.7304   LearningRate 0.0286   Epoch: 9   Global Step: 155430   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-11 15:40:50,678-Speed 3310.87 samples/sec   Loss 1.6875   LearningRate 0.0286   Epoch: 9   Global Step: 155440   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-11 15:40:53,776-Speed 3306.37 samples/sec   Loss 1.7689   LearningRate 0.0286   Epoch: 9   Global Step: 155450   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-11 15:40:57,004-Speed 3172.53 samples/sec   Loss 1.7477   LearningRate 0.0285   Epoch: 9   Global Step: 155460   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-11 15:41:00,205-Speed 3212.88 samples/sec   Loss 1.7636   LearningRate 0.0285   Epoch: 9   Global Step: 155470   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-11 15:41:03,302-Speed 3307.38 samples/sec   Loss 1.7570   LearningRate 0.0285   Epoch: 9   Global Step: 155480   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-11 15:41:06,419-Speed 3285.93 samples/sec   Loss 1.8004   LearningRate 0.0285   Epoch: 9   Global Step: 155490   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-11 15:41:09,509-Speed 3314.26 samples/sec   Loss 1.7791   LearningRate 0.0285   Epoch: 9   Global Step: 155500   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-11 15:41:12,639-Speed 3272.28 samples/sec   Loss 1.7136   LearningRate 0.0285   Epoch: 9   Global Step: 155510   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-11 15:41:15,763-Speed 3278.15 samples/sec   Loss 1.7420   LearningRate 0.0285   Epoch: 9   Global Step: 155520   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-11 15:41:18,845-Speed 3323.32 samples/sec   Loss 1.7152   LearningRate 0.0285   Epoch: 9   Global Step: 155530   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-11 15:41:21,929-Speed 3320.99 samples/sec   Loss 1.7240   LearningRate 0.0285   Epoch: 9   Global Step: 155540   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-11 15:41:25,102-Speed 3228.77 samples/sec   Loss 1.6933   LearningRate 0.0285   Epoch: 9   Global Step: 155550   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-11 15:41:28,247-Speed 3256.71 samples/sec   Loss 1.6805   LearningRate 0.0285   Epoch: 9   Global Step: 155560   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-11 15:41:31,388-Speed 3261.37 samples/sec   Loss 1.7208   LearningRate 0.0285   Epoch: 9   Global Step: 155570   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-11 15:41:34,517-Speed 3273.18 samples/sec   Loss 1.7564   LearningRate 0.0285   Epoch: 9   Global Step: 155580   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-11 15:41:37,629-Speed 3290.69 samples/sec   Loss 1.7360   LearningRate 0.0285   Epoch: 9   Global Step: 155590   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-11 15:41:40,773-Speed 3258.25 samples/sec   Loss 1.7525   LearningRate 0.0285   Epoch: 9   Global Step: 155600   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-11 15:41:43,917-Speed 3257.28 samples/sec   Loss 1.7165   LearningRate 0.0285   Epoch: 9   Global Step: 155610   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-11 15:41:47,092-Speed 3226.00 samples/sec   Loss 1.7917   LearningRate 0.0285   Epoch: 9   Global Step: 155620   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-11 15:41:50,181-Speed 3315.43 samples/sec   Loss 1.7196   LearningRate 0.0285   Epoch: 9   Global Step: 155630   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-11 15:41:53,329-Speed 3254.05 samples/sec   Loss 1.7583   LearningRate 0.0285   Epoch: 9   Global Step: 155640   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-11 15:41:56,445-Speed 3286.68 samples/sec   Loss 1.7421   LearningRate 0.0285   Epoch: 9   Global Step: 155650   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-11 15:41:59,527-Speed 3323.92 samples/sec   Loss 1.8292   LearningRate 0.0285   Epoch: 9   Global Step: 155660   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 15:42:02,616-Speed 3315.12 samples/sec   Loss 1.7776   LearningRate 0.0285   Epoch: 9   Global Step: 155670   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 15:42:05,694-Speed 3328.51 samples/sec   Loss 1.7209   LearningRate 0.0285   Epoch: 9   Global Step: 155680   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 15:42:08,786-Speed 3312.30 samples/sec   Loss 1.7608   LearningRate 0.0285   Epoch: 9   Global Step: 155690   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 15:42:11,862-Speed 3329.93 samples/sec   Loss 1.7500   LearningRate 0.0285   Epoch: 9   Global Step: 155700   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 15:42:14,941-Speed 3325.48 samples/sec   Loss 1.7240   LearningRate 0.0285   Epoch: 9   Global Step: 155710   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 15:42:18,048-Speed 3297.33 samples/sec   Loss 1.7190   LearningRate 0.0285   Epoch: 9   Global Step: 155720   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 15:42:21,146-Speed 3306.60 samples/sec   Loss 1.7300   LearningRate 0.0285   Epoch: 9   Global Step: 155730   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 15:42:24,280-Speed 3267.93 samples/sec   Loss 1.7534   LearningRate 0.0285   Epoch: 9   Global Step: 155740   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 15:42:27,362-Speed 3323.21 samples/sec   Loss 1.7383   LearningRate 0.0285   Epoch: 9   Global Step: 155750   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 15:42:30,454-Speed 3312.82 samples/sec   Loss 1.7654   LearningRate 0.0285   Epoch: 9   Global Step: 155760   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 15:42:33,576-Speed 3280.00 samples/sec   Loss 1.7069   LearningRate 0.0284   Epoch: 9   Global Step: 155770   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 15:42:36,732-Speed 3245.21 samples/sec   Loss 1.7536   LearningRate 0.0284   Epoch: 9   Global Step: 155780   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 15:42:39,828-Speed 3308.97 samples/sec   Loss 1.7369   LearningRate 0.0284   Epoch: 9   Global Step: 155790   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 15:42:42,922-Speed 3309.38 samples/sec   Loss 1.7960   LearningRate 0.0284   Epoch: 9   Global Step: 155800   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 15:42:46,016-Speed 3310.87 samples/sec   Loss 1.7299   LearningRate 0.0284   Epoch: 9   Global Step: 155810   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 15:42:49,118-Speed 3301.93 samples/sec   Loss 1.7574   LearningRate 0.0284   Epoch: 9   Global Step: 155820   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 15:42:52,258-Speed 3262.11 samples/sec   Loss 1.7816   LearningRate 0.0284   Epoch: 9   Global Step: 155830   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 15:42:55,411-Speed 3248.29 samples/sec   Loss 1.7186   LearningRate 0.0284   Epoch: 9   Global Step: 155840   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 15:42:58,502-Speed 3314.08 samples/sec   Loss 1.7538   LearningRate 0.0284   Epoch: 9   Global Step: 155850   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 15:43:01,584-Speed 3322.89 samples/sec   Loss 1.7734   LearningRate 0.0284   Epoch: 9   Global Step: 155860   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-11 15:43:04,669-Speed 3320.74 samples/sec   Loss 1.7012   LearningRate 0.0284   Epoch: 9   Global Step: 155870   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-11 15:43:07,750-Speed 3323.95 samples/sec   Loss 1.7396   LearningRate 0.0284   Epoch: 9   Global Step: 155880   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-11 15:43:10,845-Speed 3309.43 samples/sec   Loss 1.7220   LearningRate 0.0284   Epoch: 9   Global Step: 155890   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-11 15:43:13,913-Speed 3338.85 samples/sec   Loss 1.7678   LearningRate 0.0284   Epoch: 9   Global Step: 155900   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 15:43:16,993-Speed 3325.22 samples/sec   Loss 1.7677   LearningRate 0.0284   Epoch: 9   Global Step: 155910   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 15:43:20,084-Speed 3313.76 samples/sec   Loss 1.7587   LearningRate 0.0284   Epoch: 9   Global Step: 155920   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 15:43:23,202-Speed 3284.58 samples/sec   Loss 1.6833   LearningRate 0.0284   Epoch: 9   Global Step: 155930   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 15:43:26,353-Speed 3250.09 samples/sec   Loss 1.6964   LearningRate 0.0284   Epoch: 9   Global Step: 155940   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 15:43:29,449-Speed 3308.71 samples/sec   Loss 1.7479   LearningRate 0.0284   Epoch: 9   Global Step: 155950   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 15:43:32,589-Speed 3261.70 samples/sec   Loss 1.7305   LearningRate 0.0284   Epoch: 9   Global Step: 155960   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 15:43:35,681-Speed 3313.09 samples/sec   Loss 1.7969   LearningRate 0.0284   Epoch: 9   Global Step: 155970   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 15:43:38,779-Speed 3305.76 samples/sec   Loss 1.7008   LearningRate 0.0284   Epoch: 9   Global Step: 155980   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 15:43:41,857-Speed 3327.83 samples/sec   Loss 1.7597   LearningRate 0.0284   Epoch: 9   Global Step: 155990   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 15:43:44,969-Speed 3291.44 samples/sec   Loss 1.7278   LearningRate 0.0284   Epoch: 9   Global Step: 156000   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-11 15:44:29,002-[lfw][156000]XNorm: 22.824285
Training: 2022-04-11 15:44:29,003-[lfw][156000]Accuracy-Flip: 0.99800+-0.00296
Training: 2022-04-11 15:44:29,003-[lfw][156000]Accuracy-Highest: 0.99817
Training: 2022-04-11 15:45:20,102-[cfp_fp][156000]XNorm: 22.534845
Training: 2022-04-11 15:45:20,103-[cfp_fp][156000]Accuracy-Flip: 0.98971+-0.00413
Training: 2022-04-11 15:45:20,103-[cfp_fp][156000]Accuracy-Highest: 0.98971
Training: 2022-04-11 15:46:04,095-[agedb_30][156000]XNorm: 23.496499
Training: 2022-04-11 15:46:04,096-[agedb_30][156000]Accuracy-Flip: 0.98400+-0.00696
Training: 2022-04-11 15:46:04,096-[agedb_30][156000]Accuracy-Highest: 0.98400
Training: 2022-04-11 15:46:07,238-Speed 71.98 samples/sec   Loss 1.7605   LearningRate 0.0284   Epoch: 9   Global Step: 156010   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-11 15:46:10,311-Speed 3333.00 samples/sec   Loss 1.7254   LearningRate 0.0284   Epoch: 9   Global Step: 156020   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-11 15:46:13,378-Speed 3339.51 samples/sec   Loss 1.7221   LearningRate 0.0284   Epoch: 9   Global Step: 156030   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-11 15:46:16,445-Speed 3338.79 samples/sec   Loss 1.7463   LearningRate 0.0284   Epoch: 9   Global Step: 156040   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-11 15:46:19,523-Speed 3327.73 samples/sec   Loss 1.8120   LearningRate 0.0284   Epoch: 9   Global Step: 156050   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-11 15:46:22,595-Speed 3333.64 samples/sec   Loss 1.7147   LearningRate 0.0284   Epoch: 9   Global Step: 156060   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-11 15:46:25,665-Speed 3337.09 samples/sec   Loss 1.7081   LearningRate 0.0284   Epoch: 9   Global Step: 156070   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-11 15:46:28,796-Speed 3271.22 samples/sec   Loss 1.8071   LearningRate 0.0283   Epoch: 9   Global Step: 156080   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-11 15:46:31,897-Speed 3302.71 samples/sec   Loss 1.7143   LearningRate 0.0283   Epoch: 9   Global Step: 156090   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 15:46:35,082-Speed 3215.51 samples/sec   Loss 1.7800   LearningRate 0.0283   Epoch: 9   Global Step: 156100   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 15:46:38,159-Speed 3328.42 samples/sec   Loss 1.7544   LearningRate 0.0283   Epoch: 9   Global Step: 156110   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 15:46:41,249-Speed 3315.16 samples/sec   Loss 1.7312   LearningRate 0.0283   Epoch: 9   Global Step: 156120   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 15:46:44,339-Speed 3314.13 samples/sec   Loss 1.8142   LearningRate 0.0283   Epoch: 9   Global Step: 156130   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 15:46:47,428-Speed 3316.75 samples/sec   Loss 1.7686   LearningRate 0.0283   Epoch: 9   Global Step: 156140   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 15:46:50,541-Speed 3289.86 samples/sec   Loss 1.7533   LearningRate 0.0283   Epoch: 9   Global Step: 156150   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 15:46:53,664-Speed 3279.37 samples/sec   Loss 1.7459   LearningRate 0.0283   Epoch: 9   Global Step: 156160   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 15:46:56,753-Speed 3315.86 samples/sec   Loss 1.7774   LearningRate 0.0283   Epoch: 9   Global Step: 156170   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 15:46:59,830-Speed 3328.27 samples/sec   Loss 1.7301   LearningRate 0.0283   Epoch: 9   Global Step: 156180   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 15:47:02,931-Speed 3303.37 samples/sec   Loss 1.7617   LearningRate 0.0283   Epoch: 9   Global Step: 156190   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-11 15:47:06,014-Speed 3321.93 samples/sec   Loss 1.7376   LearningRate 0.0283   Epoch: 9   Global Step: 156200   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-11 15:47:09,100-Speed 3319.44 samples/sec   Loss 1.7792   LearningRate 0.0283   Epoch: 9   Global Step: 156210   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-11 15:47:12,190-Speed 3315.30 samples/sec   Loss 1.8131   LearningRate 0.0283   Epoch: 9   Global Step: 156220   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-11 15:47:15,278-Speed 3316.60 samples/sec   Loss 1.7720   LearningRate 0.0283   Epoch: 9   Global Step: 156230   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-11 15:47:18,384-Speed 3333.91 samples/sec   Loss 1.7635   LearningRate 0.0283   Epoch: 9   Global Step: 156240   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-11 15:47:21,526-Speed 3259.74 samples/sec   Loss 1.8008   LearningRate 0.0283   Epoch: 9   Global Step: 156250   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-11 15:47:24,664-Speed 3264.41 samples/sec   Loss 1.7630   LearningRate 0.0283   Epoch: 9   Global Step: 156260   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-11 15:47:27,750-Speed 3318.23 samples/sec   Loss 1.7405   LearningRate 0.0283   Epoch: 9   Global Step: 156270   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-11 15:47:30,941-Speed 3210.18 samples/sec   Loss 1.7937   LearningRate 0.0283   Epoch: 9   Global Step: 156280   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-11 15:47:34,045-Speed 3300.15 samples/sec   Loss 1.7550   LearningRate 0.0283   Epoch: 9   Global Step: 156290   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 15:47:37,207-Speed 3238.83 samples/sec   Loss 1.7547   LearningRate 0.0283   Epoch: 9   Global Step: 156300   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 15:47:40,285-Speed 3327.74 samples/sec   Loss 1.7821   LearningRate 0.0283   Epoch: 9   Global Step: 156310   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 15:47:43,370-Speed 3320.75 samples/sec   Loss 1.7560   LearningRate 0.0283   Epoch: 9   Global Step: 156320   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 15:47:46,500-Speed 3271.29 samples/sec   Loss 1.7787   LearningRate 0.0283   Epoch: 9   Global Step: 156330   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 15:47:49,587-Speed 3318.74 samples/sec   Loss 1.7657   LearningRate 0.0283   Epoch: 9   Global Step: 156340   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 15:47:52,688-Speed 3303.33 samples/sec   Loss 1.7825   LearningRate 0.0283   Epoch: 9   Global Step: 156350   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 15:47:55,852-Speed 3236.47 samples/sec   Loss 1.7691   LearningRate 0.0283   Epoch: 9   Global Step: 156360   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 15:47:58,995-Speed 3259.57 samples/sec   Loss 1.6931   LearningRate 0.0283   Epoch: 9   Global Step: 156370   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 15:48:02,150-Speed 3245.74 samples/sec   Loss 1.8339   LearningRate 0.0283   Epoch: 9   Global Step: 156380   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 15:48:05,234-Speed 3321.89 samples/sec   Loss 1.8170   LearningRate 0.0283   Epoch: 9   Global Step: 156390   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-11 15:48:08,316-Speed 3322.44 samples/sec   Loss 1.7812   LearningRate 0.0282   Epoch: 9   Global Step: 156400   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-11 15:48:11,413-Speed 3307.46 samples/sec   Loss 1.7336   LearningRate 0.0282   Epoch: 9   Global Step: 156410   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-11 15:48:14,496-Speed 3322.31 samples/sec   Loss 1.7787   LearningRate 0.0282   Epoch: 9   Global Step: 156420   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-11 15:48:17,576-Speed 3325.17 samples/sec   Loss 1.7395   LearningRate 0.0282   Epoch: 9   Global Step: 156430   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-11 15:48:20,659-Speed 3322.64 samples/sec   Loss 1.7193   LearningRate 0.0282   Epoch: 9   Global Step: 156440   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-11 15:48:23,747-Speed 3316.29 samples/sec   Loss 1.7679   LearningRate 0.0282   Epoch: 9   Global Step: 156450   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-11 15:48:26,825-Speed 3328.18 samples/sec   Loss 1.7364   LearningRate 0.0282   Epoch: 9   Global Step: 156460   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-11 15:48:29,889-Speed 3342.56 samples/sec   Loss 1.7582   LearningRate 0.0282   Epoch: 9   Global Step: 156470   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 15:48:32,972-Speed 3321.97 samples/sec   Loss 1.6895   LearningRate 0.0282   Epoch: 9   Global Step: 156480   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 15:48:36,055-Speed 3322.47 samples/sec   Loss 1.7184   LearningRate 0.0282   Epoch: 9   Global Step: 156490   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 15:48:39,150-Speed 3309.26 samples/sec   Loss 1.7856   LearningRate 0.0282   Epoch: 9   Global Step: 156500   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 15:48:42,246-Speed 3308.24 samples/sec   Loss 1.7287   LearningRate 0.0282   Epoch: 9   Global Step: 156510   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 15:48:45,351-Speed 3298.68 samples/sec   Loss 1.7896   LearningRate 0.0282   Epoch: 9   Global Step: 156520   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 15:48:48,452-Speed 3303.18 samples/sec   Loss 1.7435   LearningRate 0.0282   Epoch: 9   Global Step: 156530   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 15:48:51,555-Speed 3301.15 samples/sec   Loss 1.7608   LearningRate 0.0282   Epoch: 9   Global Step: 156540   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 15:48:54,637-Speed 3322.92 samples/sec   Loss 1.7381   LearningRate 0.0282   Epoch: 9   Global Step: 156550   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 15:48:57,725-Speed 3316.69 samples/sec   Loss 1.7310   LearningRate 0.0282   Epoch: 9   Global Step: 156560   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 15:49:00,870-Speed 3256.18 samples/sec   Loss 1.8035   LearningRate 0.0282   Epoch: 9   Global Step: 156570   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-11 15:49:03,951-Speed 3324.90 samples/sec   Loss 1.7526   LearningRate 0.0282   Epoch: 9   Global Step: 156580   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-11 15:49:07,031-Speed 3325.20 samples/sec   Loss 1.8364   LearningRate 0.0282   Epoch: 9   Global Step: 156590   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-11 15:49:10,136-Speed 3298.93 samples/sec   Loss 1.7709   LearningRate 0.0282   Epoch: 9   Global Step: 156600   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-11 15:49:13,259-Speed 3279.43 samples/sec   Loss 1.7547   LearningRate 0.0282   Epoch: 9   Global Step: 156610   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-11 15:49:16,342-Speed 3322.44 samples/sec   Loss 1.7459   LearningRate 0.0282   Epoch: 9   Global Step: 156620   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-11 15:49:19,419-Speed 3328.46 samples/sec   Loss 1.7782   LearningRate 0.0282   Epoch: 9   Global Step: 156630   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-11 15:49:22,496-Speed 3328.74 samples/sec   Loss 1.7554   LearningRate 0.0282   Epoch: 9   Global Step: 156640   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-11 15:49:25,586-Speed 3315.14 samples/sec   Loss 1.7276   LearningRate 0.0282   Epoch: 9   Global Step: 156650   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-11 15:49:28,667-Speed 3324.38 samples/sec   Loss 1.7991   LearningRate 0.0282   Epoch: 9   Global Step: 156660   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-11 15:49:31,744-Speed 3328.08 samples/sec   Loss 1.8406   LearningRate 0.0282   Epoch: 9   Global Step: 156670   Fp16 Grad Scale: 262144   Required: 19 hours
Training: 2022-04-11 15:49:34,958-Speed 3186.77 samples/sec   Loss 1.6792   LearningRate 0.0282   Epoch: 9   Global Step: 156680   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-11 15:49:38,226-Speed 3134.21 samples/sec   Loss 1.7563   LearningRate 0.0282   Epoch: 9   Global Step: 156690   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-11 15:49:41,366-Speed 3261.44 samples/sec   Loss 1.8182   LearningRate 0.0282   Epoch: 9   Global Step: 156700   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-11 15:49:44,451-Speed 3320.28 samples/sec   Loss 1.7612   LearningRate 0.0281   Epoch: 9   Global Step: 156710   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-11 15:49:47,530-Speed 3327.00 samples/sec   Loss 1.7638   LearningRate 0.0281   Epoch: 9   Global Step: 156720   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-11 15:49:50,608-Speed 3327.71 samples/sec   Loss 1.7458   LearningRate 0.0281   Epoch: 9   Global Step: 156730   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-11 15:49:53,683-Speed 3330.46 samples/sec   Loss 1.7693   LearningRate 0.0281   Epoch: 9   Global Step: 156740   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-11 15:49:56,751-Speed 3338.34 samples/sec   Loss 1.8025   LearningRate 0.0281   Epoch: 9   Global Step: 156750   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 15:49:59,834-Speed 3322.45 samples/sec   Loss 1.7185   LearningRate 0.0281   Epoch: 9   Global Step: 156760   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 15:50:02,985-Speed 3250.84 samples/sec   Loss 1.8114   LearningRate 0.0281   Epoch: 9   Global Step: 156770   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 15:50:06,073-Speed 3316.08 samples/sec   Loss 1.7864   LearningRate 0.0281   Epoch: 9   Global Step: 156780   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 15:50:09,161-Speed 3317.22 samples/sec   Loss 1.7745   LearningRate 0.0281   Epoch: 9   Global Step: 156790   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 15:50:12,242-Speed 3324.11 samples/sec   Loss 1.7894   LearningRate 0.0281   Epoch: 9   Global Step: 156800   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 15:50:15,336-Speed 3310.33 samples/sec   Loss 1.8223   LearningRate 0.0281   Epoch: 9   Global Step: 156810   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 15:50:18,423-Speed 3318.01 samples/sec   Loss 1.7484   LearningRate 0.0281   Epoch: 9   Global Step: 156820   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 15:50:21,504-Speed 3325.09 samples/sec   Loss 1.8092   LearningRate 0.0281   Epoch: 9   Global Step: 156830   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 15:50:24,593-Speed 3315.53 samples/sec   Loss 1.7295   LearningRate 0.0281   Epoch: 9   Global Step: 156840   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 15:50:27,678-Speed 3319.57 samples/sec   Loss 1.6893   LearningRate 0.0281   Epoch: 9   Global Step: 156850   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-11 15:50:30,753-Speed 3331.02 samples/sec   Loss 1.7522   LearningRate 0.0281   Epoch: 9   Global Step: 156860   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-11 15:50:33,835-Speed 3323.01 samples/sec   Loss 1.7653   LearningRate 0.0281   Epoch: 9   Global Step: 156870   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-11 15:50:36,935-Speed 3304.52 samples/sec   Loss 1.8231   LearningRate 0.0281   Epoch: 9   Global Step: 156880   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-11 15:50:40,032-Speed 3306.90 samples/sec   Loss 1.7736   LearningRate 0.0281   Epoch: 9   Global Step: 156890   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-11 15:50:43,119-Speed 3317.83 samples/sec   Loss 1.7906   LearningRate 0.0281   Epoch: 9   Global Step: 156900   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-11 15:50:46,238-Speed 3284.41 samples/sec   Loss 1.7956   LearningRate 0.0281   Epoch: 9   Global Step: 156910   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-11 15:50:49,322-Speed 3320.94 samples/sec   Loss 1.7858   LearningRate 0.0281   Epoch: 9   Global Step: 156920   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-11 15:50:52,430-Speed 3295.78 samples/sec   Loss 1.6839   LearningRate 0.0281   Epoch: 9   Global Step: 156930   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-11 15:50:55,529-Speed 3304.62 samples/sec   Loss 1.7953   LearningRate 0.0281   Epoch: 9   Global Step: 156940   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-11 15:50:58,606-Speed 3328.87 samples/sec   Loss 1.7029   LearningRate 0.0281   Epoch: 9   Global Step: 156950   Fp16 Grad Scale: 262144   Required: 19 hours
Training: 2022-04-11 15:51:01,673-Speed 3339.23 samples/sec   Loss 1.7462   LearningRate 0.0281   Epoch: 9   Global Step: 156960   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-11 15:51:04,753-Speed 3325.38 samples/sec   Loss 1.8241   LearningRate 0.0281   Epoch: 9   Global Step: 156970   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-11 15:51:07,856-Speed 3301.64 samples/sec   Loss 1.7388   LearningRate 0.0281   Epoch: 9   Global Step: 156980   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-11 15:51:10,944-Speed 3316.43 samples/sec   Loss 1.7724   LearningRate 0.0281   Epoch: 9   Global Step: 156990   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-11 15:51:14,045-Speed 3302.57 samples/sec   Loss 1.8409   LearningRate 0.0281   Epoch: 9   Global Step: 157000   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-11 15:51:17,127-Speed 3323.35 samples/sec   Loss 1.7859   LearningRate 0.0281   Epoch: 9   Global Step: 157010   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-11 15:51:20,218-Speed 3313.28 samples/sec   Loss 1.6968   LearningRate 0.0281   Epoch: 9   Global Step: 157020   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-11 15:51:23,304-Speed 3319.60 samples/sec   Loss 1.7235   LearningRate 0.0280   Epoch: 9   Global Step: 157030   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-11 15:51:26,389-Speed 3319.60 samples/sec   Loss 1.7620   LearningRate 0.0280   Epoch: 9   Global Step: 157040   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-11 15:51:29,474-Speed 3320.73 samples/sec   Loss 1.7415   LearningRate 0.0280   Epoch: 9   Global Step: 157050   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-11 15:51:32,545-Speed 3334.46 samples/sec   Loss 1.7966   LearningRate 0.0280   Epoch: 9   Global Step: 157060   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-11 15:51:35,639-Speed 3310.64 samples/sec   Loss 1.7703   LearningRate 0.0280   Epoch: 9   Global Step: 157070   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-11 15:51:38,730-Speed 3313.78 samples/sec   Loss 1.7927   LearningRate 0.0280   Epoch: 9   Global Step: 157080   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-11 15:51:41,814-Speed 3321.17 samples/sec   Loss 1.7927   LearningRate 0.0280   Epoch: 9   Global Step: 157090   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-11 15:51:44,948-Speed 3267.74 samples/sec   Loss 1.8136   LearningRate 0.0280   Epoch: 9   Global Step: 157100   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-11 15:51:48,112-Speed 3237.02 samples/sec   Loss 1.8204   LearningRate 0.0280   Epoch: 9   Global Step: 157110   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-11 15:51:51,215-Speed 3301.72 samples/sec   Loss 1.8517   LearningRate 0.0280   Epoch: 9   Global Step: 157120   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-11 15:51:54,326-Speed 3291.49 samples/sec   Loss 1.7844   LearningRate 0.0280   Epoch: 9   Global Step: 157130   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-11 15:51:57,407-Speed 3324.70 samples/sec   Loss 1.7602   LearningRate 0.0280   Epoch: 9   Global Step: 157140   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-11 15:52:00,551-Speed 3258.10 samples/sec   Loss 1.8030   LearningRate 0.0280   Epoch: 9   Global Step: 157150   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-11 15:52:03,622-Speed 3334.94 samples/sec   Loss 1.7940   LearningRate 0.0280   Epoch: 9   Global Step: 157160   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 15:52:06,707-Speed 3320.29 samples/sec   Loss 1.7534   LearningRate 0.0280   Epoch: 9   Global Step: 157170   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 15:52:09,784-Speed 3328.55 samples/sec   Loss 1.7614   LearningRate 0.0280   Epoch: 9   Global Step: 157180   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 15:52:12,867-Speed 3321.84 samples/sec   Loss 1.7808   LearningRate 0.0280   Epoch: 9   Global Step: 157190   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 15:52:15,948-Speed 3324.76 samples/sec   Loss 1.7719   LearningRate 0.0280   Epoch: 9   Global Step: 157200   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 15:52:19,027-Speed 3326.42 samples/sec   Loss 1.7380   LearningRate 0.0280   Epoch: 9   Global Step: 157210   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 15:52:22,105-Speed 3327.37 samples/sec   Loss 1.7086   LearningRate 0.0280   Epoch: 9   Global Step: 157220   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 15:52:25,184-Speed 3326.40 samples/sec   Loss 1.8077   LearningRate 0.0280   Epoch: 9   Global Step: 157230   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 15:52:28,274-Speed 3315.28 samples/sec   Loss 1.7809   LearningRate 0.0280   Epoch: 9   Global Step: 157240   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 15:52:31,348-Speed 3331.47 samples/sec   Loss 1.7470   LearningRate 0.0280   Epoch: 9   Global Step: 157250   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 15:52:34,421-Speed 3333.83 samples/sec   Loss 1.7441   LearningRate 0.0280   Epoch: 9   Global Step: 157260   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-11 15:52:37,500-Speed 3325.78 samples/sec   Loss 1.7754   LearningRate 0.0280   Epoch: 9   Global Step: 157270   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-11 15:52:40,577-Speed 3329.62 samples/sec   Loss 1.8148   LearningRate 0.0280   Epoch: 9   Global Step: 157280   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-11 15:52:43,653-Speed 3328.87 samples/sec   Loss 1.7995   LearningRate 0.0280   Epoch: 9   Global Step: 157290   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-11 15:52:46,732-Speed 3326.76 samples/sec   Loss 1.7482   LearningRate 0.0280   Epoch: 9   Global Step: 157300   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-11 15:52:49,830-Speed 3305.57 samples/sec   Loss 1.8027   LearningRate 0.0280   Epoch: 9   Global Step: 157310   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-11 15:52:52,972-Speed 3260.21 samples/sec   Loss 1.7484   LearningRate 0.0280   Epoch: 9   Global Step: 157320   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 15:52:56,067-Speed 3309.06 samples/sec   Loss 1.7947   LearningRate 0.0280   Epoch: 9   Global Step: 157330   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 15:52:59,222-Speed 3246.72 samples/sec   Loss 1.7816   LearningRate 0.0279   Epoch: 9   Global Step: 157340   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 15:53:02,391-Speed 3232.42 samples/sec   Loss 1.7695   LearningRate 0.0279   Epoch: 9   Global Step: 157350   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 15:53:05,506-Speed 3287.43 samples/sec   Loss 1.7575   LearningRate 0.0279   Epoch: 9   Global Step: 157360   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 15:53:08,599-Speed 3311.85 samples/sec   Loss 1.7698   LearningRate 0.0279   Epoch: 9   Global Step: 157370   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 15:53:11,685-Speed 3319.29 samples/sec   Loss 1.7818   LearningRate 0.0279   Epoch: 9   Global Step: 157380   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 15:53:14,762-Speed 3327.97 samples/sec   Loss 1.7918   LearningRate 0.0279   Epoch: 9   Global Step: 157390   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 15:53:17,850-Speed 3316.52 samples/sec   Loss 1.7821   LearningRate 0.0279   Epoch: 9   Global Step: 157400   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 15:53:20,958-Speed 3295.62 samples/sec   Loss 1.7948   LearningRate 0.0279   Epoch: 9   Global Step: 157410   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 15:53:24,070-Speed 3292.35 samples/sec   Loss 1.7808   LearningRate 0.0279   Epoch: 9   Global Step: 157420   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-11 15:53:27,150-Speed 3324.87 samples/sec   Loss 1.7896   LearningRate 0.0279   Epoch: 9   Global Step: 157430   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-11 15:53:30,227-Speed 3328.83 samples/sec   Loss 1.7742   LearningRate 0.0279   Epoch: 9   Global Step: 157440   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-11 15:53:33,312-Speed 3319.45 samples/sec   Loss 1.7629   LearningRate 0.0279   Epoch: 9   Global Step: 157450   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-11 15:53:36,396-Speed 3321.07 samples/sec   Loss 1.7911   LearningRate 0.0279   Epoch: 9   Global Step: 157460   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-11 15:53:39,486-Speed 3315.72 samples/sec   Loss 1.7894   LearningRate 0.0279   Epoch: 9   Global Step: 157470   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-11 15:53:42,564-Speed 3327.52 samples/sec   Loss 1.8147   LearningRate 0.0279   Epoch: 9   Global Step: 157480   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-11 15:53:45,695-Speed 3270.41 samples/sec   Loss 1.7601   LearningRate 0.0279   Epoch: 9   Global Step: 157490   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-11 15:53:48,783-Speed 3317.36 samples/sec   Loss 1.7595   LearningRate 0.0279   Epoch: 9   Global Step: 157500   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-11 15:53:51,869-Speed 3318.78 samples/sec   Loss 1.7285   LearningRate 0.0279   Epoch: 9   Global Step: 157510   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-11 15:53:54,957-Speed 3316.57 samples/sec   Loss 1.7408   LearningRate 0.0279   Epoch: 9   Global Step: 157520   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 15:53:58,042-Speed 3320.68 samples/sec   Loss 1.7423   LearningRate 0.0279   Epoch: 9   Global Step: 157530   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 15:54:01,127-Speed 3319.53 samples/sec   Loss 1.7557   LearningRate 0.0279   Epoch: 9   Global Step: 157540   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 15:54:04,216-Speed 3316.23 samples/sec   Loss 1.7954   LearningRate 0.0279   Epoch: 9   Global Step: 157550   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 15:54:07,307-Speed 3312.85 samples/sec   Loss 1.7930   LearningRate 0.0279   Epoch: 9   Global Step: 157560   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 15:54:10,393-Speed 3319.43 samples/sec   Loss 1.7945   LearningRate 0.0279   Epoch: 9   Global Step: 157570   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 15:54:13,491-Speed 3306.48 samples/sec   Loss 1.7945   LearningRate 0.0279   Epoch: 9   Global Step: 157580   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 15:54:16,591-Speed 3303.83 samples/sec   Loss 1.8477   LearningRate 0.0279   Epoch: 9   Global Step: 157590   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 15:54:19,718-Speed 3274.96 samples/sec   Loss 1.7859   LearningRate 0.0279   Epoch: 9   Global Step: 157600   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 15:54:22,894-Speed 3224.96 samples/sec   Loss 1.8089   LearningRate 0.0279   Epoch: 9   Global Step: 157610   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 15:54:26,001-Speed 3296.56 samples/sec   Loss 1.7754   LearningRate 0.0279   Epoch: 9   Global Step: 157620   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-11 15:54:29,095-Speed 3310.93 samples/sec   Loss 1.8465   LearningRate 0.0279   Epoch: 9   Global Step: 157630   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-11 15:54:32,209-Speed 3289.41 samples/sec   Loss 1.7867   LearningRate 0.0279   Epoch: 9   Global Step: 157640   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 15:54:35,310-Speed 3302.71 samples/sec   Loss 1.7431   LearningRate 0.0279   Epoch: 9   Global Step: 157650   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 15:54:38,424-Speed 3288.18 samples/sec   Loss 1.7621   LearningRate 0.0278   Epoch: 9   Global Step: 157660   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 15:54:41,542-Speed 3285.58 samples/sec   Loss 1.7898   LearningRate 0.0278   Epoch: 9   Global Step: 157670   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 15:54:44,709-Speed 3234.15 samples/sec   Loss 1.7889   LearningRate 0.0278   Epoch: 9   Global Step: 157680   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 15:54:47,787-Speed 3327.63 samples/sec   Loss 1.7908   LearningRate 0.0278   Epoch: 9   Global Step: 157690   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 15:54:50,865-Speed 3327.37 samples/sec   Loss 1.7754   LearningRate 0.0278   Epoch: 9   Global Step: 157700   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 15:54:53,951-Speed 3319.23 samples/sec   Loss 1.7860   LearningRate 0.0278   Epoch: 9   Global Step: 157710   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 15:54:57,045-Speed 3309.87 samples/sec   Loss 1.7850   LearningRate 0.0278   Epoch: 9   Global Step: 157720   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 15:55:00,125-Speed 3325.75 samples/sec   Loss 1.7812   LearningRate 0.0278   Epoch: 9   Global Step: 157730   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 15:55:03,227-Speed 3302.31 samples/sec   Loss 1.7979   LearningRate 0.0278   Epoch: 9   Global Step: 157740   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-11 15:55:06,333-Speed 3297.11 samples/sec   Loss 1.7591   LearningRate 0.0278   Epoch: 9   Global Step: 157750   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-11 15:55:09,514-Speed 3220.38 samples/sec   Loss 1.8051   LearningRate 0.0278   Epoch: 9   Global Step: 157760   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-11 15:55:12,631-Speed 3285.95 samples/sec   Loss 1.7596   LearningRate 0.0278   Epoch: 9   Global Step: 157770   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-11 15:55:15,753-Speed 3280.88 samples/sec   Loss 1.8629   LearningRate 0.0278   Epoch: 9   Global Step: 157780   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 15:55:18,845-Speed 3311.94 samples/sec   Loss 1.8282   LearningRate 0.0278   Epoch: 9   Global Step: 157790   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 15:55:21,928-Speed 3322.67 samples/sec   Loss 1.8357   LearningRate 0.0278   Epoch: 9   Global Step: 157800   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 15:55:25,075-Speed 3254.21 samples/sec   Loss 1.7768   LearningRate 0.0278   Epoch: 9   Global Step: 157810   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 15:55:28,232-Speed 3244.29 samples/sec   Loss 1.7698   LearningRate 0.0278   Epoch: 9   Global Step: 157820   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 15:55:31,311-Speed 3326.08 samples/sec   Loss 1.8218   LearningRate 0.0278   Epoch: 9   Global Step: 157830   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 15:55:34,393-Speed 3324.17 samples/sec   Loss 1.7878   LearningRate 0.0278   Epoch: 9   Global Step: 157840   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 15:55:37,491-Speed 3306.60 samples/sec   Loss 1.7516   LearningRate 0.0278   Epoch: 9   Global Step: 157850   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 15:55:40,577-Speed 3318.25 samples/sec   Loss 1.8503   LearningRate 0.0278   Epoch: 9   Global Step: 157860   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 15:55:43,661-Speed 3321.57 samples/sec   Loss 1.7714   LearningRate 0.0278   Epoch: 9   Global Step: 157870   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 15:55:46,851-Speed 3210.20 samples/sec   Loss 1.7988   LearningRate 0.0278   Epoch: 9   Global Step: 157880   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-11 15:55:49,976-Speed 3277.25 samples/sec   Loss 1.7966   LearningRate 0.0278   Epoch: 9   Global Step: 157890   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-11 15:55:53,081-Speed 3298.90 samples/sec   Loss 1.7990   LearningRate 0.0278   Epoch: 9   Global Step: 157900   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-11 15:55:56,186-Speed 3299.04 samples/sec   Loss 1.7575   LearningRate 0.0278   Epoch: 9   Global Step: 157910   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 15:55:59,276-Speed 3314.21 samples/sec   Loss 1.8299   LearningRate 0.0278   Epoch: 9   Global Step: 157920   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 15:56:02,358-Speed 3323.16 samples/sec   Loss 1.7895   LearningRate 0.0278   Epoch: 9   Global Step: 157930   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 15:56:05,441-Speed 3323.04 samples/sec   Loss 1.7401   LearningRate 0.0278   Epoch: 9   Global Step: 157940   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 15:56:08,553-Speed 3291.37 samples/sec   Loss 1.8189   LearningRate 0.0278   Epoch: 9   Global Step: 157950   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 15:56:11,634-Speed 3323.72 samples/sec   Loss 1.7800   LearningRate 0.0278   Epoch: 9   Global Step: 157960   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 15:56:14,719-Speed 3321.75 samples/sec   Loss 1.7201   LearningRate 0.0277   Epoch: 9   Global Step: 157970   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 15:56:17,835-Speed 3286.42 samples/sec   Loss 1.8071   LearningRate 0.0277   Epoch: 9   Global Step: 157980   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 15:56:20,933-Speed 3305.83 samples/sec   Loss 1.7630   LearningRate 0.0277   Epoch: 9   Global Step: 157990   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 15:56:24,039-Speed 3297.30 samples/sec   Loss 1.7885   LearningRate 0.0277   Epoch: 9   Global Step: 158000   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 15:57:07,407-[lfw][158000]XNorm: 23.538166
Training: 2022-04-11 15:57:07,408-[lfw][158000]Accuracy-Flip: 0.99783+-0.00289
Training: 2022-04-11 15:57:07,408-[lfw][158000]Accuracy-Highest: 0.99817
Training: 2022-04-11 15:57:57,761-[cfp_fp][158000]XNorm: 22.637898
Training: 2022-04-11 15:57:57,761-[cfp_fp][158000]Accuracy-Flip: 0.98800+-0.00492
Training: 2022-04-11 15:57:57,761-[cfp_fp][158000]Accuracy-Highest: 0.98971
Training: 2022-04-11 15:58:41,242-[agedb_30][158000]XNorm: 23.997656
Training: 2022-04-11 15:58:41,243-[agedb_30][158000]Accuracy-Flip: 0.98383+-0.00578
Training: 2022-04-11 15:58:41,243-[agedb_30][158000]Accuracy-Highest: 0.98400
Training: 2022-04-11 15:58:44,382-Speed 72.96 samples/sec   Loss 1.8319   LearningRate 0.0277   Epoch: 9   Global Step: 158010   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-11 15:58:47,535-Speed 3248.50 samples/sec   Loss 1.7540   LearningRate 0.0277   Epoch: 9   Global Step: 158020   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-11 15:58:50,663-Speed 3274.21 samples/sec   Loss 1.7841   LearningRate 0.0277   Epoch: 9   Global Step: 158030   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-11 15:58:53,746-Speed 3321.47 samples/sec   Loss 1.7599   LearningRate 0.0277   Epoch: 9   Global Step: 158040   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-11 15:58:56,844-Speed 3306.65 samples/sec   Loss 1.7790   LearningRate 0.0277   Epoch: 9   Global Step: 158050   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-11 15:58:59,903-Speed 3348.31 samples/sec   Loss 1.7709   LearningRate 0.0277   Epoch: 9   Global Step: 158060   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 15:59:02,977-Speed 3331.85 samples/sec   Loss 1.7777   LearningRate 0.0277   Epoch: 9   Global Step: 158070   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 15:59:06,054-Speed 3329.14 samples/sec   Loss 1.7698   LearningRate 0.0277   Epoch: 9   Global Step: 158080   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 15:59:09,143-Speed 3315.55 samples/sec   Loss 1.7478   LearningRate 0.0277   Epoch: 9   Global Step: 158090   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 15:59:12,222-Speed 3326.53 samples/sec   Loss 1.7331   LearningRate 0.0277   Epoch: 9   Global Step: 158100   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 15:59:15,287-Speed 3341.55 samples/sec   Loss 1.7867   LearningRate 0.0277   Epoch: 9   Global Step: 158110   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 15:59:18,358-Speed 3334.73 samples/sec   Loss 1.7721   LearningRate 0.0277   Epoch: 9   Global Step: 158120   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 15:59:21,431-Speed 3333.34 samples/sec   Loss 1.7530   LearningRate 0.0277   Epoch: 9   Global Step: 158130   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 15:59:24,521-Speed 3314.65 samples/sec   Loss 1.8112   LearningRate 0.0277   Epoch: 9   Global Step: 158140   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 15:59:27,605-Speed 3321.70 samples/sec   Loss 1.7714   LearningRate 0.0277   Epoch: 9   Global Step: 158150   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 15:59:30,686-Speed 3324.55 samples/sec   Loss 1.7903   LearningRate 0.0277   Epoch: 9   Global Step: 158160   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-11 15:59:33,804-Speed 3284.55 samples/sec   Loss 1.7857   LearningRate 0.0277   Epoch: 9   Global Step: 158170   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-11 15:59:36,914-Speed 3292.95 samples/sec   Loss 1.7845   LearningRate 0.0277   Epoch: 9   Global Step: 158180   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 15:59:40,009-Speed 3309.05 samples/sec   Loss 1.8450   LearningRate 0.0277   Epoch: 9   Global Step: 158190   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 15:59:43,085-Speed 3330.48 samples/sec   Loss 1.8018   LearningRate 0.0277   Epoch: 9   Global Step: 158200   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 15:59:46,182-Speed 3307.18 samples/sec   Loss 1.7821   LearningRate 0.0277   Epoch: 9   Global Step: 158210   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 15:59:49,261-Speed 3325.92 samples/sec   Loss 1.7446   LearningRate 0.0277   Epoch: 9   Global Step: 158220   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 15:59:52,350-Speed 3315.93 samples/sec   Loss 1.7517   LearningRate 0.0277   Epoch: 9   Global Step: 158230   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 15:59:55,422-Speed 3334.66 samples/sec   Loss 1.7973   LearningRate 0.0277   Epoch: 9   Global Step: 158240   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 15:59:58,520-Speed 3306.39 samples/sec   Loss 1.7804   LearningRate 0.0277   Epoch: 9   Global Step: 158250   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 16:00:01,598-Speed 3326.66 samples/sec   Loss 1.7633   LearningRate 0.0277   Epoch: 9   Global Step: 158260   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 16:00:04,701-Speed 3301.58 samples/sec   Loss 1.8851   LearningRate 0.0277   Epoch: 9   Global Step: 158270   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 16:00:07,783-Speed 3323.11 samples/sec   Loss 1.7928   LearningRate 0.0277   Epoch: 9   Global Step: 158280   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-11 16:00:10,856-Speed 3332.67 samples/sec   Loss 1.7811   LearningRate 0.0276   Epoch: 9   Global Step: 158290   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 16:00:13,931-Speed 3330.29 samples/sec   Loss 1.7970   LearningRate 0.0276   Epoch: 9   Global Step: 158300   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 16:00:17,024-Speed 3312.22 samples/sec   Loss 1.7673   LearningRate 0.0276   Epoch: 9   Global Step: 158310   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 16:00:20,097-Speed 3333.14 samples/sec   Loss 1.8100   LearningRate 0.0276   Epoch: 9   Global Step: 158320   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 16:00:23,175-Speed 3327.87 samples/sec   Loss 1.7597   LearningRate 0.0276   Epoch: 9   Global Step: 158330   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 16:00:26,257-Speed 3323.42 samples/sec   Loss 1.8288   LearningRate 0.0276   Epoch: 9   Global Step: 158340   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 16:00:29,361-Speed 3299.05 samples/sec   Loss 1.7868   LearningRate 0.0276   Epoch: 9   Global Step: 158350   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 16:00:32,472-Speed 3292.40 samples/sec   Loss 1.8097   LearningRate 0.0276   Epoch: 9   Global Step: 158360   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 16:00:35,551-Speed 3326.76 samples/sec   Loss 1.8246   LearningRate 0.0276   Epoch: 9   Global Step: 158370   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 16:00:38,667-Speed 3287.14 samples/sec   Loss 1.7205   LearningRate 0.0276   Epoch: 9   Global Step: 158380   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 16:00:41,823-Speed 3244.98 samples/sec   Loss 1.7453   LearningRate 0.0276   Epoch: 9   Global Step: 158390   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-11 16:00:45,001-Speed 3222.27 samples/sec   Loss 1.8548   LearningRate 0.0276   Epoch: 9   Global Step: 158400   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-11 16:00:48,076-Speed 3331.32 samples/sec   Loss 1.8364   LearningRate 0.0276   Epoch: 9   Global Step: 158410   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-11 16:00:51,151-Speed 3331.18 samples/sec   Loss 1.7820   LearningRate 0.0276   Epoch: 9   Global Step: 158420   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-11 16:00:54,241-Speed 3314.61 samples/sec   Loss 1.7860   LearningRate 0.0276   Epoch: 9   Global Step: 158430   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-11 16:00:57,305-Speed 3343.27 samples/sec   Loss 1.7841   LearningRate 0.0276   Epoch: 9   Global Step: 158440   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 16:01:00,378-Speed 3332.23 samples/sec   Loss 1.8159   LearningRate 0.0276   Epoch: 9   Global Step: 158450   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 16:01:03,454-Speed 3329.71 samples/sec   Loss 1.8014   LearningRate 0.0276   Epoch: 9   Global Step: 158460   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 16:01:06,542-Speed 3317.58 samples/sec   Loss 1.8411   LearningRate 0.0276   Epoch: 9   Global Step: 158470   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 16:01:09,648-Speed 3297.31 samples/sec   Loss 1.7925   LearningRate 0.0276   Epoch: 9   Global Step: 158480   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 16:01:12,720-Speed 3334.60 samples/sec   Loss 1.8658   LearningRate 0.0276   Epoch: 9   Global Step: 158490   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 16:01:15,796-Speed 3329.26 samples/sec   Loss 1.8018   LearningRate 0.0276   Epoch: 9   Global Step: 158500   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 16:01:18,888-Speed 3313.20 samples/sec   Loss 1.7754   LearningRate 0.0276   Epoch: 9   Global Step: 158510   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 16:01:21,977-Speed 3315.19 samples/sec   Loss 1.8399   LearningRate 0.0276   Epoch: 9   Global Step: 158520   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 16:01:25,083-Speed 3298.07 samples/sec   Loss 1.7623   LearningRate 0.0276   Epoch: 9   Global Step: 158530   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 16:01:28,207-Speed 3278.33 samples/sec   Loss 1.8353   LearningRate 0.0276   Epoch: 9   Global Step: 158540   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-11 16:01:31,282-Speed 3329.98 samples/sec   Loss 1.7566   LearningRate 0.0276   Epoch: 9   Global Step: 158550   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-11 16:01:34,365-Speed 3322.10 samples/sec   Loss 1.8084   LearningRate 0.0276   Epoch: 9   Global Step: 158560   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-11 16:01:37,506-Speed 3260.86 samples/sec   Loss 1.7545   LearningRate 0.0276   Epoch: 9   Global Step: 158570   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-11 16:01:40,640-Speed 3268.82 samples/sec   Loss 1.7741   LearningRate 0.0276   Epoch: 9   Global Step: 158580   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-11 16:01:43,713-Speed 3333.39 samples/sec   Loss 1.7667   LearningRate 0.0276   Epoch: 9   Global Step: 158590   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 16:01:46,801-Speed 3316.73 samples/sec   Loss 1.7797   LearningRate 0.0276   Epoch: 9   Global Step: 158600   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 16:01:49,893-Speed 3312.47 samples/sec   Loss 1.7665   LearningRate 0.0275   Epoch: 9   Global Step: 158610   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 16:01:53,040-Speed 3254.88 samples/sec   Loss 1.7739   LearningRate 0.0275   Epoch: 9   Global Step: 158620   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 16:01:56,181-Speed 3260.29 samples/sec   Loss 1.7250   LearningRate 0.0275   Epoch: 9   Global Step: 158630   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 16:01:59,265-Speed 3321.16 samples/sec   Loss 1.7436   LearningRate 0.0275   Epoch: 9   Global Step: 158640   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 16:02:02,359-Speed 3310.37 samples/sec   Loss 1.8144   LearningRate 0.0275   Epoch: 9   Global Step: 158650   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 16:02:05,425-Speed 3341.12 samples/sec   Loss 1.7786   LearningRate 0.0275   Epoch: 9   Global Step: 158660   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-11 16:02:08,532-Speed 3296.52 samples/sec   Loss 1.8612   LearningRate 0.0275   Epoch: 9   Global Step: 158670   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-11 16:02:11,679-Speed 3253.98 samples/sec   Loss 1.7994   LearningRate 0.0275   Epoch: 9   Global Step: 158680   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-11 16:02:14,782-Speed 3301.72 samples/sec   Loss 1.7646   LearningRate 0.0275   Epoch: 9   Global Step: 158690   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-11 16:02:17,861-Speed 3325.97 samples/sec   Loss 1.8354   LearningRate 0.0275   Epoch: 9   Global Step: 158700   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-11 16:02:20,940-Speed 3326.36 samples/sec   Loss 1.8553   LearningRate 0.0275   Epoch: 9   Global Step: 158710   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-11 16:02:24,020-Speed 3325.21 samples/sec   Loss 1.7405   LearningRate 0.0275   Epoch: 9   Global Step: 158720   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-11 16:02:27,102-Speed 3324.14 samples/sec   Loss 1.7933   LearningRate 0.0275   Epoch: 9   Global Step: 158730   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-11 16:02:30,186-Speed 3320.50 samples/sec   Loss 1.8081   LearningRate 0.0275   Epoch: 9   Global Step: 158740   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-11 16:02:33,263-Speed 3328.82 samples/sec   Loss 1.7999   LearningRate 0.0275   Epoch: 9   Global Step: 158750   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-11 16:02:36,342-Speed 3326.49 samples/sec   Loss 1.8304   LearningRate 0.0275   Epoch: 9   Global Step: 158760   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 16:02:39,435-Speed 3311.71 samples/sec   Loss 1.7882   LearningRate 0.0275   Epoch: 9   Global Step: 158770   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 16:02:42,510-Speed 3331.14 samples/sec   Loss 1.8052   LearningRate 0.0275   Epoch: 9   Global Step: 158780   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 16:02:45,607-Speed 3306.16 samples/sec   Loss 1.7750   LearningRate 0.0275   Epoch: 9   Global Step: 158790   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 16:02:48,704-Speed 3307.42 samples/sec   Loss 1.7912   LearningRate 0.0275   Epoch: 9   Global Step: 158800   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 16:02:51,880-Speed 3225.35 samples/sec   Loss 1.8305   LearningRate 0.0275   Epoch: 9   Global Step: 158810   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 16:02:55,131-Speed 3150.03 samples/sec   Loss 1.8527   LearningRate 0.0275   Epoch: 9   Global Step: 158820   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 16:02:58,275-Speed 3258.56 samples/sec   Loss 1.8058   LearningRate 0.0275   Epoch: 9   Global Step: 158830   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 16:03:01,353-Speed 3327.88 samples/sec   Loss 1.8110   LearningRate 0.0275   Epoch: 9   Global Step: 158840   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 16:03:04,436-Speed 3322.16 samples/sec   Loss 1.7397   LearningRate 0.0275   Epoch: 9   Global Step: 158850   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-11 16:03:07,519-Speed 3321.76 samples/sec   Loss 1.8562   LearningRate 0.0275   Epoch: 9   Global Step: 158860   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:03:10,654-Speed 3266.53 samples/sec   Loss 1.7763   LearningRate 0.0275   Epoch: 9   Global Step: 158870   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:03:13,751-Speed 3307.97 samples/sec   Loss 1.8277   LearningRate 0.0275   Epoch: 9   Global Step: 158880   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-11 16:03:16,902-Speed 3250.67 samples/sec   Loss 1.7340   LearningRate 0.0275   Epoch: 9   Global Step: 158890   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-11 16:03:19,999-Speed 3306.15 samples/sec   Loss 1.8267   LearningRate 0.0275   Epoch: 9   Global Step: 158900   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-11 16:03:23,091-Speed 3313.67 samples/sec   Loss 1.8208   LearningRate 0.0275   Epoch: 9   Global Step: 158910   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-11 16:03:26,169-Speed 3326.77 samples/sec   Loss 1.8190   LearningRate 0.0275   Epoch: 9   Global Step: 158920   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-11 16:03:29,329-Speed 3241.85 samples/sec   Loss 1.7603   LearningRate 0.0274   Epoch: 9   Global Step: 158930   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-11 16:03:32,436-Speed 3296.29 samples/sec   Loss 1.7529   LearningRate 0.0274   Epoch: 9   Global Step: 158940   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-11 16:03:35,516-Speed 3325.54 samples/sec   Loss 1.8609   LearningRate 0.0274   Epoch: 9   Global Step: 158950   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-11 16:03:38,616-Speed 3304.18 samples/sec   Loss 1.7659   LearningRate 0.0274   Epoch: 9   Global Step: 158960   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-11 16:03:41,700-Speed 3320.88 samples/sec   Loss 1.7786   LearningRate 0.0274   Epoch: 9   Global Step: 158970   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-11 16:03:44,791-Speed 3313.70 samples/sec   Loss 1.8136   LearningRate 0.0274   Epoch: 9   Global Step: 158980   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:03:47,889-Speed 3305.30 samples/sec   Loss 1.8134   LearningRate 0.0274   Epoch: 9   Global Step: 158990   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:03:50,984-Speed 3309.27 samples/sec   Loss 1.7681   LearningRate 0.0274   Epoch: 9   Global Step: 159000   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:03:54,063-Speed 3327.07 samples/sec   Loss 1.7905   LearningRate 0.0274   Epoch: 9   Global Step: 159010   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:03:57,157-Speed 3311.34 samples/sec   Loss 1.8005   LearningRate 0.0274   Epoch: 9   Global Step: 159020   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:04:00,245-Speed 3316.06 samples/sec   Loss 1.7515   LearningRate 0.0274   Epoch: 9   Global Step: 159030   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:04:03,353-Speed 3295.22 samples/sec   Loss 1.8246   LearningRate 0.0274   Epoch: 9   Global Step: 159040   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:04:06,463-Speed 3294.15 samples/sec   Loss 1.7849   LearningRate 0.0274   Epoch: 9   Global Step: 159050   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:04:09,541-Speed 3326.68 samples/sec   Loss 1.7975   LearningRate 0.0274   Epoch: 9   Global Step: 159060   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:04:12,617-Speed 3330.01 samples/sec   Loss 1.7999   LearningRate 0.0274   Epoch: 9   Global Step: 159070   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:04:15,707-Speed 3314.52 samples/sec   Loss 1.7504   LearningRate 0.0274   Epoch: 9   Global Step: 159080   Fp16 Grad Scale: 262144   Required: 18 hours
Training: 2022-04-11 16:04:18,811-Speed 3300.07 samples/sec   Loss 1.8058   LearningRate 0.0274   Epoch: 9   Global Step: 159090   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:04:21,964-Speed 3248.32 samples/sec   Loss 1.7887   LearningRate 0.0274   Epoch: 9   Global Step: 159100   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:04:25,039-Speed 3331.02 samples/sec   Loss 1.8160   LearningRate 0.0274   Epoch: 9   Global Step: 159110   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:04:28,141-Speed 3301.83 samples/sec   Loss 1.7797   LearningRate 0.0274   Epoch: 9   Global Step: 159120   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:04:31,227-Speed 3319.51 samples/sec   Loss 1.8589   LearningRate 0.0274   Epoch: 9   Global Step: 159130   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:04:34,313-Speed 3318.45 samples/sec   Loss 1.7741   LearningRate 0.0274   Epoch: 9   Global Step: 159140   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:04:37,416-Speed 3300.77 samples/sec   Loss 1.7921   LearningRate 0.0274   Epoch: 9   Global Step: 159150   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:04:40,495-Speed 3325.67 samples/sec   Loss 1.7806   LearningRate 0.0274   Epoch: 9   Global Step: 159160   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:04:43,654-Speed 3242.68 samples/sec   Loss 1.7786   LearningRate 0.0274   Epoch: 9   Global Step: 159170   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:04:46,741-Speed 3318.16 samples/sec   Loss 1.7968   LearningRate 0.0274   Epoch: 9   Global Step: 159180   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:04:49,817-Speed 3329.42 samples/sec   Loss 1.7872   LearningRate 0.0274   Epoch: 9   Global Step: 159190   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:04:52,920-Speed 3300.80 samples/sec   Loss 1.7500   LearningRate 0.0274   Epoch: 9   Global Step: 159200   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:04:56,004-Speed 3321.53 samples/sec   Loss 1.7760   LearningRate 0.0274   Epoch: 9   Global Step: 159210   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:04:59,138-Speed 3268.79 samples/sec   Loss 1.8031   LearningRate 0.0274   Epoch: 9   Global Step: 159220   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:05:02,275-Speed 3264.87 samples/sec   Loss 1.8671   LearningRate 0.0274   Epoch: 9   Global Step: 159230   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:05:05,352-Speed 3327.73 samples/sec   Loss 1.7406   LearningRate 0.0274   Epoch: 9   Global Step: 159240   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:05:08,426-Speed 3332.06 samples/sec   Loss 1.8043   LearningRate 0.0273   Epoch: 9   Global Step: 159250   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:05:11,587-Speed 3240.40 samples/sec   Loss 1.7779   LearningRate 0.0273   Epoch: 9   Global Step: 159260   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:05:14,696-Speed 3295.04 samples/sec   Loss 1.8031   LearningRate 0.0273   Epoch: 9   Global Step: 159270   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:05:17,774-Speed 3327.24 samples/sec   Loss 1.8355   LearningRate 0.0273   Epoch: 9   Global Step: 159280   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:05:20,846-Speed 3333.90 samples/sec   Loss 1.7461   LearningRate 0.0273   Epoch: 9   Global Step: 159290   Fp16 Grad Scale: 262144   Required: 18 hours
Training: 2022-04-11 16:05:23,912-Speed 3340.97 samples/sec   Loss 1.8315   LearningRate 0.0273   Epoch: 9   Global Step: 159300   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:05:27,002-Speed 3314.34 samples/sec   Loss 1.8760   LearningRate 0.0273   Epoch: 9   Global Step: 159310   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:05:30,102-Speed 3304.23 samples/sec   Loss 1.7959   LearningRate 0.0273   Epoch: 9   Global Step: 159320   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:05:33,201-Speed 3305.27 samples/sec   Loss 1.8249   LearningRate 0.0273   Epoch: 9   Global Step: 159330   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:05:36,345-Speed 3257.17 samples/sec   Loss 1.7283   LearningRate 0.0273   Epoch: 9   Global Step: 159340   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:05:39,444-Speed 3305.12 samples/sec   Loss 1.8490   LearningRate 0.0273   Epoch: 9   Global Step: 159350   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:05:42,532-Speed 3317.01 samples/sec   Loss 1.8125   LearningRate 0.0273   Epoch: 9   Global Step: 159360   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:05:45,616-Speed 3321.30 samples/sec   Loss 1.7887   LearningRate 0.0273   Epoch: 9   Global Step: 159370   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:05:48,696-Speed 3325.81 samples/sec   Loss 1.7853   LearningRate 0.0273   Epoch: 9   Global Step: 159380   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:05:51,783-Speed 3317.88 samples/sec   Loss 1.7759   LearningRate 0.0273   Epoch: 9   Global Step: 159390   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:05:54,847-Speed 3342.78 samples/sec   Loss 1.7969   LearningRate 0.0273   Epoch: 9   Global Step: 159400   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:05:57,943-Speed 3308.33 samples/sec   Loss 1.7917   LearningRate 0.0273   Epoch: 9   Global Step: 159410   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:06:01,026-Speed 3323.09 samples/sec   Loss 1.7544   LearningRate 0.0273   Epoch: 9   Global Step: 159420   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:06:04,142-Speed 3286.37 samples/sec   Loss 1.8047   LearningRate 0.0273   Epoch: 9   Global Step: 159430   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:06:07,243-Speed 3302.37 samples/sec   Loss 1.8292   LearningRate 0.0273   Epoch: 9   Global Step: 159440   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:06:10,377-Speed 3269.01 samples/sec   Loss 1.7685   LearningRate 0.0273   Epoch: 9   Global Step: 159450   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:06:13,493-Speed 3287.09 samples/sec   Loss 1.7591   LearningRate 0.0273   Epoch: 9   Global Step: 159460   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:06:16,665-Speed 3228.37 samples/sec   Loss 1.7453   LearningRate 0.0273   Epoch: 9   Global Step: 159470   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:06:19,750-Speed 3320.11 samples/sec   Loss 1.8208   LearningRate 0.0273   Epoch: 9   Global Step: 159480   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:06:22,832-Speed 3323.25 samples/sec   Loss 1.7515   LearningRate 0.0273   Epoch: 9   Global Step: 159490   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:06:25,947-Speed 3288.51 samples/sec   Loss 1.8385   LearningRate 0.0273   Epoch: 9   Global Step: 159500   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:06:29,027-Speed 3325.47 samples/sec   Loss 1.8127   LearningRate 0.0273   Epoch: 9   Global Step: 159510   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:06:32,139-Speed 3291.53 samples/sec   Loss 1.7361   LearningRate 0.0273   Epoch: 9   Global Step: 159520   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:06:35,332-Speed 3207.10 samples/sec   Loss 1.7877   LearningRate 0.0273   Epoch: 9   Global Step: 159530   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:06:38,419-Speed 3318.01 samples/sec   Loss 1.8123   LearningRate 0.0273   Epoch: 9   Global Step: 159540   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:06:41,505-Speed 3318.90 samples/sec   Loss 1.7553   LearningRate 0.0273   Epoch: 9   Global Step: 159550   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:06:44,585-Speed 3326.02 samples/sec   Loss 1.8410   LearningRate 0.0273   Epoch: 9   Global Step: 159560   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:06:47,672-Speed 3317.86 samples/sec   Loss 1.8055   LearningRate 0.0272   Epoch: 9   Global Step: 159570   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:06:50,752-Speed 3325.20 samples/sec   Loss 1.8026   LearningRate 0.0272   Epoch: 9   Global Step: 159580   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:06:53,886-Speed 3268.54 samples/sec   Loss 1.8467   LearningRate 0.0272   Epoch: 9   Global Step: 159590   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:06:57,025-Speed 3262.69 samples/sec   Loss 1.8371   LearningRate 0.0272   Epoch: 9   Global Step: 159600   Fp16 Grad Scale: 262144   Required: 18 hours
Training: 2022-04-11 16:07:00,105-Speed 3325.26 samples/sec   Loss 1.8112   LearningRate 0.0272   Epoch: 9   Global Step: 159610   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:07:03,187-Speed 3323.42 samples/sec   Loss 1.7493   LearningRate 0.0272   Epoch: 9   Global Step: 159620   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:07:06,274-Speed 3317.73 samples/sec   Loss 1.8074   LearningRate 0.0272   Epoch: 9   Global Step: 159630   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:07:09,356-Speed 3323.34 samples/sec   Loss 1.7717   LearningRate 0.0272   Epoch: 9   Global Step: 159640   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:07:12,426-Speed 3336.04 samples/sec   Loss 1.8390   LearningRate 0.0272   Epoch: 9   Global Step: 159650   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-11 16:07:15,570-Speed 3258.52 samples/sec   Loss 1.8134   LearningRate 0.0272   Epoch: 9   Global Step: 159660   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-11 16:07:18,672-Speed 3301.82 samples/sec   Loss 1.7866   LearningRate 0.0272   Epoch: 9   Global Step: 159670   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-11 16:07:21,820-Speed 3253.04 samples/sec   Loss 1.8158   LearningRate 0.0272   Epoch: 9   Global Step: 159680   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-11 16:07:24,936-Speed 3286.52 samples/sec   Loss 1.7840   LearningRate 0.0272   Epoch: 9   Global Step: 159690   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-11 16:07:28,028-Speed 3313.39 samples/sec   Loss 1.7922   LearningRate 0.0272   Epoch: 9   Global Step: 159700   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-11 16:07:31,109-Speed 3324.14 samples/sec   Loss 1.7785   LearningRate 0.0272   Epoch: 9   Global Step: 159710   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-11 16:07:34,196-Speed 3318.75 samples/sec   Loss 1.8320   LearningRate 0.0272   Epoch: 9   Global Step: 159720   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-11 16:07:37,281-Speed 3319.02 samples/sec   Loss 1.8055   LearningRate 0.0272   Epoch: 9   Global Step: 159730   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-11 16:07:40,364-Speed 3322.82 samples/sec   Loss 1.7741   LearningRate 0.0272   Epoch: 9   Global Step: 159740   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-11 16:07:43,448-Speed 3320.88 samples/sec   Loss 1.7379   LearningRate 0.0272   Epoch: 9   Global Step: 159750   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:07:46,529-Speed 3324.51 samples/sec   Loss 1.8661   LearningRate 0.0272   Epoch: 9   Global Step: 159760   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:07:49,613-Speed 3321.05 samples/sec   Loss 1.7322   LearningRate 0.0272   Epoch: 9   Global Step: 159770   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:07:52,717-Speed 3299.08 samples/sec   Loss 1.8020   LearningRate 0.0272   Epoch: 9   Global Step: 159780   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:07:55,824-Speed 3297.04 samples/sec   Loss 1.8309   LearningRate 0.0272   Epoch: 9   Global Step: 159790   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:07:58,929-Speed 3298.54 samples/sec   Loss 1.7772   LearningRate 0.0272   Epoch: 9   Global Step: 159800   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:08:02,032-Speed 3301.34 samples/sec   Loss 1.7920   LearningRate 0.0272   Epoch: 9   Global Step: 159810   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:08:05,122-Speed 3314.43 samples/sec   Loss 1.8284   LearningRate 0.0272   Epoch: 9   Global Step: 159820   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:08:08,293-Speed 3230.02 samples/sec   Loss 1.8029   LearningRate 0.0272   Epoch: 9   Global Step: 159830   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:08:11,375-Speed 3323.18 samples/sec   Loss 1.7662   LearningRate 0.0272   Epoch: 9   Global Step: 159840   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:08:14,472-Speed 3307.76 samples/sec   Loss 1.8013   LearningRate 0.0272   Epoch: 9   Global Step: 159850   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:08:17,553-Speed 3323.41 samples/sec   Loss 1.7486   LearningRate 0.0272   Epoch: 9   Global Step: 159860   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-11 16:08:20,652-Speed 3305.29 samples/sec   Loss 1.8865   LearningRate 0.0272   Epoch: 9   Global Step: 159870   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-11 16:08:23,742-Speed 3315.37 samples/sec   Loss 1.7896   LearningRate 0.0272   Epoch: 9   Global Step: 159880   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-11 16:08:26,831-Speed 3315.53 samples/sec   Loss 1.8345   LearningRate 0.0271   Epoch: 9   Global Step: 159890   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-11 16:08:29,926-Speed 3309.20 samples/sec   Loss 1.8318   LearningRate 0.0271   Epoch: 9   Global Step: 159900   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-11 16:08:33,009-Speed 3322.49 samples/sec   Loss 1.8292   LearningRate 0.0271   Epoch: 9   Global Step: 159910   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-11 16:08:36,157-Speed 3253.47 samples/sec   Loss 1.8032   LearningRate 0.0271   Epoch: 9   Global Step: 159920   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-11 16:08:39,240-Speed 3321.70 samples/sec   Loss 1.7706   LearningRate 0.0271   Epoch: 9   Global Step: 159930   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-11 16:08:42,319-Speed 3326.83 samples/sec   Loss 1.7531   LearningRate 0.0271   Epoch: 9   Global Step: 159940   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-11 16:08:45,423-Speed 3299.55 samples/sec   Loss 1.7747   LearningRate 0.0271   Epoch: 9   Global Step: 159950   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-11 16:08:48,510-Speed 3318.73 samples/sec   Loss 1.7943   LearningRate 0.0271   Epoch: 9   Global Step: 159960   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:08:51,626-Speed 3286.36 samples/sec   Loss 1.8152   LearningRate 0.0271   Epoch: 9   Global Step: 159970   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:08:54,720-Speed 3311.26 samples/sec   Loss 1.7386   LearningRate 0.0271   Epoch: 9   Global Step: 159980   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:08:57,793-Speed 3332.14 samples/sec   Loss 1.7749   LearningRate 0.0271   Epoch: 9   Global Step: 159990   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-11 16:09:00,894-Speed 3302.94 samples/sec   Loss 1.8159   LearningRate 0.0271   Epoch: 9   Global Step: 160000   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-11 16:09:44,402-[lfw][160000]XNorm: 21.422027
Training: 2022-04-11 16:09:44,402-[lfw][160000]Accuracy-Flip: 0.99817+-0.00263
Training: 2022-04-11 16:09:44,403-[lfw][160000]Accuracy-Highest: 0.99817
Training: 2022-04-11 16:10:34,976-[cfp_fp][160000]XNorm: 20.936541
Training: 2022-04-11 16:10:34,976-[cfp_fp][160000]Accuracy-Flip: 0.98771+-0.00496
Training: 2022-04-11 16:10:34,977-[cfp_fp][160000]Accuracy-Highest: 0.98971
Training: 2022-04-11 16:11:18,448-[agedb_30][160000]XNorm: 21.856061
Training: 2022-04-11 16:11:18,449-[agedb_30][160000]Accuracy-Flip: 0.98367+-0.00510
Training: 2022-04-11 16:11:18,449-[agedb_30][160000]Accuracy-Highest: 0.98400
Training: 2022-04-11 16:11:21,529-Speed 72.81 samples/sec   Loss 1.8275   LearningRate 0.0271   Epoch: 9   Global Step: 160010   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-11 16:11:24,596-Speed 3340.08 samples/sec   Loss 1.7852   LearningRate 0.0271   Epoch: 9   Global Step: 160020   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-11 16:11:27,694-Speed 3305.41 samples/sec   Loss 1.8244   LearningRate 0.0271   Epoch: 9   Global Step: 160030   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-11 16:11:30,799-Speed 3299.34 samples/sec   Loss 1.7909   LearningRate 0.0271   Epoch: 9   Global Step: 160040   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-11 16:11:33,871-Speed 3334.00 samples/sec   Loss 1.7842   LearningRate 0.0271   Epoch: 9   Global Step: 160050   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-11 16:11:37,000-Speed 3272.96 samples/sec   Loss 1.7716   LearningRate 0.0271   Epoch: 9   Global Step: 160060   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-11 16:11:40,166-Speed 3236.50 samples/sec   Loss 1.8242   LearningRate 0.0271   Epoch: 9   Global Step: 160070   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-11 16:11:43,261-Speed 3308.64 samples/sec   Loss 1.8567   LearningRate 0.0271   Epoch: 9   Global Step: 160080   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-11 16:11:46,424-Speed 3238.22 samples/sec   Loss 1.8512   LearningRate 0.0271   Epoch: 9   Global Step: 160090   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:11:49,569-Speed 3256.54 samples/sec   Loss 1.8253   LearningRate 0.0271   Epoch: 9   Global Step: 160100   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:11:52,671-Speed 3301.58 samples/sec   Loss 1.7656   LearningRate 0.0271   Epoch: 9   Global Step: 160110   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:11:55,738-Speed 3340.05 samples/sec   Loss 1.7791   LearningRate 0.0271   Epoch: 9   Global Step: 160120   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:11:58,817-Speed 3326.32 samples/sec   Loss 1.7504   LearningRate 0.0271   Epoch: 9   Global Step: 160130   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:12:01,895-Speed 3327.72 samples/sec   Loss 1.8015   LearningRate 0.0271   Epoch: 9   Global Step: 160140   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:12:04,978-Speed 3322.32 samples/sec   Loss 1.8142   LearningRate 0.0271   Epoch: 9   Global Step: 160150   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:12:08,064-Speed 3319.36 samples/sec   Loss 1.7980   LearningRate 0.0271   Epoch: 9   Global Step: 160160   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:12:11,172-Speed 3295.30 samples/sec   Loss 1.7672   LearningRate 0.0271   Epoch: 9   Global Step: 160170   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:12:14,314-Speed 3260.10 samples/sec   Loss 1.8463   LearningRate 0.0271   Epoch: 9   Global Step: 160180   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:12:17,424-Speed 3292.91 samples/sec   Loss 1.7860   LearningRate 0.0271   Epoch: 9   Global Step: 160190   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:12:20,494-Speed 3335.76 samples/sec   Loss 1.8422   LearningRate 0.0271   Epoch: 9   Global Step: 160200   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:12:23,610-Speed 3287.16 samples/sec   Loss 1.7382   LearningRate 0.0270   Epoch: 9   Global Step: 160210   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:12:26,687-Speed 3328.61 samples/sec   Loss 1.8334   LearningRate 0.0270   Epoch: 9   Global Step: 160220   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:12:29,781-Speed 3309.98 samples/sec   Loss 1.9157   LearningRate 0.0270   Epoch: 9   Global Step: 160230   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:12:32,858-Speed 3328.99 samples/sec   Loss 1.8479   LearningRate 0.0270   Epoch: 9   Global Step: 160240   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:12:35,941-Speed 3322.62 samples/sec   Loss 1.8089   LearningRate 0.0270   Epoch: 9   Global Step: 160250   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:12:39,015-Speed 3332.37 samples/sec   Loss 1.8221   LearningRate 0.0270   Epoch: 9   Global Step: 160260   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:12:42,093-Speed 3327.59 samples/sec   Loss 1.8135   LearningRate 0.0270   Epoch: 9   Global Step: 160270   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:12:45,177-Speed 3320.99 samples/sec   Loss 1.8400   LearningRate 0.0270   Epoch: 9   Global Step: 160280   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:12:48,238-Speed 3345.75 samples/sec   Loss 1.7735   LearningRate 0.0270   Epoch: 9   Global Step: 160290   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:12:51,324-Speed 3319.23 samples/sec   Loss 1.7506   LearningRate 0.0270   Epoch: 9   Global Step: 160300   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:12:54,403-Speed 3326.58 samples/sec   Loss 1.7937   LearningRate 0.0270   Epoch: 9   Global Step: 160310   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:12:57,484-Speed 3324.45 samples/sec   Loss 1.8032   LearningRate 0.0270   Epoch: 9   Global Step: 160320   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:13:00,571-Speed 3317.59 samples/sec   Loss 1.8407   LearningRate 0.0270   Epoch: 9   Global Step: 160330   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:13:03,669-Speed 3306.88 samples/sec   Loss 1.8396   LearningRate 0.0270   Epoch: 9   Global Step: 160340   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:13:06,748-Speed 3326.53 samples/sec   Loss 1.8110   LearningRate 0.0270   Epoch: 9   Global Step: 160350   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:13:09,845-Speed 3306.76 samples/sec   Loss 1.8234   LearningRate 0.0270   Epoch: 9   Global Step: 160360   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:13:12,925-Speed 3325.23 samples/sec   Loss 1.7968   LearningRate 0.0270   Epoch: 9   Global Step: 160370   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:13:16,049-Speed 3279.03 samples/sec   Loss 1.7840   LearningRate 0.0270   Epoch: 9   Global Step: 160380   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:13:19,222-Speed 3227.32 samples/sec   Loss 1.7992   LearningRate 0.0270   Epoch: 9   Global Step: 160390   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:13:22,362-Speed 3262.36 samples/sec   Loss 1.7855   LearningRate 0.0270   Epoch: 9   Global Step: 160400   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:13:25,435-Speed 3332.51 samples/sec   Loss 1.8503   LearningRate 0.0270   Epoch: 9   Global Step: 160410   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:13:28,522-Speed 3318.50 samples/sec   Loss 1.8403   LearningRate 0.0270   Epoch: 9   Global Step: 160420   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:13:31,641-Speed 3283.60 samples/sec   Loss 1.7925   LearningRate 0.0270   Epoch: 9   Global Step: 160430   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:13:34,749-Speed 3295.56 samples/sec   Loss 1.8392   LearningRate 0.0270   Epoch: 9   Global Step: 160440   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:13:37,853-Speed 3299.93 samples/sec   Loss 1.7134   LearningRate 0.0270   Epoch: 9   Global Step: 160450   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:13:40,928-Speed 3330.40 samples/sec   Loss 1.8163   LearningRate 0.0270   Epoch: 9   Global Step: 160460   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:13:44,024-Speed 3307.85 samples/sec   Loss 1.8273   LearningRate 0.0270   Epoch: 9   Global Step: 160470   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:13:47,107-Speed 3322.52 samples/sec   Loss 1.7994   LearningRate 0.0270   Epoch: 9   Global Step: 160480   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:13:50,176-Speed 3337.61 samples/sec   Loss 1.7467   LearningRate 0.0270   Epoch: 9   Global Step: 160490   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:13:53,253-Speed 3328.07 samples/sec   Loss 1.7739   LearningRate 0.0270   Epoch: 9   Global Step: 160500   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:13:56,325-Speed 3334.94 samples/sec   Loss 1.7816   LearningRate 0.0270   Epoch: 9   Global Step: 160510   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:13:59,397-Speed 3334.11 samples/sec   Loss 1.7822   LearningRate 0.0270   Epoch: 9   Global Step: 160520   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:14:02,474-Speed 3329.07 samples/sec   Loss 1.7669   LearningRate 0.0269   Epoch: 9   Global Step: 160530   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:14:05,559-Speed 3319.18 samples/sec   Loss 1.8550   LearningRate 0.0269   Epoch: 9   Global Step: 160540   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:14:08,640-Speed 3324.14 samples/sec   Loss 1.8248   LearningRate 0.0269   Epoch: 9   Global Step: 160550   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:14:11,716-Speed 3329.74 samples/sec   Loss 1.7974   LearningRate 0.0269   Epoch: 9   Global Step: 160560   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:14:14,797-Speed 3324.45 samples/sec   Loss 1.8363   LearningRate 0.0269   Epoch: 9   Global Step: 160570   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:14:17,876-Speed 3327.09 samples/sec   Loss 1.8060   LearningRate 0.0269   Epoch: 9   Global Step: 160580   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:14:20,939-Speed 3343.96 samples/sec   Loss 1.8160   LearningRate 0.0269   Epoch: 9   Global Step: 160590   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:14:24,019-Speed 3325.21 samples/sec   Loss 1.7493   LearningRate 0.0269   Epoch: 9   Global Step: 160600   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:14:27,082-Speed 3343.69 samples/sec   Loss 1.8592   LearningRate 0.0269   Epoch: 9   Global Step: 160610   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-11 16:14:30,220-Speed 3264.20 samples/sec   Loss 1.7876   LearningRate 0.0269   Epoch: 9   Global Step: 160620   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-11 16:14:33,326-Speed 3297.68 samples/sec   Loss 1.8495   LearningRate 0.0269   Epoch: 9   Global Step: 160630   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-11 16:14:36,411-Speed 3320.08 samples/sec   Loss 1.7890   LearningRate 0.0269   Epoch: 9   Global Step: 160640   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-11 16:14:39,486-Speed 3330.99 samples/sec   Loss 1.8235   LearningRate 0.0269   Epoch: 9   Global Step: 160650   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-11 16:14:42,562-Speed 3329.41 samples/sec   Loss 1.8101   LearningRate 0.0269   Epoch: 9   Global Step: 160660   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-11 16:14:45,651-Speed 3315.76 samples/sec   Loss 1.8407   LearningRate 0.0269   Epoch: 9   Global Step: 160670   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-11 16:14:48,748-Speed 3307.60 samples/sec   Loss 1.8437   LearningRate 0.0269   Epoch: 9   Global Step: 160680   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-11 16:14:51,830-Speed 3322.51 samples/sec   Loss 1.7941   LearningRate 0.0269   Epoch: 9   Global Step: 160690   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-11 16:14:54,899-Speed 3338.10 samples/sec   Loss 1.7940   LearningRate 0.0269   Epoch: 9   Global Step: 160700   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-11 16:14:57,976-Speed 3328.46 samples/sec   Loss 1.7963   LearningRate 0.0269   Epoch: 9   Global Step: 160710   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:15:01,050-Speed 3332.39 samples/sec   Loss 1.8062   LearningRate 0.0269   Epoch: 9   Global Step: 160720   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:15:04,124-Speed 3331.33 samples/sec   Loss 1.8526   LearningRate 0.0269   Epoch: 9   Global Step: 160730   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:15:07,194-Speed 3336.48 samples/sec   Loss 1.8246   LearningRate 0.0269   Epoch: 9   Global Step: 160740   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:15:10,266-Speed 3333.46 samples/sec   Loss 1.8462   LearningRate 0.0269   Epoch: 9   Global Step: 160750   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:15:13,335-Speed 3338.28 samples/sec   Loss 1.8493   LearningRate 0.0269   Epoch: 9   Global Step: 160760   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:15:16,410-Speed 3331.34 samples/sec   Loss 1.7759   LearningRate 0.0269   Epoch: 9   Global Step: 160770   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:15:19,484-Speed 3331.83 samples/sec   Loss 1.8042   LearningRate 0.0269   Epoch: 9   Global Step: 160780   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:15:22,570-Speed 3318.88 samples/sec   Loss 1.8641   LearningRate 0.0269   Epoch: 9   Global Step: 160790   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:15:25,643-Speed 3333.03 samples/sec   Loss 1.7589   LearningRate 0.0269   Epoch: 9   Global Step: 160800   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:15:28,713-Speed 3337.06 samples/sec   Loss 1.7799   LearningRate 0.0269   Epoch: 9   Global Step: 160810   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:15:31,803-Speed 3314.27 samples/sec   Loss 1.7950   LearningRate 0.0269   Epoch: 9   Global Step: 160820   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:15:34,878-Speed 3330.53 samples/sec   Loss 1.8316   LearningRate 0.0269   Epoch: 9   Global Step: 160830   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:15:37,963-Speed 3320.31 samples/sec   Loss 1.7174   LearningRate 0.0269   Epoch: 9   Global Step: 160840   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:15:41,044-Speed 3324.30 samples/sec   Loss 1.7501   LearningRate 0.0268   Epoch: 9   Global Step: 160850   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:15:44,125-Speed 3325.00 samples/sec   Loss 1.8029   LearningRate 0.0268   Epoch: 9   Global Step: 160860   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:15:47,203-Speed 3327.48 samples/sec   Loss 1.8250   LearningRate 0.0268   Epoch: 9   Global Step: 160870   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:15:50,313-Speed 3293.62 samples/sec   Loss 1.7799   LearningRate 0.0268   Epoch: 9   Global Step: 160880   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:15:53,394-Speed 3323.47 samples/sec   Loss 1.8427   LearningRate 0.0268   Epoch: 9   Global Step: 160890   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:15:56,480-Speed 3318.96 samples/sec   Loss 1.8468   LearningRate 0.0268   Epoch: 9   Global Step: 160900   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:15:59,556-Speed 3329.86 samples/sec   Loss 1.8754   LearningRate 0.0268   Epoch: 9   Global Step: 160910   Fp16 Grad Scale: 262144   Required: 18 hours
Training: 2022-04-11 16:16:02,618-Speed 3344.64 samples/sec   Loss 1.7773   LearningRate 0.0268   Epoch: 9   Global Step: 160920   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:16:05,698-Speed 3326.21 samples/sec   Loss 1.7790   LearningRate 0.0268   Epoch: 9   Global Step: 160930   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:16:08,772-Speed 3331.73 samples/sec   Loss 1.7334   LearningRate 0.0268   Epoch: 9   Global Step: 160940   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:16:11,856-Speed 3320.79 samples/sec   Loss 1.8419   LearningRate 0.0268   Epoch: 9   Global Step: 160950   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:16:14,932-Speed 3330.01 samples/sec   Loss 1.8287   LearningRate 0.0268   Epoch: 9   Global Step: 160960   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:16:18,038-Speed 3298.13 samples/sec   Loss 1.8187   LearningRate 0.0268   Epoch: 9   Global Step: 160970   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:16:21,124-Speed 3318.85 samples/sec   Loss 1.7320   LearningRate 0.0268   Epoch: 9   Global Step: 160980   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:16:24,204-Speed 3325.44 samples/sec   Loss 1.7958   LearningRate 0.0268   Epoch: 9   Global Step: 160990   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:16:27,286-Speed 3323.10 samples/sec   Loss 1.8167   LearningRate 0.0268   Epoch: 9   Global Step: 161000   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:16:30,360-Speed 3331.50 samples/sec   Loss 1.7324   LearningRate 0.0268   Epoch: 9   Global Step: 161010   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:16:33,411-Speed 3357.31 samples/sec   Loss 1.8291   LearningRate 0.0268   Epoch: 9   Global Step: 161020   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-11 16:16:36,503-Speed 3312.81 samples/sec   Loss 1.8941   LearningRate 0.0268   Epoch: 9   Global Step: 161030   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-11 16:16:39,602-Speed 3305.59 samples/sec   Loss 1.8233   LearningRate 0.0268   Epoch: 9   Global Step: 161040   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-11 16:16:42,698-Speed 3307.35 samples/sec   Loss 1.8154   LearningRate 0.0268   Epoch: 9   Global Step: 161050   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-11 16:16:45,786-Speed 3316.93 samples/sec   Loss 1.7620   LearningRate 0.0268   Epoch: 9   Global Step: 161060   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-11 16:16:48,878-Speed 3312.76 samples/sec   Loss 1.8318   LearningRate 0.0268   Epoch: 9   Global Step: 161070   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-11 16:16:51,969-Speed 3313.85 samples/sec   Loss 1.7449   LearningRate 0.0268   Epoch: 9   Global Step: 161080   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-11 16:16:55,046-Speed 3327.88 samples/sec   Loss 1.8011   LearningRate 0.0268   Epoch: 9   Global Step: 161090   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-11 16:16:58,131-Speed 3320.54 samples/sec   Loss 1.8332   LearningRate 0.0268   Epoch: 9   Global Step: 161100   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-11 16:17:01,222-Speed 3313.47 samples/sec   Loss 1.8034   LearningRate 0.0268   Epoch: 9   Global Step: 161110   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-11 16:17:04,300-Speed 3327.63 samples/sec   Loss 1.8209   LearningRate 0.0268   Epoch: 9   Global Step: 161120   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:17:07,429-Speed 3273.67 samples/sec   Loss 1.7972   LearningRate 0.0268   Epoch: 9   Global Step: 161130   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:17:10,526-Speed 3306.75 samples/sec   Loss 1.8076   LearningRate 0.0268   Epoch: 9   Global Step: 161140   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:17:13,662-Speed 3266.58 samples/sec   Loss 1.8660   LearningRate 0.0268   Epoch: 9   Global Step: 161150   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:17:16,763-Speed 3302.74 samples/sec   Loss 1.7989   LearningRate 0.0268   Epoch: 9   Global Step: 161160   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:17:19,837-Speed 3331.32 samples/sec   Loss 1.8486   LearningRate 0.0267   Epoch: 9   Global Step: 161170   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:17:22,924-Speed 3318.72 samples/sec   Loss 1.8970   LearningRate 0.0267   Epoch: 9   Global Step: 161180   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:17:26,020-Speed 3307.96 samples/sec   Loss 1.7489   LearningRate 0.0267   Epoch: 9   Global Step: 161190   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:17:29,117-Speed 3306.41 samples/sec   Loss 1.8114   LearningRate 0.0267   Epoch: 9   Global Step: 161200   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:17:32,195-Speed 3328.68 samples/sec   Loss 1.8381   LearningRate 0.0267   Epoch: 9   Global Step: 161210   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:17:35,280-Speed 3319.32 samples/sec   Loss 1.7814   LearningRate 0.0267   Epoch: 9   Global Step: 161220   Fp16 Grad Scale: 262144   Required: 18 hours
Training: 2022-04-11 16:17:38,345-Speed 3342.49 samples/sec   Loss 1.7920   LearningRate 0.0267   Epoch: 9   Global Step: 161230   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:17:41,478-Speed 3268.57 samples/sec   Loss 1.8067   LearningRate 0.0267   Epoch: 9   Global Step: 161240   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:17:44,561-Speed 3322.66 samples/sec   Loss 1.8369   LearningRate 0.0267   Epoch: 9   Global Step: 161250   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:17:47,643-Speed 3322.88 samples/sec   Loss 1.7646   LearningRate 0.0267   Epoch: 9   Global Step: 161260   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:17:50,729-Speed 3319.02 samples/sec   Loss 1.8208   LearningRate 0.0267   Epoch: 9   Global Step: 161270   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:17:53,862-Speed 3268.92 samples/sec   Loss 1.8118   LearningRate 0.0267   Epoch: 9   Global Step: 161280   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:17:56,951-Speed 3316.42 samples/sec   Loss 1.8221   LearningRate 0.0267   Epoch: 9   Global Step: 161290   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:18:00,031-Speed 3325.04 samples/sec   Loss 1.8568   LearningRate 0.0267   Epoch: 9   Global Step: 161300   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:18:03,145-Speed 3289.02 samples/sec   Loss 1.7990   LearningRate 0.0267   Epoch: 9   Global Step: 161310   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:18:06,405-Speed 3141.56 samples/sec   Loss 1.7648   LearningRate 0.0267   Epoch: 9   Global Step: 161320   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:18:09,544-Speed 3264.11 samples/sec   Loss 1.8669   LearningRate 0.0267   Epoch: 9   Global Step: 161330   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:18:12,657-Speed 3289.42 samples/sec   Loss 1.8061   LearningRate 0.0267   Epoch: 9   Global Step: 161340   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:18:15,727-Speed 3336.08 samples/sec   Loss 1.7936   LearningRate 0.0267   Epoch: 9   Global Step: 161350   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-11 16:18:18,849-Speed 3280.71 samples/sec   Loss 1.7587   LearningRate 0.0267   Epoch: 9   Global Step: 161360   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-11 16:18:22,001-Speed 3249.92 samples/sec   Loss 1.8280   LearningRate 0.0267   Epoch: 9   Global Step: 161370   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-11 16:18:25,085-Speed 3320.81 samples/sec   Loss 1.8175   LearningRate 0.0267   Epoch: 9   Global Step: 161380   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-11 16:18:28,164-Speed 3327.19 samples/sec   Loss 1.7839   LearningRate 0.0267   Epoch: 9   Global Step: 161390   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-11 16:18:31,253-Speed 3315.04 samples/sec   Loss 1.7938   LearningRate 0.0267   Epoch: 9   Global Step: 161400   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-11 16:18:34,330-Speed 3329.20 samples/sec   Loss 1.7373   LearningRate 0.0267   Epoch: 9   Global Step: 161410   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-11 16:18:37,425-Speed 3308.64 samples/sec   Loss 1.8050   LearningRate 0.0267   Epoch: 9   Global Step: 161420   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-11 16:18:40,569-Speed 3258.31 samples/sec   Loss 1.8867   LearningRate 0.0267   Epoch: 9   Global Step: 161430   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-11 16:18:43,823-Speed 3147.43 samples/sec   Loss 1.8389   LearningRate 0.0267   Epoch: 9   Global Step: 161440   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-11 16:18:46,925-Speed 3301.73 samples/sec   Loss 1.7936   LearningRate 0.0267   Epoch: 9   Global Step: 161450   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:18:50,021-Speed 3309.10 samples/sec   Loss 1.7976   LearningRate 0.0267   Epoch: 9   Global Step: 161460   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:18:53,135-Speed 3289.31 samples/sec   Loss 1.8219   LearningRate 0.0267   Epoch: 9   Global Step: 161470   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:18:56,270-Speed 3267.38 samples/sec   Loss 1.8367   LearningRate 0.0267   Epoch: 9   Global Step: 161480   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:18:59,479-Speed 3191.43 samples/sec   Loss 1.7841   LearningRate 0.0266   Epoch: 9   Global Step: 161490   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:19:02,661-Speed 3218.56 samples/sec   Loss 1.7554   LearningRate 0.0266   Epoch: 9   Global Step: 161500   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:19:05,789-Speed 3274.13 samples/sec   Loss 1.8203   LearningRate 0.0266   Epoch: 9   Global Step: 161510   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-11 16:19:08,888-Speed 3305.69 samples/sec   Loss 1.7616   LearningRate 0.0266   Epoch: 9   Global Step: 161520   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-11 16:19:11,968-Speed 3324.63 samples/sec   Loss 1.8302   LearningRate 0.0266   Epoch: 9   Global Step: 161530   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-11 16:19:15,051-Speed 3322.33 samples/sec   Loss 1.8370   LearningRate 0.0266   Epoch: 9   Global Step: 161540   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-11 16:19:18,144-Speed 3312.07 samples/sec   Loss 1.8043   LearningRate 0.0266   Epoch: 9   Global Step: 161550   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-11 16:19:21,223-Speed 3326.54 samples/sec   Loss 1.8494   LearningRate 0.0266   Epoch: 9   Global Step: 161560   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-11 16:19:24,304-Speed 3324.74 samples/sec   Loss 1.7835   LearningRate 0.0266   Epoch: 9   Global Step: 161570   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-11 16:19:27,394-Speed 3314.54 samples/sec   Loss 1.8147   LearningRate 0.0266   Epoch: 9   Global Step: 161580   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-11 16:19:30,478-Speed 3320.78 samples/sec   Loss 1.8225   LearningRate 0.0266   Epoch: 9   Global Step: 161590   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-11 16:19:33,567-Speed 3315.32 samples/sec   Loss 1.8207   LearningRate 0.0266   Epoch: 9   Global Step: 161600   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-11 16:19:36,662-Speed 3309.41 samples/sec   Loss 1.7809   LearningRate 0.0266   Epoch: 9   Global Step: 161610   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:19:39,765-Speed 3301.53 samples/sec   Loss 1.8044   LearningRate 0.0266   Epoch: 9   Global Step: 161620   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:19:42,854-Speed 3316.80 samples/sec   Loss 1.8428   LearningRate 0.0266   Epoch: 9   Global Step: 161630   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:19:45,919-Speed 3340.89 samples/sec   Loss 1.8374   LearningRate 0.0266   Epoch: 9   Global Step: 161640   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-11 16:19:48,999-Speed 3325.97 samples/sec   Loss 1.7727   LearningRate 0.0266   Epoch: 9   Global Step: 161650   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-11 16:19:52,079-Speed 3325.23 samples/sec   Loss 1.8200   LearningRate 0.0266   Epoch: 9   Global Step: 161660   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-11 16:19:55,161-Speed 3324.04 samples/sec   Loss 1.7944   LearningRate 0.0266   Epoch: 9   Global Step: 161670   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-11 16:19:58,320-Speed 3241.81 samples/sec   Loss 1.7734   LearningRate 0.0266   Epoch: 9   Global Step: 161680   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-11 16:20:01,449-Speed 3273.33 samples/sec   Loss 1.7752   LearningRate 0.0266   Epoch: 9   Global Step: 161690   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-11 16:20:04,596-Speed 3254.42 samples/sec   Loss 1.8039   LearningRate 0.0266   Epoch: 9   Global Step: 161700   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-11 16:20:07,758-Speed 3239.75 samples/sec   Loss 1.8377   LearningRate 0.0266   Epoch: 9   Global Step: 161710   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-11 16:20:10,910-Speed 3249.24 samples/sec   Loss 1.8102   LearningRate 0.0266   Epoch: 9   Global Step: 161720   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-11 16:20:13,987-Speed 3329.14 samples/sec   Loss 1.7978   LearningRate 0.0266   Epoch: 9   Global Step: 161730   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-11 16:20:17,059-Speed 3333.98 samples/sec   Loss 1.7654   LearningRate 0.0266   Epoch: 9   Global Step: 161740   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-11 16:20:20,141-Speed 3323.62 samples/sec   Loss 1.8531   LearningRate 0.0266   Epoch: 9   Global Step: 161750   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-11 16:20:23,248-Speed 3295.77 samples/sec   Loss 1.8203   LearningRate 0.0266   Epoch: 9   Global Step: 161760   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-11 16:20:26,342-Speed 3311.03 samples/sec   Loss 1.7746   LearningRate 0.0266   Epoch: 9   Global Step: 161770   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-11 16:20:29,495-Speed 3248.30 samples/sec   Loss 1.7702   LearningRate 0.0266   Epoch: 9   Global Step: 161780   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-11 16:20:32,591-Speed 3308.10 samples/sec   Loss 1.7868   LearningRate 0.0266   Epoch: 9   Global Step: 161790   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-11 16:20:35,673-Speed 3323.64 samples/sec   Loss 1.8326   LearningRate 0.0266   Epoch: 9   Global Step: 161800   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-11 16:20:38,849-Speed 3224.98 samples/sec   Loss 1.7900   LearningRate 0.0266   Epoch: 9   Global Step: 161810   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-11 16:20:42,025-Speed 3225.00 samples/sec   Loss 1.8561   LearningRate 0.0265   Epoch: 9   Global Step: 161820   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-11 16:20:45,218-Speed 3206.98 samples/sec   Loss 1.7940   LearningRate 0.0265   Epoch: 9   Global Step: 161830   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-11 16:20:48,375-Speed 3244.80 samples/sec   Loss 1.8173   LearningRate 0.0265   Epoch: 9   Global Step: 161840   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-11 16:20:51,458-Speed 3321.62 samples/sec   Loss 1.8691   LearningRate 0.0265   Epoch: 9   Global Step: 161850   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-11 16:20:54,537-Speed 3326.99 samples/sec   Loss 1.8143   LearningRate 0.0265   Epoch: 9   Global Step: 161860   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-11 16:20:57,617-Speed 3325.48 samples/sec   Loss 1.8124   LearningRate 0.0265   Epoch: 9   Global Step: 161870   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-11 16:21:00,698-Speed 3323.73 samples/sec   Loss 1.8380   LearningRate 0.0265   Epoch: 9   Global Step: 161880   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-11 16:21:03,824-Speed 3277.42 samples/sec   Loss 1.8241   LearningRate 0.0265   Epoch: 9   Global Step: 161890   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-11 16:21:06,906-Speed 3323.34 samples/sec   Loss 1.7875   LearningRate 0.0265   Epoch: 9   Global Step: 161900   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-11 16:21:10,039-Speed 3268.36 samples/sec   Loss 1.8574   LearningRate 0.0265   Epoch: 9   Global Step: 161910   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-11 16:21:13,175-Speed 3265.98 samples/sec   Loss 1.7719   LearningRate 0.0265   Epoch: 9   Global Step: 161920   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-11 16:21:16,280-Speed 3299.03 samples/sec   Loss 1.7947   LearningRate 0.0265   Epoch: 9   Global Step: 161930   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-11 16:21:19,363-Speed 3322.64 samples/sec   Loss 1.8052   LearningRate 0.0265   Epoch: 9   Global Step: 161940   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-11 16:21:22,439-Speed 3329.57 samples/sec   Loss 1.8155   LearningRate 0.0265   Epoch: 9   Global Step: 161950   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-11 16:21:25,531-Speed 3312.01 samples/sec   Loss 1.8395   LearningRate 0.0265   Epoch: 9   Global Step: 161960   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-11 16:21:28,612-Speed 3325.29 samples/sec   Loss 1.7856   LearningRate 0.0265   Epoch: 9   Global Step: 161970   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-11 16:21:31,697-Speed 3320.31 samples/sec   Loss 1.8329   LearningRate 0.0265   Epoch: 9   Global Step: 161980   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-11 16:21:34,777-Speed 3325.62 samples/sec   Loss 1.8732   LearningRate 0.0265   Epoch: 9   Global Step: 161990   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-11 16:21:37,865-Speed 3316.19 samples/sec   Loss 1.8276   LearningRate 0.0265   Epoch: 9   Global Step: 162000   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-11 16:22:21,585-[lfw][162000]XNorm: 21.985103
Training: 2022-04-11 16:22:21,586-[lfw][162000]Accuracy-Flip: 0.99750+-0.00291
Training: 2022-04-11 16:22:21,586-[lfw][162000]Accuracy-Highest: 0.99817
Training: 2022-04-11 16:23:12,375-[cfp_fp][162000]XNorm: 21.550250
Training: 2022-04-11 16:23:12,376-[cfp_fp][162000]Accuracy-Flip: 0.98786+-0.00500
Training: 2022-04-11 16:23:12,376-[cfp_fp][162000]Accuracy-Highest: 0.98971
Training: 2022-04-11 16:23:56,153-[agedb_30][162000]XNorm: 22.324515
Training: 2022-04-11 16:23:56,154-[agedb_30][162000]Accuracy-Flip: 0.98167+-0.00715
Training: 2022-04-11 16:23:56,154-[agedb_30][162000]Accuracy-Highest: 0.98400
Training: 2022-04-11 16:23:59,241-Speed 72.43 samples/sec   Loss 1.7867   LearningRate 0.0265   Epoch: 9   Global Step: 162010   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-11 16:24:02,306-Speed 3342.17 samples/sec   Loss 1.7954   LearningRate 0.0265   Epoch: 9   Global Step: 162020   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-11 16:24:05,388-Speed 3323.32 samples/sec   Loss 1.8436   LearningRate 0.0265   Epoch: 9   Global Step: 162030   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-11 16:24:08,453-Speed 3341.27 samples/sec   Loss 1.8315   LearningRate 0.0265   Epoch: 9   Global Step: 162040   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-11 16:24:11,528-Speed 3330.25 samples/sec   Loss 1.8339   LearningRate 0.0265   Epoch: 9   Global Step: 162050   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:24:14,597-Speed 3337.30 samples/sec   Loss 1.8078   LearningRate 0.0265   Epoch: 9   Global Step: 162060   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:24:17,676-Speed 3327.24 samples/sec   Loss 1.7957   LearningRate 0.0265   Epoch: 9   Global Step: 162070   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:24:20,749-Speed 3333.03 samples/sec   Loss 1.8860   LearningRate 0.0265   Epoch: 9   Global Step: 162080   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:24:23,823-Speed 3331.88 samples/sec   Loss 1.8160   LearningRate 0.0265   Epoch: 9   Global Step: 162090   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:24:26,897-Speed 3332.21 samples/sec   Loss 1.7829   LearningRate 0.0265   Epoch: 9   Global Step: 162100   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:24:30,091-Speed 3206.53 samples/sec   Loss 1.7772   LearningRate 0.0265   Epoch: 9   Global Step: 162110   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:24:33,176-Speed 3320.45 samples/sec   Loss 1.8426   LearningRate 0.0265   Epoch: 9   Global Step: 162120   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:24:36,248-Speed 3333.78 samples/sec   Loss 1.7702   LearningRate 0.0265   Epoch: 9   Global Step: 162130   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:24:39,328-Speed 3324.85 samples/sec   Loss 1.7642   LearningRate 0.0264   Epoch: 9   Global Step: 162140   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:24:42,391-Speed 3344.55 samples/sec   Loss 1.8142   LearningRate 0.0264   Epoch: 9   Global Step: 162150   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:24:45,479-Speed 3315.92 samples/sec   Loss 1.8021   LearningRate 0.0264   Epoch: 9   Global Step: 162160   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:24:48,556-Speed 3329.29 samples/sec   Loss 1.8189   LearningRate 0.0264   Epoch: 9   Global Step: 162170   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:24:51,632-Speed 3329.59 samples/sec   Loss 1.8248   LearningRate 0.0264   Epoch: 9   Global Step: 162180   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:24:54,741-Speed 3294.81 samples/sec   Loss 1.7882   LearningRate 0.0264   Epoch: 9   Global Step: 162190   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:24:57,867-Speed 3275.92 samples/sec   Loss 1.7633   LearningRate 0.0264   Epoch: 9   Global Step: 162200   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:25:00,983-Speed 3287.68 samples/sec   Loss 1.8206   LearningRate 0.0264   Epoch: 9   Global Step: 162210   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:25:04,108-Speed 3276.93 samples/sec   Loss 1.8275   LearningRate 0.0264   Epoch: 9   Global Step: 162220   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-11 16:25:07,188-Speed 3325.64 samples/sec   Loss 1.8665   LearningRate 0.0264   Epoch: 9   Global Step: 162230   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-11 16:25:10,329-Speed 3261.47 samples/sec   Loss 1.7815   LearningRate 0.0264   Epoch: 9   Global Step: 162240   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-11 16:25:13,465-Speed 3265.93 samples/sec   Loss 1.8400   LearningRate 0.0264   Epoch: 9   Global Step: 162250   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-11 16:25:16,554-Speed 3315.48 samples/sec   Loss 1.8280   LearningRate 0.0264   Epoch: 9   Global Step: 162260   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-11 16:25:19,661-Speed 3296.32 samples/sec   Loss 1.8174   LearningRate 0.0264   Epoch: 9   Global Step: 162270   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-11 16:25:22,756-Speed 3309.74 samples/sec   Loss 1.8173   LearningRate 0.0264   Epoch: 9   Global Step: 162280   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-11 16:25:25,922-Speed 3234.47 samples/sec   Loss 1.8180   LearningRate 0.0264   Epoch: 9   Global Step: 162290   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-11 16:25:29,002-Speed 3326.07 samples/sec   Loss 1.7517   LearningRate 0.0264   Epoch: 9   Global Step: 162300   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-11 16:25:32,096-Speed 3309.88 samples/sec   Loss 1.7753   LearningRate 0.0264   Epoch: 9   Global Step: 162310   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-11 16:25:35,316-Speed 3181.53 samples/sec   Loss 1.8214   LearningRate 0.0264   Epoch: 9   Global Step: 162320   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:25:38,475-Speed 3242.06 samples/sec   Loss 1.8763   LearningRate 0.0264   Epoch: 9   Global Step: 162330   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:25:41,556-Speed 3324.72 samples/sec   Loss 1.8065   LearningRate 0.0264   Epoch: 9   Global Step: 162340   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:25:44,637-Speed 3325.28 samples/sec   Loss 1.7788   LearningRate 0.0264   Epoch: 9   Global Step: 162350   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:25:47,745-Speed 3295.11 samples/sec   Loss 1.7738   LearningRate 0.0264   Epoch: 9   Global Step: 162360   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:25:50,904-Speed 3241.63 samples/sec   Loss 1.7822   LearningRate 0.0264   Epoch: 9   Global Step: 162370   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:25:53,998-Speed 3311.06 samples/sec   Loss 1.7760   LearningRate 0.0264   Epoch: 9   Global Step: 162380   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:25:57,190-Speed 3208.87 samples/sec   Loss 1.8290   LearningRate 0.0264   Epoch: 9   Global Step: 162390   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:26:00,268-Speed 3327.49 samples/sec   Loss 1.8012   LearningRate 0.0264   Epoch: 9   Global Step: 162400   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:26:03,349-Speed 3324.49 samples/sec   Loss 1.8048   LearningRate 0.0264   Epoch: 9   Global Step: 162410   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:26:06,471-Speed 3279.90 samples/sec   Loss 1.8211   LearningRate 0.0264   Epoch: 9   Global Step: 162420   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:26:09,687-Speed 3185.28 samples/sec   Loss 1.8176   LearningRate 0.0264   Epoch: 9   Global Step: 162430   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:26:12,765-Speed 3327.90 samples/sec   Loss 1.8031   LearningRate 0.0264   Epoch: 9   Global Step: 162440   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:26:15,842-Speed 3328.15 samples/sec   Loss 1.7854   LearningRate 0.0264   Epoch: 9   Global Step: 162450   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:26:18,920-Speed 3327.85 samples/sec   Loss 1.8575   LearningRate 0.0264   Epoch: 9   Global Step: 162460   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:26:22,010-Speed 3315.36 samples/sec   Loss 1.8556   LearningRate 0.0263   Epoch: 9   Global Step: 162470   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:26:25,115-Speed 3298.43 samples/sec   Loss 1.8279   LearningRate 0.0263   Epoch: 9   Global Step: 162480   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:26:28,200-Speed 3319.26 samples/sec   Loss 1.8205   LearningRate 0.0263   Epoch: 9   Global Step: 162490   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:26:31,289-Speed 3315.66 samples/sec   Loss 1.8397   LearningRate 0.0263   Epoch: 9   Global Step: 162500   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:26:34,415-Speed 3276.84 samples/sec   Loss 1.7868   LearningRate 0.0263   Epoch: 9   Global Step: 162510   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:26:37,570-Speed 3247.10 samples/sec   Loss 1.8132   LearningRate 0.0263   Epoch: 9   Global Step: 162520   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:26:40,682-Speed 3290.64 samples/sec   Loss 1.7881   LearningRate 0.0263   Epoch: 9   Global Step: 162530   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:26:43,877-Speed 3205.72 samples/sec   Loss 1.8273   LearningRate 0.0263   Epoch: 9   Global Step: 162540   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:26:46,986-Speed 3294.40 samples/sec   Loss 1.7818   LearningRate 0.0263   Epoch: 9   Global Step: 162550   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:26:50,119-Speed 3269.03 samples/sec   Loss 1.7800   LearningRate 0.0263   Epoch: 9   Global Step: 162560   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:26:53,206-Speed 3317.60 samples/sec   Loss 1.8143   LearningRate 0.0263   Epoch: 9   Global Step: 162570   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:26:56,295-Speed 3316.50 samples/sec   Loss 1.7754   LearningRate 0.0263   Epoch: 9   Global Step: 162580   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:26:59,401-Speed 3296.90 samples/sec   Loss 1.8088   LearningRate 0.0263   Epoch: 9   Global Step: 162590   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:27:02,488-Speed 3318.29 samples/sec   Loss 1.7563   LearningRate 0.0263   Epoch: 9   Global Step: 162600   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:27:05,573-Speed 3320.09 samples/sec   Loss 1.7526   LearningRate 0.0263   Epoch: 9   Global Step: 162610   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:27:08,736-Speed 3238.29 samples/sec   Loss 1.8269   LearningRate 0.0263   Epoch: 9   Global Step: 162620   Fp16 Grad Scale: 262144   Required: 18 hours
Training: 2022-04-11 16:27:11,870-Speed 3268.50 samples/sec   Loss 1.8254   LearningRate 0.0263   Epoch: 9   Global Step: 162630   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-11 16:27:15,050-Speed 3222.14 samples/sec   Loss 1.8053   LearningRate 0.0263   Epoch: 9   Global Step: 162640   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-11 16:27:18,172-Speed 3281.14 samples/sec   Loss 1.7928   LearningRate 0.0263   Epoch: 9   Global Step: 162650   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-11 16:27:21,343-Speed 3229.26 samples/sec   Loss 1.8392   LearningRate 0.0263   Epoch: 9   Global Step: 162660   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-11 16:27:24,575-Speed 3169.21 samples/sec   Loss 1.7739   LearningRate 0.0263   Epoch: 9   Global Step: 162670   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-11 16:27:27,656-Speed 3324.70 samples/sec   Loss 1.8849   LearningRate 0.0263   Epoch: 9   Global Step: 162680   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-11 16:27:30,766-Speed 3293.96 samples/sec   Loss 1.8663   LearningRate 0.0263   Epoch: 9   Global Step: 162690   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-11 16:27:33,871-Speed 3298.18 samples/sec   Loss 1.8095   LearningRate 0.0263   Epoch: 9   Global Step: 162700   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-11 16:27:36,947-Speed 3329.40 samples/sec   Loss 1.8028   LearningRate 0.0263   Epoch: 9   Global Step: 162710   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-11 16:27:40,032-Speed 3321.15 samples/sec   Loss 1.8657   LearningRate 0.0263   Epoch: 9   Global Step: 162720   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-11 16:27:43,117-Speed 3319.43 samples/sec   Loss 1.8435   LearningRate 0.0263   Epoch: 9   Global Step: 162730   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-11 16:27:46,199-Speed 3323.55 samples/sec   Loss 1.8664   LearningRate 0.0263   Epoch: 9   Global Step: 162740   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-11 16:27:49,294-Speed 3308.84 samples/sec   Loss 1.7776   LearningRate 0.0263   Epoch: 9   Global Step: 162750   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-11 16:27:52,374-Speed 3326.07 samples/sec   Loss 1.7739   LearningRate 0.0263   Epoch: 9   Global Step: 162760   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-11 16:27:55,506-Speed 3270.44 samples/sec   Loss 1.7853   LearningRate 0.0263   Epoch: 9   Global Step: 162770   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-11 16:27:58,619-Speed 3289.37 samples/sec   Loss 1.7934   LearningRate 0.0263   Epoch: 9   Global Step: 162780   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-11 16:28:01,696-Speed 3329.41 samples/sec   Loss 1.7605   LearningRate 0.0262   Epoch: 9   Global Step: 162790   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-11 16:28:04,775-Speed 3325.61 samples/sec   Loss 1.8052   LearningRate 0.0262   Epoch: 9   Global Step: 162800   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-11 16:28:07,856-Speed 3324.55 samples/sec   Loss 1.7946   LearningRate 0.0262   Epoch: 9   Global Step: 162810   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-11 16:28:10,953-Speed 3308.10 samples/sec   Loss 1.8150   LearningRate 0.0262   Epoch: 9   Global Step: 162820   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-11 16:28:14,048-Speed 3309.50 samples/sec   Loss 1.8275   LearningRate 0.0262   Epoch: 9   Global Step: 162830   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-11 16:28:17,147-Speed 3304.19 samples/sec   Loss 1.8635   LearningRate 0.0262   Epoch: 9   Global Step: 162840   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-11 16:28:20,228-Speed 3324.91 samples/sec   Loss 1.8317   LearningRate 0.0262   Epoch: 9   Global Step: 162850   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-11 16:28:23,301-Speed 3333.07 samples/sec   Loss 1.8218   LearningRate 0.0262   Epoch: 9   Global Step: 162860   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-11 16:28:26,414-Speed 3290.77 samples/sec   Loss 1.7978   LearningRate 0.0262   Epoch: 9   Global Step: 162870   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-11 16:28:29,536-Speed 3280.08 samples/sec   Loss 1.8569   LearningRate 0.0262   Epoch: 9   Global Step: 162880   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-11 16:28:32,787-Speed 3150.83 samples/sec   Loss 1.7635   LearningRate 0.0262   Epoch: 9   Global Step: 162890   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-11 16:28:35,869-Speed 3323.19 samples/sec   Loss 1.7855   LearningRate 0.0262   Epoch: 9   Global Step: 162900   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:28:38,945-Speed 3329.31 samples/sec   Loss 1.7476   LearningRate 0.0262   Epoch: 9   Global Step: 162910   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:28:42,048-Speed 3300.84 samples/sec   Loss 1.8357   LearningRate 0.0262   Epoch: 9   Global Step: 162920   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:28:45,193-Speed 3256.55 samples/sec   Loss 1.7921   LearningRate 0.0262   Epoch: 9   Global Step: 162930   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:28:48,311-Speed 3284.65 samples/sec   Loss 1.8406   LearningRate 0.0262   Epoch: 9   Global Step: 162940   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-11 16:28:51,399-Speed 3318.10 samples/sec   Loss 1.8074   LearningRate 0.0262   Epoch: 9   Global Step: 162950   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-11 16:28:54,486-Speed 3317.43 samples/sec   Loss 1.8355   LearningRate 0.0262   Epoch: 9   Global Step: 162960   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-11 16:28:57,608-Speed 3280.54 samples/sec   Loss 1.7707   LearningRate 0.0262   Epoch: 9   Global Step: 162970   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-11 16:29:00,710-Speed 3301.53 samples/sec   Loss 1.7991   LearningRate 0.0262   Epoch: 9   Global Step: 162980   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-11 16:29:03,794-Speed 3321.41 samples/sec   Loss 1.8417   LearningRate 0.0262   Epoch: 9   Global Step: 162990   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-11 16:29:06,882-Speed 3316.89 samples/sec   Loss 1.7933   LearningRate 0.0262   Epoch: 9   Global Step: 163000   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-11 16:29:10,014-Speed 3270.38 samples/sec   Loss 1.8839   LearningRate 0.0262   Epoch: 9   Global Step: 163010   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-11 16:29:13,088-Speed 3332.37 samples/sec   Loss 1.8108   LearningRate 0.0262   Epoch: 9   Global Step: 163020   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-11 16:29:16,178-Speed 3314.19 samples/sec   Loss 1.7669   LearningRate 0.0262   Epoch: 9   Global Step: 163030   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-11 16:29:19,264-Speed 3319.45 samples/sec   Loss 1.8055   LearningRate 0.0262   Epoch: 9   Global Step: 163040   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:29:22,369-Speed 3298.23 samples/sec   Loss 1.8153   LearningRate 0.0262   Epoch: 9   Global Step: 163050   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:29:25,446-Speed 3328.75 samples/sec   Loss 1.7994   LearningRate 0.0262   Epoch: 9   Global Step: 163060   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:29:28,616-Speed 3230.91 samples/sec   Loss 1.8326   LearningRate 0.0262   Epoch: 9   Global Step: 163070   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:29:31,691-Speed 3331.31 samples/sec   Loss 1.8192   LearningRate 0.0262   Epoch: 9   Global Step: 163080   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-11 16:29:34,767-Speed 3329.92 samples/sec   Loss 1.8006   LearningRate 0.0262   Epoch: 9   Global Step: 163090   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-11 16:29:37,875-Speed 3295.36 samples/sec   Loss 1.8163   LearningRate 0.0262   Epoch: 9   Global Step: 163100   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-11 16:29:40,953-Speed 3327.09 samples/sec   Loss 1.8016   LearningRate 0.0262   Epoch: 9   Global Step: 163110   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-11 16:29:44,050-Speed 3307.49 samples/sec   Loss 1.8098   LearningRate 0.0261   Epoch: 9   Global Step: 163120   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-11 16:29:47,180-Speed 3272.52 samples/sec   Loss 1.8390   LearningRate 0.0261   Epoch: 9   Global Step: 163130   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-11 16:29:50,266-Speed 3318.90 samples/sec   Loss 1.8524   LearningRate 0.0261   Epoch: 9   Global Step: 163140   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-11 16:29:53,345-Speed 3326.20 samples/sec   Loss 1.8093   LearningRate 0.0261   Epoch: 9   Global Step: 163150   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-11 16:29:56,425-Speed 3324.98 samples/sec   Loss 1.7626   LearningRate 0.0261   Epoch: 9   Global Step: 163160   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-11 16:29:59,506-Speed 3325.03 samples/sec   Loss 1.8315   LearningRate 0.0261   Epoch: 9   Global Step: 163170   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-11 16:30:02,594-Speed 3317.12 samples/sec   Loss 1.7821   LearningRate 0.0261   Epoch: 9   Global Step: 163180   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:30:05,679-Speed 3319.05 samples/sec   Loss 1.8166   LearningRate 0.0261   Epoch: 9   Global Step: 163190   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:30:08,780-Speed 3303.37 samples/sec   Loss 1.8476   LearningRate 0.0261   Epoch: 9   Global Step: 163200   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:30:11,862-Speed 3322.93 samples/sec   Loss 1.8670   LearningRate 0.0261   Epoch: 9   Global Step: 163210   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:30:14,955-Speed 3311.89 samples/sec   Loss 1.8855   LearningRate 0.0261   Epoch: 9   Global Step: 163220   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:30:18,058-Speed 3300.32 samples/sec   Loss 1.8599   LearningRate 0.0261   Epoch: 9   Global Step: 163230   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:30:21,281-Speed 3178.70 samples/sec   Loss 1.7635   LearningRate 0.0261   Epoch: 9   Global Step: 163240   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:30:24,360-Speed 3326.19 samples/sec   Loss 1.8145   LearningRate 0.0261   Epoch: 9   Global Step: 163250   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:30:27,501-Speed 3260.78 samples/sec   Loss 1.7806   LearningRate 0.0261   Epoch: 9   Global Step: 163260   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:30:30,596-Speed 3308.48 samples/sec   Loss 1.7603   LearningRate 0.0261   Epoch: 9   Global Step: 163270   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:30:33,666-Speed 3337.71 samples/sec   Loss 1.7970   LearningRate 0.0261   Epoch: 9   Global Step: 163280   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:30:36,746-Speed 3325.55 samples/sec   Loss 1.7818   LearningRate 0.0261   Epoch: 9   Global Step: 163290   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:30:39,888-Speed 3260.38 samples/sec   Loss 1.8633   LearningRate 0.0261   Epoch: 9   Global Step: 163300   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:30:42,973-Speed 3320.02 samples/sec   Loss 1.8038   LearningRate 0.0261   Epoch: 9   Global Step: 163310   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:30:46,058-Speed 3320.34 samples/sec   Loss 1.8329   LearningRate 0.0261   Epoch: 9   Global Step: 163320   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-11 16:30:49,202-Speed 3257.90 samples/sec   Loss 1.7972   LearningRate 0.0261   Epoch: 9   Global Step: 163330   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-11 16:30:52,304-Speed 3301.19 samples/sec   Loss 1.8552   LearningRate 0.0261   Epoch: 9   Global Step: 163340   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-11 16:30:55,414-Speed 3293.38 samples/sec   Loss 1.8054   LearningRate 0.0261   Epoch: 9   Global Step: 163350   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-11 16:30:58,500-Speed 3319.46 samples/sec   Loss 1.8073   LearningRate 0.0261   Epoch: 9   Global Step: 163360   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-11 16:31:01,580-Speed 3324.88 samples/sec   Loss 1.7721   LearningRate 0.0261   Epoch: 9   Global Step: 163370   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-11 16:31:04,682-Speed 3302.10 samples/sec   Loss 1.7630   LearningRate 0.0261   Epoch: 9   Global Step: 163380   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-11 16:31:07,772-Speed 3314.52 samples/sec   Loss 1.8094   LearningRate 0.0261   Epoch: 9   Global Step: 163390   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-11 16:31:10,881-Speed 3294.42 samples/sec   Loss 1.8134   LearningRate 0.0261   Epoch: 9   Global Step: 163400   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-11 16:31:13,996-Speed 3288.96 samples/sec   Loss 1.7899   LearningRate 0.0261   Epoch: 9   Global Step: 163410   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-11 16:31:17,080-Speed 3320.71 samples/sec   Loss 1.7444   LearningRate 0.0261   Epoch: 9   Global Step: 163420   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:31:20,166-Speed 3319.07 samples/sec   Loss 1.7961   LearningRate 0.0261   Epoch: 9   Global Step: 163430   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:31:23,321-Speed 3245.91 samples/sec   Loss 1.8099   LearningRate 0.0261   Epoch: 9   Global Step: 163440   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:31:26,559-Speed 3163.70 samples/sec   Loss 1.8339   LearningRate 0.0260   Epoch: 9   Global Step: 163450   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:31:29,768-Speed 3191.14 samples/sec   Loss 1.8118   LearningRate 0.0260   Epoch: 9   Global Step: 163460   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:31:32,867-Speed 3305.69 samples/sec   Loss 1.7612   LearningRate 0.0260   Epoch: 9   Global Step: 163470   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:31:35,980-Speed 3290.21 samples/sec   Loss 1.8514   LearningRate 0.0260   Epoch: 9   Global Step: 163480   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-11 16:31:39,071-Speed 3312.99 samples/sec   Loss 1.8331   LearningRate 0.0260   Epoch: 9   Global Step: 163490   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-11 16:31:42,167-Speed 3308.60 samples/sec   Loss 1.8246   LearningRate 0.0260   Epoch: 9   Global Step: 163500   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-11 16:31:45,249-Speed 3323.63 samples/sec   Loss 1.7901   LearningRate 0.0260   Epoch: 9   Global Step: 163510   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-11 16:31:48,358-Speed 3294.58 samples/sec   Loss 1.8348   LearningRate 0.0260   Epoch: 9   Global Step: 163520   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-11 16:31:51,536-Speed 3222.25 samples/sec   Loss 1.8282   LearningRate 0.0260   Epoch: 9   Global Step: 163530   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-11 16:31:54,643-Speed 3296.52 samples/sec   Loss 1.7975   LearningRate 0.0260   Epoch: 9   Global Step: 163540   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-11 16:31:57,722-Speed 3326.69 samples/sec   Loss 1.8300   LearningRate 0.0260   Epoch: 9   Global Step: 163550   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-11 16:32:00,808-Speed 3319.12 samples/sec   Loss 1.7953   LearningRate 0.0260   Epoch: 9   Global Step: 163560   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-11 16:32:03,904-Speed 3308.49 samples/sec   Loss 1.7901   LearningRate 0.0260   Epoch: 9   Global Step: 163570   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-11 16:32:07,032-Speed 3274.33 samples/sec   Loss 1.7133   LearningRate 0.0260   Epoch: 9   Global Step: 163580   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:32:10,159-Speed 3275.65 samples/sec   Loss 1.8016   LearningRate 0.0260   Epoch: 9   Global Step: 163590   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:32:13,244-Speed 3320.37 samples/sec   Loss 1.8075   LearningRate 0.0260   Epoch: 9   Global Step: 163600   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-11 16:32:16,330-Speed 3319.07 samples/sec   Loss 1.8292   LearningRate 0.0260   Epoch: 9   Global Step: 163610   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-11 16:32:19,413-Speed 3321.48 samples/sec   Loss 1.8309   LearningRate 0.0260   Epoch: 9   Global Step: 163620   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-11 16:32:22,529-Speed 3287.00 samples/sec   Loss 1.7910   LearningRate 0.0260   Epoch: 9   Global Step: 163630   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-11 16:32:25,677-Speed 3254.53 samples/sec   Loss 1.7637   LearningRate 0.0260   Epoch: 9   Global Step: 163640   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-11 16:32:28,774-Speed 3306.35 samples/sec   Loss 1.7659   LearningRate 0.0260   Epoch: 9   Global Step: 163650   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-11 16:32:31,854-Speed 3326.13 samples/sec   Loss 1.8498   LearningRate 0.0260   Epoch: 9   Global Step: 163660   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-11 16:32:34,935-Speed 3323.80 samples/sec   Loss 1.8605   LearningRate 0.0260   Epoch: 9   Global Step: 163670   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-11 16:32:38,017-Speed 3323.88 samples/sec   Loss 1.7980   LearningRate 0.0260   Epoch: 9   Global Step: 163680   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-11 16:32:41,101-Speed 3321.12 samples/sec   Loss 1.7774   LearningRate 0.0260   Epoch: 9   Global Step: 163690   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-11 16:32:44,246-Speed 3256.92 samples/sec   Loss 1.8470   LearningRate 0.0260   Epoch: 9   Global Step: 163700   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:32:47,351-Speed 3298.28 samples/sec   Loss 1.7760   LearningRate 0.0260   Epoch: 9   Global Step: 163710   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:32:50,461-Speed 3293.27 samples/sec   Loss 1.7331   LearningRate 0.0260   Epoch: 9   Global Step: 163720   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:32:53,580-Speed 3283.74 samples/sec   Loss 1.8164   LearningRate 0.0260   Epoch: 9   Global Step: 163730   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:32:56,663-Speed 3323.17 samples/sec   Loss 1.7744   LearningRate 0.0260   Epoch: 9   Global Step: 163740   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:32:59,788-Speed 3277.30 samples/sec   Loss 1.7615   LearningRate 0.0260   Epoch: 9   Global Step: 163750   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:33:02,882-Speed 3309.88 samples/sec   Loss 1.7985   LearningRate 0.0260   Epoch: 9   Global Step: 163760   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:33:05,976-Speed 3310.18 samples/sec   Loss 1.7824   LearningRate 0.0259   Epoch: 9   Global Step: 163770   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:33:09,054-Speed 3328.39 samples/sec   Loss 1.7722   LearningRate 0.0259   Epoch: 9   Global Step: 163780   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:33:12,148-Speed 3309.66 samples/sec   Loss 1.7821   LearningRate 0.0259   Epoch: 9   Global Step: 163790   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:33:15,214-Speed 3341.23 samples/sec   Loss 1.8312   LearningRate 0.0259   Epoch: 9   Global Step: 163800   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-11 16:33:18,293-Speed 3326.34 samples/sec   Loss 1.7847   LearningRate 0.0259   Epoch: 9   Global Step: 163810   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-11 16:33:21,380-Speed 3318.90 samples/sec   Loss 1.7271   LearningRate 0.0259   Epoch: 9   Global Step: 163820   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-11 16:33:24,458-Speed 3327.22 samples/sec   Loss 1.8355   LearningRate 0.0259   Epoch: 9   Global Step: 163830   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-11 16:33:27,544-Speed 3318.81 samples/sec   Loss 1.8246   LearningRate 0.0259   Epoch: 9   Global Step: 163840   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-11 16:33:30,641-Speed 3307.41 samples/sec   Loss 1.7527   LearningRate 0.0259   Epoch: 9   Global Step: 163850   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-11 16:33:33,727-Speed 3318.96 samples/sec   Loss 1.8461   LearningRate 0.0259   Epoch: 9   Global Step: 163860   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-11 16:33:36,928-Speed 3199.32 samples/sec   Loss 1.8050   LearningRate 0.0259   Epoch: 9   Global Step: 163870   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-11 16:33:40,118-Speed 3210.54 samples/sec   Loss 1.8017   LearningRate 0.0259   Epoch: 9   Global Step: 163880   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-11 16:33:43,238-Speed 3282.80 samples/sec   Loss 1.7905   LearningRate 0.0259   Epoch: 9   Global Step: 163890   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-11 16:33:46,342-Speed 3300.42 samples/sec   Loss 1.8229   LearningRate 0.0259   Epoch: 9   Global Step: 163900   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:33:49,459-Speed 3286.55 samples/sec   Loss 1.7706   LearningRate 0.0259   Epoch: 9   Global Step: 163910   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:33:52,673-Speed 3186.03 samples/sec   Loss 1.8291   LearningRate 0.0259   Epoch: 9   Global Step: 163920   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:33:55,792-Speed 3284.53 samples/sec   Loss 1.8032   LearningRate 0.0259   Epoch: 9   Global Step: 163930   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:33:58,872-Speed 3324.83 samples/sec   Loss 1.8124   LearningRate 0.0259   Epoch: 9   Global Step: 163940   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:34:01,952-Speed 3326.24 samples/sec   Loss 1.7297   LearningRate 0.0259   Epoch: 9   Global Step: 163950   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:34:05,188-Speed 3164.34 samples/sec   Loss 1.8061   LearningRate 0.0259   Epoch: 9   Global Step: 163960   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:34:08,351-Speed 3237.74 samples/sec   Loss 1.8198   LearningRate 0.0259   Epoch: 9   Global Step: 163970   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:34:11,532-Speed 3220.00 samples/sec   Loss 1.8442   LearningRate 0.0259   Epoch: 9   Global Step: 163980   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:34:14,643-Speed 3293.29 samples/sec   Loss 1.8629   LearningRate 0.0259   Epoch: 9   Global Step: 163990   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:34:17,750-Speed 3295.92 samples/sec   Loss 1.7982   LearningRate 0.0259   Epoch: 9   Global Step: 164000   Fp16 Grad Scale: 262144   Required: 18 hours
Training: 2022-04-11 16:35:01,670-[lfw][164000]XNorm: 21.350872
Training: 2022-04-11 16:35:01,670-[lfw][164000]Accuracy-Flip: 0.99750+-0.00291
Training: 2022-04-11 16:35:01,671-[lfw][164000]Accuracy-Highest: 0.99817
Training: 2022-04-11 16:35:52,459-[cfp_fp][164000]XNorm: 20.389568
Training: 2022-04-11 16:35:52,459-[cfp_fp][164000]Accuracy-Flip: 0.98971+-0.00432
Training: 2022-04-11 16:35:52,460-[cfp_fp][164000]Accuracy-Highest: 0.98971
Training: 2022-04-11 16:36:36,168-[agedb_30][164000]XNorm: 21.657997
Training: 2022-04-11 16:36:36,168-[agedb_30][164000]Accuracy-Flip: 0.98450+-0.00578
Training: 2022-04-11 16:36:36,169-[agedb_30][164000]Accuracy-Highest: 0.98450
Training: 2022-04-11 16:36:39,315-Speed 72.34 samples/sec   Loss 1.8630   LearningRate 0.0259   Epoch: 9   Global Step: 164010   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:36:42,413-Speed 3305.22 samples/sec   Loss 1.8336   LearningRate 0.0259   Epoch: 9   Global Step: 164020   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:36:45,564-Speed 3250.78 samples/sec   Loss 1.8404   LearningRate 0.0259   Epoch: 9   Global Step: 164030   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:36:48,641-Speed 3328.93 samples/sec   Loss 1.8027   LearningRate 0.0259   Epoch: 9   Global Step: 164040   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:36:51,707-Speed 3340.48 samples/sec   Loss 1.8093   LearningRate 0.0259   Epoch: 9   Global Step: 164050   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:36:54,777-Speed 3335.84 samples/sec   Loss 1.8047   LearningRate 0.0259   Epoch: 9   Global Step: 164060   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:36:57,874-Speed 3306.81 samples/sec   Loss 1.7909   LearningRate 0.0259   Epoch: 9   Global Step: 164070   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:37:00,980-Speed 3298.35 samples/sec   Loss 1.7397   LearningRate 0.0259   Epoch: 9   Global Step: 164080   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:37:04,135-Speed 3246.39 samples/sec   Loss 1.7999   LearningRate 0.0259   Epoch: 9   Global Step: 164090   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:37:07,231-Speed 3308.37 samples/sec   Loss 1.7767   LearningRate 0.0258   Epoch: 9   Global Step: 164100   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:37:10,299-Speed 3338.88 samples/sec   Loss 1.8251   LearningRate 0.0258   Epoch: 9   Global Step: 164110   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:37:13,382-Speed 3321.89 samples/sec   Loss 1.8209   LearningRate 0.0258   Epoch: 9   Global Step: 164120   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:37:16,443-Speed 3345.39 samples/sec   Loss 1.7638   LearningRate 0.0258   Epoch: 9   Global Step: 164130   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-11 16:37:19,517-Speed 3332.41 samples/sec   Loss 1.7262   LearningRate 0.0258   Epoch: 9   Global Step: 164140   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-11 16:37:22,744-Speed 3173.99 samples/sec   Loss 1.7994   LearningRate 0.0258   Epoch: 9   Global Step: 164150   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-11 16:37:25,856-Speed 3291.64 samples/sec   Loss 1.8097   LearningRate 0.0258   Epoch: 9   Global Step: 164160   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-11 16:37:28,934-Speed 3327.54 samples/sec   Loss 1.8771   LearningRate 0.0258   Epoch: 9   Global Step: 164170   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-11 16:37:32,011-Speed 3328.10 samples/sec   Loss 1.7625   LearningRate 0.0258   Epoch: 9   Global Step: 164180   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-11 16:37:35,083-Speed 3335.03 samples/sec   Loss 1.8560   LearningRate 0.0258   Epoch: 9   Global Step: 164190   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-11 16:37:38,159-Speed 3328.94 samples/sec   Loss 1.7787   LearningRate 0.0258   Epoch: 9   Global Step: 164200   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-11 16:37:41,246-Speed 3317.93 samples/sec   Loss 1.8456   LearningRate 0.0258   Epoch: 9   Global Step: 164210   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-11 16:37:44,349-Speed 3300.71 samples/sec   Loss 1.7727   LearningRate 0.0258   Epoch: 9   Global Step: 164220   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-11 16:37:47,430-Speed 3324.88 samples/sec   Loss 1.7859   LearningRate 0.0258   Epoch: 9   Global Step: 164230   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:37:50,526-Speed 3307.92 samples/sec   Loss 1.8004   LearningRate 0.0258   Epoch: 9   Global Step: 164240   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:37:53,623-Speed 3307.27 samples/sec   Loss 1.7698   LearningRate 0.0258   Epoch: 9   Global Step: 164250   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:37:56,721-Speed 3306.08 samples/sec   Loss 1.8382   LearningRate 0.0258   Epoch: 9   Global Step: 164260   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:37:59,809-Speed 3317.70 samples/sec   Loss 1.8260   LearningRate 0.0258   Epoch: 9   Global Step: 164270   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:38:02,903-Speed 3310.29 samples/sec   Loss 1.7698   LearningRate 0.0258   Epoch: 9   Global Step: 164280   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:38:06,060-Speed 3243.58 samples/sec   Loss 1.8057   LearningRate 0.0258   Epoch: 9   Global Step: 164290   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:38:09,154-Speed 3310.45 samples/sec   Loss 1.7827   LearningRate 0.0258   Epoch: 9   Global Step: 164300   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:38:12,231-Speed 3328.97 samples/sec   Loss 1.8374   LearningRate 0.0258   Epoch: 9   Global Step: 164310   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:38:15,332-Speed 3302.30 samples/sec   Loss 1.7628   LearningRate 0.0258   Epoch: 9   Global Step: 164320   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:38:18,432-Speed 3304.47 samples/sec   Loss 1.8571   LearningRate 0.0258   Epoch: 9   Global Step: 164330   Fp16 Grad Scale: 262144   Required: 18 hours
Training: 2022-04-11 16:38:21,506-Speed 3331.84 samples/sec   Loss 1.8172   LearningRate 0.0258   Epoch: 9   Global Step: 164340   Fp16 Grad Scale: 262144   Required: 18 hours
Training: 2022-04-11 16:38:24,575-Speed 3338.21 samples/sec   Loss 1.8203   LearningRate 0.0258   Epoch: 9   Global Step: 164350   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:38:27,698-Speed 3278.55 samples/sec   Loss 1.8315   LearningRate 0.0258   Epoch: 9   Global Step: 164360   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:38:30,778-Speed 3326.16 samples/sec   Loss 1.8345   LearningRate 0.0258   Epoch: 9   Global Step: 164370   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:38:33,853-Speed 3330.89 samples/sec   Loss 1.8182   LearningRate 0.0258   Epoch: 9   Global Step: 164380   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:38:36,941-Speed 3316.96 samples/sec   Loss 1.7868   LearningRate 0.0258   Epoch: 9   Global Step: 164390   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:38:40,030-Speed 3314.85 samples/sec   Loss 1.8744   LearningRate 0.0258   Epoch: 9   Global Step: 164400   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:38:43,102-Speed 3333.94 samples/sec   Loss 1.8836   LearningRate 0.0258   Epoch: 9   Global Step: 164410   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:38:46,304-Speed 3199.51 samples/sec   Loss 1.7992   LearningRate 0.0258   Epoch: 9   Global Step: 164420   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:38:49,410-Speed 3298.20 samples/sec   Loss 1.7866   LearningRate 0.0257   Epoch: 9   Global Step: 164430   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:38:52,489-Speed 3326.14 samples/sec   Loss 1.8168   LearningRate 0.0257   Epoch: 9   Global Step: 164440   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:38:55,560-Speed 3334.98 samples/sec   Loss 1.8396   LearningRate 0.0257   Epoch: 9   Global Step: 164450   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:38:58,652-Speed 3312.98 samples/sec   Loss 1.8109   LearningRate 0.0257   Epoch: 9   Global Step: 164460   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:39:01,827-Speed 3225.62 samples/sec   Loss 1.7562   LearningRate 0.0257   Epoch: 9   Global Step: 164470   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:39:04,905-Speed 3327.60 samples/sec   Loss 1.7913   LearningRate 0.0257   Epoch: 9   Global Step: 164480   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:39:07,992-Speed 3317.95 samples/sec   Loss 1.7828   LearningRate 0.0257   Epoch: 9   Global Step: 164490   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:39:11,072-Speed 3325.71 samples/sec   Loss 1.8130   LearningRate 0.0257   Epoch: 9   Global Step: 164500   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:39:14,150-Speed 3327.54 samples/sec   Loss 1.7837   LearningRate 0.0257   Epoch: 9   Global Step: 164510   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:39:17,337-Speed 3213.69 samples/sec   Loss 1.7488   LearningRate 0.0257   Epoch: 9   Global Step: 164520   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:39:20,474-Speed 3264.69 samples/sec   Loss 1.7764   LearningRate 0.0257   Epoch: 9   Global Step: 164530   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:39:23,563-Speed 3316.55 samples/sec   Loss 1.7758   LearningRate 0.0257   Epoch: 9   Global Step: 164540   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:39:26,635-Speed 3333.46 samples/sec   Loss 1.8041   LearningRate 0.0257   Epoch: 9   Global Step: 164550   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:39:29,714-Speed 3326.73 samples/sec   Loss 1.8235   LearningRate 0.0257   Epoch: 9   Global Step: 164560   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:39:32,790-Speed 3329.58 samples/sec   Loss 1.7407   LearningRate 0.0257   Epoch: 9   Global Step: 164570   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:39:35,872-Speed 3322.89 samples/sec   Loss 1.8439   LearningRate 0.0257   Epoch: 9   Global Step: 164580   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:39:38,949-Speed 3328.52 samples/sec   Loss 1.7889   LearningRate 0.0257   Epoch: 9   Global Step: 164590   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:39:42,036-Speed 3318.39 samples/sec   Loss 1.7943   LearningRate 0.0257   Epoch: 9   Global Step: 164600   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:39:45,114-Speed 3328.37 samples/sec   Loss 1.7589   LearningRate 0.0257   Epoch: 9   Global Step: 164610   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:39:48,176-Speed 3344.33 samples/sec   Loss 1.7597   LearningRate 0.0257   Epoch: 9   Global Step: 164620   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-11 16:39:51,248-Speed 3334.75 samples/sec   Loss 1.7401   LearningRate 0.0257   Epoch: 9   Global Step: 164630   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-11 16:39:54,330-Speed 3322.43 samples/sec   Loss 1.8143   LearningRate 0.0257   Epoch: 9   Global Step: 164640   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-11 16:39:57,412-Speed 3323.47 samples/sec   Loss 1.7770   LearningRate 0.0257   Epoch: 9   Global Step: 164650   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-11 16:40:00,489-Speed 3328.64 samples/sec   Loss 1.7680   LearningRate 0.0257   Epoch: 9   Global Step: 164660   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-11 16:40:03,576-Speed 3318.34 samples/sec   Loss 1.8684   LearningRate 0.0257   Epoch: 9   Global Step: 164670   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-11 16:40:06,678-Speed 3301.97 samples/sec   Loss 1.7804   LearningRate 0.0257   Epoch: 9   Global Step: 164680   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-11 16:40:09,759-Speed 3323.94 samples/sec   Loss 1.7695   LearningRate 0.0257   Epoch: 9   Global Step: 164690   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-11 16:40:12,848-Speed 3316.06 samples/sec   Loss 1.7587   LearningRate 0.0257   Epoch: 9   Global Step: 164700   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-11 16:40:15,963-Speed 3288.60 samples/sec   Loss 1.7969   LearningRate 0.0257   Epoch: 9   Global Step: 164710   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-11 16:40:19,040-Speed 3328.63 samples/sec   Loss 1.7646   LearningRate 0.0257   Epoch: 9   Global Step: 164720   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:40:22,187-Speed 3253.85 samples/sec   Loss 1.7921   LearningRate 0.0257   Epoch: 9   Global Step: 164730   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:40:25,327-Speed 3262.06 samples/sec   Loss 1.7838   LearningRate 0.0257   Epoch: 9   Global Step: 164740   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:40:28,414-Speed 3318.20 samples/sec   Loss 1.8298   LearningRate 0.0257   Epoch: 9   Global Step: 164750   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:40:31,498-Speed 3320.50 samples/sec   Loss 1.8164   LearningRate 0.0256   Epoch: 9   Global Step: 164760   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:40:34,586-Speed 3318.11 samples/sec   Loss 1.8509   LearningRate 0.0256   Epoch: 9   Global Step: 164770   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:40:37,666-Speed 3325.19 samples/sec   Loss 1.8003   LearningRate 0.0256   Epoch: 9   Global Step: 164780   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-11 16:40:40,741-Speed 3330.47 samples/sec   Loss 1.7559   LearningRate 0.0256   Epoch: 9   Global Step: 164790   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-11 16:40:43,849-Speed 3295.73 samples/sec   Loss 1.8392   LearningRate 0.0256   Epoch: 9   Global Step: 164800   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-11 16:40:46,939-Speed 3314.71 samples/sec   Loss 1.7610   LearningRate 0.0256   Epoch: 9   Global Step: 164810   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-11 16:40:50,016-Speed 3328.64 samples/sec   Loss 1.8079   LearningRate 0.0256   Epoch: 9   Global Step: 164820   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-11 16:40:53,124-Speed 3295.92 samples/sec   Loss 1.7555   LearningRate 0.0256   Epoch: 9   Global Step: 164830   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-11 16:40:56,196-Speed 3333.74 samples/sec   Loss 1.7852   LearningRate 0.0256   Epoch: 9   Global Step: 164840   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-11 16:40:59,273-Speed 3328.14 samples/sec   Loss 1.7827   LearningRate 0.0256   Epoch: 9   Global Step: 164850   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-11 16:41:02,370-Speed 3307.28 samples/sec   Loss 1.7914   LearningRate 0.0256   Epoch: 9   Global Step: 164860   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-11 16:41:05,447-Speed 3329.16 samples/sec   Loss 1.8319   LearningRate 0.0256   Epoch: 9   Global Step: 164870   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-11 16:41:08,573-Speed 3276.24 samples/sec   Loss 1.7937   LearningRate 0.0256   Epoch: 9   Global Step: 164880   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:41:11,677-Speed 3300.69 samples/sec   Loss 1.7347   LearningRate 0.0256   Epoch: 9   Global Step: 164890   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:41:14,940-Speed 3138.07 samples/sec   Loss 1.8329   LearningRate 0.0256   Epoch: 9   Global Step: 164900   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:41:18,045-Speed 3298.98 samples/sec   Loss 1.7360   LearningRate 0.0256   Epoch: 9   Global Step: 164910   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:41:21,159-Speed 3288.59 samples/sec   Loss 1.7796   LearningRate 0.0256   Epoch: 9   Global Step: 164920   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-11 16:41:24,290-Speed 3272.00 samples/sec   Loss 1.7514   LearningRate 0.0256   Epoch: 9   Global Step: 164930   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-11 16:41:27,380-Speed 3314.27 samples/sec   Loss 1.8069   LearningRate 0.0256   Epoch: 9   Global Step: 164940   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-11 16:41:30,596-Speed 3185.38 samples/sec   Loss 1.7747   LearningRate 0.0256   Epoch: 9   Global Step: 164950   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-11 16:41:33,849-Speed 3148.27 samples/sec   Loss 1.7597   LearningRate 0.0256   Epoch: 9   Global Step: 164960   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-11 16:41:37,101-Speed 3149.74 samples/sec   Loss 1.7438   LearningRate 0.0256   Epoch: 9   Global Step: 164970   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-11 16:41:40,298-Speed 3204.09 samples/sec   Loss 1.7918   LearningRate 0.0256   Epoch: 9   Global Step: 164980   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-11 16:41:43,457-Speed 3242.01 samples/sec   Loss 1.7838   LearningRate 0.0256   Epoch: 9   Global Step: 164990   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-11 16:41:46,615-Speed 3243.02 samples/sec   Loss 1.7820   LearningRate 0.0256   Epoch: 9   Global Step: 165000   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-11 16:41:49,768-Speed 3248.42 samples/sec   Loss 1.8147   LearningRate 0.0256   Epoch: 9   Global Step: 165010   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-11 16:41:52,852-Speed 3321.01 samples/sec   Loss 1.7672   LearningRate 0.0256   Epoch: 9   Global Step: 165020   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:41:55,949-Speed 3307.31 samples/sec   Loss 1.7811   LearningRate 0.0256   Epoch: 9   Global Step: 165030   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:41:59,061-Speed 3291.31 samples/sec   Loss 1.8353   LearningRate 0.0256   Epoch: 9   Global Step: 165040   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:42:02,190-Speed 3273.48 samples/sec   Loss 1.7787   LearningRate 0.0256   Epoch: 9   Global Step: 165050   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:42:05,306-Speed 3286.78 samples/sec   Loss 1.8289   LearningRate 0.0256   Epoch: 9   Global Step: 165060   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:42:08,505-Speed 3202.10 samples/sec   Loss 1.8016   LearningRate 0.0256   Epoch: 9   Global Step: 165070   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:42:11,744-Speed 3162.50 samples/sec   Loss 1.7971   LearningRate 0.0256   Epoch: 9   Global Step: 165080   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:42:14,851-Speed 3296.22 samples/sec   Loss 1.8375   LearningRate 0.0255   Epoch: 9   Global Step: 165090   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:42:18,019-Speed 3232.73 samples/sec   Loss 1.8407   LearningRate 0.0255   Epoch: 9   Global Step: 165100   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:42:21,184-Speed 3236.91 samples/sec   Loss 1.8247   LearningRate 0.0255   Epoch: 9   Global Step: 165110   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:42:24,276-Speed 3311.94 samples/sec   Loss 1.7974   LearningRate 0.0255   Epoch: 9   Global Step: 165120   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-11 16:42:27,363-Speed 3317.96 samples/sec   Loss 1.8386   LearningRate 0.0255   Epoch: 9   Global Step: 165130   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-11 16:42:30,479-Speed 3287.55 samples/sec   Loss 1.8077   LearningRate 0.0255   Epoch: 9   Global Step: 165140   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-11 16:42:33,711-Speed 3168.71 samples/sec   Loss 1.7995   LearningRate 0.0255   Epoch: 9   Global Step: 165150   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-11 16:42:36,868-Speed 3252.18 samples/sec   Loss 1.7562   LearningRate 0.0255   Epoch: 9   Global Step: 165160   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-11 16:42:40,058-Speed 3211.50 samples/sec   Loss 1.8066   LearningRate 0.0255   Epoch: 9   Global Step: 165170   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-11 16:42:43,212-Speed 3247.05 samples/sec   Loss 1.8168   LearningRate 0.0255   Epoch: 9   Global Step: 165180   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-11 16:42:46,341-Speed 3273.48 samples/sec   Loss 1.8277   LearningRate 0.0255   Epoch: 9   Global Step: 165190   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-11 16:42:49,419-Speed 3328.24 samples/sec   Loss 1.8055   LearningRate 0.0255   Epoch: 9   Global Step: 165200   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-11 16:42:52,507-Speed 3316.54 samples/sec   Loss 1.8046   LearningRate 0.0255   Epoch: 9   Global Step: 165210   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-11 16:42:55,590-Speed 3321.93 samples/sec   Loss 1.8240   LearningRate 0.0255   Epoch: 9   Global Step: 165220   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:42:58,676-Speed 3318.63 samples/sec   Loss 1.8120   LearningRate 0.0255   Epoch: 9   Global Step: 165230   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:43:01,758-Speed 3323.40 samples/sec   Loss 1.7732   LearningRate 0.0255   Epoch: 9   Global Step: 165240   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:43:04,846-Speed 3317.00 samples/sec   Loss 1.8462   LearningRate 0.0255   Epoch: 9   Global Step: 165250   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:43:07,931-Speed 3319.79 samples/sec   Loss 1.7927   LearningRate 0.0255   Epoch: 9   Global Step: 165260   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:43:11,012-Speed 3324.22 samples/sec   Loss 1.7726   LearningRate 0.0255   Epoch: 9   Global Step: 165270   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:43:14,132-Speed 3283.70 samples/sec   Loss 1.7821   LearningRate 0.0255   Epoch: 9   Global Step: 165280   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-11 16:43:17,228-Speed 3308.39 samples/sec   Loss 1.7869   LearningRate 0.0255   Epoch: 9   Global Step: 165290   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-11 16:43:20,311-Speed 3322.13 samples/sec   Loss 1.8122   LearningRate 0.0255   Epoch: 9   Global Step: 165300   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-11 16:43:23,391-Speed 3325.46 samples/sec   Loss 1.8263   LearningRate 0.0255   Epoch: 9   Global Step: 165310   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-11 16:43:26,479-Speed 3316.54 samples/sec   Loss 1.7826   LearningRate 0.0255   Epoch: 9   Global Step: 165320   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-11 16:43:29,560-Speed 3324.18 samples/sec   Loss 1.8068   LearningRate 0.0255   Epoch: 9   Global Step: 165330   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-11 16:43:32,650-Speed 3314.98 samples/sec   Loss 1.8180   LearningRate 0.0255   Epoch: 9   Global Step: 165340   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-11 16:43:35,731-Speed 3323.50 samples/sec   Loss 1.7949   LearningRate 0.0255   Epoch: 9   Global Step: 165350   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-11 16:43:38,830-Speed 3305.13 samples/sec   Loss 1.7936   LearningRate 0.0255   Epoch: 9   Global Step: 165360   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-11 16:43:41,916-Speed 3319.89 samples/sec   Loss 1.8645   LearningRate 0.0255   Epoch: 9   Global Step: 165370   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-11 16:43:44,999-Speed 3321.74 samples/sec   Loss 1.8021   LearningRate 0.0255   Epoch: 9   Global Step: 165380   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:43:48,083-Speed 3320.96 samples/sec   Loss 1.7599   LearningRate 0.0255   Epoch: 9   Global Step: 165390   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:43:51,182-Speed 3305.23 samples/sec   Loss 1.7485   LearningRate 0.0255   Epoch: 9   Global Step: 165400   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:43:54,275-Speed 3311.35 samples/sec   Loss 1.8386   LearningRate 0.0255   Epoch: 9   Global Step: 165410   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:43:57,365-Speed 3314.61 samples/sec   Loss 1.7640   LearningRate 0.0254   Epoch: 9   Global Step: 165420   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:44:00,450-Speed 3320.03 samples/sec   Loss 1.8954   LearningRate 0.0254   Epoch: 9   Global Step: 165430   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:44:03,548-Speed 3306.08 samples/sec   Loss 1.7304   LearningRate 0.0254   Epoch: 9   Global Step: 165440   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:44:06,642-Speed 3311.11 samples/sec   Loss 1.8139   LearningRate 0.0254   Epoch: 9   Global Step: 165450   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:44:09,766-Speed 3278.90 samples/sec   Loss 1.7539   LearningRate 0.0254   Epoch: 9   Global Step: 165460   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:44:12,850-Speed 3320.55 samples/sec   Loss 1.7938   LearningRate 0.0254   Epoch: 9   Global Step: 165470   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:44:15,920-Speed 3336.86 samples/sec   Loss 1.7872   LearningRate 0.0254   Epoch: 9   Global Step: 165480   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:44:19,000-Speed 3325.02 samples/sec   Loss 1.8103   LearningRate 0.0254   Epoch: 9   Global Step: 165490   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:44:22,089-Speed 3315.96 samples/sec   Loss 1.7729   LearningRate 0.0254   Epoch: 9   Global Step: 165500   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:44:25,213-Speed 3278.45 samples/sec   Loss 1.7396   LearningRate 0.0254   Epoch: 9   Global Step: 165510   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:44:28,311-Speed 3306.08 samples/sec   Loss 1.7189   LearningRate 0.0254   Epoch: 9   Global Step: 165520   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:44:31,419-Speed 3295.60 samples/sec   Loss 1.7625   LearningRate 0.0254   Epoch: 9   Global Step: 165530   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:44:34,517-Speed 3305.85 samples/sec   Loss 1.8470   LearningRate 0.0254   Epoch: 9   Global Step: 165540   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:44:37,587-Speed 3336.19 samples/sec   Loss 1.7904   LearningRate 0.0254   Epoch: 9   Global Step: 165550   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-11 16:44:40,668-Speed 3325.17 samples/sec   Loss 1.7913   LearningRate 0.0254   Epoch: 9   Global Step: 165560   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-11 16:44:43,800-Speed 3269.56 samples/sec   Loss 1.7852   LearningRate 0.0254   Epoch: 9   Global Step: 165570   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-11 16:44:46,903-Speed 3300.97 samples/sec   Loss 1.7401   LearningRate 0.0254   Epoch: 9   Global Step: 165580   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-11 16:44:50,001-Speed 3306.32 samples/sec   Loss 1.7136   LearningRate 0.0254   Epoch: 9   Global Step: 165590   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-11 16:44:53,085-Speed 3320.80 samples/sec   Loss 1.7780   LearningRate 0.0254   Epoch: 9   Global Step: 165600   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-11 16:44:56,194-Speed 3294.81 samples/sec   Loss 1.8323   LearningRate 0.0254   Epoch: 9   Global Step: 165610   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-11 16:44:59,309-Speed 3288.54 samples/sec   Loss 1.7579   LearningRate 0.0254   Epoch: 9   Global Step: 165620   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-11 16:45:02,401-Speed 3312.46 samples/sec   Loss 1.8273   LearningRate 0.0254   Epoch: 9   Global Step: 165630   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-11 16:45:05,483-Speed 3322.81 samples/sec   Loss 1.7646   LearningRate 0.0254   Epoch: 9   Global Step: 165640   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-11 16:45:08,569-Speed 3319.76 samples/sec   Loss 1.8078   LearningRate 0.0254   Epoch: 9   Global Step: 165650   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:45:12,429-Speed 2653.30 samples/sec   Loss 1.7566   LearningRate 0.0254   Epoch: 9   Global Step: 165660   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:45:15,496-Speed 3338.90 samples/sec   Loss 1.7924   LearningRate 0.0254   Epoch: 9   Global Step: 165670   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-11 16:45:18,600-Speed 3299.68 samples/sec   Loss 1.7600   LearningRate 0.0254   Epoch: 9   Global Step: 165680   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-11 16:45:21,690-Speed 3315.07 samples/sec   Loss 1.7933   LearningRate 0.0254   Epoch: 9   Global Step: 165690   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-11 16:45:24,807-Speed 3285.73 samples/sec   Loss 1.7568   LearningRate 0.0254   Epoch: 9   Global Step: 165700   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-11 16:45:27,961-Speed 3247.61 samples/sec   Loss 1.8078   LearningRate 0.0254   Epoch: 9   Global Step: 165710   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-11 16:45:31,070-Speed 3293.87 samples/sec   Loss 1.7149   LearningRate 0.0254   Epoch: 9   Global Step: 165720   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-11 16:45:34,148-Speed 3327.39 samples/sec   Loss 1.8161   LearningRate 0.0254   Epoch: 9   Global Step: 165730   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-11 16:45:37,222-Speed 3332.82 samples/sec   Loss 1.7407   LearningRate 0.0254   Epoch: 9   Global Step: 165740   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-11 16:45:40,299-Speed 3328.23 samples/sec   Loss 1.7443   LearningRate 0.0253   Epoch: 9   Global Step: 165750   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-11 16:45:43,374-Speed 3331.15 samples/sec   Loss 1.7622   LearningRate 0.0253   Epoch: 9   Global Step: 165760   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-11 16:45:46,452-Speed 3327.22 samples/sec   Loss 1.7787   LearningRate 0.0253   Epoch: 9   Global Step: 165770   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:45:49,528-Speed 3330.13 samples/sec   Loss 1.7869   LearningRate 0.0253   Epoch: 9   Global Step: 165780   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:45:52,611-Speed 3321.82 samples/sec   Loss 1.7895   LearningRate 0.0253   Epoch: 9   Global Step: 165790   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:45:55,732-Speed 3282.43 samples/sec   Loss 1.7887   LearningRate 0.0253   Epoch: 9   Global Step: 165800   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:45:58,826-Speed 3309.86 samples/sec   Loss 1.8607   LearningRate 0.0253   Epoch: 9   Global Step: 165810   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:46:01,923-Speed 3307.68 samples/sec   Loss 1.8184   LearningRate 0.0253   Epoch: 9   Global Step: 165820   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:46:05,009-Speed 3319.37 samples/sec   Loss 1.7668   LearningRate 0.0253   Epoch: 9   Global Step: 165830   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:46:08,095-Speed 3318.24 samples/sec   Loss 1.7897   LearningRate 0.0253   Epoch: 9   Global Step: 165840   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:46:11,201-Speed 3298.50 samples/sec   Loss 1.7906   LearningRate 0.0253   Epoch: 9   Global Step: 165850   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:46:14,258-Speed 3350.45 samples/sec   Loss 1.7683   LearningRate 0.0253   Epoch: 9   Global Step: 165860   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-11 16:46:17,393-Speed 3266.43 samples/sec   Loss 1.7786   LearningRate 0.0253   Epoch: 9   Global Step: 165870   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-11 16:46:20,508-Speed 3288.29 samples/sec   Loss 1.7425   LearningRate 0.0253   Epoch: 9   Global Step: 165880   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-11 16:46:23,584-Speed 3329.82 samples/sec   Loss 1.7863   LearningRate 0.0253   Epoch: 9   Global Step: 165890   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-11 16:46:26,665-Speed 3323.89 samples/sec   Loss 1.7555   LearningRate 0.0253   Epoch: 9   Global Step: 165900   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-11 16:46:29,800-Speed 3267.49 samples/sec   Loss 1.7874   LearningRate 0.0253   Epoch: 9   Global Step: 165910   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-11 16:46:32,920-Speed 3283.51 samples/sec   Loss 1.7734   LearningRate 0.0253   Epoch: 9   Global Step: 165920   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-11 16:46:36,062-Speed 3259.44 samples/sec   Loss 1.8148   LearningRate 0.0253   Epoch: 9   Global Step: 165930   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-11 16:46:39,145-Speed 3321.89 samples/sec   Loss 1.8016   LearningRate 0.0253   Epoch: 9   Global Step: 165940   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-11 16:46:42,326-Speed 3220.09 samples/sec   Loss 1.8484   LearningRate 0.0253   Epoch: 9   Global Step: 165950   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-11 16:46:45,445-Speed 3283.90 samples/sec   Loss 1.8066   LearningRate 0.0253   Epoch: 9   Global Step: 165960   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-11 16:46:48,529-Speed 3320.50 samples/sec   Loss 1.7107   LearningRate 0.0253   Epoch: 9   Global Step: 165970   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-11 16:46:51,628-Speed 3304.84 samples/sec   Loss 1.8211   LearningRate 0.0253   Epoch: 9   Global Step: 165980   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-11 16:46:54,716-Speed 3316.58 samples/sec   Loss 1.8012   LearningRate 0.0253   Epoch: 9   Global Step: 165990   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-11 16:46:57,806-Speed 3315.72 samples/sec   Loss 1.7956   LearningRate 0.0253   Epoch: 9   Global Step: 166000   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-11 16:47:41,409-[lfw][166000]XNorm: 20.707809
Training: 2022-04-11 16:47:41,409-[lfw][166000]Accuracy-Flip: 0.99800+-0.00256
Training: 2022-04-11 16:47:41,410-[lfw][166000]Accuracy-Highest: 0.99817
Training: 2022-04-11 16:48:32,202-[cfp_fp][166000]XNorm: 20.603376
Training: 2022-04-11 16:48:32,202-[cfp_fp][166000]Accuracy-Flip: 0.98914+-0.00516
Training: 2022-04-11 16:48:32,203-[cfp_fp][166000]Accuracy-Highest: 0.98971
Training: 2022-04-11 16:49:15,763-[agedb_30][166000]XNorm: 21.331133
Training: 2022-04-11 16:49:15,764-[agedb_30][166000]Accuracy-Flip: 0.98250+-0.00588
Training: 2022-04-11 16:49:15,764-[agedb_30][166000]Accuracy-Highest: 0.98450
Training: 2022-04-11 16:49:18,842-Speed 72.61 samples/sec   Loss 1.7806   LearningRate 0.0253   Epoch: 9   Global Step: 166010   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-11 16:49:21,944-Speed 3301.95 samples/sec   Loss 1.8019   LearningRate 0.0253   Epoch: 9   Global Step: 166020   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-11 16:49:25,017-Speed 3332.75 samples/sec   Loss 1.7832   LearningRate 0.0253   Epoch: 9   Global Step: 166030   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-11 16:49:28,090-Speed 3332.80 samples/sec   Loss 1.7342   LearningRate 0.0253   Epoch: 9   Global Step: 166040   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-11 16:49:31,163-Speed 3332.71 samples/sec   Loss 1.8095   LearningRate 0.0253   Epoch: 9   Global Step: 166050   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-11 16:49:34,239-Speed 3329.79 samples/sec   Loss 1.6998   LearningRate 0.0253   Epoch: 9   Global Step: 166060   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:49:37,311-Speed 3334.08 samples/sec   Loss 1.7770   LearningRate 0.0253   Epoch: 9   Global Step: 166070   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:49:40,403-Speed 3312.56 samples/sec   Loss 1.7969   LearningRate 0.0252   Epoch: 9   Global Step: 166080   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:49:43,470-Speed 3339.54 samples/sec   Loss 1.7576   LearningRate 0.0252   Epoch: 9   Global Step: 166090   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:49:46,547-Speed 3328.94 samples/sec   Loss 1.8120   LearningRate 0.0252   Epoch: 9   Global Step: 166100   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:49:49,633-Speed 3318.85 samples/sec   Loss 1.7707   LearningRate 0.0252   Epoch: 9   Global Step: 166110   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:49:52,693-Speed 3347.62 samples/sec   Loss 1.7765   LearningRate 0.0252   Epoch: 9   Global Step: 166120   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-11 16:49:55,786-Speed 3310.90 samples/sec   Loss 1.7771   LearningRate 0.0252   Epoch: 9   Global Step: 166130   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-11 16:49:58,859-Speed 3333.82 samples/sec   Loss 1.7969   LearningRate 0.0252   Epoch: 9   Global Step: 166140   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-11 16:50:01,965-Speed 3296.94 samples/sec   Loss 1.7495   LearningRate 0.0252   Epoch: 9   Global Step: 166150   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-11 16:50:05,044-Speed 3326.77 samples/sec   Loss 1.7940   LearningRate 0.0252   Epoch: 9   Global Step: 166160   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-11 16:50:08,127-Speed 3322.18 samples/sec   Loss 1.8329   LearningRate 0.0252   Epoch: 9   Global Step: 166170   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-11 16:50:11,255-Speed 3274.31 samples/sec   Loss 1.7961   LearningRate 0.0252   Epoch: 9   Global Step: 166180   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-11 16:50:14,374-Speed 3283.89 samples/sec   Loss 1.8082   LearningRate 0.0252   Epoch: 9   Global Step: 166190   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-11 16:50:17,536-Speed 3239.15 samples/sec   Loss 1.7570   LearningRate 0.0252   Epoch: 9   Global Step: 166200   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-11 16:50:20,615-Speed 3326.67 samples/sec   Loss 1.8332   LearningRate 0.0252   Epoch: 9   Global Step: 166210   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-11 16:50:23,693-Speed 3326.95 samples/sec   Loss 1.7840   LearningRate 0.0252   Epoch: 9   Global Step: 166220   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:50:26,756-Speed 3344.29 samples/sec   Loss 1.8362   LearningRate 0.0252   Epoch: 9   Global Step: 166230   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-11 16:50:29,831-Speed 3331.18 samples/sec   Loss 1.7907   LearningRate 0.0252   Epoch: 9   Global Step: 166240   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-11 16:50:32,939-Speed 3294.73 samples/sec   Loss 1.8170   LearningRate 0.0252   Epoch: 9   Global Step: 166250   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-11 16:50:36,013-Speed 3332.47 samples/sec   Loss 1.7648   LearningRate 0.0252   Epoch: 9   Global Step: 166260   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-11 16:50:39,236-Speed 3178.10 samples/sec   Loss 1.8211   LearningRate 0.0252   Epoch: 9   Global Step: 166270   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-11 16:50:42,419-Speed 3218.06 samples/sec   Loss 1.8217   LearningRate 0.0252   Epoch: 9   Global Step: 166280   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-11 16:50:45,516-Speed 3307.08 samples/sec   Loss 1.8123   LearningRate 0.0252   Epoch: 9   Global Step: 166290   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-11 16:50:48,615-Speed 3304.58 samples/sec   Loss 1.7436   LearningRate 0.0252   Epoch: 9   Global Step: 166300   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-11 16:50:51,733-Speed 3284.38 samples/sec   Loss 1.7464   LearningRate 0.0252   Epoch: 9   Global Step: 166310   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-11 16:50:54,831-Speed 3306.37 samples/sec   Loss 1.8014   LearningRate 0.0252   Epoch: 9   Global Step: 166320   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-11 16:50:57,921-Speed 3314.95 samples/sec   Loss 1.8309   LearningRate 0.0252   Epoch: 9   Global Step: 166330   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:51:01,078-Speed 3243.64 samples/sec   Loss 1.7785   LearningRate 0.0252   Epoch: 9   Global Step: 166340   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:51:04,164-Speed 3319.73 samples/sec   Loss 1.7717   LearningRate 0.0252   Epoch: 9   Global Step: 166350   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:51:07,255-Speed 3313.75 samples/sec   Loss 1.7863   LearningRate 0.0252   Epoch: 9   Global Step: 166360   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-11 16:51:10,400-Speed 3256.92 samples/sec   Loss 1.7626   LearningRate 0.0252   Epoch: 9   Global Step: 166370   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-11 16:51:13,473-Speed 3332.20 samples/sec   Loss 1.7234   LearningRate 0.0252   Epoch: 9   Global Step: 166380   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-11 16:51:16,550-Speed 3328.44 samples/sec   Loss 1.8151   LearningRate 0.0252   Epoch: 9   Global Step: 166390   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-11 16:51:19,622-Speed 3334.18 samples/sec   Loss 1.8239   LearningRate 0.0252   Epoch: 9   Global Step: 166400   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-11 16:51:22,719-Speed 3306.99 samples/sec   Loss 1.7422   LearningRate 0.0252   Epoch: 9   Global Step: 166410   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-11 16:51:25,800-Speed 3324.48 samples/sec   Loss 1.7777   LearningRate 0.0251   Epoch: 9   Global Step: 166420   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-11 16:51:28,890-Speed 3315.34 samples/sec   Loss 1.8117   LearningRate 0.0251   Epoch: 9   Global Step: 166430   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-11 16:51:31,964-Speed 3331.44 samples/sec   Loss 1.8002   LearningRate 0.0251   Epoch: 9   Global Step: 166440   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-11 16:51:35,034-Speed 3336.87 samples/sec   Loss 1.8134   LearningRate 0.0251   Epoch: 9   Global Step: 166450   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-11 16:51:38,130-Speed 3308.05 samples/sec   Loss 1.7765   LearningRate 0.0251   Epoch: 9   Global Step: 166460   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:51:41,203-Speed 3333.13 samples/sec   Loss 1.7742   LearningRate 0.0251   Epoch: 9   Global Step: 166470   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:51:44,265-Speed 3344.64 samples/sec   Loss 1.8496   LearningRate 0.0251   Epoch: 9   Global Step: 166480   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-11 16:51:47,336-Speed 3335.42 samples/sec   Loss 1.8026   LearningRate 0.0251   Epoch: 9   Global Step: 166490   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-11 16:51:50,407-Speed 3334.89 samples/sec   Loss 1.7950   LearningRate 0.0251   Epoch: 9   Global Step: 166500   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-11 16:51:53,484-Speed 3328.53 samples/sec   Loss 1.7731   LearningRate 0.0251   Epoch: 9   Global Step: 166510   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-11 16:51:56,555-Speed 3334.44 samples/sec   Loss 1.7601   LearningRate 0.0251   Epoch: 9   Global Step: 166520   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-11 16:51:59,628-Speed 3334.33 samples/sec   Loss 1.7778   LearningRate 0.0251   Epoch: 9   Global Step: 166530   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-11 16:52:02,703-Speed 3330.48 samples/sec   Loss 1.7433   LearningRate 0.0251   Epoch: 9   Global Step: 166540   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-11 16:52:05,778-Speed 3330.20 samples/sec   Loss 1.7465   LearningRate 0.0251   Epoch: 9   Global Step: 166550   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-11 16:52:08,850-Speed 3334.54 samples/sec   Loss 1.7851   LearningRate 0.0251   Epoch: 9   Global Step: 166560   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-11 16:52:11,921-Speed 3334.68 samples/sec   Loss 1.7866   LearningRate 0.0251   Epoch: 9   Global Step: 166570   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-11 16:52:15,007-Speed 3320.00 samples/sec   Loss 1.8139   LearningRate 0.0251   Epoch: 9   Global Step: 166580   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:52:18,086-Speed 3326.20 samples/sec   Loss 1.7913   LearningRate 0.0251   Epoch: 9   Global Step: 166590   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:52:21,172-Speed 3318.26 samples/sec   Loss 1.7514   LearningRate 0.0251   Epoch: 9   Global Step: 166600   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:52:24,251-Speed 3327.55 samples/sec   Loss 1.7352   LearningRate 0.0251   Epoch: 9   Global Step: 166610   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:52:27,325-Speed 3331.91 samples/sec   Loss 1.7377   LearningRate 0.0251   Epoch: 9   Global Step: 166620   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:52:30,481-Speed 3244.57 samples/sec   Loss 1.8269   LearningRate 0.0251   Epoch: 9   Global Step: 166630   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:52:33,573-Speed 3312.27 samples/sec   Loss 1.8085   LearningRate 0.0251   Epoch: 9   Global Step: 166640   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:52:36,659-Speed 3319.79 samples/sec   Loss 1.7972   LearningRate 0.0251   Epoch: 9   Global Step: 166650   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:52:39,735-Speed 3328.80 samples/sec   Loss 1.6850   LearningRate 0.0251   Epoch: 9   Global Step: 166660   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:52:42,815-Speed 3326.29 samples/sec   Loss 1.7984   LearningRate 0.0251   Epoch: 9   Global Step: 166670   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:52:45,876-Speed 3345.42 samples/sec   Loss 1.7888   LearningRate 0.0251   Epoch: 9   Global Step: 166680   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:52:49,036-Speed 3241.17 samples/sec   Loss 1.8141   LearningRate 0.0251   Epoch: 9   Global Step: 166690   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-11 16:52:52,133-Speed 3307.86 samples/sec   Loss 1.7956   LearningRate 0.0251   Epoch: 9   Global Step: 166700   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-11 16:52:55,238-Speed 3298.75 samples/sec   Loss 1.6907   LearningRate 0.0251   Epoch: 9   Global Step: 166710   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-11 16:52:58,337-Speed 3304.22 samples/sec   Loss 1.7779   LearningRate 0.0251   Epoch: 9   Global Step: 166720   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-11 16:53:01,453-Speed 3287.69 samples/sec   Loss 1.7991   LearningRate 0.0251   Epoch: 9   Global Step: 166730   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-11 16:53:04,548-Speed 3309.12 samples/sec   Loss 1.7417   LearningRate 0.0251   Epoch: 9   Global Step: 166740   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-11 16:53:07,632-Speed 3321.10 samples/sec   Loss 1.8020   LearningRate 0.0250   Epoch: 9   Global Step: 166750   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-11 16:53:10,742-Speed 3293.16 samples/sec   Loss 1.7807   LearningRate 0.0250   Epoch: 9   Global Step: 166760   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-11 16:53:13,925-Speed 3217.10 samples/sec   Loss 1.7946   LearningRate 0.0250   Epoch: 9   Global Step: 166770   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-11 16:53:16,998-Speed 3334.04 samples/sec   Loss 1.7595   LearningRate 0.0250   Epoch: 9   Global Step: 166780   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-11 16:53:20,090-Speed 3312.26 samples/sec   Loss 1.7843   LearningRate 0.0250   Epoch: 9   Global Step: 166790   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:53:23,165-Speed 3330.77 samples/sec   Loss 1.8079   LearningRate 0.0250   Epoch: 9   Global Step: 166800   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:53:26,244-Speed 3326.35 samples/sec   Loss 1.7004   LearningRate 0.0250   Epoch: 9   Global Step: 166810   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:53:29,326-Speed 3323.66 samples/sec   Loss 1.7257   LearningRate 0.0250   Epoch: 9   Global Step: 166820   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:53:32,398-Speed 3333.25 samples/sec   Loss 1.7311   LearningRate 0.0250   Epoch: 9   Global Step: 166830   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:53:35,479-Speed 3324.31 samples/sec   Loss 1.8055   LearningRate 0.0250   Epoch: 9   Global Step: 166840   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:53:38,576-Speed 3307.14 samples/sec   Loss 1.7849   LearningRate 0.0250   Epoch: 9   Global Step: 166850   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:53:41,663-Speed 3318.80 samples/sec   Loss 1.8298   LearningRate 0.0250   Epoch: 9   Global Step: 166860   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:53:44,740-Speed 3328.09 samples/sec   Loss 1.8188   LearningRate 0.0250   Epoch: 9   Global Step: 166870   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:53:47,822-Speed 3323.32 samples/sec   Loss 1.7105   LearningRate 0.0250   Epoch: 9   Global Step: 166880   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:53:50,901-Speed 3327.38 samples/sec   Loss 1.7594   LearningRate 0.0250   Epoch: 9   Global Step: 166890   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:53:54,557-Speed 2800.96 samples/sec   Loss 1.7852   LearningRate 0.0250   Epoch: 9   Global Step: 166900   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-11 16:53:57,742-Speed 3215.26 samples/sec   Loss 1.7413   LearningRate 0.0250   Epoch: 9   Global Step: 166910   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-11 16:54:27,360-Speed 345.75 samples/sec   Loss 1.3104   LearningRate 0.0250   Epoch: 10   Global Step: 166920   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-11 16:54:30,533-Speed 3227.62 samples/sec   Loss 1.2835   LearningRate 0.0250   Epoch: 10   Global Step: 166930   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-11 16:54:33,647-Speed 3289.35 samples/sec   Loss 1.2823   LearningRate 0.0250   Epoch: 10   Global Step: 166940   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-11 16:54:36,727-Speed 3325.69 samples/sec   Loss 1.2979   LearningRate 0.0250   Epoch: 10   Global Step: 166950   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-11 16:54:39,823-Speed 3308.93 samples/sec   Loss 1.2461   LearningRate 0.0250   Epoch: 10   Global Step: 166960   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-11 16:54:42,987-Speed 3236.37 samples/sec   Loss 1.2635   LearningRate 0.0250   Epoch: 10   Global Step: 166970   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-11 16:54:46,245-Speed 3143.62 samples/sec   Loss 1.2903   LearningRate 0.0250   Epoch: 10   Global Step: 166980   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-11 16:54:49,479-Speed 3166.95 samples/sec   Loss 1.2744   LearningRate 0.0250   Epoch: 10   Global Step: 166990   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-11 16:54:52,610-Speed 3271.68 samples/sec   Loss 1.2661   LearningRate 0.0250   Epoch: 10   Global Step: 167000   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:54:55,720-Speed 3292.52 samples/sec   Loss 1.2618   LearningRate 0.0250   Epoch: 10   Global Step: 167010   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:54:58,805-Speed 3320.67 samples/sec   Loss 1.3172   LearningRate 0.0250   Epoch: 10   Global Step: 167020   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:55:01,884-Speed 3325.99 samples/sec   Loss 1.3104   LearningRate 0.0250   Epoch: 10   Global Step: 167030   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:55:05,042-Speed 3243.77 samples/sec   Loss 1.2577   LearningRate 0.0250   Epoch: 10   Global Step: 167040   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:55:08,149-Speed 3296.77 samples/sec   Loss 1.2985   LearningRate 0.0250   Epoch: 10   Global Step: 167050   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:55:11,280-Speed 3271.47 samples/sec   Loss 1.2548   LearningRate 0.0250   Epoch: 10   Global Step: 167060   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:55:14,514-Speed 3166.86 samples/sec   Loss 1.2702   LearningRate 0.0250   Epoch: 10   Global Step: 167070   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-11 16:55:17,660-Speed 3255.50 samples/sec   Loss 1.2967   LearningRate 0.0249   Epoch: 10   Global Step: 167080   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-11 16:55:21,684-Speed 2545.26 samples/sec   Loss 1.2470   LearningRate 0.0249   Epoch: 10   Global Step: 167090   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-11 16:55:24,928-Speed 3156.60 samples/sec   Loss 1.3581   LearningRate 0.0249   Epoch: 10   Global Step: 167100   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-11 16:55:28,023-Speed 3310.02 samples/sec   Loss 1.2769   LearningRate 0.0249   Epoch: 10   Global Step: 167110   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-11 16:55:31,117-Speed 3310.81 samples/sec   Loss 1.2971   LearningRate 0.0249   Epoch: 10   Global Step: 167120   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-11 16:55:34,213-Speed 3307.48 samples/sec   Loss 1.3260   LearningRate 0.0249   Epoch: 10   Global Step: 167130   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-11 16:55:37,356-Speed 3259.02 samples/sec   Loss 1.2831   LearningRate 0.0249   Epoch: 10   Global Step: 167140   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-11 16:55:40,499-Speed 3258.63 samples/sec   Loss 1.2374   LearningRate 0.0249   Epoch: 10   Global Step: 167150   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-11 16:55:43,599-Speed 3304.24 samples/sec   Loss 1.2932   LearningRate 0.0249   Epoch: 10   Global Step: 167160   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-11 16:55:46,747-Speed 3253.99 samples/sec   Loss 1.2925   LearningRate 0.0249   Epoch: 10   Global Step: 167170   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:55:49,876-Speed 3273.12 samples/sec   Loss 1.2588   LearningRate 0.0249   Epoch: 10   Global Step: 167180   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:55:53,033-Speed 3244.53 samples/sec   Loss 1.2353   LearningRate 0.0249   Epoch: 10   Global Step: 167190   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:55:56,120-Speed 3317.86 samples/sec   Loss 1.2914   LearningRate 0.0249   Epoch: 10   Global Step: 167200   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:55:59,214-Speed 3310.66 samples/sec   Loss 1.2575   LearningRate 0.0249   Epoch: 10   Global Step: 167210   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:56:02,306-Speed 3312.38 samples/sec   Loss 1.3123   LearningRate 0.0249   Epoch: 10   Global Step: 167220   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:56:05,386-Speed 3325.11 samples/sec   Loss 1.2873   LearningRate 0.0249   Epoch: 10   Global Step: 167230   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:56:08,478-Speed 3312.37 samples/sec   Loss 1.2849   LearningRate 0.0249   Epoch: 10   Global Step: 167240   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:56:11,577-Speed 3304.63 samples/sec   Loss 1.3070   LearningRate 0.0249   Epoch: 10   Global Step: 167250   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:56:14,662-Speed 3320.54 samples/sec   Loss 1.2645   LearningRate 0.0249   Epoch: 10   Global Step: 167260   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:56:17,760-Speed 3305.70 samples/sec   Loss 1.2692   LearningRate 0.0249   Epoch: 10   Global Step: 167270   Fp16 Grad Scale: 262144   Required: 18 hours
Training: 2022-04-11 16:56:20,941-Speed 3220.39 samples/sec   Loss 1.2801   LearningRate 0.0249   Epoch: 10   Global Step: 167280   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:56:24,165-Speed 3176.74 samples/sec   Loss 1.2894   LearningRate 0.0249   Epoch: 10   Global Step: 167290   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:56:27,291-Speed 3275.98 samples/sec   Loss 1.2777   LearningRate 0.0249   Epoch: 10   Global Step: 167300   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:56:30,380-Speed 3316.30 samples/sec   Loss 1.2462   LearningRate 0.0249   Epoch: 10   Global Step: 167310   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:56:33,479-Speed 3304.81 samples/sec   Loss 1.3008   LearningRate 0.0249   Epoch: 10   Global Step: 167320   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-11 16:56:36,562-Speed 3322.23 samples/sec   Loss 1.2703   LearningRate 0.0249   Epoch: 10   Global Step: 167330   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-11 16:56:39,643-Speed 3324.62 samples/sec   Loss 1.2871   LearningRate 0.0249   Epoch: 10   Global Step: 167340   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-11 16:56:42,736-Speed 3310.82 samples/sec   Loss 1.2855   LearningRate 0.0249   Epoch: 10   Global Step: 167350   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-11 16:56:45,833-Speed 3307.47 samples/sec   Loss 1.2939   LearningRate 0.0249   Epoch: 10   Global Step: 167360   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-11 16:56:48,912-Speed 3326.71 samples/sec   Loss 1.2798   LearningRate 0.0249   Epoch: 10   Global Step: 167370   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-11 16:56:52,048-Speed 3266.36 samples/sec   Loss 1.2838   LearningRate 0.0249   Epoch: 10   Global Step: 167380   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-11 16:56:55,134-Speed 3318.44 samples/sec   Loss 1.2625   LearningRate 0.0249   Epoch: 10   Global Step: 167390   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-11 16:56:58,214-Speed 3325.84 samples/sec   Loss 1.3151   LearningRate 0.0249   Epoch: 10   Global Step: 167400   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-11 16:57:01,321-Speed 3296.50 samples/sec   Loss 1.2896   LearningRate 0.0249   Epoch: 10   Global Step: 167410   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-11 16:57:04,462-Speed 3261.06 samples/sec   Loss 1.2732   LearningRate 0.0248   Epoch: 10   Global Step: 167420   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:57:07,547-Speed 3319.24 samples/sec   Loss 1.3200   LearningRate 0.0248   Epoch: 10   Global Step: 167430   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:57:10,697-Speed 3252.42 samples/sec   Loss 1.2901   LearningRate 0.0248   Epoch: 10   Global Step: 167440   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:57:13,777-Speed 3324.93 samples/sec   Loss 1.3276   LearningRate 0.0248   Epoch: 10   Global Step: 167450   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:57:16,859-Speed 3323.91 samples/sec   Loss 1.3348   LearningRate 0.0248   Epoch: 10   Global Step: 167460   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:57:19,981-Speed 3280.46 samples/sec   Loss 1.2659   LearningRate 0.0248   Epoch: 10   Global Step: 167470   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:57:23,065-Speed 3320.65 samples/sec   Loss 1.2579   LearningRate 0.0248   Epoch: 10   Global Step: 167480   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:57:26,149-Speed 3320.88 samples/sec   Loss 1.2829   LearningRate 0.0248   Epoch: 10   Global Step: 167490   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:57:29,237-Speed 3317.55 samples/sec   Loss 1.2797   LearningRate 0.0248   Epoch: 10   Global Step: 167500   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:57:32,325-Speed 3316.57 samples/sec   Loss 1.3269   LearningRate 0.0248   Epoch: 10   Global Step: 167510   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:57:35,410-Speed 3320.33 samples/sec   Loss 1.3302   LearningRate 0.0248   Epoch: 10   Global Step: 167520   Fp16 Grad Scale: 262144   Required: 18 hours
Training: 2022-04-11 16:57:38,488-Speed 3327.33 samples/sec   Loss 1.3167   LearningRate 0.0248   Epoch: 10   Global Step: 167530   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:57:41,573-Speed 3320.68 samples/sec   Loss 1.3441   LearningRate 0.0248   Epoch: 10   Global Step: 167540   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:57:44,643-Speed 3336.17 samples/sec   Loss 1.3116   LearningRate 0.0248   Epoch: 10   Global Step: 167550   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-11 16:57:47,834-Speed 3209.25 samples/sec   Loss 1.3223   LearningRate 0.0248   Epoch: 10   Global Step: 167560   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-11 16:57:50,930-Speed 3308.73 samples/sec   Loss 1.2965   LearningRate 0.0248   Epoch: 10   Global Step: 167570   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-11 16:57:54,041-Speed 3291.41 samples/sec   Loss 1.2569   LearningRate 0.0248   Epoch: 10   Global Step: 167580   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-11 16:57:57,186-Speed 3256.50 samples/sec   Loss 1.3041   LearningRate 0.0248   Epoch: 10   Global Step: 167590   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-11 16:58:00,290-Speed 3300.63 samples/sec   Loss 1.3238   LearningRate 0.0248   Epoch: 10   Global Step: 167600   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-11 16:58:03,381-Speed 3313.85 samples/sec   Loss 1.2797   LearningRate 0.0248   Epoch: 10   Global Step: 167610   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-11 16:58:06,480-Speed 3305.36 samples/sec   Loss 1.3571   LearningRate 0.0248   Epoch: 10   Global Step: 167620   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-11 16:58:09,617-Speed 3264.83 samples/sec   Loss 1.2934   LearningRate 0.0248   Epoch: 10   Global Step: 167630   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-11 16:58:12,739-Speed 3281.08 samples/sec   Loss 1.3128   LearningRate 0.0248   Epoch: 10   Global Step: 167640   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-11 16:58:15,806-Speed 3339.47 samples/sec   Loss 1.3127   LearningRate 0.0248   Epoch: 10   Global Step: 167650   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-11 16:58:18,910-Speed 3299.64 samples/sec   Loss 1.2922   LearningRate 0.0248   Epoch: 10   Global Step: 167660   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-11 16:58:22,004-Speed 3310.01 samples/sec   Loss 1.3657   LearningRate 0.0248   Epoch: 10   Global Step: 167670   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-11 16:58:25,093-Speed 3315.51 samples/sec   Loss 1.3481   LearningRate 0.0248   Epoch: 10   Global Step: 167680   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-11 16:58:28,217-Speed 3279.41 samples/sec   Loss 1.3235   LearningRate 0.0248   Epoch: 10   Global Step: 167690   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-11 16:58:31,387-Speed 3231.26 samples/sec   Loss 1.3147   LearningRate 0.0248   Epoch: 10   Global Step: 167700   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-11 16:58:34,513-Speed 3276.31 samples/sec   Loss 1.3461   LearningRate 0.0248   Epoch: 10   Global Step: 167710   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-11 16:58:37,710-Speed 3203.66 samples/sec   Loss 1.3212   LearningRate 0.0248   Epoch: 10   Global Step: 167720   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-11 16:58:40,814-Speed 3299.76 samples/sec   Loss 1.2903   LearningRate 0.0248   Epoch: 10   Global Step: 167730   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-11 16:58:43,937-Speed 3279.58 samples/sec   Loss 1.2597   LearningRate 0.0248   Epoch: 10   Global Step: 167740   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-11 16:58:47,039-Speed 3302.01 samples/sec   Loss 1.2752   LearningRate 0.0247   Epoch: 10   Global Step: 167750   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:58:50,216-Speed 3223.72 samples/sec   Loss 1.3123   LearningRate 0.0247   Epoch: 10   Global Step: 167760   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:58:53,315-Speed 3304.27 samples/sec   Loss 1.3246   LearningRate 0.0247   Epoch: 10   Global Step: 167770   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:58:56,422-Speed 3296.55 samples/sec   Loss 1.3082   LearningRate 0.0247   Epoch: 10   Global Step: 167780   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:58:59,506-Speed 3322.04 samples/sec   Loss 1.3055   LearningRate 0.0247   Epoch: 10   Global Step: 167790   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:59:02,591-Speed 3320.19 samples/sec   Loss 1.3043   LearningRate 0.0247   Epoch: 10   Global Step: 167800   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:59:05,687-Speed 3308.08 samples/sec   Loss 1.3184   LearningRate 0.0247   Epoch: 10   Global Step: 167810   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:59:08,768-Speed 3323.29 samples/sec   Loss 1.2871   LearningRate 0.0247   Epoch: 10   Global Step: 167820   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:59:11,868-Speed 3304.36 samples/sec   Loss 1.3359   LearningRate 0.0247   Epoch: 10   Global Step: 167830   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-11 16:59:14,994-Speed 3277.23 samples/sec   Loss 1.3075   LearningRate 0.0247   Epoch: 10   Global Step: 167840   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-11 16:59:18,133-Speed 3262.39 samples/sec   Loss 1.3627   LearningRate 0.0247   Epoch: 10   Global Step: 167850   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-11 16:59:21,221-Speed 3316.82 samples/sec   Loss 1.3406   LearningRate 0.0247   Epoch: 10   Global Step: 167860   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-11 16:59:24,332-Speed 3292.45 samples/sec   Loss 1.3450   LearningRate 0.0247   Epoch: 10   Global Step: 167870   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-11 16:59:27,467-Speed 3267.47 samples/sec   Loss 1.3135   LearningRate 0.0247   Epoch: 10   Global Step: 167880   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-11 16:59:30,550-Speed 3321.88 samples/sec   Loss 1.3423   LearningRate 0.0247   Epoch: 10   Global Step: 167890   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-11 16:59:33,772-Speed 3179.39 samples/sec   Loss 1.2836   LearningRate 0.0247   Epoch: 10   Global Step: 167900   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-11 16:59:36,932-Speed 3240.15 samples/sec   Loss 1.3402   LearningRate 0.0247   Epoch: 10   Global Step: 167910   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-11 16:59:40,169-Speed 3164.92 samples/sec   Loss 1.3632   LearningRate 0.0247   Epoch: 10   Global Step: 167920   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-11 16:59:43,265-Speed 3307.58 samples/sec   Loss 1.3090   LearningRate 0.0247   Epoch: 10   Global Step: 167930   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:59:46,347-Speed 3323.94 samples/sec   Loss 1.3291   LearningRate 0.0247   Epoch: 10   Global Step: 167940   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:59:49,432-Speed 3319.21 samples/sec   Loss 1.3282   LearningRate 0.0247   Epoch: 10   Global Step: 167950   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:59:52,542-Speed 3293.53 samples/sec   Loss 1.3014   LearningRate 0.0247   Epoch: 10   Global Step: 167960   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:59:55,670-Speed 3274.19 samples/sec   Loss 1.3300   LearningRate 0.0247   Epoch: 10   Global Step: 167970   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 16:59:58,757-Speed 3318.56 samples/sec   Loss 1.3647   LearningRate 0.0247   Epoch: 10   Global Step: 167980   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 17:00:01,859-Speed 3301.27 samples/sec   Loss 1.3139   LearningRate 0.0247   Epoch: 10   Global Step: 167990   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 17:00:04,962-Speed 3300.78 samples/sec   Loss 1.3254   LearningRate 0.0247   Epoch: 10   Global Step: 168000   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 17:00:49,123-[lfw][168000]XNorm: 23.595225
Training: 2022-04-11 17:00:49,124-[lfw][168000]Accuracy-Flip: 0.99817+-0.00252
Training: 2022-04-11 17:00:49,124-[lfw][168000]Accuracy-Highest: 0.99817
Training: 2022-04-11 17:01:39,972-[cfp_fp][168000]XNorm: 23.025112
Training: 2022-04-11 17:01:39,973-[cfp_fp][168000]Accuracy-Flip: 0.98771+-0.00475
Training: 2022-04-11 17:01:39,973-[cfp_fp][168000]Accuracy-Highest: 0.98971
Training: 2022-04-11 17:02:24,015-[agedb_30][168000]XNorm: 23.996105
Training: 2022-04-11 17:02:24,016-[agedb_30][168000]Accuracy-Flip: 0.98450+-0.00654
Training: 2022-04-11 17:02:24,016-[agedb_30][168000]Accuracy-Highest: 0.98450
Training: 2022-04-11 17:02:27,169-Speed 72.01 samples/sec   Loss 1.3266   LearningRate 0.0247   Epoch: 10   Global Step: 168010   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 17:02:30,350-Speed 3219.70 samples/sec   Loss 1.3368   LearningRate 0.0247   Epoch: 10   Global Step: 168020   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 17:02:33,462-Speed 3291.26 samples/sec   Loss 1.2936   LearningRate 0.0247   Epoch: 10   Global Step: 168030   Fp16 Grad Scale: 262144   Required: 18 hours
Training: 2022-04-11 17:02:36,533-Speed 3335.07 samples/sec   Loss 1.3259   LearningRate 0.0247   Epoch: 10   Global Step: 168040   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 17:02:39,622-Speed 3315.12 samples/sec   Loss 1.3691   LearningRate 0.0247   Epoch: 10   Global Step: 168050   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 17:02:42,726-Speed 3299.50 samples/sec   Loss 1.2938   LearningRate 0.0247   Epoch: 10   Global Step: 168060   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 17:02:45,801-Speed 3331.32 samples/sec   Loss 1.3151   LearningRate 0.0247   Epoch: 10   Global Step: 168070   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 17:02:48,875-Speed 3331.95 samples/sec   Loss 1.3497   LearningRate 0.0247   Epoch: 10   Global Step: 168080   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 17:02:51,967-Speed 3313.18 samples/sec   Loss 1.3092   LearningRate 0.0246   Epoch: 10   Global Step: 168090   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 17:02:55,050-Speed 3321.61 samples/sec   Loss 1.3092   LearningRate 0.0246   Epoch: 10   Global Step: 168100   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 17:02:58,206-Speed 3245.20 samples/sec   Loss 1.3124   LearningRate 0.0246   Epoch: 10   Global Step: 168110   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 17:03:01,280-Speed 3332.37 samples/sec   Loss 1.2978   LearningRate 0.0246   Epoch: 10   Global Step: 168120   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 17:03:04,352-Speed 3333.61 samples/sec   Loss 1.3419   LearningRate 0.0246   Epoch: 10   Global Step: 168130   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 17:03:07,420-Speed 3338.75 samples/sec   Loss 1.3605   LearningRate 0.0246   Epoch: 10   Global Step: 168140   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 17:03:10,539-Speed 3284.06 samples/sec   Loss 1.3308   LearningRate 0.0246   Epoch: 10   Global Step: 168150   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 17:03:13,611-Speed 3334.23 samples/sec   Loss 1.3010   LearningRate 0.0246   Epoch: 10   Global Step: 168160   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 17:03:16,706-Speed 3309.09 samples/sec   Loss 1.3319   LearningRate 0.0246   Epoch: 10   Global Step: 168170   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-11 17:03:19,779-Speed 3333.38 samples/sec   Loss 1.3400   LearningRate 0.0246   Epoch: 10   Global Step: 168180   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-11 17:03:22,868-Speed 3315.82 samples/sec   Loss 1.3334   LearningRate 0.0246   Epoch: 10   Global Step: 168190   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-11 17:03:25,945-Speed 3328.89 samples/sec   Loss 1.2987   LearningRate 0.0246   Epoch: 10   Global Step: 168200   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-11 17:03:29,020-Speed 3330.61 samples/sec   Loss 1.3134   LearningRate 0.0246   Epoch: 10   Global Step: 168210   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-11 17:03:32,106-Speed 3319.31 samples/sec   Loss 1.3132   LearningRate 0.0246   Epoch: 10   Global Step: 168220   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-11 17:03:35,196-Speed 3314.28 samples/sec   Loss 1.3466   LearningRate 0.0246   Epoch: 10   Global Step: 168230   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-11 17:03:38,280-Speed 3320.92 samples/sec   Loss 1.3627   LearningRate 0.0246   Epoch: 10   Global Step: 168240   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-11 17:03:41,369-Speed 3315.90 samples/sec   Loss 1.3448   LearningRate 0.0246   Epoch: 10   Global Step: 168250   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-11 17:03:44,454-Speed 3319.81 samples/sec   Loss 1.3355   LearningRate 0.0246   Epoch: 10   Global Step: 168260   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-11 17:03:47,552-Speed 3306.48 samples/sec   Loss 1.2856   LearningRate 0.0246   Epoch: 10   Global Step: 168270   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 17:03:50,633-Speed 3324.20 samples/sec   Loss 1.3608   LearningRate 0.0246   Epoch: 10   Global Step: 168280   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 17:03:53,716-Speed 3322.29 samples/sec   Loss 1.3020   LearningRate 0.0246   Epoch: 10   Global Step: 168290   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 17:03:56,816-Speed 3303.88 samples/sec   Loss 1.3273   LearningRate 0.0246   Epoch: 10   Global Step: 168300   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 17:03:59,885-Speed 3337.83 samples/sec   Loss 1.2932   LearningRate 0.0246   Epoch: 10   Global Step: 168310   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 17:04:03,059-Speed 3226.54 samples/sec   Loss 1.3198   LearningRate 0.0246   Epoch: 10   Global Step: 168320   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 17:04:06,149-Speed 3314.89 samples/sec   Loss 1.3472   LearningRate 0.0246   Epoch: 10   Global Step: 168330   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 17:04:09,223-Speed 3331.33 samples/sec   Loss 1.3672   LearningRate 0.0246   Epoch: 10   Global Step: 168340   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 17:04:12,360-Speed 3265.28 samples/sec   Loss 1.3340   LearningRate 0.0246   Epoch: 10   Global Step: 168350   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 17:04:15,479-Speed 3284.59 samples/sec   Loss 1.3292   LearningRate 0.0246   Epoch: 10   Global Step: 168360   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 17:04:18,543-Speed 3342.75 samples/sec   Loss 1.3643   LearningRate 0.0246   Epoch: 10   Global Step: 168370   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 17:04:21,720-Speed 3224.30 samples/sec   Loss 1.3301   LearningRate 0.0246   Epoch: 10   Global Step: 168380   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 17:04:24,882-Speed 3238.47 samples/sec   Loss 1.4062   LearningRate 0.0246   Epoch: 10   Global Step: 168390   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 17:04:28,036-Speed 3247.68 samples/sec   Loss 1.3037   LearningRate 0.0246   Epoch: 10   Global Step: 168400   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 17:04:31,126-Speed 3314.22 samples/sec   Loss 1.3109   LearningRate 0.0246   Epoch: 10   Global Step: 168410   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-11 17:04:34,345-Speed 3181.74 samples/sec   Loss 1.3408   LearningRate 0.0245   Epoch: 10   Global Step: 168420   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-11 17:04:37,427-Speed 3323.58 samples/sec   Loss 1.3815   LearningRate 0.0245   Epoch: 10   Global Step: 168430   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-11 17:04:40,540-Speed 3290.39 samples/sec   Loss 1.3445   LearningRate 0.0245   Epoch: 10   Global Step: 168440   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:04:43,640-Speed 3303.52 samples/sec   Loss 1.3671   LearningRate 0.0245   Epoch: 10   Global Step: 168450   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:04:46,737-Speed 3307.07 samples/sec   Loss 1.3490   LearningRate 0.0245   Epoch: 10   Global Step: 168460   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:04:49,856-Speed 3284.59 samples/sec   Loss 1.3448   LearningRate 0.0245   Epoch: 10   Global Step: 168470   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:04:53,071-Speed 3185.11 samples/sec   Loss 1.3659   LearningRate 0.0245   Epoch: 10   Global Step: 168480   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:04:56,186-Speed 3287.88 samples/sec   Loss 1.3653   LearningRate 0.0245   Epoch: 10   Global Step: 168490   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:04:59,270-Speed 3321.92 samples/sec   Loss 1.3749   LearningRate 0.0245   Epoch: 10   Global Step: 168500   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:05:02,341-Speed 3334.34 samples/sec   Loss 1.3605   LearningRate 0.0245   Epoch: 10   Global Step: 168510   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:05:05,411-Speed 3336.45 samples/sec   Loss 1.3589   LearningRate 0.0245   Epoch: 10   Global Step: 168520   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-11 17:05:08,549-Speed 3264.46 samples/sec   Loss 1.3649   LearningRate 0.0245   Epoch: 10   Global Step: 168530   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-11 17:05:11,618-Speed 3337.53 samples/sec   Loss 1.3688   LearningRate 0.0245   Epoch: 10   Global Step: 168540   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-11 17:05:14,761-Speed 3258.91 samples/sec   Loss 1.4233   LearningRate 0.0245   Epoch: 10   Global Step: 168550   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-11 17:05:17,909-Speed 3253.01 samples/sec   Loss 1.3962   LearningRate 0.0245   Epoch: 10   Global Step: 168560   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-11 17:05:20,988-Speed 3326.56 samples/sec   Loss 1.3809   LearningRate 0.0245   Epoch: 10   Global Step: 168570   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-11 17:05:24,054-Speed 3340.69 samples/sec   Loss 1.3018   LearningRate 0.0245   Epoch: 10   Global Step: 168580   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-11 17:05:27,137-Speed 3322.21 samples/sec   Loss 1.3430   LearningRate 0.0245   Epoch: 10   Global Step: 168590   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-11 17:05:30,213-Speed 3329.36 samples/sec   Loss 1.3679   LearningRate 0.0245   Epoch: 10   Global Step: 168600   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-11 17:05:33,288-Speed 3331.67 samples/sec   Loss 1.3612   LearningRate 0.0245   Epoch: 10   Global Step: 168610   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-11 17:05:36,396-Speed 3294.98 samples/sec   Loss 1.3738   LearningRate 0.0245   Epoch: 10   Global Step: 168620   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-11 17:05:39,513-Speed 3286.65 samples/sec   Loss 1.3510   LearningRate 0.0245   Epoch: 10   Global Step: 168630   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-11 17:05:42,647-Speed 3267.53 samples/sec   Loss 1.3733   LearningRate 0.0245   Epoch: 10   Global Step: 168640   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-11 17:05:45,733-Speed 3319.30 samples/sec   Loss 1.3904   LearningRate 0.0245   Epoch: 10   Global Step: 168650   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-11 17:05:48,802-Speed 3337.51 samples/sec   Loss 1.3472   LearningRate 0.0245   Epoch: 10   Global Step: 168660   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:05:51,878-Speed 3329.80 samples/sec   Loss 1.3594   LearningRate 0.0245   Epoch: 10   Global Step: 168670   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:05:54,969-Speed 3313.59 samples/sec   Loss 1.3664   LearningRate 0.0245   Epoch: 10   Global Step: 168680   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:05:58,040-Speed 3334.74 samples/sec   Loss 1.4369   LearningRate 0.0245   Epoch: 10   Global Step: 168690   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:06:01,110-Speed 3336.23 samples/sec   Loss 1.4143   LearningRate 0.0245   Epoch: 10   Global Step: 168700   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:06:04,207-Speed 3307.95 samples/sec   Loss 1.4048   LearningRate 0.0245   Epoch: 10   Global Step: 168710   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:06:07,280-Speed 3333.07 samples/sec   Loss 1.3585   LearningRate 0.0245   Epoch: 10   Global Step: 168720   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:06:10,396-Speed 3286.91 samples/sec   Loss 1.3545   LearningRate 0.0245   Epoch: 10   Global Step: 168730   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:06:13,510-Speed 3288.26 samples/sec   Loss 1.3693   LearningRate 0.0245   Epoch: 10   Global Step: 168740   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:06:16,691-Speed 3220.10 samples/sec   Loss 1.3759   LearningRate 0.0245   Epoch: 10   Global Step: 168750   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:06:19,839-Speed 3253.22 samples/sec   Loss 1.4400   LearningRate 0.0244   Epoch: 10   Global Step: 168760   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-11 17:06:22,949-Speed 3293.73 samples/sec   Loss 1.3834   LearningRate 0.0244   Epoch: 10   Global Step: 168770   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-11 17:06:26,024-Speed 3331.67 samples/sec   Loss 1.3520   LearningRate 0.0244   Epoch: 10   Global Step: 168780   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-11 17:06:29,121-Speed 3307.26 samples/sec   Loss 1.3456   LearningRate 0.0244   Epoch: 10   Global Step: 168790   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-11 17:06:32,210-Speed 3315.73 samples/sec   Loss 1.4077   LearningRate 0.0244   Epoch: 10   Global Step: 168800   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-11 17:06:35,313-Speed 3300.31 samples/sec   Loss 1.3790   LearningRate 0.0244   Epoch: 10   Global Step: 168810   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-11 17:06:38,490-Speed 3223.80 samples/sec   Loss 1.3807   LearningRate 0.0244   Epoch: 10   Global Step: 168820   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:06:41,638-Speed 3253.84 samples/sec   Loss 1.3407   LearningRate 0.0244   Epoch: 10   Global Step: 168830   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-11 17:06:44,730-Speed 3312.10 samples/sec   Loss 1.3751   LearningRate 0.0244   Epoch: 10   Global Step: 168840   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-11 17:06:47,805-Speed 3331.22 samples/sec   Loss 1.3664   LearningRate 0.0244   Epoch: 10   Global Step: 168850   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-11 17:06:50,874-Speed 3336.71 samples/sec   Loss 1.3641   LearningRate 0.0244   Epoch: 10   Global Step: 168860   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-11 17:06:53,945-Speed 3336.06 samples/sec   Loss 1.3523   LearningRate 0.0244   Epoch: 10   Global Step: 168870   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-11 17:06:57,034-Speed 3316.09 samples/sec   Loss 1.3939   LearningRate 0.0244   Epoch: 10   Global Step: 168880   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-11 17:07:00,129-Speed 3308.56 samples/sec   Loss 1.4055   LearningRate 0.0244   Epoch: 10   Global Step: 168890   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-11 17:07:03,202-Speed 3333.16 samples/sec   Loss 1.3710   LearningRate 0.0244   Epoch: 10   Global Step: 168900   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-11 17:07:06,272-Speed 3337.00 samples/sec   Loss 1.3498   LearningRate 0.0244   Epoch: 10   Global Step: 168910   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-11 17:07:09,454-Speed 3218.35 samples/sec   Loss 1.3889   LearningRate 0.0244   Epoch: 10   Global Step: 168920   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-11 17:07:12,552-Speed 3307.03 samples/sec   Loss 1.4148   LearningRate 0.0244   Epoch: 10   Global Step: 168930   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:07:15,647-Speed 3308.32 samples/sec   Loss 1.4167   LearningRate 0.0244   Epoch: 10   Global Step: 168940   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:07:18,719-Speed 3334.08 samples/sec   Loss 1.3680   LearningRate 0.0244   Epoch: 10   Global Step: 168950   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:07:21,847-Speed 3274.64 samples/sec   Loss 1.3636   LearningRate 0.0244   Epoch: 10   Global Step: 168960   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:07:24,937-Speed 3315.98 samples/sec   Loss 1.3517   LearningRate 0.0244   Epoch: 10   Global Step: 168970   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:07:28,010-Speed 3331.99 samples/sec   Loss 1.3911   LearningRate 0.0244   Epoch: 10   Global Step: 168980   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:07:31,078-Speed 3338.78 samples/sec   Loss 1.3878   LearningRate 0.0244   Epoch: 10   Global Step: 168990   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:07:34,196-Speed 3285.03 samples/sec   Loss 1.4052   LearningRate 0.0244   Epoch: 10   Global Step: 169000   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:07:37,277-Speed 3324.53 samples/sec   Loss 1.3899   LearningRate 0.0244   Epoch: 10   Global Step: 169010   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:07:40,357-Speed 3324.88 samples/sec   Loss 1.3480   LearningRate 0.0244   Epoch: 10   Global Step: 169020   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:07:43,444-Speed 3318.25 samples/sec   Loss 1.4338   LearningRate 0.0244   Epoch: 10   Global Step: 169030   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-11 17:07:46,520-Speed 3330.33 samples/sec   Loss 1.3605   LearningRate 0.0244   Epoch: 10   Global Step: 169040   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-11 17:07:49,589-Speed 3337.44 samples/sec   Loss 1.3745   LearningRate 0.0244   Epoch: 10   Global Step: 169050   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-11 17:07:52,650-Speed 3345.52 samples/sec   Loss 1.3356   LearningRate 0.0244   Epoch: 10   Global Step: 169060   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:07:55,719-Speed 3337.26 samples/sec   Loss 1.4132   LearningRate 0.0244   Epoch: 10   Global Step: 169070   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:07:58,819-Speed 3304.51 samples/sec   Loss 1.4004   LearningRate 0.0244   Epoch: 10   Global Step: 169080   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:08:01,888-Speed 3336.64 samples/sec   Loss 1.3951   LearningRate 0.0244   Epoch: 10   Global Step: 169090   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:08:05,001-Speed 3290.10 samples/sec   Loss 1.4142   LearningRate 0.0243   Epoch: 10   Global Step: 169100   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:08:08,095-Speed 3310.83 samples/sec   Loss 1.4413   LearningRate 0.0243   Epoch: 10   Global Step: 169110   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:08:11,168-Speed 3332.96 samples/sec   Loss 1.4797   LearningRate 0.0243   Epoch: 10   Global Step: 169120   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:08:14,295-Speed 3276.11 samples/sec   Loss 1.4246   LearningRate 0.0243   Epoch: 10   Global Step: 169130   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:08:17,380-Speed 3319.96 samples/sec   Loss 1.4033   LearningRate 0.0243   Epoch: 10   Global Step: 169140   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:08:20,470-Speed 3315.23 samples/sec   Loss 1.4408   LearningRate 0.0243   Epoch: 10   Global Step: 169150   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:08:23,532-Speed 3344.86 samples/sec   Loss 1.3881   LearningRate 0.0243   Epoch: 10   Global Step: 169160   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:08:26,635-Speed 3299.94 samples/sec   Loss 1.3406   LearningRate 0.0243   Epoch: 10   Global Step: 169170   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:08:29,706-Speed 3336.08 samples/sec   Loss 1.4035   LearningRate 0.0243   Epoch: 10   Global Step: 169180   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:08:32,791-Speed 3319.66 samples/sec   Loss 1.4169   LearningRate 0.0243   Epoch: 10   Global Step: 169190   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:08:35,880-Speed 3316.20 samples/sec   Loss 1.3784   LearningRate 0.0243   Epoch: 10   Global Step: 169200   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:08:38,962-Speed 3323.41 samples/sec   Loss 1.4049   LearningRate 0.0243   Epoch: 10   Global Step: 169210   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:08:42,053-Speed 3313.54 samples/sec   Loss 1.3780   LearningRate 0.0243   Epoch: 10   Global Step: 169220   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:08:45,135-Speed 3323.49 samples/sec   Loss 1.3688   LearningRate 0.0243   Epoch: 10   Global Step: 169230   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:08:48,211-Speed 3329.49 samples/sec   Loss 1.4216   LearningRate 0.0243   Epoch: 10   Global Step: 169240   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:08:51,291-Speed 3325.42 samples/sec   Loss 1.4290   LearningRate 0.0243   Epoch: 10   Global Step: 169250   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:08:54,388-Speed 3308.00 samples/sec   Loss 1.3770   LearningRate 0.0243   Epoch: 10   Global Step: 169260   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-11 17:08:57,483-Speed 3308.58 samples/sec   Loss 1.3833   LearningRate 0.0243   Epoch: 10   Global Step: 169270   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-11 17:09:00,543-Speed 3347.05 samples/sec   Loss 1.3783   LearningRate 0.0243   Epoch: 10   Global Step: 169280   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-11 17:09:03,621-Speed 3327.39 samples/sec   Loss 1.3993   LearningRate 0.0243   Epoch: 10   Global Step: 169290   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-11 17:09:06,696-Speed 3330.99 samples/sec   Loss 1.3587   LearningRate 0.0243   Epoch: 10   Global Step: 169300   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-11 17:09:09,794-Speed 3305.99 samples/sec   Loss 1.3756   LearningRate 0.0243   Epoch: 10   Global Step: 169310   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-11 17:09:12,903-Speed 3294.87 samples/sec   Loss 1.3740   LearningRate 0.0243   Epoch: 10   Global Step: 169320   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-11 17:09:16,132-Speed 3172.83 samples/sec   Loss 1.4148   LearningRate 0.0243   Epoch: 10   Global Step: 169330   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-11 17:09:19,225-Speed 3310.62 samples/sec   Loss 1.4043   LearningRate 0.0243   Epoch: 10   Global Step: 169340   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-11 17:09:22,304-Speed 3326.17 samples/sec   Loss 1.4514   LearningRate 0.0243   Epoch: 10   Global Step: 169350   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-11 17:09:25,377-Speed 3333.24 samples/sec   Loss 1.4157   LearningRate 0.0243   Epoch: 10   Global Step: 169360   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-11 17:09:28,456-Speed 3326.48 samples/sec   Loss 1.4337   LearningRate 0.0243   Epoch: 10   Global Step: 169370   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-11 17:09:31,569-Speed 3290.67 samples/sec   Loss 1.4083   LearningRate 0.0243   Epoch: 10   Global Step: 169380   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:09:34,649-Speed 3325.43 samples/sec   Loss 1.4060   LearningRate 0.0243   Epoch: 10   Global Step: 169390   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:09:37,734-Speed 3320.32 samples/sec   Loss 1.3826   LearningRate 0.0243   Epoch: 10   Global Step: 169400   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:09:40,806-Speed 3334.31 samples/sec   Loss 1.3812   LearningRate 0.0243   Epoch: 10   Global Step: 169410   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:09:43,879-Speed 3332.52 samples/sec   Loss 1.3810   LearningRate 0.0243   Epoch: 10   Global Step: 169420   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:09:46,957-Speed 3327.92 samples/sec   Loss 1.4265   LearningRate 0.0243   Epoch: 10   Global Step: 169430   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:09:50,032-Speed 3330.72 samples/sec   Loss 1.4190   LearningRate 0.0242   Epoch: 10   Global Step: 169440   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:09:53,106-Speed 3331.81 samples/sec   Loss 1.4077   LearningRate 0.0242   Epoch: 10   Global Step: 169450   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:09:56,184-Speed 3327.94 samples/sec   Loss 1.3757   LearningRate 0.0242   Epoch: 10   Global Step: 169460   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:09:59,260-Speed 3328.84 samples/sec   Loss 1.3994   LearningRate 0.0242   Epoch: 10   Global Step: 169470   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:10:02,340-Speed 3325.15 samples/sec   Loss 1.3963   LearningRate 0.0242   Epoch: 10   Global Step: 169480   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-11 17:10:05,420-Speed 3326.34 samples/sec   Loss 1.3835   LearningRate 0.0242   Epoch: 10   Global Step: 169490   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:10:08,493-Speed 3333.31 samples/sec   Loss 1.4298   LearningRate 0.0242   Epoch: 10   Global Step: 169500   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:10:11,564-Speed 3334.79 samples/sec   Loss 1.3950   LearningRate 0.0242   Epoch: 10   Global Step: 169510   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:10:14,643-Speed 3327.12 samples/sec   Loss 1.3748   LearningRate 0.0242   Epoch: 10   Global Step: 169520   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:10:17,732-Speed 3315.17 samples/sec   Loss 1.4032   LearningRate 0.0242   Epoch: 10   Global Step: 169530   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:10:20,863-Speed 3271.42 samples/sec   Loss 1.4220   LearningRate 0.0242   Epoch: 10   Global Step: 169540   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:10:23,970-Speed 3296.57 samples/sec   Loss 1.3828   LearningRate 0.0242   Epoch: 10   Global Step: 169550   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:10:27,055-Speed 3319.25 samples/sec   Loss 1.3600   LearningRate 0.0242   Epoch: 10   Global Step: 169560   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:10:30,176-Speed 3282.50 samples/sec   Loss 1.3518   LearningRate 0.0242   Epoch: 10   Global Step: 169570   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:10:33,247-Speed 3334.81 samples/sec   Loss 1.3816   LearningRate 0.0242   Epoch: 10   Global Step: 169580   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:10:36,414-Speed 3234.16 samples/sec   Loss 1.3879   LearningRate 0.0242   Epoch: 10   Global Step: 169590   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-11 17:10:39,489-Speed 3330.82 samples/sec   Loss 1.4456   LearningRate 0.0242   Epoch: 10   Global Step: 169600   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-11 17:10:42,618-Speed 3273.71 samples/sec   Loss 1.4354   LearningRate 0.0242   Epoch: 10   Global Step: 169610   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-11 17:10:45,709-Speed 3313.86 samples/sec   Loss 1.4465   LearningRate 0.0242   Epoch: 10   Global Step: 169620   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-11 17:10:48,893-Speed 3216.19 samples/sec   Loss 1.3950   LearningRate 0.0242   Epoch: 10   Global Step: 169630   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-11 17:10:52,053-Speed 3241.87 samples/sec   Loss 1.3834   LearningRate 0.0242   Epoch: 10   Global Step: 169640   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-11 17:10:55,217-Speed 3236.58 samples/sec   Loss 1.4158   LearningRate 0.0242   Epoch: 10   Global Step: 169650   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-11 17:10:58,368-Speed 3251.10 samples/sec   Loss 1.4196   LearningRate 0.0242   Epoch: 10   Global Step: 169660   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-11 17:11:01,461-Speed 3311.58 samples/sec   Loss 1.4173   LearningRate 0.0242   Epoch: 10   Global Step: 169670   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:11:04,557-Speed 3307.94 samples/sec   Loss 1.3546   LearningRate 0.0242   Epoch: 10   Global Step: 169680   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:11:07,646-Speed 3316.50 samples/sec   Loss 1.4703   LearningRate 0.0242   Epoch: 10   Global Step: 169690   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:11:10,725-Speed 3325.91 samples/sec   Loss 1.4319   LearningRate 0.0242   Epoch: 10   Global Step: 169700   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:11:13,802-Speed 3329.06 samples/sec   Loss 1.4220   LearningRate 0.0242   Epoch: 10   Global Step: 169710   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:11:16,899-Speed 3306.96 samples/sec   Loss 1.4154   LearningRate 0.0242   Epoch: 10   Global Step: 169720   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:11:19,981-Speed 3322.94 samples/sec   Loss 1.4294   LearningRate 0.0242   Epoch: 10   Global Step: 169730   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:11:23,082-Speed 3303.11 samples/sec   Loss 1.3816   LearningRate 0.0242   Epoch: 10   Global Step: 169740   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:11:26,157-Speed 3330.64 samples/sec   Loss 1.4022   LearningRate 0.0242   Epoch: 10   Global Step: 169750   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:11:29,265-Speed 3296.04 samples/sec   Loss 1.4210   LearningRate 0.0242   Epoch: 10   Global Step: 169760   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:11:32,353-Speed 3316.26 samples/sec   Loss 1.3906   LearningRate 0.0242   Epoch: 10   Global Step: 169770   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-11 17:11:35,545-Speed 3209.02 samples/sec   Loss 1.3879   LearningRate 0.0241   Epoch: 10   Global Step: 169780   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-11 17:11:38,638-Speed 3311.13 samples/sec   Loss 1.4039   LearningRate 0.0241   Epoch: 10   Global Step: 169790   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-11 17:11:41,728-Speed 3315.10 samples/sec   Loss 1.4326   LearningRate 0.0241   Epoch: 10   Global Step: 169800   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-11 17:11:44,873-Speed 3256.55 samples/sec   Loss 1.4542   LearningRate 0.0241   Epoch: 10   Global Step: 169810   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-11 17:11:48,083-Speed 3190.72 samples/sec   Loss 1.4414   LearningRate 0.0241   Epoch: 10   Global Step: 169820   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-11 17:11:51,276-Speed 3208.27 samples/sec   Loss 1.4532   LearningRate 0.0241   Epoch: 10   Global Step: 169830   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-11 17:11:54,425-Speed 3252.39 samples/sec   Loss 1.4391   LearningRate 0.0241   Epoch: 10   Global Step: 169840   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-11 17:11:57,515-Speed 3314.85 samples/sec   Loss 1.3884   LearningRate 0.0241   Epoch: 10   Global Step: 169850   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-11 17:12:00,594-Speed 3326.14 samples/sec   Loss 1.4581   LearningRate 0.0241   Epoch: 10   Global Step: 169860   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-11 17:12:03,663-Speed 3338.09 samples/sec   Loss 1.4625   LearningRate 0.0241   Epoch: 10   Global Step: 169870   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-11 17:12:06,775-Speed 3290.54 samples/sec   Loss 1.4025   LearningRate 0.0241   Epoch: 10   Global Step: 169880   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-11 17:12:09,857-Speed 3323.72 samples/sec   Loss 1.4941   LearningRate 0.0241   Epoch: 10   Global Step: 169890   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-11 17:12:14,987-Speed 1996.27 samples/sec   Loss 1.4560   LearningRate 0.0241   Epoch: 10   Global Step: 169900   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-11 17:12:22,018-Speed 1456.58 samples/sec   Loss 1.4504   LearningRate 0.0241   Epoch: 10   Global Step: 169910   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-11 17:12:25,201-Speed 3218.26 samples/sec   Loss 1.4234   LearningRate 0.0241   Epoch: 10   Global Step: 169920   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:12:28,288-Speed 3317.36 samples/sec   Loss 1.4233   LearningRate 0.0241   Epoch: 10   Global Step: 169930   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:12:31,370-Speed 3324.12 samples/sec   Loss 1.4321   LearningRate 0.0241   Epoch: 10   Global Step: 169940   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:12:34,532-Speed 3238.75 samples/sec   Loss 1.4069   LearningRate 0.0241   Epoch: 10   Global Step: 169950   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:12:37,699-Speed 3233.63 samples/sec   Loss 1.4478   LearningRate 0.0241   Epoch: 10   Global Step: 169960   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:12:40,782-Speed 3322.48 samples/sec   Loss 1.4137   LearningRate 0.0241   Epoch: 10   Global Step: 169970   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:12:43,874-Speed 3312.99 samples/sec   Loss 1.4407   LearningRate 0.0241   Epoch: 10   Global Step: 169980   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:12:46,953-Speed 3326.18 samples/sec   Loss 1.3830   LearningRate 0.0241   Epoch: 10   Global Step: 169990   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:12:50,034-Speed 3324.83 samples/sec   Loss 1.4325   LearningRate 0.0241   Epoch: 10   Global Step: 170000   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:13:33,818-[lfw][170000]XNorm: 21.802826
Training: 2022-04-11 17:13:33,818-[lfw][170000]Accuracy-Flip: 0.99767+-0.00300
Training: 2022-04-11 17:13:33,819-[lfw][170000]Accuracy-Highest: 0.99817
Training: 2022-04-11 17:14:24,252-[cfp_fp][170000]XNorm: 21.593737
Training: 2022-04-11 17:14:24,253-[cfp_fp][170000]Accuracy-Flip: 0.98843+-0.00488
Training: 2022-04-11 17:14:24,253-[cfp_fp][170000]Accuracy-Highest: 0.98971
Training: 2022-04-11 17:15:07,644-[agedb_30][170000]XNorm: 22.317822
Training: 2022-04-11 17:15:07,645-[agedb_30][170000]Accuracy-Flip: 0.98283+-0.00610
Training: 2022-04-11 17:15:07,645-[agedb_30][170000]Accuracy-Highest: 0.98450
Training: 2022-04-11 17:15:10,733-Speed 72.78 samples/sec   Loss 1.4524   LearningRate 0.0241   Epoch: 10   Global Step: 170010   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:15:13,797-Speed 3342.53 samples/sec   Loss 1.3830   LearningRate 0.0241   Epoch: 10   Global Step: 170020   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-11 17:15:16,877-Speed 3325.75 samples/sec   Loss 1.4455   LearningRate 0.0241   Epoch: 10   Global Step: 170030   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-11 17:15:19,940-Speed 3343.59 samples/sec   Loss 1.4704   LearningRate 0.0241   Epoch: 10   Global Step: 170040   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-11 17:15:23,005-Speed 3341.70 samples/sec   Loss 1.4146   LearningRate 0.0241   Epoch: 10   Global Step: 170050   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-11 17:15:26,106-Speed 3303.64 samples/sec   Loss 1.4536   LearningRate 0.0241   Epoch: 10   Global Step: 170060   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-11 17:15:29,176-Speed 3336.00 samples/sec   Loss 1.4548   LearningRate 0.0241   Epoch: 10   Global Step: 170070   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-11 17:15:32,263-Speed 3317.97 samples/sec   Loss 1.4422   LearningRate 0.0241   Epoch: 10   Global Step: 170080   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-11 17:15:35,370-Speed 3296.21 samples/sec   Loss 1.4292   LearningRate 0.0241   Epoch: 10   Global Step: 170090   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-11 17:15:38,448-Speed 3328.41 samples/sec   Loss 1.4279   LearningRate 0.0241   Epoch: 10   Global Step: 170100   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-11 17:15:41,527-Speed 3326.02 samples/sec   Loss 1.4047   LearningRate 0.0241   Epoch: 10   Global Step: 170110   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-11 17:15:44,599-Speed 3334.00 samples/sec   Loss 1.4312   LearningRate 0.0240   Epoch: 10   Global Step: 170120   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:15:47,793-Speed 3206.91 samples/sec   Loss 1.3903   LearningRate 0.0240   Epoch: 10   Global Step: 170130   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:15:50,881-Speed 3317.24 samples/sec   Loss 1.3886   LearningRate 0.0240   Epoch: 10   Global Step: 170140   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:15:53,985-Speed 3299.52 samples/sec   Loss 1.4639   LearningRate 0.0240   Epoch: 10   Global Step: 170150   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:15:57,157-Speed 3228.23 samples/sec   Loss 1.4769   LearningRate 0.0240   Epoch: 10   Global Step: 170160   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:16:00,260-Speed 3300.96 samples/sec   Loss 1.3796   LearningRate 0.0240   Epoch: 10   Global Step: 170170   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:16:03,331-Speed 3335.95 samples/sec   Loss 1.3919   LearningRate 0.0240   Epoch: 10   Global Step: 170180   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:16:06,448-Speed 3285.60 samples/sec   Loss 1.4395   LearningRate 0.0240   Epoch: 10   Global Step: 170190   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:16:09,549-Speed 3302.75 samples/sec   Loss 1.4476   LearningRate 0.0240   Epoch: 10   Global Step: 170200   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:16:12,639-Speed 3315.31 samples/sec   Loss 1.4143   LearningRate 0.0240   Epoch: 10   Global Step: 170210   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:16:15,717-Speed 3327.05 samples/sec   Loss 1.4670   LearningRate 0.0240   Epoch: 10   Global Step: 170220   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-11 17:16:18,786-Speed 3336.99 samples/sec   Loss 1.4319   LearningRate 0.0240   Epoch: 10   Global Step: 170230   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-11 17:16:22,009-Speed 3313.49 samples/sec   Loss 1.3873   LearningRate 0.0240   Epoch: 10   Global Step: 170240   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-11 17:16:25,093-Speed 3321.22 samples/sec   Loss 1.3765   LearningRate 0.0240   Epoch: 10   Global Step: 170250   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-11 17:16:28,165-Speed 3334.39 samples/sec   Loss 1.4184   LearningRate 0.0240   Epoch: 10   Global Step: 170260   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-11 17:16:31,235-Speed 3337.13 samples/sec   Loss 1.3374   LearningRate 0.0240   Epoch: 10   Global Step: 170270   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-11 17:16:34,306-Speed 3335.07 samples/sec   Loss 1.4700   LearningRate 0.0240   Epoch: 10   Global Step: 170280   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-11 17:16:37,381-Speed 3331.74 samples/sec   Loss 1.4338   LearningRate 0.0240   Epoch: 10   Global Step: 170290   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-11 17:16:40,481-Speed 3303.40 samples/sec   Loss 1.3737   LearningRate 0.0240   Epoch: 10   Global Step: 170300   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:16:43,558-Speed 3328.74 samples/sec   Loss 1.4075   LearningRate 0.0240   Epoch: 10   Global Step: 170310   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:16:46,625-Speed 3339.07 samples/sec   Loss 1.3937   LearningRate 0.0240   Epoch: 10   Global Step: 170320   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:16:49,704-Speed 3327.86 samples/sec   Loss 1.4493   LearningRate 0.0240   Epoch: 10   Global Step: 170330   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:16:52,776-Speed 3333.96 samples/sec   Loss 1.4136   LearningRate 0.0240   Epoch: 10   Global Step: 170340   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:16:55,915-Speed 3262.94 samples/sec   Loss 1.4959   LearningRate 0.0240   Epoch: 10   Global Step: 170350   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:16:59,075-Speed 3241.40 samples/sec   Loss 1.4641   LearningRate 0.0240   Epoch: 10   Global Step: 170360   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:17:02,142-Speed 3339.43 samples/sec   Loss 1.4602   LearningRate 0.0240   Epoch: 10   Global Step: 170370   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:17:05,214-Speed 3334.15 samples/sec   Loss 1.4407   LearningRate 0.0240   Epoch: 10   Global Step: 170380   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:17:08,352-Speed 3265.58 samples/sec   Loss 1.5036   LearningRate 0.0240   Epoch: 10   Global Step: 170390   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:17:11,504-Speed 3249.10 samples/sec   Loss 1.4301   LearningRate 0.0240   Epoch: 10   Global Step: 170400   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-11 17:17:14,652-Speed 3254.05 samples/sec   Loss 1.4463   LearningRate 0.0240   Epoch: 10   Global Step: 170410   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-11 17:17:17,732-Speed 3324.96 samples/sec   Loss 1.4553   LearningRate 0.0240   Epoch: 10   Global Step: 170420   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-11 17:17:20,811-Speed 3327.87 samples/sec   Loss 1.3961   LearningRate 0.0240   Epoch: 10   Global Step: 170430   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-11 17:17:23,879-Speed 3338.14 samples/sec   Loss 1.4544   LearningRate 0.0240   Epoch: 10   Global Step: 170440   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-11 17:17:26,960-Speed 3324.16 samples/sec   Loss 1.4370   LearningRate 0.0240   Epoch: 10   Global Step: 170450   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-11 17:17:30,027-Speed 3339.82 samples/sec   Loss 1.4244   LearningRate 0.0239   Epoch: 10   Global Step: 170460   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-11 17:17:33,183-Speed 3245.26 samples/sec   Loss 1.3800   LearningRate 0.0239   Epoch: 10   Global Step: 170470   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-11 17:17:36,370-Speed 3213.66 samples/sec   Loss 1.4566   LearningRate 0.0239   Epoch: 10   Global Step: 170480   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-11 17:17:39,449-Speed 3326.38 samples/sec   Loss 1.4617   LearningRate 0.0239   Epoch: 10   Global Step: 170490   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-11 17:17:42,522-Speed 3332.92 samples/sec   Loss 1.4286   LearningRate 0.0239   Epoch: 10   Global Step: 170500   Fp16 Grad Scale: 262144   Required: 17 hours
Training: 2022-04-11 17:17:45,595-Speed 3333.38 samples/sec   Loss 1.4806   LearningRate 0.0239   Epoch: 10   Global Step: 170510   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-11 17:17:48,662-Speed 3339.15 samples/sec   Loss 1.4863   LearningRate 0.0239   Epoch: 10   Global Step: 170520   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-11 17:17:51,765-Speed 3301.01 samples/sec   Loss 1.4504   LearningRate 0.0239   Epoch: 10   Global Step: 170530   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-11 17:17:54,853-Speed 3316.63 samples/sec   Loss 1.4360   LearningRate 0.0239   Epoch: 10   Global Step: 170540   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-11 17:17:57,924-Speed 3335.45 samples/sec   Loss 1.4011   LearningRate 0.0239   Epoch: 10   Global Step: 170550   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-11 17:18:00,990-Speed 3340.35 samples/sec   Loss 1.4398   LearningRate 0.0239   Epoch: 10   Global Step: 170560   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-11 17:18:04,088-Speed 3306.55 samples/sec   Loss 1.4322   LearningRate 0.0239   Epoch: 10   Global Step: 170570   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:18:07,315-Speed 3174.18 samples/sec   Loss 1.4281   LearningRate 0.0239   Epoch: 10   Global Step: 170580   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:18:10,421-Speed 3296.84 samples/sec   Loss 1.4931   LearningRate 0.0239   Epoch: 10   Global Step: 170590   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:18:13,531-Speed 3293.66 samples/sec   Loss 1.4339   LearningRate 0.0239   Epoch: 10   Global Step: 170600   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:18:16,626-Speed 3309.63 samples/sec   Loss 1.4558   LearningRate 0.0239   Epoch: 10   Global Step: 170610   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:18:19,734-Speed 3295.43 samples/sec   Loss 1.4937   LearningRate 0.0239   Epoch: 10   Global Step: 170620   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:18:22,802-Speed 3338.00 samples/sec   Loss 1.4513   LearningRate 0.0239   Epoch: 10   Global Step: 170630   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:18:25,871-Speed 3339.40 samples/sec   Loss 1.4163   LearningRate 0.0239   Epoch: 10   Global Step: 170640   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:18:28,962-Speed 3313.52 samples/sec   Loss 1.4903   LearningRate 0.0239   Epoch: 10   Global Step: 170650   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:18:32,054-Speed 3312.15 samples/sec   Loss 1.4448   LearningRate 0.0239   Epoch: 10   Global Step: 170660   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-11 17:18:35,137-Speed 3322.30 samples/sec   Loss 1.5242   LearningRate 0.0239   Epoch: 10   Global Step: 170670   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-11 17:18:38,212-Speed 3331.19 samples/sec   Loss 1.4618   LearningRate 0.0239   Epoch: 10   Global Step: 170680   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-11 17:18:41,351-Speed 3262.58 samples/sec   Loss 1.5648   LearningRate 0.0239   Epoch: 10   Global Step: 170690   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-11 17:18:44,585-Speed 3167.47 samples/sec   Loss 1.4587   LearningRate 0.0239   Epoch: 10   Global Step: 170700   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-11 17:18:47,702-Speed 3285.27 samples/sec   Loss 1.4586   LearningRate 0.0239   Epoch: 10   Global Step: 170710   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-11 17:18:50,775-Speed 3333.71 samples/sec   Loss 1.4629   LearningRate 0.0239   Epoch: 10   Global Step: 170720   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-11 17:18:53,878-Speed 3300.74 samples/sec   Loss 1.4373   LearningRate 0.0239   Epoch: 10   Global Step: 170730   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-11 17:18:56,975-Speed 3307.54 samples/sec   Loss 1.4805   LearningRate 0.0239   Epoch: 10   Global Step: 170740   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-11 17:19:00,200-Speed 3175.79 samples/sec   Loss 1.4241   LearningRate 0.0239   Epoch: 10   Global Step: 170750   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-11 17:19:03,298-Speed 3306.31 samples/sec   Loss 1.4470   LearningRate 0.0239   Epoch: 10   Global Step: 170760   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:19:06,377-Speed 3326.25 samples/sec   Loss 1.4163   LearningRate 0.0239   Epoch: 10   Global Step: 170770   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:19:09,571-Speed 3206.82 samples/sec   Loss 1.4553   LearningRate 0.0239   Epoch: 10   Global Step: 170780   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:19:12,671-Speed 3303.99 samples/sec   Loss 1.4300   LearningRate 0.0239   Epoch: 10   Global Step: 170790   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:19:15,756-Speed 3319.91 samples/sec   Loss 1.4169   LearningRate 0.0238   Epoch: 10   Global Step: 170800   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:19:18,835-Speed 3326.48 samples/sec   Loss 1.4763   LearningRate 0.0238   Epoch: 10   Global Step: 170810   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:19:21,976-Speed 3261.24 samples/sec   Loss 1.4126   LearningRate 0.0238   Epoch: 10   Global Step: 170820   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:19:25,174-Speed 3202.48 samples/sec   Loss 1.4060   LearningRate 0.0238   Epoch: 10   Global Step: 170830   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:19:28,245-Speed 3336.14 samples/sec   Loss 1.4702   LearningRate 0.0238   Epoch: 10   Global Step: 170840   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:19:31,320-Speed 3330.67 samples/sec   Loss 1.4221   LearningRate 0.0238   Epoch: 10   Global Step: 170850   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:19:34,393-Speed 3333.15 samples/sec   Loss 1.4442   LearningRate 0.0238   Epoch: 10   Global Step: 170860   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-11 17:19:37,518-Speed 3277.55 samples/sec   Loss 1.4312   LearningRate 0.0238   Epoch: 10   Global Step: 170870   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-11 17:19:40,689-Speed 3229.21 samples/sec   Loss 1.4351   LearningRate 0.0238   Epoch: 10   Global Step: 170880   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-11 17:19:43,834-Speed 3257.05 samples/sec   Loss 1.4327   LearningRate 0.0238   Epoch: 10   Global Step: 170890   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-11 17:19:46,904-Speed 3336.33 samples/sec   Loss 1.4806   LearningRate 0.0238   Epoch: 10   Global Step: 170900   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-11 17:19:50,008-Speed 3299.43 samples/sec   Loss 1.4478   LearningRate 0.0238   Epoch: 10   Global Step: 170910   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-11 17:19:53,084-Speed 3329.72 samples/sec   Loss 1.4314   LearningRate 0.0238   Epoch: 10   Global Step: 170920   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-11 17:19:56,157-Speed 3334.26 samples/sec   Loss 1.4586   LearningRate 0.0238   Epoch: 10   Global Step: 170930   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-11 17:19:59,226-Speed 3336.56 samples/sec   Loss 1.4252   LearningRate 0.0238   Epoch: 10   Global Step: 170940   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-11 17:20:02,286-Speed 3346.99 samples/sec   Loss 1.4526   LearningRate 0.0238   Epoch: 10   Global Step: 170950   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:20:05,367-Speed 3324.79 samples/sec   Loss 1.4299   LearningRate 0.0238   Epoch: 10   Global Step: 170960   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:20:08,464-Speed 3307.43 samples/sec   Loss 1.4819   LearningRate 0.0238   Epoch: 10   Global Step: 170970   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:20:11,533-Speed 3337.32 samples/sec   Loss 1.4491   LearningRate 0.0238   Epoch: 10   Global Step: 170980   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:20:14,634-Speed 3303.33 samples/sec   Loss 1.4636   LearningRate 0.0238   Epoch: 10   Global Step: 170990   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:20:17,709-Speed 3330.45 samples/sec   Loss 1.4699   LearningRate 0.0238   Epoch: 10   Global Step: 171000   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:20:20,789-Speed 3324.71 samples/sec   Loss 1.4915   LearningRate 0.0238   Epoch: 10   Global Step: 171010   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:20:23,864-Speed 3331.02 samples/sec   Loss 1.4552   LearningRate 0.0238   Epoch: 10   Global Step: 171020   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:20:26,942-Speed 3328.17 samples/sec   Loss 1.4631   LearningRate 0.0238   Epoch: 10   Global Step: 171030   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:20:30,025-Speed 3322.27 samples/sec   Loss 1.3988   LearningRate 0.0238   Epoch: 10   Global Step: 171040   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:20:33,099-Speed 3331.96 samples/sec   Loss 1.4782   LearningRate 0.0238   Epoch: 10   Global Step: 171050   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-11 17:20:36,305-Speed 3336.04 samples/sec   Loss 1.4818   LearningRate 0.0238   Epoch: 10   Global Step: 171060   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:20:39,378-Speed 3332.56 samples/sec   Loss 1.5132   LearningRate 0.0238   Epoch: 10   Global Step: 171070   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:20:42,464-Speed 3319.97 samples/sec   Loss 1.4679   LearningRate 0.0238   Epoch: 10   Global Step: 171080   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:20:45,532-Speed 3337.66 samples/sec   Loss 1.4576   LearningRate 0.0238   Epoch: 10   Global Step: 171090   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:20:48,609-Speed 3328.61 samples/sec   Loss 1.4669   LearningRate 0.0238   Epoch: 10   Global Step: 171100   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:20:51,692-Speed 3322.53 samples/sec   Loss 1.4707   LearningRate 0.0238   Epoch: 10   Global Step: 171110   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:20:54,777-Speed 3320.31 samples/sec   Loss 1.4830   LearningRate 0.0238   Epoch: 10   Global Step: 171120   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:20:57,862-Speed 3320.31 samples/sec   Loss 1.4539   LearningRate 0.0238   Epoch: 10   Global Step: 171130   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:21:00,953-Speed 3313.37 samples/sec   Loss 1.4330   LearningRate 0.0237   Epoch: 10   Global Step: 171140   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:21:04,031-Speed 3327.96 samples/sec   Loss 1.4595   LearningRate 0.0237   Epoch: 10   Global Step: 171150   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:21:07,318-Speed 3317.11 samples/sec   Loss 1.4567   LearningRate 0.0237   Epoch: 10   Global Step: 171160   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-11 17:21:10,423-Speed 3298.25 samples/sec   Loss 1.5017   LearningRate 0.0237   Epoch: 10   Global Step: 171170   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-11 17:21:13,510-Speed 3318.42 samples/sec   Loss 1.4643   LearningRate 0.0237   Epoch: 10   Global Step: 171180   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-11 17:21:16,581-Speed 3334.58 samples/sec   Loss 1.4819   LearningRate 0.0237   Epoch: 10   Global Step: 171190   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-11 17:21:19,671-Speed 3314.78 samples/sec   Loss 1.4528   LearningRate 0.0237   Epoch: 10   Global Step: 171200   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-11 17:21:22,867-Speed 3205.48 samples/sec   Loss 1.4831   LearningRate 0.0237   Epoch: 10   Global Step: 171210   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-11 17:21:25,938-Speed 3335.20 samples/sec   Loss 1.5414   LearningRate 0.0237   Epoch: 10   Global Step: 171220   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-11 17:21:29,009-Speed 3334.85 samples/sec   Loss 1.5228   LearningRate 0.0237   Epoch: 10   Global Step: 171230   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-11 17:21:32,094-Speed 3320.06 samples/sec   Loss 1.4536   LearningRate 0.0237   Epoch: 10   Global Step: 171240   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-11 17:21:35,196-Speed 3301.87 samples/sec   Loss 1.4726   LearningRate 0.0237   Epoch: 10   Global Step: 171250   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-11 17:21:38,282-Speed 3318.58 samples/sec   Loss 1.4663   LearningRate 0.0237   Epoch: 10   Global Step: 171260   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-11 17:21:41,393-Speed 3292.60 samples/sec   Loss 1.4728   LearningRate 0.0237   Epoch: 10   Global Step: 171270   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-11 17:21:44,484-Speed 3313.92 samples/sec   Loss 1.4787   LearningRate 0.0237   Epoch: 10   Global Step: 171280   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-11 17:21:47,557-Speed 3332.94 samples/sec   Loss 1.5150   LearningRate 0.0237   Epoch: 10   Global Step: 171290   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-11 17:21:50,648-Speed 3313.62 samples/sec   Loss 1.4149   LearningRate 0.0237   Epoch: 10   Global Step: 171300   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-11 17:21:53,718-Speed 3336.25 samples/sec   Loss 1.4611   LearningRate 0.0237   Epoch: 10   Global Step: 171310   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-11 17:21:56,835-Speed 3285.44 samples/sec   Loss 1.4845   LearningRate 0.0237   Epoch: 10   Global Step: 171320   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-11 17:21:59,929-Speed 3311.22 samples/sec   Loss 1.4678   LearningRate 0.0237   Epoch: 10   Global Step: 171330   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-11 17:22:03,016-Speed 3317.04 samples/sec   Loss 1.5095   LearningRate 0.0237   Epoch: 10   Global Step: 171340   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-11 17:22:06,095-Speed 3326.65 samples/sec   Loss 1.4683   LearningRate 0.0237   Epoch: 10   Global Step: 171350   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:22:09,172-Speed 3328.44 samples/sec   Loss 1.4813   LearningRate 0.0237   Epoch: 10   Global Step: 171360   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:22:12,480-Speed 3262.90 samples/sec   Loss 1.5018   LearningRate 0.0237   Epoch: 10   Global Step: 171370   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:22:15,576-Speed 3308.56 samples/sec   Loss 1.4695   LearningRate 0.0237   Epoch: 10   Global Step: 171380   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:22:18,744-Speed 3232.59 samples/sec   Loss 1.4491   LearningRate 0.0237   Epoch: 10   Global Step: 171390   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:22:21,834-Speed 3314.70 samples/sec   Loss 1.4376   LearningRate 0.0237   Epoch: 10   Global Step: 171400   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:22:24,907-Speed 3333.83 samples/sec   Loss 1.5300   LearningRate 0.0237   Epoch: 10   Global Step: 171410   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:22:27,978-Speed 3335.08 samples/sec   Loss 1.4772   LearningRate 0.0237   Epoch: 10   Global Step: 171420   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:22:31,096-Speed 3284.66 samples/sec   Loss 1.4314   LearningRate 0.0237   Epoch: 10   Global Step: 171430   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:22:34,190-Speed 3309.42 samples/sec   Loss 1.4430   LearningRate 0.0237   Epoch: 10   Global Step: 171440   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:22:37,279-Speed 3316.47 samples/sec   Loss 1.4643   LearningRate 0.0237   Epoch: 10   Global Step: 171450   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-11 17:22:40,361-Speed 3323.98 samples/sec   Loss 1.5065   LearningRate 0.0237   Epoch: 10   Global Step: 171460   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-11 17:22:43,451-Speed 3314.25 samples/sec   Loss 1.4594   LearningRate 0.0237   Epoch: 10   Global Step: 171470   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-11 17:22:46,546-Speed 3309.17 samples/sec   Loss 1.4430   LearningRate 0.0236   Epoch: 10   Global Step: 171480   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-11 17:22:49,635-Speed 3315.48 samples/sec   Loss 1.4759   LearningRate 0.0236   Epoch: 10   Global Step: 171490   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-11 17:22:52,722-Speed 3318.41 samples/sec   Loss 1.4871   LearningRate 0.0236   Epoch: 10   Global Step: 171500   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-11 17:22:55,810-Speed 3316.34 samples/sec   Loss 1.4990   LearningRate 0.0236   Epoch: 10   Global Step: 171510   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-11 17:22:58,892-Speed 3323.75 samples/sec   Loss 1.4793   LearningRate 0.0236   Epoch: 10   Global Step: 171520   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:23:01,993-Speed 3302.48 samples/sec   Loss 1.4528   LearningRate 0.0236   Epoch: 10   Global Step: 171530   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:23:05,078-Speed 3320.46 samples/sec   Loss 1.4262   LearningRate 0.0236   Epoch: 10   Global Step: 171540   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:23:08,162-Speed 3321.84 samples/sec   Loss 1.4367   LearningRate 0.0236   Epoch: 10   Global Step: 171550   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:23:11,248-Speed 3318.52 samples/sec   Loss 1.4811   LearningRate 0.0236   Epoch: 10   Global Step: 171560   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:23:14,431-Speed 3217.88 samples/sec   Loss 1.4498   LearningRate 0.0236   Epoch: 10   Global Step: 171570   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:23:17,530-Speed 3305.86 samples/sec   Loss 1.4889   LearningRate 0.0236   Epoch: 10   Global Step: 171580   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:23:20,673-Speed 3258.60 samples/sec   Loss 1.5097   LearningRate 0.0236   Epoch: 10   Global Step: 171590   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:23:23,791-Speed 3284.69 samples/sec   Loss 1.4622   LearningRate 0.0236   Epoch: 10   Global Step: 171600   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:23:26,911-Speed 3282.75 samples/sec   Loss 1.4706   LearningRate 0.0236   Epoch: 10   Global Step: 171610   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:23:30,064-Speed 3248.16 samples/sec   Loss 1.4338   LearningRate 0.0236   Epoch: 10   Global Step: 171620   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-11 17:23:33,238-Speed 3227.68 samples/sec   Loss 1.4422   LearningRate 0.0236   Epoch: 10   Global Step: 171630   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-11 17:23:36,337-Speed 3304.73 samples/sec   Loss 1.4741   LearningRate 0.0236   Epoch: 10   Global Step: 171640   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-11 17:23:39,420-Speed 3322.04 samples/sec   Loss 1.4347   LearningRate 0.0236   Epoch: 10   Global Step: 171650   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-11 17:23:42,497-Speed 3328.58 samples/sec   Loss 1.4250   LearningRate 0.0236   Epoch: 10   Global Step: 171660   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-11 17:23:45,595-Speed 3305.89 samples/sec   Loss 1.4577   LearningRate 0.0236   Epoch: 10   Global Step: 171670   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-11 17:23:48,665-Speed 3336.97 samples/sec   Loss 1.5061   LearningRate 0.0236   Epoch: 10   Global Step: 171680   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:23:51,753-Speed 3316.56 samples/sec   Loss 1.5121   LearningRate 0.0236   Epoch: 10   Global Step: 171690   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:23:54,862-Speed 3294.98 samples/sec   Loss 1.4912   LearningRate 0.0236   Epoch: 10   Global Step: 171700   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:23:57,940-Speed 3327.23 samples/sec   Loss 1.4487   LearningRate 0.0236   Epoch: 10   Global Step: 171710   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:24:01,022-Speed 3323.49 samples/sec   Loss 1.4946   LearningRate 0.0236   Epoch: 10   Global Step: 171720   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:24:04,105-Speed 3322.48 samples/sec   Loss 1.4561   LearningRate 0.0236   Epoch: 10   Global Step: 171730   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:24:07,265-Speed 3240.50 samples/sec   Loss 1.4965   LearningRate 0.0236   Epoch: 10   Global Step: 171740   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:24:10,355-Speed 3314.84 samples/sec   Loss 1.4896   LearningRate 0.0236   Epoch: 10   Global Step: 171750   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:24:13,554-Speed 3201.68 samples/sec   Loss 1.4158   LearningRate 0.0236   Epoch: 10   Global Step: 171760   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:24:16,658-Speed 3300.39 samples/sec   Loss 1.4865   LearningRate 0.0236   Epoch: 10   Global Step: 171770   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:24:19,733-Speed 3330.18 samples/sec   Loss 1.4997   LearningRate 0.0236   Epoch: 10   Global Step: 171780   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-11 17:24:22,827-Speed 3311.21 samples/sec   Loss 1.4866   LearningRate 0.0236   Epoch: 10   Global Step: 171790   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-11 17:24:25,903-Speed 3329.50 samples/sec   Loss 1.5100   LearningRate 0.0236   Epoch: 10   Global Step: 171800   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-11 17:24:28,996-Speed 3311.77 samples/sec   Loss 1.5118   LearningRate 0.0236   Epoch: 10   Global Step: 171810   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-11 17:24:32,207-Speed 3190.31 samples/sec   Loss 1.4936   LearningRate 0.0236   Epoch: 10   Global Step: 171820   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-11 17:24:35,311-Speed 3299.55 samples/sec   Loss 1.4869   LearningRate 0.0235   Epoch: 10   Global Step: 171830   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-11 17:24:38,424-Speed 3290.50 samples/sec   Loss 1.5381   LearningRate 0.0235   Epoch: 10   Global Step: 171840   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-11 17:24:41,553-Speed 3273.63 samples/sec   Loss 1.4861   LearningRate 0.0235   Epoch: 10   Global Step: 171850   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:24:44,659-Speed 3296.66 samples/sec   Loss 1.4609   LearningRate 0.0235   Epoch: 10   Global Step: 171860   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:24:47,752-Speed 3312.67 samples/sec   Loss 1.4687   LearningRate 0.0235   Epoch: 10   Global Step: 171870   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:24:50,833-Speed 3323.27 samples/sec   Loss 1.5253   LearningRate 0.0235   Epoch: 10   Global Step: 171880   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:24:53,918-Speed 3320.85 samples/sec   Loss 1.5218   LearningRate 0.0235   Epoch: 10   Global Step: 171890   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:24:56,996-Speed 3327.82 samples/sec   Loss 1.5228   LearningRate 0.0235   Epoch: 10   Global Step: 171900   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:25:00,083-Speed 3317.80 samples/sec   Loss 1.4879   LearningRate 0.0235   Epoch: 10   Global Step: 171910   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:25:03,161-Speed 3326.93 samples/sec   Loss 1.5329   LearningRate 0.0235   Epoch: 10   Global Step: 171920   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:25:06,242-Speed 3324.99 samples/sec   Loss 1.5613   LearningRate 0.0235   Epoch: 10   Global Step: 171930   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:25:09,441-Speed 3202.07 samples/sec   Loss 1.4985   LearningRate 0.0235   Epoch: 10   Global Step: 171940   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:25:12,585-Speed 3257.10 samples/sec   Loss 1.4840   LearningRate 0.0235   Epoch: 10   Global Step: 171950   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-11 17:25:15,688-Speed 3300.78 samples/sec   Loss 1.4482   LearningRate 0.0235   Epoch: 10   Global Step: 171960   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-11 17:25:19,095-Speed 3217.61 samples/sec   Loss 1.4965   LearningRate 0.0235   Epoch: 10   Global Step: 171970   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-11 17:25:22,219-Speed 3279.61 samples/sec   Loss 1.4745   LearningRate 0.0235   Epoch: 10   Global Step: 171980   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-11 17:25:25,328-Speed 3293.92 samples/sec   Loss 1.4783   LearningRate 0.0235   Epoch: 10   Global Step: 171990   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-11 17:25:28,450-Speed 3280.46 samples/sec   Loss 1.5806   LearningRate 0.0235   Epoch: 10   Global Step: 172000   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-11 17:26:12,702-[lfw][172000]XNorm: 20.466393
Training: 2022-04-11 17:26:12,703-[lfw][172000]Accuracy-Flip: 0.99817+-0.00273
Training: 2022-04-11 17:26:12,703-[lfw][172000]Accuracy-Highest: 0.99817
Training: 2022-04-11 17:27:03,650-[cfp_fp][172000]XNorm: 20.593770
Training: 2022-04-11 17:27:03,651-[cfp_fp][172000]Accuracy-Flip: 0.98814+-0.00596
Training: 2022-04-11 17:27:03,652-[cfp_fp][172000]Accuracy-Highest: 0.98971
Training: 2022-04-11 17:27:47,490-[agedb_30][172000]XNorm: 21.332894
Training: 2022-04-11 17:27:47,491-[agedb_30][172000]Accuracy-Flip: 0.98317+-0.00621
Training: 2022-04-11 17:27:47,491-[agedb_30][172000]Accuracy-Highest: 0.98450
Training: 2022-04-11 17:27:50,563-Speed 72.06 samples/sec   Loss 1.4620   LearningRate 0.0235   Epoch: 10   Global Step: 172010   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-11 17:27:53,637-Speed 3332.63 samples/sec   Loss 1.5097   LearningRate 0.0235   Epoch: 10   Global Step: 172020   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-11 17:27:56,726-Speed 3315.10 samples/sec   Loss 1.5329   LearningRate 0.0235   Epoch: 10   Global Step: 172030   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-11 17:27:59,841-Speed 3288.50 samples/sec   Loss 1.5124   LearningRate 0.0235   Epoch: 10   Global Step: 172040   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-11 17:28:02,923-Speed 3323.44 samples/sec   Loss 1.5100   LearningRate 0.0235   Epoch: 10   Global Step: 172050   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-11 17:28:05,985-Speed 3344.11 samples/sec   Loss 1.4852   LearningRate 0.0235   Epoch: 10   Global Step: 172060   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-11 17:28:09,152-Speed 3234.64 samples/sec   Loss 1.5330   LearningRate 0.0235   Epoch: 10   Global Step: 172070   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-11 17:28:12,264-Speed 3292.09 samples/sec   Loss 1.5608   LearningRate 0.0235   Epoch: 10   Global Step: 172080   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-11 17:28:15,416-Speed 3248.51 samples/sec   Loss 1.4860   LearningRate 0.0235   Epoch: 10   Global Step: 172090   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-11 17:28:18,540-Speed 3278.71 samples/sec   Loss 1.4644   LearningRate 0.0235   Epoch: 10   Global Step: 172100   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:28:21,642-Speed 3301.96 samples/sec   Loss 1.4839   LearningRate 0.0235   Epoch: 10   Global Step: 172110   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:28:24,707-Speed 3341.66 samples/sec   Loss 1.5007   LearningRate 0.0235   Epoch: 10   Global Step: 172120   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:28:27,777-Speed 3336.03 samples/sec   Loss 1.5005   LearningRate 0.0235   Epoch: 10   Global Step: 172130   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-11 17:28:30,871-Speed 3310.71 samples/sec   Loss 1.5273   LearningRate 0.0235   Epoch: 10   Global Step: 172140   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-11 17:28:33,962-Speed 3314.51 samples/sec   Loss 1.4920   LearningRate 0.0235   Epoch: 10   Global Step: 172150   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-11 17:28:37,043-Speed 3324.17 samples/sec   Loss 1.4272   LearningRate 0.0235   Epoch: 10   Global Step: 172160   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-11 17:28:40,117-Speed 3332.01 samples/sec   Loss 1.4902   LearningRate 0.0234   Epoch: 10   Global Step: 172170   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-11 17:28:43,201-Speed 3321.39 samples/sec   Loss 1.4988   LearningRate 0.0234   Epoch: 10   Global Step: 172180   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-11 17:28:46,347-Speed 3256.55 samples/sec   Loss 1.4824   LearningRate 0.0234   Epoch: 10   Global Step: 172190   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-11 17:28:49,447-Speed 3303.27 samples/sec   Loss 1.4835   LearningRate 0.0234   Epoch: 10   Global Step: 172200   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-11 17:28:52,523-Speed 3329.88 samples/sec   Loss 1.4761   LearningRate 0.0234   Epoch: 10   Global Step: 172210   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-11 17:28:55,641-Speed 3284.99 samples/sec   Loss 1.5117   LearningRate 0.0234   Epoch: 10   Global Step: 172220   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-11 17:28:58,721-Speed 3325.13 samples/sec   Loss 1.5043   LearningRate 0.0234   Epoch: 10   Global Step: 172230   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:29:01,791-Speed 3337.02 samples/sec   Loss 1.4877   LearningRate 0.0234   Epoch: 10   Global Step: 172240   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:29:04,910-Speed 3283.95 samples/sec   Loss 1.4692   LearningRate 0.0234   Epoch: 10   Global Step: 172250   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:29:07,986-Speed 3329.79 samples/sec   Loss 1.4933   LearningRate 0.0234   Epoch: 10   Global Step: 172260   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:29:11,060-Speed 3331.64 samples/sec   Loss 1.4935   LearningRate 0.0234   Epoch: 10   Global Step: 172270   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:29:14,136-Speed 3330.04 samples/sec   Loss 1.4544   LearningRate 0.0234   Epoch: 10   Global Step: 172280   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:29:17,203-Speed 3338.64 samples/sec   Loss 1.5026   LearningRate 0.0234   Epoch: 10   Global Step: 172290   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:29:20,325-Speed 3281.43 samples/sec   Loss 1.4946   LearningRate 0.0234   Epoch: 10   Global Step: 172300   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:29:23,405-Speed 3324.33 samples/sec   Loss 1.4264   LearningRate 0.0234   Epoch: 10   Global Step: 172310   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:29:26,489-Speed 3321.72 samples/sec   Loss 1.5027   LearningRate 0.0234   Epoch: 10   Global Step: 172320   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:29:29,561-Speed 3333.90 samples/sec   Loss 1.4938   LearningRate 0.0234   Epoch: 10   Global Step: 172330   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-11 17:29:32,631-Speed 3336.45 samples/sec   Loss 1.4466   LearningRate 0.0234   Epoch: 10   Global Step: 172340   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-11 17:29:35,705-Speed 3332.23 samples/sec   Loss 1.4613   LearningRate 0.0234   Epoch: 10   Global Step: 172350   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:29:38,844-Speed 3263.02 samples/sec   Loss 1.4827   LearningRate 0.0234   Epoch: 10   Global Step: 172360   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:29:41,926-Speed 3323.17 samples/sec   Loss 1.4346   LearningRate 0.0234   Epoch: 10   Global Step: 172370   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:29:44,994-Speed 3338.37 samples/sec   Loss 1.5118   LearningRate 0.0234   Epoch: 10   Global Step: 172380   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:29:48,071-Speed 3329.03 samples/sec   Loss 1.5049   LearningRate 0.0234   Epoch: 10   Global Step: 172390   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:29:51,175-Speed 3299.12 samples/sec   Loss 1.4437   LearningRate 0.0234   Epoch: 10   Global Step: 172400   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-11 17:29:54,258-Speed 3337.72 samples/sec   Loss 1.5266   LearningRate 0.0234   Epoch: 10   Global Step: 172410   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-11 17:29:57,343-Speed 3319.76 samples/sec   Loss 1.4840   LearningRate 0.0234   Epoch: 10   Global Step: 172420   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-11 17:30:00,419-Speed 3331.29 samples/sec   Loss 1.5419   LearningRate 0.0234   Epoch: 10   Global Step: 172430   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-11 17:30:03,510-Speed 3313.90 samples/sec   Loss 1.4816   LearningRate 0.0234   Epoch: 10   Global Step: 172440   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-11 17:30:06,621-Speed 3292.31 samples/sec   Loss 1.4967   LearningRate 0.0234   Epoch: 10   Global Step: 172450   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-11 17:30:09,708-Speed 3317.74 samples/sec   Loss 1.5023   LearningRate 0.0234   Epoch: 10   Global Step: 172460   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-11 17:30:12,786-Speed 3327.89 samples/sec   Loss 1.5260   LearningRate 0.0234   Epoch: 10   Global Step: 172470   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-11 17:30:15,863-Speed 3327.68 samples/sec   Loss 1.4632   LearningRate 0.0234   Epoch: 10   Global Step: 172480   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-11 17:30:19,031-Speed 3234.10 samples/sec   Loss 1.4891   LearningRate 0.0234   Epoch: 10   Global Step: 172490   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-11 17:30:22,107-Speed 3329.68 samples/sec   Loss 1.4912   LearningRate 0.0234   Epoch: 10   Global Step: 172500   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:30:25,179-Speed 3334.35 samples/sec   Loss 1.4955   LearningRate 0.0234   Epoch: 10   Global Step: 172510   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:30:28,249-Speed 3336.13 samples/sec   Loss 1.5091   LearningRate 0.0233   Epoch: 10   Global Step: 172520   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:30:31,329-Speed 3325.48 samples/sec   Loss 1.5012   LearningRate 0.0233   Epoch: 10   Global Step: 172530   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:30:34,400-Speed 3335.45 samples/sec   Loss 1.5288   LearningRate 0.0233   Epoch: 10   Global Step: 172540   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:30:37,484-Speed 3321.08 samples/sec   Loss 1.4865   LearningRate 0.0233   Epoch: 10   Global Step: 172550   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:30:40,558-Speed 3331.83 samples/sec   Loss 1.5127   LearningRate 0.0233   Epoch: 10   Global Step: 172560   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:30:43,640-Speed 3322.80 samples/sec   Loss 1.5074   LearningRate 0.0233   Epoch: 10   Global Step: 172570   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:30:46,732-Speed 3313.47 samples/sec   Loss 1.4861   LearningRate 0.0233   Epoch: 10   Global Step: 172580   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:30:49,831-Speed 3304.98 samples/sec   Loss 1.5056   LearningRate 0.0233   Epoch: 10   Global Step: 172590   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:30:52,907-Speed 3329.29 samples/sec   Loss 1.5081   LearningRate 0.0233   Epoch: 10   Global Step: 172600   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-11 17:30:55,981-Speed 3332.13 samples/sec   Loss 1.4927   LearningRate 0.0233   Epoch: 10   Global Step: 172610   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-11 17:30:59,083-Speed 3301.76 samples/sec   Loss 1.4761   LearningRate 0.0233   Epoch: 10   Global Step: 172620   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-11 17:31:02,154-Speed 3335.63 samples/sec   Loss 1.5072   LearningRate 0.0233   Epoch: 10   Global Step: 172630   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-11 17:31:05,235-Speed 3324.39 samples/sec   Loss 1.4907   LearningRate 0.0233   Epoch: 10   Global Step: 172640   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-11 17:31:08,314-Speed 3326.49 samples/sec   Loss 1.4996   LearningRate 0.0233   Epoch: 10   Global Step: 172650   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-11 17:31:11,395-Speed 3324.05 samples/sec   Loss 1.4953   LearningRate 0.0233   Epoch: 10   Global Step: 172660   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-11 17:31:14,494-Speed 3305.58 samples/sec   Loss 1.5275   LearningRate 0.0233   Epoch: 10   Global Step: 172670   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-11 17:31:17,571-Speed 3328.09 samples/sec   Loss 1.4707   LearningRate 0.0233   Epoch: 10   Global Step: 172680   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-11 17:31:20,738-Speed 3234.33 samples/sec   Loss 1.4894   LearningRate 0.0233   Epoch: 10   Global Step: 172690   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-11 17:31:23,971-Speed 3168.26 samples/sec   Loss 1.5488   LearningRate 0.0233   Epoch: 10   Global Step: 172700   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-11 17:31:27,145-Speed 3226.74 samples/sec   Loss 1.5405   LearningRate 0.0233   Epoch: 10   Global Step: 172710   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-11 17:31:30,281-Speed 3266.34 samples/sec   Loss 1.5854   LearningRate 0.0233   Epoch: 10   Global Step: 172720   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-11 17:31:33,370-Speed 3315.33 samples/sec   Loss 1.5244   LearningRate 0.0233   Epoch: 10   Global Step: 172730   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-11 17:31:36,452-Speed 3322.97 samples/sec   Loss 1.5818   LearningRate 0.0233   Epoch: 10   Global Step: 172740   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-11 17:31:39,622-Speed 3230.91 samples/sec   Loss 1.5117   LearningRate 0.0233   Epoch: 10   Global Step: 172750   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:31:42,785-Speed 3238.75 samples/sec   Loss 1.5052   LearningRate 0.0233   Epoch: 10   Global Step: 172760   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:31:45,857-Speed 3333.64 samples/sec   Loss 1.5222   LearningRate 0.0233   Epoch: 10   Global Step: 172770   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:31:48,934-Speed 3329.21 samples/sec   Loss 1.4851   LearningRate 0.0233   Epoch: 10   Global Step: 172780   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:31:52,012-Speed 3327.10 samples/sec   Loss 1.5108   LearningRate 0.0233   Epoch: 10   Global Step: 172790   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:31:55,102-Speed 3314.81 samples/sec   Loss 1.5437   LearningRate 0.0233   Epoch: 10   Global Step: 172800   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:31:58,235-Speed 3268.94 samples/sec   Loss 1.5156   LearningRate 0.0233   Epoch: 10   Global Step: 172810   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:32:01,327-Speed 3312.99 samples/sec   Loss 1.4931   LearningRate 0.0233   Epoch: 10   Global Step: 172820   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:32:04,503-Speed 3225.08 samples/sec   Loss 1.5420   LearningRate 0.0233   Epoch: 10   Global Step: 172830   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:32:07,681-Speed 3222.76 samples/sec   Loss 1.5184   LearningRate 0.0233   Epoch: 10   Global Step: 172840   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:32:10,775-Speed 3310.23 samples/sec   Loss 1.4885   LearningRate 0.0233   Epoch: 10   Global Step: 172850   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-11 17:32:13,895-Speed 3283.44 samples/sec   Loss 1.5032   LearningRate 0.0232   Epoch: 10   Global Step: 172860   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-11 17:32:16,977-Speed 3322.35 samples/sec   Loss 1.5070   LearningRate 0.0232   Epoch: 10   Global Step: 172870   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-11 17:32:20,048-Speed 3335.64 samples/sec   Loss 1.4957   LearningRate 0.0232   Epoch: 10   Global Step: 172880   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:32:23,141-Speed 3311.93 samples/sec   Loss 1.4762   LearningRate 0.0232   Epoch: 10   Global Step: 172890   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:32:26,215-Speed 3331.10 samples/sec   Loss 1.5358   LearningRate 0.0232   Epoch: 10   Global Step: 172900   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:32:29,294-Speed 3327.20 samples/sec   Loss 1.5856   LearningRate 0.0232   Epoch: 10   Global Step: 172910   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:32:32,371-Speed 3328.00 samples/sec   Loss 1.5240   LearningRate 0.0232   Epoch: 10   Global Step: 172920   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:32:35,454-Speed 3322.45 samples/sec   Loss 1.5066   LearningRate 0.0232   Epoch: 10   Global Step: 172930   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:32:38,534-Speed 3325.96 samples/sec   Loss 1.5209   LearningRate 0.0232   Epoch: 10   Global Step: 172940   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:32:41,621-Speed 3317.82 samples/sec   Loss 1.5252   LearningRate 0.0232   Epoch: 10   Global Step: 172950   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:32:44,699-Speed 3327.78 samples/sec   Loss 1.5018   LearningRate 0.0232   Epoch: 10   Global Step: 172960   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:32:47,773-Speed 3332.28 samples/sec   Loss 1.5741   LearningRate 0.0232   Epoch: 10   Global Step: 172970   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:32:50,849-Speed 3328.75 samples/sec   Loss 1.4699   LearningRate 0.0232   Epoch: 10   Global Step: 172980   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-11 17:32:53,941-Speed 3313.09 samples/sec   Loss 1.4925   LearningRate 0.0232   Epoch: 10   Global Step: 172990   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-11 17:32:57,016-Speed 3330.50 samples/sec   Loss 1.5121   LearningRate 0.0232   Epoch: 10   Global Step: 173000   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-11 17:33:00,101-Speed 3319.63 samples/sec   Loss 1.5340   LearningRate 0.0232   Epoch: 10   Global Step: 173010   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-11 17:33:03,180-Speed 3327.14 samples/sec   Loss 1.4469   LearningRate 0.0232   Epoch: 10   Global Step: 173020   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-11 17:33:06,262-Speed 3323.62 samples/sec   Loss 1.5353   LearningRate 0.0232   Epoch: 10   Global Step: 173030   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-11 17:33:09,341-Speed 3326.70 samples/sec   Loss 1.5148   LearningRate 0.0232   Epoch: 10   Global Step: 173040   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-11 17:33:12,420-Speed 3326.38 samples/sec   Loss 1.4727   LearningRate 0.0232   Epoch: 10   Global Step: 173050   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-11 17:33:15,529-Speed 3294.23 samples/sec   Loss 1.4899   LearningRate 0.0232   Epoch: 10   Global Step: 173060   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:33:18,605-Speed 3330.39 samples/sec   Loss 1.4465   LearningRate 0.0232   Epoch: 10   Global Step: 173070   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:33:21,717-Speed 3290.43 samples/sec   Loss 1.4788   LearningRate 0.0232   Epoch: 10   Global Step: 173080   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:33:24,892-Speed 3226.07 samples/sec   Loss 1.5734   LearningRate 0.0232   Epoch: 10   Global Step: 173090   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:33:28,028-Speed 3266.43 samples/sec   Loss 1.5009   LearningRate 0.0232   Epoch: 10   Global Step: 173100   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:33:31,131-Speed 3301.25 samples/sec   Loss 1.5413   LearningRate 0.0232   Epoch: 10   Global Step: 173110   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:33:34,219-Speed 3317.17 samples/sec   Loss 1.5629   LearningRate 0.0232   Epoch: 10   Global Step: 173120   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:33:37,300-Speed 3323.55 samples/sec   Loss 1.5280   LearningRate 0.0232   Epoch: 10   Global Step: 173130   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:33:40,384-Speed 3321.58 samples/sec   Loss 1.5197   LearningRate 0.0232   Epoch: 10   Global Step: 173140   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:33:43,492-Speed 3295.64 samples/sec   Loss 1.4945   LearningRate 0.0232   Epoch: 10   Global Step: 173150   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:33:46,587-Speed 3308.96 samples/sec   Loss 1.5503   LearningRate 0.0232   Epoch: 10   Global Step: 173160   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:33:49,727-Speed 3261.52 samples/sec   Loss 1.4855   LearningRate 0.0232   Epoch: 10   Global Step: 173170   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:33:52,815-Speed 3317.35 samples/sec   Loss 1.5214   LearningRate 0.0232   Epoch: 10   Global Step: 173180   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:33:56,013-Speed 3202.64 samples/sec   Loss 1.4945   LearningRate 0.0232   Epoch: 10   Global Step: 173190   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:33:59,096-Speed 3322.70 samples/sec   Loss 1.5320   LearningRate 0.0232   Epoch: 10   Global Step: 173200   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:34:02,257-Speed 3239.54 samples/sec   Loss 1.5535   LearningRate 0.0231   Epoch: 10   Global Step: 173210   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:34:05,398-Speed 3261.08 samples/sec   Loss 1.5058   LearningRate 0.0231   Epoch: 10   Global Step: 173220   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:34:08,526-Speed 3274.76 samples/sec   Loss 1.5054   LearningRate 0.0231   Epoch: 10   Global Step: 173230   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:34:11,605-Speed 3326.63 samples/sec   Loss 1.5329   LearningRate 0.0231   Epoch: 10   Global Step: 173240   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:34:14,681-Speed 3328.99 samples/sec   Loss 1.5395   LearningRate 0.0231   Epoch: 10   Global Step: 173250   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:34:17,757-Speed 3330.30 samples/sec   Loss 1.4860   LearningRate 0.0231   Epoch: 10   Global Step: 173260   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-11 17:34:20,829-Speed 3333.83 samples/sec   Loss 1.4553   LearningRate 0.0231   Epoch: 10   Global Step: 173270   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-11 17:34:23,916-Speed 3318.38 samples/sec   Loss 1.5364   LearningRate 0.0231   Epoch: 10   Global Step: 173280   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-11 17:34:26,995-Speed 3326.54 samples/sec   Loss 1.4910   LearningRate 0.0231   Epoch: 10   Global Step: 173290   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-11 17:34:30,112-Speed 3286.11 samples/sec   Loss 1.4898   LearningRate 0.0231   Epoch: 10   Global Step: 173300   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-11 17:34:33,237-Speed 3277.62 samples/sec   Loss 1.5367   LearningRate 0.0231   Epoch: 10   Global Step: 173310   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-11 17:34:36,323-Speed 3319.12 samples/sec   Loss 1.5391   LearningRate 0.0231   Epoch: 10   Global Step: 173320   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-11 17:34:39,404-Speed 3323.95 samples/sec   Loss 1.5318   LearningRate 0.0231   Epoch: 10   Global Step: 173330   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-11 17:34:42,485-Speed 3323.86 samples/sec   Loss 1.4861   LearningRate 0.0231   Epoch: 10   Global Step: 173340   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-11 17:34:45,569-Speed 3321.09 samples/sec   Loss 1.5148   LearningRate 0.0231   Epoch: 10   Global Step: 173350   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-11 17:34:48,645-Speed 3329.56 samples/sec   Loss 1.5340   LearningRate 0.0231   Epoch: 10   Global Step: 173360   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-11 17:34:51,737-Speed 3313.42 samples/sec   Loss 1.5001   LearningRate 0.0231   Epoch: 10   Global Step: 173370   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-11 17:34:54,867-Speed 3271.91 samples/sec   Loss 1.4924   LearningRate 0.0231   Epoch: 10   Global Step: 173380   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-11 17:34:57,961-Speed 3310.41 samples/sec   Loss 1.5109   LearningRate 0.0231   Epoch: 10   Global Step: 173390   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-11 17:35:01,046-Speed 3319.78 samples/sec   Loss 1.5141   LearningRate 0.0231   Epoch: 10   Global Step: 173400   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:35:04,174-Speed 3274.39 samples/sec   Loss 1.4841   LearningRate 0.0231   Epoch: 10   Global Step: 173410   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:35:07,275-Speed 3303.25 samples/sec   Loss 1.5312   LearningRate 0.0231   Epoch: 10   Global Step: 173420   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:35:10,360-Speed 3320.32 samples/sec   Loss 1.5358   LearningRate 0.0231   Epoch: 10   Global Step: 173430   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:35:13,495-Speed 3266.91 samples/sec   Loss 1.5314   LearningRate 0.0231   Epoch: 10   Global Step: 173440   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:35:16,662-Speed 3234.70 samples/sec   Loss 1.4829   LearningRate 0.0231   Epoch: 10   Global Step: 173450   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:35:19,740-Speed 3327.66 samples/sec   Loss 1.5349   LearningRate 0.0231   Epoch: 10   Global Step: 173460   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:35:22,834-Speed 3310.43 samples/sec   Loss 1.5404   LearningRate 0.0231   Epoch: 10   Global Step: 173470   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:35:25,910-Speed 3329.22 samples/sec   Loss 1.5847   LearningRate 0.0231   Epoch: 10   Global Step: 173480   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:35:29,079-Speed 3232.00 samples/sec   Loss 1.5298   LearningRate 0.0231   Epoch: 10   Global Step: 173490   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:35:32,165-Speed 3318.69 samples/sec   Loss 1.5352   LearningRate 0.0231   Epoch: 10   Global Step: 173500   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-11 17:35:35,242-Speed 3329.27 samples/sec   Loss 1.5099   LearningRate 0.0231   Epoch: 10   Global Step: 173510   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-11 17:35:38,360-Speed 3285.12 samples/sec   Loss 1.5726   LearningRate 0.0231   Epoch: 10   Global Step: 173520   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-11 17:35:41,464-Speed 3299.32 samples/sec   Loss 1.5014   LearningRate 0.0231   Epoch: 10   Global Step: 173530   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-11 17:35:44,551-Speed 3318.11 samples/sec   Loss 1.5512   LearningRate 0.0231   Epoch: 10   Global Step: 173540   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:35:47,635-Speed 3321.13 samples/sec   Loss 1.5006   LearningRate 0.0231   Epoch: 10   Global Step: 173550   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:35:50,714-Speed 3327.20 samples/sec   Loss 1.4924   LearningRate 0.0230   Epoch: 10   Global Step: 173560   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:35:53,798-Speed 3321.17 samples/sec   Loss 1.5413   LearningRate 0.0230   Epoch: 10   Global Step: 173570   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:35:56,932-Speed 3267.68 samples/sec   Loss 1.5318   LearningRate 0.0230   Epoch: 10   Global Step: 173580   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:36:00,015-Speed 3321.55 samples/sec   Loss 1.5889   LearningRate 0.0230   Epoch: 10   Global Step: 173590   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:36:03,109-Speed 3311.13 samples/sec   Loss 1.5022   LearningRate 0.0230   Epoch: 10   Global Step: 173600   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:36:06,203-Speed 3310.42 samples/sec   Loss 1.4825   LearningRate 0.0230   Epoch: 10   Global Step: 173610   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:36:09,285-Speed 3323.48 samples/sec   Loss 1.5151   LearningRate 0.0230   Epoch: 10   Global Step: 173620   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:36:12,358-Speed 3333.01 samples/sec   Loss 1.5341   LearningRate 0.0230   Epoch: 10   Global Step: 173630   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:36:15,436-Speed 3327.08 samples/sec   Loss 1.5492   LearningRate 0.0230   Epoch: 10   Global Step: 173640   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-11 17:36:18,512-Speed 3329.69 samples/sec   Loss 1.4903   LearningRate 0.0230   Epoch: 10   Global Step: 173650   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-11 17:36:21,589-Speed 3328.53 samples/sec   Loss 1.4884   LearningRate 0.0230   Epoch: 10   Global Step: 173660   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-11 17:36:24,666-Speed 3328.73 samples/sec   Loss 1.4665   LearningRate 0.0230   Epoch: 10   Global Step: 173670   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-11 17:36:27,777-Speed 3292.75 samples/sec   Loss 1.5214   LearningRate 0.0230   Epoch: 10   Global Step: 173680   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-11 17:36:30,863-Speed 3319.22 samples/sec   Loss 1.5020   LearningRate 0.0230   Epoch: 10   Global Step: 173690   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-11 17:36:34,013-Speed 3250.43 samples/sec   Loss 1.5513   LearningRate 0.0230   Epoch: 10   Global Step: 173700   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-11 17:36:37,113-Speed 3304.64 samples/sec   Loss 1.5443   LearningRate 0.0230   Epoch: 10   Global Step: 173710   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-11 17:36:40,190-Speed 3329.25 samples/sec   Loss 1.5061   LearningRate 0.0230   Epoch: 10   Global Step: 173720   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-11 17:36:43,278-Speed 3316.75 samples/sec   Loss 1.5313   LearningRate 0.0230   Epoch: 10   Global Step: 173730   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-11 17:36:46,343-Speed 3342.22 samples/sec   Loss 1.5547   LearningRate 0.0230   Epoch: 10   Global Step: 173740   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-11 17:36:49,422-Speed 3325.63 samples/sec   Loss 1.5173   LearningRate 0.0230   Epoch: 10   Global Step: 173750   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-11 17:36:52,513-Speed 3314.02 samples/sec   Loss 1.5816   LearningRate 0.0230   Epoch: 10   Global Step: 173760   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-11 17:36:55,611-Speed 3305.71 samples/sec   Loss 1.5291   LearningRate 0.0230   Epoch: 10   Global Step: 173770   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:36:58,749-Speed 3264.43 samples/sec   Loss 1.5475   LearningRate 0.0230   Epoch: 10   Global Step: 173780   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:37:01,826-Speed 3329.17 samples/sec   Loss 1.6013   LearningRate 0.0230   Epoch: 10   Global Step: 173790   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:37:04,901-Speed 3330.71 samples/sec   Loss 1.6151   LearningRate 0.0230   Epoch: 10   Global Step: 173800   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:37:07,994-Speed 3311.54 samples/sec   Loss 1.5598   LearningRate 0.0230   Epoch: 10   Global Step: 173810   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:37:11,073-Speed 3326.02 samples/sec   Loss 1.5467   LearningRate 0.0230   Epoch: 10   Global Step: 173820   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:37:14,180-Speed 3297.30 samples/sec   Loss 1.5210   LearningRate 0.0230   Epoch: 10   Global Step: 173830   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:37:17,266-Speed 3318.82 samples/sec   Loss 1.5436   LearningRate 0.0230   Epoch: 10   Global Step: 173840   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:37:20,372-Speed 3296.93 samples/sec   Loss 1.5375   LearningRate 0.0230   Epoch: 10   Global Step: 173850   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:37:23,463-Speed 3313.91 samples/sec   Loss 1.5261   LearningRate 0.0230   Epoch: 10   Global Step: 173860   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:37:26,550-Speed 3318.39 samples/sec   Loss 1.5244   LearningRate 0.0230   Epoch: 10   Global Step: 173870   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-11 17:37:29,636-Speed 3319.43 samples/sec   Loss 1.5183   LearningRate 0.0230   Epoch: 10   Global Step: 173880   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-11 17:37:32,727-Speed 3312.72 samples/sec   Loss 1.5089   LearningRate 0.0230   Epoch: 10   Global Step: 173890   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-11 17:37:35,793-Speed 3340.88 samples/sec   Loss 1.5444   LearningRate 0.0229   Epoch: 10   Global Step: 173900   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:37:38,881-Speed 3316.96 samples/sec   Loss 1.5353   LearningRate 0.0229   Epoch: 10   Global Step: 173910   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-11 17:37:41,961-Speed 3325.24 samples/sec   Loss 1.5411   LearningRate 0.0229   Epoch: 10   Global Step: 173920   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-11 17:37:45,034-Speed 3333.57 samples/sec   Loss 1.5353   LearningRate 0.0229   Epoch: 10   Global Step: 173930   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-11 17:37:48,174-Speed 3261.98 samples/sec   Loss 1.5704   LearningRate 0.0229   Epoch: 10   Global Step: 173940   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-11 17:37:51,283-Speed 3294.09 samples/sec   Loss 1.5609   LearningRate 0.0229   Epoch: 10   Global Step: 173950   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-11 17:37:54,386-Speed 3300.77 samples/sec   Loss 1.5769   LearningRate 0.0229   Epoch: 10   Global Step: 173960   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-11 17:37:57,507-Speed 3282.81 samples/sec   Loss 1.4897   LearningRate 0.0229   Epoch: 10   Global Step: 173970   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-11 17:38:00,589-Speed 3322.94 samples/sec   Loss 1.5469   LearningRate 0.0229   Epoch: 10   Global Step: 173980   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-11 17:38:03,669-Speed 3325.62 samples/sec   Loss 1.4973   LearningRate 0.0229   Epoch: 10   Global Step: 173990   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-11 17:38:06,747-Speed 3327.07 samples/sec   Loss 1.5233   LearningRate 0.0229   Epoch: 10   Global Step: 174000   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-11 17:38:50,829-[lfw][174000]XNorm: 22.484794
Training: 2022-04-11 17:38:50,830-[lfw][174000]Accuracy-Flip: 0.99800+-0.00296
Training: 2022-04-11 17:38:50,830-[lfw][174000]Accuracy-Highest: 0.99817
Training: 2022-04-11 17:39:42,011-[cfp_fp][174000]XNorm: 22.273649
Training: 2022-04-11 17:39:42,012-[cfp_fp][174000]Accuracy-Flip: 0.98829+-0.00498
Training: 2022-04-11 17:39:42,012-[cfp_fp][174000]Accuracy-Highest: 0.98971
Training: 2022-04-11 17:40:26,028-[agedb_30][174000]XNorm: 23.406004
Training: 2022-04-11 17:40:26,029-[agedb_30][174000]Accuracy-Flip: 0.98350+-0.00669
Training: 2022-04-11 17:40:26,029-[agedb_30][174000]Accuracy-Highest: 0.98450
Training: 2022-04-11 17:40:29,142-Speed 71.91 samples/sec   Loss 1.5212   LearningRate 0.0229   Epoch: 10   Global Step: 174010   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:40:32,203-Speed 3346.32 samples/sec   Loss 1.5356   LearningRate 0.0229   Epoch: 10   Global Step: 174020   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:40:35,291-Speed 3317.52 samples/sec   Loss 1.5424   LearningRate 0.0229   Epoch: 10   Global Step: 174030   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:40:38,366-Speed 3330.25 samples/sec   Loss 1.5819   LearningRate 0.0229   Epoch: 10   Global Step: 174040   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:40:41,431-Speed 3341.32 samples/sec   Loss 1.5533   LearningRate 0.0229   Epoch: 10   Global Step: 174050   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:40:44,495-Speed 3342.67 samples/sec   Loss 1.4947   LearningRate 0.0229   Epoch: 10   Global Step: 174060   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:40:47,622-Speed 3276.36 samples/sec   Loss 1.5065   LearningRate 0.0229   Epoch: 10   Global Step: 174070   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:40:50,790-Speed 3232.50 samples/sec   Loss 1.4855   LearningRate 0.0229   Epoch: 10   Global Step: 174080   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:40:53,863-Speed 3332.93 samples/sec   Loss 1.5012   LearningRate 0.0229   Epoch: 10   Global Step: 174090   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:40:56,928-Speed 3341.57 samples/sec   Loss 1.5501   LearningRate 0.0229   Epoch: 10   Global Step: 174100   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:41:00,016-Speed 3316.83 samples/sec   Loss 1.5456   LearningRate 0.0229   Epoch: 10   Global Step: 174110   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-11 17:41:03,096-Speed 3326.19 samples/sec   Loss 1.5780   LearningRate 0.0229   Epoch: 10   Global Step: 174120   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-11 17:41:06,192-Speed 3307.51 samples/sec   Loss 1.5279   LearningRate 0.0229   Epoch: 10   Global Step: 174130   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:41:09,261-Speed 3338.02 samples/sec   Loss 1.5887   LearningRate 0.0229   Epoch: 10   Global Step: 174140   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:41:12,347-Speed 3319.02 samples/sec   Loss 1.5527   LearningRate 0.0229   Epoch: 10   Global Step: 174150   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:41:15,420-Speed 3332.92 samples/sec   Loss 1.5376   LearningRate 0.0229   Epoch: 10   Global Step: 174160   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:41:18,497-Speed 3328.67 samples/sec   Loss 1.5413   LearningRate 0.0229   Epoch: 10   Global Step: 174170   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:41:21,744-Speed 3154.51 samples/sec   Loss 1.5700   LearningRate 0.0229   Epoch: 10   Global Step: 174180   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:41:24,832-Speed 3317.45 samples/sec   Loss 1.5250   LearningRate 0.0229   Epoch: 10   Global Step: 174190   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:41:27,908-Speed 3328.62 samples/sec   Loss 1.5271   LearningRate 0.0229   Epoch: 10   Global Step: 174200   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:41:30,984-Speed 3330.38 samples/sec   Loss 1.5185   LearningRate 0.0229   Epoch: 10   Global Step: 174210   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:41:34,104-Speed 3282.49 samples/sec   Loss 1.5221   LearningRate 0.0229   Epoch: 10   Global Step: 174220   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:41:37,224-Speed 3282.55 samples/sec   Loss 1.5166   LearningRate 0.0229   Epoch: 10   Global Step: 174230   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-11 17:41:40,299-Speed 3331.47 samples/sec   Loss 1.5337   LearningRate 0.0229   Epoch: 10   Global Step: 174240   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-11 17:41:43,373-Speed 3331.66 samples/sec   Loss 1.5708   LearningRate 0.0228   Epoch: 10   Global Step: 174250   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-11 17:41:46,482-Speed 3294.71 samples/sec   Loss 1.5238   LearningRate 0.0228   Epoch: 10   Global Step: 174260   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-11 17:41:49,639-Speed 3244.28 samples/sec   Loss 1.5558   LearningRate 0.0228   Epoch: 10   Global Step: 174270   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-11 17:41:52,720-Speed 3324.12 samples/sec   Loss 1.4697   LearningRate 0.0228   Epoch: 10   Global Step: 174280   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-11 17:41:55,803-Speed 3322.62 samples/sec   Loss 1.5294   LearningRate 0.0228   Epoch: 10   Global Step: 174290   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-11 17:41:58,982-Speed 3221.56 samples/sec   Loss 1.5432   LearningRate 0.0228   Epoch: 10   Global Step: 174300   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-11 17:42:02,049-Speed 3340.21 samples/sec   Loss 1.5165   LearningRate 0.0228   Epoch: 10   Global Step: 174310   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:42:05,146-Speed 3307.23 samples/sec   Loss 1.5333   LearningRate 0.0228   Epoch: 10   Global Step: 174320   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:42:08,220-Speed 3332.43 samples/sec   Loss 1.5557   LearningRate 0.0228   Epoch: 10   Global Step: 174330   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:42:11,299-Speed 3326.27 samples/sec   Loss 1.5062   LearningRate 0.0228   Epoch: 10   Global Step: 174340   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:42:14,373-Speed 3332.57 samples/sec   Loss 1.5487   LearningRate 0.0228   Epoch: 10   Global Step: 174350   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:42:17,470-Speed 3306.70 samples/sec   Loss 1.5244   LearningRate 0.0228   Epoch: 10   Global Step: 174360   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:42:20,539-Speed 3337.45 samples/sec   Loss 1.5526   LearningRate 0.0228   Epoch: 10   Global Step: 174370   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:42:23,659-Speed 3283.15 samples/sec   Loss 1.5243   LearningRate 0.0228   Epoch: 10   Global Step: 174380   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:42:26,734-Speed 3330.32 samples/sec   Loss 1.5205   LearningRate 0.0228   Epoch: 10   Global Step: 174390   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:42:29,822-Speed 3316.60 samples/sec   Loss 1.5556   LearningRate 0.0228   Epoch: 10   Global Step: 174400   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:42:32,995-Speed 3228.99 samples/sec   Loss 1.5539   LearningRate 0.0228   Epoch: 10   Global Step: 174410   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-11 17:42:36,054-Speed 3347.83 samples/sec   Loss 1.5469   LearningRate 0.0228   Epoch: 10   Global Step: 174420   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:42:39,140-Speed 3319.16 samples/sec   Loss 1.5728   LearningRate 0.0228   Epoch: 10   Global Step: 174430   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:42:42,227-Speed 3318.37 samples/sec   Loss 1.5864   LearningRate 0.0228   Epoch: 10   Global Step: 174440   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:42:45,314-Speed 3317.43 samples/sec   Loss 1.5340   LearningRate 0.0228   Epoch: 10   Global Step: 174450   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:42:48,397-Speed 3321.50 samples/sec   Loss 1.4819   LearningRate 0.0228   Epoch: 10   Global Step: 174460   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:42:51,475-Speed 3328.60 samples/sec   Loss 1.5679   LearningRate 0.0228   Epoch: 10   Global Step: 174470   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:42:54,591-Speed 3286.46 samples/sec   Loss 1.5694   LearningRate 0.0228   Epoch: 10   Global Step: 174480   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:42:57,673-Speed 3323.32 samples/sec   Loss 1.5748   LearningRate 0.0228   Epoch: 10   Global Step: 174490   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:43:00,747-Speed 3331.56 samples/sec   Loss 1.5347   LearningRate 0.0228   Epoch: 10   Global Step: 174500   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:43:03,824-Speed 3329.00 samples/sec   Loss 1.5218   LearningRate 0.0228   Epoch: 10   Global Step: 174510   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:43:06,905-Speed 3324.75 samples/sec   Loss 1.5613   LearningRate 0.0228   Epoch: 10   Global Step: 174520   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-11 17:43:09,977-Speed 3334.19 samples/sec   Loss 1.5913   LearningRate 0.0228   Epoch: 10   Global Step: 174530   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-11 17:43:13,145-Speed 3232.70 samples/sec   Loss 1.5684   LearningRate 0.0228   Epoch: 10   Global Step: 174540   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-11 17:43:16,310-Speed 3236.59 samples/sec   Loss 1.5320   LearningRate 0.0228   Epoch: 10   Global Step: 174550   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:43:19,380-Speed 3335.20 samples/sec   Loss 1.5900   LearningRate 0.0228   Epoch: 10   Global Step: 174560   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:43:22,461-Speed 3324.88 samples/sec   Loss 1.5536   LearningRate 0.0228   Epoch: 10   Global Step: 174570   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:43:25,563-Speed 3302.20 samples/sec   Loss 1.5303   LearningRate 0.0228   Epoch: 10   Global Step: 174580   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:43:28,655-Speed 3311.92 samples/sec   Loss 1.5025   LearningRate 0.0228   Epoch: 10   Global Step: 174590   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:43:31,799-Speed 3257.92 samples/sec   Loss 1.5292   LearningRate 0.0227   Epoch: 10   Global Step: 174600   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:43:34,930-Speed 3271.56 samples/sec   Loss 1.5277   LearningRate 0.0227   Epoch: 10   Global Step: 174610   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:43:38,011-Speed 3323.85 samples/sec   Loss 1.5618   LearningRate 0.0227   Epoch: 10   Global Step: 174620   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:43:41,098-Speed 3319.12 samples/sec   Loss 1.5165   LearningRate 0.0227   Epoch: 10   Global Step: 174630   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:43:44,186-Speed 3315.94 samples/sec   Loss 1.5859   LearningRate 0.0227   Epoch: 10   Global Step: 174640   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:43:47,380-Speed 3206.91 samples/sec   Loss 1.5444   LearningRate 0.0227   Epoch: 10   Global Step: 174650   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-11 17:43:50,474-Speed 3310.15 samples/sec   Loss 1.5479   LearningRate 0.0227   Epoch: 10   Global Step: 174660   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-11 17:43:53,564-Speed 3315.49 samples/sec   Loss 1.5724   LearningRate 0.0227   Epoch: 10   Global Step: 174670   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-11 17:43:56,632-Speed 3337.81 samples/sec   Loss 1.5333   LearningRate 0.0227   Epoch: 10   Global Step: 174680   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-11 17:43:59,705-Speed 3333.16 samples/sec   Loss 1.5493   LearningRate 0.0227   Epoch: 10   Global Step: 174690   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-11 17:44:02,783-Speed 3327.39 samples/sec   Loss 1.4885   LearningRate 0.0227   Epoch: 10   Global Step: 174700   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-11 17:44:05,860-Speed 3329.10 samples/sec   Loss 1.5170   LearningRate 0.0227   Epoch: 10   Global Step: 174710   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-11 17:44:08,966-Speed 3298.13 samples/sec   Loss 1.5389   LearningRate 0.0227   Epoch: 10   Global Step: 174720   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-11 17:44:12,043-Speed 3328.32 samples/sec   Loss 1.5764   LearningRate 0.0227   Epoch: 10   Global Step: 174730   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:44:15,143-Speed 3303.76 samples/sec   Loss 1.5608   LearningRate 0.0227   Epoch: 10   Global Step: 174740   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:44:18,275-Speed 3269.77 samples/sec   Loss 1.5732   LearningRate 0.0227   Epoch: 10   Global Step: 174750   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:44:21,352-Speed 3329.44 samples/sec   Loss 1.5254   LearningRate 0.0227   Epoch: 10   Global Step: 174760   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:44:24,449-Speed 3306.92 samples/sec   Loss 1.5571   LearningRate 0.0227   Epoch: 10   Global Step: 174770   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:44:27,540-Speed 3314.15 samples/sec   Loss 1.5767   LearningRate 0.0227   Epoch: 10   Global Step: 174780   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:44:30,628-Speed 3316.36 samples/sec   Loss 1.5259   LearningRate 0.0227   Epoch: 10   Global Step: 174790   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:44:33,707-Speed 3326.23 samples/sec   Loss 1.4278   LearningRate 0.0227   Epoch: 10   Global Step: 174800   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:44:36,811-Speed 3300.32 samples/sec   Loss 1.5288   LearningRate 0.0227   Epoch: 10   Global Step: 174810   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:44:39,898-Speed 3317.79 samples/sec   Loss 1.5265   LearningRate 0.0227   Epoch: 10   Global Step: 174820   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:44:42,969-Speed 3334.43 samples/sec   Loss 1.5430   LearningRate 0.0227   Epoch: 10   Global Step: 174830   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:44:46,045-Speed 3330.14 samples/sec   Loss 1.6048   LearningRate 0.0227   Epoch: 10   Global Step: 174840   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:44:49,124-Speed 3326.97 samples/sec   Loss 1.5234   LearningRate 0.0227   Epoch: 10   Global Step: 174850   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-11 17:44:52,205-Speed 3324.38 samples/sec   Loss 1.6091   LearningRate 0.0227   Epoch: 10   Global Step: 174860   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-11 17:44:55,280-Speed 3331.19 samples/sec   Loss 1.5361   LearningRate 0.0227   Epoch: 10   Global Step: 174870   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-11 17:44:58,390-Speed 3293.04 samples/sec   Loss 1.5768   LearningRate 0.0227   Epoch: 10   Global Step: 174880   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-11 17:45:01,498-Speed 3295.14 samples/sec   Loss 1.5720   LearningRate 0.0227   Epoch: 10   Global Step: 174890   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-11 17:45:04,580-Speed 3323.86 samples/sec   Loss 1.5275   LearningRate 0.0227   Epoch: 10   Global Step: 174900   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-11 17:45:07,707-Speed 3275.12 samples/sec   Loss 1.5636   LearningRate 0.0227   Epoch: 10   Global Step: 174910   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-11 17:45:10,874-Speed 3234.40 samples/sec   Loss 1.4808   LearningRate 0.0227   Epoch: 10   Global Step: 174920   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-11 17:45:14,004-Speed 3273.18 samples/sec   Loss 1.4790   LearningRate 0.0227   Epoch: 10   Global Step: 174930   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-11 17:45:17,079-Speed 3330.25 samples/sec   Loss 1.5381   LearningRate 0.0227   Epoch: 10   Global Step: 174940   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-11 17:45:20,154-Speed 3331.31 samples/sec   Loss 1.5712   LearningRate 0.0226   Epoch: 10   Global Step: 174950   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:45:23,237-Speed 3322.60 samples/sec   Loss 1.5414   LearningRate 0.0226   Epoch: 10   Global Step: 174960   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:45:26,315-Speed 3327.11 samples/sec   Loss 1.5521   LearningRate 0.0226   Epoch: 10   Global Step: 174970   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:45:29,407-Speed 3312.61 samples/sec   Loss 1.5421   LearningRate 0.0226   Epoch: 10   Global Step: 174980   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:45:32,479-Speed 3333.88 samples/sec   Loss 1.5543   LearningRate 0.0226   Epoch: 10   Global Step: 174990   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:45:35,570-Speed 3313.66 samples/sec   Loss 1.5635   LearningRate 0.0226   Epoch: 10   Global Step: 175000   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:45:38,701-Speed 3271.29 samples/sec   Loss 1.5304   LearningRate 0.0226   Epoch: 10   Global Step: 175010   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:45:41,808-Speed 3296.96 samples/sec   Loss 1.6036   LearningRate 0.0226   Epoch: 10   Global Step: 175020   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:45:44,896-Speed 3316.27 samples/sec   Loss 1.5377   LearningRate 0.0226   Epoch: 10   Global Step: 175030   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:45:48,008-Speed 3290.97 samples/sec   Loss 1.5159   LearningRate 0.0226   Epoch: 10   Global Step: 175040   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:45:51,095-Speed 3317.98 samples/sec   Loss 1.5552   LearningRate 0.0226   Epoch: 10   Global Step: 175050   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-11 17:45:54,174-Speed 3326.65 samples/sec   Loss 1.5082   LearningRate 0.0226   Epoch: 10   Global Step: 175060   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-11 17:45:57,252-Speed 3327.31 samples/sec   Loss 1.5557   LearningRate 0.0226   Epoch: 10   Global Step: 175070   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-11 17:46:00,338-Speed 3318.88 samples/sec   Loss 1.6094   LearningRate 0.0226   Epoch: 10   Global Step: 175080   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:46:03,451-Speed 3291.07 samples/sec   Loss 1.5609   LearningRate 0.0226   Epoch: 10   Global Step: 175090   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:46:06,561-Speed 3293.73 samples/sec   Loss 1.5369   LearningRate 0.0226   Epoch: 10   Global Step: 175100   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:46:09,828-Speed 3135.24 samples/sec   Loss 1.5602   LearningRate 0.0226   Epoch: 10   Global Step: 175110   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:46:13,012-Speed 3216.02 samples/sec   Loss 1.5638   LearningRate 0.0226   Epoch: 10   Global Step: 175120   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:46:16,099-Speed 3318.03 samples/sec   Loss 1.4994   LearningRate 0.0226   Epoch: 10   Global Step: 175130   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:46:19,185-Speed 3318.86 samples/sec   Loss 1.5314   LearningRate 0.0226   Epoch: 10   Global Step: 175140   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:46:22,320-Speed 3267.82 samples/sec   Loss 1.5646   LearningRate 0.0226   Epoch: 10   Global Step: 175150   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:46:25,447-Speed 3274.68 samples/sec   Loss 1.5645   LearningRate 0.0226   Epoch: 10   Global Step: 175160   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:46:28,544-Speed 3308.07 samples/sec   Loss 1.5305   LearningRate 0.0226   Epoch: 10   Global Step: 175170   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:46:31,629-Speed 3320.02 samples/sec   Loss 1.5802   LearningRate 0.0226   Epoch: 10   Global Step: 175180   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-11 17:46:34,720-Speed 3313.19 samples/sec   Loss 1.5922   LearningRate 0.0226   Epoch: 10   Global Step: 175190   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-11 17:46:37,814-Speed 3311.07 samples/sec   Loss 1.5078   LearningRate 0.0226   Epoch: 10   Global Step: 175200   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-11 17:46:40,892-Speed 3327.31 samples/sec   Loss 1.5459   LearningRate 0.0226   Epoch: 10   Global Step: 175210   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-11 17:46:44,013-Speed 3281.87 samples/sec   Loss 1.5541   LearningRate 0.0226   Epoch: 10   Global Step: 175220   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-11 17:46:47,115-Speed 3301.15 samples/sec   Loss 1.5389   LearningRate 0.0226   Epoch: 10   Global Step: 175230   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-11 17:46:50,195-Speed 3325.65 samples/sec   Loss 1.5616   LearningRate 0.0226   Epoch: 10   Global Step: 175240   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-11 17:46:53,314-Speed 3284.18 samples/sec   Loss 1.5166   LearningRate 0.0226   Epoch: 10   Global Step: 175250   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-11 17:46:56,447-Speed 3269.47 samples/sec   Loss 1.5882   LearningRate 0.0226   Epoch: 10   Global Step: 175260   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-11 17:46:59,521-Speed 3331.26 samples/sec   Loss 1.5726   LearningRate 0.0226   Epoch: 10   Global Step: 175270   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-11 17:47:02,657-Speed 3266.56 samples/sec   Loss 1.4880   LearningRate 0.0226   Epoch: 10   Global Step: 175280   Fp16 Grad Scale: 262144   Required: 17 hours
Training: 2022-04-11 17:47:05,730-Speed 3332.71 samples/sec   Loss 1.5325   LearningRate 0.0226   Epoch: 10   Global Step: 175290   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-11 17:47:08,827-Speed 3307.17 samples/sec   Loss 1.5902   LearningRate 0.0225   Epoch: 10   Global Step: 175300   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-11 17:47:11,900-Speed 3333.17 samples/sec   Loss 1.5313   LearningRate 0.0225   Epoch: 10   Global Step: 175310   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-11 17:47:14,975-Speed 3330.35 samples/sec   Loss 1.5304   LearningRate 0.0225   Epoch: 10   Global Step: 175320   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-11 17:47:18,057-Speed 3323.86 samples/sec   Loss 1.5060   LearningRate 0.0225   Epoch: 10   Global Step: 175330   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-11 17:47:21,181-Speed 3277.80 samples/sec   Loss 1.5374   LearningRate 0.0225   Epoch: 10   Global Step: 175340   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-11 17:47:24,330-Speed 3253.12 samples/sec   Loss 1.5674   LearningRate 0.0225   Epoch: 10   Global Step: 175350   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-11 17:47:27,452-Speed 3281.45 samples/sec   Loss 1.5273   LearningRate 0.0225   Epoch: 10   Global Step: 175360   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-11 17:47:30,572-Speed 3282.62 samples/sec   Loss 1.5095   LearningRate 0.0225   Epoch: 10   Global Step: 175370   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:47:33,662-Speed 3314.63 samples/sec   Loss 1.5320   LearningRate 0.0225   Epoch: 10   Global Step: 175380   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:47:36,740-Speed 3327.66 samples/sec   Loss 1.5394   LearningRate 0.0225   Epoch: 10   Global Step: 175390   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:47:39,820-Speed 3324.98 samples/sec   Loss 1.5968   LearningRate 0.0225   Epoch: 10   Global Step: 175400   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:47:42,897-Speed 3328.91 samples/sec   Loss 1.5511   LearningRate 0.0225   Epoch: 10   Global Step: 175410   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:47:45,972-Speed 3330.49 samples/sec   Loss 1.5681   LearningRate 0.0225   Epoch: 10   Global Step: 175420   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:47:49,076-Speed 3299.60 samples/sec   Loss 1.5497   LearningRate 0.0225   Epoch: 10   Global Step: 175430   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:47:52,150-Speed 3332.24 samples/sec   Loss 1.5052   LearningRate 0.0225   Epoch: 10   Global Step: 175440   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:47:55,225-Speed 3331.21 samples/sec   Loss 1.5973   LearningRate 0.0225   Epoch: 10   Global Step: 175450   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:47:58,298-Speed 3333.54 samples/sec   Loss 1.5101   LearningRate 0.0225   Epoch: 10   Global Step: 175460   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:48:01,388-Speed 3314.24 samples/sec   Loss 1.5330   LearningRate 0.0225   Epoch: 10   Global Step: 175470   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-11 17:48:04,493-Speed 3298.59 samples/sec   Loss 1.5919   LearningRate 0.0225   Epoch: 10   Global Step: 175480   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-11 17:48:07,568-Speed 3330.99 samples/sec   Loss 1.5218   LearningRate 0.0225   Epoch: 10   Global Step: 175490   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-11 17:48:10,643-Speed 3330.65 samples/sec   Loss 1.6634   LearningRate 0.0225   Epoch: 10   Global Step: 175500   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-11 17:48:13,738-Speed 3309.73 samples/sec   Loss 1.5563   LearningRate 0.0225   Epoch: 10   Global Step: 175510   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-11 17:48:16,985-Speed 3154.34 samples/sec   Loss 1.5327   LearningRate 0.0225   Epoch: 10   Global Step: 175520   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-11 17:48:20,111-Speed 3276.65 samples/sec   Loss 1.5486   LearningRate 0.0225   Epoch: 10   Global Step: 175530   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-11 17:48:23,194-Speed 3322.50 samples/sec   Loss 1.5029   LearningRate 0.0225   Epoch: 10   Global Step: 175540   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-11 17:48:26,313-Speed 3283.66 samples/sec   Loss 1.4802   LearningRate 0.0225   Epoch: 10   Global Step: 175550   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-11 17:48:29,387-Speed 3332.17 samples/sec   Loss 1.5118   LearningRate 0.0225   Epoch: 10   Global Step: 175560   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-11 17:48:32,453-Speed 3340.34 samples/sec   Loss 1.5755   LearningRate 0.0225   Epoch: 10   Global Step: 175570   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-11 17:48:35,539-Speed 3318.88 samples/sec   Loss 1.5725   LearningRate 0.0225   Epoch: 10   Global Step: 175580   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-11 17:48:38,625-Speed 3319.53 samples/sec   Loss 1.5714   LearningRate 0.0225   Epoch: 10   Global Step: 175590   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-11 17:48:41,707-Speed 3322.81 samples/sec   Loss 1.5889   LearningRate 0.0225   Epoch: 10   Global Step: 175600   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-11 17:48:44,797-Speed 3314.81 samples/sec   Loss 1.5174   LearningRate 0.0225   Epoch: 10   Global Step: 175610   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-11 17:48:47,887-Speed 3314.84 samples/sec   Loss 1.5220   LearningRate 0.0225   Epoch: 10   Global Step: 175620   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-11 17:48:51,093-Speed 3195.00 samples/sec   Loss 1.5673   LearningRate 0.0225   Epoch: 10   Global Step: 175630   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-11 17:48:54,210-Speed 3286.26 samples/sec   Loss 1.5352   LearningRate 0.0225   Epoch: 10   Global Step: 175640   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-11 17:48:57,287-Speed 3328.44 samples/sec   Loss 1.5652   LearningRate 0.0225   Epoch: 10   Global Step: 175650   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-11 17:49:00,380-Speed 3311.29 samples/sec   Loss 1.5164   LearningRate 0.0224   Epoch: 10   Global Step: 175660   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-11 17:49:03,453-Speed 3332.73 samples/sec   Loss 1.5898   LearningRate 0.0224   Epoch: 10   Global Step: 175670   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:49:06,581-Speed 3275.04 samples/sec   Loss 1.5728   LearningRate 0.0224   Epoch: 10   Global Step: 175680   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:49:09,706-Speed 3278.14 samples/sec   Loss 1.5688   LearningRate 0.0224   Epoch: 10   Global Step: 175690   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:49:12,798-Speed 3311.96 samples/sec   Loss 1.5271   LearningRate 0.0224   Epoch: 10   Global Step: 175700   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:49:15,940-Speed 3260.46 samples/sec   Loss 1.5321   LearningRate 0.0224   Epoch: 10   Global Step: 175710   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:49:19,071-Speed 3270.94 samples/sec   Loss 1.5655   LearningRate 0.0224   Epoch: 10   Global Step: 175720   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:49:22,252-Speed 3220.31 samples/sec   Loss 1.5698   LearningRate 0.0224   Epoch: 10   Global Step: 175730   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:49:25,404-Speed 3248.98 samples/sec   Loss 1.5311   LearningRate 0.0224   Epoch: 10   Global Step: 175740   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:49:28,502-Speed 3306.36 samples/sec   Loss 1.5470   LearningRate 0.0224   Epoch: 10   Global Step: 175750   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:49:31,584-Speed 3322.68 samples/sec   Loss 1.5356   LearningRate 0.0224   Epoch: 10   Global Step: 175760   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:49:34,674-Speed 3315.49 samples/sec   Loss 1.5383   LearningRate 0.0224   Epoch: 10   Global Step: 175770   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-11 17:49:37,751-Speed 3328.24 samples/sec   Loss 1.6621   LearningRate 0.0224   Epoch: 10   Global Step: 175780   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-11 17:49:40,833-Speed 3324.03 samples/sec   Loss 1.5478   LearningRate 0.0224   Epoch: 10   Global Step: 175790   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-11 17:49:43,913-Speed 3325.25 samples/sec   Loss 1.6002   LearningRate 0.0224   Epoch: 10   Global Step: 175800   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-11 17:49:47,010-Speed 3307.30 samples/sec   Loss 1.4812   LearningRate 0.0224   Epoch: 10   Global Step: 175810   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-11 17:49:50,120-Speed 3293.59 samples/sec   Loss 1.5608   LearningRate 0.0224   Epoch: 10   Global Step: 175820   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-11 17:49:53,195-Speed 3330.18 samples/sec   Loss 1.5532   LearningRate 0.0224   Epoch: 10   Global Step: 175830   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-11 17:49:56,292-Speed 3307.30 samples/sec   Loss 1.5488   LearningRate 0.0224   Epoch: 10   Global Step: 175840   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:49:59,385-Speed 3311.06 samples/sec   Loss 1.5884   LearningRate 0.0224   Epoch: 10   Global Step: 175850   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:50:02,483-Speed 3306.46 samples/sec   Loss 1.5731   LearningRate 0.0224   Epoch: 10   Global Step: 175860   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-11 17:50:05,571-Speed 3317.89 samples/sec   Loss 1.5964   LearningRate 0.0224   Epoch: 10   Global Step: 175870   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-11 17:50:08,728-Speed 3243.91 samples/sec   Loss 1.5906   LearningRate 0.0224   Epoch: 10   Global Step: 175880   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-11 17:50:11,816-Speed 3317.14 samples/sec   Loss 1.5621   LearningRate 0.0224   Epoch: 10   Global Step: 175890   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-11 17:50:14,926-Speed 3292.81 samples/sec   Loss 1.5956   LearningRate 0.0224   Epoch: 10   Global Step: 175900   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-11 17:50:18,079-Speed 3248.63 samples/sec   Loss 1.5534   LearningRate 0.0224   Epoch: 10   Global Step: 175910   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-11 17:50:21,185-Speed 3297.83 samples/sec   Loss 1.5449   LearningRate 0.0224   Epoch: 10   Global Step: 175920   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-11 17:50:24,287-Speed 3301.72 samples/sec   Loss 1.5848   LearningRate 0.0224   Epoch: 10   Global Step: 175930   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-11 17:50:27,381-Speed 3310.47 samples/sec   Loss 1.5397   LearningRate 0.0224   Epoch: 10   Global Step: 175940   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-11 17:50:30,473-Speed 3312.61 samples/sec   Loss 1.5698   LearningRate 0.0224   Epoch: 10   Global Step: 175950   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-11 17:50:33,561-Speed 3316.27 samples/sec   Loss 1.5277   LearningRate 0.0224   Epoch: 10   Global Step: 175960   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:50:36,637-Speed 3330.36 samples/sec   Loss 1.5390   LearningRate 0.0224   Epoch: 10   Global Step: 175970   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:50:39,717-Speed 3325.65 samples/sec   Loss 1.5441   LearningRate 0.0224   Epoch: 10   Global Step: 175980   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:50:42,795-Speed 3326.81 samples/sec   Loss 1.5584   LearningRate 0.0224   Epoch: 10   Global Step: 175990   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:50:45,915-Speed 3283.34 samples/sec   Loss 1.6196   LearningRate 0.0224   Epoch: 10   Global Step: 176000   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:51:29,901-[lfw][176000]XNorm: 22.923849
Training: 2022-04-11 17:51:29,902-[lfw][176000]Accuracy-Flip: 0.99783+-0.00279
Training: 2022-04-11 17:51:29,902-[lfw][176000]Accuracy-Highest: 0.99817
Training: 2022-04-11 17:52:20,760-[cfp_fp][176000]XNorm: 22.563153
Training: 2022-04-11 17:52:20,761-[cfp_fp][176000]Accuracy-Flip: 0.98900+-0.00429
Training: 2022-04-11 17:52:20,761-[cfp_fp][176000]Accuracy-Highest: 0.98971
Training: 2022-04-11 17:53:04,580-[agedb_30][176000]XNorm: 23.380471
Training: 2022-04-11 17:53:04,581-[agedb_30][176000]Accuracy-Flip: 0.98317+-0.00621
Training: 2022-04-11 17:53:04,581-[agedb_30][176000]Accuracy-Highest: 0.98450
Training: 2022-04-11 17:53:07,688-Speed 72.23 samples/sec   Loss 1.5798   LearningRate 0.0223   Epoch: 10   Global Step: 176010   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:53:10,842-Speed 3247.13 samples/sec   Loss 1.5773   LearningRate 0.0223   Epoch: 10   Global Step: 176020   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:53:13,927-Speed 3320.09 samples/sec   Loss 1.5639   LearningRate 0.0223   Epoch: 10   Global Step: 176030   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:53:17,039-Speed 3291.73 samples/sec   Loss 1.5475   LearningRate 0.0223   Epoch: 10   Global Step: 176040   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:53:20,164-Speed 3277.58 samples/sec   Loss 1.6111   LearningRate 0.0223   Epoch: 10   Global Step: 176050   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:53:23,235-Speed 3334.78 samples/sec   Loss 1.5349   LearningRate 0.0223   Epoch: 10   Global Step: 176060   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-11 17:53:26,323-Speed 3317.16 samples/sec   Loss 1.5371   LearningRate 0.0223   Epoch: 10   Global Step: 176070   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-11 17:53:29,424-Speed 3303.42 samples/sec   Loss 1.5552   LearningRate 0.0223   Epoch: 10   Global Step: 176080   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-11 17:53:32,574-Speed 3250.87 samples/sec   Loss 1.5935   LearningRate 0.0223   Epoch: 10   Global Step: 176090   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-11 17:53:35,699-Speed 3277.52 samples/sec   Loss 1.5496   LearningRate 0.0223   Epoch: 10   Global Step: 176100   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-11 17:53:38,788-Speed 3316.21 samples/sec   Loss 1.6334   LearningRate 0.0223   Epoch: 10   Global Step: 176110   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-11 17:53:41,926-Speed 3264.24 samples/sec   Loss 1.5168   LearningRate 0.0223   Epoch: 10   Global Step: 176120   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-11 17:53:45,044-Speed 3284.42 samples/sec   Loss 1.5419   LearningRate 0.0223   Epoch: 10   Global Step: 176130   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-11 17:53:48,180-Speed 3266.37 samples/sec   Loss 1.5072   LearningRate 0.0223   Epoch: 10   Global Step: 176140   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-11 17:53:51,271-Speed 3313.03 samples/sec   Loss 1.6038   LearningRate 0.0223   Epoch: 10   Global Step: 176150   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-11 17:53:54,340-Speed 3338.56 samples/sec   Loss 1.5992   LearningRate 0.0223   Epoch: 10   Global Step: 176160   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-11 17:53:57,416-Speed 3329.09 samples/sec   Loss 1.4829   LearningRate 0.0223   Epoch: 10   Global Step: 176170   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-11 17:54:00,508-Speed 3313.21 samples/sec   Loss 1.5664   LearningRate 0.0223   Epoch: 10   Global Step: 176180   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-11 17:54:03,582-Speed 3331.75 samples/sec   Loss 1.6159   LearningRate 0.0223   Epoch: 10   Global Step: 176190   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-11 17:54:06,675-Speed 3311.53 samples/sec   Loss 1.5468   LearningRate 0.0223   Epoch: 10   Global Step: 176200   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-11 17:54:09,904-Speed 3171.41 samples/sec   Loss 1.5998   LearningRate 0.0223   Epoch: 10   Global Step: 176210   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-11 17:54:13,046-Speed 3260.47 samples/sec   Loss 1.5858   LearningRate 0.0223   Epoch: 10   Global Step: 176220   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-11 17:54:16,116-Speed 3335.83 samples/sec   Loss 1.5960   LearningRate 0.0223   Epoch: 10   Global Step: 176230   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-11 17:54:19,227-Speed 3292.69 samples/sec   Loss 1.5291   LearningRate 0.0223   Epoch: 10   Global Step: 176240   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:54:22,310-Speed 3322.53 samples/sec   Loss 1.5990   LearningRate 0.0223   Epoch: 10   Global Step: 176250   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:54:25,385-Speed 3330.61 samples/sec   Loss 1.5199   LearningRate 0.0223   Epoch: 10   Global Step: 176260   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:54:28,466-Speed 3324.71 samples/sec   Loss 1.5281   LearningRate 0.0223   Epoch: 10   Global Step: 176270   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:54:31,544-Speed 3326.82 samples/sec   Loss 1.5447   LearningRate 0.0223   Epoch: 10   Global Step: 176280   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:54:34,627-Speed 3322.34 samples/sec   Loss 1.5384   LearningRate 0.0223   Epoch: 10   Global Step: 176290   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:54:37,700-Speed 3333.04 samples/sec   Loss 1.5442   LearningRate 0.0223   Epoch: 10   Global Step: 176300   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:54:40,775-Speed 3330.25 samples/sec   Loss 1.5290   LearningRate 0.0223   Epoch: 10   Global Step: 176310   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:54:43,860-Speed 3320.84 samples/sec   Loss 1.5816   LearningRate 0.0223   Epoch: 10   Global Step: 176320   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:54:46,939-Speed 3326.36 samples/sec   Loss 1.5686   LearningRate 0.0223   Epoch: 10   Global Step: 176330   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:54:50,027-Speed 3317.52 samples/sec   Loss 1.5549   LearningRate 0.0223   Epoch: 10   Global Step: 176340   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-11 17:54:53,110-Speed 3321.67 samples/sec   Loss 1.5725   LearningRate 0.0223   Epoch: 10   Global Step: 176350   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-11 17:54:56,195-Speed 3319.84 samples/sec   Loss 1.5804   LearningRate 0.0222   Epoch: 10   Global Step: 176360   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-11 17:54:59,282-Speed 3318.53 samples/sec   Loss 1.5548   LearningRate 0.0222   Epoch: 10   Global Step: 176370   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-11 17:55:02,387-Speed 3298.02 samples/sec   Loss 1.5238   LearningRate 0.0222   Epoch: 10   Global Step: 176380   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:55:05,465-Speed 3328.22 samples/sec   Loss 1.5740   LearningRate 0.0222   Epoch: 10   Global Step: 176390   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:55:08,560-Speed 3309.15 samples/sec   Loss 1.5971   LearningRate 0.0222   Epoch: 10   Global Step: 176400   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:55:11,633-Speed 3332.29 samples/sec   Loss 1.5207   LearningRate 0.0222   Epoch: 10   Global Step: 176410   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:55:14,714-Speed 3324.35 samples/sec   Loss 1.5553   LearningRate 0.0222   Epoch: 10   Global Step: 176420   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:55:17,796-Speed 3325.07 samples/sec   Loss 1.5741   LearningRate 0.0222   Epoch: 10   Global Step: 176430   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:55:20,864-Speed 3338.22 samples/sec   Loss 1.5566   LearningRate 0.0222   Epoch: 10   Global Step: 176440   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:55:23,967-Speed 3300.24 samples/sec   Loss 1.5653   LearningRate 0.0222   Epoch: 10   Global Step: 176450   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:55:27,055-Speed 3316.72 samples/sec   Loss 1.5647   LearningRate 0.0222   Epoch: 10   Global Step: 176460   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:55:30,126-Speed 3336.26 samples/sec   Loss 1.5427   LearningRate 0.0222   Epoch: 10   Global Step: 176470   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:55:33,198-Speed 3334.01 samples/sec   Loss 1.5702   LearningRate 0.0222   Epoch: 10   Global Step: 176480   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-11 17:55:36,283-Speed 3319.66 samples/sec   Loss 1.6373   LearningRate 0.0222   Epoch: 10   Global Step: 176490   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-11 17:55:39,402-Speed 3283.30 samples/sec   Loss 1.5098   LearningRate 0.0222   Epoch: 10   Global Step: 176500   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-11 17:55:42,484-Speed 3323.18 samples/sec   Loss 1.5651   LearningRate 0.0222   Epoch: 10   Global Step: 176510   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-11 17:55:45,640-Speed 3245.82 samples/sec   Loss 1.6292   LearningRate 0.0222   Epoch: 10   Global Step: 176520   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-11 17:55:48,785-Speed 3256.79 samples/sec   Loss 1.5606   LearningRate 0.0222   Epoch: 10   Global Step: 176530   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-11 17:55:51,879-Speed 3310.31 samples/sec   Loss 1.5744   LearningRate 0.0222   Epoch: 10   Global Step: 176540   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-11 17:55:54,960-Speed 3324.62 samples/sec   Loss 1.5794   LearningRate 0.0222   Epoch: 10   Global Step: 176550   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-11 17:55:58,029-Speed 3337.42 samples/sec   Loss 1.5907   LearningRate 0.0222   Epoch: 10   Global Step: 176560   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-11 17:56:01,099-Speed 3335.90 samples/sec   Loss 1.5714   LearningRate 0.0222   Epoch: 10   Global Step: 176570   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-11 17:56:04,171-Speed 3335.04 samples/sec   Loss 1.5689   LearningRate 0.0222   Epoch: 10   Global Step: 176580   Fp16 Grad Scale: 262144   Required: 17 hours
Training: 2022-04-11 17:56:07,281-Speed 3292.61 samples/sec   Loss 1.5230   LearningRate 0.0222   Epoch: 10   Global Step: 176590   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-11 17:56:10,362-Speed 3324.70 samples/sec   Loss 1.5536   LearningRate 0.0222   Epoch: 10   Global Step: 176600   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-11 17:56:13,434-Speed 3334.87 samples/sec   Loss 1.5317   LearningRate 0.0222   Epoch: 10   Global Step: 176610   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-11 17:56:16,506-Speed 3333.76 samples/sec   Loss 1.6087   LearningRate 0.0222   Epoch: 10   Global Step: 176620   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-11 17:56:19,583-Speed 3329.33 samples/sec   Loss 1.5351   LearningRate 0.0222   Epoch: 10   Global Step: 176630   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-11 17:56:22,659-Speed 3329.69 samples/sec   Loss 1.5800   LearningRate 0.0222   Epoch: 10   Global Step: 176640   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-11 17:56:25,736-Speed 3328.14 samples/sec   Loss 1.5160   LearningRate 0.0222   Epoch: 10   Global Step: 176650   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:56:28,856-Speed 3282.30 samples/sec   Loss 1.5617   LearningRate 0.0222   Epoch: 10   Global Step: 176660   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:56:32,045-Speed 3212.17 samples/sec   Loss 1.5207   LearningRate 0.0222   Epoch: 10   Global Step: 176670   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:56:35,201-Speed 3245.26 samples/sec   Loss 1.5761   LearningRate 0.0222   Epoch: 10   Global Step: 176680   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:56:38,288-Speed 3318.18 samples/sec   Loss 1.5185   LearningRate 0.0222   Epoch: 10   Global Step: 176690   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:56:41,372-Speed 3320.80 samples/sec   Loss 1.5380   LearningRate 0.0222   Epoch: 10   Global Step: 176700   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:56:44,458-Speed 3319.87 samples/sec   Loss 1.5288   LearningRate 0.0222   Epoch: 10   Global Step: 176710   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:56:47,545-Speed 3318.01 samples/sec   Loss 1.5275   LearningRate 0.0221   Epoch: 10   Global Step: 176720   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:56:50,624-Speed 3326.25 samples/sec   Loss 1.6209   LearningRate 0.0221   Epoch: 10   Global Step: 176730   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:56:53,775-Speed 3251.35 samples/sec   Loss 1.5267   LearningRate 0.0221   Epoch: 10   Global Step: 176740   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:56:56,905-Speed 3272.24 samples/sec   Loss 1.5842   LearningRate 0.0221   Epoch: 10   Global Step: 176750   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-11 17:57:00,026-Speed 3281.70 samples/sec   Loss 1.5763   LearningRate 0.0221   Epoch: 10   Global Step: 176760   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-11 17:57:03,104-Speed 3327.38 samples/sec   Loss 1.5785   LearningRate 0.0221   Epoch: 10   Global Step: 176770   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-11 17:57:06,173-Speed 3337.55 samples/sec   Loss 1.6368   LearningRate 0.0221   Epoch: 10   Global Step: 176780   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:57:09,232-Speed 3348.35 samples/sec   Loss 1.6046   LearningRate 0.0221   Epoch: 10   Global Step: 176790   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-11 17:57:12,332-Speed 3303.71 samples/sec   Loss 1.5282   LearningRate 0.0221   Epoch: 10   Global Step: 176800   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-11 17:57:15,445-Speed 3290.18 samples/sec   Loss 1.5446   LearningRate 0.0221   Epoch: 10   Global Step: 176810   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-11 17:57:18,571-Speed 3276.86 samples/sec   Loss 1.5471   LearningRate 0.0221   Epoch: 10   Global Step: 176820   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-11 17:57:21,654-Speed 3322.72 samples/sec   Loss 1.5899   LearningRate 0.0221   Epoch: 10   Global Step: 176830   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-11 17:57:24,726-Speed 3333.68 samples/sec   Loss 1.5469   LearningRate 0.0221   Epoch: 10   Global Step: 176840   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-11 17:57:27,841-Speed 3288.27 samples/sec   Loss 1.5584   LearningRate 0.0221   Epoch: 10   Global Step: 176850   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-11 17:57:30,959-Speed 3285.21 samples/sec   Loss 1.5805   LearningRate 0.0221   Epoch: 10   Global Step: 176860   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-11 17:57:34,051-Speed 3312.68 samples/sec   Loss 1.5752   LearningRate 0.0221   Epoch: 10   Global Step: 176870   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-11 17:57:37,124-Speed 3333.03 samples/sec   Loss 1.5684   LearningRate 0.0221   Epoch: 10   Global Step: 176880   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-11 17:57:40,197-Speed 3332.45 samples/sec   Loss 1.5219   LearningRate 0.0221   Epoch: 10   Global Step: 176890   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:57:43,307-Speed 3293.20 samples/sec   Loss 1.5597   LearningRate 0.0221   Epoch: 10   Global Step: 176900   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:57:46,386-Speed 3326.97 samples/sec   Loss 1.6218   LearningRate 0.0221   Epoch: 10   Global Step: 176910   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:57:49,458-Speed 3334.53 samples/sec   Loss 1.5013   LearningRate 0.0221   Epoch: 10   Global Step: 176920   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:57:52,535-Speed 3328.64 samples/sec   Loss 1.5941   LearningRate 0.0221   Epoch: 10   Global Step: 176930   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:57:55,633-Speed 3305.58 samples/sec   Loss 1.5674   LearningRate 0.0221   Epoch: 10   Global Step: 176940   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:57:58,801-Speed 3233.88 samples/sec   Loss 1.5907   LearningRate 0.0221   Epoch: 10   Global Step: 176950   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:58:01,870-Speed 3337.70 samples/sec   Loss 1.6119   LearningRate 0.0221   Epoch: 10   Global Step: 176960   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:58:04,975-Speed 3299.27 samples/sec   Loss 1.5668   LearningRate 0.0221   Epoch: 10   Global Step: 176970   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:58:08,053-Speed 3327.37 samples/sec   Loss 1.5861   LearningRate 0.0221   Epoch: 10   Global Step: 176980   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:58:11,159-Speed 3297.77 samples/sec   Loss 1.5750   LearningRate 0.0221   Epoch: 10   Global Step: 176990   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-11 17:58:14,236-Speed 3328.71 samples/sec   Loss 1.5570   LearningRate 0.0221   Epoch: 10   Global Step: 177000   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-11 17:58:17,309-Speed 3333.57 samples/sec   Loss 1.6150   LearningRate 0.0221   Epoch: 10   Global Step: 177010   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-11 17:58:20,378-Speed 3336.53 samples/sec   Loss 1.6217   LearningRate 0.0221   Epoch: 10   Global Step: 177020   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-11 17:58:23,450-Speed 3334.19 samples/sec   Loss 1.5637   LearningRate 0.0221   Epoch: 10   Global Step: 177030   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-11 17:58:26,529-Speed 3327.67 samples/sec   Loss 1.5421   LearningRate 0.0221   Epoch: 10   Global Step: 177040   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-11 17:58:29,600-Speed 3335.21 samples/sec   Loss 1.5606   LearningRate 0.0221   Epoch: 10   Global Step: 177050   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-11 17:58:32,672-Speed 3333.88 samples/sec   Loss 1.5484   LearningRate 0.0221   Epoch: 10   Global Step: 177060   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-11 17:58:35,753-Speed 3324.62 samples/sec   Loss 1.5688   LearningRate 0.0220   Epoch: 10   Global Step: 177070   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-11 17:58:38,817-Speed 3342.29 samples/sec   Loss 1.5807   LearningRate 0.0220   Epoch: 10   Global Step: 177080   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:58:41,891-Speed 3331.62 samples/sec   Loss 1.5433   LearningRate 0.0220   Epoch: 10   Global Step: 177090   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-11 17:58:44,969-Speed 3327.60 samples/sec   Loss 1.5961   LearningRate 0.0220   Epoch: 10   Global Step: 177100   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-11 17:58:48,076-Speed 3296.76 samples/sec   Loss 1.5554   LearningRate 0.0220   Epoch: 10   Global Step: 177110   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-11 17:58:51,273-Speed 3204.63 samples/sec   Loss 1.6369   LearningRate 0.0220   Epoch: 10   Global Step: 177120   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-11 17:58:54,429-Speed 3245.53 samples/sec   Loss 1.5477   LearningRate 0.0220   Epoch: 10   Global Step: 177130   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-11 17:58:57,527-Speed 3305.40 samples/sec   Loss 1.6030   LearningRate 0.0220   Epoch: 10   Global Step: 177140   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-11 17:59:00,609-Speed 3322.95 samples/sec   Loss 1.5541   LearningRate 0.0220   Epoch: 10   Global Step: 177150   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-11 17:59:03,685-Speed 3330.51 samples/sec   Loss 1.6020   LearningRate 0.0220   Epoch: 10   Global Step: 177160   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-11 17:59:06,755-Speed 3336.14 samples/sec   Loss 1.5680   LearningRate 0.0220   Epoch: 10   Global Step: 177170   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-11 17:59:09,828-Speed 3332.88 samples/sec   Loss 1.6356   LearningRate 0.0220   Epoch: 10   Global Step: 177180   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-11 17:59:12,915-Speed 3318.47 samples/sec   Loss 1.5678   LearningRate 0.0220   Epoch: 10   Global Step: 177190   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:59:15,991-Speed 3329.22 samples/sec   Loss 1.5447   LearningRate 0.0220   Epoch: 10   Global Step: 177200   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:59:19,075-Speed 3321.14 samples/sec   Loss 1.6237   LearningRate 0.0220   Epoch: 10   Global Step: 177210   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:59:22,159-Speed 3321.78 samples/sec   Loss 1.5938   LearningRate 0.0220   Epoch: 10   Global Step: 177220   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:59:25,269-Speed 3293.27 samples/sec   Loss 1.5533   LearningRate 0.0220   Epoch: 10   Global Step: 177230   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:59:28,343-Speed 3332.37 samples/sec   Loss 1.5740   LearningRate 0.0220   Epoch: 10   Global Step: 177240   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:59:31,460-Speed 3285.88 samples/sec   Loss 1.5331   LearningRate 0.0220   Epoch: 10   Global Step: 177250   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:59:34,534-Speed 3331.27 samples/sec   Loss 1.5652   LearningRate 0.0220   Epoch: 10   Global Step: 177260   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:59:37,609-Speed 3331.53 samples/sec   Loss 1.5712   LearningRate 0.0220   Epoch: 10   Global Step: 177270   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:59:40,682-Speed 3332.77 samples/sec   Loss 1.5243   LearningRate 0.0220   Epoch: 10   Global Step: 177280   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 17:59:43,836-Speed 3247.00 samples/sec   Loss 1.5284   LearningRate 0.0220   Epoch: 10   Global Step: 177290   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-11 17:59:46,913-Speed 3330.57 samples/sec   Loss 1.5752   LearningRate 0.0220   Epoch: 10   Global Step: 177300   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-11 17:59:50,028-Speed 3288.44 samples/sec   Loss 1.5785   LearningRate 0.0220   Epoch: 10   Global Step: 177310   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-11 17:59:53,150-Speed 3280.30 samples/sec   Loss 1.5568   LearningRate 0.0220   Epoch: 10   Global Step: 177320   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-11 17:59:56,253-Speed 3301.16 samples/sec   Loss 1.5867   LearningRate 0.0220   Epoch: 10   Global Step: 177330   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-11 17:59:59,331-Speed 3327.08 samples/sec   Loss 1.6281   LearningRate 0.0220   Epoch: 10   Global Step: 177340   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 18:00:02,413-Speed 3323.01 samples/sec   Loss 1.6210   LearningRate 0.0220   Epoch: 10   Global Step: 177350   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 18:00:05,501-Speed 3317.65 samples/sec   Loss 1.5993   LearningRate 0.0220   Epoch: 10   Global Step: 177360   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 18:00:08,595-Speed 3310.03 samples/sec   Loss 1.5862   LearningRate 0.0220   Epoch: 10   Global Step: 177370   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 18:00:11,698-Speed 3300.32 samples/sec   Loss 1.5329   LearningRate 0.0220   Epoch: 10   Global Step: 177380   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 18:00:14,835-Speed 3265.45 samples/sec   Loss 1.5695   LearningRate 0.0220   Epoch: 10   Global Step: 177390   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 18:00:17,949-Speed 3289.40 samples/sec   Loss 1.5763   LearningRate 0.0220   Epoch: 10   Global Step: 177400   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 18:00:21,035-Speed 3319.04 samples/sec   Loss 1.5782   LearningRate 0.0220   Epoch: 10   Global Step: 177410   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 18:00:24,217-Speed 3218.97 samples/sec   Loss 1.5762   LearningRate 0.0220   Epoch: 10   Global Step: 177420   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 18:00:27,364-Speed 3254.36 samples/sec   Loss 1.5892   LearningRate 0.0219   Epoch: 10   Global Step: 177430   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 18:00:30,452-Speed 3316.84 samples/sec   Loss 1.5779   LearningRate 0.0219   Epoch: 10   Global Step: 177440   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-11 18:00:33,530-Speed 3328.08 samples/sec   Loss 1.6396   LearningRate 0.0219   Epoch: 10   Global Step: 177450   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-11 18:00:36,606-Speed 3329.15 samples/sec   Loss 1.6089   LearningRate 0.0219   Epoch: 10   Global Step: 177460   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-11 18:00:39,729-Speed 3280.52 samples/sec   Loss 1.6167   LearningRate 0.0219   Epoch: 10   Global Step: 177470   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-11 18:00:42,866-Speed 3264.79 samples/sec   Loss 1.5523   LearningRate 0.0219   Epoch: 10   Global Step: 177480   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-11 18:00:45,993-Speed 3275.38 samples/sec   Loss 1.5913   LearningRate 0.0219   Epoch: 10   Global Step: 177490   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-11 18:00:49,061-Speed 3339.11 samples/sec   Loss 1.5570   LearningRate 0.0219   Epoch: 10   Global Step: 177500   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 18:00:52,209-Speed 3253.63 samples/sec   Loss 1.5618   LearningRate 0.0219   Epoch: 10   Global Step: 177510   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 18:00:55,310-Speed 3303.01 samples/sec   Loss 1.6085   LearningRate 0.0219   Epoch: 10   Global Step: 177520   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 18:00:58,427-Speed 3286.37 samples/sec   Loss 1.5524   LearningRate 0.0219   Epoch: 10   Global Step: 177530   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 18:01:01,539-Speed 3291.02 samples/sec   Loss 1.5578   LearningRate 0.0219   Epoch: 10   Global Step: 177540   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 18:01:04,617-Speed 3327.00 samples/sec   Loss 1.5770   LearningRate 0.0219   Epoch: 10   Global Step: 177550   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 18:01:07,720-Speed 3301.78 samples/sec   Loss 1.5738   LearningRate 0.0219   Epoch: 10   Global Step: 177560   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 18:01:10,835-Speed 3287.94 samples/sec   Loss 1.5663   LearningRate 0.0219   Epoch: 10   Global Step: 177570   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 18:01:13,929-Speed 3309.84 samples/sec   Loss 1.5891   LearningRate 0.0219   Epoch: 10   Global Step: 177580   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 18:01:17,047-Speed 3285.88 samples/sec   Loss 1.5564   LearningRate 0.0219   Epoch: 10   Global Step: 177590   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-11 18:01:20,135-Speed 3316.44 samples/sec   Loss 1.6007   LearningRate 0.0219   Epoch: 10   Global Step: 177600   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-11 18:01:23,241-Speed 3297.55 samples/sec   Loss 1.6397   LearningRate 0.0219   Epoch: 10   Global Step: 177610   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-11 18:01:26,380-Speed 3263.42 samples/sec   Loss 1.6052   LearningRate 0.0219   Epoch: 10   Global Step: 177620   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-11 18:01:29,504-Speed 3278.81 samples/sec   Loss 1.6313   LearningRate 0.0219   Epoch: 10   Global Step: 177630   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-11 18:01:32,642-Speed 3263.47 samples/sec   Loss 1.5782   LearningRate 0.0219   Epoch: 10   Global Step: 177640   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-11 18:01:35,793-Speed 3250.97 samples/sec   Loss 1.5898   LearningRate 0.0219   Epoch: 10   Global Step: 177650   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-11 18:01:38,887-Speed 3310.43 samples/sec   Loss 1.5292   LearningRate 0.0219   Epoch: 10   Global Step: 177660   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-11 18:01:41,978-Speed 3313.36 samples/sec   Loss 1.5631   LearningRate 0.0219   Epoch: 10   Global Step: 177670   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 18:01:45,078-Speed 3304.06 samples/sec   Loss 1.5258   LearningRate 0.0219   Epoch: 10   Global Step: 177680   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 18:01:48,238-Speed 3241.66 samples/sec   Loss 1.6150   LearningRate 0.0219   Epoch: 10   Global Step: 177690   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 18:01:51,471-Speed 3167.37 samples/sec   Loss 1.5986   LearningRate 0.0219   Epoch: 10   Global Step: 177700   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 18:01:54,637-Speed 3235.93 samples/sec   Loss 1.5863   LearningRate 0.0219   Epoch: 10   Global Step: 177710   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 18:01:57,762-Speed 3277.49 samples/sec   Loss 1.5926   LearningRate 0.0219   Epoch: 10   Global Step: 177720   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 18:02:00,838-Speed 3329.51 samples/sec   Loss 1.5531   LearningRate 0.0219   Epoch: 10   Global Step: 177730   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-11 18:02:03,913-Speed 3331.94 samples/sec   Loss 1.5296   LearningRate 0.0219   Epoch: 10   Global Step: 177740   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-11 18:02:07,002-Speed 3315.68 samples/sec   Loss 1.5172   LearningRate 0.0219   Epoch: 10   Global Step: 177750   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-11 18:02:10,117-Speed 3287.13 samples/sec   Loss 1.5622   LearningRate 0.0219   Epoch: 10   Global Step: 177760   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-11 18:02:13,318-Speed 3199.57 samples/sec   Loss 1.5455   LearningRate 0.0219   Epoch: 10   Global Step: 177770   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-11 18:02:16,426-Speed 3296.36 samples/sec   Loss 1.5336   LearningRate 0.0218   Epoch: 10   Global Step: 177780   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-11 18:02:19,503-Speed 3327.90 samples/sec   Loss 1.6006   LearningRate 0.0218   Epoch: 10   Global Step: 177790   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-11 18:02:22,650-Speed 3254.50 samples/sec   Loss 1.5852   LearningRate 0.0218   Epoch: 10   Global Step: 177800   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-11 18:02:25,872-Speed 3178.82 samples/sec   Loss 1.5857   LearningRate 0.0218   Epoch: 10   Global Step: 177810   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-11 18:02:28,970-Speed 3306.90 samples/sec   Loss 1.6072   LearningRate 0.0218   Epoch: 10   Global Step: 177820   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-11 18:02:32,079-Speed 3294.81 samples/sec   Loss 1.5649   LearningRate 0.0218   Epoch: 10   Global Step: 177830   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 18:02:35,241-Speed 3238.87 samples/sec   Loss 1.5664   LearningRate 0.0218   Epoch: 10   Global Step: 177840   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 18:02:38,341-Speed 3304.13 samples/sec   Loss 1.5610   LearningRate 0.0218   Epoch: 10   Global Step: 177850   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 18:02:41,425-Speed 3321.41 samples/sec   Loss 1.5822   LearningRate 0.0218   Epoch: 10   Global Step: 177860   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 18:02:44,498-Speed 3332.21 samples/sec   Loss 1.5380   LearningRate 0.0218   Epoch: 10   Global Step: 177870   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 18:02:47,580-Speed 3323.78 samples/sec   Loss 1.6434   LearningRate 0.0218   Epoch: 10   Global Step: 177880   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 18:02:50,666-Speed 3318.57 samples/sec   Loss 1.6323   LearningRate 0.0218   Epoch: 10   Global Step: 177890   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 18:02:53,746-Speed 3326.28 samples/sec   Loss 1.5600   LearningRate 0.0218   Epoch: 10   Global Step: 177900   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 18:02:56,826-Speed 3325.94 samples/sec   Loss 1.5995   LearningRate 0.0218   Epoch: 10   Global Step: 177910   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 18:02:59,904-Speed 3327.48 samples/sec   Loss 1.5967   LearningRate 0.0218   Epoch: 10   Global Step: 177920   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 18:03:02,990-Speed 3318.83 samples/sec   Loss 1.6013   LearningRate 0.0218   Epoch: 10   Global Step: 177930   Fp16 Grad Scale: 262144   Required: 16 hours
Training: 2022-04-11 18:03:06,060-Speed 3336.23 samples/sec   Loss 1.5421   LearningRate 0.0218   Epoch: 10   Global Step: 177940   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 18:03:09,157-Speed 3307.14 samples/sec   Loss 1.5498   LearningRate 0.0218   Epoch: 10   Global Step: 177950   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 18:03:12,234-Speed 3327.92 samples/sec   Loss 1.5757   LearningRate 0.0218   Epoch: 10   Global Step: 177960   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 18:03:15,336-Speed 3302.67 samples/sec   Loss 1.5206   LearningRate 0.0218   Epoch: 10   Global Step: 177970   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 18:03:18,407-Speed 3335.86 samples/sec   Loss 1.5335   LearningRate 0.0218   Epoch: 10   Global Step: 177980   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-11 18:03:21,562-Speed 3246.55 samples/sec   Loss 1.5782   LearningRate 0.0218   Epoch: 10   Global Step: 177990   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-11 18:03:24,651-Speed 3314.85 samples/sec   Loss 1.5523   LearningRate 0.0218   Epoch: 10   Global Step: 178000   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-11 18:04:08,730-[lfw][178000]XNorm: 21.818219
Training: 2022-04-11 18:04:08,731-[lfw][178000]Accuracy-Flip: 0.99783+-0.00289
Training: 2022-04-11 18:04:08,731-[lfw][178000]Accuracy-Highest: 0.99817
Training: 2022-04-11 18:04:59,396-[cfp_fp][178000]XNorm: 21.690726
Training: 2022-04-11 18:04:59,397-[cfp_fp][178000]Accuracy-Flip: 0.98971+-0.00522
Training: 2022-04-11 18:04:59,397-[cfp_fp][178000]Accuracy-Highest: 0.98971
Training: 2022-04-11 18:05:43,237-[agedb_30][178000]XNorm: 22.382697
Training: 2022-04-11 18:05:43,237-[agedb_30][178000]Accuracy-Flip: 0.98283+-0.00633
Training: 2022-04-11 18:05:43,238-[agedb_30][178000]Accuracy-Highest: 0.98450
Training: 2022-04-11 18:05:46,341-Speed 72.27 samples/sec   Loss 1.5545   LearningRate 0.0218   Epoch: 10   Global Step: 178010   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-11 18:05:49,521-Speed 3220.95 samples/sec   Loss 1.6011   LearningRate 0.0218   Epoch: 10   Global Step: 178020   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-11 18:05:52,595-Speed 3331.49 samples/sec   Loss 1.5931   LearningRate 0.0218   Epoch: 10   Global Step: 178030   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-11 18:05:55,659-Speed 3343.35 samples/sec   Loss 1.5500   LearningRate 0.0218   Epoch: 10   Global Step: 178040   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-11 18:05:58,721-Speed 3344.80 samples/sec   Loss 1.5831   LearningRate 0.0218   Epoch: 10   Global Step: 178050   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-11 18:06:01,778-Speed 3349.67 samples/sec   Loss 1.5929   LearningRate 0.0218   Epoch: 10   Global Step: 178060   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-11 18:06:04,875-Speed 3307.65 samples/sec   Loss 1.5358   LearningRate 0.0218   Epoch: 10   Global Step: 178070   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-11 18:06:07,983-Speed 3295.13 samples/sec   Loss 1.6141   LearningRate 0.0218   Epoch: 10   Global Step: 178080   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 18:06:11,051-Speed 3338.79 samples/sec   Loss 1.6338   LearningRate 0.0218   Epoch: 10   Global Step: 178090   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 18:06:14,131-Speed 3325.85 samples/sec   Loss 1.5737   LearningRate 0.0218   Epoch: 10   Global Step: 178100   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 18:06:17,308-Speed 3223.26 samples/sec   Loss 1.5692   LearningRate 0.0218   Epoch: 10   Global Step: 178110   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 18:06:20,404-Speed 3308.78 samples/sec   Loss 1.5254   LearningRate 0.0218   Epoch: 10   Global Step: 178120   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 18:06:23,525-Speed 3282.02 samples/sec   Loss 1.5088   LearningRate 0.0218   Epoch: 10   Global Step: 178130   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 18:06:26,583-Speed 3349.20 samples/sec   Loss 1.5831   LearningRate 0.0217   Epoch: 10   Global Step: 178140   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-11 18:06:29,653-Speed 3336.78 samples/sec   Loss 1.6245   LearningRate 0.0217   Epoch: 10   Global Step: 178150   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-11 18:06:32,736-Speed 3321.55 samples/sec   Loss 1.5518   LearningRate 0.0217   Epoch: 10   Global Step: 178160   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-11 18:06:35,859-Speed 3280.09 samples/sec   Loss 1.5390   LearningRate 0.0217   Epoch: 10   Global Step: 178170   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-11 18:06:38,930-Speed 3335.34 samples/sec   Loss 1.5172   LearningRate 0.0217   Epoch: 10   Global Step: 178180   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-11 18:06:42,019-Speed 3315.92 samples/sec   Loss 1.5740   LearningRate 0.0217   Epoch: 10   Global Step: 178190   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-11 18:06:45,123-Speed 3299.47 samples/sec   Loss 1.5436   LearningRate 0.0217   Epoch: 10   Global Step: 178200   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-11 18:06:48,243-Speed 3282.71 samples/sec   Loss 1.5319   LearningRate 0.0217   Epoch: 10   Global Step: 178210   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-11 18:06:51,366-Speed 3280.67 samples/sec   Loss 1.5399   LearningRate 0.0217   Epoch: 10   Global Step: 178220   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-11 18:06:54,478-Speed 3290.90 samples/sec   Loss 1.5261   LearningRate 0.0217   Epoch: 10   Global Step: 178230   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-11 18:06:57,651-Speed 3229.20 samples/sec   Loss 1.5404   LearningRate 0.0217   Epoch: 10   Global Step: 178240   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 18:07:00,775-Speed 3277.63 samples/sec   Loss 1.5779   LearningRate 0.0217   Epoch: 10   Global Step: 178250   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 18:07:03,946-Speed 3230.48 samples/sec   Loss 1.5970   LearningRate 0.0217   Epoch: 10   Global Step: 178260   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 18:07:07,042-Speed 3308.61 samples/sec   Loss 1.5815   LearningRate 0.0217   Epoch: 10   Global Step: 178270   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 18:07:10,137-Speed 3308.89 samples/sec   Loss 1.5999   LearningRate 0.0217   Epoch: 10   Global Step: 178280   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 18:07:13,230-Speed 3311.88 samples/sec   Loss 1.5993   LearningRate 0.0217   Epoch: 10   Global Step: 178290   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 18:07:16,286-Speed 3351.28 samples/sec   Loss 1.5984   LearningRate 0.0217   Epoch: 10   Global Step: 178300   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-11 18:07:19,358-Speed 3333.87 samples/sec   Loss 1.5450   LearningRate 0.0217   Epoch: 10   Global Step: 178310   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-11 18:07:22,434-Speed 3330.13 samples/sec   Loss 1.5691   LearningRate 0.0217   Epoch: 10   Global Step: 178320   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-11 18:07:25,511-Speed 3329.09 samples/sec   Loss 1.5819   LearningRate 0.0217   Epoch: 10   Global Step: 178330   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-11 18:07:28,621-Speed 3293.16 samples/sec   Loss 1.5958   LearningRate 0.0217   Epoch: 10   Global Step: 178340   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-11 18:07:31,726-Speed 3299.04 samples/sec   Loss 1.5998   LearningRate 0.0217   Epoch: 10   Global Step: 178350   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-11 18:07:34,840-Speed 3288.79 samples/sec   Loss 1.6105   LearningRate 0.0217   Epoch: 10   Global Step: 178360   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-11 18:07:37,910-Speed 3337.03 samples/sec   Loss 1.5397   LearningRate 0.0217   Epoch: 10   Global Step: 178370   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-11 18:07:40,979-Speed 3336.51 samples/sec   Loss 1.5710   LearningRate 0.0217   Epoch: 10   Global Step: 178380   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-11 18:07:44,047-Speed 3338.46 samples/sec   Loss 1.5441   LearningRate 0.0217   Epoch: 10   Global Step: 178390   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-11 18:07:47,117-Speed 3336.17 samples/sec   Loss 1.5800   LearningRate 0.0217   Epoch: 10   Global Step: 178400   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-11 18:07:50,207-Speed 3315.32 samples/sec   Loss 1.5987   LearningRate 0.0217   Epoch: 10   Global Step: 178410   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-11 18:07:53,286-Speed 3326.52 samples/sec   Loss 1.5858   LearningRate 0.0217   Epoch: 10   Global Step: 178420   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-11 18:07:56,374-Speed 3316.95 samples/sec   Loss 1.6486   LearningRate 0.0217   Epoch: 10   Global Step: 178430   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-11 18:07:59,446-Speed 3334.28 samples/sec   Loss 1.5666   LearningRate 0.0217   Epoch: 10   Global Step: 178440   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-11 18:08:02,516-Speed 3335.90 samples/sec   Loss 1.5639   LearningRate 0.0217   Epoch: 10   Global Step: 178450   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-11 18:08:05,649-Speed 3269.58 samples/sec   Loss 1.5784   LearningRate 0.0217   Epoch: 10   Global Step: 178460   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-11 18:08:08,838-Speed 3211.44 samples/sec   Loss 1.5304   LearningRate 0.0217   Epoch: 10   Global Step: 178470   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-11 18:08:12,026-Speed 3213.14 samples/sec   Loss 1.5348   LearningRate 0.0217   Epoch: 10   Global Step: 178480   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-11 18:08:15,174-Speed 3253.58 samples/sec   Loss 1.5579   LearningRate 0.0217   Epoch: 10   Global Step: 178490   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-11 18:08:18,244-Speed 3335.82 samples/sec   Loss 1.5665   LearningRate 0.0216   Epoch: 10   Global Step: 178500   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 18:08:21,320-Speed 3330.35 samples/sec   Loss 1.5752   LearningRate 0.0216   Epoch: 10   Global Step: 178510   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 18:08:24,429-Speed 3294.08 samples/sec   Loss 1.5635   LearningRate 0.0216   Epoch: 10   Global Step: 178520   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 18:08:27,545-Speed 3287.34 samples/sec   Loss 1.5904   LearningRate 0.0216   Epoch: 10   Global Step: 178530   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 18:08:30,705-Speed 3241.27 samples/sec   Loss 1.5154   LearningRate 0.0216   Epoch: 10   Global Step: 178540   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 18:08:33,783-Speed 3327.51 samples/sec   Loss 1.5644   LearningRate 0.0216   Epoch: 10   Global Step: 178550   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 18:08:36,844-Speed 3345.75 samples/sec   Loss 1.5761   LearningRate 0.0216   Epoch: 10   Global Step: 178560   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-11 18:08:39,921-Speed 3328.79 samples/sec   Loss 1.6104   LearningRate 0.0216   Epoch: 10   Global Step: 178570   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-11 18:08:42,994-Speed 3333.65 samples/sec   Loss 1.5127   LearningRate 0.0216   Epoch: 10   Global Step: 178580   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-11 18:08:46,128-Speed 3267.96 samples/sec   Loss 1.5582   LearningRate 0.0216   Epoch: 10   Global Step: 178590   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-11 18:08:49,229-Speed 3302.65 samples/sec   Loss 1.5639   LearningRate 0.0216   Epoch: 10   Global Step: 178600   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-11 18:08:52,335-Speed 3298.65 samples/sec   Loss 1.5951   LearningRate 0.0216   Epoch: 10   Global Step: 178610   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-11 18:08:55,407-Speed 3333.34 samples/sec   Loss 1.5960   LearningRate 0.0216   Epoch: 10   Global Step: 178620   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-11 18:08:58,540-Speed 3269.84 samples/sec   Loss 1.6294   LearningRate 0.0216   Epoch: 10   Global Step: 178630   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-11 18:09:01,678-Speed 3263.89 samples/sec   Loss 1.6668   LearningRate 0.0216   Epoch: 10   Global Step: 178640   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-11 18:09:04,910-Speed 3169.12 samples/sec   Loss 1.5789   LearningRate 0.0216   Epoch: 10   Global Step: 178650   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-11 18:09:07,999-Speed 3314.73 samples/sec   Loss 1.5223   LearningRate 0.0216   Epoch: 10   Global Step: 178660   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 18:09:11,065-Speed 3340.77 samples/sec   Loss 1.5770   LearningRate 0.0216   Epoch: 10   Global Step: 178670   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-11 18:09:14,124-Speed 3348.17 samples/sec   Loss 1.5694   LearningRate 0.0216   Epoch: 10   Global Step: 178680   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-11 18:09:17,207-Speed 3322.27 samples/sec   Loss 1.4979   LearningRate 0.0216   Epoch: 10   Global Step: 178690   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-11 18:09:20,299-Speed 3313.65 samples/sec   Loss 1.6132   LearningRate 0.0216   Epoch: 10   Global Step: 178700   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-11 18:09:23,416-Speed 3285.47 samples/sec   Loss 1.5426   LearningRate 0.0216   Epoch: 10   Global Step: 178710   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-11 18:09:26,592-Speed 3225.47 samples/sec   Loss 1.5866   LearningRate 0.0216   Epoch: 10   Global Step: 178720   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-11 18:09:29,665-Speed 3332.61 samples/sec   Loss 1.5315   LearningRate 0.0216   Epoch: 10   Global Step: 178730   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-11 18:09:32,743-Speed 3326.86 samples/sec   Loss 1.5451   LearningRate 0.0216   Epoch: 10   Global Step: 178740   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-11 18:09:35,830-Speed 3317.73 samples/sec   Loss 1.5747   LearningRate 0.0216   Epoch: 10   Global Step: 178750   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-11 18:09:38,976-Speed 3255.84 samples/sec   Loss 1.5618   LearningRate 0.0216   Epoch: 10   Global Step: 178760   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-11 18:09:42,101-Speed 3278.14 samples/sec   Loss 1.5766   LearningRate 0.0216   Epoch: 10   Global Step: 178770   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-11 18:09:45,257-Speed 3244.89 samples/sec   Loss 1.5910   LearningRate 0.0216   Epoch: 10   Global Step: 178780   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-11 18:09:48,446-Speed 3212.34 samples/sec   Loss 1.5452   LearningRate 0.0216   Epoch: 10   Global Step: 178790   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-11 18:09:51,527-Speed 3323.84 samples/sec   Loss 1.5130   LearningRate 0.0216   Epoch: 10   Global Step: 178800   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-11 18:09:54,622-Speed 3309.83 samples/sec   Loss 1.5622   LearningRate 0.0216   Epoch: 10   Global Step: 178810   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-11 18:09:57,721-Speed 3304.67 samples/sec   Loss 1.5832   LearningRate 0.0216   Epoch: 10   Global Step: 178820   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-11 18:10:00,873-Speed 3249.18 samples/sec   Loss 1.5583   LearningRate 0.0216   Epoch: 10   Global Step: 178830   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-11 18:10:03,968-Speed 3310.10 samples/sec   Loss 1.5520   LearningRate 0.0216   Epoch: 10   Global Step: 178840   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-11 18:10:07,054-Speed 3318.52 samples/sec   Loss 1.6044   LearningRate 0.0216   Epoch: 10   Global Step: 178850   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-11 18:10:10,147-Speed 3311.37 samples/sec   Loss 1.5605   LearningRate 0.0215   Epoch: 10   Global Step: 178860   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-11 18:10:13,234-Speed 3318.86 samples/sec   Loss 1.5979   LearningRate 0.0215   Epoch: 10   Global Step: 178870   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-11 18:10:16,360-Speed 3276.05 samples/sec   Loss 1.5723   LearningRate 0.0215   Epoch: 10   Global Step: 178880   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 18:10:19,541-Speed 3219.92 samples/sec   Loss 1.6090   LearningRate 0.0215   Epoch: 10   Global Step: 178890   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 18:10:22,620-Speed 3326.77 samples/sec   Loss 1.5614   LearningRate 0.0215   Epoch: 10   Global Step: 178900   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 18:10:25,728-Speed 3296.00 samples/sec   Loss 1.5831   LearningRate 0.0215   Epoch: 10   Global Step: 178910   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 18:10:28,812-Speed 3320.74 samples/sec   Loss 1.5308   LearningRate 0.0215   Epoch: 10   Global Step: 178920   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 18:10:31,929-Speed 3285.72 samples/sec   Loss 1.5963   LearningRate 0.0215   Epoch: 10   Global Step: 178930   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 18:10:35,020-Speed 3313.71 samples/sec   Loss 1.5841   LearningRate 0.0215   Epoch: 10   Global Step: 178940   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 18:10:38,131-Speed 3293.22 samples/sec   Loss 1.5722   LearningRate 0.0215   Epoch: 10   Global Step: 178950   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-11 18:10:41,262-Speed 3271.38 samples/sec   Loss 1.6445   LearningRate 0.0215   Epoch: 10   Global Step: 178960   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-11 18:10:44,338-Speed 3329.58 samples/sec   Loss 1.6337   LearningRate 0.0215   Epoch: 10   Global Step: 178970   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-11 18:10:47,428-Speed 3314.65 samples/sec   Loss 1.5413   LearningRate 0.0215   Epoch: 10   Global Step: 178980   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-11 18:10:50,516-Speed 3317.14 samples/sec   Loss 1.5452   LearningRate 0.0215   Epoch: 10   Global Step: 178990   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-11 18:10:53,609-Speed 3311.18 samples/sec   Loss 1.6101   LearningRate 0.0215   Epoch: 10   Global Step: 179000   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-11 18:10:56,699-Speed 3314.59 samples/sec   Loss 1.5816   LearningRate 0.0215   Epoch: 10   Global Step: 179010   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-11 18:10:59,786-Speed 3317.34 samples/sec   Loss 1.6072   LearningRate 0.0215   Epoch: 10   Global Step: 179020   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-11 18:11:02,942-Speed 3245.93 samples/sec   Loss 1.5759   LearningRate 0.0215   Epoch: 10   Global Step: 179030   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-11 18:11:06,063-Speed 3282.01 samples/sec   Loss 1.5941   LearningRate 0.0215   Epoch: 10   Global Step: 179040   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-11 18:11:09,161-Speed 3306.34 samples/sec   Loss 1.5420   LearningRate 0.0215   Epoch: 10   Global Step: 179050   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 18:11:12,240-Speed 3326.57 samples/sec   Loss 1.5817   LearningRate 0.0215   Epoch: 10   Global Step: 179060   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 18:11:15,303-Speed 3343.83 samples/sec   Loss 1.5458   LearningRate 0.0215   Epoch: 10   Global Step: 179070   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-11 18:11:18,401-Speed 3305.38 samples/sec   Loss 1.6092   LearningRate 0.0215   Epoch: 10   Global Step: 179080   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-11 18:11:21,489-Speed 3317.66 samples/sec   Loss 1.5718   LearningRate 0.0215   Epoch: 10   Global Step: 179090   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-11 18:11:24,578-Speed 3315.70 samples/sec   Loss 1.5513   LearningRate 0.0215   Epoch: 10   Global Step: 179100   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-11 18:11:27,658-Speed 3325.24 samples/sec   Loss 1.5667   LearningRate 0.0215   Epoch: 10   Global Step: 179110   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-11 18:11:30,771-Speed 3289.80 samples/sec   Loss 1.5923   LearningRate 0.0215   Epoch: 10   Global Step: 179120   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-11 18:11:33,875-Speed 3300.04 samples/sec   Loss 1.5297   LearningRate 0.0215   Epoch: 10   Global Step: 179130   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-11 18:11:37,047-Speed 3228.92 samples/sec   Loss 1.6234   LearningRate 0.0215   Epoch: 10   Global Step: 179140   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-11 18:11:40,177-Speed 3272.55 samples/sec   Loss 1.6121   LearningRate 0.0215   Epoch: 10   Global Step: 179150   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-11 18:11:43,312-Speed 3266.86 samples/sec   Loss 1.6216   LearningRate 0.0215   Epoch: 10   Global Step: 179160   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-11 18:11:46,389-Speed 3329.09 samples/sec   Loss 1.5516   LearningRate 0.0215   Epoch: 10   Global Step: 179170   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 18:11:49,538-Speed 3252.76 samples/sec   Loss 1.5847   LearningRate 0.0215   Epoch: 10   Global Step: 179180   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 18:11:52,654-Speed 3286.25 samples/sec   Loss 1.5377   LearningRate 0.0215   Epoch: 10   Global Step: 179190   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 18:11:55,732-Speed 3327.42 samples/sec   Loss 1.6354   LearningRate 0.0215   Epoch: 10   Global Step: 179200   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 18:11:58,829-Speed 3306.95 samples/sec   Loss 1.6099   LearningRate 0.0215   Epoch: 10   Global Step: 179210   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 18:12:01,983-Speed 3248.05 samples/sec   Loss 1.6111   LearningRate 0.0214   Epoch: 10   Global Step: 179220   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 18:12:05,062-Speed 3326.33 samples/sec   Loss 1.5547   LearningRate 0.0214   Epoch: 10   Global Step: 179230   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 18:12:08,143-Speed 3324.76 samples/sec   Loss 1.5681   LearningRate 0.0214   Epoch: 10   Global Step: 179240   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 18:12:11,243-Speed 3303.86 samples/sec   Loss 1.5032   LearningRate 0.0214   Epoch: 10   Global Step: 179250   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 18:12:14,420-Speed 3223.90 samples/sec   Loss 1.5758   LearningRate 0.0214   Epoch: 10   Global Step: 179260   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 18:12:17,516-Speed 3308.86 samples/sec   Loss 1.5278   LearningRate 0.0214   Epoch: 10   Global Step: 179270   Fp16 Grad Scale: 262144   Required: 16 hours
Training: 2022-04-11 18:12:20,585-Speed 3337.06 samples/sec   Loss 1.5298   LearningRate 0.0214   Epoch: 10   Global Step: 179280   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-11 18:12:23,665-Speed 3325.40 samples/sec   Loss 1.5829   LearningRate 0.0214   Epoch: 10   Global Step: 179290   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-11 18:12:26,749-Speed 3321.45 samples/sec   Loss 1.5564   LearningRate 0.0214   Epoch: 10   Global Step: 179300   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-11 18:12:29,826-Speed 3328.76 samples/sec   Loss 1.5527   LearningRate 0.0214   Epoch: 10   Global Step: 179310   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-11 18:12:32,923-Speed 3307.85 samples/sec   Loss 1.5486   LearningRate 0.0214   Epoch: 10   Global Step: 179320   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-11 18:12:36,010-Speed 3317.68 samples/sec   Loss 1.5608   LearningRate 0.0214   Epoch: 10   Global Step: 179330   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-11 18:12:39,091-Speed 3324.61 samples/sec   Loss 1.6043   LearningRate 0.0214   Epoch: 10   Global Step: 179340   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-11 18:12:42,257-Speed 3235.11 samples/sec   Loss 1.5962   LearningRate 0.0214   Epoch: 10   Global Step: 179350   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-11 18:12:45,366-Speed 3294.01 samples/sec   Loss 1.5373   LearningRate 0.0214   Epoch: 10   Global Step: 179360   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-11 18:12:48,451-Speed 3320.75 samples/sec   Loss 1.5562   LearningRate 0.0214   Epoch: 10   Global Step: 179370   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-11 18:12:51,531-Speed 3325.45 samples/sec   Loss 1.6318   LearningRate 0.0214   Epoch: 10   Global Step: 179380   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 18:12:54,613-Speed 3322.82 samples/sec   Loss 1.5915   LearningRate 0.0214   Epoch: 10   Global Step: 179390   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 18:12:57,699-Speed 3319.68 samples/sec   Loss 1.6324   LearningRate 0.0214   Epoch: 10   Global Step: 179400   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 18:13:00,776-Speed 3328.43 samples/sec   Loss 1.6212   LearningRate 0.0214   Epoch: 10   Global Step: 179410   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 18:13:03,923-Speed 3254.61 samples/sec   Loss 1.6362   LearningRate 0.0214   Epoch: 10   Global Step: 179420   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 18:13:07,080-Speed 3244.20 samples/sec   Loss 1.6025   LearningRate 0.0214   Epoch: 10   Global Step: 179430   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 18:13:10,166-Speed 3319.22 samples/sec   Loss 1.5863   LearningRate 0.0214   Epoch: 10   Global Step: 179440   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 18:13:13,258-Speed 3312.40 samples/sec   Loss 1.5430   LearningRate 0.0214   Epoch: 10   Global Step: 179450   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 18:13:16,342-Speed 3320.99 samples/sec   Loss 1.5708   LearningRate 0.0214   Epoch: 10   Global Step: 179460   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 18:13:19,437-Speed 3309.32 samples/sec   Loss 1.5610   LearningRate 0.0214   Epoch: 10   Global Step: 179470   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 18:13:22,522-Speed 3319.80 samples/sec   Loss 1.5310   LearningRate 0.0214   Epoch: 10   Global Step: 179480   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 18:13:25,606-Speed 3321.33 samples/sec   Loss 1.5839   LearningRate 0.0214   Epoch: 10   Global Step: 179490   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 18:13:28,743-Speed 3265.55 samples/sec   Loss 1.5282   LearningRate 0.0214   Epoch: 10   Global Step: 179500   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 18:13:31,825-Speed 3323.32 samples/sec   Loss 1.5514   LearningRate 0.0214   Epoch: 10   Global Step: 179510   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 18:13:34,906-Speed 3323.77 samples/sec   Loss 1.6451   LearningRate 0.0214   Epoch: 10   Global Step: 179520   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 18:13:38,001-Speed 3310.31 samples/sec   Loss 1.6355   LearningRate 0.0214   Epoch: 10   Global Step: 179530   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 18:13:41,121-Speed 3282.27 samples/sec   Loss 1.6002   LearningRate 0.0214   Epoch: 10   Global Step: 179540   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 18:13:44,210-Speed 3315.74 samples/sec   Loss 1.5659   LearningRate 0.0214   Epoch: 10   Global Step: 179550   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 18:13:47,297-Speed 3317.42 samples/sec   Loss 1.5139   LearningRate 0.0214   Epoch: 10   Global Step: 179560   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 18:13:50,395-Speed 3306.52 samples/sec   Loss 1.5785   LearningRate 0.0214   Epoch: 10   Global Step: 179570   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 18:13:53,508-Speed 3290.63 samples/sec   Loss 1.5398   LearningRate 0.0213   Epoch: 10   Global Step: 179580   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 18:13:56,619-Speed 3292.13 samples/sec   Loss 1.5530   LearningRate 0.0213   Epoch: 10   Global Step: 179590   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 18:13:59,694-Speed 3330.30 samples/sec   Loss 1.5474   LearningRate 0.0213   Epoch: 10   Global Step: 179600   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 18:14:02,806-Speed 3291.88 samples/sec   Loss 1.5666   LearningRate 0.0213   Epoch: 10   Global Step: 179610   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 18:14:05,885-Speed 3326.55 samples/sec   Loss 1.6058   LearningRate 0.0213   Epoch: 10   Global Step: 179620   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 18:14:09,034-Speed 3252.66 samples/sec   Loss 1.6069   LearningRate 0.0213   Epoch: 10   Global Step: 179630   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 18:14:12,164-Speed 3272.75 samples/sec   Loss 1.5818   LearningRate 0.0213   Epoch: 10   Global Step: 179640   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 18:14:15,327-Speed 3237.28 samples/sec   Loss 1.6006   LearningRate 0.0213   Epoch: 10   Global Step: 179650   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 18:14:18,439-Speed 3291.97 samples/sec   Loss 1.5925   LearningRate 0.0213   Epoch: 10   Global Step: 179660   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 18:14:21,541-Speed 3302.32 samples/sec   Loss 1.6372   LearningRate 0.0213   Epoch: 10   Global Step: 179670   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 18:14:24,651-Speed 3292.82 samples/sec   Loss 1.6395   LearningRate 0.0213   Epoch: 10   Global Step: 179680   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 18:14:27,743-Speed 3312.70 samples/sec   Loss 1.6039   LearningRate 0.0213   Epoch: 10   Global Step: 179690   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 18:14:30,835-Speed 3312.77 samples/sec   Loss 1.6146   LearningRate 0.0213   Epoch: 10   Global Step: 179700   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 18:14:33,914-Speed 3326.40 samples/sec   Loss 1.5494   LearningRate 0.0213   Epoch: 10   Global Step: 179710   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 18:14:36,996-Speed 3322.76 samples/sec   Loss 1.5798   LearningRate 0.0213   Epoch: 10   Global Step: 179720   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 18:14:40,088-Speed 3312.77 samples/sec   Loss 1.6067   LearningRate 0.0213   Epoch: 10   Global Step: 179730   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 18:14:43,171-Speed 3323.06 samples/sec   Loss 1.6197   LearningRate 0.0213   Epoch: 10   Global Step: 179740   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 18:14:46,248-Speed 3327.80 samples/sec   Loss 1.5775   LearningRate 0.0213   Epoch: 10   Global Step: 179750   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 18:14:49,334-Speed 3319.33 samples/sec   Loss 1.5555   LearningRate 0.0213   Epoch: 10   Global Step: 179760   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 18:14:52,425-Speed 3313.88 samples/sec   Loss 1.5625   LearningRate 0.0213   Epoch: 10   Global Step: 179770   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 18:14:55,481-Speed 3350.86 samples/sec   Loss 1.6516   LearningRate 0.0213   Epoch: 10   Global Step: 179780   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 18:14:58,573-Speed 3313.05 samples/sec   Loss 1.6067   LearningRate 0.0213   Epoch: 10   Global Step: 179790   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 18:15:01,662-Speed 3315.36 samples/sec   Loss 1.6071   LearningRate 0.0213   Epoch: 10   Global Step: 179800   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 18:15:04,770-Speed 3295.49 samples/sec   Loss 1.6536   LearningRate 0.0213   Epoch: 10   Global Step: 179810   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 18:15:07,933-Speed 3237.48 samples/sec   Loss 1.5675   LearningRate 0.0213   Epoch: 10   Global Step: 179820   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 18:15:11,108-Speed 3227.11 samples/sec   Loss 1.5978   LearningRate 0.0213   Epoch: 10   Global Step: 179830   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 18:15:14,240-Speed 3269.80 samples/sec   Loss 1.5720   LearningRate 0.0213   Epoch: 10   Global Step: 179840   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 18:15:17,328-Speed 3317.20 samples/sec   Loss 1.6114   LearningRate 0.0213   Epoch: 10   Global Step: 179850   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 18:15:20,406-Speed 3327.16 samples/sec   Loss 1.5358   LearningRate 0.0213   Epoch: 10   Global Step: 179860   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 18:15:23,481-Speed 3330.62 samples/sec   Loss 1.6234   LearningRate 0.0213   Epoch: 10   Global Step: 179870   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 18:15:26,551-Speed 3337.05 samples/sec   Loss 1.6443   LearningRate 0.0213   Epoch: 10   Global Step: 179880   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 18:15:29,794-Speed 3157.84 samples/sec   Loss 1.5528   LearningRate 0.0213   Epoch: 10   Global Step: 179890   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 18:15:32,918-Speed 3278.70 samples/sec   Loss 1.5469   LearningRate 0.0213   Epoch: 10   Global Step: 179900   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 18:15:36,055-Speed 3264.86 samples/sec   Loss 1.5948   LearningRate 0.0213   Epoch: 10   Global Step: 179910   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-11 18:15:39,207-Speed 3249.99 samples/sec   Loss 1.5780   LearningRate 0.0213   Epoch: 10   Global Step: 179920   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-11 18:15:42,357-Speed 3251.23 samples/sec   Loss 1.5817   LearningRate 0.0213   Epoch: 10   Global Step: 179930   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-11 18:15:45,515-Speed 3242.74 samples/sec   Loss 1.5682   LearningRate 0.0212   Epoch: 10   Global Step: 179940   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-11 18:15:48,610-Speed 3310.42 samples/sec   Loss 1.6195   LearningRate 0.0212   Epoch: 10   Global Step: 179950   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-11 18:15:51,692-Speed 3322.44 samples/sec   Loss 1.5945   LearningRate 0.0212   Epoch: 10   Global Step: 179960   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-11 18:15:54,830-Speed 3263.97 samples/sec   Loss 1.5886   LearningRate 0.0212   Epoch: 10   Global Step: 179970   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-11 18:15:57,934-Speed 3299.89 samples/sec   Loss 1.5800   LearningRate 0.0212   Epoch: 10   Global Step: 179980   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-11 18:16:01,049-Speed 3287.58 samples/sec   Loss 1.6128   LearningRate 0.0212   Epoch: 10   Global Step: 179990   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-11 18:16:04,140-Speed 3314.33 samples/sec   Loss 1.6199   LearningRate 0.0212   Epoch: 10   Global Step: 180000   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-11 18:16:47,675-[lfw][180000]XNorm: 21.936092
Training: 2022-04-11 18:16:47,675-[lfw][180000]Accuracy-Flip: 0.99767+-0.00281
Training: 2022-04-11 18:16:47,676-[lfw][180000]Accuracy-Highest: 0.99817
Training: 2022-04-11 18:17:38,136-[cfp_fp][180000]XNorm: 22.151719
Training: 2022-04-11 18:17:38,136-[cfp_fp][180000]Accuracy-Flip: 0.98914+-0.00448
Training: 2022-04-11 18:17:38,137-[cfp_fp][180000]Accuracy-Highest: 0.98971
Training: 2022-04-11 18:18:21,574-[agedb_30][180000]XNorm: 22.679250
Training: 2022-04-11 18:18:21,575-[agedb_30][180000]Accuracy-Flip: 0.98500+-0.00641
Training: 2022-04-11 18:18:21,575-[agedb_30][180000]Accuracy-Highest: 0.98500
Training: 2022-04-11 18:18:24,660-Speed 72.87 samples/sec   Loss 1.5472   LearningRate 0.0212   Epoch: 10   Global Step: 180010   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 18:18:27,708-Speed 3360.22 samples/sec   Loss 1.5789   LearningRate 0.0212   Epoch: 10   Global Step: 180020   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-11 18:18:30,860-Speed 3250.26 samples/sec   Loss 1.5699   LearningRate 0.0212   Epoch: 10   Global Step: 180030   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-11 18:18:33,926-Speed 3340.42 samples/sec   Loss 1.5549   LearningRate 0.0212   Epoch: 10   Global Step: 180040   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-11 18:18:36,995-Speed 3337.40 samples/sec   Loss 1.6388   LearningRate 0.0212   Epoch: 10   Global Step: 180050   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-11 18:18:40,107-Speed 3290.65 samples/sec   Loss 1.5978   LearningRate 0.0212   Epoch: 10   Global Step: 180060   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-11 18:18:43,173-Speed 3340.85 samples/sec   Loss 1.6109   LearningRate 0.0212   Epoch: 10   Global Step: 180070   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-11 18:18:46,250-Speed 3328.87 samples/sec   Loss 1.6383   LearningRate 0.0212   Epoch: 10   Global Step: 180080   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-11 18:18:49,339-Speed 3315.12 samples/sec   Loss 1.5626   LearningRate 0.0212   Epoch: 10   Global Step: 180090   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-11 18:18:52,427-Speed 3317.69 samples/sec   Loss 1.5597   LearningRate 0.0212   Epoch: 10   Global Step: 180100   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-11 18:18:55,542-Speed 3287.28 samples/sec   Loss 1.5453   LearningRate 0.0212   Epoch: 10   Global Step: 180110   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-11 18:18:58,634-Speed 3313.03 samples/sec   Loss 1.5759   LearningRate 0.0212   Epoch: 10   Global Step: 180120   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 18:19:01,731-Speed 3307.22 samples/sec   Loss 1.5308   LearningRate 0.0212   Epoch: 10   Global Step: 180130   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 18:19:04,809-Speed 3328.22 samples/sec   Loss 1.6303   LearningRate 0.0212   Epoch: 10   Global Step: 180140   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 18:19:07,884-Speed 3329.92 samples/sec   Loss 1.5722   LearningRate 0.0212   Epoch: 10   Global Step: 180150   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 18:19:10,979-Speed 3309.27 samples/sec   Loss 1.5118   LearningRate 0.0212   Epoch: 10   Global Step: 180160   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 18:19:14,066-Speed 3317.62 samples/sec   Loss 1.6186   LearningRate 0.0212   Epoch: 10   Global Step: 180170   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 18:19:17,158-Speed 3313.77 samples/sec   Loss 1.5644   LearningRate 0.0212   Epoch: 10   Global Step: 180180   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 18:19:20,230-Speed 3334.43 samples/sec   Loss 1.5851   LearningRate 0.0212   Epoch: 10   Global Step: 180190   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-11 18:19:23,328-Speed 3306.16 samples/sec   Loss 1.5912   LearningRate 0.0212   Epoch: 10   Global Step: 180200   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-11 18:19:26,411-Speed 3321.32 samples/sec   Loss 1.5302   LearningRate 0.0212   Epoch: 10   Global Step: 180210   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-11 18:19:29,500-Speed 3316.44 samples/sec   Loss 1.5489   LearningRate 0.0212   Epoch: 10   Global Step: 180220   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-11 18:19:32,588-Speed 3316.56 samples/sec   Loss 1.5630   LearningRate 0.0212   Epoch: 10   Global Step: 180230   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-11 18:19:35,700-Speed 3291.35 samples/sec   Loss 1.5682   LearningRate 0.0212   Epoch: 10   Global Step: 180240   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-11 18:19:38,794-Speed 3310.45 samples/sec   Loss 1.5776   LearningRate 0.0212   Epoch: 10   Global Step: 180250   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-11 18:19:41,888-Speed 3309.97 samples/sec   Loss 1.5383   LearningRate 0.0212   Epoch: 10   Global Step: 180260   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-11 18:19:44,985-Speed 3307.59 samples/sec   Loss 1.5325   LearningRate 0.0212   Epoch: 10   Global Step: 180270   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-11 18:19:48,069-Speed 3320.73 samples/sec   Loss 1.5980   LearningRate 0.0212   Epoch: 10   Global Step: 180280   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-11 18:19:51,150-Speed 3324.39 samples/sec   Loss 1.5548   LearningRate 0.0212   Epoch: 10   Global Step: 180290   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 18:19:54,239-Speed 3316.20 samples/sec   Loss 1.5786   LearningRate 0.0211   Epoch: 10   Global Step: 180300   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 18:19:57,336-Speed 3306.20 samples/sec   Loss 1.6035   LearningRate 0.0211   Epoch: 10   Global Step: 180310   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 18:20:00,461-Speed 3277.86 samples/sec   Loss 1.5981   LearningRate 0.0211   Epoch: 10   Global Step: 180320   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 18:20:03,554-Speed 3312.04 samples/sec   Loss 1.5726   LearningRate 0.0211   Epoch: 10   Global Step: 180330   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 18:20:06,682-Speed 3274.61 samples/sec   Loss 1.5483   LearningRate 0.0211   Epoch: 10   Global Step: 180340   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 18:20:09,774-Speed 3312.35 samples/sec   Loss 1.5413   LearningRate 0.0211   Epoch: 10   Global Step: 180350   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 18:20:12,860-Speed 3318.90 samples/sec   Loss 1.5364   LearningRate 0.0211   Epoch: 10   Global Step: 180360   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 18:20:15,938-Speed 3327.35 samples/sec   Loss 1.5779   LearningRate 0.0211   Epoch: 10   Global Step: 180370   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 18:20:19,015-Speed 3329.02 samples/sec   Loss 1.5582   LearningRate 0.0211   Epoch: 10   Global Step: 180380   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 18:20:22,090-Speed 3330.62 samples/sec   Loss 1.5572   LearningRate 0.0211   Epoch: 10   Global Step: 180390   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 18:20:25,165-Speed 3331.28 samples/sec   Loss 1.5575   LearningRate 0.0211   Epoch: 10   Global Step: 180400   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 18:20:28,232-Speed 3339.04 samples/sec   Loss 1.5651   LearningRate 0.0211   Epoch: 10   Global Step: 180410   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-11 18:20:31,334-Speed 3302.54 samples/sec   Loss 1.5849   LearningRate 0.0211   Epoch: 10   Global Step: 180420   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-11 18:20:34,436-Speed 3302.07 samples/sec   Loss 1.5252   LearningRate 0.0211   Epoch: 10   Global Step: 180430   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-11 18:20:37,519-Speed 3321.64 samples/sec   Loss 1.5677   LearningRate 0.0211   Epoch: 10   Global Step: 180440   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-11 18:20:40,628-Speed 3294.19 samples/sec   Loss 1.5729   LearningRate 0.0211   Epoch: 10   Global Step: 180450   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-11 18:20:43,786-Speed 3243.29 samples/sec   Loss 1.6000   LearningRate 0.0211   Epoch: 10   Global Step: 180460   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-11 18:20:47,054-Speed 3135.02 samples/sec   Loss 1.5493   LearningRate 0.0211   Epoch: 10   Global Step: 180470   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-11 18:20:50,135-Speed 3323.71 samples/sec   Loss 1.6074   LearningRate 0.0211   Epoch: 10   Global Step: 180480   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-11 18:20:53,221-Speed 3318.86 samples/sec   Loss 1.5774   LearningRate 0.0211   Epoch: 10   Global Step: 180490   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-11 18:20:56,343-Speed 3280.76 samples/sec   Loss 1.5811   LearningRate 0.0211   Epoch: 10   Global Step: 180500   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-11 18:20:59,439-Speed 3308.68 samples/sec   Loss 1.6100   LearningRate 0.0211   Epoch: 10   Global Step: 180510   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 18:21:02,586-Speed 3254.25 samples/sec   Loss 1.6538   LearningRate 0.0211   Epoch: 10   Global Step: 180520   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 18:21:05,675-Speed 3317.12 samples/sec   Loss 1.5001   LearningRate 0.0211   Epoch: 10   Global Step: 180530   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 18:21:08,857-Speed 3218.28 samples/sec   Loss 1.5556   LearningRate 0.0211   Epoch: 10   Global Step: 180540   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 18:21:11,947-Speed 3314.67 samples/sec   Loss 1.5351   LearningRate 0.0211   Epoch: 10   Global Step: 180550   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 18:21:15,063-Speed 3286.90 samples/sec   Loss 1.5582   LearningRate 0.0211   Epoch: 10   Global Step: 180560   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-11 18:21:18,179-Speed 3287.38 samples/sec   Loss 1.6053   LearningRate 0.0211   Epoch: 10   Global Step: 180570   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-11 18:21:21,264-Speed 3319.63 samples/sec   Loss 1.6186   LearningRate 0.0211   Epoch: 10   Global Step: 180580   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-11 18:21:24,392-Speed 3274.12 samples/sec   Loss 1.6027   LearningRate 0.0211   Epoch: 10   Global Step: 180590   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-11 18:21:27,476-Speed 3320.57 samples/sec   Loss 1.4958   LearningRate 0.0211   Epoch: 10   Global Step: 180600   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-11 18:21:30,556-Speed 3326.61 samples/sec   Loss 1.5418   LearningRate 0.0211   Epoch: 10   Global Step: 180610   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-11 18:21:33,738-Speed 3218.14 samples/sec   Loss 1.6080   LearningRate 0.0211   Epoch: 10   Global Step: 180620   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-11 18:21:36,845-Speed 3297.35 samples/sec   Loss 1.5426   LearningRate 0.0211   Epoch: 10   Global Step: 180630   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-11 18:21:39,961-Speed 3286.45 samples/sec   Loss 1.5662   LearningRate 0.0211   Epoch: 10   Global Step: 180640   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-11 18:21:43,052-Speed 3314.28 samples/sec   Loss 1.5876   LearningRate 0.0211   Epoch: 10   Global Step: 180650   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-11 18:21:46,135-Speed 3321.07 samples/sec   Loss 1.6297   LearningRate 0.0211   Epoch: 10   Global Step: 180660   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 18:21:49,227-Speed 3313.13 samples/sec   Loss 1.5973   LearningRate 0.0210   Epoch: 10   Global Step: 180670   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 18:21:52,426-Speed 3201.88 samples/sec   Loss 1.5650   LearningRate 0.0210   Epoch: 10   Global Step: 180680   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 18:21:55,520-Speed 3310.12 samples/sec   Loss 1.5759   LearningRate 0.0210   Epoch: 10   Global Step: 180690   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 18:21:58,605-Speed 3321.61 samples/sec   Loss 1.5586   LearningRate 0.0210   Epoch: 10   Global Step: 180700   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 18:22:01,752-Speed 3254.71 samples/sec   Loss 1.6095   LearningRate 0.0210   Epoch: 10   Global Step: 180710   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 18:22:04,880-Speed 3273.88 samples/sec   Loss 1.5801   LearningRate 0.0210   Epoch: 10   Global Step: 180720   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 18:22:07,961-Speed 3324.99 samples/sec   Loss 1.6086   LearningRate 0.0210   Epoch: 10   Global Step: 180730   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 18:22:11,082-Speed 3281.53 samples/sec   Loss 1.6148   LearningRate 0.0210   Epoch: 10   Global Step: 180740   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 18:22:14,160-Speed 3327.13 samples/sec   Loss 1.5201   LearningRate 0.0210   Epoch: 10   Global Step: 180750   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 18:22:17,236-Speed 3329.91 samples/sec   Loss 1.5662   LearningRate 0.0210   Epoch: 10   Global Step: 180760   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 18:22:20,337-Speed 3303.15 samples/sec   Loss 1.5688   LearningRate 0.0210   Epoch: 10   Global Step: 180770   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 18:22:23,439-Speed 3301.44 samples/sec   Loss 1.5892   LearningRate 0.0210   Epoch: 10   Global Step: 180780   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 18:22:26,588-Speed 3253.45 samples/sec   Loss 1.5816   LearningRate 0.0210   Epoch: 10   Global Step: 180790   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 18:22:29,691-Speed 3300.55 samples/sec   Loss 1.5774   LearningRate 0.0210   Epoch: 10   Global Step: 180800   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 18:22:32,815-Speed 3278.87 samples/sec   Loss 1.5764   LearningRate 0.0210   Epoch: 10   Global Step: 180810   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 18:22:35,894-Speed 3326.32 samples/sec   Loss 1.5594   LearningRate 0.0210   Epoch: 10   Global Step: 180820   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 18:22:38,987-Speed 3311.07 samples/sec   Loss 1.5235   LearningRate 0.0210   Epoch: 10   Global Step: 180830   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 18:22:42,067-Speed 3325.95 samples/sec   Loss 1.6044   LearningRate 0.0210   Epoch: 10   Global Step: 180840   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 18:22:45,182-Speed 3287.94 samples/sec   Loss 1.5400   LearningRate 0.0210   Epoch: 10   Global Step: 180850   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-11 18:22:48,355-Speed 3227.85 samples/sec   Loss 1.5469   LearningRate 0.0210   Epoch: 10   Global Step: 180860   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-11 18:22:51,446-Speed 3313.94 samples/sec   Loss 1.5625   LearningRate 0.0210   Epoch: 10   Global Step: 180870   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-11 18:22:54,533-Speed 3317.89 samples/sec   Loss 1.6074   LearningRate 0.0210   Epoch: 10   Global Step: 180880   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-11 18:22:57,609-Speed 3329.49 samples/sec   Loss 1.6271   LearningRate 0.0210   Epoch: 10   Global Step: 180890   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-11 18:23:00,691-Speed 3323.64 samples/sec   Loss 1.5686   LearningRate 0.0210   Epoch: 10   Global Step: 180900   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-11 18:23:03,774-Speed 3322.47 samples/sec   Loss 1.5352   LearningRate 0.0210   Epoch: 10   Global Step: 180910   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-11 18:23:06,859-Speed 3320.01 samples/sec   Loss 1.6221   LearningRate 0.0210   Epoch: 10   Global Step: 180920   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-11 18:23:09,947-Speed 3317.07 samples/sec   Loss 1.6060   LearningRate 0.0210   Epoch: 10   Global Step: 180930   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-11 18:23:13,057-Speed 3293.22 samples/sec   Loss 1.5971   LearningRate 0.0210   Epoch: 10   Global Step: 180940   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-11 18:23:16,142-Speed 3319.33 samples/sec   Loss 1.5759   LearningRate 0.0210   Epoch: 10   Global Step: 180950   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 18:23:19,238-Speed 3308.09 samples/sec   Loss 1.6331   LearningRate 0.0210   Epoch: 10   Global Step: 180960   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 18:23:22,319-Speed 3324.80 samples/sec   Loss 1.5668   LearningRate 0.0210   Epoch: 10   Global Step: 180970   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 18:23:25,398-Speed 3327.26 samples/sec   Loss 1.6105   LearningRate 0.0210   Epoch: 10   Global Step: 180980   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 18:23:28,494-Speed 3307.68 samples/sec   Loss 1.5820   LearningRate 0.0210   Epoch: 10   Global Step: 180990   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 18:23:31,570-Speed 3330.41 samples/sec   Loss 1.6305   LearningRate 0.0210   Epoch: 10   Global Step: 181000   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 18:23:34,652-Speed 3323.03 samples/sec   Loss 1.5774   LearningRate 0.0210   Epoch: 10   Global Step: 181010   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 18:23:37,734-Speed 3322.97 samples/sec   Loss 1.5816   LearningRate 0.0210   Epoch: 10   Global Step: 181020   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 18:23:40,913-Speed 3222.08 samples/sec   Loss 1.5614   LearningRate 0.0209   Epoch: 10   Global Step: 181030   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 18:23:44,052-Speed 3262.70 samples/sec   Loss 1.6269   LearningRate 0.0209   Epoch: 10   Global Step: 181040   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 18:23:47,196-Speed 3257.44 samples/sec   Loss 1.5759   LearningRate 0.0209   Epoch: 10   Global Step: 181050   Fp16 Grad Scale: 262144   Required: 16 hours
Training: 2022-04-11 18:23:50,280-Speed 3321.33 samples/sec   Loss 1.5419   LearningRate 0.0209   Epoch: 10   Global Step: 181060   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 18:23:53,358-Speed 3328.45 samples/sec   Loss 1.5950   LearningRate 0.0209   Epoch: 10   Global Step: 181070   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 18:23:56,443-Speed 3319.14 samples/sec   Loss 1.5845   LearningRate 0.0209   Epoch: 10   Global Step: 181080   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 18:23:59,527-Speed 3321.84 samples/sec   Loss 1.5434   LearningRate 0.0209   Epoch: 10   Global Step: 181090   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 18:24:02,629-Speed 3301.28 samples/sec   Loss 1.5488   LearningRate 0.0209   Epoch: 10   Global Step: 181100   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-11 18:24:05,742-Speed 3290.35 samples/sec   Loss 1.5907   LearningRate 0.0209   Epoch: 10   Global Step: 181110   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-11 18:24:08,870-Speed 3274.91 samples/sec   Loss 1.5828   LearningRate 0.0209   Epoch: 10   Global Step: 181120   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-11 18:24:12,010-Speed 3261.71 samples/sec   Loss 1.5290   LearningRate 0.0209   Epoch: 10   Global Step: 181130   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-11 18:24:15,127-Speed 3285.66 samples/sec   Loss 1.5867   LearningRate 0.0209   Epoch: 10   Global Step: 181140   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-11 18:24:18,206-Speed 3327.25 samples/sec   Loss 1.5936   LearningRate 0.0209   Epoch: 10   Global Step: 181150   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-11 18:24:21,288-Speed 3323.30 samples/sec   Loss 1.5329   LearningRate 0.0209   Epoch: 10   Global Step: 181160   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-11 18:24:24,375-Speed 3317.95 samples/sec   Loss 1.5744   LearningRate 0.0209   Epoch: 10   Global Step: 181170   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-11 18:24:27,515-Speed 3261.74 samples/sec   Loss 1.4845   LearningRate 0.0209   Epoch: 10   Global Step: 181180   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-11 18:24:30,609-Speed 3310.15 samples/sec   Loss 1.5361   LearningRate 0.0209   Epoch: 10   Global Step: 181190   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-11 18:24:33,717-Speed 3295.71 samples/sec   Loss 1.5337   LearningRate 0.0209   Epoch: 10   Global Step: 181200   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-11 18:24:36,801-Speed 3320.83 samples/sec   Loss 1.6130   LearningRate 0.0209   Epoch: 10   Global Step: 181210   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-11 18:24:39,917-Speed 3286.80 samples/sec   Loss 1.6077   LearningRate 0.0209   Epoch: 10   Global Step: 181220   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-11 18:24:43,021-Speed 3299.84 samples/sec   Loss 1.6306   LearningRate 0.0209   Epoch: 10   Global Step: 181230   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-11 18:24:46,174-Speed 3248.89 samples/sec   Loss 1.5739   LearningRate 0.0209   Epoch: 10   Global Step: 181240   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-11 18:24:49,379-Speed 3195.70 samples/sec   Loss 1.6336   LearningRate 0.0209   Epoch: 10   Global Step: 181250   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-11 18:24:52,462-Speed 3322.19 samples/sec   Loss 1.5343   LearningRate 0.0209   Epoch: 10   Global Step: 181260   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-11 18:24:55,540-Speed 3327.66 samples/sec   Loss 1.5627   LearningRate 0.0209   Epoch: 10   Global Step: 181270   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-11 18:24:58,615-Speed 3330.59 samples/sec   Loss 1.5690   LearningRate 0.0209   Epoch: 10   Global Step: 181280   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-11 18:25:01,708-Speed 3311.68 samples/sec   Loss 1.5165   LearningRate 0.0209   Epoch: 10   Global Step: 181290   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-11 18:25:04,784-Speed 3329.34 samples/sec   Loss 1.5537   LearningRate 0.0209   Epoch: 10   Global Step: 181300   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-11 18:25:07,880-Speed 3308.35 samples/sec   Loss 1.5870   LearningRate 0.0209   Epoch: 10   Global Step: 181310   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-11 18:25:10,957-Speed 3329.28 samples/sec   Loss 1.5732   LearningRate 0.0209   Epoch: 10   Global Step: 181320   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-11 18:25:14,034-Speed 3328.93 samples/sec   Loss 1.5502   LearningRate 0.0209   Epoch: 10   Global Step: 181330   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-11 18:25:17,114-Speed 3325.08 samples/sec   Loss 1.5777   LearningRate 0.0209   Epoch: 10   Global Step: 181340   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 18:25:20,234-Speed 3282.26 samples/sec   Loss 1.5743   LearningRate 0.0209   Epoch: 10   Global Step: 181350   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 18:25:23,430-Speed 3205.50 samples/sec   Loss 1.6276   LearningRate 0.0209   Epoch: 10   Global Step: 181360   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 18:25:26,504-Speed 3331.48 samples/sec   Loss 1.5434   LearningRate 0.0209   Epoch: 10   Global Step: 181370   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 18:25:29,583-Speed 3326.60 samples/sec   Loss 1.5894   LearningRate 0.0209   Epoch: 10   Global Step: 181380   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 18:25:32,663-Speed 3324.96 samples/sec   Loss 1.5547   LearningRate 0.0209   Epoch: 10   Global Step: 181390   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 18:25:35,801-Speed 3263.82 samples/sec   Loss 1.5196   LearningRate 0.0208   Epoch: 10   Global Step: 181400   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 18:25:38,878-Speed 3329.70 samples/sec   Loss 1.5953   LearningRate 0.0208   Epoch: 10   Global Step: 181410   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 18:25:41,944-Speed 3340.12 samples/sec   Loss 1.5426   LearningRate 0.0208   Epoch: 10   Global Step: 181420   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-11 18:25:45,031-Speed 3318.55 samples/sec   Loss 1.5693   LearningRate 0.0208   Epoch: 10   Global Step: 181430   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-11 18:25:48,129-Speed 3306.44 samples/sec   Loss 1.5824   LearningRate 0.0208   Epoch: 10   Global Step: 181440   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-11 18:25:51,270-Speed 3260.06 samples/sec   Loss 1.5747   LearningRate 0.0208   Epoch: 10   Global Step: 181450   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-11 18:25:54,351-Speed 3324.29 samples/sec   Loss 1.5928   LearningRate 0.0208   Epoch: 10   Global Step: 181460   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-11 18:25:57,449-Speed 3306.57 samples/sec   Loss 1.5042   LearningRate 0.0208   Epoch: 10   Global Step: 181470   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-11 18:26:00,528-Speed 3326.63 samples/sec   Loss 1.5579   LearningRate 0.0208   Epoch: 10   Global Step: 181480   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-11 18:26:03,630-Speed 3301.32 samples/sec   Loss 1.5209   LearningRate 0.0208   Epoch: 10   Global Step: 181490   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-11 18:26:06,709-Speed 3326.62 samples/sec   Loss 1.6452   LearningRate 0.0208   Epoch: 10   Global Step: 181500   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-11 18:26:09,798-Speed 3315.58 samples/sec   Loss 1.5744   LearningRate 0.0208   Epoch: 10   Global Step: 181510   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-11 18:26:12,932-Speed 3268.82 samples/sec   Loss 1.5165   LearningRate 0.0208   Epoch: 10   Global Step: 181520   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 18:26:16,066-Speed 3267.94 samples/sec   Loss 1.5836   LearningRate 0.0208   Epoch: 10   Global Step: 181530   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 18:26:19,151-Speed 3320.28 samples/sec   Loss 1.5858   LearningRate 0.0208   Epoch: 10   Global Step: 181540   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 18:26:22,238-Speed 3317.01 samples/sec   Loss 1.5723   LearningRate 0.0208   Epoch: 10   Global Step: 181550   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 18:26:25,352-Speed 3289.65 samples/sec   Loss 1.5364   LearningRate 0.0208   Epoch: 10   Global Step: 181560   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-11 18:26:28,450-Speed 3305.85 samples/sec   Loss 1.5654   LearningRate 0.0208   Epoch: 10   Global Step: 181570   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-11 18:26:31,587-Speed 3265.56 samples/sec   Loss 1.5374   LearningRate 0.0208   Epoch: 10   Global Step: 181580   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-11 18:26:34,683-Speed 3308.85 samples/sec   Loss 1.6000   LearningRate 0.0208   Epoch: 10   Global Step: 181590   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-11 18:26:37,807-Speed 3277.77 samples/sec   Loss 1.6113   LearningRate 0.0208   Epoch: 10   Global Step: 181600   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-11 18:26:40,943-Speed 3266.29 samples/sec   Loss 1.5316   LearningRate 0.0208   Epoch: 10   Global Step: 181610   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-11 18:26:44,075-Speed 3270.21 samples/sec   Loss 1.5353   LearningRate 0.0208   Epoch: 10   Global Step: 181620   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-11 18:26:47,172-Speed 3307.49 samples/sec   Loss 1.5163   LearningRate 0.0208   Epoch: 10   Global Step: 181630   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-11 18:26:50,281-Speed 3294.08 samples/sec   Loss 1.5318   LearningRate 0.0208   Epoch: 10   Global Step: 181640   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-11 18:26:53,364-Speed 3322.35 samples/sec   Loss 1.5367   LearningRate 0.0208   Epoch: 10   Global Step: 181650   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-11 18:26:56,448-Speed 3321.04 samples/sec   Loss 1.5906   LearningRate 0.0208   Epoch: 10   Global Step: 181660   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 18:26:59,596-Speed 3253.52 samples/sec   Loss 1.6051   LearningRate 0.0208   Epoch: 10   Global Step: 181670   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 18:27:02,834-Speed 3163.00 samples/sec   Loss 1.6088   LearningRate 0.0208   Epoch: 10   Global Step: 181680   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 18:27:05,931-Speed 3307.40 samples/sec   Loss 1.5585   LearningRate 0.0208   Epoch: 10   Global Step: 181690   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 18:27:09,025-Speed 3310.31 samples/sec   Loss 1.5720   LearningRate 0.0208   Epoch: 10   Global Step: 181700   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 18:27:12,111-Speed 3318.72 samples/sec   Loss 1.5373   LearningRate 0.0208   Epoch: 10   Global Step: 181710   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 18:27:16,000-Speed 2633.96 samples/sec   Loss 1.5240   LearningRate 0.0208   Epoch: 10   Global Step: 181720   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 18:27:19,087-Speed 3317.37 samples/sec   Loss 1.5843   LearningRate 0.0208   Epoch: 10   Global Step: 181730   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 18:27:22,163-Speed 3329.75 samples/sec   Loss 1.5804   LearningRate 0.0208   Epoch: 10   Global Step: 181740   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 18:27:25,235-Speed 3334.49 samples/sec   Loss 1.6149   LearningRate 0.0208   Epoch: 10   Global Step: 181750   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-11 18:27:28,346-Speed 3292.51 samples/sec   Loss 1.5826   LearningRate 0.0207   Epoch: 10   Global Step: 181760   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-11 18:27:31,436-Speed 3314.41 samples/sec   Loss 1.6248   LearningRate 0.0207   Epoch: 10   Global Step: 181770   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-11 18:27:34,523-Speed 3318.11 samples/sec   Loss 1.5880   LearningRate 0.0207   Epoch: 10   Global Step: 181780   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-11 18:27:37,653-Speed 3272.45 samples/sec   Loss 1.5479   LearningRate 0.0207   Epoch: 10   Global Step: 181790   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-11 18:27:40,804-Speed 3250.50 samples/sec   Loss 1.5757   LearningRate 0.0207   Epoch: 10   Global Step: 181800   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-11 18:27:43,955-Speed 3250.15 samples/sec   Loss 1.5658   LearningRate 0.0207   Epoch: 10   Global Step: 181810   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-11 18:27:47,058-Speed 3301.06 samples/sec   Loss 1.5225   LearningRate 0.0207   Epoch: 10   Global Step: 181820   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-11 18:27:50,172-Speed 3289.72 samples/sec   Loss 1.5716   LearningRate 0.0207   Epoch: 10   Global Step: 181830   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-11 18:27:53,290-Speed 3283.94 samples/sec   Loss 1.5876   LearningRate 0.0207   Epoch: 10   Global Step: 181840   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-11 18:27:56,376-Speed 3319.59 samples/sec   Loss 1.5258   LearningRate 0.0207   Epoch: 10   Global Step: 181850   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 18:27:59,533-Speed 3244.18 samples/sec   Loss 1.5695   LearningRate 0.0207   Epoch: 10   Global Step: 181860   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 18:28:02,658-Speed 3277.91 samples/sec   Loss 1.6043   LearningRate 0.0207   Epoch: 10   Global Step: 181870   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 18:28:05,792-Speed 3267.72 samples/sec   Loss 1.5538   LearningRate 0.0207   Epoch: 10   Global Step: 181880   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 18:28:08,903-Speed 3292.42 samples/sec   Loss 1.5622   LearningRate 0.0207   Epoch: 10   Global Step: 181890   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 18:28:12,019-Speed 3286.71 samples/sec   Loss 1.5319   LearningRate 0.0207   Epoch: 10   Global Step: 181900   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 18:28:15,117-Speed 3306.56 samples/sec   Loss 1.5996   LearningRate 0.0207   Epoch: 10   Global Step: 181910   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 18:28:18,202-Speed 3319.87 samples/sec   Loss 1.5382   LearningRate 0.0207   Epoch: 10   Global Step: 181920   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 18:28:21,304-Speed 3302.11 samples/sec   Loss 1.5400   LearningRate 0.0207   Epoch: 10   Global Step: 181930   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 18:28:24,411-Speed 3296.62 samples/sec   Loss 1.5437   LearningRate 0.0207   Epoch: 10   Global Step: 181940   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 18:28:27,486-Speed 3331.38 samples/sec   Loss 1.5408   LearningRate 0.0207   Epoch: 10   Global Step: 181950   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-11 18:28:30,568-Speed 3323.65 samples/sec   Loss 1.5345   LearningRate 0.0207   Epoch: 10   Global Step: 181960   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-11 18:28:33,645-Speed 3328.84 samples/sec   Loss 1.5405   LearningRate 0.0207   Epoch: 10   Global Step: 181970   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-11 18:28:36,749-Speed 3298.97 samples/sec   Loss 1.5825   LearningRate 0.0207   Epoch: 10   Global Step: 181980   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-11 18:28:39,829-Speed 3325.57 samples/sec   Loss 1.5293   LearningRate 0.0207   Epoch: 10   Global Step: 181990   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-11 18:28:42,953-Speed 3278.76 samples/sec   Loss 1.5242   LearningRate 0.0207   Epoch: 10   Global Step: 182000   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-11 18:29:26,622-[lfw][182000]XNorm: 22.455109
Training: 2022-04-11 18:29:26,623-[lfw][182000]Accuracy-Flip: 0.99783+-0.00289
Training: 2022-04-11 18:29:26,623-[lfw][182000]Accuracy-Highest: 0.99817
Training: 2022-04-11 18:30:17,286-[cfp_fp][182000]XNorm: 21.556972
Training: 2022-04-11 18:30:17,286-[cfp_fp][182000]Accuracy-Flip: 0.98914+-0.00561
Training: 2022-04-11 18:30:17,287-[cfp_fp][182000]Accuracy-Highest: 0.98971
Training: 2022-04-11 18:31:00,894-[agedb_30][182000]XNorm: 23.078960
Training: 2022-04-11 18:31:00,895-[agedb_30][182000]Accuracy-Flip: 0.98267+-0.00659
Training: 2022-04-11 18:31:00,896-[agedb_30][182000]Accuracy-Highest: 0.98500
Training: 2022-04-11 18:31:03,982-Speed 72.61 samples/sec   Loss 1.6088   LearningRate 0.0207   Epoch: 10   Global Step: 182010   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-11 18:31:07,061-Speed 3326.65 samples/sec   Loss 1.5919   LearningRate 0.0207   Epoch: 10   Global Step: 182020   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-11 18:31:10,170-Speed 3296.11 samples/sec   Loss 1.5472   LearningRate 0.0207   Epoch: 10   Global Step: 182030   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-11 18:31:13,244-Speed 3331.77 samples/sec   Loss 1.5520   LearningRate 0.0207   Epoch: 10   Global Step: 182040   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-11 18:31:16,330-Speed 3318.18 samples/sec   Loss 1.5609   LearningRate 0.0207   Epoch: 10   Global Step: 182050   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 18:31:19,443-Speed 3290.65 samples/sec   Loss 1.5851   LearningRate 0.0207   Epoch: 10   Global Step: 182060   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 18:31:22,508-Speed 3341.06 samples/sec   Loss 1.5172   LearningRate 0.0207   Epoch: 10   Global Step: 182070   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 18:31:25,579-Speed 3336.13 samples/sec   Loss 1.6171   LearningRate 0.0207   Epoch: 10   Global Step: 182080   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 18:31:28,652-Speed 3332.63 samples/sec   Loss 1.5501   LearningRate 0.0207   Epoch: 10   Global Step: 182090   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 18:31:31,723-Speed 3335.30 samples/sec   Loss 1.5569   LearningRate 0.0207   Epoch: 10   Global Step: 182100   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 18:31:34,792-Speed 3336.96 samples/sec   Loss 1.5791   LearningRate 0.0207   Epoch: 10   Global Step: 182110   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 18:31:37,876-Speed 3321.25 samples/sec   Loss 1.5880   LearningRate 0.0207   Epoch: 10   Global Step: 182120   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 18:31:40,934-Speed 3348.99 samples/sec   Loss 1.5339   LearningRate 0.0206   Epoch: 10   Global Step: 182130   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-11 18:31:44,019-Speed 3319.84 samples/sec   Loss 1.5956   LearningRate 0.0206   Epoch: 10   Global Step: 182140   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-11 18:31:47,098-Speed 3327.62 samples/sec   Loss 1.6061   LearningRate 0.0206   Epoch: 10   Global Step: 182150   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-11 18:31:50,172-Speed 3331.95 samples/sec   Loss 1.5941   LearningRate 0.0206   Epoch: 10   Global Step: 182160   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-11 18:31:53,270-Speed 3305.51 samples/sec   Loss 1.5134   LearningRate 0.0206   Epoch: 10   Global Step: 182170   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-11 18:31:56,489-Speed 3181.81 samples/sec   Loss 1.5303   LearningRate 0.0206   Epoch: 10   Global Step: 182180   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-11 18:31:59,739-Speed 3151.39 samples/sec   Loss 1.5704   LearningRate 0.0206   Epoch: 10   Global Step: 182190   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-11 18:32:02,824-Speed 3320.18 samples/sec   Loss 1.5829   LearningRate 0.0206   Epoch: 10   Global Step: 182200   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-11 18:32:05,967-Speed 3259.08 samples/sec   Loss 1.5474   LearningRate 0.0206   Epoch: 10   Global Step: 182210   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-11 18:32:09,144-Speed 3223.83 samples/sec   Loss 1.5730   LearningRate 0.0206   Epoch: 10   Global Step: 182220   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-11 18:32:12,280-Speed 3265.08 samples/sec   Loss 1.5483   LearningRate 0.0206   Epoch: 10   Global Step: 182230   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-11 18:32:15,410-Speed 3273.68 samples/sec   Loss 1.5707   LearningRate 0.0206   Epoch: 10   Global Step: 182240   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-11 18:32:18,519-Speed 3293.57 samples/sec   Loss 1.5890   LearningRate 0.0206   Epoch: 10   Global Step: 182250   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-11 18:32:21,643-Speed 3278.70 samples/sec   Loss 1.5708   LearningRate 0.0206   Epoch: 10   Global Step: 182260   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-11 18:32:24,849-Speed 3194.76 samples/sec   Loss 1.6316   LearningRate 0.0206   Epoch: 10   Global Step: 182270   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-11 18:32:27,931-Speed 3323.18 samples/sec   Loss 1.5576   LearningRate 0.0206   Epoch: 10   Global Step: 182280   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-11 18:32:31,012-Speed 3324.44 samples/sec   Loss 1.5379   LearningRate 0.0206   Epoch: 10   Global Step: 182290   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-11 18:32:34,097-Speed 3320.44 samples/sec   Loss 1.5589   LearningRate 0.0206   Epoch: 10   Global Step: 182300   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-11 18:32:37,316-Speed 3181.41 samples/sec   Loss 1.5364   LearningRate 0.0206   Epoch: 10   Global Step: 182310   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-11 18:32:40,424-Speed 3295.75 samples/sec   Loss 1.5868   LearningRate 0.0206   Epoch: 10   Global Step: 182320   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-11 18:32:43,504-Speed 3325.40 samples/sec   Loss 1.6023   LearningRate 0.0206   Epoch: 10   Global Step: 182330   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-11 18:32:46,611-Speed 3296.30 samples/sec   Loss 1.5769   LearningRate 0.0206   Epoch: 10   Global Step: 182340   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-11 18:32:49,739-Speed 3275.02 samples/sec   Loss 1.5920   LearningRate 0.0206   Epoch: 10   Global Step: 182350   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-11 18:32:52,845-Speed 3297.47 samples/sec   Loss 1.5618   LearningRate 0.0206   Epoch: 10   Global Step: 182360   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 18:32:55,907-Speed 3345.86 samples/sec   Loss 1.5919   LearningRate 0.0206   Epoch: 10   Global Step: 182370   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-11 18:32:59,025-Speed 3284.23 samples/sec   Loss 1.5812   LearningRate 0.0206   Epoch: 10   Global Step: 182380   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-11 18:33:02,169-Speed 3258.09 samples/sec   Loss 1.5625   LearningRate 0.0206   Epoch: 10   Global Step: 182390   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-11 18:33:05,395-Speed 3174.23 samples/sec   Loss 1.5529   LearningRate 0.0206   Epoch: 10   Global Step: 182400   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-11 18:33:08,485-Speed 3315.12 samples/sec   Loss 1.5309   LearningRate 0.0206   Epoch: 10   Global Step: 182410   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-11 18:33:11,564-Speed 3326.40 samples/sec   Loss 1.5564   LearningRate 0.0206   Epoch: 10   Global Step: 182420   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-11 18:33:14,644-Speed 3325.48 samples/sec   Loss 1.5423   LearningRate 0.0206   Epoch: 10   Global Step: 182430   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-11 18:33:17,713-Speed 3337.24 samples/sec   Loss 1.5888   LearningRate 0.0206   Epoch: 10   Global Step: 182440   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-11 18:33:20,808-Speed 3309.45 samples/sec   Loss 1.5578   LearningRate 0.0206   Epoch: 10   Global Step: 182450   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-11 18:33:23,877-Speed 3339.95 samples/sec   Loss 1.5917   LearningRate 0.0206   Epoch: 10   Global Step: 182460   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-11 18:33:26,954-Speed 3327.91 samples/sec   Loss 1.6400   LearningRate 0.0206   Epoch: 10   Global Step: 182470   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 18:33:30,045-Speed 3314.31 samples/sec   Loss 1.5371   LearningRate 0.0206   Epoch: 10   Global Step: 182480   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 18:33:33,126-Speed 3324.43 samples/sec   Loss 1.5527   LearningRate 0.0206   Epoch: 10   Global Step: 182490   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 18:33:36,212-Speed 3318.98 samples/sec   Loss 1.5115   LearningRate 0.0205   Epoch: 10   Global Step: 182500   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 18:33:39,296-Speed 3321.64 samples/sec   Loss 1.4884   LearningRate 0.0205   Epoch: 10   Global Step: 182510   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 18:33:42,370-Speed 3331.11 samples/sec   Loss 1.5530   LearningRate 0.0205   Epoch: 10   Global Step: 182520   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 18:33:45,445-Speed 3330.60 samples/sec   Loss 1.5575   LearningRate 0.0205   Epoch: 10   Global Step: 182530   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 18:33:48,575-Speed 3273.06 samples/sec   Loss 1.5495   LearningRate 0.0205   Epoch: 10   Global Step: 182540   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 18:33:51,679-Speed 3299.93 samples/sec   Loss 1.5524   LearningRate 0.0205   Epoch: 10   Global Step: 182550   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 18:33:54,790-Speed 3292.14 samples/sec   Loss 1.5475   LearningRate 0.0205   Epoch: 10   Global Step: 182560   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 18:33:57,954-Speed 3236.16 samples/sec   Loss 1.5853   LearningRate 0.0205   Epoch: 10   Global Step: 182570   Fp16 Grad Scale: 262144   Required: 16 hours
Training: 2022-04-11 18:34:01,020-Speed 3341.37 samples/sec   Loss 1.5511   LearningRate 0.0205   Epoch: 10   Global Step: 182580   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 18:34:04,124-Speed 3300.02 samples/sec   Loss 1.5783   LearningRate 0.0205   Epoch: 10   Global Step: 182590   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 18:34:07,230-Speed 3298.39 samples/sec   Loss 1.5539   LearningRate 0.0205   Epoch: 10   Global Step: 182600   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 18:34:10,312-Speed 3322.34 samples/sec   Loss 1.5411   LearningRate 0.0205   Epoch: 10   Global Step: 182610   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 18:34:13,403-Speed 3313.82 samples/sec   Loss 1.5622   LearningRate 0.0205   Epoch: 10   Global Step: 182620   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 18:34:16,483-Speed 3325.63 samples/sec   Loss 1.6167   LearningRate 0.0205   Epoch: 10   Global Step: 182630   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 18:34:19,574-Speed 3313.85 samples/sec   Loss 1.5629   LearningRate 0.0205   Epoch: 10   Global Step: 182640   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 18:34:22,664-Speed 3314.73 samples/sec   Loss 1.5941   LearningRate 0.0205   Epoch: 10   Global Step: 182650   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 18:34:25,746-Speed 3323.14 samples/sec   Loss 1.5682   LearningRate 0.0205   Epoch: 10   Global Step: 182660   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 18:34:28,831-Speed 3320.04 samples/sec   Loss 1.4984   LearningRate 0.0205   Epoch: 10   Global Step: 182670   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 18:34:31,991-Speed 3241.14 samples/sec   Loss 1.5259   LearningRate 0.0205   Epoch: 10   Global Step: 182680   Fp16 Grad Scale: 262144   Required: 16 hours
Training: 2022-04-11 18:34:35,076-Speed 3320.35 samples/sec   Loss 1.5663   LearningRate 0.0205   Epoch: 10   Global Step: 182690   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-11 18:34:38,169-Speed 3311.78 samples/sec   Loss 1.5943   LearningRate 0.0205   Epoch: 10   Global Step: 182700   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-11 18:34:41,261-Speed 3312.17 samples/sec   Loss 1.6168   LearningRate 0.0205   Epoch: 10   Global Step: 182710   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-11 18:34:44,422-Speed 3240.08 samples/sec   Loss 1.5194   LearningRate 0.0205   Epoch: 10   Global Step: 182720   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-11 18:34:47,579-Speed 3244.89 samples/sec   Loss 1.5788   LearningRate 0.0205   Epoch: 10   Global Step: 182730   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-11 18:34:50,695-Speed 3286.74 samples/sec   Loss 1.5906   LearningRate 0.0205   Epoch: 10   Global Step: 182740   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-11 18:34:53,785-Speed 3314.78 samples/sec   Loss 1.5524   LearningRate 0.0205   Epoch: 10   Global Step: 182750   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-11 18:34:56,897-Speed 3291.59 samples/sec   Loss 1.5504   LearningRate 0.0205   Epoch: 10   Global Step: 182760   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-11 18:35:00,040-Speed 3258.23 samples/sec   Loss 1.5156   LearningRate 0.0205   Epoch: 10   Global Step: 182770   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-11 18:35:03,139-Speed 3305.01 samples/sec   Loss 1.5849   LearningRate 0.0205   Epoch: 10   Global Step: 182780   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-11 18:35:06,202-Speed 3344.77 samples/sec   Loss 1.5929   LearningRate 0.0205   Epoch: 10   Global Step: 182790   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-11 18:35:09,291-Speed 3315.89 samples/sec   Loss 1.5710   LearningRate 0.0205   Epoch: 10   Global Step: 182800   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-11 18:35:12,454-Speed 3237.43 samples/sec   Loss 1.5481   LearningRate 0.0205   Epoch: 10   Global Step: 182810   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-11 18:35:15,561-Speed 3296.49 samples/sec   Loss 1.5380   LearningRate 0.0205   Epoch: 10   Global Step: 182820   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-11 18:35:18,670-Speed 3294.73 samples/sec   Loss 1.5382   LearningRate 0.0205   Epoch: 10   Global Step: 182830   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-11 18:35:21,766-Speed 3307.63 samples/sec   Loss 1.5888   LearningRate 0.0205   Epoch: 10   Global Step: 182840   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-11 18:35:24,852-Speed 3319.82 samples/sec   Loss 1.5450   LearningRate 0.0205   Epoch: 10   Global Step: 182850   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-11 18:35:27,932-Speed 3324.94 samples/sec   Loss 1.5135   LearningRate 0.0205   Epoch: 10   Global Step: 182860   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-11 18:35:31,048-Speed 3287.50 samples/sec   Loss 1.5570   LearningRate 0.0204   Epoch: 10   Global Step: 182870   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-11 18:35:34,149-Speed 3302.98 samples/sec   Loss 1.5949   LearningRate 0.0204   Epoch: 10   Global Step: 182880   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-11 18:35:37,247-Speed 3306.66 samples/sec   Loss 1.5935   LearningRate 0.0204   Epoch: 10   Global Step: 182890   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-11 18:35:40,334-Speed 3317.00 samples/sec   Loss 1.5501   LearningRate 0.0204   Epoch: 10   Global Step: 182900   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-11 18:35:43,429-Speed 3310.13 samples/sec   Loss 1.5898   LearningRate 0.0204   Epoch: 10   Global Step: 182910   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-11 18:35:46,515-Speed 3318.41 samples/sec   Loss 1.5642   LearningRate 0.0204   Epoch: 10   Global Step: 182920   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-11 18:35:49,618-Speed 3300.65 samples/sec   Loss 1.5443   LearningRate 0.0204   Epoch: 10   Global Step: 182930   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-11 18:35:52,734-Speed 3287.51 samples/sec   Loss 1.5852   LearningRate 0.0204   Epoch: 10   Global Step: 182940   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-11 18:35:55,830-Speed 3308.64 samples/sec   Loss 1.5119   LearningRate 0.0204   Epoch: 10   Global Step: 182950   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-11 18:35:58,998-Speed 3233.01 samples/sec   Loss 1.6211   LearningRate 0.0204   Epoch: 10   Global Step: 182960   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-11 18:36:02,076-Speed 3327.85 samples/sec   Loss 1.5303   LearningRate 0.0204   Epoch: 10   Global Step: 182970   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-11 18:36:05,157-Speed 3324.39 samples/sec   Loss 1.5642   LearningRate 0.0204   Epoch: 10   Global Step: 182980   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-11 18:36:08,334-Speed 3223.18 samples/sec   Loss 1.5716   LearningRate 0.0204   Epoch: 10   Global Step: 182990   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-11 18:36:11,421-Speed 3318.62 samples/sec   Loss 1.5129   LearningRate 0.0204   Epoch: 10   Global Step: 183000   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-11 18:36:14,589-Speed 3233.07 samples/sec   Loss 1.5598   LearningRate 0.0204   Epoch: 10   Global Step: 183010   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-11 18:36:17,698-Speed 3294.82 samples/sec   Loss 1.5594   LearningRate 0.0204   Epoch: 10   Global Step: 183020   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-11 18:36:20,791-Speed 3311.35 samples/sec   Loss 1.5406   LearningRate 0.0204   Epoch: 10   Global Step: 183030   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-11 18:36:23,883-Speed 3311.67 samples/sec   Loss 1.5617   LearningRate 0.0204   Epoch: 10   Global Step: 183040   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 18:36:26,983-Speed 3304.19 samples/sec   Loss 1.5467   LearningRate 0.0204   Epoch: 10   Global Step: 183050   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 18:36:30,074-Speed 3314.01 samples/sec   Loss 1.5809   LearningRate 0.0204   Epoch: 10   Global Step: 183060   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 18:36:33,153-Speed 3325.74 samples/sec   Loss 1.5420   LearningRate 0.0204   Epoch: 10   Global Step: 183070   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 18:36:36,240-Speed 3318.24 samples/sec   Loss 1.5676   LearningRate 0.0204   Epoch: 10   Global Step: 183080   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 18:36:39,447-Speed 3194.18 samples/sec   Loss 1.5528   LearningRate 0.0204   Epoch: 10   Global Step: 183090   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 18:36:42,527-Speed 3325.69 samples/sec   Loss 1.5711   LearningRate 0.0204   Epoch: 10   Global Step: 183100   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 18:36:45,633-Speed 3297.35 samples/sec   Loss 1.5379   LearningRate 0.0204   Epoch: 10   Global Step: 183110   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 18:36:48,707-Speed 3331.78 samples/sec   Loss 1.5778   LearningRate 0.0204   Epoch: 10   Global Step: 183120   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-11 18:36:51,807-Speed 3304.43 samples/sec   Loss 1.5794   LearningRate 0.0204   Epoch: 10   Global Step: 183130   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-11 18:36:54,959-Speed 3249.46 samples/sec   Loss 1.5515   LearningRate 0.0204   Epoch: 10   Global Step: 183140   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-11 18:36:58,051-Speed 3312.58 samples/sec   Loss 1.5960   LearningRate 0.0204   Epoch: 10   Global Step: 183150   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-11 18:37:01,138-Speed 3317.31 samples/sec   Loss 1.5269   LearningRate 0.0204   Epoch: 10   Global Step: 183160   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-11 18:37:04,349-Speed 3189.56 samples/sec   Loss 1.4983   LearningRate 0.0204   Epoch: 10   Global Step: 183170   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-11 18:37:07,538-Speed 3213.12 samples/sec   Loss 1.5910   LearningRate 0.0204   Epoch: 10   Global Step: 183180   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-11 18:37:10,630-Speed 3312.77 samples/sec   Loss 1.6093   LearningRate 0.0204   Epoch: 10   Global Step: 183190   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-11 18:37:13,723-Speed 3310.46 samples/sec   Loss 1.5571   LearningRate 0.0204   Epoch: 10   Global Step: 183200   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-11 18:37:16,808-Speed 3320.87 samples/sec   Loss 1.5455   LearningRate 0.0204   Epoch: 10   Global Step: 183210   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-11 18:37:19,912-Speed 3299.62 samples/sec   Loss 1.5796   LearningRate 0.0204   Epoch: 10   Global Step: 183220   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 18:37:22,999-Speed 3317.35 samples/sec   Loss 1.5858   LearningRate 0.0204   Epoch: 10   Global Step: 183230   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 18:37:26,066-Speed 3339.75 samples/sec   Loss 1.6025   LearningRate 0.0203   Epoch: 10   Global Step: 183240   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-11 18:37:29,154-Speed 3317.22 samples/sec   Loss 1.5489   LearningRate 0.0203   Epoch: 10   Global Step: 183250   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-11 18:37:32,241-Speed 3317.65 samples/sec   Loss 1.5710   LearningRate 0.0203   Epoch: 10   Global Step: 183260   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-11 18:37:35,342-Speed 3302.80 samples/sec   Loss 1.5603   LearningRate 0.0203   Epoch: 10   Global Step: 183270   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-11 18:37:38,449-Speed 3296.91 samples/sec   Loss 1.5873   LearningRate 0.0203   Epoch: 10   Global Step: 183280   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-11 18:37:41,530-Speed 3324.05 samples/sec   Loss 1.5530   LearningRate 0.0203   Epoch: 10   Global Step: 183290   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-11 18:37:44,613-Speed 3321.81 samples/sec   Loss 1.5463   LearningRate 0.0203   Epoch: 10   Global Step: 183300   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-11 18:37:47,705-Speed 3312.77 samples/sec   Loss 1.5346   LearningRate 0.0203   Epoch: 10   Global Step: 183310   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-11 18:37:50,789-Speed 3321.23 samples/sec   Loss 1.5966   LearningRate 0.0203   Epoch: 10   Global Step: 183320   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-11 18:37:53,870-Speed 3324.71 samples/sec   Loss 1.5683   LearningRate 0.0203   Epoch: 10   Global Step: 183330   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-11 18:37:56,979-Speed 3293.85 samples/sec   Loss 1.5115   LearningRate 0.0203   Epoch: 10   Global Step: 183340   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-11 18:38:00,074-Speed 3309.60 samples/sec   Loss 1.5569   LearningRate 0.0203   Epoch: 10   Global Step: 183350   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-11 18:38:03,175-Speed 3303.60 samples/sec   Loss 1.5687   LearningRate 0.0203   Epoch: 10   Global Step: 183360   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-11 18:38:06,296-Speed 3281.60 samples/sec   Loss 1.5007   LearningRate 0.0203   Epoch: 10   Global Step: 183370   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-11 18:38:09,403-Speed 3296.45 samples/sec   Loss 1.5686   LearningRate 0.0203   Epoch: 10   Global Step: 183380   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-11 18:38:12,502-Speed 3305.27 samples/sec   Loss 1.5198   LearningRate 0.0203   Epoch: 10   Global Step: 183390   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-11 18:38:15,657-Speed 3246.33 samples/sec   Loss 1.5888   LearningRate 0.0203   Epoch: 10   Global Step: 183400   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-11 18:38:18,783-Speed 3276.75 samples/sec   Loss 1.5327   LearningRate 0.0203   Epoch: 10   Global Step: 183410   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-11 18:38:21,867-Speed 3320.48 samples/sec   Loss 1.5104   LearningRate 0.0203   Epoch: 10   Global Step: 183420   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-11 18:38:24,960-Speed 3311.19 samples/sec   Loss 1.5801   LearningRate 0.0203   Epoch: 10   Global Step: 183430   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-11 18:38:28,073-Speed 3290.87 samples/sec   Loss 1.5892   LearningRate 0.0203   Epoch: 10   Global Step: 183440   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 18:38:31,152-Speed 3325.93 samples/sec   Loss 1.5538   LearningRate 0.0203   Epoch: 10   Global Step: 183450   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 18:38:34,233-Speed 3324.94 samples/sec   Loss 1.6076   LearningRate 0.0203   Epoch: 10   Global Step: 183460   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 18:38:37,322-Speed 3316.46 samples/sec   Loss 1.5335   LearningRate 0.0203   Epoch: 10   Global Step: 183470   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 18:38:40,402-Speed 3325.20 samples/sec   Loss 1.5330   LearningRate 0.0203   Epoch: 10   Global Step: 183480   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 18:38:43,479-Speed 3328.21 samples/sec   Loss 1.5854   LearningRate 0.0203   Epoch: 10   Global Step: 183490   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 18:38:46,576-Speed 3307.33 samples/sec   Loss 1.5836   LearningRate 0.0203   Epoch: 10   Global Step: 183500   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 18:38:49,654-Speed 3327.91 samples/sec   Loss 1.5494   LearningRate 0.0203   Epoch: 10   Global Step: 183510   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 18:38:52,745-Speed 3313.29 samples/sec   Loss 1.5807   LearningRate 0.0203   Epoch: 10   Global Step: 183520   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 18:38:55,826-Speed 3324.63 samples/sec   Loss 1.5784   LearningRate 0.0203   Epoch: 10   Global Step: 183530   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 18:38:58,895-Speed 3337.02 samples/sec   Loss 1.5489   LearningRate 0.0203   Epoch: 10   Global Step: 183540   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 18:39:01,978-Speed 3322.28 samples/sec   Loss 1.5556   LearningRate 0.0203   Epoch: 10   Global Step: 183550   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 18:39:05,110-Speed 3271.07 samples/sec   Loss 1.5296   LearningRate 0.0203   Epoch: 10   Global Step: 183560   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 18:39:08,207-Speed 3306.86 samples/sec   Loss 1.5362   LearningRate 0.0203   Epoch: 10   Global Step: 183570   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 18:39:11,312-Speed 3299.00 samples/sec   Loss 1.5556   LearningRate 0.0203   Epoch: 10   Global Step: 183580   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 18:39:14,926-Speed 2833.92 samples/sec   Loss 1.5313   LearningRate 0.0203   Epoch: 10   Global Step: 183590   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-11 18:39:18,012-Speed 3318.82 samples/sec   Loss 1.5337   LearningRate 0.0203   Epoch: 10   Global Step: 183600   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-11 18:39:48,725-Speed 333.42 samples/sec   Loss 1.1431   LearningRate 0.0202   Epoch: 11   Global Step: 183610   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-11 18:39:51,913-Speed 3213.00 samples/sec   Loss 1.0930   LearningRate 0.0202   Epoch: 11   Global Step: 183620   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-11 18:39:54,998-Speed 3320.18 samples/sec   Loss 1.1288   LearningRate 0.0202   Epoch: 11   Global Step: 183630   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-11 18:39:58,168-Speed 3232.14 samples/sec   Loss 1.0608   LearningRate 0.0202   Epoch: 11   Global Step: 183640   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-11 18:40:01,324-Speed 3244.87 samples/sec   Loss 1.0684   LearningRate 0.0202   Epoch: 11   Global Step: 183650   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-11 18:40:04,412-Speed 3317.10 samples/sec   Loss 1.1016   LearningRate 0.0202   Epoch: 11   Global Step: 183660   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-11 18:40:07,510-Speed 3306.33 samples/sec   Loss 1.0932   LearningRate 0.0202   Epoch: 11   Global Step: 183670   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-11 18:40:10,719-Speed 3192.26 samples/sec   Loss 1.0267   LearningRate 0.0202   Epoch: 11   Global Step: 183680   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-11 18:40:13,817-Speed 3305.68 samples/sec   Loss 1.0647   LearningRate 0.0202   Epoch: 11   Global Step: 183690   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 18:40:16,895-Speed 3327.89 samples/sec   Loss 1.1425   LearningRate 0.0202   Epoch: 11   Global Step: 183700   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 18:40:20,029-Speed 3267.99 samples/sec   Loss 1.1140   LearningRate 0.0202   Epoch: 11   Global Step: 183710   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 18:40:23,176-Speed 3254.79 samples/sec   Loss 1.0583   LearningRate 0.0202   Epoch: 11   Global Step: 183720   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 18:40:26,268-Speed 3312.46 samples/sec   Loss 1.0236   LearningRate 0.0202   Epoch: 11   Global Step: 183730   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 18:40:29,346-Speed 3328.20 samples/sec   Loss 1.1132   LearningRate 0.0202   Epoch: 11   Global Step: 183740   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 18:40:32,477-Speed 3271.33 samples/sec   Loss 1.1151   LearningRate 0.0202   Epoch: 11   Global Step: 183750   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 18:40:35,631-Speed 3247.28 samples/sec   Loss 1.1223   LearningRate 0.0202   Epoch: 11   Global Step: 183760   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-11 18:40:38,796-Speed 3235.90 samples/sec   Loss 1.0859   LearningRate 0.0202   Epoch: 11   Global Step: 183770   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-11 18:40:42,058-Speed 3140.65 samples/sec   Loss 1.1137   LearningRate 0.0202   Epoch: 11   Global Step: 183780   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-11 18:40:45,674-Speed 2832.08 samples/sec   Loss 1.0371   LearningRate 0.0202   Epoch: 11   Global Step: 183790   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-11 18:40:48,759-Speed 3320.82 samples/sec   Loss 1.0825   LearningRate 0.0202   Epoch: 11   Global Step: 183800   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-11 18:40:51,889-Speed 3271.83 samples/sec   Loss 1.0927   LearningRate 0.0202   Epoch: 11   Global Step: 183810   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-11 18:40:55,256-Speed 3041.52 samples/sec   Loss 1.0971   LearningRate 0.0202   Epoch: 11   Global Step: 183820   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-11 18:40:58,359-Speed 3301.05 samples/sec   Loss 1.0396   LearningRate 0.0202   Epoch: 11   Global Step: 183830   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-11 18:41:01,468-Speed 3295.56 samples/sec   Loss 1.0727   LearningRate 0.0202   Epoch: 11   Global Step: 183840   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-11 18:41:04,597-Speed 3272.86 samples/sec   Loss 1.0612   LearningRate 0.0202   Epoch: 11   Global Step: 183850   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-11 18:41:07,771-Speed 3227.57 samples/sec   Loss 1.1399   LearningRate 0.0202   Epoch: 11   Global Step: 183860   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 18:41:10,932-Speed 3240.21 samples/sec   Loss 1.0806   LearningRate 0.0202   Epoch: 11   Global Step: 183870   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 18:41:14,059-Speed 3275.06 samples/sec   Loss 1.0706   LearningRate 0.0202   Epoch: 11   Global Step: 183880   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 18:41:17,163-Speed 3299.45 samples/sec   Loss 1.0931   LearningRate 0.0202   Epoch: 11   Global Step: 183890   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 18:41:20,258-Speed 3309.92 samples/sec   Loss 1.1044   LearningRate 0.0202   Epoch: 11   Global Step: 183900   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 18:41:23,357-Speed 3305.24 samples/sec   Loss 1.0851   LearningRate 0.0202   Epoch: 11   Global Step: 183910   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 18:41:26,464-Speed 3296.46 samples/sec   Loss 1.1105   LearningRate 0.0202   Epoch: 11   Global Step: 183920   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 18:41:29,619-Speed 3246.93 samples/sec   Loss 1.0530   LearningRate 0.0202   Epoch: 11   Global Step: 183930   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 18:41:32,716-Speed 3307.09 samples/sec   Loss 1.0864   LearningRate 0.0202   Epoch: 11   Global Step: 183940   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 18:41:35,826-Speed 3292.89 samples/sec   Loss 1.1158   LearningRate 0.0202   Epoch: 11   Global Step: 183950   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 18:41:38,911-Speed 3320.78 samples/sec   Loss 1.0818   LearningRate 0.0202   Epoch: 11   Global Step: 183960   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 18:41:42,027-Speed 3286.63 samples/sec   Loss 1.0599   LearningRate 0.0202   Epoch: 11   Global Step: 183970   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 18:41:45,148-Speed 3282.35 samples/sec   Loss 1.0955   LearningRate 0.0201   Epoch: 11   Global Step: 183980   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 18:41:48,279-Speed 3271.30 samples/sec   Loss 1.0914   LearningRate 0.0201   Epoch: 11   Global Step: 183990   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 18:41:51,469-Speed 3210.15 samples/sec   Loss 1.1005   LearningRate 0.0201   Epoch: 11   Global Step: 184000   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 18:42:35,981-[lfw][184000]XNorm: 23.647841
Training: 2022-04-11 18:42:35,982-[lfw][184000]Accuracy-Flip: 0.99800+-0.00267
Training: 2022-04-11 18:42:35,982-[lfw][184000]Accuracy-Highest: 0.99817
Training: 2022-04-11 18:43:27,336-[cfp_fp][184000]XNorm: 23.160488
Training: 2022-04-11 18:43:27,336-[cfp_fp][184000]Accuracy-Flip: 0.98857+-0.00465
Training: 2022-04-11 18:43:27,337-[cfp_fp][184000]Accuracy-Highest: 0.98971
Training: 2022-04-11 18:44:11,450-[agedb_30][184000]XNorm: 24.379485
Training: 2022-04-11 18:44:11,451-[agedb_30][184000]Accuracy-Flip: 0.98417+-0.00620
Training: 2022-04-11 18:44:11,451-[agedb_30][184000]Accuracy-Highest: 0.98500
Training: 2022-04-11 18:44:14,538-Speed 71.57 samples/sec   Loss 1.0960   LearningRate 0.0201   Epoch: 11   Global Step: 184010   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 18:44:17,630-Speed 3312.50 samples/sec   Loss 1.0819   LearningRate 0.0201   Epoch: 11   Global Step: 184020   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-11 18:44:20,700-Speed 3337.32 samples/sec   Loss 1.1370   LearningRate 0.0201   Epoch: 11   Global Step: 184030   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-11 18:44:23,796-Speed 3307.34 samples/sec   Loss 1.1114   LearningRate 0.0201   Epoch: 11   Global Step: 184040   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-11 18:44:26,924-Speed 3274.74 samples/sec   Loss 1.1004   LearningRate 0.0201   Epoch: 11   Global Step: 184050   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-11 18:44:30,018-Speed 3310.93 samples/sec   Loss 1.1228   LearningRate 0.0201   Epoch: 11   Global Step: 184060   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-11 18:44:33,084-Speed 3340.40 samples/sec   Loss 1.1490   LearningRate 0.0201   Epoch: 11   Global Step: 184070   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-11 18:44:36,157-Speed 3333.76 samples/sec   Loss 1.0816   LearningRate 0.0201   Epoch: 11   Global Step: 184080   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-11 18:44:39,235-Speed 3327.40 samples/sec   Loss 1.0844   LearningRate 0.0201   Epoch: 11   Global Step: 184090   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-11 18:44:42,375-Speed 3261.46 samples/sec   Loss 1.1347   LearningRate 0.0201   Epoch: 11   Global Step: 184100   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-11 18:44:45,474-Speed 3305.02 samples/sec   Loss 1.1324   LearningRate 0.0201   Epoch: 11   Global Step: 184110   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-11 18:44:48,552-Speed 3327.79 samples/sec   Loss 1.1227   LearningRate 0.0201   Epoch: 11   Global Step: 184120   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-11 18:44:51,638-Speed 3318.28 samples/sec   Loss 1.0608   LearningRate 0.0201   Epoch: 11   Global Step: 184130   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-11 18:44:54,740-Speed 3302.19 samples/sec   Loss 1.0985   LearningRate 0.0201   Epoch: 11   Global Step: 184140   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-11 18:44:57,827-Speed 3318.56 samples/sec   Loss 1.1139   LearningRate 0.0201   Epoch: 11   Global Step: 184150   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-11 18:45:00,913-Speed 3318.41 samples/sec   Loss 1.0992   LearningRate 0.0201   Epoch: 11   Global Step: 184160   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-11 18:45:04,004-Speed 3314.38 samples/sec   Loss 1.1092   LearningRate 0.0201   Epoch: 11   Global Step: 184170   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-11 18:45:07,118-Speed 3288.59 samples/sec   Loss 1.1049   LearningRate 0.0201   Epoch: 11   Global Step: 184180   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-11 18:45:10,204-Speed 3318.75 samples/sec   Loss 1.0626   LearningRate 0.0201   Epoch: 11   Global Step: 184190   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-11 18:45:13,329-Speed 3277.95 samples/sec   Loss 1.1060   LearningRate 0.0201   Epoch: 11   Global Step: 184200   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-11 18:45:16,422-Speed 3311.25 samples/sec   Loss 1.1358   LearningRate 0.0201   Epoch: 11   Global Step: 184210   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-11 18:45:19,493-Speed 3335.87 samples/sec   Loss 1.1305   LearningRate 0.0201   Epoch: 11   Global Step: 184220   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-11 18:45:22,564-Speed 3335.61 samples/sec   Loss 1.1062   LearningRate 0.0201   Epoch: 11   Global Step: 184230   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 18:45:25,632-Speed 3338.98 samples/sec   Loss 1.1024   LearningRate 0.0201   Epoch: 11   Global Step: 184240   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 18:45:28,698-Speed 3339.78 samples/sec   Loss 1.0878   LearningRate 0.0201   Epoch: 11   Global Step: 184250   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 18:45:31,767-Speed 3337.50 samples/sec   Loss 1.1006   LearningRate 0.0201   Epoch: 11   Global Step: 184260   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 18:45:34,854-Speed 3318.22 samples/sec   Loss 1.0778   LearningRate 0.0201   Epoch: 11   Global Step: 184270   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 18:45:38,010-Speed 3245.19 samples/sec   Loss 1.0584   LearningRate 0.0201   Epoch: 11   Global Step: 184280   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 18:45:41,112-Speed 3301.82 samples/sec   Loss 1.1098   LearningRate 0.0201   Epoch: 11   Global Step: 184290   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 18:45:44,208-Speed 3308.45 samples/sec   Loss 1.0946   LearningRate 0.0201   Epoch: 11   Global Step: 184300   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 18:45:47,332-Speed 3278.54 samples/sec   Loss 1.1029   LearningRate 0.0201   Epoch: 11   Global Step: 184310   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 18:45:50,424-Speed 3312.32 samples/sec   Loss 1.0771   LearningRate 0.0201   Epoch: 11   Global Step: 184320   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 18:45:53,511-Speed 3318.09 samples/sec   Loss 1.1027   LearningRate 0.0201   Epoch: 11   Global Step: 184330   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 18:45:56,626-Speed 3287.92 samples/sec   Loss 1.0926   LearningRate 0.0201   Epoch: 11   Global Step: 184340   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 18:45:59,788-Speed 3239.22 samples/sec   Loss 1.1393   LearningRate 0.0200   Epoch: 11   Global Step: 184350   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 18:46:02,921-Speed 3269.26 samples/sec   Loss 1.0862   LearningRate 0.0200   Epoch: 11   Global Step: 184360   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 18:46:06,026-Speed 3299.15 samples/sec   Loss 1.1073   LearningRate 0.0200   Epoch: 11   Global Step: 184370   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 18:46:09,152-Speed 3275.68 samples/sec   Loss 1.1525   LearningRate 0.0200   Epoch: 11   Global Step: 184380   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 18:46:12,260-Speed 3296.18 samples/sec   Loss 1.0802   LearningRate 0.0200   Epoch: 11   Global Step: 184390   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 18:46:15,361-Speed 3302.38 samples/sec   Loss 1.0875   LearningRate 0.0200   Epoch: 11   Global Step: 184400   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 18:46:18,553-Speed 3209.00 samples/sec   Loss 1.0904   LearningRate 0.0200   Epoch: 11   Global Step: 184410   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 18:46:21,620-Speed 3340.49 samples/sec   Loss 1.1109   LearningRate 0.0200   Epoch: 11   Global Step: 184420   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 18:46:24,785-Speed 3235.68 samples/sec   Loss 1.1560   LearningRate 0.0200   Epoch: 11   Global Step: 184430   Fp16 Grad Scale: 262144   Required: 16 hours
Training: 2022-04-11 18:46:27,869-Speed 3320.65 samples/sec   Loss 1.1079   LearningRate 0.0200   Epoch: 11   Global Step: 184440   Fp16 Grad Scale: 262144   Required: 16 hours
Training: 2022-04-11 18:46:31,023-Speed 3247.98 samples/sec   Loss 1.1318   LearningRate 0.0200   Epoch: 11   Global Step: 184450   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 18:46:34,115-Speed 3312.28 samples/sec   Loss 1.1260   LearningRate 0.0200   Epoch: 11   Global Step: 184460   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 18:46:37,264-Speed 3253.09 samples/sec   Loss 1.1389   LearningRate 0.0200   Epoch: 11   Global Step: 184470   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-11 18:46:40,350-Speed 3318.25 samples/sec   Loss 1.0978   LearningRate 0.0200   Epoch: 11   Global Step: 184480   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-11 18:46:43,435-Speed 3320.18 samples/sec   Loss 1.1398   LearningRate 0.0200   Epoch: 11   Global Step: 184490   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-11 18:46:46,548-Speed 3290.82 samples/sec   Loss 1.1102   LearningRate 0.0200   Epoch: 11   Global Step: 184500   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-11 18:46:49,668-Speed 3282.29 samples/sec   Loss 1.0982   LearningRate 0.0200   Epoch: 11   Global Step: 184510   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-11 18:46:52,777-Speed 3294.69 samples/sec   Loss 1.1008   LearningRate 0.0200   Epoch: 11   Global Step: 184520   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-11 18:46:55,860-Speed 3321.90 samples/sec   Loss 1.0965   LearningRate 0.0200   Epoch: 11   Global Step: 184530   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-11 18:46:58,968-Speed 3296.01 samples/sec   Loss 1.0723   LearningRate 0.0200   Epoch: 11   Global Step: 184540   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-11 18:47:02,052-Speed 3320.76 samples/sec   Loss 1.0786   LearningRate 0.0200   Epoch: 11   Global Step: 184550   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-11 18:47:05,190-Speed 3263.27 samples/sec   Loss 1.0900   LearningRate 0.0200   Epoch: 11   Global Step: 184560   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-11 18:47:08,371-Speed 3220.40 samples/sec   Loss 1.1725   LearningRate 0.0200   Epoch: 11   Global Step: 184570   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 18:47:11,495-Speed 3279.26 samples/sec   Loss 1.1334   LearningRate 0.0200   Epoch: 11   Global Step: 184580   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 18:47:14,685-Speed 3210.94 samples/sec   Loss 1.1664   LearningRate 0.0200   Epoch: 11   Global Step: 184590   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 18:47:17,820-Speed 3266.40 samples/sec   Loss 1.1474   LearningRate 0.0200   Epoch: 11   Global Step: 184600   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 18:47:20,923-Speed 3301.30 samples/sec   Loss 1.1064   LearningRate 0.0200   Epoch: 11   Global Step: 184610   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 18:47:24,022-Speed 3305.03 samples/sec   Loss 1.1344   LearningRate 0.0200   Epoch: 11   Global Step: 184620   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 18:47:27,189-Speed 3234.15 samples/sec   Loss 1.1062   LearningRate 0.0200   Epoch: 11   Global Step: 184630   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 18:47:30,307-Speed 3284.99 samples/sec   Loss 1.0933   LearningRate 0.0200   Epoch: 11   Global Step: 184640   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 18:47:33,386-Speed 3325.97 samples/sec   Loss 1.0985   LearningRate 0.0200   Epoch: 11   Global Step: 184650   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 18:47:36,463-Speed 3328.29 samples/sec   Loss 1.1083   LearningRate 0.0200   Epoch: 11   Global Step: 184660   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 18:47:39,535-Speed 3335.40 samples/sec   Loss 1.1362   LearningRate 0.0200   Epoch: 11   Global Step: 184670   Fp16 Grad Scale: 262144   Required: 16 hours
Training: 2022-04-11 18:47:42,759-Speed 3176.41 samples/sec   Loss 1.0732   LearningRate 0.0200   Epoch: 11   Global Step: 184680   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 18:47:45,998-Speed 3162.59 samples/sec   Loss 1.1136   LearningRate 0.0200   Epoch: 11   Global Step: 184690   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 18:47:49,101-Speed 3300.03 samples/sec   Loss 1.1073   LearningRate 0.0200   Epoch: 11   Global Step: 184700   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 18:47:52,173-Speed 3334.66 samples/sec   Loss 1.1257   LearningRate 0.0200   Epoch: 11   Global Step: 184710   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 18:47:55,251-Speed 3327.21 samples/sec   Loss 1.1651   LearningRate 0.0199   Epoch: 11   Global Step: 184720   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 18:47:58,321-Speed 3336.59 samples/sec   Loss 1.1398   LearningRate 0.0199   Epoch: 11   Global Step: 184730   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 18:48:01,399-Speed 3327.32 samples/sec   Loss 1.1470   LearningRate 0.0199   Epoch: 11   Global Step: 184740   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 18:48:04,458-Speed 3348.40 samples/sec   Loss 1.1276   LearningRate 0.0199   Epoch: 11   Global Step: 184750   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-11 18:48:07,534-Speed 3330.64 samples/sec   Loss 1.1266   LearningRate 0.0199   Epoch: 11   Global Step: 184760   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-11 18:48:10,605-Speed 3334.22 samples/sec   Loss 1.1259   LearningRate 0.0199   Epoch: 11   Global Step: 184770   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-11 18:48:13,695-Speed 3314.52 samples/sec   Loss 1.1134   LearningRate 0.0199   Epoch: 11   Global Step: 184780   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-11 18:48:16,782-Speed 3318.66 samples/sec   Loss 1.0733   LearningRate 0.0199   Epoch: 11   Global Step: 184790   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-11 18:48:19,866-Speed 3320.47 samples/sec   Loss 1.1261   LearningRate 0.0199   Epoch: 11   Global Step: 184800   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-11 18:48:22,961-Speed 3309.52 samples/sec   Loss 1.1037   LearningRate 0.0199   Epoch: 11   Global Step: 184810   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-11 18:48:26,079-Speed 3285.36 samples/sec   Loss 1.1284   LearningRate 0.0199   Epoch: 11   Global Step: 184820   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-11 18:48:29,190-Speed 3291.78 samples/sec   Loss 1.0736   LearningRate 0.0199   Epoch: 11   Global Step: 184830   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-11 18:48:32,278-Speed 3317.34 samples/sec   Loss 1.1340   LearningRate 0.0199   Epoch: 11   Global Step: 184840   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-11 18:48:35,345-Speed 3339.71 samples/sec   Loss 1.1359   LearningRate 0.0199   Epoch: 11   Global Step: 184850   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 18:48:38,411-Speed 3340.54 samples/sec   Loss 1.1379   LearningRate 0.0199   Epoch: 11   Global Step: 184860   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-11 18:48:41,497-Speed 3319.30 samples/sec   Loss 1.1599   LearningRate 0.0199   Epoch: 11   Global Step: 184870   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-11 18:48:44,573-Speed 3329.16 samples/sec   Loss 1.1469   LearningRate 0.0199   Epoch: 11   Global Step: 184880   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-11 18:48:47,761-Speed 3213.19 samples/sec   Loss 1.1454   LearningRate 0.0199   Epoch: 11   Global Step: 184890   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-11 18:48:51,006-Speed 3155.77 samples/sec   Loss 1.1063   LearningRate 0.0199   Epoch: 11   Global Step: 184900   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-11 18:48:54,203-Speed 3204.46 samples/sec   Loss 1.1518   LearningRate 0.0199   Epoch: 11   Global Step: 184910   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-11 18:48:57,369-Speed 3234.80 samples/sec   Loss 1.1749   LearningRate 0.0199   Epoch: 11   Global Step: 184920   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-11 18:49:00,447-Speed 3328.12 samples/sec   Loss 1.1442   LearningRate 0.0199   Epoch: 11   Global Step: 184930   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-11 18:49:03,609-Speed 3239.52 samples/sec   Loss 1.1335   LearningRate 0.0199   Epoch: 11   Global Step: 184940   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-11 18:49:06,678-Speed 3336.46 samples/sec   Loss 1.1293   LearningRate 0.0199   Epoch: 11   Global Step: 184950   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-11 18:49:09,769-Speed 3313.82 samples/sec   Loss 1.1218   LearningRate 0.0199   Epoch: 11   Global Step: 184960   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 18:49:12,895-Speed 3277.14 samples/sec   Loss 1.1099   LearningRate 0.0199   Epoch: 11   Global Step: 184970   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 18:49:15,973-Speed 3327.29 samples/sec   Loss 1.1439   LearningRate 0.0199   Epoch: 11   Global Step: 184980   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 18:49:19,072-Speed 3304.91 samples/sec   Loss 1.1515   LearningRate 0.0199   Epoch: 11   Global Step: 184990   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 18:49:22,242-Speed 3231.22 samples/sec   Loss 1.1749   LearningRate 0.0199   Epoch: 11   Global Step: 185000   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 18:49:25,329-Speed 3317.60 samples/sec   Loss 1.1664   LearningRate 0.0199   Epoch: 11   Global Step: 185010   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 18:49:28,401-Speed 3333.89 samples/sec   Loss 1.1776   LearningRate 0.0199   Epoch: 11   Global Step: 185020   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 18:49:31,521-Speed 3283.68 samples/sec   Loss 1.1313   LearningRate 0.0199   Epoch: 11   Global Step: 185030   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 18:49:34,608-Speed 3317.04 samples/sec   Loss 1.1436   LearningRate 0.0199   Epoch: 11   Global Step: 185040   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 18:49:37,693-Speed 3320.61 samples/sec   Loss 1.1240   LearningRate 0.0199   Epoch: 11   Global Step: 185050   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 18:49:40,786-Speed 3310.95 samples/sec   Loss 1.1240   LearningRate 0.0199   Epoch: 11   Global Step: 185060   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 18:49:43,900-Speed 3289.43 samples/sec   Loss 1.1367   LearningRate 0.0199   Epoch: 11   Global Step: 185070   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 18:49:47,021-Speed 3281.74 samples/sec   Loss 1.1430   LearningRate 0.0199   Epoch: 11   Global Step: 185080   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 18:49:50,160-Speed 3263.09 samples/sec   Loss 1.1517   LearningRate 0.0199   Epoch: 11   Global Step: 185090   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-11 18:49:53,307-Speed 3254.93 samples/sec   Loss 1.1244   LearningRate 0.0198   Epoch: 11   Global Step: 185100   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-11 18:49:56,474-Speed 3234.27 samples/sec   Loss 1.1381   LearningRate 0.0198   Epoch: 11   Global Step: 185110   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-11 18:49:59,586-Speed 3291.01 samples/sec   Loss 1.1377   LearningRate 0.0198   Epoch: 11   Global Step: 185120   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-11 18:50:02,711-Speed 3277.20 samples/sec   Loss 1.1415   LearningRate 0.0198   Epoch: 11   Global Step: 185130   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-11 18:50:05,815-Speed 3300.04 samples/sec   Loss 1.1716   LearningRate 0.0198   Epoch: 11   Global Step: 185140   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-11 18:50:08,911-Speed 3308.12 samples/sec   Loss 1.1397   LearningRate 0.0198   Epoch: 11   Global Step: 185150   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-11 18:50:11,994-Speed 3321.71 samples/sec   Loss 1.1146   LearningRate 0.0198   Epoch: 11   Global Step: 185160   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-11 18:50:15,084-Speed 3315.24 samples/sec   Loss 1.1687   LearningRate 0.0198   Epoch: 11   Global Step: 185170   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-11 18:50:18,165-Speed 3324.39 samples/sec   Loss 1.1834   LearningRate 0.0198   Epoch: 11   Global Step: 185180   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-11 18:50:21,237-Speed 3333.44 samples/sec   Loss 1.1623   LearningRate 0.0198   Epoch: 11   Global Step: 185190   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 18:50:24,315-Speed 3328.07 samples/sec   Loss 1.1075   LearningRate 0.0198   Epoch: 11   Global Step: 185200   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 18:50:27,465-Speed 3251.10 samples/sec   Loss 1.1300   LearningRate 0.0198   Epoch: 11   Global Step: 185210   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 18:50:30,559-Speed 3310.64 samples/sec   Loss 1.1691   LearningRate 0.0198   Epoch: 11   Global Step: 185220   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 18:50:33,669-Speed 3293.05 samples/sec   Loss 1.1257   LearningRate 0.0198   Epoch: 11   Global Step: 185230   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 18:50:36,764-Speed 3309.75 samples/sec   Loss 1.1456   LearningRate 0.0198   Epoch: 11   Global Step: 185240   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 18:50:39,873-Speed 3294.74 samples/sec   Loss 1.1629   LearningRate 0.0198   Epoch: 11   Global Step: 185250   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 18:50:42,973-Speed 3304.46 samples/sec   Loss 1.1328   LearningRate 0.0198   Epoch: 11   Global Step: 185260   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 18:50:46,055-Speed 3323.49 samples/sec   Loss 1.1216   LearningRate 0.0198   Epoch: 11   Global Step: 185270   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 18:50:49,127-Speed 3334.38 samples/sec   Loss 1.1126   LearningRate 0.0198   Epoch: 11   Global Step: 185280   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-11 18:50:52,225-Speed 3305.88 samples/sec   Loss 1.1297   LearningRate 0.0198   Epoch: 11   Global Step: 185290   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-11 18:50:55,325-Speed 3304.54 samples/sec   Loss 1.1665   LearningRate 0.0198   Epoch: 11   Global Step: 185300   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-11 18:50:58,405-Speed 3324.36 samples/sec   Loss 1.1300   LearningRate 0.0198   Epoch: 11   Global Step: 185310   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-11 18:51:01,479-Speed 3332.17 samples/sec   Loss 1.1901   LearningRate 0.0198   Epoch: 11   Global Step: 185320   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-11 18:51:04,557-Speed 3328.37 samples/sec   Loss 1.1372   LearningRate 0.0198   Epoch: 11   Global Step: 185330   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-11 18:51:07,645-Speed 3316.44 samples/sec   Loss 1.1810   LearningRate 0.0198   Epoch: 11   Global Step: 185340   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-11 18:51:10,714-Speed 3336.65 samples/sec   Loss 1.1382   LearningRate 0.0198   Epoch: 11   Global Step: 185350   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-11 18:51:13,809-Speed 3310.14 samples/sec   Loss 1.1244   LearningRate 0.0198   Epoch: 11   Global Step: 185360   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-11 18:51:16,880-Speed 3335.03 samples/sec   Loss 1.0961   LearningRate 0.0198   Epoch: 11   Global Step: 185370   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-11 18:51:19,954-Speed 3332.07 samples/sec   Loss 1.1597   LearningRate 0.0198   Epoch: 11   Global Step: 185380   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 18:51:23,041-Speed 3317.51 samples/sec   Loss 1.1635   LearningRate 0.0198   Epoch: 11   Global Step: 185390   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 18:51:26,129-Speed 3317.90 samples/sec   Loss 1.1129   LearningRate 0.0198   Epoch: 11   Global Step: 185400   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 18:51:29,205-Speed 3329.12 samples/sec   Loss 1.1254   LearningRate 0.0198   Epoch: 11   Global Step: 185410   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 18:51:32,281-Speed 3330.12 samples/sec   Loss 1.1425   LearningRate 0.0198   Epoch: 11   Global Step: 185420   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 18:51:35,393-Speed 3291.11 samples/sec   Loss 1.1365   LearningRate 0.0198   Epoch: 11   Global Step: 185430   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 18:51:38,477-Speed 3321.03 samples/sec   Loss 1.1530   LearningRate 0.0198   Epoch: 11   Global Step: 185440   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 18:51:41,549-Speed 3334.45 samples/sec   Loss 1.1401   LearningRate 0.0198   Epoch: 11   Global Step: 185450   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 18:51:44,633-Speed 3320.70 samples/sec   Loss 1.1627   LearningRate 0.0198   Epoch: 11   Global Step: 185460   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 18:51:47,726-Speed 3311.94 samples/sec   Loss 1.1691   LearningRate 0.0197   Epoch: 11   Global Step: 185470   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 18:51:50,790-Speed 3342.64 samples/sec   Loss 1.1229   LearningRate 0.0197   Epoch: 11   Global Step: 185480   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 18:51:53,880-Speed 3314.80 samples/sec   Loss 1.1538   LearningRate 0.0197   Epoch: 11   Global Step: 185490   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 18:51:56,971-Speed 3313.26 samples/sec   Loss 1.1537   LearningRate 0.0197   Epoch: 11   Global Step: 185500   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 18:52:00,051-Speed 3325.95 samples/sec   Loss 1.1450   LearningRate 0.0197   Epoch: 11   Global Step: 185510   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 18:52:03,132-Speed 3324.44 samples/sec   Loss 1.2163   LearningRate 0.0197   Epoch: 11   Global Step: 185520   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 18:52:06,212-Speed 3325.43 samples/sec   Loss 1.1849   LearningRate 0.0197   Epoch: 11   Global Step: 185530   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 18:52:09,284-Speed 3334.21 samples/sec   Loss 1.1258   LearningRate 0.0197   Epoch: 11   Global Step: 185540   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 18:52:12,362-Speed 3327.19 samples/sec   Loss 1.1702   LearningRate 0.0197   Epoch: 11   Global Step: 185550   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 18:52:15,450-Speed 3317.07 samples/sec   Loss 1.1986   LearningRate 0.0197   Epoch: 11   Global Step: 185560   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-11 18:52:18,525-Speed 3330.94 samples/sec   Loss 1.1815   LearningRate 0.0197   Epoch: 11   Global Step: 185570   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-11 18:52:21,608-Speed 3321.66 samples/sec   Loss 1.1850   LearningRate 0.0197   Epoch: 11   Global Step: 185580   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-11 18:52:24,680-Speed 3334.98 samples/sec   Loss 1.1656   LearningRate 0.0197   Epoch: 11   Global Step: 185590   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-11 18:52:27,751-Speed 3335.14 samples/sec   Loss 1.2326   LearningRate 0.0197   Epoch: 11   Global Step: 185600   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-11 18:52:30,828-Speed 3328.74 samples/sec   Loss 1.2014   LearningRate 0.0197   Epoch: 11   Global Step: 185610   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-11 18:52:33,907-Speed 3326.27 samples/sec   Loss 1.1528   LearningRate 0.0197   Epoch: 11   Global Step: 185620   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-11 18:52:36,992-Speed 3320.65 samples/sec   Loss 1.1689   LearningRate 0.0197   Epoch: 11   Global Step: 185630   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-11 18:52:40,072-Speed 3325.37 samples/sec   Loss 1.1547   LearningRate 0.0197   Epoch: 11   Global Step: 185640   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-11 18:52:43,201-Speed 3273.24 samples/sec   Loss 1.2008   LearningRate 0.0197   Epoch: 11   Global Step: 185650   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-11 18:52:46,275-Speed 3331.27 samples/sec   Loss 1.1434   LearningRate 0.0197   Epoch: 11   Global Step: 185660   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 18:52:49,360-Speed 3320.91 samples/sec   Loss 1.1888   LearningRate 0.0197   Epoch: 11   Global Step: 185670   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 18:52:52,436-Speed 3328.80 samples/sec   Loss 1.1534   LearningRate 0.0197   Epoch: 11   Global Step: 185680   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 18:52:55,511-Speed 3331.35 samples/sec   Loss 1.1830   LearningRate 0.0197   Epoch: 11   Global Step: 185690   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 18:52:58,584-Speed 3332.98 samples/sec   Loss 1.1782   LearningRate 0.0197   Epoch: 11   Global Step: 185700   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 18:53:01,660-Speed 3329.93 samples/sec   Loss 1.1526   LearningRate 0.0197   Epoch: 11   Global Step: 185710   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 18:53:04,873-Speed 3188.25 samples/sec   Loss 1.1011   LearningRate 0.0197   Epoch: 11   Global Step: 185720   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 18:53:08,014-Speed 3261.00 samples/sec   Loss 1.2041   LearningRate 0.0197   Epoch: 11   Global Step: 185730   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 18:53:11,240-Speed 3174.33 samples/sec   Loss 1.1602   LearningRate 0.0197   Epoch: 11   Global Step: 185740   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 18:53:14,352-Speed 3291.61 samples/sec   Loss 1.1478   LearningRate 0.0197   Epoch: 11   Global Step: 185750   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 18:53:17,423-Speed 3334.84 samples/sec   Loss 1.1642   LearningRate 0.0197   Epoch: 11   Global Step: 185760   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-11 18:53:20,519-Speed 3308.05 samples/sec   Loss 1.1521   LearningRate 0.0197   Epoch: 11   Global Step: 185770   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-11 18:53:23,647-Speed 3275.19 samples/sec   Loss 1.1565   LearningRate 0.0197   Epoch: 11   Global Step: 185780   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-11 18:53:26,727-Speed 3325.33 samples/sec   Loss 1.1285   LearningRate 0.0197   Epoch: 11   Global Step: 185790   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-11 18:53:29,801-Speed 3331.37 samples/sec   Loss 1.1684   LearningRate 0.0197   Epoch: 11   Global Step: 185800   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-11 18:53:32,878-Speed 3329.73 samples/sec   Loss 1.1372   LearningRate 0.0197   Epoch: 11   Global Step: 185810   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-11 18:53:36,009-Speed 3270.57 samples/sec   Loss 1.1777   LearningRate 0.0197   Epoch: 11   Global Step: 185820   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-11 18:53:39,116-Speed 3297.07 samples/sec   Loss 1.1748   LearningRate 0.0197   Epoch: 11   Global Step: 185830   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-11 18:53:42,209-Speed 3311.09 samples/sec   Loss 1.1695   LearningRate 0.0197   Epoch: 11   Global Step: 185840   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-11 18:53:45,297-Speed 3316.96 samples/sec   Loss 1.1664   LearningRate 0.0196   Epoch: 11   Global Step: 185850   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-11 18:53:48,503-Speed 3194.61 samples/sec   Loss 1.1437   LearningRate 0.0196   Epoch: 11   Global Step: 185860   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 18:53:51,638-Speed 3267.47 samples/sec   Loss 1.1780   LearningRate 0.0196   Epoch: 11   Global Step: 185870   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 18:53:54,779-Speed 3260.17 samples/sec   Loss 1.1840   LearningRate 0.0196   Epoch: 11   Global Step: 185880   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 18:53:57,869-Speed 3315.64 samples/sec   Loss 1.2103   LearningRate 0.0196   Epoch: 11   Global Step: 185890   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 18:54:00,942-Speed 3333.08 samples/sec   Loss 1.1828   LearningRate 0.0196   Epoch: 11   Global Step: 185900   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 18:54:04,020-Speed 3327.54 samples/sec   Loss 1.1812   LearningRate 0.0196   Epoch: 11   Global Step: 185910   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 18:54:07,114-Speed 3310.26 samples/sec   Loss 1.1608   LearningRate 0.0196   Epoch: 11   Global Step: 185920   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 18:54:10,195-Speed 3324.27 samples/sec   Loss 1.1696   LearningRate 0.0196   Epoch: 11   Global Step: 185930   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 18:54:13,270-Speed 3330.95 samples/sec   Loss 1.2239   LearningRate 0.0196   Epoch: 11   Global Step: 185940   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 18:54:16,347-Speed 3328.38 samples/sec   Loss 1.2002   LearningRate 0.0196   Epoch: 11   Global Step: 185950   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 18:54:19,441-Speed 3310.53 samples/sec   Loss 1.1822   LearningRate 0.0196   Epoch: 11   Global Step: 185960   Fp16 Grad Scale: 262144   Required: 16 hours
Training: 2022-04-11 18:54:22,513-Speed 3334.03 samples/sec   Loss 1.2141   LearningRate 0.0196   Epoch: 11   Global Step: 185970   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 18:54:25,686-Speed 3228.39 samples/sec   Loss 1.1975   LearningRate 0.0196   Epoch: 11   Global Step: 185980   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 18:54:28,766-Speed 3325.46 samples/sec   Loss 1.1918   LearningRate 0.0196   Epoch: 11   Global Step: 185990   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 18:54:31,844-Speed 3327.30 samples/sec   Loss 1.1239   LearningRate 0.0196   Epoch: 11   Global Step: 186000   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 18:55:15,839-[lfw][186000]XNorm: 21.301421
Training: 2022-04-11 18:55:15,839-[lfw][186000]Accuracy-Flip: 0.99783+-0.00259
Training: 2022-04-11 18:55:15,840-[lfw][186000]Accuracy-Highest: 0.99817
Training: 2022-04-11 18:56:06,775-[cfp_fp][186000]XNorm: 21.410969
Training: 2022-04-11 18:56:06,776-[cfp_fp][186000]Accuracy-Flip: 0.98943+-0.00448
Training: 2022-04-11 18:56:06,777-[cfp_fp][186000]Accuracy-Highest: 0.98971
Training: 2022-04-11 18:56:50,737-[agedb_30][186000]XNorm: 22.159895
Training: 2022-04-11 18:56:50,738-[agedb_30][186000]Accuracy-Flip: 0.98167+-0.00632
Training: 2022-04-11 18:56:50,739-[agedb_30][186000]Accuracy-Highest: 0.98500
Training: 2022-04-11 18:56:53,810-Speed 72.13 samples/sec   Loss 1.2082   LearningRate 0.0196   Epoch: 11   Global Step: 186010   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 18:56:56,874-Speed 3342.69 samples/sec   Loss 1.1793   LearningRate 0.0196   Epoch: 11   Global Step: 186020   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 18:56:59,937-Speed 3343.39 samples/sec   Loss 1.1898   LearningRate 0.0196   Epoch: 11   Global Step: 186030   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 18:57:03,000-Speed 3344.21 samples/sec   Loss 1.2053   LearningRate 0.0196   Epoch: 11   Global Step: 186040   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 18:57:06,125-Speed 3276.74 samples/sec   Loss 1.2105   LearningRate 0.0196   Epoch: 11   Global Step: 186050   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 18:57:09,197-Speed 3334.69 samples/sec   Loss 1.1798   LearningRate 0.0196   Epoch: 11   Global Step: 186060   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 18:57:12,357-Speed 3240.72 samples/sec   Loss 1.2140   LearningRate 0.0196   Epoch: 11   Global Step: 186070   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 18:57:15,478-Speed 3282.41 samples/sec   Loss 1.1446   LearningRate 0.0196   Epoch: 11   Global Step: 186080   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 18:57:18,567-Speed 3314.85 samples/sec   Loss 1.1600   LearningRate 0.0196   Epoch: 11   Global Step: 186090   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 18:57:21,638-Speed 3335.87 samples/sec   Loss 1.1522   LearningRate 0.0196   Epoch: 11   Global Step: 186100   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 18:57:24,713-Speed 3330.17 samples/sec   Loss 1.2225   LearningRate 0.0196   Epoch: 11   Global Step: 186110   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 18:57:27,799-Speed 3319.37 samples/sec   Loss 1.1935   LearningRate 0.0196   Epoch: 11   Global Step: 186120   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 18:57:30,879-Speed 3325.60 samples/sec   Loss 1.1495   LearningRate 0.0196   Epoch: 11   Global Step: 186130   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 18:57:33,969-Speed 3315.04 samples/sec   Loss 1.2233   LearningRate 0.0196   Epoch: 11   Global Step: 186140   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 18:57:37,052-Speed 3321.66 samples/sec   Loss 1.2065   LearningRate 0.0196   Epoch: 11   Global Step: 186150   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 18:57:40,128-Speed 3329.34 samples/sec   Loss 1.1968   LearningRate 0.0196   Epoch: 11   Global Step: 186160   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 18:57:43,200-Speed 3334.13 samples/sec   Loss 1.2032   LearningRate 0.0196   Epoch: 11   Global Step: 186170   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 18:57:46,301-Speed 3303.20 samples/sec   Loss 1.1850   LearningRate 0.0196   Epoch: 11   Global Step: 186180   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 18:57:49,385-Speed 3321.04 samples/sec   Loss 1.2028   LearningRate 0.0196   Epoch: 11   Global Step: 186190   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 18:57:52,457-Speed 3333.97 samples/sec   Loss 1.1955   LearningRate 0.0196   Epoch: 11   Global Step: 186200   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 18:57:55,543-Speed 3319.56 samples/sec   Loss 1.1694   LearningRate 0.0196   Epoch: 11   Global Step: 186210   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 18:57:58,629-Speed 3318.35 samples/sec   Loss 1.2089   LearningRate 0.0196   Epoch: 11   Global Step: 186220   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 18:58:01,703-Speed 3332.24 samples/sec   Loss 1.1968   LearningRate 0.0195   Epoch: 11   Global Step: 186230   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 18:58:05,570-Speed 2648.46 samples/sec   Loss 1.1987   LearningRate 0.0195   Epoch: 11   Global Step: 186240   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 18:58:12,137-Speed 1559.57 samples/sec   Loss 1.2658   LearningRate 0.0195   Epoch: 11   Global Step: 186250   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 18:58:16,013-Speed 2642.44 samples/sec   Loss 1.1736   LearningRate 0.0195   Epoch: 11   Global Step: 186260   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 18:58:19,096-Speed 3322.26 samples/sec   Loss 1.2156   LearningRate 0.0195   Epoch: 11   Global Step: 186270   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 18:58:22,170-Speed 3332.08 samples/sec   Loss 1.1784   LearningRate 0.0195   Epoch: 11   Global Step: 186280   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 18:58:25,263-Speed 3310.69 samples/sec   Loss 1.2068   LearningRate 0.0195   Epoch: 11   Global Step: 186290   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 18:58:28,319-Speed 3352.72 samples/sec   Loss 1.1428   LearningRate 0.0195   Epoch: 11   Global Step: 186300   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-11 18:58:31,434-Speed 3287.25 samples/sec   Loss 1.1607   LearningRate 0.0195   Epoch: 11   Global Step: 186310   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-11 18:58:34,591-Speed 3244.85 samples/sec   Loss 1.2263   LearningRate 0.0195   Epoch: 11   Global Step: 186320   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-11 18:58:37,695-Speed 3299.85 samples/sec   Loss 1.1849   LearningRate 0.0195   Epoch: 11   Global Step: 186330   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-11 18:58:40,912-Speed 3183.16 samples/sec   Loss 1.2183   LearningRate 0.0195   Epoch: 11   Global Step: 186340   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-11 18:58:44,001-Speed 3316.41 samples/sec   Loss 1.1841   LearningRate 0.0195   Epoch: 11   Global Step: 186350   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-11 18:58:47,072-Speed 3334.72 samples/sec   Loss 1.1669   LearningRate 0.0195   Epoch: 11   Global Step: 186360   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-11 18:58:50,268-Speed 3204.75 samples/sec   Loss 1.2370   LearningRate 0.0195   Epoch: 11   Global Step: 186370   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-11 18:58:53,344-Speed 3329.42 samples/sec   Loss 1.1968   LearningRate 0.0195   Epoch: 11   Global Step: 186380   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-11 18:58:56,413-Speed 3337.59 samples/sec   Loss 1.1667   LearningRate 0.0195   Epoch: 11   Global Step: 186390   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-11 18:58:59,501-Speed 3317.14 samples/sec   Loss 1.2013   LearningRate 0.0195   Epoch: 11   Global Step: 186400   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 18:59:02,568-Speed 3339.48 samples/sec   Loss 1.1996   LearningRate 0.0195   Epoch: 11   Global Step: 186410   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 18:59:05,636-Speed 3337.81 samples/sec   Loss 1.1881   LearningRate 0.0195   Epoch: 11   Global Step: 186420   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 18:59:08,717-Speed 3324.96 samples/sec   Loss 1.1884   LearningRate 0.0195   Epoch: 11   Global Step: 186430   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 18:59:11,790-Speed 3332.82 samples/sec   Loss 1.1834   LearningRate 0.0195   Epoch: 11   Global Step: 186440   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 18:59:14,858-Speed 3338.57 samples/sec   Loss 1.1178   LearningRate 0.0195   Epoch: 11   Global Step: 186450   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 18:59:17,934-Speed 3329.67 samples/sec   Loss 1.2657   LearningRate 0.0195   Epoch: 11   Global Step: 186460   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 18:59:21,004-Speed 3336.05 samples/sec   Loss 1.1921   LearningRate 0.0195   Epoch: 11   Global Step: 186470   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-11 18:59:24,077-Speed 3332.70 samples/sec   Loss 1.2286   LearningRate 0.0195   Epoch: 11   Global Step: 186480   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-11 18:59:27,147-Speed 3336.21 samples/sec   Loss 1.1931   LearningRate 0.0195   Epoch: 11   Global Step: 186490   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-11 18:59:30,222-Speed 3331.22 samples/sec   Loss 1.2113   LearningRate 0.0195   Epoch: 11   Global Step: 186500   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-11 18:59:33,305-Speed 3322.22 samples/sec   Loss 1.2187   LearningRate 0.0195   Epoch: 11   Global Step: 186510   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-11 18:59:36,382-Speed 3328.72 samples/sec   Loss 1.2177   LearningRate 0.0195   Epoch: 11   Global Step: 186520   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-11 18:59:39,492-Speed 3293.04 samples/sec   Loss 1.1790   LearningRate 0.0195   Epoch: 11   Global Step: 186530   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-11 18:59:42,564-Speed 3333.91 samples/sec   Loss 1.1511   LearningRate 0.0195   Epoch: 11   Global Step: 186540   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-11 18:59:45,670-Speed 3298.05 samples/sec   Loss 1.1751   LearningRate 0.0195   Epoch: 11   Global Step: 186550   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-11 18:59:48,783-Speed 3289.76 samples/sec   Loss 1.1862   LearningRate 0.0195   Epoch: 11   Global Step: 186560   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-11 18:59:51,890-Speed 3296.76 samples/sec   Loss 1.1982   LearningRate 0.0195   Epoch: 11   Global Step: 186570   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 18:59:54,984-Speed 3309.99 samples/sec   Loss 1.1870   LearningRate 0.0195   Epoch: 11   Global Step: 186580   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 18:59:58,059-Speed 3331.71 samples/sec   Loss 1.1824   LearningRate 0.0195   Epoch: 11   Global Step: 186590   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 19:00:01,134-Speed 3330.51 samples/sec   Loss 1.1580   LearningRate 0.0194   Epoch: 11   Global Step: 186600   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 19:00:04,234-Speed 3304.15 samples/sec   Loss 1.1710   LearningRate 0.0194   Epoch: 11   Global Step: 186610   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 19:00:07,344-Speed 3293.20 samples/sec   Loss 1.1677   LearningRate 0.0194   Epoch: 11   Global Step: 186620   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 19:00:10,423-Speed 3326.01 samples/sec   Loss 1.1788   LearningRate 0.0194   Epoch: 11   Global Step: 186630   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 19:00:13,495-Speed 3334.89 samples/sec   Loss 1.2120   LearningRate 0.0194   Epoch: 11   Global Step: 186640   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 19:00:16,563-Speed 3338.39 samples/sec   Loss 1.1731   LearningRate 0.0194   Epoch: 11   Global Step: 186650   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 19:00:19,632-Speed 3337.08 samples/sec   Loss 1.2080   LearningRate 0.0194   Epoch: 11   Global Step: 186660   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 19:00:22,689-Speed 3350.12 samples/sec   Loss 1.2236   LearningRate 0.0194   Epoch: 11   Global Step: 186670   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 19:00:25,760-Speed 3335.27 samples/sec   Loss 1.2336   LearningRate 0.0194   Epoch: 11   Global Step: 186680   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 19:00:28,836-Speed 3329.66 samples/sec   Loss 1.2558   LearningRate 0.0194   Epoch: 11   Global Step: 186690   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 19:00:31,989-Speed 3248.94 samples/sec   Loss 1.1906   LearningRate 0.0194   Epoch: 11   Global Step: 186700   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 19:00:35,086-Speed 3306.44 samples/sec   Loss 1.1761   LearningRate 0.0194   Epoch: 11   Global Step: 186710   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 19:00:38,173-Speed 3319.04 samples/sec   Loss 1.2403   LearningRate 0.0194   Epoch: 11   Global Step: 186720   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 19:00:41,248-Speed 3330.75 samples/sec   Loss 1.2177   LearningRate 0.0194   Epoch: 11   Global Step: 186730   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 19:00:44,387-Speed 3262.82 samples/sec   Loss 1.2270   LearningRate 0.0194   Epoch: 11   Global Step: 186740   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 19:00:47,508-Speed 3281.96 samples/sec   Loss 1.1977   LearningRate 0.0194   Epoch: 11   Global Step: 186750   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 19:00:50,666-Speed 3242.69 samples/sec   Loss 1.1985   LearningRate 0.0194   Epoch: 11   Global Step: 186760   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 19:00:53,741-Speed 3331.00 samples/sec   Loss 1.2548   LearningRate 0.0194   Epoch: 11   Global Step: 186770   Fp16 Grad Scale: 262144   Required: 16 hours
Training: 2022-04-11 19:00:56,836-Speed 3309.31 samples/sec   Loss 1.2225   LearningRate 0.0194   Epoch: 11   Global Step: 186780   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 19:00:59,907-Speed 3334.72 samples/sec   Loss 1.2222   LearningRate 0.0194   Epoch: 11   Global Step: 186790   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 19:01:02,968-Speed 3346.64 samples/sec   Loss 1.2020   LearningRate 0.0194   Epoch: 11   Global Step: 186800   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-11 19:01:06,048-Speed 3324.81 samples/sec   Loss 1.2039   LearningRate 0.0194   Epoch: 11   Global Step: 186810   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-11 19:01:09,140-Speed 3313.13 samples/sec   Loss 1.2363   LearningRate 0.0194   Epoch: 11   Global Step: 186820   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-11 19:01:12,208-Speed 3339.03 samples/sec   Loss 1.2127   LearningRate 0.0194   Epoch: 11   Global Step: 186830   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-11 19:01:15,292-Speed 3320.62 samples/sec   Loss 1.1629   LearningRate 0.0194   Epoch: 11   Global Step: 186840   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-11 19:01:18,372-Speed 3325.13 samples/sec   Loss 1.2196   LearningRate 0.0194   Epoch: 11   Global Step: 186850   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-11 19:01:21,459-Speed 3317.58 samples/sec   Loss 1.1876   LearningRate 0.0194   Epoch: 11   Global Step: 186860   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-11 19:01:24,539-Speed 3325.98 samples/sec   Loss 1.2022   LearningRate 0.0194   Epoch: 11   Global Step: 186870   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-11 19:01:27,637-Speed 3305.96 samples/sec   Loss 1.1820   LearningRate 0.0194   Epoch: 11   Global Step: 186880   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-11 19:01:30,710-Speed 3332.87 samples/sec   Loss 1.2065   LearningRate 0.0194   Epoch: 11   Global Step: 186890   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-11 19:01:33,811-Speed 3302.33 samples/sec   Loss 1.2200   LearningRate 0.0194   Epoch: 11   Global Step: 186900   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 19:01:36,884-Speed 3333.20 samples/sec   Loss 1.2413   LearningRate 0.0194   Epoch: 11   Global Step: 186910   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 19:01:39,960-Speed 3330.88 samples/sec   Loss 1.1793   LearningRate 0.0194   Epoch: 11   Global Step: 186920   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 19:01:43,037-Speed 3328.33 samples/sec   Loss 1.2017   LearningRate 0.0194   Epoch: 11   Global Step: 186930   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 19:01:46,110-Speed 3332.78 samples/sec   Loss 1.2107   LearningRate 0.0194   Epoch: 11   Global Step: 186940   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 19:01:49,189-Speed 3326.49 samples/sec   Loss 1.1954   LearningRate 0.0194   Epoch: 11   Global Step: 186950   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 19:01:52,262-Speed 3333.15 samples/sec   Loss 1.2390   LearningRate 0.0194   Epoch: 11   Global Step: 186960   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 19:01:55,343-Speed 3324.43 samples/sec   Loss 1.2208   LearningRate 0.0194   Epoch: 11   Global Step: 186970   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 19:01:58,409-Speed 3339.92 samples/sec   Loss 1.2086   LearningRate 0.0193   Epoch: 11   Global Step: 186980   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-11 19:02:01,488-Speed 3326.46 samples/sec   Loss 1.1868   LearningRate 0.0193   Epoch: 11   Global Step: 186990   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-11 19:02:04,579-Speed 3314.46 samples/sec   Loss 1.2311   LearningRate 0.0193   Epoch: 11   Global Step: 187000   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-11 19:02:07,652-Speed 3332.35 samples/sec   Loss 1.1910   LearningRate 0.0193   Epoch: 11   Global Step: 187010   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-11 19:02:10,727-Speed 3331.43 samples/sec   Loss 1.2453   LearningRate 0.0193   Epoch: 11   Global Step: 187020   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-11 19:02:13,850-Speed 3279.95 samples/sec   Loss 1.1826   LearningRate 0.0193   Epoch: 11   Global Step: 187030   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-11 19:02:16,927-Speed 3328.07 samples/sec   Loss 1.2139   LearningRate 0.0193   Epoch: 11   Global Step: 187040   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-11 19:02:20,073-Speed 3255.85 samples/sec   Loss 1.2274   LearningRate 0.0193   Epoch: 11   Global Step: 187050   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-11 19:02:23,179-Speed 3297.36 samples/sec   Loss 1.2232   LearningRate 0.0193   Epoch: 11   Global Step: 187060   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-11 19:02:26,267-Speed 3316.85 samples/sec   Loss 1.2342   LearningRate 0.0193   Epoch: 11   Global Step: 187070   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-11 19:02:29,488-Speed 3179.94 samples/sec   Loss 1.2048   LearningRate 0.0193   Epoch: 11   Global Step: 187080   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 19:02:32,589-Speed 3302.55 samples/sec   Loss 1.2283   LearningRate 0.0193   Epoch: 11   Global Step: 187090   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 19:02:35,673-Speed 3321.47 samples/sec   Loss 1.2049   LearningRate 0.0193   Epoch: 11   Global Step: 187100   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 19:02:38,800-Speed 3276.09 samples/sec   Loss 1.1961   LearningRate 0.0193   Epoch: 11   Global Step: 187110   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 19:02:41,879-Speed 3325.74 samples/sec   Loss 1.2158   LearningRate 0.0193   Epoch: 11   Global Step: 187120   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 19:02:44,953-Speed 3332.03 samples/sec   Loss 1.2059   LearningRate 0.0193   Epoch: 11   Global Step: 187130   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 19:02:48,036-Speed 3321.71 samples/sec   Loss 1.2292   LearningRate 0.0193   Epoch: 11   Global Step: 187140   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 19:02:51,125-Speed 3315.64 samples/sec   Loss 1.2193   LearningRate 0.0193   Epoch: 11   Global Step: 187150   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 19:02:54,232-Speed 3296.78 samples/sec   Loss 1.2334   LearningRate 0.0193   Epoch: 11   Global Step: 187160   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 19:02:57,331-Speed 3305.66 samples/sec   Loss 1.2066   LearningRate 0.0193   Epoch: 11   Global Step: 187170   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 19:03:00,422-Speed 3313.14 samples/sec   Loss 1.2387   LearningRate 0.0193   Epoch: 11   Global Step: 187180   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 19:03:03,511-Speed 3316.18 samples/sec   Loss 1.1960   LearningRate 0.0193   Epoch: 11   Global Step: 187190   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-11 19:03:06,673-Speed 3239.16 samples/sec   Loss 1.1841   LearningRate 0.0193   Epoch: 11   Global Step: 187200   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-11 19:03:09,748-Speed 3330.47 samples/sec   Loss 1.1776   LearningRate 0.0193   Epoch: 11   Global Step: 187210   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-11 19:03:12,825-Speed 3329.09 samples/sec   Loss 1.2045   LearningRate 0.0193   Epoch: 11   Global Step: 187220   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-11 19:03:15,926-Speed 3302.25 samples/sec   Loss 1.2147   LearningRate 0.0193   Epoch: 11   Global Step: 187230   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-11 19:03:19,022-Speed 3307.95 samples/sec   Loss 1.1953   LearningRate 0.0193   Epoch: 11   Global Step: 187240   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-11 19:03:22,100-Speed 3328.80 samples/sec   Loss 1.2177   LearningRate 0.0193   Epoch: 11   Global Step: 187250   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:03:25,178-Speed 3327.41 samples/sec   Loss 1.1721   LearningRate 0.0193   Epoch: 11   Global Step: 187260   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:03:28,253-Speed 3330.98 samples/sec   Loss 1.2320   LearningRate 0.0193   Epoch: 11   Global Step: 187270   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:03:31,344-Speed 3313.43 samples/sec   Loss 1.2152   LearningRate 0.0193   Epoch: 11   Global Step: 187280   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:03:34,424-Speed 3325.34 samples/sec   Loss 1.2300   LearningRate 0.0193   Epoch: 11   Global Step: 187290   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:03:37,506-Speed 3323.36 samples/sec   Loss 1.2129   LearningRate 0.0193   Epoch: 11   Global Step: 187300   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-11 19:03:40,582-Speed 3329.43 samples/sec   Loss 1.1685   LearningRate 0.0193   Epoch: 11   Global Step: 187310   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-11 19:03:43,655-Speed 3333.01 samples/sec   Loss 1.1799   LearningRate 0.0193   Epoch: 11   Global Step: 187320   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-11 19:03:46,728-Speed 3333.03 samples/sec   Loss 1.2477   LearningRate 0.0193   Epoch: 11   Global Step: 187330   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-11 19:03:49,814-Speed 3318.73 samples/sec   Loss 1.2173   LearningRate 0.0193   Epoch: 11   Global Step: 187340   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-11 19:03:52,889-Speed 3331.44 samples/sec   Loss 1.2523   LearningRate 0.0193   Epoch: 11   Global Step: 187350   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-11 19:03:55,961-Speed 3333.95 samples/sec   Loss 1.1750   LearningRate 0.0192   Epoch: 11   Global Step: 187360   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-11 19:03:59,033-Speed 3333.29 samples/sec   Loss 1.2605   LearningRate 0.0192   Epoch: 11   Global Step: 187370   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-11 19:04:02,159-Speed 3276.74 samples/sec   Loss 1.2749   LearningRate 0.0192   Epoch: 11   Global Step: 187380   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:04:05,236-Speed 3328.70 samples/sec   Loss 1.1882   LearningRate 0.0192   Epoch: 11   Global Step: 187390   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:04:08,314-Speed 3327.51 samples/sec   Loss 1.2352   LearningRate 0.0192   Epoch: 11   Global Step: 187400   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:04:11,477-Speed 3238.31 samples/sec   Loss 1.2588   LearningRate 0.0192   Epoch: 11   Global Step: 187410   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:04:14,650-Speed 3227.75 samples/sec   Loss 1.2145   LearningRate 0.0192   Epoch: 11   Global Step: 187420   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:04:17,805-Speed 3246.77 samples/sec   Loss 1.2198   LearningRate 0.0192   Epoch: 11   Global Step: 187430   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:04:20,916-Speed 3292.13 samples/sec   Loss 1.2215   LearningRate 0.0192   Epoch: 11   Global Step: 187440   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:04:24,018-Speed 3302.05 samples/sec   Loss 1.2207   LearningRate 0.0192   Epoch: 11   Global Step: 187450   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:04:27,207-Speed 3212.50 samples/sec   Loss 1.1555   LearningRate 0.0192   Epoch: 11   Global Step: 187460   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:04:30,335-Speed 3273.43 samples/sec   Loss 1.2070   LearningRate 0.0192   Epoch: 11   Global Step: 187470   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:04:33,419-Speed 3322.22 samples/sec   Loss 1.2132   LearningRate 0.0192   Epoch: 11   Global Step: 187480   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-11 19:04:36,540-Speed 3281.35 samples/sec   Loss 1.1865   LearningRate 0.0192   Epoch: 11   Global Step: 187490   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-11 19:04:39,617-Speed 3328.20 samples/sec   Loss 1.2676   LearningRate 0.0192   Epoch: 11   Global Step: 187500   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-11 19:04:42,706-Speed 3316.09 samples/sec   Loss 1.2436   LearningRate 0.0192   Epoch: 11   Global Step: 187510   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-11 19:04:45,814-Speed 3296.19 samples/sec   Loss 1.2300   LearningRate 0.0192   Epoch: 11   Global Step: 187520   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:04:49,049-Speed 3165.61 samples/sec   Loss 1.2539   LearningRate 0.0192   Epoch: 11   Global Step: 187530   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:04:52,176-Speed 3275.11 samples/sec   Loss 1.2203   LearningRate 0.0192   Epoch: 11   Global Step: 187540   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:04:55,261-Speed 3320.89 samples/sec   Loss 1.2298   LearningRate 0.0192   Epoch: 11   Global Step: 187550   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:04:58,382-Speed 3281.43 samples/sec   Loss 1.2297   LearningRate 0.0192   Epoch: 11   Global Step: 187560   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:05:01,523-Speed 3259.91 samples/sec   Loss 1.2635   LearningRate 0.0192   Epoch: 11   Global Step: 187570   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:05:04,599-Speed 3329.96 samples/sec   Loss 1.2309   LearningRate 0.0192   Epoch: 11   Global Step: 187580   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:05:07,697-Speed 3306.26 samples/sec   Loss 1.2339   LearningRate 0.0192   Epoch: 11   Global Step: 187590   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:05:10,808-Speed 3292.81 samples/sec   Loss 1.2152   LearningRate 0.0192   Epoch: 11   Global Step: 187600   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:05:13,894-Speed 3318.84 samples/sec   Loss 1.2404   LearningRate 0.0192   Epoch: 11   Global Step: 187610   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:05:16,980-Speed 3319.18 samples/sec   Loss 1.2279   LearningRate 0.0192   Epoch: 11   Global Step: 187620   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-11 19:05:20,069-Speed 3315.80 samples/sec   Loss 1.2006   LearningRate 0.0192   Epoch: 11   Global Step: 187630   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-11 19:05:23,199-Speed 3271.91 samples/sec   Loss 1.2049   LearningRate 0.0192   Epoch: 11   Global Step: 187640   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-11 19:05:26,386-Speed 3214.29 samples/sec   Loss 1.2455   LearningRate 0.0192   Epoch: 11   Global Step: 187650   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-11 19:05:29,545-Speed 3241.48 samples/sec   Loss 1.2467   LearningRate 0.0192   Epoch: 11   Global Step: 187660   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:05:32,630-Speed 3320.77 samples/sec   Loss 1.1901   LearningRate 0.0192   Epoch: 11   Global Step: 187670   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:05:35,832-Speed 3198.30 samples/sec   Loss 1.2142   LearningRate 0.0192   Epoch: 11   Global Step: 187680   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:05:38,944-Speed 3292.25 samples/sec   Loss 1.1563   LearningRate 0.0192   Epoch: 11   Global Step: 187690   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:05:42,032-Speed 3316.39 samples/sec   Loss 1.1933   LearningRate 0.0192   Epoch: 11   Global Step: 187700   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:05:45,131-Speed 3305.60 samples/sec   Loss 1.2561   LearningRate 0.0192   Epoch: 11   Global Step: 187710   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:05:48,234-Speed 3300.54 samples/sec   Loss 1.2026   LearningRate 0.0192   Epoch: 11   Global Step: 187720   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:05:51,318-Speed 3320.74 samples/sec   Loss 1.2361   LearningRate 0.0192   Epoch: 11   Global Step: 187730   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:05:54,423-Speed 3298.24 samples/sec   Loss 1.2051   LearningRate 0.0191   Epoch: 11   Global Step: 187740   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:05:57,590-Speed 3234.02 samples/sec   Loss 1.2362   LearningRate 0.0191   Epoch: 11   Global Step: 187750   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:06:00,733-Speed 3258.66 samples/sec   Loss 1.2572   LearningRate 0.0191   Epoch: 11   Global Step: 187760   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-11 19:06:03,824-Speed 3314.24 samples/sec   Loss 1.2448   LearningRate 0.0191   Epoch: 11   Global Step: 187770   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-11 19:06:06,908-Speed 3321.16 samples/sec   Loss 1.1983   LearningRate 0.0191   Epoch: 11   Global Step: 187780   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-11 19:06:09,992-Speed 3321.38 samples/sec   Loss 1.2622   LearningRate 0.0191   Epoch: 11   Global Step: 187790   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-11 19:06:13,085-Speed 3310.99 samples/sec   Loss 1.2707   LearningRate 0.0191   Epoch: 11   Global Step: 187800   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-11 19:06:16,169-Speed 3321.34 samples/sec   Loss 1.2057   LearningRate 0.0191   Epoch: 11   Global Step: 187810   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-11 19:06:19,348-Speed 3221.65 samples/sec   Loss 1.2090   LearningRate 0.0191   Epoch: 11   Global Step: 187820   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:06:22,486-Speed 3263.75 samples/sec   Loss 1.2574   LearningRate 0.0191   Epoch: 11   Global Step: 187830   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:06:25,583-Speed 3307.02 samples/sec   Loss 1.1938   LearningRate 0.0191   Epoch: 11   Global Step: 187840   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:06:28,691-Speed 3295.86 samples/sec   Loss 1.2414   LearningRate 0.0191   Epoch: 11   Global Step: 187850   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:06:31,770-Speed 3325.87 samples/sec   Loss 1.2591   LearningRate 0.0191   Epoch: 11   Global Step: 187860   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:06:34,870-Speed 3305.41 samples/sec   Loss 1.2072   LearningRate 0.0191   Epoch: 11   Global Step: 187870   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:06:38,069-Speed 3200.90 samples/sec   Loss 1.2498   LearningRate 0.0191   Epoch: 11   Global Step: 187880   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:06:41,221-Speed 3251.06 samples/sec   Loss 1.2742   LearningRate 0.0191   Epoch: 11   Global Step: 187890   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:06:44,303-Speed 3322.19 samples/sec   Loss 1.2092   LearningRate 0.0191   Epoch: 11   Global Step: 187900   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:06:47,411-Speed 3295.38 samples/sec   Loss 1.2642   LearningRate 0.0191   Epoch: 11   Global Step: 187910   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:06:50,492-Speed 3325.06 samples/sec   Loss 1.1992   LearningRate 0.0191   Epoch: 11   Global Step: 187920   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:06:53,604-Speed 3291.12 samples/sec   Loss 1.2625   LearningRate 0.0191   Epoch: 11   Global Step: 187930   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:06:56,687-Speed 3321.79 samples/sec   Loss 1.2520   LearningRate 0.0191   Epoch: 11   Global Step: 187940   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:06:59,823-Speed 3265.98 samples/sec   Loss 1.2213   LearningRate 0.0191   Epoch: 11   Global Step: 187950   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:07:02,916-Speed 3311.88 samples/sec   Loss 1.2834   LearningRate 0.0191   Epoch: 11   Global Step: 187960   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:07:05,998-Speed 3323.26 samples/sec   Loss 1.2235   LearningRate 0.0191   Epoch: 11   Global Step: 187970   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:07:09,088-Speed 3314.79 samples/sec   Loss 1.2784   LearningRate 0.0191   Epoch: 11   Global Step: 187980   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:07:12,192-Speed 3299.14 samples/sec   Loss 1.2754   LearningRate 0.0191   Epoch: 11   Global Step: 187990   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:07:15,320-Speed 3274.60 samples/sec   Loss 1.2734   LearningRate 0.0191   Epoch: 11   Global Step: 188000   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:07:59,417-[lfw][188000]XNorm: 21.595937
Training: 2022-04-11 19:07:59,417-[lfw][188000]Accuracy-Flip: 0.99800+-0.00267
Training: 2022-04-11 19:07:59,418-[lfw][188000]Accuracy-Highest: 0.99817
Training: 2022-04-11 19:08:50,578-[cfp_fp][188000]XNorm: 21.456830
Training: 2022-04-11 19:08:50,578-[cfp_fp][188000]Accuracy-Flip: 0.98957+-0.00461
Training: 2022-04-11 19:08:50,579-[cfp_fp][188000]Accuracy-Highest: 0.98971
Training: 2022-04-11 19:09:34,659-[agedb_30][188000]XNorm: 22.247376
Training: 2022-04-11 19:09:34,660-[agedb_30][188000]Accuracy-Flip: 0.98400+-0.00720
Training: 2022-04-11 19:09:34,660-[agedb_30][188000]Accuracy-Highest: 0.98500
Training: 2022-04-11 19:09:37,791-Speed 71.87 samples/sec   Loss 1.2898   LearningRate 0.0191   Epoch: 11   Global Step: 188010   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:09:40,939-Speed 3253.57 samples/sec   Loss 1.2428   LearningRate 0.0191   Epoch: 11   Global Step: 188020   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-11 19:09:44,053-Speed 3289.59 samples/sec   Loss 1.2305   LearningRate 0.0191   Epoch: 11   Global Step: 188030   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-11 19:09:47,117-Speed 3341.91 samples/sec   Loss 1.2230   LearningRate 0.0191   Epoch: 11   Global Step: 188040   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-11 19:09:50,187-Speed 3337.06 samples/sec   Loss 1.2531   LearningRate 0.0191   Epoch: 11   Global Step: 188050   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-11 19:09:53,268-Speed 3324.11 samples/sec   Loss 1.2436   LearningRate 0.0191   Epoch: 11   Global Step: 188060   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-11 19:09:56,358-Speed 3314.07 samples/sec   Loss 1.2279   LearningRate 0.0191   Epoch: 11   Global Step: 188070   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-11 19:09:59,420-Speed 3345.26 samples/sec   Loss 1.2546   LearningRate 0.0191   Epoch: 11   Global Step: 188080   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:10:02,510-Speed 3315.28 samples/sec   Loss 1.2214   LearningRate 0.0191   Epoch: 11   Global Step: 188090   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:10:05,632-Speed 3280.04 samples/sec   Loss 1.2694   LearningRate 0.0191   Epoch: 11   Global Step: 188100   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:10:08,713-Speed 3324.13 samples/sec   Loss 1.2408   LearningRate 0.0191   Epoch: 11   Global Step: 188110   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:10:11,839-Speed 3276.91 samples/sec   Loss 1.2385   LearningRate 0.0190   Epoch: 11   Global Step: 188120   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:10:14,913-Speed 3331.82 samples/sec   Loss 1.2318   LearningRate 0.0190   Epoch: 11   Global Step: 188130   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:10:18,015-Speed 3302.08 samples/sec   Loss 1.2338   LearningRate 0.0190   Epoch: 11   Global Step: 188140   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:10:21,104-Speed 3315.84 samples/sec   Loss 1.2340   LearningRate 0.0190   Epoch: 11   Global Step: 188150   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:10:24,330-Speed 3174.95 samples/sec   Loss 1.2325   LearningRate 0.0190   Epoch: 11   Global Step: 188160   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:10:27,463-Speed 3269.41 samples/sec   Loss 1.2341   LearningRate 0.0190   Epoch: 11   Global Step: 188170   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:10:30,534-Speed 3334.54 samples/sec   Loss 1.2815   LearningRate 0.0190   Epoch: 11   Global Step: 188180   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-11 19:10:33,607-Speed 3333.11 samples/sec   Loss 1.2543   LearningRate 0.0190   Epoch: 11   Global Step: 188190   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-11 19:10:36,678-Speed 3334.98 samples/sec   Loss 1.2525   LearningRate 0.0190   Epoch: 11   Global Step: 188200   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-11 19:10:39,805-Speed 3275.35 samples/sec   Loss 1.2720   LearningRate 0.0190   Epoch: 11   Global Step: 188210   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-11 19:10:42,999-Speed 3207.15 samples/sec   Loss 1.2359   LearningRate 0.0190   Epoch: 11   Global Step: 188220   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-11 19:10:46,109-Speed 3292.58 samples/sec   Loss 1.2182   LearningRate 0.0190   Epoch: 11   Global Step: 188230   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-11 19:10:49,207-Speed 3306.03 samples/sec   Loss 1.2630   LearningRate 0.0190   Epoch: 11   Global Step: 188240   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-11 19:10:52,284-Speed 3329.42 samples/sec   Loss 1.2354   LearningRate 0.0190   Epoch: 11   Global Step: 188250   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-11 19:10:55,357-Speed 3333.04 samples/sec   Loss 1.2054   LearningRate 0.0190   Epoch: 11   Global Step: 188260   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-11 19:10:58,431-Speed 3331.98 samples/sec   Loss 1.2707   LearningRate 0.0190   Epoch: 11   Global Step: 188270   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-11 19:11:01,503-Speed 3334.40 samples/sec   Loss 1.2790   LearningRate 0.0190   Epoch: 11   Global Step: 188280   Fp16 Grad Scale: 262144   Required: 15 hours
Training: 2022-04-11 19:11:04,565-Speed 3344.15 samples/sec   Loss 1.2435   LearningRate 0.0190   Epoch: 11   Global Step: 188290   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-11 19:11:07,648-Speed 3323.01 samples/sec   Loss 1.2673   LearningRate 0.0190   Epoch: 11   Global Step: 188300   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-11 19:11:10,737-Speed 3314.61 samples/sec   Loss 1.2471   LearningRate 0.0190   Epoch: 11   Global Step: 188310   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-11 19:11:13,819-Speed 3323.63 samples/sec   Loss 1.2673   LearningRate 0.0190   Epoch: 11   Global Step: 188320   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-11 19:11:16,889-Speed 3336.47 samples/sec   Loss 1.2398   LearningRate 0.0190   Epoch: 11   Global Step: 188330   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:11:19,957-Speed 3338.73 samples/sec   Loss 1.2210   LearningRate 0.0190   Epoch: 11   Global Step: 188340   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:11:23,036-Speed 3326.50 samples/sec   Loss 1.2142   LearningRate 0.0190   Epoch: 11   Global Step: 188350   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:11:26,126-Speed 3314.67 samples/sec   Loss 1.3026   LearningRate 0.0190   Epoch: 11   Global Step: 188360   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:11:29,201-Speed 3330.90 samples/sec   Loss 1.2050   LearningRate 0.0190   Epoch: 11   Global Step: 188370   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:11:32,282-Speed 3324.68 samples/sec   Loss 1.2436   LearningRate 0.0190   Epoch: 11   Global Step: 188380   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:11:35,366-Speed 3320.79 samples/sec   Loss 1.2364   LearningRate 0.0190   Epoch: 11   Global Step: 188390   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:11:38,450-Speed 3320.54 samples/sec   Loss 1.2579   LearningRate 0.0190   Epoch: 11   Global Step: 188400   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:11:41,524-Speed 3332.17 samples/sec   Loss 1.2367   LearningRate 0.0190   Epoch: 11   Global Step: 188410   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:11:44,595-Speed 3335.59 samples/sec   Loss 1.2416   LearningRate 0.0190   Epoch: 11   Global Step: 188420   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:11:47,693-Speed 3306.18 samples/sec   Loss 1.2550   LearningRate 0.0190   Epoch: 11   Global Step: 188430   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-11 19:11:50,762-Speed 3337.17 samples/sec   Loss 1.2378   LearningRate 0.0190   Epoch: 11   Global Step: 188440   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-11 19:11:53,844-Speed 3323.75 samples/sec   Loss 1.2232   LearningRate 0.0190   Epoch: 11   Global Step: 188450   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-11 19:11:56,926-Speed 3322.85 samples/sec   Loss 1.2137   LearningRate 0.0190   Epoch: 11   Global Step: 188460   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-11 19:11:59,990-Speed 3342.50 samples/sec   Loss 1.2489   LearningRate 0.0190   Epoch: 11   Global Step: 188470   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:12:03,062-Speed 3334.82 samples/sec   Loss 1.2435   LearningRate 0.0190   Epoch: 11   Global Step: 188480   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:12:06,204-Speed 3259.87 samples/sec   Loss 1.2413   LearningRate 0.0190   Epoch: 11   Global Step: 188490   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:12:09,324-Speed 3282.41 samples/sec   Loss 1.2305   LearningRate 0.0190   Epoch: 11   Global Step: 188500   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:12:12,404-Speed 3325.48 samples/sec   Loss 1.2251   LearningRate 0.0189   Epoch: 11   Global Step: 188510   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:12:15,551-Speed 3255.30 samples/sec   Loss 1.2595   LearningRate 0.0189   Epoch: 11   Global Step: 188520   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:12:18,645-Speed 3310.28 samples/sec   Loss 1.2359   LearningRate 0.0189   Epoch: 11   Global Step: 188530   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:12:21,713-Speed 3337.84 samples/sec   Loss 1.2571   LearningRate 0.0189   Epoch: 11   Global Step: 188540   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:12:24,782-Speed 3337.36 samples/sec   Loss 1.2632   LearningRate 0.0189   Epoch: 11   Global Step: 188550   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:12:27,865-Speed 3322.80 samples/sec   Loss 1.2724   LearningRate 0.0189   Epoch: 11   Global Step: 188560   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:12:30,937-Speed 3333.36 samples/sec   Loss 1.2324   LearningRate 0.0189   Epoch: 11   Global Step: 188570   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-11 19:12:34,012-Speed 3330.98 samples/sec   Loss 1.2752   LearningRate 0.0189   Epoch: 11   Global Step: 188580   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-11 19:12:37,104-Speed 3312.93 samples/sec   Loss 1.2368   LearningRate 0.0189   Epoch: 11   Global Step: 188590   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-11 19:12:40,177-Speed 3332.45 samples/sec   Loss 1.2342   LearningRate 0.0189   Epoch: 11   Global Step: 188600   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-11 19:12:43,247-Speed 3337.09 samples/sec   Loss 1.2609   LearningRate 0.0189   Epoch: 11   Global Step: 188610   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-11 19:12:46,335-Speed 3316.86 samples/sec   Loss 1.2679   LearningRate 0.0189   Epoch: 11   Global Step: 188620   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-11 19:12:49,501-Speed 3235.03 samples/sec   Loss 1.2805   LearningRate 0.0189   Epoch: 11   Global Step: 188630   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-11 19:12:52,582-Speed 3324.09 samples/sec   Loss 1.2509   LearningRate 0.0189   Epoch: 11   Global Step: 188640   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-11 19:12:55,653-Speed 3335.04 samples/sec   Loss 1.2176   LearningRate 0.0189   Epoch: 11   Global Step: 188650   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-11 19:12:58,775-Speed 3280.18 samples/sec   Loss 1.2303   LearningRate 0.0189   Epoch: 11   Global Step: 188660   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-11 19:13:01,855-Speed 3326.21 samples/sec   Loss 1.2402   LearningRate 0.0189   Epoch: 11   Global Step: 188670   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-11 19:13:04,980-Speed 3277.18 samples/sec   Loss 1.2102   LearningRate 0.0189   Epoch: 11   Global Step: 188680   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-11 19:13:08,053-Speed 3333.42 samples/sec   Loss 1.2629   LearningRate 0.0189   Epoch: 11   Global Step: 188690   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-11 19:13:11,150-Speed 3306.93 samples/sec   Loss 1.2320   LearningRate 0.0189   Epoch: 11   Global Step: 188700   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-11 19:13:14,283-Speed 3268.87 samples/sec   Loss 1.2507   LearningRate 0.0189   Epoch: 11   Global Step: 188710   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-11 19:13:17,397-Speed 3289.38 samples/sec   Loss 1.2578   LearningRate 0.0189   Epoch: 11   Global Step: 188720   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-11 19:13:20,468-Speed 3335.32 samples/sec   Loss 1.2286   LearningRate 0.0189   Epoch: 11   Global Step: 188730   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-11 19:13:23,550-Speed 3323.41 samples/sec   Loss 1.2655   LearningRate 0.0189   Epoch: 11   Global Step: 188740   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-11 19:13:26,614-Speed 3342.50 samples/sec   Loss 1.2859   LearningRate 0.0189   Epoch: 11   Global Step: 188750   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:13:29,704-Speed 3314.52 samples/sec   Loss 1.3135   LearningRate 0.0189   Epoch: 11   Global Step: 188760   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:13:32,770-Speed 3340.26 samples/sec   Loss 1.2668   LearningRate 0.0189   Epoch: 11   Global Step: 188770   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-11 19:13:35,860-Speed 3314.74 samples/sec   Loss 1.2239   LearningRate 0.0189   Epoch: 11   Global Step: 188780   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-11 19:13:38,940-Speed 3325.83 samples/sec   Loss 1.2546   LearningRate 0.0189   Epoch: 11   Global Step: 188790   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-11 19:13:42,009-Speed 3337.46 samples/sec   Loss 1.2965   LearningRate 0.0189   Epoch: 11   Global Step: 188800   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-11 19:13:45,137-Speed 3274.52 samples/sec   Loss 1.2838   LearningRate 0.0189   Epoch: 11   Global Step: 188810   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-11 19:13:48,215-Speed 3327.77 samples/sec   Loss 1.2587   LearningRate 0.0189   Epoch: 11   Global Step: 188820   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-11 19:13:51,287-Speed 3333.38 samples/sec   Loss 1.2368   LearningRate 0.0189   Epoch: 11   Global Step: 188830   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-11 19:13:54,400-Speed 3290.42 samples/sec   Loss 1.2869   LearningRate 0.0189   Epoch: 11   Global Step: 188840   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-11 19:13:57,524-Speed 3278.53 samples/sec   Loss 1.2614   LearningRate 0.0189   Epoch: 11   Global Step: 188850   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-11 19:14:00,600-Speed 3329.87 samples/sec   Loss 1.2702   LearningRate 0.0189   Epoch: 11   Global Step: 188860   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-11 19:14:03,681-Speed 3324.50 samples/sec   Loss 1.2201   LearningRate 0.0189   Epoch: 11   Global Step: 188870   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:14:06,757-Speed 3330.68 samples/sec   Loss 1.2664   LearningRate 0.0189   Epoch: 11   Global Step: 188880   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:14:09,827-Speed 3335.39 samples/sec   Loss 1.2991   LearningRate 0.0188   Epoch: 11   Global Step: 188890   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:14:12,905-Speed 3327.98 samples/sec   Loss 1.2312   LearningRate 0.0188   Epoch: 11   Global Step: 188900   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:14:15,977-Speed 3334.18 samples/sec   Loss 1.2756   LearningRate 0.0188   Epoch: 11   Global Step: 188910   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:14:19,054-Speed 3329.01 samples/sec   Loss 1.3135   LearningRate 0.0188   Epoch: 11   Global Step: 188920   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:14:22,129-Speed 3330.75 samples/sec   Loss 1.2492   LearningRate 0.0188   Epoch: 11   Global Step: 188930   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:14:25,203-Speed 3331.43 samples/sec   Loss 1.3085   LearningRate 0.0188   Epoch: 11   Global Step: 188940   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:14:28,275-Speed 3334.47 samples/sec   Loss 1.2731   LearningRate 0.0188   Epoch: 11   Global Step: 188950   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:14:31,389-Speed 3288.85 samples/sec   Loss 1.2913   LearningRate 0.0188   Epoch: 11   Global Step: 188960   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:14:34,479-Speed 3314.61 samples/sec   Loss 1.3059   LearningRate 0.0188   Epoch: 11   Global Step: 188970   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-11 19:14:37,552-Speed 3333.01 samples/sec   Loss 1.2468   LearningRate 0.0188   Epoch: 11   Global Step: 188980   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-11 19:14:40,632-Speed 3325.48 samples/sec   Loss 1.2628   LearningRate 0.0188   Epoch: 11   Global Step: 188990   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-11 19:14:43,711-Speed 3325.98 samples/sec   Loss 1.2430   LearningRate 0.0188   Epoch: 11   Global Step: 189000   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-11 19:14:46,797-Speed 3328.82 samples/sec   Loss 1.2566   LearningRate 0.0188   Epoch: 11   Global Step: 189010   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-11 19:14:49,868-Speed 3334.77 samples/sec   Loss 1.2665   LearningRate 0.0188   Epoch: 11   Global Step: 189020   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-11 19:14:52,945-Speed 3328.58 samples/sec   Loss 1.3228   LearningRate 0.0188   Epoch: 11   Global Step: 189030   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:14:56,019-Speed 3332.82 samples/sec   Loss 1.2793   LearningRate 0.0188   Epoch: 11   Global Step: 189040   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:14:59,100-Speed 3324.03 samples/sec   Loss 1.2387   LearningRate 0.0188   Epoch: 11   Global Step: 189050   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:15:02,178-Speed 3327.07 samples/sec   Loss 1.2970   LearningRate 0.0188   Epoch: 11   Global Step: 189060   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:15:05,252-Speed 3332.22 samples/sec   Loss 1.3140   LearningRate 0.0188   Epoch: 11   Global Step: 189070   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:15:08,330-Speed 3327.71 samples/sec   Loss 1.2420   LearningRate 0.0188   Epoch: 11   Global Step: 189080   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:15:11,432-Speed 3301.51 samples/sec   Loss 1.2548   LearningRate 0.0188   Epoch: 11   Global Step: 189090   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:15:14,510-Speed 3327.71 samples/sec   Loss 1.2119   LearningRate 0.0188   Epoch: 11   Global Step: 189100   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:15:17,584-Speed 3332.48 samples/sec   Loss 1.2486   LearningRate 0.0188   Epoch: 11   Global Step: 189110   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:15:20,688-Speed 3299.33 samples/sec   Loss 1.2836   LearningRate 0.0188   Epoch: 11   Global Step: 189120   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:15:23,766-Speed 3328.23 samples/sec   Loss 1.2711   LearningRate 0.0188   Epoch: 11   Global Step: 189130   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-11 19:15:26,856-Speed 3315.00 samples/sec   Loss 1.2380   LearningRate 0.0188   Epoch: 11   Global Step: 189140   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-11 19:15:29,946-Speed 3313.91 samples/sec   Loss 1.2825   LearningRate 0.0188   Epoch: 11   Global Step: 189150   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-11 19:15:33,037-Speed 3313.50 samples/sec   Loss 1.2679   LearningRate 0.0188   Epoch: 11   Global Step: 189160   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-11 19:15:36,216-Speed 3222.68 samples/sec   Loss 1.2905   LearningRate 0.0188   Epoch: 11   Global Step: 189170   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-11 19:15:39,367-Speed 3250.03 samples/sec   Loss 1.3029   LearningRate 0.0188   Epoch: 11   Global Step: 189180   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-11 19:15:42,476-Speed 3294.74 samples/sec   Loss 1.2548   LearningRate 0.0188   Epoch: 11   Global Step: 189190   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-11 19:15:45,741-Speed 3136.42 samples/sec   Loss 1.2751   LearningRate 0.0188   Epoch: 11   Global Step: 189200   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-11 19:15:48,974-Speed 3168.29 samples/sec   Loss 1.3053   LearningRate 0.0188   Epoch: 11   Global Step: 189210   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:15:52,191-Speed 3183.84 samples/sec   Loss 1.2650   LearningRate 0.0188   Epoch: 11   Global Step: 189220   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:15:55,346-Speed 3247.37 samples/sec   Loss 1.2858   LearningRate 0.0188   Epoch: 11   Global Step: 189230   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:15:58,419-Speed 3332.37 samples/sec   Loss 1.2587   LearningRate 0.0188   Epoch: 11   Global Step: 189240   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:16:01,498-Speed 3326.51 samples/sec   Loss 1.3036   LearningRate 0.0188   Epoch: 11   Global Step: 189250   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:16:04,664-Speed 3235.49 samples/sec   Loss 1.2908   LearningRate 0.0188   Epoch: 11   Global Step: 189260   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:16:07,753-Speed 3315.09 samples/sec   Loss 1.3002   LearningRate 0.0188   Epoch: 11   Global Step: 189270   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:16:10,895-Speed 3259.97 samples/sec   Loss 1.2991   LearningRate 0.0187   Epoch: 11   Global Step: 189280   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:16:14,125-Speed 3170.69 samples/sec   Loss 1.2367   LearningRate 0.0187   Epoch: 11   Global Step: 189290   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:16:17,290-Speed 3237.52 samples/sec   Loss 1.2473   LearningRate 0.0187   Epoch: 11   Global Step: 189300   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:16:20,511-Speed 3179.46 samples/sec   Loss 1.2435   LearningRate 0.0187   Epoch: 11   Global Step: 189310   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-11 19:16:23,604-Speed 3311.06 samples/sec   Loss 1.3064   LearningRate 0.0187   Epoch: 11   Global Step: 189320   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-11 19:16:26,690-Speed 3318.95 samples/sec   Loss 1.2374   LearningRate 0.0187   Epoch: 11   Global Step: 189330   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-11 19:16:29,768-Speed 3327.54 samples/sec   Loss 1.2650   LearningRate 0.0187   Epoch: 11   Global Step: 189340   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-11 19:16:32,947-Speed 3222.42 samples/sec   Loss 1.2807   LearningRate 0.0187   Epoch: 11   Global Step: 189350   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-11 19:16:36,011-Speed 3342.90 samples/sec   Loss 1.2452   LearningRate 0.0187   Epoch: 11   Global Step: 189360   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:16:39,111-Speed 3303.88 samples/sec   Loss 1.2583   LearningRate 0.0187   Epoch: 11   Global Step: 189370   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:16:42,249-Speed 3263.91 samples/sec   Loss 1.2912   LearningRate 0.0187   Epoch: 11   Global Step: 189380   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:16:45,350-Speed 3303.12 samples/sec   Loss 1.2776   LearningRate 0.0187   Epoch: 11   Global Step: 189390   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:16:48,471-Speed 3281.69 samples/sec   Loss 1.2730   LearningRate 0.0187   Epoch: 11   Global Step: 189400   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:16:51,548-Speed 3328.63 samples/sec   Loss 1.2966   LearningRate 0.0187   Epoch: 11   Global Step: 189410   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:16:54,623-Speed 3330.39 samples/sec   Loss 1.2831   LearningRate 0.0187   Epoch: 11   Global Step: 189420   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:16:57,725-Speed 3301.87 samples/sec   Loss 1.2250   LearningRate 0.0187   Epoch: 11   Global Step: 189430   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:17:00,844-Speed 3284.55 samples/sec   Loss 1.2904   LearningRate 0.0187   Epoch: 11   Global Step: 189440   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:17:03,955-Speed 3291.96 samples/sec   Loss 1.2948   LearningRate 0.0187   Epoch: 11   Global Step: 189450   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:17:07,068-Speed 3289.64 samples/sec   Loss 1.2383   LearningRate 0.0187   Epoch: 11   Global Step: 189460   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-11 19:17:10,144-Speed 3330.57 samples/sec   Loss 1.2871   LearningRate 0.0187   Epoch: 11   Global Step: 189470   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-11 19:17:13,218-Speed 3331.72 samples/sec   Loss 1.2926   LearningRate 0.0187   Epoch: 11   Global Step: 189480   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-11 19:17:16,266-Speed 3360.18 samples/sec   Loss 1.2352   LearningRate 0.0187   Epoch: 11   Global Step: 189490   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-11 19:17:19,440-Speed 3227.26 samples/sec   Loss 1.2914   LearningRate 0.0187   Epoch: 11   Global Step: 189500   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-11 19:17:22,530-Speed 3314.25 samples/sec   Loss 1.2898   LearningRate 0.0187   Epoch: 11   Global Step: 189510   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-11 19:17:25,668-Speed 3264.38 samples/sec   Loss 1.3162   LearningRate 0.0187   Epoch: 11   Global Step: 189520   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-11 19:17:28,797-Speed 3273.23 samples/sec   Loss 1.3037   LearningRate 0.0187   Epoch: 11   Global Step: 189530   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-11 19:17:31,917-Speed 3282.16 samples/sec   Loss 1.2871   LearningRate 0.0187   Epoch: 11   Global Step: 189540   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-11 19:17:34,995-Speed 3327.68 samples/sec   Loss 1.3530   LearningRate 0.0187   Epoch: 11   Global Step: 189550   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-11 19:17:38,120-Speed 3278.34 samples/sec   Loss 1.3003   LearningRate 0.0187   Epoch: 11   Global Step: 189560   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-11 19:17:41,206-Speed 3318.84 samples/sec   Loss 1.2779   LearningRate 0.0187   Epoch: 11   Global Step: 189570   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-11 19:17:44,282-Speed 3329.71 samples/sec   Loss 1.2826   LearningRate 0.0187   Epoch: 11   Global Step: 189580   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-11 19:17:47,370-Speed 3316.98 samples/sec   Loss 1.2676   LearningRate 0.0187   Epoch: 11   Global Step: 189590   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:17:50,466-Speed 3307.76 samples/sec   Loss 1.2891   LearningRate 0.0187   Epoch: 11   Global Step: 189600   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:17:53,573-Speed 3297.02 samples/sec   Loss 1.3058   LearningRate 0.0187   Epoch: 11   Global Step: 189610   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-11 19:17:56,688-Speed 3287.60 samples/sec   Loss 1.2767   LearningRate 0.0187   Epoch: 11   Global Step: 189620   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-11 19:17:59,763-Speed 3331.72 samples/sec   Loss 1.2333   LearningRate 0.0187   Epoch: 11   Global Step: 189630   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-11 19:18:02,909-Speed 3254.64 samples/sec   Loss 1.2529   LearningRate 0.0187   Epoch: 11   Global Step: 189640   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-11 19:18:06,131-Speed 3179.58 samples/sec   Loss 1.2585   LearningRate 0.0187   Epoch: 11   Global Step: 189650   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-11 19:18:09,216-Speed 3320.08 samples/sec   Loss 1.2786   LearningRate 0.0186   Epoch: 11   Global Step: 189660   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-11 19:18:12,292-Speed 3329.83 samples/sec   Loss 1.2607   LearningRate 0.0186   Epoch: 11   Global Step: 189670   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-11 19:18:15,371-Speed 3325.95 samples/sec   Loss 1.2976   LearningRate 0.0186   Epoch: 11   Global Step: 189680   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-11 19:18:18,462-Speed 3313.79 samples/sec   Loss 1.2933   LearningRate 0.0186   Epoch: 11   Global Step: 189690   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-11 19:18:21,539-Speed 3329.11 samples/sec   Loss 1.2468   LearningRate 0.0186   Epoch: 11   Global Step: 189700   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-11 19:18:24,614-Speed 3330.26 samples/sec   Loss 1.3420   LearningRate 0.0186   Epoch: 11   Global Step: 189710   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:18:27,700-Speed 3319.27 samples/sec   Loss 1.2754   LearningRate 0.0186   Epoch: 11   Global Step: 189720   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:18:30,795-Speed 3309.58 samples/sec   Loss 1.3131   LearningRate 0.0186   Epoch: 11   Global Step: 189730   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:18:33,881-Speed 3318.87 samples/sec   Loss 1.2902   LearningRate 0.0186   Epoch: 11   Global Step: 189740   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:18:36,959-Speed 3327.82 samples/sec   Loss 1.2236   LearningRate 0.0186   Epoch: 11   Global Step: 189750   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:18:40,062-Speed 3301.14 samples/sec   Loss 1.3100   LearningRate 0.0186   Epoch: 11   Global Step: 189760   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:18:43,230-Speed 3232.43 samples/sec   Loss 1.2936   LearningRate 0.0186   Epoch: 11   Global Step: 189770   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:18:46,310-Speed 3325.94 samples/sec   Loss 1.2520   LearningRate 0.0186   Epoch: 11   Global Step: 189780   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:18:49,390-Speed 3324.58 samples/sec   Loss 1.2927   LearningRate 0.0186   Epoch: 11   Global Step: 189790   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:18:52,506-Speed 3287.85 samples/sec   Loss 1.2123   LearningRate 0.0186   Epoch: 11   Global Step: 189800   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:18:55,587-Speed 3324.42 samples/sec   Loss 1.3332   LearningRate 0.0186   Epoch: 11   Global Step: 189810   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:18:58,843-Speed 3145.50 samples/sec   Loss 1.3323   LearningRate 0.0186   Epoch: 11   Global Step: 189820   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:19:01,948-Speed 3299.10 samples/sec   Loss 1.3016   LearningRate 0.0186   Epoch: 11   Global Step: 189830   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:19:05,030-Speed 3323.77 samples/sec   Loss 1.2587   LearningRate 0.0186   Epoch: 11   Global Step: 189840   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:19:08,105-Speed 3329.79 samples/sec   Loss 1.2676   LearningRate 0.0186   Epoch: 11   Global Step: 189850   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:19:11,181-Speed 3329.98 samples/sec   Loss 1.2699   LearningRate 0.0186   Epoch: 11   Global Step: 189860   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:19:14,382-Speed 3200.00 samples/sec   Loss 1.2639   LearningRate 0.0186   Epoch: 11   Global Step: 189870   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:19:17,537-Speed 3246.13 samples/sec   Loss 1.3096   LearningRate 0.0186   Epoch: 11   Global Step: 189880   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:19:20,670-Speed 3268.96 samples/sec   Loss 1.3012   LearningRate 0.0186   Epoch: 11   Global Step: 189890   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:19:23,761-Speed 3314.61 samples/sec   Loss 1.3095   LearningRate 0.0186   Epoch: 11   Global Step: 189900   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:19:26,849-Speed 3317.02 samples/sec   Loss 1.3046   LearningRate 0.0186   Epoch: 11   Global Step: 189910   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-11 19:19:29,923-Speed 3331.93 samples/sec   Loss 1.2648   LearningRate 0.0186   Epoch: 11   Global Step: 189920   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-11 19:19:33,043-Speed 3283.10 samples/sec   Loss 1.2689   LearningRate 0.0186   Epoch: 11   Global Step: 189930   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-11 19:19:36,120-Speed 3327.54 samples/sec   Loss 1.3287   LearningRate 0.0186   Epoch: 11   Global Step: 189940   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-11 19:19:39,198-Speed 3328.51 samples/sec   Loss 1.2585   LearningRate 0.0186   Epoch: 11   Global Step: 189950   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-11 19:19:42,273-Speed 3330.04 samples/sec   Loss 1.2896   LearningRate 0.0186   Epoch: 11   Global Step: 189960   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-11 19:19:45,364-Speed 3314.16 samples/sec   Loss 1.2723   LearningRate 0.0186   Epoch: 11   Global Step: 189970   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-11 19:19:48,606-Speed 3159.31 samples/sec   Loss 1.2848   LearningRate 0.0186   Epoch: 11   Global Step: 189980   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-11 19:19:51,690-Speed 3321.51 samples/sec   Loss 1.2711   LearningRate 0.0186   Epoch: 11   Global Step: 189990   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-11 19:19:54,828-Speed 3263.34 samples/sec   Loss 1.3254   LearningRate 0.0186   Epoch: 11   Global Step: 190000   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-11 19:20:38,211-[lfw][190000]XNorm: 21.337224
Training: 2022-04-11 19:20:38,212-[lfw][190000]Accuracy-Flip: 0.99800+-0.00287
Training: 2022-04-11 19:20:38,212-[lfw][190000]Accuracy-Highest: 0.99817
Training: 2022-04-11 19:21:28,603-[cfp_fp][190000]XNorm: 21.182673
Training: 2022-04-11 19:21:28,603-[cfp_fp][190000]Accuracy-Flip: 0.98814+-0.00589
Training: 2022-04-11 19:21:28,604-[cfp_fp][190000]Accuracy-Highest: 0.98971
Training: 2022-04-11 19:22:11,980-[agedb_30][190000]XNorm: 21.754465
Training: 2022-04-11 19:22:11,980-[agedb_30][190000]Accuracy-Flip: 0.98433+-0.00742
Training: 2022-04-11 19:22:11,981-[agedb_30][190000]Accuracy-Highest: 0.98500
Training: 2022-04-11 19:22:15,048-Speed 73.03 samples/sec   Loss 1.3221   LearningRate 0.0186   Epoch: 11   Global Step: 190010   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-11 19:22:18,124-Speed 3329.60 samples/sec   Loss 1.3174   LearningRate 0.0186   Epoch: 11   Global Step: 190020   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-11 19:22:21,201-Speed 3329.25 samples/sec   Loss 1.2907   LearningRate 0.0186   Epoch: 11   Global Step: 190030   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-11 19:22:24,341-Speed 3262.16 samples/sec   Loss 1.3203   LearningRate 0.0186   Epoch: 11   Global Step: 190040   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-11 19:22:27,475-Speed 3267.61 samples/sec   Loss 1.3043   LearningRate 0.0185   Epoch: 11   Global Step: 190050   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-11 19:22:30,543-Speed 3338.53 samples/sec   Loss 1.3148   LearningRate 0.0185   Epoch: 11   Global Step: 190060   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-11 19:22:33,608-Speed 3341.41 samples/sec   Loss 1.2893   LearningRate 0.0185   Epoch: 11   Global Step: 190070   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-11 19:22:36,684-Speed 3330.31 samples/sec   Loss 1.2657   LearningRate 0.0185   Epoch: 11   Global Step: 190080   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-11 19:22:39,785-Speed 3302.67 samples/sec   Loss 1.2999   LearningRate 0.0185   Epoch: 11   Global Step: 190090   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-11 19:22:42,859-Speed 3332.29 samples/sec   Loss 1.3229   LearningRate 0.0185   Epoch: 11   Global Step: 190100   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-11 19:22:45,944-Speed 3319.19 samples/sec   Loss 1.2991   LearningRate 0.0185   Epoch: 11   Global Step: 190110   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-11 19:22:49,024-Speed 3325.94 samples/sec   Loss 1.2593   LearningRate 0.0185   Epoch: 11   Global Step: 190120   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-11 19:22:52,107-Speed 3322.19 samples/sec   Loss 1.2661   LearningRate 0.0185   Epoch: 11   Global Step: 190130   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-11 19:22:55,218-Speed 3292.61 samples/sec   Loss 1.2695   LearningRate 0.0185   Epoch: 11   Global Step: 190140   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-11 19:22:58,341-Speed 3279.18 samples/sec   Loss 1.3099   LearningRate 0.0185   Epoch: 11   Global Step: 190150   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-11 19:23:01,399-Speed 3349.26 samples/sec   Loss 1.2871   LearningRate 0.0185   Epoch: 11   Global Step: 190160   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:23:04,478-Speed 3326.54 samples/sec   Loss 1.2937   LearningRate 0.0185   Epoch: 11   Global Step: 190170   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:23:07,562-Speed 3321.60 samples/sec   Loss 1.2759   LearningRate 0.0185   Epoch: 11   Global Step: 190180   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:23:10,642-Speed 3325.17 samples/sec   Loss 1.2895   LearningRate 0.0185   Epoch: 11   Global Step: 190190   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:23:13,719-Speed 3328.94 samples/sec   Loss 1.2877   LearningRate 0.0185   Epoch: 11   Global Step: 190200   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:23:16,793-Speed 3332.55 samples/sec   Loss 1.3473   LearningRate 0.0185   Epoch: 11   Global Step: 190210   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:23:19,876-Speed 3321.94 samples/sec   Loss 1.2997   LearningRate 0.0185   Epoch: 11   Global Step: 190220   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:23:22,966-Speed 3314.73 samples/sec   Loss 1.2676   LearningRate 0.0185   Epoch: 11   Global Step: 190230   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:23:26,042-Speed 3329.50 samples/sec   Loss 1.2908   LearningRate 0.0185   Epoch: 11   Global Step: 190240   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:23:29,112-Speed 3336.39 samples/sec   Loss 1.3289   LearningRate 0.0185   Epoch: 11   Global Step: 190250   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:23:32,183-Speed 3334.72 samples/sec   Loss 1.2856   LearningRate 0.0185   Epoch: 11   Global Step: 190260   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-11 19:23:35,270-Speed 3318.70 samples/sec   Loss 1.2904   LearningRate 0.0185   Epoch: 11   Global Step: 190270   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-11 19:23:38,351-Speed 3323.75 samples/sec   Loss 1.3171   LearningRate 0.0185   Epoch: 11   Global Step: 190280   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-11 19:23:41,438-Speed 3318.23 samples/sec   Loss 1.3151   LearningRate 0.0185   Epoch: 11   Global Step: 190290   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-11 19:23:44,511-Speed 3333.13 samples/sec   Loss 1.3145   LearningRate 0.0185   Epoch: 11   Global Step: 190300   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:23:47,610-Speed 3304.77 samples/sec   Loss 1.3485   LearningRate 0.0185   Epoch: 11   Global Step: 190310   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:23:50,691-Speed 3325.03 samples/sec   Loss 1.2775   LearningRate 0.0185   Epoch: 11   Global Step: 190320   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:23:53,760-Speed 3337.62 samples/sec   Loss 1.3375   LearningRate 0.0185   Epoch: 11   Global Step: 190330   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:23:56,841-Speed 3323.58 samples/sec   Loss 1.3084   LearningRate 0.0185   Epoch: 11   Global Step: 190340   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:24:00,005-Speed 3237.34 samples/sec   Loss 1.3261   LearningRate 0.0185   Epoch: 11   Global Step: 190350   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:24:03,083-Speed 3327.46 samples/sec   Loss 1.2991   LearningRate 0.0185   Epoch: 11   Global Step: 190360   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:24:06,158-Speed 3331.40 samples/sec   Loss 1.2892   LearningRate 0.0185   Epoch: 11   Global Step: 190370   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:24:09,234-Speed 3329.35 samples/sec   Loss 1.3222   LearningRate 0.0185   Epoch: 11   Global Step: 190380   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:24:12,315-Speed 3324.87 samples/sec   Loss 1.3149   LearningRate 0.0185   Epoch: 11   Global Step: 190390   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:24:15,386-Speed 3334.83 samples/sec   Loss 1.3132   LearningRate 0.0185   Epoch: 11   Global Step: 190400   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-11 19:24:18,459-Speed 3333.14 samples/sec   Loss 1.3557   LearningRate 0.0185   Epoch: 11   Global Step: 190410   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-11 19:24:21,539-Speed 3325.43 samples/sec   Loss 1.2651   LearningRate 0.0185   Epoch: 11   Global Step: 190420   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-11 19:24:24,606-Speed 3339.60 samples/sec   Loss 1.2628   LearningRate 0.0185   Epoch: 11   Global Step: 190430   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-11 19:24:27,678-Speed 3333.31 samples/sec   Loss 1.3003   LearningRate 0.0184   Epoch: 11   Global Step: 190440   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-11 19:24:30,748-Speed 3337.10 samples/sec   Loss 1.3171   LearningRate 0.0184   Epoch: 11   Global Step: 190450   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-11 19:24:33,845-Speed 3306.49 samples/sec   Loss 1.3227   LearningRate 0.0184   Epoch: 11   Global Step: 190460   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-11 19:24:36,928-Speed 3322.76 samples/sec   Loss 1.2770   LearningRate 0.0184   Epoch: 11   Global Step: 190470   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-11 19:24:40,005-Speed 3328.34 samples/sec   Loss 1.2566   LearningRate 0.0184   Epoch: 11   Global Step: 190480   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-11 19:24:43,082-Speed 3328.62 samples/sec   Loss 1.2669   LearningRate 0.0184   Epoch: 11   Global Step: 190490   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-11 19:24:46,141-Speed 3349.13 samples/sec   Loss 1.2543   LearningRate 0.0184   Epoch: 11   Global Step: 190500   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-11 19:24:49,274-Speed 3268.67 samples/sec   Loss 1.2770   LearningRate 0.0184   Epoch: 11   Global Step: 190510   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-11 19:24:52,374-Speed 3303.42 samples/sec   Loss 1.2651   LearningRate 0.0184   Epoch: 11   Global Step: 190520   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-11 19:24:55,521-Speed 3255.49 samples/sec   Loss 1.2932   LearningRate 0.0184   Epoch: 11   Global Step: 190530   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-11 19:24:58,611-Speed 3314.80 samples/sec   Loss 1.3111   LearningRate 0.0184   Epoch: 11   Global Step: 190540   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:25:01,696-Speed 3319.86 samples/sec   Loss 1.3232   LearningRate 0.0184   Epoch: 11   Global Step: 190550   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:25:04,794-Speed 3305.57 samples/sec   Loss 1.2842   LearningRate 0.0184   Epoch: 11   Global Step: 190560   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:25:08,015-Speed 3180.17 samples/sec   Loss 1.2434   LearningRate 0.0184   Epoch: 11   Global Step: 190570   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:25:11,130-Speed 3287.69 samples/sec   Loss 1.2900   LearningRate 0.0184   Epoch: 11   Global Step: 190580   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:25:14,348-Speed 3183.41 samples/sec   Loss 1.2645   LearningRate 0.0184   Epoch: 11   Global Step: 190590   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:25:17,421-Speed 3332.94 samples/sec   Loss 1.2685   LearningRate 0.0184   Epoch: 11   Global Step: 190600   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:25:20,493-Speed 3334.19 samples/sec   Loss 1.3043   LearningRate 0.0184   Epoch: 11   Global Step: 190610   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:25:23,563-Speed 3335.69 samples/sec   Loss 1.3264   LearningRate 0.0184   Epoch: 11   Global Step: 190620   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:25:26,652-Speed 3315.55 samples/sec   Loss 1.3290   LearningRate 0.0184   Epoch: 11   Global Step: 190630   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:25:29,738-Speed 3319.63 samples/sec   Loss 1.2984   LearningRate 0.0184   Epoch: 11   Global Step: 190640   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-11 19:25:32,810-Speed 3334.17 samples/sec   Loss 1.2791   LearningRate 0.0184   Epoch: 11   Global Step: 190650   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-11 19:25:35,893-Speed 3322.31 samples/sec   Loss 1.2952   LearningRate 0.0184   Epoch: 11   Global Step: 190660   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-11 19:25:38,969-Speed 3328.83 samples/sec   Loss 1.3153   LearningRate 0.0184   Epoch: 11   Global Step: 190670   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-11 19:25:42,046-Speed 3328.97 samples/sec   Loss 1.2970   LearningRate 0.0184   Epoch: 11   Global Step: 190680   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-11 19:25:45,187-Speed 3261.21 samples/sec   Loss 1.3174   LearningRate 0.0184   Epoch: 11   Global Step: 190690   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-11 19:25:48,315-Speed 3274.83 samples/sec   Loss 1.3122   LearningRate 0.0184   Epoch: 11   Global Step: 190700   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-11 19:25:51,394-Speed 3325.86 samples/sec   Loss 1.3328   LearningRate 0.0184   Epoch: 11   Global Step: 190710   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:25:54,491-Speed 3306.55 samples/sec   Loss 1.2871   LearningRate 0.0184   Epoch: 11   Global Step: 190720   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:25:57,643-Speed 3250.47 samples/sec   Loss 1.3273   LearningRate 0.0184   Epoch: 11   Global Step: 190730   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:26:00,722-Speed 3326.76 samples/sec   Loss 1.3530   LearningRate 0.0184   Epoch: 11   Global Step: 190740   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:26:03,805-Speed 3321.95 samples/sec   Loss 1.2647   LearningRate 0.0184   Epoch: 11   Global Step: 190750   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:26:06,875-Speed 3335.93 samples/sec   Loss 1.3483   LearningRate 0.0184   Epoch: 11   Global Step: 190760   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:26:09,950-Speed 3330.68 samples/sec   Loss 1.3079   LearningRate 0.0184   Epoch: 11   Global Step: 190770   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:26:13,020-Speed 3336.75 samples/sec   Loss 1.2717   LearningRate 0.0184   Epoch: 11   Global Step: 190780   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:26:16,102-Speed 3322.48 samples/sec   Loss 1.2824   LearningRate 0.0184   Epoch: 11   Global Step: 190790   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:26:19,174-Speed 3334.29 samples/sec   Loss 1.3078   LearningRate 0.0184   Epoch: 11   Global Step: 190800   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:26:22,266-Speed 3313.39 samples/sec   Loss 1.3344   LearningRate 0.0184   Epoch: 11   Global Step: 190810   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-11 19:26:25,345-Speed 3326.47 samples/sec   Loss 1.3170   LearningRate 0.0184   Epoch: 11   Global Step: 190820   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-11 19:26:28,427-Speed 3322.74 samples/sec   Loss 1.2702   LearningRate 0.0183   Epoch: 11   Global Step: 190830   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-11 19:26:31,502-Speed 3330.80 samples/sec   Loss 1.3431   LearningRate 0.0183   Epoch: 11   Global Step: 190840   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-11 19:26:34,577-Speed 3330.62 samples/sec   Loss 1.2718   LearningRate 0.0183   Epoch: 11   Global Step: 190850   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-11 19:26:37,659-Speed 3323.76 samples/sec   Loss 1.3142   LearningRate 0.0183   Epoch: 11   Global Step: 190860   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-11 19:26:40,734-Speed 3330.77 samples/sec   Loss 1.3289   LearningRate 0.0183   Epoch: 11   Global Step: 190870   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-11 19:26:43,813-Speed 3326.63 samples/sec   Loss 1.2917   LearningRate 0.0183   Epoch: 11   Global Step: 190880   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-11 19:26:46,894-Speed 3324.48 samples/sec   Loss 1.2888   LearningRate 0.0183   Epoch: 11   Global Step: 190890   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-11 19:26:49,982-Speed 3317.30 samples/sec   Loss 1.2729   LearningRate 0.0183   Epoch: 11   Global Step: 190900   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-11 19:26:53,066-Speed 3320.75 samples/sec   Loss 1.3029   LearningRate 0.0183   Epoch: 11   Global Step: 190910   Fp16 Grad Scale: 262144   Required: 15 hours
Training: 2022-04-11 19:26:56,169-Speed 3300.86 samples/sec   Loss 1.3357   LearningRate 0.0183   Epoch: 11   Global Step: 190920   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-11 19:26:59,279-Speed 3293.15 samples/sec   Loss 1.2982   LearningRate 0.0183   Epoch: 11   Global Step: 190930   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-11 19:27:02,351-Speed 3333.66 samples/sec   Loss 1.2893   LearningRate 0.0183   Epoch: 11   Global Step: 190940   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-11 19:27:05,424-Speed 3333.31 samples/sec   Loss 1.3192   LearningRate 0.0183   Epoch: 11   Global Step: 190950   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-11 19:27:08,511-Speed 3317.24 samples/sec   Loss 1.2991   LearningRate 0.0183   Epoch: 11   Global Step: 190960   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-11 19:27:11,603-Speed 3313.15 samples/sec   Loss 1.3417   LearningRate 0.0183   Epoch: 11   Global Step: 190970   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-11 19:27:14,691-Speed 3316.66 samples/sec   Loss 1.3026   LearningRate 0.0183   Epoch: 11   Global Step: 190980   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-11 19:27:17,766-Speed 3331.50 samples/sec   Loss 1.2962   LearningRate 0.0183   Epoch: 11   Global Step: 190990   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:27:20,834-Speed 3338.36 samples/sec   Loss 1.2868   LearningRate 0.0183   Epoch: 11   Global Step: 191000   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:27:24,005-Speed 3230.23 samples/sec   Loss 1.2628   LearningRate 0.0183   Epoch: 11   Global Step: 191010   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:27:27,095-Speed 3314.07 samples/sec   Loss 1.3145   LearningRate 0.0183   Epoch: 11   Global Step: 191020   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:27:30,196-Speed 3303.01 samples/sec   Loss 1.3168   LearningRate 0.0183   Epoch: 11   Global Step: 191030   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:27:33,308-Speed 3290.94 samples/sec   Loss 1.2923   LearningRate 0.0183   Epoch: 11   Global Step: 191040   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:27:36,381-Speed 3333.36 samples/sec   Loss 1.3294   LearningRate 0.0183   Epoch: 11   Global Step: 191050   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:27:39,472-Speed 3313.95 samples/sec   Loss 1.2901   LearningRate 0.0183   Epoch: 11   Global Step: 191060   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:27:42,553-Speed 3324.17 samples/sec   Loss 1.2634   LearningRate 0.0183   Epoch: 11   Global Step: 191070   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:27:45,646-Speed 3311.57 samples/sec   Loss 1.3045   LearningRate 0.0183   Epoch: 11   Global Step: 191080   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:27:48,728-Speed 3323.91 samples/sec   Loss 1.2995   LearningRate 0.0183   Epoch: 11   Global Step: 191090   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-11 19:27:51,801-Speed 3332.39 samples/sec   Loss 1.3311   LearningRate 0.0183   Epoch: 11   Global Step: 191100   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-11 19:27:54,877-Speed 3330.18 samples/sec   Loss 1.3151   LearningRate 0.0183   Epoch: 11   Global Step: 191110   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-11 19:27:57,986-Speed 3293.59 samples/sec   Loss 1.3134   LearningRate 0.0183   Epoch: 11   Global Step: 191120   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-11 19:28:01,118-Speed 3270.38 samples/sec   Loss 1.3122   LearningRate 0.0183   Epoch: 11   Global Step: 191130   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-11 19:28:04,273-Speed 3246.51 samples/sec   Loss 1.3152   LearningRate 0.0183   Epoch: 11   Global Step: 191140   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-11 19:28:07,358-Speed 3320.03 samples/sec   Loss 1.3124   LearningRate 0.0183   Epoch: 11   Global Step: 191150   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-11 19:28:10,464-Speed 3298.50 samples/sec   Loss 1.3033   LearningRate 0.0183   Epoch: 11   Global Step: 191160   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-11 19:28:13,580-Speed 3286.42 samples/sec   Loss 1.3142   LearningRate 0.0183   Epoch: 11   Global Step: 191170   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-11 19:28:16,726-Speed 3255.72 samples/sec   Loss 1.3213   LearningRate 0.0183   Epoch: 11   Global Step: 191180   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-11 19:28:19,803-Speed 3328.56 samples/sec   Loss 1.2537   LearningRate 0.0183   Epoch: 11   Global Step: 191190   Fp16 Grad Scale: 262144   Required: 15 hours
Training: 2022-04-11 19:28:22,865-Speed 3345.57 samples/sec   Loss 1.2811   LearningRate 0.0183   Epoch: 11   Global Step: 191200   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-11 19:28:25,939-Speed 3331.32 samples/sec   Loss 1.3080   LearningRate 0.0183   Epoch: 11   Global Step: 191210   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-11 19:28:29,037-Speed 3305.73 samples/sec   Loss 1.2132   LearningRate 0.0182   Epoch: 11   Global Step: 191220   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-11 19:28:32,124-Speed 3317.96 samples/sec   Loss 1.2450   LearningRate 0.0182   Epoch: 11   Global Step: 191230   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-11 19:28:35,224-Speed 3304.69 samples/sec   Loss 1.3253   LearningRate 0.0182   Epoch: 11   Global Step: 191240   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-11 19:28:38,367-Speed 3258.20 samples/sec   Loss 1.3099   LearningRate 0.0182   Epoch: 11   Global Step: 191250   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-11 19:28:41,508-Speed 3261.06 samples/sec   Loss 1.2920   LearningRate 0.0182   Epoch: 11   Global Step: 191260   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-11 19:28:44,586-Speed 3328.15 samples/sec   Loss 1.2898   LearningRate 0.0182   Epoch: 11   Global Step: 191270   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-11 19:28:47,678-Speed 3312.45 samples/sec   Loss 1.2979   LearningRate 0.0182   Epoch: 11   Global Step: 191280   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-11 19:28:50,756-Speed 3327.33 samples/sec   Loss 1.2860   LearningRate 0.0182   Epoch: 11   Global Step: 191290   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-11 19:28:53,828-Speed 3333.88 samples/sec   Loss 1.3029   LearningRate 0.0182   Epoch: 11   Global Step: 191300   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-11 19:28:57,003-Speed 3226.40 samples/sec   Loss 1.2863   LearningRate 0.0182   Epoch: 11   Global Step: 191310   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-11 19:29:00,133-Speed 3272.04 samples/sec   Loss 1.2802   LearningRate 0.0182   Epoch: 11   Global Step: 191320   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:29:03,224-Speed 3312.94 samples/sec   Loss 1.2997   LearningRate 0.0182   Epoch: 11   Global Step: 191330   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:29:06,316-Speed 3312.88 samples/sec   Loss 1.2828   LearningRate 0.0182   Epoch: 11   Global Step: 191340   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:29:09,443-Speed 3275.57 samples/sec   Loss 1.3150   LearningRate 0.0182   Epoch: 11   Global Step: 191350   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:29:12,637-Speed 3206.70 samples/sec   Loss 1.2846   LearningRate 0.0182   Epoch: 11   Global Step: 191360   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:29:15,805-Speed 3233.48 samples/sec   Loss 1.3455   LearningRate 0.0182   Epoch: 11   Global Step: 191370   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:29:18,892-Speed 3317.72 samples/sec   Loss 1.3222   LearningRate 0.0182   Epoch: 11   Global Step: 191380   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:29:22,041-Speed 3253.08 samples/sec   Loss 1.3338   LearningRate 0.0182   Epoch: 11   Global Step: 191390   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:29:25,125-Speed 3320.60 samples/sec   Loss 1.3286   LearningRate 0.0182   Epoch: 11   Global Step: 191400   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:29:28,195-Speed 3336.42 samples/sec   Loss 1.3216   LearningRate 0.0182   Epoch: 11   Global Step: 191410   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:29:31,273-Speed 3326.98 samples/sec   Loss 1.3212   LearningRate 0.0182   Epoch: 11   Global Step: 191420   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-11 19:29:34,347-Speed 3332.41 samples/sec   Loss 1.3017   LearningRate 0.0182   Epoch: 11   Global Step: 191430   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-11 19:29:37,432-Speed 3319.81 samples/sec   Loss 1.3501   LearningRate 0.0182   Epoch: 11   Global Step: 191440   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-11 19:29:40,505-Speed 3333.81 samples/sec   Loss 1.3517   LearningRate 0.0182   Epoch: 11   Global Step: 191450   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-11 19:29:43,599-Speed 3310.14 samples/sec   Loss 1.2884   LearningRate 0.0182   Epoch: 11   Global Step: 191460   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:29:46,675-Speed 3329.27 samples/sec   Loss 1.2634   LearningRate 0.0182   Epoch: 11   Global Step: 191470   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:29:49,745-Speed 3336.88 samples/sec   Loss 1.3363   LearningRate 0.0182   Epoch: 11   Global Step: 191480   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:29:52,836-Speed 3313.12 samples/sec   Loss 1.3235   LearningRate 0.0182   Epoch: 11   Global Step: 191490   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:29:55,922-Speed 3318.97 samples/sec   Loss 1.3049   LearningRate 0.0182   Epoch: 11   Global Step: 191500   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:29:59,021-Speed 3305.13 samples/sec   Loss 1.3431   LearningRate 0.0182   Epoch: 11   Global Step: 191510   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:30:02,139-Speed 3285.02 samples/sec   Loss 1.2953   LearningRate 0.0182   Epoch: 11   Global Step: 191520   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:30:05,251-Speed 3291.71 samples/sec   Loss 1.3309   LearningRate 0.0182   Epoch: 11   Global Step: 191530   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:30:08,326-Speed 3330.17 samples/sec   Loss 1.2898   LearningRate 0.0182   Epoch: 11   Global Step: 191540   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:30:11,407-Speed 3325.01 samples/sec   Loss 1.3322   LearningRate 0.0182   Epoch: 11   Global Step: 191550   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:30:14,499-Speed 3312.62 samples/sec   Loss 1.3089   LearningRate 0.0182   Epoch: 11   Global Step: 191560   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-11 19:30:17,569-Speed 3335.34 samples/sec   Loss 1.3113   LearningRate 0.0182   Epoch: 11   Global Step: 191570   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:30:20,645-Speed 3330.27 samples/sec   Loss 1.3496   LearningRate 0.0182   Epoch: 11   Global Step: 191580   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:30:23,720-Speed 3330.26 samples/sec   Loss 1.3104   LearningRate 0.0182   Epoch: 11   Global Step: 191590   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:30:26,807-Speed 3318.79 samples/sec   Loss 1.3132   LearningRate 0.0182   Epoch: 11   Global Step: 191600   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:30:29,882-Speed 3330.85 samples/sec   Loss 1.2778   LearningRate 0.0181   Epoch: 11   Global Step: 191610   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:30:32,955-Speed 3333.18 samples/sec   Loss 1.2718   LearningRate 0.0181   Epoch: 11   Global Step: 191620   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:30:36,115-Speed 3241.47 samples/sec   Loss 1.2879   LearningRate 0.0181   Epoch: 11   Global Step: 191630   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:30:39,195-Speed 3325.00 samples/sec   Loss 1.3007   LearningRate 0.0181   Epoch: 11   Global Step: 191640   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:30:42,291-Speed 3308.26 samples/sec   Loss 1.3455   LearningRate 0.0181   Epoch: 11   Global Step: 191650   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:30:45,513-Speed 3178.79 samples/sec   Loss 1.2640   LearningRate 0.0181   Epoch: 11   Global Step: 191660   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:30:48,599-Speed 3319.00 samples/sec   Loss 1.3289   LearningRate 0.0181   Epoch: 11   Global Step: 191670   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-11 19:30:51,718-Speed 3284.22 samples/sec   Loss 1.3056   LearningRate 0.0181   Epoch: 11   Global Step: 191680   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-11 19:30:54,898-Speed 3220.83 samples/sec   Loss 1.2560   LearningRate 0.0181   Epoch: 11   Global Step: 191690   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-11 19:30:57,998-Speed 3304.51 samples/sec   Loss 1.3347   LearningRate 0.0181   Epoch: 11   Global Step: 191700   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-11 19:31:01,116-Speed 3284.10 samples/sec   Loss 1.2906   LearningRate 0.0181   Epoch: 11   Global Step: 191710   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-11 19:31:04,184-Speed 3338.30 samples/sec   Loss 1.3114   LearningRate 0.0181   Epoch: 11   Global Step: 191720   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:31:07,282-Speed 3306.38 samples/sec   Loss 1.2853   LearningRate 0.0181   Epoch: 11   Global Step: 191730   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:31:10,373-Speed 3313.74 samples/sec   Loss 1.3356   LearningRate 0.0181   Epoch: 11   Global Step: 191740   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:31:13,585-Speed 3188.51 samples/sec   Loss 1.2706   LearningRate 0.0181   Epoch: 11   Global Step: 191750   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:31:16,682-Speed 3306.99 samples/sec   Loss 1.3318   LearningRate 0.0181   Epoch: 11   Global Step: 191760   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:31:19,788-Speed 3297.60 samples/sec   Loss 1.3350   LearningRate 0.0181   Epoch: 11   Global Step: 191770   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:31:23,069-Speed 3121.70 samples/sec   Loss 1.3189   LearningRate 0.0181   Epoch: 11   Global Step: 191780   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:31:26,270-Speed 3199.77 samples/sec   Loss 1.3143   LearningRate 0.0181   Epoch: 11   Global Step: 191790   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:31:29,345-Speed 3331.65 samples/sec   Loss 1.2869   LearningRate 0.0181   Epoch: 11   Global Step: 191800   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:31:32,428-Speed 3322.28 samples/sec   Loss 1.3279   LearningRate 0.0181   Epoch: 11   Global Step: 191810   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:31:35,509-Speed 3323.37 samples/sec   Loss 1.3212   LearningRate 0.0181   Epoch: 11   Global Step: 191820   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-11 19:31:38,601-Speed 3312.76 samples/sec   Loss 1.2515   LearningRate 0.0181   Epoch: 11   Global Step: 191830   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-11 19:31:41,683-Speed 3323.71 samples/sec   Loss 1.3614   LearningRate 0.0181   Epoch: 11   Global Step: 191840   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-11 19:31:44,786-Speed 3300.67 samples/sec   Loss 1.3032   LearningRate 0.0181   Epoch: 11   Global Step: 191850   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-11 19:31:47,863-Speed 3328.83 samples/sec   Loss 1.3851   LearningRate 0.0181   Epoch: 11   Global Step: 191860   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-11 19:31:50,945-Speed 3323.27 samples/sec   Loss 1.2671   LearningRate 0.0181   Epoch: 11   Global Step: 191870   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-11 19:31:54,061-Speed 3286.50 samples/sec   Loss 1.3125   LearningRate 0.0181   Epoch: 11   Global Step: 191880   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-11 19:31:57,129-Speed 3338.39 samples/sec   Loss 1.3038   LearningRate 0.0181   Epoch: 11   Global Step: 191890   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:32:00,207-Speed 3328.44 samples/sec   Loss 1.2966   LearningRate 0.0181   Epoch: 11   Global Step: 191900   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:32:03,330-Speed 3278.91 samples/sec   Loss 1.2689   LearningRate 0.0181   Epoch: 11   Global Step: 191910   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:32:06,407-Speed 3328.79 samples/sec   Loss 1.3036   LearningRate 0.0181   Epoch: 11   Global Step: 191920   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:32:09,497-Speed 3314.97 samples/sec   Loss 1.3054   LearningRate 0.0181   Epoch: 11   Global Step: 191930   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:32:12,579-Speed 3322.98 samples/sec   Loss 1.3129   LearningRate 0.0181   Epoch: 11   Global Step: 191940   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:32:15,743-Speed 3236.81 samples/sec   Loss 1.3839   LearningRate 0.0181   Epoch: 11   Global Step: 191950   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:32:18,846-Speed 3301.39 samples/sec   Loss 1.3384   LearningRate 0.0181   Epoch: 11   Global Step: 191960   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:32:21,923-Speed 3328.46 samples/sec   Loss 1.3408   LearningRate 0.0181   Epoch: 11   Global Step: 191970   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:32:25,005-Speed 3323.73 samples/sec   Loss 1.3298   LearningRate 0.0181   Epoch: 11   Global Step: 191980   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:32:28,095-Speed 3314.45 samples/sec   Loss 1.2756   LearningRate 0.0181   Epoch: 11   Global Step: 191990   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-11 19:32:31,186-Speed 3313.94 samples/sec   Loss 1.3220   LearningRate 0.0180   Epoch: 11   Global Step: 192000   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-11 19:33:14,809-[lfw][192000]XNorm: 22.004050
Training: 2022-04-11 19:33:14,810-[lfw][192000]Accuracy-Flip: 0.99817+-0.00273
Training: 2022-04-11 19:33:14,810-[lfw][192000]Accuracy-Highest: 0.99817
Training: 2022-04-11 19:34:05,513-[cfp_fp][192000]XNorm: 22.239280
Training: 2022-04-11 19:34:05,514-[cfp_fp][192000]Accuracy-Flip: 0.98829+-0.00534
Training: 2022-04-11 19:34:05,514-[cfp_fp][192000]Accuracy-Highest: 0.98971
Training: 2022-04-11 19:34:49,092-[agedb_30][192000]XNorm: 22.984833
Training: 2022-04-11 19:34:49,092-[agedb_30][192000]Accuracy-Flip: 0.98350+-0.00660
Training: 2022-04-11 19:34:49,093-[agedb_30][192000]Accuracy-Highest: 0.98500
Training: 2022-04-11 19:34:52,163-Speed 72.64 samples/sec   Loss 1.3528   LearningRate 0.0180   Epoch: 11   Global Step: 192010   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-11 19:34:55,308-Speed 3255.94 samples/sec   Loss 1.3111   LearningRate 0.0180   Epoch: 11   Global Step: 192020   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:34:58,413-Speed 3298.71 samples/sec   Loss 1.3172   LearningRate 0.0180   Epoch: 11   Global Step: 192030   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:35:01,504-Speed 3313.83 samples/sec   Loss 1.2859   LearningRate 0.0180   Epoch: 11   Global Step: 192040   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:35:04,578-Speed 3331.88 samples/sec   Loss 1.2776   LearningRate 0.0180   Epoch: 11   Global Step: 192050   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:35:07,645-Speed 3339.86 samples/sec   Loss 1.3021   LearningRate 0.0180   Epoch: 11   Global Step: 192060   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:35:10,721-Speed 3329.88 samples/sec   Loss 1.2911   LearningRate 0.0180   Epoch: 11   Global Step: 192070   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:35:13,788-Speed 3339.68 samples/sec   Loss 1.3210   LearningRate 0.0180   Epoch: 11   Global Step: 192080   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:35:16,881-Speed 3310.95 samples/sec   Loss 1.3011   LearningRate 0.0180   Epoch: 11   Global Step: 192090   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:35:19,948-Speed 3339.82 samples/sec   Loss 1.3479   LearningRate 0.0180   Epoch: 11   Global Step: 192100   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:35:23,018-Speed 3335.85 samples/sec   Loss 1.2759   LearningRate 0.0180   Epoch: 11   Global Step: 192110   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:35:26,104-Speed 3319.16 samples/sec   Loss 1.3326   LearningRate 0.0180   Epoch: 11   Global Step: 192120   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-11 19:35:29,250-Speed 3255.47 samples/sec   Loss 1.2956   LearningRate 0.0180   Epoch: 11   Global Step: 192130   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-11 19:35:32,325-Speed 3330.72 samples/sec   Loss 1.3767   LearningRate 0.0180   Epoch: 11   Global Step: 192140   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-11 19:35:35,425-Speed 3303.96 samples/sec   Loss 1.3460   LearningRate 0.0180   Epoch: 11   Global Step: 192150   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-11 19:35:38,489-Speed 3343.50 samples/sec   Loss 1.3227   LearningRate 0.0180   Epoch: 11   Global Step: 192160   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:35:41,571-Speed 3323.22 samples/sec   Loss 1.3319   LearningRate 0.0180   Epoch: 11   Global Step: 192170   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:35:44,639-Speed 3337.92 samples/sec   Loss 1.3670   LearningRate 0.0180   Epoch: 11   Global Step: 192180   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:35:47,716-Speed 3329.18 samples/sec   Loss 1.3726   LearningRate 0.0180   Epoch: 11   Global Step: 192190   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:35:50,824-Speed 3295.41 samples/sec   Loss 1.3085   LearningRate 0.0180   Epoch: 11   Global Step: 192200   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:35:53,945-Speed 3281.07 samples/sec   Loss 1.3517   LearningRate 0.0180   Epoch: 11   Global Step: 192210   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:35:57,052-Speed 3297.15 samples/sec   Loss 1.3225   LearningRate 0.0180   Epoch: 11   Global Step: 192220   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:36:00,135-Speed 3322.33 samples/sec   Loss 1.3148   LearningRate 0.0180   Epoch: 11   Global Step: 192230   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:36:03,211-Speed 3330.58 samples/sec   Loss 1.3614   LearningRate 0.0180   Epoch: 11   Global Step: 192240   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:36:06,362-Speed 3250.35 samples/sec   Loss 1.3440   LearningRate 0.0180   Epoch: 11   Global Step: 192250   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:36:09,528-Speed 3234.45 samples/sec   Loss 1.3340   LearningRate 0.0180   Epoch: 11   Global Step: 192260   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-11 19:36:12,742-Speed 3187.02 samples/sec   Loss 1.3488   LearningRate 0.0180   Epoch: 11   Global Step: 192270   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-11 19:36:15,809-Speed 3339.00 samples/sec   Loss 1.3272   LearningRate 0.0180   Epoch: 11   Global Step: 192280   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-11 19:36:18,910-Speed 3303.33 samples/sec   Loss 1.3626   LearningRate 0.0180   Epoch: 11   Global Step: 192290   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-11 19:36:21,985-Speed 3330.83 samples/sec   Loss 1.3643   LearningRate 0.0180   Epoch: 11   Global Step: 192300   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:36:25,068-Speed 3321.98 samples/sec   Loss 1.2943   LearningRate 0.0180   Epoch: 11   Global Step: 192310   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:36:28,142-Speed 3332.44 samples/sec   Loss 1.3181   LearningRate 0.0180   Epoch: 11   Global Step: 192320   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:36:31,213-Speed 3335.16 samples/sec   Loss 1.3669   LearningRate 0.0180   Epoch: 11   Global Step: 192330   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:36:34,284-Speed 3334.74 samples/sec   Loss 1.3655   LearningRate 0.0180   Epoch: 11   Global Step: 192340   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:36:37,465-Speed 3219.70 samples/sec   Loss 1.2817   LearningRate 0.0180   Epoch: 11   Global Step: 192350   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:36:40,558-Speed 3311.43 samples/sec   Loss 1.2615   LearningRate 0.0180   Epoch: 11   Global Step: 192360   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:36:43,684-Speed 3277.21 samples/sec   Loss 1.3224   LearningRate 0.0180   Epoch: 11   Global Step: 192370   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:36:46,757-Speed 3332.53 samples/sec   Loss 1.3603   LearningRate 0.0180   Epoch: 11   Global Step: 192380   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:36:49,837-Speed 3324.85 samples/sec   Loss 1.3427   LearningRate 0.0179   Epoch: 11   Global Step: 192390   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:36:52,927-Speed 3315.17 samples/sec   Loss 1.3067   LearningRate 0.0179   Epoch: 11   Global Step: 192400   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-11 19:36:56,016-Speed 3316.40 samples/sec   Loss 1.3408   LearningRate 0.0179   Epoch: 11   Global Step: 192410   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-11 19:36:59,088-Speed 3333.56 samples/sec   Loss 1.3033   LearningRate 0.0179   Epoch: 11   Global Step: 192420   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-11 19:37:02,173-Speed 3320.54 samples/sec   Loss 1.2929   LearningRate 0.0179   Epoch: 11   Global Step: 192430   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-11 19:37:05,244-Speed 3335.48 samples/sec   Loss 1.3026   LearningRate 0.0179   Epoch: 11   Global Step: 192440   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-11 19:37:08,314-Speed 3335.75 samples/sec   Loss 1.2875   LearningRate 0.0179   Epoch: 11   Global Step: 192450   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-11 19:37:11,406-Speed 3312.18 samples/sec   Loss 1.3145   LearningRate 0.0179   Epoch: 11   Global Step: 192460   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:37:14,555-Speed 3252.54 samples/sec   Loss 1.2990   LearningRate 0.0179   Epoch: 11   Global Step: 192470   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:37:17,687-Speed 3270.64 samples/sec   Loss 1.3175   LearningRate 0.0179   Epoch: 11   Global Step: 192480   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:37:20,761-Speed 3331.89 samples/sec   Loss 1.3092   LearningRate 0.0179   Epoch: 11   Global Step: 192490   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:37:23,829-Speed 3338.93 samples/sec   Loss 1.3458   LearningRate 0.0179   Epoch: 11   Global Step: 192500   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:37:26,911-Speed 3323.39 samples/sec   Loss 1.3094   LearningRate 0.0179   Epoch: 11   Global Step: 192510   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:37:29,983-Speed 3333.28 samples/sec   Loss 1.3460   LearningRate 0.0179   Epoch: 11   Global Step: 192520   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:37:33,052-Speed 3338.38 samples/sec   Loss 1.3497   LearningRate 0.0179   Epoch: 11   Global Step: 192530   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:37:36,128-Speed 3329.32 samples/sec   Loss 1.3368   LearningRate 0.0179   Epoch: 11   Global Step: 192540   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:37:39,197-Speed 3336.88 samples/sec   Loss 1.3705   LearningRate 0.0179   Epoch: 11   Global Step: 192550   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:37:42,280-Speed 3322.16 samples/sec   Loss 1.3132   LearningRate 0.0179   Epoch: 11   Global Step: 192560   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-11 19:37:45,430-Speed 3251.75 samples/sec   Loss 1.3093   LearningRate 0.0179   Epoch: 11   Global Step: 192570   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-11 19:37:48,519-Speed 3316.76 samples/sec   Loss 1.2595   LearningRate 0.0179   Epoch: 11   Global Step: 192580   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-11 19:37:51,661-Speed 3258.83 samples/sec   Loss 1.3208   LearningRate 0.0179   Epoch: 11   Global Step: 192590   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-11 19:37:54,734-Speed 3333.34 samples/sec   Loss 1.3669   LearningRate 0.0179   Epoch: 11   Global Step: 192600   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-11 19:37:57,802-Speed 3338.96 samples/sec   Loss 1.3131   LearningRate 0.0179   Epoch: 11   Global Step: 192610   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-11 19:38:00,879-Speed 3328.33 samples/sec   Loss 1.2682   LearningRate 0.0179   Epoch: 11   Global Step: 192620   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-11 19:38:03,995-Speed 3287.35 samples/sec   Loss 1.3024   LearningRate 0.0179   Epoch: 11   Global Step: 192630   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:38:07,082-Speed 3317.78 samples/sec   Loss 1.3014   LearningRate 0.0179   Epoch: 11   Global Step: 192640   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:38:10,226-Speed 3257.52 samples/sec   Loss 1.3359   LearningRate 0.0179   Epoch: 11   Global Step: 192650   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:38:13,347-Speed 3281.61 samples/sec   Loss 1.3443   LearningRate 0.0179   Epoch: 11   Global Step: 192660   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:38:16,480-Speed 3269.66 samples/sec   Loss 1.3658   LearningRate 0.0179   Epoch: 11   Global Step: 192670   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:38:19,548-Speed 3337.87 samples/sec   Loss 1.3082   LearningRate 0.0179   Epoch: 11   Global Step: 192680   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:38:22,622-Speed 3332.53 samples/sec   Loss 1.3138   LearningRate 0.0179   Epoch: 11   Global Step: 192690   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:38:25,697-Speed 3330.55 samples/sec   Loss 1.3784   LearningRate 0.0179   Epoch: 11   Global Step: 192700   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:38:28,855-Speed 3243.31 samples/sec   Loss 1.3604   LearningRate 0.0179   Epoch: 11   Global Step: 192710   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:38:31,929-Speed 3331.61 samples/sec   Loss 1.3107   LearningRate 0.0179   Epoch: 11   Global Step: 192720   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:38:35,087-Speed 3243.25 samples/sec   Loss 1.3362   LearningRate 0.0179   Epoch: 11   Global Step: 192730   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:38:38,174-Speed 3317.80 samples/sec   Loss 1.3350   LearningRate 0.0179   Epoch: 11   Global Step: 192740   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:38:41,262-Speed 3317.46 samples/sec   Loss 1.3298   LearningRate 0.0179   Epoch: 11   Global Step: 192750   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:38:44,360-Speed 3306.72 samples/sec   Loss 1.3232   LearningRate 0.0179   Epoch: 11   Global Step: 192760   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:38:47,430-Speed 3335.64 samples/sec   Loss 1.2886   LearningRate 0.0179   Epoch: 11   Global Step: 192770   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:38:50,506-Speed 3329.16 samples/sec   Loss 1.3605   LearningRate 0.0179   Epoch: 11   Global Step: 192780   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:38:53,572-Speed 3340.56 samples/sec   Loss 1.3022   LearningRate 0.0178   Epoch: 11   Global Step: 192790   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:38:56,641-Speed 3337.79 samples/sec   Loss 1.2827   LearningRate 0.0178   Epoch: 11   Global Step: 192800   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:38:59,715-Speed 3331.90 samples/sec   Loss 1.2810   LearningRate 0.0178   Epoch: 11   Global Step: 192810   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:39:02,799-Speed 3321.73 samples/sec   Loss 1.3444   LearningRate 0.0178   Epoch: 11   Global Step: 192820   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:39:05,894-Speed 3309.09 samples/sec   Loss 1.3369   LearningRate 0.0178   Epoch: 11   Global Step: 192830   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-11 19:39:08,982-Speed 3317.27 samples/sec   Loss 1.3255   LearningRate 0.0178   Epoch: 11   Global Step: 192840   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-11 19:39:12,054-Speed 3334.02 samples/sec   Loss 1.3137   LearningRate 0.0178   Epoch: 11   Global Step: 192850   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-11 19:39:15,141-Speed 3317.48 samples/sec   Loss 1.3363   LearningRate 0.0178   Epoch: 11   Global Step: 192860   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-11 19:39:18,216-Speed 3330.70 samples/sec   Loss 1.3665   LearningRate 0.0178   Epoch: 11   Global Step: 192870   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-11 19:39:21,269-Speed 3355.38 samples/sec   Loss 1.3478   LearningRate 0.0178   Epoch: 11   Global Step: 192880   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-11 19:39:24,341-Speed 3333.58 samples/sec   Loss 1.3508   LearningRate 0.0178   Epoch: 11   Global Step: 192890   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-11 19:39:27,435-Speed 3310.07 samples/sec   Loss 1.3689   LearningRate 0.0178   Epoch: 11   Global Step: 192900   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-11 19:39:30,585-Speed 3251.13 samples/sec   Loss 1.3683   LearningRate 0.0178   Epoch: 11   Global Step: 192910   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-11 19:39:33,671-Speed 3320.32 samples/sec   Loss 1.3211   LearningRate 0.0178   Epoch: 11   Global Step: 192920   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-11 19:39:36,742-Speed 3334.31 samples/sec   Loss 1.3121   LearningRate 0.0178   Epoch: 11   Global Step: 192930   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-11 19:39:39,840-Speed 3306.96 samples/sec   Loss 1.3678   LearningRate 0.0178   Epoch: 11   Global Step: 192940   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-11 19:39:42,966-Speed 3275.92 samples/sec   Loss 1.3382   LearningRate 0.0178   Epoch: 11   Global Step: 192950   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-11 19:39:46,055-Speed 3315.45 samples/sec   Loss 1.3752   LearningRate 0.0178   Epoch: 11   Global Step: 192960   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-11 19:39:49,124-Speed 3337.02 samples/sec   Loss 1.2796   LearningRate 0.0178   Epoch: 11   Global Step: 192970   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-11 19:39:52,196-Speed 3334.29 samples/sec   Loss 1.2711   LearningRate 0.0178   Epoch: 11   Global Step: 192980   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:39:55,264-Speed 3338.60 samples/sec   Loss 1.3069   LearningRate 0.0178   Epoch: 11   Global Step: 192990   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:39:58,334-Speed 3336.27 samples/sec   Loss 1.3347   LearningRate 0.0178   Epoch: 11   Global Step: 193000   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:40:01,444-Speed 3293.96 samples/sec   Loss 1.3826   LearningRate 0.0178   Epoch: 11   Global Step: 193010   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:40:04,549-Speed 3298.59 samples/sec   Loss 1.2903   LearningRate 0.0178   Epoch: 11   Global Step: 193020   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:40:07,628-Speed 3326.04 samples/sec   Loss 1.3385   LearningRate 0.0178   Epoch: 11   Global Step: 193030   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:40:10,703-Speed 3331.56 samples/sec   Loss 1.3293   LearningRate 0.0178   Epoch: 11   Global Step: 193040   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:40:13,882-Speed 3221.11 samples/sec   Loss 1.3023   LearningRate 0.0178   Epoch: 11   Global Step: 193050   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:40:16,956-Speed 3332.23 samples/sec   Loss 1.3402   LearningRate 0.0178   Epoch: 11   Global Step: 193060   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:40:20,044-Speed 3316.29 samples/sec   Loss 1.3131   LearningRate 0.0178   Epoch: 11   Global Step: 193070   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:40:23,122-Speed 3328.43 samples/sec   Loss 1.3056   LearningRate 0.0178   Epoch: 11   Global Step: 193080   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-11 19:40:26,197-Speed 3330.17 samples/sec   Loss 1.3265   LearningRate 0.0178   Epoch: 11   Global Step: 193090   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-11 19:40:29,273-Speed 3329.83 samples/sec   Loss 1.3210   LearningRate 0.0178   Epoch: 11   Global Step: 193100   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-11 19:40:32,358-Speed 3320.93 samples/sec   Loss 1.3742   LearningRate 0.0178   Epoch: 11   Global Step: 193110   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-11 19:40:35,416-Speed 3348.91 samples/sec   Loss 1.3531   LearningRate 0.0178   Epoch: 11   Global Step: 193120   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:40:38,498-Speed 3323.45 samples/sec   Loss 1.3501   LearningRate 0.0178   Epoch: 11   Global Step: 193130   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:40:41,574-Speed 3329.14 samples/sec   Loss 1.3193   LearningRate 0.0178   Epoch: 11   Global Step: 193140   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:40:44,647-Speed 3333.07 samples/sec   Loss 1.3410   LearningRate 0.0178   Epoch: 11   Global Step: 193150   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:40:47,752-Speed 3298.43 samples/sec   Loss 1.3916   LearningRate 0.0178   Epoch: 11   Global Step: 193160   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:40:50,834-Speed 3323.69 samples/sec   Loss 1.3154   LearningRate 0.0178   Epoch: 11   Global Step: 193170   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:40:53,908-Speed 3332.31 samples/sec   Loss 1.3725   LearningRate 0.0177   Epoch: 11   Global Step: 193180   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:40:56,981-Speed 3332.75 samples/sec   Loss 1.2946   LearningRate 0.0177   Epoch: 11   Global Step: 193190   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:41:00,075-Speed 3309.66 samples/sec   Loss 1.3319   LearningRate 0.0177   Epoch: 11   Global Step: 193200   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:41:03,148-Speed 3334.77 samples/sec   Loss 1.3504   LearningRate 0.0177   Epoch: 11   Global Step: 193210   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:41:06,253-Speed 3298.36 samples/sec   Loss 1.3038   LearningRate 0.0177   Epoch: 11   Global Step: 193220   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-11 19:41:09,354-Speed 3302.50 samples/sec   Loss 1.3264   LearningRate 0.0177   Epoch: 11   Global Step: 193230   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-11 19:41:12,453-Speed 3305.61 samples/sec   Loss 1.3474   LearningRate 0.0177   Epoch: 11   Global Step: 193240   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-11 19:41:15,710-Speed 3144.04 samples/sec   Loss 1.2788   LearningRate 0.0177   Epoch: 11   Global Step: 193250   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-11 19:41:18,881-Speed 3230.21 samples/sec   Loss 1.3232   LearningRate 0.0177   Epoch: 11   Global Step: 193260   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-11 19:41:21,984-Speed 3300.69 samples/sec   Loss 1.2784   LearningRate 0.0177   Epoch: 11   Global Step: 193270   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-11 19:41:25,058-Speed 3332.56 samples/sec   Loss 1.3437   LearningRate 0.0177   Epoch: 11   Global Step: 193280   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-11 19:41:28,164-Speed 3297.68 samples/sec   Loss 1.3119   LearningRate 0.0177   Epoch: 11   Global Step: 193290   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-11 19:41:31,323-Speed 3242.17 samples/sec   Loss 1.3042   LearningRate 0.0177   Epoch: 11   Global Step: 193300   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-11 19:41:34,402-Speed 3325.63 samples/sec   Loss 1.3369   LearningRate 0.0177   Epoch: 11   Global Step: 193310   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-11 19:41:37,475-Speed 3333.77 samples/sec   Loss 1.3542   LearningRate 0.0177   Epoch: 11   Global Step: 193320   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-11 19:41:40,565-Speed 3314.38 samples/sec   Loss 1.3417   LearningRate 0.0177   Epoch: 11   Global Step: 193330   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-11 19:41:43,639-Speed 3332.08 samples/sec   Loss 1.3315   LearningRate 0.0177   Epoch: 11   Global Step: 193340   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-11 19:41:46,743-Speed 3299.36 samples/sec   Loss 1.3286   LearningRate 0.0177   Epoch: 11   Global Step: 193350   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-11 19:41:49,826-Speed 3322.89 samples/sec   Loss 1.3544   LearningRate 0.0177   Epoch: 11   Global Step: 193360   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-11 19:41:52,902-Speed 3329.40 samples/sec   Loss 1.2726   LearningRate 0.0177   Epoch: 11   Global Step: 193370   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-11 19:41:55,977-Speed 3330.87 samples/sec   Loss 1.2890   LearningRate 0.0177   Epoch: 11   Global Step: 193380   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-11 19:41:59,051-Speed 3332.21 samples/sec   Loss 1.3188   LearningRate 0.0177   Epoch: 11   Global Step: 193390   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-11 19:42:02,136-Speed 3319.79 samples/sec   Loss 1.3122   LearningRate 0.0177   Epoch: 11   Global Step: 193400   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-11 19:42:05,204-Speed 3338.17 samples/sec   Loss 1.3220   LearningRate 0.0177   Epoch: 11   Global Step: 193410   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:42:08,288-Speed 3321.49 samples/sec   Loss 1.2944   LearningRate 0.0177   Epoch: 11   Global Step: 193420   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:42:11,365-Speed 3327.98 samples/sec   Loss 1.3085   LearningRate 0.0177   Epoch: 11   Global Step: 193430   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:42:14,505-Speed 3262.12 samples/sec   Loss 1.3267   LearningRate 0.0177   Epoch: 11   Global Step: 193440   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:42:17,609-Speed 3300.13 samples/sec   Loss 1.3641   LearningRate 0.0177   Epoch: 11   Global Step: 193450   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:42:20,688-Speed 3326.84 samples/sec   Loss 1.3205   LearningRate 0.0177   Epoch: 11   Global Step: 193460   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:42:23,773-Speed 3319.00 samples/sec   Loss 1.3209   LearningRate 0.0177   Epoch: 11   Global Step: 193470   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:42:26,878-Speed 3298.62 samples/sec   Loss 1.3251   LearningRate 0.0177   Epoch: 11   Global Step: 193480   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:42:30,064-Speed 3215.12 samples/sec   Loss 1.3261   LearningRate 0.0177   Epoch: 11   Global Step: 193490   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:42:33,148-Speed 3320.74 samples/sec   Loss 1.2825   LearningRate 0.0177   Epoch: 11   Global Step: 193500   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:42:36,246-Speed 3306.78 samples/sec   Loss 1.3019   LearningRate 0.0177   Epoch: 11   Global Step: 193510   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-11 19:42:39,428-Speed 3218.10 samples/sec   Loss 1.2997   LearningRate 0.0177   Epoch: 11   Global Step: 193520   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-11 19:42:42,505-Speed 3328.80 samples/sec   Loss 1.3545   LearningRate 0.0177   Epoch: 11   Global Step: 193530   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-11 19:42:45,584-Speed 3326.49 samples/sec   Loss 1.3271   LearningRate 0.0177   Epoch: 11   Global Step: 193540   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-11 19:42:48,684-Speed 3304.92 samples/sec   Loss 1.3062   LearningRate 0.0177   Epoch: 11   Global Step: 193550   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-11 19:42:51,759-Speed 3330.71 samples/sec   Loss 1.3209   LearningRate 0.0177   Epoch: 11   Global Step: 193560   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-11 19:42:54,855-Speed 3307.32 samples/sec   Loss 1.3830   LearningRate 0.0177   Epoch: 11   Global Step: 193570   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-11 19:42:57,950-Speed 3309.87 samples/sec   Loss 1.3163   LearningRate 0.0176   Epoch: 11   Global Step: 193580   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-11 19:43:01,039-Speed 3315.50 samples/sec   Loss 1.3470   LearningRate 0.0176   Epoch: 11   Global Step: 193590   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-11 19:43:04,123-Speed 3321.05 samples/sec   Loss 1.3358   LearningRate 0.0176   Epoch: 11   Global Step: 193600   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:43:07,199-Speed 3329.53 samples/sec   Loss 1.3453   LearningRate 0.0176   Epoch: 11   Global Step: 193610   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:43:10,275-Speed 3329.95 samples/sec   Loss 1.2675   LearningRate 0.0176   Epoch: 11   Global Step: 193620   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:43:13,377-Speed 3301.98 samples/sec   Loss 1.3302   LearningRate 0.0176   Epoch: 11   Global Step: 193630   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:43:16,495-Speed 3284.93 samples/sec   Loss 1.2775   LearningRate 0.0176   Epoch: 11   Global Step: 193640   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:43:19,749-Speed 3147.44 samples/sec   Loss 1.3104   LearningRate 0.0176   Epoch: 11   Global Step: 193650   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:43:22,931-Speed 3218.83 samples/sec   Loss 1.3302   LearningRate 0.0176   Epoch: 11   Global Step: 193660   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:43:26,019-Speed 3316.68 samples/sec   Loss 1.3577   LearningRate 0.0176   Epoch: 11   Global Step: 193670   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:43:29,113-Speed 3310.59 samples/sec   Loss 1.3232   LearningRate 0.0176   Epoch: 11   Global Step: 193680   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:43:32,206-Speed 3311.91 samples/sec   Loss 1.3428   LearningRate 0.0176   Epoch: 11   Global Step: 193690   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:43:35,278-Speed 3333.20 samples/sec   Loss 1.2932   LearningRate 0.0176   Epoch: 11   Global Step: 193700   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-11 19:43:38,368-Speed 3315.15 samples/sec   Loss 1.3567   LearningRate 0.0176   Epoch: 11   Global Step: 193710   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-11 19:43:41,462-Speed 3310.59 samples/sec   Loss 1.2843   LearningRate 0.0176   Epoch: 11   Global Step: 193720   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-11 19:43:44,551-Speed 3315.46 samples/sec   Loss 1.3389   LearningRate 0.0176   Epoch: 11   Global Step: 193730   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-11 19:43:47,631-Speed 3326.22 samples/sec   Loss 1.3419   LearningRate 0.0176   Epoch: 11   Global Step: 193740   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:43:50,777-Speed 3255.49 samples/sec   Loss 1.2692   LearningRate 0.0176   Epoch: 11   Global Step: 193750   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:43:53,874-Speed 3306.68 samples/sec   Loss 1.3345   LearningRate 0.0176   Epoch: 11   Global Step: 193760   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:43:57,007-Speed 3269.03 samples/sec   Loss 1.2700   LearningRate 0.0176   Epoch: 11   Global Step: 193770   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:44:00,121-Speed 3289.55 samples/sec   Loss 1.3357   LearningRate 0.0176   Epoch: 11   Global Step: 193780   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:44:03,208-Speed 3317.42 samples/sec   Loss 1.3395   LearningRate 0.0176   Epoch: 11   Global Step: 193790   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:44:06,287-Speed 3326.44 samples/sec   Loss 1.3420   LearningRate 0.0176   Epoch: 11   Global Step: 193800   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:44:09,367-Speed 3326.32 samples/sec   Loss 1.3374   LearningRate 0.0176   Epoch: 11   Global Step: 193810   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:44:12,448-Speed 3324.46 samples/sec   Loss 1.2929   LearningRate 0.0176   Epoch: 11   Global Step: 193820   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:44:15,531-Speed 3321.41 samples/sec   Loss 1.3425   LearningRate 0.0176   Epoch: 11   Global Step: 193830   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:44:18,637-Speed 3297.87 samples/sec   Loss 1.3152   LearningRate 0.0176   Epoch: 11   Global Step: 193840   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-11 19:44:21,825-Speed 3212.67 samples/sec   Loss 1.3233   LearningRate 0.0176   Epoch: 11   Global Step: 193850   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-11 19:44:24,913-Speed 3316.47 samples/sec   Loss 1.3062   LearningRate 0.0176   Epoch: 11   Global Step: 193860   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-11 19:44:27,997-Speed 3321.30 samples/sec   Loss 1.3110   LearningRate 0.0176   Epoch: 11   Global Step: 193870   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-11 19:44:31,118-Speed 3281.86 samples/sec   Loss 1.3056   LearningRate 0.0176   Epoch: 11   Global Step: 193880   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-11 19:44:34,170-Speed 3355.68 samples/sec   Loss 1.3649   LearningRate 0.0176   Epoch: 11   Global Step: 193890   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-11 19:44:37,326-Speed 3246.06 samples/sec   Loss 1.2624   LearningRate 0.0176   Epoch: 11   Global Step: 193900   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-11 19:44:40,519-Speed 3206.98 samples/sec   Loss 1.3308   LearningRate 0.0176   Epoch: 11   Global Step: 193910   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-11 19:44:43,739-Speed 3181.38 samples/sec   Loss 1.3274   LearningRate 0.0176   Epoch: 11   Global Step: 193920   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-11 19:44:46,926-Speed 3213.79 samples/sec   Loss 1.3325   LearningRate 0.0176   Epoch: 11   Global Step: 193930   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-11 19:44:50,044-Speed 3283.99 samples/sec   Loss 1.3366   LearningRate 0.0176   Epoch: 11   Global Step: 193940   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-11 19:44:53,135-Speed 3313.81 samples/sec   Loss 1.3308   LearningRate 0.0176   Epoch: 11   Global Step: 193950   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-11 19:44:56,222-Speed 3317.75 samples/sec   Loss 1.3327   LearningRate 0.0176   Epoch: 11   Global Step: 193960   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-11 19:44:59,341-Speed 3284.82 samples/sec   Loss 1.2869   LearningRate 0.0176   Epoch: 11   Global Step: 193970   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-11 19:45:02,421-Speed 3325.46 samples/sec   Loss 1.3633   LearningRate 0.0175   Epoch: 11   Global Step: 193980   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-11 19:45:05,547-Speed 3276.34 samples/sec   Loss 1.3525   LearningRate 0.0175   Epoch: 11   Global Step: 193990   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:45:08,632-Speed 3320.15 samples/sec   Loss 1.3612   LearningRate 0.0175   Epoch: 11   Global Step: 194000   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:45:52,100-[lfw][194000]XNorm: 22.154739
Training: 2022-04-11 19:45:52,100-[lfw][194000]Accuracy-Flip: 0.99733+-0.00343
Training: 2022-04-11 19:45:52,101-[lfw][194000]Accuracy-Highest: 0.99817
Training: 2022-04-11 19:46:42,761-[cfp_fp][194000]XNorm: 21.844114
Training: 2022-04-11 19:46:42,762-[cfp_fp][194000]Accuracy-Flip: 0.98900+-0.00367
Training: 2022-04-11 19:46:42,763-[cfp_fp][194000]Accuracy-Highest: 0.98971
Training: 2022-04-11 19:47:26,202-[agedb_30][194000]XNorm: 22.831644
Training: 2022-04-11 19:47:26,203-[agedb_30][194000]Accuracy-Flip: 0.98367+-0.00572
Training: 2022-04-11 19:47:26,203-[agedb_30][194000]Accuracy-Highest: 0.98500
Training: 2022-04-11 19:47:29,285-Speed 72.80 samples/sec   Loss 1.3432   LearningRate 0.0175   Epoch: 11   Global Step: 194010   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:47:32,363-Speed 3327.27 samples/sec   Loss 1.3123   LearningRate 0.0175   Epoch: 11   Global Step: 194020   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:47:35,428-Speed 3342.00 samples/sec   Loss 1.2958   LearningRate 0.0175   Epoch: 11   Global Step: 194030   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:47:38,519-Speed 3312.88 samples/sec   Loss 1.3554   LearningRate 0.0175   Epoch: 11   Global Step: 194040   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:47:41,601-Speed 3323.90 samples/sec   Loss 1.3102   LearningRate 0.0175   Epoch: 11   Global Step: 194050   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:47:44,808-Speed 3193.57 samples/sec   Loss 1.3381   LearningRate 0.0175   Epoch: 11   Global Step: 194060   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:47:47,911-Speed 3300.36 samples/sec   Loss 1.3477   LearningRate 0.0175   Epoch: 11   Global Step: 194070   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:47:50,994-Speed 3322.81 samples/sec   Loss 1.3455   LearningRate 0.0175   Epoch: 11   Global Step: 194080   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:47:54,074-Speed 3324.63 samples/sec   Loss 1.2654   LearningRate 0.0175   Epoch: 11   Global Step: 194090   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-11 19:47:57,225-Speed 3250.32 samples/sec   Loss 1.2979   LearningRate 0.0175   Epoch: 11   Global Step: 194100   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:48:00,308-Speed 3323.03 samples/sec   Loss 1.3386   LearningRate 0.0175   Epoch: 11   Global Step: 194110   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:48:03,377-Speed 3336.59 samples/sec   Loss 1.2942   LearningRate 0.0175   Epoch: 11   Global Step: 194120   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:48:06,446-Speed 3337.72 samples/sec   Loss 1.3123   LearningRate 0.0175   Epoch: 11   Global Step: 194130   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:48:09,530-Speed 3320.69 samples/sec   Loss 1.3446   LearningRate 0.0175   Epoch: 11   Global Step: 194140   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:48:12,615-Speed 3320.50 samples/sec   Loss 1.2816   LearningRate 0.0175   Epoch: 11   Global Step: 194150   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:48:15,695-Speed 3324.76 samples/sec   Loss 1.2809   LearningRate 0.0175   Epoch: 11   Global Step: 194160   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:48:18,771-Speed 3329.95 samples/sec   Loss 1.3376   LearningRate 0.0175   Epoch: 11   Global Step: 194170   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:48:21,905-Speed 3268.99 samples/sec   Loss 1.3610   LearningRate 0.0175   Epoch: 11   Global Step: 194180   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:48:25,085-Speed 3220.79 samples/sec   Loss 1.3247   LearningRate 0.0175   Epoch: 11   Global Step: 194190   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:48:28,227-Speed 3259.58 samples/sec   Loss 1.3602   LearningRate 0.0175   Epoch: 11   Global Step: 194200   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-11 19:48:31,320-Speed 3310.71 samples/sec   Loss 1.3594   LearningRate 0.0175   Epoch: 11   Global Step: 194210   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-11 19:48:34,392-Speed 3334.72 samples/sec   Loss 1.3400   LearningRate 0.0175   Epoch: 11   Global Step: 194220   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-11 19:48:37,479-Speed 3317.31 samples/sec   Loss 1.3141   LearningRate 0.0175   Epoch: 11   Global Step: 194230   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-11 19:48:40,558-Speed 3326.85 samples/sec   Loss 1.3342   LearningRate 0.0175   Epoch: 11   Global Step: 194240   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-11 19:48:43,637-Speed 3327.07 samples/sec   Loss 1.3109   LearningRate 0.0175   Epoch: 11   Global Step: 194250   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-11 19:48:46,713-Speed 3330.04 samples/sec   Loss 1.3446   LearningRate 0.0175   Epoch: 11   Global Step: 194260   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-11 19:48:49,865-Speed 3248.77 samples/sec   Loss 1.3254   LearningRate 0.0175   Epoch: 11   Global Step: 194270   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-11 19:48:53,011-Speed 3256.48 samples/sec   Loss 1.2742   LearningRate 0.0175   Epoch: 11   Global Step: 194280   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-11 19:48:56,114-Speed 3300.32 samples/sec   Loss 1.2877   LearningRate 0.0175   Epoch: 11   Global Step: 194290   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-11 19:48:59,248-Speed 3268.25 samples/sec   Loss 1.3851   LearningRate 0.0175   Epoch: 11   Global Step: 194300   Fp16 Grad Scale: 262144   Required: 15 hours
Training: 2022-04-11 19:49:02,369-Speed 3281.49 samples/sec   Loss 1.2755   LearningRate 0.0175   Epoch: 11   Global Step: 194310   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-11 19:49:05,550-Speed 3219.75 samples/sec   Loss 1.3317   LearningRate 0.0175   Epoch: 11   Global Step: 194320   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-11 19:49:08,642-Speed 3312.53 samples/sec   Loss 1.3441   LearningRate 0.0175   Epoch: 11   Global Step: 194330   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-11 19:49:11,721-Speed 3326.39 samples/sec   Loss 1.3457   LearningRate 0.0175   Epoch: 11   Global Step: 194340   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-11 19:49:14,798-Speed 3328.77 samples/sec   Loss 1.3404   LearningRate 0.0175   Epoch: 11   Global Step: 194350   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-11 19:49:17,905-Speed 3297.28 samples/sec   Loss 1.3752   LearningRate 0.0175   Epoch: 11   Global Step: 194360   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:49:20,989-Speed 3320.59 samples/sec   Loss 1.3365   LearningRate 0.0175   Epoch: 11   Global Step: 194370   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:49:24,122-Speed 3269.72 samples/sec   Loss 1.3475   LearningRate 0.0174   Epoch: 11   Global Step: 194380   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:49:27,193-Speed 3335.22 samples/sec   Loss 1.3427   LearningRate 0.0174   Epoch: 11   Global Step: 194390   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:49:30,268-Speed 3330.71 samples/sec   Loss 1.3193   LearningRate 0.0174   Epoch: 11   Global Step: 194400   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:49:33,356-Speed 3316.97 samples/sec   Loss 1.3880   LearningRate 0.0174   Epoch: 11   Global Step: 194410   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:49:36,450-Speed 3310.53 samples/sec   Loss 1.3751   LearningRate 0.0174   Epoch: 11   Global Step: 194420   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:49:39,556-Speed 3297.64 samples/sec   Loss 1.3351   LearningRate 0.0174   Epoch: 11   Global Step: 194430   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:49:42,690-Speed 3267.92 samples/sec   Loss 1.3170   LearningRate 0.0174   Epoch: 11   Global Step: 194440   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:49:45,766-Speed 3329.44 samples/sec   Loss 1.3671   LearningRate 0.0174   Epoch: 11   Global Step: 194450   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:49:48,846-Speed 3325.13 samples/sec   Loss 1.3224   LearningRate 0.0174   Epoch: 11   Global Step: 194460   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-11 19:49:51,925-Speed 3326.70 samples/sec   Loss 1.3720   LearningRate 0.0174   Epoch: 11   Global Step: 194470   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:49:55,010-Speed 3320.42 samples/sec   Loss 1.3086   LearningRate 0.0174   Epoch: 11   Global Step: 194480   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:49:58,084-Speed 3331.96 samples/sec   Loss 1.2783   LearningRate 0.0174   Epoch: 11   Global Step: 194490   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:50:01,172-Speed 3316.37 samples/sec   Loss 1.2866   LearningRate 0.0174   Epoch: 11   Global Step: 194500   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:50:04,255-Speed 3321.96 samples/sec   Loss 1.3523   LearningRate 0.0174   Epoch: 11   Global Step: 194510   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:50:07,333-Speed 3327.72 samples/sec   Loss 1.3679   LearningRate 0.0174   Epoch: 11   Global Step: 194520   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:50:10,402-Speed 3337.87 samples/sec   Loss 1.3322   LearningRate 0.0174   Epoch: 11   Global Step: 194530   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:50:13,503-Speed 3302.94 samples/sec   Loss 1.3265   LearningRate 0.0174   Epoch: 11   Global Step: 194540   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:50:16,612-Speed 3294.59 samples/sec   Loss 1.3637   LearningRate 0.0174   Epoch: 11   Global Step: 194550   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:50:19,691-Speed 3326.19 samples/sec   Loss 1.3358   LearningRate 0.0174   Epoch: 11   Global Step: 194560   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:50:22,800-Speed 3293.98 samples/sec   Loss 1.3238   LearningRate 0.0174   Epoch: 11   Global Step: 194570   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-11 19:50:25,953-Speed 3248.46 samples/sec   Loss 1.4046   LearningRate 0.0174   Epoch: 11   Global Step: 194580   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:50:29,068-Speed 3288.34 samples/sec   Loss 1.3452   LearningRate 0.0174   Epoch: 11   Global Step: 194590   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:50:32,179-Speed 3292.54 samples/sec   Loss 1.3179   LearningRate 0.0174   Epoch: 11   Global Step: 194600   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:50:35,351-Speed 3229.35 samples/sec   Loss 1.3244   LearningRate 0.0174   Epoch: 11   Global Step: 194610   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:50:38,439-Speed 3316.50 samples/sec   Loss 1.3797   LearningRate 0.0174   Epoch: 11   Global Step: 194620   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:50:41,513-Speed 3332.56 samples/sec   Loss 1.3278   LearningRate 0.0174   Epoch: 11   Global Step: 194630   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:50:44,610-Speed 3307.32 samples/sec   Loss 1.3380   LearningRate 0.0174   Epoch: 11   Global Step: 194640   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:50:47,710-Speed 3303.57 samples/sec   Loss 1.3047   LearningRate 0.0174   Epoch: 11   Global Step: 194650   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:50:50,788-Speed 3327.30 samples/sec   Loss 1.3041   LearningRate 0.0174   Epoch: 11   Global Step: 194660   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:50:53,887-Speed 3304.52 samples/sec   Loss 1.3492   LearningRate 0.0174   Epoch: 11   Global Step: 194670   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:50:56,980-Speed 3311.93 samples/sec   Loss 1.3471   LearningRate 0.0174   Epoch: 11   Global Step: 194680   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-11 19:51:00,072-Speed 3313.29 samples/sec   Loss 1.3322   LearningRate 0.0174   Epoch: 11   Global Step: 194690   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-11 19:51:03,144-Speed 3334.17 samples/sec   Loss 1.3895   LearningRate 0.0174   Epoch: 11   Global Step: 194700   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-11 19:51:06,299-Speed 3245.89 samples/sec   Loss 1.3469   LearningRate 0.0174   Epoch: 11   Global Step: 194710   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-11 19:51:09,373-Speed 3332.58 samples/sec   Loss 1.3499   LearningRate 0.0174   Epoch: 11   Global Step: 194720   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-11 19:51:12,447-Speed 3331.88 samples/sec   Loss 1.4040   LearningRate 0.0174   Epoch: 11   Global Step: 194730   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-11 19:51:15,521-Speed 3331.53 samples/sec   Loss 1.3658   LearningRate 0.0174   Epoch: 11   Global Step: 194740   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:51:18,661-Speed 3261.34 samples/sec   Loss 1.3695   LearningRate 0.0174   Epoch: 11   Global Step: 194750   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:51:21,775-Speed 3289.51 samples/sec   Loss 1.3504   LearningRate 0.0174   Epoch: 11   Global Step: 194760   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:51:24,848-Speed 3332.46 samples/sec   Loss 1.3563   LearningRate 0.0174   Epoch: 11   Global Step: 194770   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:51:27,935-Speed 3318.71 samples/sec   Loss 1.3295   LearningRate 0.0173   Epoch: 11   Global Step: 194780   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:51:31,016-Speed 3324.01 samples/sec   Loss 1.3417   LearningRate 0.0173   Epoch: 11   Global Step: 194790   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:51:34,131-Speed 3288.04 samples/sec   Loss 1.3424   LearningRate 0.0173   Epoch: 11   Global Step: 194800   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:51:37,212-Speed 3324.84 samples/sec   Loss 1.3138   LearningRate 0.0173   Epoch: 11   Global Step: 194810   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:51:40,286-Speed 3332.03 samples/sec   Loss 1.3162   LearningRate 0.0173   Epoch: 11   Global Step: 194820   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:51:43,360-Speed 3332.06 samples/sec   Loss 1.2998   LearningRate 0.0173   Epoch: 11   Global Step: 194830   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:51:46,432-Speed 3333.00 samples/sec   Loss 1.3062   LearningRate 0.0173   Epoch: 11   Global Step: 194840   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-11 19:51:49,583-Speed 3251.33 samples/sec   Loss 1.3466   LearningRate 0.0173   Epoch: 11   Global Step: 194850   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-11 19:51:52,670-Speed 3317.54 samples/sec   Loss 1.3526   LearningRate 0.0173   Epoch: 11   Global Step: 194860   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-11 19:51:55,765-Speed 3309.51 samples/sec   Loss 1.3535   LearningRate 0.0173   Epoch: 11   Global Step: 194870   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-11 19:51:58,831-Speed 3340.74 samples/sec   Loss 1.3653   LearningRate 0.0173   Epoch: 11   Global Step: 194880   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:52:01,918-Speed 3317.51 samples/sec   Loss 1.3068   LearningRate 0.0173   Epoch: 11   Global Step: 194890   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:52:05,028-Speed 3294.03 samples/sec   Loss 1.3572   LearningRate 0.0173   Epoch: 11   Global Step: 194900   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:52:08,100-Speed 3333.77 samples/sec   Loss 1.3326   LearningRate 0.0173   Epoch: 11   Global Step: 194910   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:52:11,183-Speed 3322.62 samples/sec   Loss 1.3237   LearningRate 0.0173   Epoch: 11   Global Step: 194920   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:52:14,270-Speed 3317.44 samples/sec   Loss 1.3431   LearningRate 0.0173   Epoch: 11   Global Step: 194930   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:52:17,352-Speed 3322.66 samples/sec   Loss 1.3204   LearningRate 0.0173   Epoch: 11   Global Step: 194940   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:52:20,428-Speed 3329.86 samples/sec   Loss 1.4025   LearningRate 0.0173   Epoch: 11   Global Step: 194950   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:52:23,519-Speed 3314.27 samples/sec   Loss 1.2848   LearningRate 0.0173   Epoch: 11   Global Step: 194960   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:52:26,590-Speed 3335.14 samples/sec   Loss 1.3287   LearningRate 0.0173   Epoch: 11   Global Step: 194970   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:52:29,661-Speed 3335.31 samples/sec   Loss 1.3458   LearningRate 0.0173   Epoch: 11   Global Step: 194980   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-11 19:52:32,736-Speed 3330.28 samples/sec   Loss 1.3457   LearningRate 0.0173   Epoch: 11   Global Step: 194990   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-11 19:52:35,845-Speed 3294.37 samples/sec   Loss 1.3952   LearningRate 0.0173   Epoch: 11   Global Step: 195000   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-11 19:52:39,046-Speed 3199.78 samples/sec   Loss 1.3744   LearningRate 0.0173   Epoch: 11   Global Step: 195010   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-11 19:52:42,232-Speed 3214.59 samples/sec   Loss 1.3542   LearningRate 0.0173   Epoch: 11   Global Step: 195020   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-11 19:52:45,292-Speed 3347.79 samples/sec   Loss 1.3112   LearningRate 0.0173   Epoch: 11   Global Step: 195030   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:52:48,401-Speed 3294.84 samples/sec   Loss 1.3567   LearningRate 0.0173   Epoch: 11   Global Step: 195040   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:52:51,476-Speed 3330.89 samples/sec   Loss 1.3519   LearningRate 0.0173   Epoch: 11   Global Step: 195050   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:52:54,565-Speed 3315.06 samples/sec   Loss 1.3594   LearningRate 0.0173   Epoch: 11   Global Step: 195060   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:52:57,656-Speed 3313.57 samples/sec   Loss 1.3213   LearningRate 0.0173   Epoch: 11   Global Step: 195070   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:53:00,744-Speed 3317.14 samples/sec   Loss 1.3146   LearningRate 0.0173   Epoch: 11   Global Step: 195080   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:53:03,826-Speed 3326.12 samples/sec   Loss 1.3517   LearningRate 0.0173   Epoch: 11   Global Step: 195090   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:53:06,928-Speed 3301.20 samples/sec   Loss 1.3410   LearningRate 0.0173   Epoch: 11   Global Step: 195100   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:53:10,001-Speed 3332.96 samples/sec   Loss 1.3063   LearningRate 0.0173   Epoch: 11   Global Step: 195110   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:53:13,105-Speed 3299.84 samples/sec   Loss 1.2936   LearningRate 0.0173   Epoch: 11   Global Step: 195120   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:53:16,198-Speed 3312.09 samples/sec   Loss 1.3472   LearningRate 0.0173   Epoch: 11   Global Step: 195130   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-11 19:53:19,275-Speed 3328.64 samples/sec   Loss 1.2941   LearningRate 0.0173   Epoch: 11   Global Step: 195140   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-11 19:53:22,350-Speed 3330.37 samples/sec   Loss 1.3269   LearningRate 0.0173   Epoch: 11   Global Step: 195150   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-11 19:53:25,422-Speed 3334.50 samples/sec   Loss 1.3612   LearningRate 0.0173   Epoch: 11   Global Step: 195160   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-11 19:53:28,500-Speed 3327.39 samples/sec   Loss 1.3369   LearningRate 0.0173   Epoch: 11   Global Step: 195170   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-11 19:53:31,573-Speed 3332.79 samples/sec   Loss 1.3617   LearningRate 0.0172   Epoch: 11   Global Step: 195180   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-11 19:53:34,660-Speed 3317.40 samples/sec   Loss 1.3481   LearningRate 0.0172   Epoch: 11   Global Step: 195190   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-11 19:53:37,759-Speed 3304.84 samples/sec   Loss 1.3003   LearningRate 0.0172   Epoch: 11   Global Step: 195200   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:53:40,956-Speed 3204.50 samples/sec   Loss 1.3433   LearningRate 0.0172   Epoch: 11   Global Step: 195210   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:53:44,113-Speed 3244.39 samples/sec   Loss 1.3481   LearningRate 0.0172   Epoch: 11   Global Step: 195220   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:53:47,281-Speed 3233.41 samples/sec   Loss 1.3199   LearningRate 0.0172   Epoch: 11   Global Step: 195230   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:53:50,372-Speed 3313.34 samples/sec   Loss 1.3583   LearningRate 0.0172   Epoch: 11   Global Step: 195240   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:53:53,488-Speed 3287.04 samples/sec   Loss 1.3243   LearningRate 0.0172   Epoch: 11   Global Step: 195250   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:53:56,609-Speed 3281.68 samples/sec   Loss 1.3365   LearningRate 0.0172   Epoch: 11   Global Step: 195260   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:53:59,688-Speed 3325.47 samples/sec   Loss 1.3385   LearningRate 0.0172   Epoch: 11   Global Step: 195270   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:54:02,766-Speed 3327.80 samples/sec   Loss 1.3341   LearningRate 0.0172   Epoch: 11   Global Step: 195280   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:54:05,846-Speed 3325.28 samples/sec   Loss 1.4033   LearningRate 0.0172   Epoch: 11   Global Step: 195290   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:54:08,931-Speed 3321.15 samples/sec   Loss 1.3354   LearningRate 0.0172   Epoch: 11   Global Step: 195300   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-11 19:54:12,019-Speed 3317.19 samples/sec   Loss 1.2948   LearningRate 0.0172   Epoch: 11   Global Step: 195310   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-11 19:54:15,142-Speed 3279.39 samples/sec   Loss 1.3492   LearningRate 0.0172   Epoch: 11   Global Step: 195320   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-11 19:54:18,220-Speed 3326.64 samples/sec   Loss 1.2646   LearningRate 0.0172   Epoch: 11   Global Step: 195330   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-11 19:54:21,285-Speed 3342.05 samples/sec   Loss 1.3275   LearningRate 0.0172   Epoch: 11   Global Step: 195340   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:54:24,368-Speed 3322.07 samples/sec   Loss 1.3436   LearningRate 0.0172   Epoch: 11   Global Step: 195350   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:54:27,481-Speed 3289.76 samples/sec   Loss 1.3546   LearningRate 0.0172   Epoch: 11   Global Step: 195360   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:54:30,561-Speed 3325.26 samples/sec   Loss 1.3280   LearningRate 0.0172   Epoch: 11   Global Step: 195370   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:54:33,650-Speed 3316.68 samples/sec   Loss 1.3255   LearningRate 0.0172   Epoch: 11   Global Step: 195380   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:54:36,763-Speed 3289.90 samples/sec   Loss 1.3133   LearningRate 0.0172   Epoch: 11   Global Step: 195390   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:54:39,907-Speed 3258.45 samples/sec   Loss 1.3495   LearningRate 0.0172   Epoch: 11   Global Step: 195400   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:54:42,985-Speed 3327.25 samples/sec   Loss 1.3439   LearningRate 0.0172   Epoch: 11   Global Step: 195410   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:54:46,060-Speed 3330.96 samples/sec   Loss 1.3421   LearningRate 0.0172   Epoch: 11   Global Step: 195420   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:54:49,148-Speed 3316.42 samples/sec   Loss 1.3530   LearningRate 0.0172   Epoch: 11   Global Step: 195430   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:54:52,224-Speed 3329.49 samples/sec   Loss 1.3451   LearningRate 0.0172   Epoch: 11   Global Step: 195440   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-11 19:54:55,304-Speed 3325.31 samples/sec   Loss 1.2863   LearningRate 0.0172   Epoch: 11   Global Step: 195450   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-11 19:54:58,385-Speed 3325.02 samples/sec   Loss 1.3684   LearningRate 0.0172   Epoch: 11   Global Step: 195460   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-11 19:55:01,467-Speed 3323.09 samples/sec   Loss 1.3010   LearningRate 0.0172   Epoch: 11   Global Step: 195470   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-11 19:55:04,547-Speed 3325.59 samples/sec   Loss 1.3386   LearningRate 0.0172   Epoch: 11   Global Step: 195480   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-11 19:55:07,623-Speed 3329.73 samples/sec   Loss 1.3163   LearningRate 0.0172   Epoch: 11   Global Step: 195490   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-11 19:55:10,701-Speed 3327.88 samples/sec   Loss 1.3023   LearningRate 0.0172   Epoch: 11   Global Step: 195500   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-11 19:55:13,797-Speed 3307.99 samples/sec   Loss 1.3330   LearningRate 0.0172   Epoch: 11   Global Step: 195510   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-11 19:55:16,890-Speed 3310.98 samples/sec   Loss 1.2928   LearningRate 0.0172   Epoch: 11   Global Step: 195520   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-11 19:55:19,979-Speed 3315.64 samples/sec   Loss 1.3183   LearningRate 0.0172   Epoch: 11   Global Step: 195530   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-11 19:55:23,058-Speed 3327.12 samples/sec   Loss 1.3512   LearningRate 0.0172   Epoch: 11   Global Step: 195540   Fp16 Grad Scale: 262144   Required: 15 hours
Training: 2022-04-11 19:55:26,127-Speed 3337.56 samples/sec   Loss 1.3620   LearningRate 0.0172   Epoch: 11   Global Step: 195550   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:55:29,213-Speed 3318.32 samples/sec   Loss 1.3179   LearningRate 0.0172   Epoch: 11   Global Step: 195560   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:55:32,296-Speed 3322.30 samples/sec   Loss 1.3091   LearningRate 0.0172   Epoch: 11   Global Step: 195570   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:55:35,435-Speed 3264.01 samples/sec   Loss 1.2912   LearningRate 0.0171   Epoch: 11   Global Step: 195580   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:55:38,529-Speed 3309.69 samples/sec   Loss 1.3087   LearningRate 0.0171   Epoch: 11   Global Step: 195590   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:55:41,623-Speed 3310.91 samples/sec   Loss 1.3514   LearningRate 0.0171   Epoch: 11   Global Step: 195600   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:55:44,737-Speed 3289.03 samples/sec   Loss 1.3649   LearningRate 0.0171   Epoch: 11   Global Step: 195610   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:55:47,902-Speed 3235.28 samples/sec   Loss 1.2966   LearningRate 0.0171   Epoch: 11   Global Step: 195620   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:55:50,992-Speed 3314.43 samples/sec   Loss 1.3399   LearningRate 0.0171   Epoch: 11   Global Step: 195630   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:55:54,070-Speed 3328.44 samples/sec   Loss 1.3304   LearningRate 0.0171   Epoch: 11   Global Step: 195640   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:55:57,190-Speed 3282.68 samples/sec   Loss 1.3197   LearningRate 0.0171   Epoch: 11   Global Step: 195650   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-11 19:56:00,441-Speed 3150.53 samples/sec   Loss 1.3500   LearningRate 0.0171   Epoch: 11   Global Step: 195660   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-11 19:56:03,636-Speed 3206.21 samples/sec   Loss 1.3281   LearningRate 0.0171   Epoch: 11   Global Step: 195670   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-11 19:56:06,746-Speed 3292.46 samples/sec   Loss 1.3745   LearningRate 0.0171   Epoch: 11   Global Step: 195680   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-11 19:56:09,892-Speed 3256.48 samples/sec   Loss 1.3039   LearningRate 0.0171   Epoch: 11   Global Step: 195690   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-11 19:56:12,954-Speed 3344.15 samples/sec   Loss 1.2786   LearningRate 0.0171   Epoch: 11   Global Step: 195700   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:56:16,044-Speed 3315.32 samples/sec   Loss 1.3360   LearningRate 0.0171   Epoch: 11   Global Step: 195710   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:56:19,137-Speed 3311.14 samples/sec   Loss 1.3131   LearningRate 0.0171   Epoch: 11   Global Step: 195720   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:56:22,237-Speed 3304.01 samples/sec   Loss 1.3036   LearningRate 0.0171   Epoch: 11   Global Step: 195730   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:56:25,322-Speed 3320.11 samples/sec   Loss 1.3522   LearningRate 0.0171   Epoch: 11   Global Step: 195740   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:56:28,397-Speed 3330.93 samples/sec   Loss 1.3621   LearningRate 0.0171   Epoch: 11   Global Step: 195750   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:56:31,543-Speed 3255.21 samples/sec   Loss 1.3455   LearningRate 0.0171   Epoch: 11   Global Step: 195760   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:56:34,649-Speed 3298.22 samples/sec   Loss 1.3356   LearningRate 0.0171   Epoch: 11   Global Step: 195770   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:56:37,771-Speed 3280.08 samples/sec   Loss 1.3227   LearningRate 0.0171   Epoch: 11   Global Step: 195780   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:56:40,862-Speed 3313.45 samples/sec   Loss 1.3240   LearningRate 0.0171   Epoch: 11   Global Step: 195790   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:56:43,964-Speed 3302.57 samples/sec   Loss 1.3058   LearningRate 0.0171   Epoch: 11   Global Step: 195800   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-11 19:56:47,066-Speed 3301.84 samples/sec   Loss 1.2870   LearningRate 0.0171   Epoch: 11   Global Step: 195810   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-11 19:56:50,149-Speed 3321.59 samples/sec   Loss 1.3050   LearningRate 0.0171   Epoch: 11   Global Step: 195820   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-11 19:56:53,236-Speed 3318.44 samples/sec   Loss 1.2774   LearningRate 0.0171   Epoch: 11   Global Step: 195830   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-11 19:56:56,313-Speed 3328.01 samples/sec   Loss 1.3006   LearningRate 0.0171   Epoch: 11   Global Step: 195840   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-11 19:56:59,384-Speed 3335.73 samples/sec   Loss 1.3421   LearningRate 0.0171   Epoch: 11   Global Step: 195850   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:57:02,498-Speed 3289.47 samples/sec   Loss 1.3068   LearningRate 0.0171   Epoch: 11   Global Step: 195860   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:57:05,575-Speed 3328.03 samples/sec   Loss 1.2983   LearningRate 0.0171   Epoch: 11   Global Step: 195870   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:57:08,732-Speed 3244.99 samples/sec   Loss 1.3262   LearningRate 0.0171   Epoch: 11   Global Step: 195880   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:57:11,809-Speed 3327.99 samples/sec   Loss 1.2814   LearningRate 0.0171   Epoch: 11   Global Step: 195890   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:57:14,910-Speed 3303.54 samples/sec   Loss 1.3741   LearningRate 0.0171   Epoch: 11   Global Step: 195900   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:57:17,985-Speed 3330.98 samples/sec   Loss 1.3023   LearningRate 0.0171   Epoch: 11   Global Step: 195910   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:57:21,065-Speed 3325.59 samples/sec   Loss 1.3478   LearningRate 0.0171   Epoch: 11   Global Step: 195920   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:57:24,144-Speed 3325.92 samples/sec   Loss 1.3673   LearningRate 0.0171   Epoch: 11   Global Step: 195930   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:57:27,259-Speed 3287.93 samples/sec   Loss 1.3111   LearningRate 0.0171   Epoch: 11   Global Step: 195940   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:57:30,330-Speed 3335.18 samples/sec   Loss 1.3385   LearningRate 0.0171   Epoch: 11   Global Step: 195950   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:57:33,402-Speed 3333.69 samples/sec   Loss 1.3630   LearningRate 0.0171   Epoch: 11   Global Step: 195960   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:57:36,500-Speed 3306.75 samples/sec   Loss 1.3020   LearningRate 0.0171   Epoch: 11   Global Step: 195970   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:57:39,742-Speed 3159.06 samples/sec   Loss 1.3194   LearningRate 0.0171   Epoch: 11   Global Step: 195980   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:57:42,816-Speed 3331.79 samples/sec   Loss 1.3498   LearningRate 0.0170   Epoch: 11   Global Step: 195990   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:57:45,893-Speed 3329.53 samples/sec   Loss 1.3464   LearningRate 0.0170   Epoch: 11   Global Step: 196000   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 19:58:29,350-[lfw][196000]XNorm: 23.225161
Training: 2022-04-11 19:58:29,351-[lfw][196000]Accuracy-Flip: 0.99817+-0.00273
Training: 2022-04-11 19:58:29,351-[lfw][196000]Accuracy-Highest: 0.99817
Training: 2022-04-11 19:59:20,053-[cfp_fp][196000]XNorm: 22.931134
Training: 2022-04-11 19:59:20,054-[cfp_fp][196000]Accuracy-Flip: 0.98986+-0.00358
Training: 2022-04-11 19:59:20,054-[cfp_fp][196000]Accuracy-Highest: 0.98986
Training: 2022-04-11 20:00:03,817-[agedb_30][196000]XNorm: 23.861441
Training: 2022-04-11 20:00:03,818-[agedb_30][196000]Accuracy-Flip: 0.98367+-0.00645
Training: 2022-04-11 20:00:03,818-[agedb_30][196000]Accuracy-Highest: 0.98500
Training: 2022-04-11 20:00:06,891-Speed 72.63 samples/sec   Loss 1.3727   LearningRate 0.0170   Epoch: 11   Global Step: 196010   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 20:00:09,957-Speed 3341.16 samples/sec   Loss 1.3559   LearningRate 0.0170   Epoch: 11   Global Step: 196020   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 20:00:13,021-Speed 3342.73 samples/sec   Loss 1.3424   LearningRate 0.0170   Epoch: 11   Global Step: 196030   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 20:00:16,084-Speed 3343.76 samples/sec   Loss 1.3691   LearningRate 0.0170   Epoch: 11   Global Step: 196040   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 20:00:19,148-Speed 3343.09 samples/sec   Loss 1.3059   LearningRate 0.0170   Epoch: 11   Global Step: 196050   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-11 20:00:22,221-Speed 3332.21 samples/sec   Loss 1.3872   LearningRate 0.0170   Epoch: 11   Global Step: 196060   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-11 20:00:25,296-Speed 3331.51 samples/sec   Loss 1.3611   LearningRate 0.0170   Epoch: 11   Global Step: 196070   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-11 20:00:28,388-Speed 3312.10 samples/sec   Loss 1.3530   LearningRate 0.0170   Epoch: 11   Global Step: 196080   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-11 20:00:31,569-Speed 3219.74 samples/sec   Loss 1.3533   LearningRate 0.0170   Epoch: 11   Global Step: 196090   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-11 20:00:34,681-Speed 3291.71 samples/sec   Loss 1.3650   LearningRate 0.0170   Epoch: 11   Global Step: 196100   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-11 20:00:37,769-Speed 3316.50 samples/sec   Loss 1.3255   LearningRate 0.0170   Epoch: 11   Global Step: 196110   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-11 20:00:40,868-Speed 3305.54 samples/sec   Loss 1.3239   LearningRate 0.0170   Epoch: 11   Global Step: 196120   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-11 20:00:43,969-Speed 3302.35 samples/sec   Loss 1.3033   LearningRate 0.0170   Epoch: 11   Global Step: 196130   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-11 20:00:47,077-Speed 3296.30 samples/sec   Loss 1.3351   LearningRate 0.0170   Epoch: 11   Global Step: 196140   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-11 20:00:50,156-Speed 3325.97 samples/sec   Loss 1.3639   LearningRate 0.0170   Epoch: 11   Global Step: 196150   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-11 20:00:53,235-Speed 3325.96 samples/sec   Loss 1.3197   LearningRate 0.0170   Epoch: 11   Global Step: 196160   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-11 20:00:56,317-Speed 3323.80 samples/sec   Loss 1.3217   LearningRate 0.0170   Epoch: 11   Global Step: 196170   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-11 20:00:59,395-Speed 3328.05 samples/sec   Loss 1.3373   LearningRate 0.0170   Epoch: 11   Global Step: 196180   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 20:01:02,502-Speed 3296.57 samples/sec   Loss 1.3392   LearningRate 0.0170   Epoch: 11   Global Step: 196190   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 20:01:05,653-Speed 3250.80 samples/sec   Loss 1.3421   LearningRate 0.0170   Epoch: 11   Global Step: 196200   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 20:01:08,772-Speed 3283.02 samples/sec   Loss 1.3356   LearningRate 0.0170   Epoch: 11   Global Step: 196210   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 20:01:11,920-Speed 3253.62 samples/sec   Loss 1.3538   LearningRate 0.0170   Epoch: 11   Global Step: 196220   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 20:01:15,028-Speed 3296.07 samples/sec   Loss 1.3507   LearningRate 0.0170   Epoch: 11   Global Step: 196230   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 20:01:18,104-Speed 3329.32 samples/sec   Loss 1.3159   LearningRate 0.0170   Epoch: 11   Global Step: 196240   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 20:01:21,199-Speed 3309.82 samples/sec   Loss 1.3225   LearningRate 0.0170   Epoch: 11   Global Step: 196250   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 20:01:24,278-Speed 3326.20 samples/sec   Loss 1.3036   LearningRate 0.0170   Epoch: 11   Global Step: 196260   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 20:01:27,477-Speed 3201.78 samples/sec   Loss 1.3572   LearningRate 0.0170   Epoch: 11   Global Step: 196270   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 20:01:30,549-Speed 3334.04 samples/sec   Loss 1.3618   LearningRate 0.0170   Epoch: 11   Global Step: 196280   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-11 20:01:33,641-Speed 3313.22 samples/sec   Loss 1.3528   LearningRate 0.0170   Epoch: 11   Global Step: 196290   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-11 20:01:36,720-Speed 3325.86 samples/sec   Loss 1.3200   LearningRate 0.0170   Epoch: 11   Global Step: 196300   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-11 20:01:39,940-Speed 3180.93 samples/sec   Loss 1.3219   LearningRate 0.0170   Epoch: 11   Global Step: 196310   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-11 20:01:43,057-Speed 3286.14 samples/sec   Loss 1.2815   LearningRate 0.0170   Epoch: 11   Global Step: 196320   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 20:01:46,178-Speed 3281.63 samples/sec   Loss 1.3348   LearningRate 0.0170   Epoch: 11   Global Step: 196330   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 20:01:49,259-Speed 3323.96 samples/sec   Loss 1.3174   LearningRate 0.0170   Epoch: 11   Global Step: 196340   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 20:01:52,451-Speed 3209.20 samples/sec   Loss 1.3222   LearningRate 0.0170   Epoch: 11   Global Step: 196350   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 20:01:55,527-Speed 3330.25 samples/sec   Loss 1.3609   LearningRate 0.0170   Epoch: 11   Global Step: 196360   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 20:01:58,609-Speed 3323.73 samples/sec   Loss 1.3228   LearningRate 0.0170   Epoch: 11   Global Step: 196370   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 20:02:01,714-Speed 3298.09 samples/sec   Loss 1.3299   LearningRate 0.0170   Epoch: 11   Global Step: 196380   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 20:02:04,815-Speed 3303.05 samples/sec   Loss 1.3380   LearningRate 0.0169   Epoch: 11   Global Step: 196390   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 20:02:07,966-Speed 3251.28 samples/sec   Loss 1.3126   LearningRate 0.0169   Epoch: 11   Global Step: 196400   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 20:02:11,107-Speed 3260.00 samples/sec   Loss 1.3403   LearningRate 0.0169   Epoch: 11   Global Step: 196410   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 20:02:14,253-Speed 3256.24 samples/sec   Loss 1.3470   LearningRate 0.0169   Epoch: 11   Global Step: 196420   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-11 20:02:17,348-Speed 3308.70 samples/sec   Loss 1.3359   LearningRate 0.0169   Epoch: 11   Global Step: 196430   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-11 20:02:20,409-Speed 3346.55 samples/sec   Loss 1.4254   LearningRate 0.0169   Epoch: 11   Global Step: 196440   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 20:02:23,557-Speed 3253.55 samples/sec   Loss 1.3950   LearningRate 0.0169   Epoch: 11   Global Step: 196450   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 20:02:26,642-Speed 3320.24 samples/sec   Loss 1.3432   LearningRate 0.0169   Epoch: 11   Global Step: 196460   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 20:02:29,724-Speed 3323.44 samples/sec   Loss 1.3420   LearningRate 0.0169   Epoch: 11   Global Step: 196470   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 20:02:32,819-Speed 3309.51 samples/sec   Loss 1.3426   LearningRate 0.0169   Epoch: 11   Global Step: 196480   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 20:02:35,918-Speed 3305.02 samples/sec   Loss 1.3435   LearningRate 0.0169   Epoch: 11   Global Step: 196490   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 20:02:39,021-Speed 3300.57 samples/sec   Loss 1.3340   LearningRate 0.0169   Epoch: 11   Global Step: 196500   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 20:02:42,121-Speed 3303.59 samples/sec   Loss 1.3361   LearningRate 0.0169   Epoch: 11   Global Step: 196510   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 20:02:45,197-Speed 3330.27 samples/sec   Loss 1.3619   LearningRate 0.0169   Epoch: 11   Global Step: 196520   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 20:02:48,277-Speed 3324.71 samples/sec   Loss 1.3399   LearningRate 0.0169   Epoch: 11   Global Step: 196530   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 20:02:51,355-Speed 3327.51 samples/sec   Loss 1.3382   LearningRate 0.0169   Epoch: 11   Global Step: 196540   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-11 20:02:54,430-Speed 3331.72 samples/sec   Loss 1.2914   LearningRate 0.0169   Epoch: 11   Global Step: 196550   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-11 20:02:57,565-Speed 3267.29 samples/sec   Loss 1.2920   LearningRate 0.0169   Epoch: 11   Global Step: 196560   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-11 20:03:00,651-Speed 3318.76 samples/sec   Loss 1.3552   LearningRate 0.0169   Epoch: 11   Global Step: 196570   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-11 20:03:03,822-Speed 3229.86 samples/sec   Loss 1.3247   LearningRate 0.0169   Epoch: 11   Global Step: 196580   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 20:03:06,925-Speed 3300.84 samples/sec   Loss 1.2366   LearningRate 0.0169   Epoch: 11   Global Step: 196590   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 20:03:09,995-Speed 3335.73 samples/sec   Loss 1.2848   LearningRate 0.0169   Epoch: 11   Global Step: 196600   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 20:03:13,093-Speed 3305.87 samples/sec   Loss 1.3044   LearningRate 0.0169   Epoch: 11   Global Step: 196610   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 20:03:16,172-Speed 3326.76 samples/sec   Loss 1.4111   LearningRate 0.0169   Epoch: 11   Global Step: 196620   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 20:03:19,247-Speed 3331.72 samples/sec   Loss 1.3279   LearningRate 0.0169   Epoch: 11   Global Step: 196630   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 20:03:22,323-Speed 3329.32 samples/sec   Loss 1.2967   LearningRate 0.0169   Epoch: 11   Global Step: 196640   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 20:03:25,502-Speed 3222.22 samples/sec   Loss 1.3391   LearningRate 0.0169   Epoch: 11   Global Step: 196650   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 20:03:28,628-Speed 3276.86 samples/sec   Loss 1.3505   LearningRate 0.0169   Epoch: 11   Global Step: 196660   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 20:03:31,715-Speed 3317.10 samples/sec   Loss 1.4230   LearningRate 0.0169   Epoch: 11   Global Step: 196670   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 20:03:34,788-Speed 3333.73 samples/sec   Loss 1.3095   LearningRate 0.0169   Epoch: 11   Global Step: 196680   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 20:03:37,845-Speed 3350.82 samples/sec   Loss 1.3613   LearningRate 0.0169   Epoch: 11   Global Step: 196690   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-11 20:03:40,925-Speed 3324.71 samples/sec   Loss 1.3377   LearningRate 0.0169   Epoch: 11   Global Step: 196700   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-11 20:03:44,125-Speed 3201.06 samples/sec   Loss 1.3241   LearningRate 0.0169   Epoch: 11   Global Step: 196710   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-11 20:03:47,292-Speed 3234.43 samples/sec   Loss 1.3631   LearningRate 0.0169   Epoch: 11   Global Step: 196720   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-11 20:03:50,364-Speed 3333.84 samples/sec   Loss 1.3546   LearningRate 0.0169   Epoch: 11   Global Step: 196730   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-11 20:03:53,438-Speed 3331.48 samples/sec   Loss 1.3576   LearningRate 0.0169   Epoch: 11   Global Step: 196740   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-11 20:03:56,523-Speed 3320.06 samples/sec   Loss 1.3203   LearningRate 0.0169   Epoch: 11   Global Step: 196750   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-11 20:03:59,593-Speed 3336.55 samples/sec   Loss 1.3392   LearningRate 0.0169   Epoch: 11   Global Step: 196760   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-11 20:04:02,675-Speed 3323.58 samples/sec   Loss 1.3451   LearningRate 0.0169   Epoch: 11   Global Step: 196770   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-11 20:04:05,783-Speed 3295.30 samples/sec   Loss 1.3656   LearningRate 0.0169   Epoch: 11   Global Step: 196780   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-11 20:04:08,858-Speed 3330.61 samples/sec   Loss 1.3058   LearningRate 0.0169   Epoch: 11   Global Step: 196790   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:04:11,939-Speed 3324.10 samples/sec   Loss 1.3445   LearningRate 0.0168   Epoch: 11   Global Step: 196800   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:04:15,020-Speed 3325.51 samples/sec   Loss 1.3028   LearningRate 0.0168   Epoch: 11   Global Step: 196810   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:04:18,107-Speed 3317.28 samples/sec   Loss 1.4247   LearningRate 0.0168   Epoch: 11   Global Step: 196820   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:04:21,183-Speed 3329.71 samples/sec   Loss 1.3347   LearningRate 0.0168   Epoch: 11   Global Step: 196830   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:04:24,257-Speed 3332.51 samples/sec   Loss 1.3598   LearningRate 0.0168   Epoch: 11   Global Step: 196840   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:04:27,331-Speed 3331.30 samples/sec   Loss 1.3621   LearningRate 0.0168   Epoch: 11   Global Step: 196850   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:04:30,411-Speed 3325.76 samples/sec   Loss 1.3680   LearningRate 0.0168   Epoch: 11   Global Step: 196860   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:04:33,510-Speed 3304.83 samples/sec   Loss 1.3017   LearningRate 0.0168   Epoch: 11   Global Step: 196870   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:04:36,582-Speed 3334.48 samples/sec   Loss 1.3056   LearningRate 0.0168   Epoch: 11   Global Step: 196880   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:04:39,649-Speed 3339.45 samples/sec   Loss 1.3480   LearningRate 0.0168   Epoch: 11   Global Step: 196890   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-11 20:04:42,717-Speed 3339.16 samples/sec   Loss 1.3733   LearningRate 0.0168   Epoch: 11   Global Step: 196900   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-11 20:04:45,790-Speed 3332.63 samples/sec   Loss 1.2888   LearningRate 0.0168   Epoch: 11   Global Step: 196910   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-11 20:04:48,940-Speed 3252.09 samples/sec   Loss 1.3991   LearningRate 0.0168   Epoch: 11   Global Step: 196920   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-11 20:04:51,998-Speed 3348.57 samples/sec   Loss 1.3606   LearningRate 0.0168   Epoch: 11   Global Step: 196930   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:04:55,071-Speed 3333.25 samples/sec   Loss 1.3215   LearningRate 0.0168   Epoch: 11   Global Step: 196940   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:04:58,144-Speed 3333.52 samples/sec   Loss 1.3843   LearningRate 0.0168   Epoch: 11   Global Step: 196950   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:05:01,231-Speed 3317.67 samples/sec   Loss 1.4072   LearningRate 0.0168   Epoch: 11   Global Step: 196960   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:05:04,318-Speed 3317.42 samples/sec   Loss 1.3187   LearningRate 0.0168   Epoch: 11   Global Step: 196970   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:05:07,401-Speed 3322.58 samples/sec   Loss 1.3929   LearningRate 0.0168   Epoch: 11   Global Step: 196980   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:05:10,484-Speed 3322.11 samples/sec   Loss 1.3575   LearningRate 0.0168   Epoch: 11   Global Step: 196990   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:05:13,584-Speed 3304.07 samples/sec   Loss 1.3745   LearningRate 0.0168   Epoch: 11   Global Step: 197000   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:05:16,659-Speed 3331.52 samples/sec   Loss 1.3260   LearningRate 0.0168   Epoch: 11   Global Step: 197010   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:05:19,729-Speed 3335.60 samples/sec   Loss 1.3259   LearningRate 0.0168   Epoch: 11   Global Step: 197020   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:05:22,790-Speed 3346.68 samples/sec   Loss 1.4018   LearningRate 0.0168   Epoch: 11   Global Step: 197030   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:05:25,870-Speed 3325.32 samples/sec   Loss 1.4108   LearningRate 0.0168   Epoch: 11   Global Step: 197040   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:05:28,973-Speed 3300.41 samples/sec   Loss 1.3750   LearningRate 0.0168   Epoch: 11   Global Step: 197050   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:05:32,059-Speed 3318.67 samples/sec   Loss 1.3791   LearningRate 0.0168   Epoch: 11   Global Step: 197060   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:05:35,130-Speed 3335.62 samples/sec   Loss 1.3324   LearningRate 0.0168   Epoch: 11   Global Step: 197070   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:05:38,964-Speed 2671.24 samples/sec   Loss 1.3901   LearningRate 0.0168   Epoch: 11   Global Step: 197080   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:05:42,047-Speed 3322.64 samples/sec   Loss 1.3569   LearningRate 0.0168   Epoch: 11   Global Step: 197090   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:05:45,116-Speed 3337.60 samples/sec   Loss 1.4009   LearningRate 0.0168   Epoch: 11   Global Step: 197100   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:05:48,189-Speed 3332.26 samples/sec   Loss 1.2932   LearningRate 0.0168   Epoch: 11   Global Step: 197110   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:05:51,265-Speed 3329.68 samples/sec   Loss 1.3032   LearningRate 0.0168   Epoch: 11   Global Step: 197120   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:05:54,347-Speed 3323.17 samples/sec   Loss 1.4042   LearningRate 0.0168   Epoch: 11   Global Step: 197130   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-11 20:05:57,431-Speed 3321.27 samples/sec   Loss 1.3441   LearningRate 0.0168   Epoch: 11   Global Step: 197140   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-11 20:06:00,503-Speed 3334.37 samples/sec   Loss 1.3114   LearningRate 0.0168   Epoch: 11   Global Step: 197150   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-11 20:06:03,571-Speed 3339.02 samples/sec   Loss 1.3329   LearningRate 0.0168   Epoch: 11   Global Step: 197160   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-11 20:06:06,657-Speed 3318.54 samples/sec   Loss 1.3403   LearningRate 0.0168   Epoch: 11   Global Step: 197170   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-11 20:06:09,734-Speed 3329.35 samples/sec   Loss 1.3424   LearningRate 0.0168   Epoch: 11   Global Step: 197180   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-11 20:06:12,793-Speed 3347.42 samples/sec   Loss 1.3399   LearningRate 0.0168   Epoch: 11   Global Step: 197190   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:06:15,866-Speed 3333.57 samples/sec   Loss 1.2739   LearningRate 0.0167   Epoch: 11   Global Step: 197200   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:06:18,950-Speed 3321.31 samples/sec   Loss 1.3716   LearningRate 0.0167   Epoch: 11   Global Step: 197210   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:06:22,042-Speed 3312.57 samples/sec   Loss 1.3266   LearningRate 0.0167   Epoch: 11   Global Step: 197220   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:06:25,130-Speed 3316.45 samples/sec   Loss 1.3426   LearningRate 0.0167   Epoch: 11   Global Step: 197230   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:06:28,316-Speed 3214.64 samples/sec   Loss 1.3486   LearningRate 0.0167   Epoch: 11   Global Step: 197240   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:06:31,399-Speed 3322.42 samples/sec   Loss 1.3551   LearningRate 0.0167   Epoch: 11   Global Step: 197250   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:06:34,561-Speed 3239.55 samples/sec   Loss 1.3155   LearningRate 0.0167   Epoch: 11   Global Step: 197260   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:06:37,712-Speed 3250.86 samples/sec   Loss 1.3288   LearningRate 0.0167   Epoch: 11   Global Step: 197270   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:06:40,895-Speed 3217.65 samples/sec   Loss 1.3450   LearningRate 0.0167   Epoch: 11   Global Step: 197280   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:06:43,969-Speed 3331.19 samples/sec   Loss 1.3892   LearningRate 0.0167   Epoch: 11   Global Step: 197290   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-11 20:06:47,052-Speed 3323.17 samples/sec   Loss 1.3649   LearningRate 0.0167   Epoch: 11   Global Step: 197300   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-11 20:06:50,134-Speed 3323.08 samples/sec   Loss 1.3864   LearningRate 0.0167   Epoch: 11   Global Step: 197310   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-11 20:06:53,190-Speed 3351.04 samples/sec   Loss 1.3975   LearningRate 0.0167   Epoch: 11   Global Step: 197320   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:06:56,270-Speed 3325.15 samples/sec   Loss 1.3391   LearningRate 0.0167   Epoch: 11   Global Step: 197330   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:06:59,348-Speed 3328.00 samples/sec   Loss 1.3932   LearningRate 0.0167   Epoch: 11   Global Step: 197340   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:07:02,420-Speed 3334.80 samples/sec   Loss 1.3077   LearningRate 0.0167   Epoch: 11   Global Step: 197350   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:07:05,529-Speed 3294.83 samples/sec   Loss 1.3497   LearningRate 0.0167   Epoch: 11   Global Step: 197360   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:07:08,622-Speed 3311.40 samples/sec   Loss 1.3530   LearningRate 0.0167   Epoch: 11   Global Step: 197370   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:07:11,846-Speed 3176.64 samples/sec   Loss 1.3568   LearningRate 0.0167   Epoch: 11   Global Step: 197380   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:07:15,087-Speed 3160.11 samples/sec   Loss 1.3163   LearningRate 0.0167   Epoch: 11   Global Step: 197390   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:07:18,319-Speed 3168.51 samples/sec   Loss 1.3609   LearningRate 0.0167   Epoch: 11   Global Step: 197400   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:07:21,573-Speed 3147.75 samples/sec   Loss 1.3067   LearningRate 0.0167   Epoch: 11   Global Step: 197410   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:07:24,741-Speed 3233.74 samples/sec   Loss 1.3630   LearningRate 0.0167   Epoch: 11   Global Step: 197420   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-11 20:07:27,819-Speed 3327.57 samples/sec   Loss 1.3527   LearningRate 0.0167   Epoch: 11   Global Step: 197430   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-11 20:07:31,037-Speed 3182.50 samples/sec   Loss 1.3683   LearningRate 0.0167   Epoch: 11   Global Step: 197440   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:07:34,249-Speed 3189.29 samples/sec   Loss 1.3620   LearningRate 0.0167   Epoch: 11   Global Step: 197450   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:07:37,532-Speed 3119.73 samples/sec   Loss 1.3591   LearningRate 0.0167   Epoch: 11   Global Step: 197460   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:07:40,672-Speed 3261.06 samples/sec   Loss 1.3526   LearningRate 0.0167   Epoch: 11   Global Step: 197470   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:07:43,847-Speed 3226.21 samples/sec   Loss 1.3647   LearningRate 0.0167   Epoch: 11   Global Step: 197480   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:07:47,076-Speed 3171.60 samples/sec   Loss 1.3188   LearningRate 0.0167   Epoch: 11   Global Step: 197490   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:07:50,310-Speed 3168.21 samples/sec   Loss 1.3157   LearningRate 0.0167   Epoch: 11   Global Step: 197500   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:07:53,402-Speed 3311.65 samples/sec   Loss 1.3368   LearningRate 0.0167   Epoch: 11   Global Step: 197510   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:07:56,482-Speed 3325.73 samples/sec   Loss 1.2766   LearningRate 0.0167   Epoch: 11   Global Step: 197520   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:07:59,569-Speed 3317.96 samples/sec   Loss 1.3462   LearningRate 0.0167   Epoch: 11   Global Step: 197530   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:08:02,645-Speed 3330.12 samples/sec   Loss 1.3802   LearningRate 0.0167   Epoch: 11   Global Step: 197540   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-11 20:08:05,719-Speed 3331.92 samples/sec   Loss 1.3295   LearningRate 0.0167   Epoch: 11   Global Step: 197550   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-11 20:08:08,811-Speed 3312.05 samples/sec   Loss 1.3366   LearningRate 0.0167   Epoch: 11   Global Step: 197560   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-11 20:08:11,889-Speed 3327.87 samples/sec   Loss 1.3356   LearningRate 0.0167   Epoch: 11   Global Step: 197570   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-11 20:08:14,966-Speed 3328.77 samples/sec   Loss 1.3493   LearningRate 0.0167   Epoch: 11   Global Step: 197580   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-11 20:08:18,061-Speed 3309.68 samples/sec   Loss 1.3362   LearningRate 0.0167   Epoch: 11   Global Step: 197590   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-11 20:08:21,125-Speed 3342.59 samples/sec   Loss 1.3091   LearningRate 0.0167   Epoch: 11   Global Step: 197600   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:08:24,198-Speed 3332.41 samples/sec   Loss 1.3469   LearningRate 0.0166   Epoch: 11   Global Step: 197610   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:08:27,291-Speed 3312.40 samples/sec   Loss 1.3289   LearningRate 0.0166   Epoch: 11   Global Step: 197620   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:08:30,464-Speed 3227.75 samples/sec   Loss 1.3381   LearningRate 0.0166   Epoch: 11   Global Step: 197630   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:08:33,556-Speed 3312.16 samples/sec   Loss 1.3330   LearningRate 0.0166   Epoch: 11   Global Step: 197640   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:08:36,659-Speed 3300.76 samples/sec   Loss 1.3368   LearningRate 0.0166   Epoch: 11   Global Step: 197650   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:08:39,739-Speed 3325.65 samples/sec   Loss 1.3286   LearningRate 0.0166   Epoch: 11   Global Step: 197660   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:08:42,814-Speed 3330.91 samples/sec   Loss 1.2903   LearningRate 0.0166   Epoch: 11   Global Step: 197670   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:08:45,901-Speed 3318.11 samples/sec   Loss 1.3638   LearningRate 0.0166   Epoch: 11   Global Step: 197680   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:08:48,995-Speed 3310.21 samples/sec   Loss 1.3142   LearningRate 0.0166   Epoch: 11   Global Step: 197690   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:08:52,084-Speed 3315.83 samples/sec   Loss 1.3502   LearningRate 0.0166   Epoch: 11   Global Step: 197700   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-11 20:08:55,195-Speed 3292.27 samples/sec   Loss 1.3222   LearningRate 0.0166   Epoch: 11   Global Step: 197710   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-11 20:08:58,259-Speed 3342.99 samples/sec   Loss 1.3116   LearningRate 0.0166   Epoch: 11   Global Step: 197720   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:09:01,335-Speed 3329.32 samples/sec   Loss 1.3438   LearningRate 0.0166   Epoch: 11   Global Step: 197730   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:09:04,434-Speed 3305.34 samples/sec   Loss 1.3478   LearningRate 0.0166   Epoch: 11   Global Step: 197740   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:09:07,506-Speed 3333.34 samples/sec   Loss 1.3620   LearningRate 0.0166   Epoch: 11   Global Step: 197750   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:09:10,594-Speed 3317.40 samples/sec   Loss 1.3553   LearningRate 0.0166   Epoch: 11   Global Step: 197760   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:09:13,689-Speed 3308.80 samples/sec   Loss 1.2930   LearningRate 0.0166   Epoch: 11   Global Step: 197770   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:09:16,766-Speed 3329.19 samples/sec   Loss 1.3900   LearningRate 0.0166   Epoch: 11   Global Step: 197780   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:09:19,875-Speed 3294.11 samples/sec   Loss 1.3680   LearningRate 0.0166   Epoch: 11   Global Step: 197790   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:09:22,950-Speed 3331.04 samples/sec   Loss 1.3825   LearningRate 0.0166   Epoch: 11   Global Step: 197800   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:09:26,037-Speed 3317.53 samples/sec   Loss 1.3463   LearningRate 0.0166   Epoch: 11   Global Step: 197810   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:09:29,165-Speed 3275.10 samples/sec   Loss 1.3530   LearningRate 0.0166   Epoch: 11   Global Step: 197820   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-11 20:09:32,255-Speed 3313.73 samples/sec   Loss 1.3117   LearningRate 0.0166   Epoch: 11   Global Step: 197830   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-11 20:09:35,345-Speed 3314.65 samples/sec   Loss 1.2996   LearningRate 0.0166   Epoch: 11   Global Step: 197840   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-11 20:09:38,467-Speed 3281.50 samples/sec   Loss 1.3140   LearningRate 0.0166   Epoch: 11   Global Step: 197850   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-11 20:09:41,588-Speed 3281.61 samples/sec   Loss 1.3359   LearningRate 0.0166   Epoch: 11   Global Step: 197860   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-11 20:09:44,713-Speed 3278.14 samples/sec   Loss 1.3678   LearningRate 0.0166   Epoch: 11   Global Step: 197870   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:09:47,785-Speed 3333.80 samples/sec   Loss 1.3146   LearningRate 0.0166   Epoch: 11   Global Step: 197880   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:09:50,912-Speed 3275.51 samples/sec   Loss 1.3408   LearningRate 0.0166   Epoch: 11   Global Step: 197890   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:09:54,023-Speed 3292.26 samples/sec   Loss 1.3725   LearningRate 0.0166   Epoch: 11   Global Step: 197900   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:09:57,097-Speed 3331.56 samples/sec   Loss 1.3001   LearningRate 0.0166   Epoch: 11   Global Step: 197910   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:10:00,182-Speed 3320.15 samples/sec   Loss 1.3550   LearningRate 0.0166   Epoch: 11   Global Step: 197920   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:10:03,267-Speed 3319.99 samples/sec   Loss 1.3708   LearningRate 0.0166   Epoch: 11   Global Step: 197930   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:10:06,357-Speed 3315.60 samples/sec   Loss 1.3650   LearningRate 0.0166   Epoch: 11   Global Step: 197940   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:10:09,433-Speed 3330.36 samples/sec   Loss 1.3428   LearningRate 0.0166   Epoch: 11   Global Step: 197950   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:10:12,508-Speed 3329.97 samples/sec   Loss 1.3398   LearningRate 0.0166   Epoch: 11   Global Step: 197960   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:10:15,590-Speed 3323.28 samples/sec   Loss 1.3090   LearningRate 0.0166   Epoch: 11   Global Step: 197970   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-11 20:10:18,673-Speed 3322.35 samples/sec   Loss 1.3151   LearningRate 0.0166   Epoch: 11   Global Step: 197980   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-11 20:10:21,761-Speed 3317.34 samples/sec   Loss 1.3772   LearningRate 0.0166   Epoch: 11   Global Step: 197990   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-11 20:10:24,934-Speed 3227.91 samples/sec   Loss 1.3665   LearningRate 0.0166   Epoch: 11   Global Step: 198000   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-11 20:11:09,340-[lfw][198000]XNorm: 21.941668
Training: 2022-04-11 20:11:09,341-[lfw][198000]Accuracy-Flip: 0.99783+-0.00269
Training: 2022-04-11 20:11:09,341-[lfw][198000]Accuracy-Highest: 0.99817
Training: 2022-04-11 20:12:00,774-[cfp_fp][198000]XNorm: 22.147303
Training: 2022-04-11 20:12:00,775-[cfp_fp][198000]Accuracy-Flip: 0.99029+-0.00437
Training: 2022-04-11 20:12:00,775-[cfp_fp][198000]Accuracy-Highest: 0.99029
Training: 2022-04-11 20:12:45,035-[agedb_30][198000]XNorm: 22.661545
Training: 2022-04-11 20:12:45,036-[agedb_30][198000]Accuracy-Flip: 0.98317+-0.00594
Training: 2022-04-11 20:12:45,036-[agedb_30][198000]Accuracy-Highest: 0.98500
Training: 2022-04-11 20:12:48,130-Speed 71.51 samples/sec   Loss 1.3618   LearningRate 0.0166   Epoch: 11   Global Step: 198010   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-11 20:12:51,204-Speed 3332.12 samples/sec   Loss 1.3579   LearningRate 0.0165   Epoch: 11   Global Step: 198020   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-11 20:12:54,357-Speed 3247.92 samples/sec   Loss 1.3076   LearningRate 0.0165   Epoch: 11   Global Step: 198030   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-11 20:12:57,548-Speed 3210.12 samples/sec   Loss 1.3389   LearningRate 0.0165   Epoch: 11   Global Step: 198040   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-11 20:13:00,616-Speed 3338.06 samples/sec   Loss 1.2991   LearningRate 0.0165   Epoch: 11   Global Step: 198050   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-11 20:13:03,815-Speed 3201.95 samples/sec   Loss 1.3299   LearningRate 0.0165   Epoch: 11   Global Step: 198060   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:13:07,023-Speed 3192.71 samples/sec   Loss 1.3230   LearningRate 0.0165   Epoch: 11   Global Step: 198070   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:13:10,153-Speed 3272.62 samples/sec   Loss 1.2984   LearningRate 0.0165   Epoch: 11   Global Step: 198080   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:13:13,259-Speed 3297.07 samples/sec   Loss 1.3604   LearningRate 0.0165   Epoch: 11   Global Step: 198090   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:13:16,350-Speed 3313.99 samples/sec   Loss 1.3437   LearningRate 0.0165   Epoch: 11   Global Step: 198100   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:13:19,469-Speed 3283.53 samples/sec   Loss 1.3249   LearningRate 0.0165   Epoch: 11   Global Step: 198110   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:13:22,540-Speed 3335.68 samples/sec   Loss 1.3375   LearningRate 0.0165   Epoch: 11   Global Step: 198120   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:13:25,610-Speed 3335.45 samples/sec   Loss 1.3605   LearningRate 0.0165   Epoch: 11   Global Step: 198130   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:13:28,719-Speed 3294.54 samples/sec   Loss 1.3866   LearningRate 0.0165   Epoch: 11   Global Step: 198140   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:13:31,821-Speed 3301.79 samples/sec   Loss 1.3251   LearningRate 0.0165   Epoch: 11   Global Step: 198150   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:13:34,895-Speed 3332.37 samples/sec   Loss 1.3571   LearningRate 0.0165   Epoch: 11   Global Step: 198160   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-11 20:13:37,977-Speed 3323.71 samples/sec   Loss 1.3337   LearningRate 0.0165   Epoch: 11   Global Step: 198170   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:13:41,053-Speed 3329.80 samples/sec   Loss 1.3132   LearningRate 0.0165   Epoch: 11   Global Step: 198180   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:13:44,129-Speed 3329.11 samples/sec   Loss 1.3531   LearningRate 0.0165   Epoch: 11   Global Step: 198190   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:13:47,212-Speed 3322.75 samples/sec   Loss 1.3018   LearningRate 0.0165   Epoch: 11   Global Step: 198200   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:13:50,277-Speed 3340.89 samples/sec   Loss 1.3097   LearningRate 0.0165   Epoch: 11   Global Step: 198210   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-11 20:13:53,361-Speed 3321.12 samples/sec   Loss 1.3369   LearningRate 0.0165   Epoch: 11   Global Step: 198220   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-11 20:13:56,437-Speed 3329.64 samples/sec   Loss 1.3418   LearningRate 0.0165   Epoch: 11   Global Step: 198230   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-11 20:13:59,521-Speed 3321.31 samples/sec   Loss 1.2820   LearningRate 0.0165   Epoch: 11   Global Step: 198240   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-11 20:14:02,636-Speed 3289.11 samples/sec   Loss 1.3937   LearningRate 0.0165   Epoch: 11   Global Step: 198250   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-11 20:14:05,759-Speed 3279.75 samples/sec   Loss 1.3551   LearningRate 0.0165   Epoch: 11   Global Step: 198260   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-11 20:14:08,852-Speed 3310.38 samples/sec   Loss 1.2994   LearningRate 0.0165   Epoch: 11   Global Step: 198270   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-11 20:14:11,973-Speed 3281.86 samples/sec   Loss 1.3406   LearningRate 0.0165   Epoch: 11   Global Step: 198280   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-11 20:14:15,066-Speed 3311.57 samples/sec   Loss 1.3190   LearningRate 0.0165   Epoch: 11   Global Step: 198290   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-11 20:14:18,162-Speed 3308.56 samples/sec   Loss 1.3114   LearningRate 0.0165   Epoch: 11   Global Step: 198300   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-11 20:14:21,247-Speed 3319.32 samples/sec   Loss 1.3619   LearningRate 0.0165   Epoch: 11   Global Step: 198310   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:14:24,329-Speed 3323.26 samples/sec   Loss 1.3245   LearningRate 0.0165   Epoch: 11   Global Step: 198320   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:14:27,416-Speed 3318.73 samples/sec   Loss 1.3341   LearningRate 0.0165   Epoch: 11   Global Step: 198330   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:14:30,492-Speed 3329.83 samples/sec   Loss 1.3166   LearningRate 0.0165   Epoch: 11   Global Step: 198340   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:14:33,629-Speed 3265.45 samples/sec   Loss 1.3436   LearningRate 0.0165   Epoch: 11   Global Step: 198350   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:14:36,700-Speed 3335.00 samples/sec   Loss 1.3333   LearningRate 0.0165   Epoch: 11   Global Step: 198360   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:14:39,770-Speed 3335.48 samples/sec   Loss 1.3565   LearningRate 0.0165   Epoch: 11   Global Step: 198370   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:14:42,839-Speed 3337.51 samples/sec   Loss 1.2546   LearningRate 0.0165   Epoch: 11   Global Step: 198380   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:14:45,934-Speed 3309.89 samples/sec   Loss 1.3460   LearningRate 0.0165   Epoch: 11   Global Step: 198390   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:14:49,006-Speed 3333.41 samples/sec   Loss 1.3476   LearningRate 0.0165   Epoch: 11   Global Step: 198400   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:14:52,076-Speed 3336.25 samples/sec   Loss 1.3446   LearningRate 0.0165   Epoch: 11   Global Step: 198410   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:14:55,149-Speed 3334.09 samples/sec   Loss 1.2538   LearningRate 0.0165   Epoch: 11   Global Step: 198420   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:14:58,244-Speed 3308.77 samples/sec   Loss 1.3526   LearningRate 0.0164   Epoch: 11   Global Step: 198430   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:15:01,324-Speed 3325.69 samples/sec   Loss 1.3247   LearningRate 0.0164   Epoch: 11   Global Step: 198440   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:15:04,398-Speed 3332.07 samples/sec   Loss 1.2613   LearningRate 0.0164   Epoch: 11   Global Step: 198450   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:15:07,491-Speed 3311.58 samples/sec   Loss 1.3171   LearningRate 0.0164   Epoch: 11   Global Step: 198460   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:15:10,593-Speed 3302.11 samples/sec   Loss 1.2676   LearningRate 0.0164   Epoch: 11   Global Step: 198470   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:15:13,717-Speed 3278.01 samples/sec   Loss 1.3318   LearningRate 0.0164   Epoch: 11   Global Step: 198480   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:15:16,806-Speed 3315.41 samples/sec   Loss 1.3449   LearningRate 0.0164   Epoch: 11   Global Step: 198490   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:15:19,924-Speed 3284.56 samples/sec   Loss 1.3066   LearningRate 0.0164   Epoch: 11   Global Step: 198500   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:15:23,022-Speed 3307.06 samples/sec   Loss 1.3751   LearningRate 0.0164   Epoch: 11   Global Step: 198510   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-11 20:15:26,095-Speed 3332.72 samples/sec   Loss 1.3474   LearningRate 0.0164   Epoch: 11   Global Step: 198520   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-11 20:15:29,192-Speed 3307.21 samples/sec   Loss 1.3617   LearningRate 0.0164   Epoch: 11   Global Step: 198530   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:15:32,277-Speed 3320.70 samples/sec   Loss 1.3004   LearningRate 0.0164   Epoch: 11   Global Step: 198540   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:15:35,424-Speed 3253.81 samples/sec   Loss 1.3158   LearningRate 0.0164   Epoch: 11   Global Step: 198550   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:15:38,556-Speed 3270.82 samples/sec   Loss 1.3761   LearningRate 0.0164   Epoch: 11   Global Step: 198560   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:15:41,623-Speed 3339.86 samples/sec   Loss 1.3929   LearningRate 0.0164   Epoch: 11   Global Step: 198570   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:15:44,701-Speed 3326.59 samples/sec   Loss 1.2906   LearningRate 0.0164   Epoch: 11   Global Step: 198580   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:15:47,840-Speed 3263.98 samples/sec   Loss 1.3656   LearningRate 0.0164   Epoch: 11   Global Step: 198590   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:15:50,953-Speed 3289.46 samples/sec   Loss 1.3039   LearningRate 0.0164   Epoch: 11   Global Step: 198600   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:15:54,081-Speed 3274.82 samples/sec   Loss 1.3852   LearningRate 0.0164   Epoch: 11   Global Step: 198610   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:15:57,188-Speed 3296.46 samples/sec   Loss 1.2490   LearningRate 0.0164   Epoch: 11   Global Step: 198620   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:16:00,336-Speed 3254.39 samples/sec   Loss 1.3164   LearningRate 0.0164   Epoch: 11   Global Step: 198630   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-11 20:16:03,414-Speed 3327.32 samples/sec   Loss 1.3414   LearningRate 0.0164   Epoch: 11   Global Step: 198640   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-11 20:16:06,494-Speed 3325.25 samples/sec   Loss 1.3043   LearningRate 0.0164   Epoch: 11   Global Step: 198650   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-11 20:16:09,579-Speed 3319.88 samples/sec   Loss 1.3588   LearningRate 0.0164   Epoch: 11   Global Step: 198660   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-11 20:16:12,644-Speed 3343.06 samples/sec   Loss 1.3858   LearningRate 0.0164   Epoch: 11   Global Step: 198670   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:16:15,726-Speed 3323.27 samples/sec   Loss 1.3737   LearningRate 0.0164   Epoch: 11   Global Step: 198680   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:16:18,817-Speed 3314.07 samples/sec   Loss 1.3217   LearningRate 0.0164   Epoch: 11   Global Step: 198690   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:16:21,920-Speed 3300.94 samples/sec   Loss 1.3064   LearningRate 0.0164   Epoch: 11   Global Step: 198700   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:16:24,994-Speed 3331.48 samples/sec   Loss 1.3460   LearningRate 0.0164   Epoch: 11   Global Step: 198710   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:16:28,068-Speed 3331.54 samples/sec   Loss 1.3492   LearningRate 0.0164   Epoch: 11   Global Step: 198720   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:16:31,140-Speed 3334.02 samples/sec   Loss 1.2934   LearningRate 0.0164   Epoch: 11   Global Step: 198730   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:16:34,225-Speed 3320.33 samples/sec   Loss 1.2911   LearningRate 0.0164   Epoch: 11   Global Step: 198740   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:16:37,309-Speed 3321.18 samples/sec   Loss 1.3253   LearningRate 0.0164   Epoch: 11   Global Step: 198750   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:16:40,387-Speed 3326.93 samples/sec   Loss 1.3342   LearningRate 0.0164   Epoch: 11   Global Step: 198760   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:16:43,467-Speed 3326.17 samples/sec   Loss 1.3450   LearningRate 0.0164   Epoch: 11   Global Step: 198770   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-11 20:16:46,542-Speed 3330.94 samples/sec   Loss 1.3035   LearningRate 0.0164   Epoch: 11   Global Step: 198780   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-11 20:16:49,622-Speed 3325.10 samples/sec   Loss 1.3043   LearningRate 0.0164   Epoch: 11   Global Step: 198790   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-11 20:16:52,700-Speed 3328.21 samples/sec   Loss 1.3263   LearningRate 0.0164   Epoch: 11   Global Step: 198800   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-11 20:16:55,758-Speed 3349.17 samples/sec   Loss 1.2947   LearningRate 0.0164   Epoch: 11   Global Step: 198810   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:16:58,850-Speed 3312.00 samples/sec   Loss 1.3368   LearningRate 0.0164   Epoch: 11   Global Step: 198820   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:17:01,933-Speed 3322.97 samples/sec   Loss 1.3276   LearningRate 0.0164   Epoch: 11   Global Step: 198830   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:17:05,024-Speed 3313.00 samples/sec   Loss 1.3949   LearningRate 0.0163   Epoch: 11   Global Step: 198840   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:17:08,100-Speed 3330.16 samples/sec   Loss 1.3567   LearningRate 0.0163   Epoch: 11   Global Step: 198850   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:17:11,174-Speed 3331.66 samples/sec   Loss 1.3355   LearningRate 0.0163   Epoch: 11   Global Step: 198860   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:17:14,253-Speed 3327.20 samples/sec   Loss 1.2923   LearningRate 0.0163   Epoch: 11   Global Step: 198870   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:17:17,349-Speed 3308.11 samples/sec   Loss 1.3708   LearningRate 0.0163   Epoch: 11   Global Step: 198880   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:17:20,422-Speed 3332.47 samples/sec   Loss 1.3946   LearningRate 0.0163   Epoch: 11   Global Step: 198890   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:17:23,495-Speed 3333.37 samples/sec   Loss 1.3511   LearningRate 0.0163   Epoch: 11   Global Step: 198900   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:17:26,571-Speed 3329.71 samples/sec   Loss 1.3514   LearningRate 0.0163   Epoch: 11   Global Step: 198910   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-11 20:17:29,635-Speed 3342.22 samples/sec   Loss 1.3154   LearningRate 0.0163   Epoch: 11   Global Step: 198920   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:17:32,715-Speed 3325.52 samples/sec   Loss 1.3782   LearningRate 0.0163   Epoch: 11   Global Step: 198930   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:17:35,794-Speed 3326.83 samples/sec   Loss 1.3202   LearningRate 0.0163   Epoch: 11   Global Step: 198940   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:17:38,873-Speed 3326.42 samples/sec   Loss 1.3333   LearningRate 0.0163   Epoch: 11   Global Step: 198950   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:17:41,956-Speed 3322.98 samples/sec   Loss 1.2847   LearningRate 0.0163   Epoch: 11   Global Step: 198960   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:17:45,028-Speed 3333.40 samples/sec   Loss 1.3385   LearningRate 0.0163   Epoch: 11   Global Step: 198970   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:17:48,101-Speed 3333.57 samples/sec   Loss 1.3462   LearningRate 0.0163   Epoch: 11   Global Step: 198980   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:17:51,185-Speed 3320.75 samples/sec   Loss 1.3110   LearningRate 0.0163   Epoch: 11   Global Step: 198990   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:17:54,260-Speed 3330.42 samples/sec   Loss 1.3060   LearningRate 0.0163   Epoch: 11   Global Step: 199000   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:17:57,340-Speed 3325.51 samples/sec   Loss 1.3503   LearningRate 0.0163   Epoch: 11   Global Step: 199010   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:18:00,437-Speed 3306.94 samples/sec   Loss 1.3492   LearningRate 0.0163   Epoch: 11   Global Step: 199020   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-11 20:18:03,526-Speed 3316.61 samples/sec   Loss 1.3262   LearningRate 0.0163   Epoch: 11   Global Step: 199030   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-11 20:18:06,605-Speed 3326.63 samples/sec   Loss 1.3175   LearningRate 0.0163   Epoch: 11   Global Step: 199040   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-11 20:18:09,684-Speed 3326.36 samples/sec   Loss 1.2474   LearningRate 0.0163   Epoch: 11   Global Step: 199050   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-11 20:18:12,763-Speed 3326.87 samples/sec   Loss 1.3210   LearningRate 0.0163   Epoch: 11   Global Step: 199060   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-11 20:18:15,838-Speed 3330.91 samples/sec   Loss 1.3352   LearningRate 0.0163   Epoch: 11   Global Step: 199070   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-11 20:18:18,915-Speed 3327.69 samples/sec   Loss 1.3216   LearningRate 0.0163   Epoch: 11   Global Step: 199080   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-11 20:18:22,005-Speed 3315.63 samples/sec   Loss 1.3603   LearningRate 0.0163   Epoch: 11   Global Step: 199090   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-11 20:18:25,116-Speed 3291.56 samples/sec   Loss 1.3405   LearningRate 0.0163   Epoch: 11   Global Step: 199100   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-11 20:18:28,208-Speed 3312.88 samples/sec   Loss 1.2857   LearningRate 0.0163   Epoch: 11   Global Step: 199110   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-11 20:18:31,268-Speed 3347.10 samples/sec   Loss 1.3349   LearningRate 0.0163   Epoch: 11   Global Step: 199120   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-11 20:18:34,341-Speed 3333.35 samples/sec   Loss 1.3222   LearningRate 0.0163   Epoch: 11   Global Step: 199130   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-11 20:18:37,420-Speed 3326.69 samples/sec   Loss 1.3348   LearningRate 0.0163   Epoch: 11   Global Step: 199140   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-11 20:18:40,498-Speed 3327.84 samples/sec   Loss 1.3255   LearningRate 0.0163   Epoch: 11   Global Step: 199150   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-11 20:18:43,584-Speed 3318.48 samples/sec   Loss 1.3556   LearningRate 0.0163   Epoch: 11   Global Step: 199160   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-11 20:18:46,696-Speed 3291.10 samples/sec   Loss 1.3566   LearningRate 0.0163   Epoch: 11   Global Step: 199170   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-11 20:18:49,781-Speed 3319.72 samples/sec   Loss 1.2901   LearningRate 0.0163   Epoch: 11   Global Step: 199180   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-11 20:18:52,875-Speed 3310.72 samples/sec   Loss 1.3399   LearningRate 0.0163   Epoch: 11   Global Step: 199190   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-11 20:18:55,969-Speed 3309.96 samples/sec   Loss 1.3710   LearningRate 0.0163   Epoch: 11   Global Step: 199200   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-11 20:18:59,049-Speed 3326.00 samples/sec   Loss 1.3141   LearningRate 0.0163   Epoch: 11   Global Step: 199210   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-11 20:19:02,141-Speed 3312.48 samples/sec   Loss 1.3673   LearningRate 0.0163   Epoch: 11   Global Step: 199220   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-11 20:19:05,233-Speed 3312.96 samples/sec   Loss 1.3244   LearningRate 0.0163   Epoch: 11   Global Step: 199230   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-11 20:19:08,345-Speed 3291.04 samples/sec   Loss 1.3489   LearningRate 0.0163   Epoch: 11   Global Step: 199240   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-11 20:19:11,425-Speed 3325.12 samples/sec   Loss 1.3104   LearningRate 0.0163   Epoch: 11   Global Step: 199250   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-11 20:19:14,510-Speed 3320.49 samples/sec   Loss 1.3736   LearningRate 0.0162   Epoch: 11   Global Step: 199260   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-11 20:19:17,600-Speed 3314.91 samples/sec   Loss 1.3774   LearningRate 0.0162   Epoch: 11   Global Step: 199270   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-11 20:19:20,705-Speed 3298.06 samples/sec   Loss 1.3646   LearningRate 0.0162   Epoch: 11   Global Step: 199280   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:19:23,817-Speed 3291.97 samples/sec   Loss 1.3304   LearningRate 0.0162   Epoch: 11   Global Step: 199290   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:19:26,914-Speed 3307.37 samples/sec   Loss 1.3122   LearningRate 0.0162   Epoch: 11   Global Step: 199300   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:19:29,990-Speed 3329.20 samples/sec   Loss 1.3161   LearningRate 0.0162   Epoch: 11   Global Step: 199310   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:19:33,063-Speed 3332.88 samples/sec   Loss 1.3084   LearningRate 0.0162   Epoch: 11   Global Step: 199320   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:19:36,139-Speed 3330.04 samples/sec   Loss 1.3582   LearningRate 0.0162   Epoch: 11   Global Step: 199330   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:19:39,213-Speed 3331.75 samples/sec   Loss 1.3073   LearningRate 0.0162   Epoch: 11   Global Step: 199340   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:19:42,318-Speed 3298.55 samples/sec   Loss 1.3071   LearningRate 0.0162   Epoch: 11   Global Step: 199350   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:19:45,434-Speed 3287.49 samples/sec   Loss 1.3613   LearningRate 0.0162   Epoch: 11   Global Step: 199360   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:19:48,539-Speed 3297.64 samples/sec   Loss 1.3486   LearningRate 0.0162   Epoch: 11   Global Step: 199370   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:19:51,628-Speed 3317.08 samples/sec   Loss 1.3326   LearningRate 0.0162   Epoch: 11   Global Step: 199380   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-11 20:19:54,713-Speed 3319.17 samples/sec   Loss 1.3168   LearningRate 0.0162   Epoch: 11   Global Step: 199390   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-11 20:19:57,804-Speed 3313.97 samples/sec   Loss 1.3737   LearningRate 0.0162   Epoch: 11   Global Step: 199400   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-11 20:20:00,932-Speed 3274.44 samples/sec   Loss 1.3085   LearningRate 0.0162   Epoch: 11   Global Step: 199410   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-11 20:20:04,010-Speed 3327.72 samples/sec   Loss 1.3075   LearningRate 0.0162   Epoch: 11   Global Step: 199420   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-11 20:20:07,094-Speed 3320.53 samples/sec   Loss 1.3649   LearningRate 0.0162   Epoch: 11   Global Step: 199430   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-11 20:20:10,182-Speed 3317.15 samples/sec   Loss 1.3183   LearningRate 0.0162   Epoch: 11   Global Step: 199440   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-11 20:20:13,267-Speed 3319.79 samples/sec   Loss 1.3239   LearningRate 0.0162   Epoch: 11   Global Step: 199450   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-11 20:20:16,347-Speed 3326.27 samples/sec   Loss 1.3347   LearningRate 0.0162   Epoch: 11   Global Step: 199460   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-11 20:20:19,428-Speed 3323.60 samples/sec   Loss 1.3530   LearningRate 0.0162   Epoch: 11   Global Step: 199470   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-11 20:20:22,498-Speed 3336.76 samples/sec   Loss 1.3343   LearningRate 0.0162   Epoch: 11   Global Step: 199480   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:20:25,605-Speed 3296.11 samples/sec   Loss 1.3547   LearningRate 0.0162   Epoch: 11   Global Step: 199490   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:20:28,699-Speed 3310.59 samples/sec   Loss 1.3211   LearningRate 0.0162   Epoch: 11   Global Step: 199500   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:20:31,782-Speed 3321.80 samples/sec   Loss 1.3659   LearningRate 0.0162   Epoch: 11   Global Step: 199510   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:20:34,863-Speed 3325.62 samples/sec   Loss 1.3378   LearningRate 0.0162   Epoch: 11   Global Step: 199520   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:20:37,958-Speed 3308.53 samples/sec   Loss 1.3337   LearningRate 0.0162   Epoch: 11   Global Step: 199530   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:20:41,041-Speed 3322.46 samples/sec   Loss 1.3193   LearningRate 0.0162   Epoch: 11   Global Step: 199540   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:20:44,133-Speed 3312.69 samples/sec   Loss 1.3059   LearningRate 0.0162   Epoch: 11   Global Step: 199550   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:20:47,209-Speed 3329.73 samples/sec   Loss 1.3854   LearningRate 0.0162   Epoch: 11   Global Step: 199560   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:20:50,298-Speed 3316.27 samples/sec   Loss 1.3738   LearningRate 0.0162   Epoch: 11   Global Step: 199570   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:20:53,370-Speed 3333.50 samples/sec   Loss 1.2905   LearningRate 0.0162   Epoch: 11   Global Step: 199580   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-11 20:20:56,447-Speed 3328.51 samples/sec   Loss 1.3062   LearningRate 0.0162   Epoch: 11   Global Step: 199590   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-11 20:20:59,543-Speed 3308.89 samples/sec   Loss 1.3589   LearningRate 0.0162   Epoch: 11   Global Step: 199600   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-11 20:21:02,610-Speed 3339.53 samples/sec   Loss 1.3455   LearningRate 0.0162   Epoch: 11   Global Step: 199610   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:21:05,692-Speed 3322.64 samples/sec   Loss 1.3221   LearningRate 0.0162   Epoch: 11   Global Step: 199620   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:21:08,773-Speed 3324.54 samples/sec   Loss 1.3153   LearningRate 0.0162   Epoch: 11   Global Step: 199630   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:21:11,857-Speed 3321.31 samples/sec   Loss 1.3573   LearningRate 0.0162   Epoch: 11   Global Step: 199640   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:21:14,938-Speed 3324.55 samples/sec   Loss 1.3506   LearningRate 0.0162   Epoch: 11   Global Step: 199650   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:21:18,012-Speed 3331.88 samples/sec   Loss 1.3255   LearningRate 0.0162   Epoch: 11   Global Step: 199660   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:21:21,115-Speed 3300.88 samples/sec   Loss 1.3579   LearningRate 0.0161   Epoch: 11   Global Step: 199670   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:21:24,216-Speed 3302.61 samples/sec   Loss 1.3221   LearningRate 0.0161   Epoch: 11   Global Step: 199680   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:21:27,359-Speed 3259.15 samples/sec   Loss 1.2946   LearningRate 0.0161   Epoch: 11   Global Step: 199690   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:21:30,476-Speed 3285.75 samples/sec   Loss 1.3161   LearningRate 0.0161   Epoch: 11   Global Step: 199700   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:21:33,587-Speed 3292.32 samples/sec   Loss 1.3601   LearningRate 0.0161   Epoch: 11   Global Step: 199710   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-11 20:21:36,688-Speed 3302.47 samples/sec   Loss 1.3885   LearningRate 0.0161   Epoch: 11   Global Step: 199720   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-11 20:21:39,767-Speed 3327.00 samples/sec   Loss 1.3745   LearningRate 0.0161   Epoch: 11   Global Step: 199730   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-11 20:21:42,886-Speed 3283.82 samples/sec   Loss 1.3357   LearningRate 0.0161   Epoch: 11   Global Step: 199740   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-11 20:21:46,103-Speed 3183.67 samples/sec   Loss 1.3165   LearningRate 0.0161   Epoch: 11   Global Step: 199750   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-11 20:21:49,219-Speed 3287.94 samples/sec   Loss 1.2939   LearningRate 0.0161   Epoch: 11   Global Step: 199760   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-11 20:21:52,353-Speed 3267.86 samples/sec   Loss 1.3569   LearningRate 0.0161   Epoch: 11   Global Step: 199770   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-11 20:21:55,460-Speed 3295.85 samples/sec   Loss 1.3169   LearningRate 0.0161   Epoch: 11   Global Step: 199780   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-11 20:21:58,651-Speed 3210.27 samples/sec   Loss 1.3389   LearningRate 0.0161   Epoch: 11   Global Step: 199790   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-11 20:22:01,756-Speed 3298.54 samples/sec   Loss 1.3106   LearningRate 0.0161   Epoch: 11   Global Step: 199800   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-11 20:22:04,839-Speed 3321.82 samples/sec   Loss 1.3390   LearningRate 0.0161   Epoch: 11   Global Step: 199810   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-11 20:22:07,930-Speed 3313.92 samples/sec   Loss 1.3617   LearningRate 0.0161   Epoch: 11   Global Step: 199820   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-11 20:22:11,001-Speed 3335.70 samples/sec   Loss 1.3270   LearningRate 0.0161   Epoch: 11   Global Step: 199830   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:22:14,075-Speed 3332.28 samples/sec   Loss 1.3721   LearningRate 0.0161   Epoch: 11   Global Step: 199840   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:22:17,167-Speed 3311.99 samples/sec   Loss 1.3543   LearningRate 0.0161   Epoch: 11   Global Step: 199850   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:22:20,244-Speed 3328.59 samples/sec   Loss 1.3235   LearningRate 0.0161   Epoch: 11   Global Step: 199860   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:22:23,398-Speed 3248.03 samples/sec   Loss 1.3402   LearningRate 0.0161   Epoch: 11   Global Step: 199870   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:22:26,474-Speed 3328.74 samples/sec   Loss 1.3044   LearningRate 0.0161   Epoch: 11   Global Step: 199880   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:22:29,562-Speed 3317.19 samples/sec   Loss 1.2834   LearningRate 0.0161   Epoch: 11   Global Step: 199890   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:22:32,650-Speed 3317.74 samples/sec   Loss 1.2519   LearningRate 0.0161   Epoch: 11   Global Step: 199900   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:22:35,724-Speed 3330.84 samples/sec   Loss 1.3083   LearningRate 0.0161   Epoch: 11   Global Step: 199910   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:22:38,795-Speed 3335.25 samples/sec   Loss 1.3292   LearningRate 0.0161   Epoch: 11   Global Step: 199920   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:22:41,874-Speed 3327.06 samples/sec   Loss 1.3300   LearningRate 0.0161   Epoch: 11   Global Step: 199930   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-11 20:22:44,946-Speed 3333.61 samples/sec   Loss 1.3508   LearningRate 0.0161   Epoch: 11   Global Step: 199940   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-11 20:22:48,016-Speed 3336.38 samples/sec   Loss 1.3496   LearningRate 0.0161   Epoch: 11   Global Step: 199950   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:22:51,110-Speed 3310.11 samples/sec   Loss 1.3197   LearningRate 0.0161   Epoch: 11   Global Step: 199960   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:22:54,271-Speed 3241.12 samples/sec   Loss 1.3446   LearningRate 0.0161   Epoch: 11   Global Step: 199970   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:22:57,357-Speed 3318.53 samples/sec   Loss 1.3374   LearningRate 0.0161   Epoch: 11   Global Step: 199980   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:23:00,430-Speed 3333.26 samples/sec   Loss 1.3526   LearningRate 0.0161   Epoch: 11   Global Step: 199990   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:23:03,568-Speed 3263.66 samples/sec   Loss 1.3777   LearningRate 0.0161   Epoch: 11   Global Step: 200000   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:23:46,954-[lfw][200000]XNorm: 21.631169
Training: 2022-04-11 20:23:46,955-[lfw][200000]Accuracy-Flip: 0.99783+-0.00259
Training: 2022-04-11 20:23:46,955-[lfw][200000]Accuracy-Highest: 0.99817
Training: 2022-04-11 20:24:37,425-[cfp_fp][200000]XNorm: 22.291177
Training: 2022-04-11 20:24:37,426-[cfp_fp][200000]Accuracy-Flip: 0.98929+-0.00444
Training: 2022-04-11 20:24:37,427-[cfp_fp][200000]Accuracy-Highest: 0.99029
Training: 2022-04-11 20:25:20,917-[agedb_30][200000]XNorm: 23.055870
Training: 2022-04-11 20:25:20,918-[agedb_30][200000]Accuracy-Flip: 0.98567+-0.00620
Training: 2022-04-11 20:25:20,918-[agedb_30][200000]Accuracy-Highest: 0.98567
Training: 2022-04-11 20:25:23,993-Speed 72.92 samples/sec   Loss 1.3519   LearningRate 0.0161   Epoch: 11   Global Step: 200010   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:25:27,123-Speed 3272.62 samples/sec   Loss 1.3466   LearningRate 0.0161   Epoch: 11   Global Step: 200020   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:25:30,184-Speed 3345.87 samples/sec   Loss 1.3651   LearningRate 0.0161   Epoch: 11   Global Step: 200030   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:25:33,271-Speed 3318.01 samples/sec   Loss 1.3752   LearningRate 0.0161   Epoch: 11   Global Step: 200040   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:25:36,462-Speed 3209.38 samples/sec   Loss 1.3515   LearningRate 0.0161   Epoch: 11   Global Step: 200050   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-11 20:25:39,538-Speed 3329.52 samples/sec   Loss 1.3529   LearningRate 0.0161   Epoch: 11   Global Step: 200060   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-11 20:25:42,606-Speed 3338.95 samples/sec   Loss 1.2676   LearningRate 0.0161   Epoch: 11   Global Step: 200070   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-11 20:25:45,682-Speed 3329.81 samples/sec   Loss 1.3482   LearningRate 0.0161   Epoch: 11   Global Step: 200080   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-11 20:25:48,756-Speed 3331.18 samples/sec   Loss 1.3074   LearningRate 0.0160   Epoch: 11   Global Step: 200090   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:25:51,876-Speed 3283.07 samples/sec   Loss 1.3425   LearningRate 0.0160   Epoch: 11   Global Step: 200100   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:25:54,994-Speed 3285.44 samples/sec   Loss 1.3060   LearningRate 0.0160   Epoch: 11   Global Step: 200110   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:25:58,067-Speed 3332.31 samples/sec   Loss 1.3283   LearningRate 0.0160   Epoch: 11   Global Step: 200120   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:26:01,152-Speed 3320.66 samples/sec   Loss 1.3638   LearningRate 0.0160   Epoch: 11   Global Step: 200130   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:26:04,241-Speed 3316.09 samples/sec   Loss 1.3116   LearningRate 0.0160   Epoch: 11   Global Step: 200140   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:26:07,345-Speed 3299.49 samples/sec   Loss 1.3120   LearningRate 0.0160   Epoch: 11   Global Step: 200150   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:26:10,465-Speed 3282.17 samples/sec   Loss 1.3526   LearningRate 0.0160   Epoch: 11   Global Step: 200160   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:26:13,619-Speed 3247.44 samples/sec   Loss 1.4035   LearningRate 0.0160   Epoch: 11   Global Step: 200170   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:26:16,693-Speed 3332.40 samples/sec   Loss 1.3633   LearningRate 0.0160   Epoch: 11   Global Step: 200180   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:26:19,782-Speed 3315.80 samples/sec   Loss 1.3517   LearningRate 0.0160   Epoch: 11   Global Step: 200190   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-11 20:26:22,871-Speed 3315.45 samples/sec   Loss 1.3281   LearningRate 0.0160   Epoch: 11   Global Step: 200200   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-11 20:26:25,960-Speed 3315.45 samples/sec   Loss 1.3994   LearningRate 0.0160   Epoch: 11   Global Step: 200210   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-11 20:26:29,041-Speed 3325.22 samples/sec   Loss 1.3410   LearningRate 0.0160   Epoch: 11   Global Step: 200220   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-11 20:26:32,185-Speed 3257.64 samples/sec   Loss 1.2866   LearningRate 0.0160   Epoch: 11   Global Step: 200230   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-11 20:26:35,267-Speed 3322.58 samples/sec   Loss 1.3212   LearningRate 0.0160   Epoch: 11   Global Step: 200240   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-11 20:26:38,349-Speed 3324.14 samples/sec   Loss 1.3120   LearningRate 0.0160   Epoch: 11   Global Step: 200250   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:26:41,455-Speed 3296.47 samples/sec   Loss 1.3911   LearningRate 0.0160   Epoch: 11   Global Step: 200260   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:26:44,565-Speed 3293.74 samples/sec   Loss 1.3304   LearningRate 0.0160   Epoch: 11   Global Step: 200270   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:26:47,739-Speed 3227.80 samples/sec   Loss 1.3581   LearningRate 0.0160   Epoch: 11   Global Step: 200280   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:26:51,075-Speed 3069.81 samples/sec   Loss 1.3507   LearningRate 0.0160   Epoch: 11   Global Step: 200290   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:27:27,140-Speed 283.94 samples/sec   Loss 1.0106   LearningRate 0.0160   Epoch: 12   Global Step: 200300   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:27:30,464-Speed 3082.16 samples/sec   Loss 0.9028   LearningRate 0.0160   Epoch: 12   Global Step: 200310   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:27:33,559-Speed 3308.50 samples/sec   Loss 0.9163   LearningRate 0.0160   Epoch: 12   Global Step: 200320   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:27:36,674-Speed 3289.31 samples/sec   Loss 0.8966   LearningRate 0.0160   Epoch: 12   Global Step: 200330   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:27:39,820-Speed 3255.13 samples/sec   Loss 0.9233   LearningRate 0.0160   Epoch: 12   Global Step: 200340   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:27:43,054-Speed 3167.51 samples/sec   Loss 0.9313   LearningRate 0.0160   Epoch: 12   Global Step: 200350   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-11 20:27:46,149-Speed 3308.31 samples/sec   Loss 0.8734   LearningRate 0.0160   Epoch: 12   Global Step: 200360   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-11 20:27:49,295-Speed 3256.19 samples/sec   Loss 0.8849   LearningRate 0.0160   Epoch: 12   Global Step: 200370   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-11 20:27:52,407-Speed 3291.58 samples/sec   Loss 0.8733   LearningRate 0.0160   Epoch: 12   Global Step: 200380   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:27:55,483-Speed 3329.27 samples/sec   Loss 0.9012   LearningRate 0.0160   Epoch: 12   Global Step: 200390   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:27:58,574-Speed 3313.41 samples/sec   Loss 0.8912   LearningRate 0.0160   Epoch: 12   Global Step: 200400   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:28:01,658-Speed 3321.94 samples/sec   Loss 0.8986   LearningRate 0.0160   Epoch: 12   Global Step: 200410   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:28:04,745-Speed 3318.01 samples/sec   Loss 0.8634   LearningRate 0.0160   Epoch: 12   Global Step: 200420   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:28:07,880-Speed 3267.09 samples/sec   Loss 0.8783   LearningRate 0.0160   Epoch: 12   Global Step: 200430   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:28:10,967-Speed 3317.24 samples/sec   Loss 0.8786   LearningRate 0.0160   Epoch: 12   Global Step: 200440   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:28:14,141-Speed 3226.86 samples/sec   Loss 0.9153   LearningRate 0.0160   Epoch: 12   Global Step: 200450   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:28:17,318-Speed 3223.68 samples/sec   Loss 0.8740   LearningRate 0.0160   Epoch: 12   Global Step: 200460   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:28:20,426-Speed 3295.71 samples/sec   Loss 0.9052   LearningRate 0.0160   Epoch: 12   Global Step: 200470   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:28:23,964-Speed 2894.92 samples/sec   Loss 0.8552   LearningRate 0.0160   Epoch: 12   Global Step: 200480   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-11 20:28:27,111-Speed 3254.66 samples/sec   Loss 0.8694   LearningRate 0.0160   Epoch: 12   Global Step: 200490   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-11 20:28:30,560-Speed 2969.86 samples/sec   Loss 0.8806   LearningRate 0.0160   Epoch: 12   Global Step: 200500   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-11 20:28:33,660-Speed 3303.94 samples/sec   Loss 0.8822   LearningRate 0.0159   Epoch: 12   Global Step: 200510   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-11 20:28:36,847-Speed 3214.49 samples/sec   Loss 0.9031   LearningRate 0.0159   Epoch: 12   Global Step: 200520   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-11 20:28:39,932-Speed 3319.42 samples/sec   Loss 0.9224   LearningRate 0.0159   Epoch: 12   Global Step: 200530   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-11 20:28:43,030-Speed 3306.22 samples/sec   Loss 0.8942   LearningRate 0.0159   Epoch: 12   Global Step: 200540   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-11 20:28:46,205-Speed 3226.05 samples/sec   Loss 0.8813   LearningRate 0.0159   Epoch: 12   Global Step: 200550   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:28:49,327-Speed 3280.86 samples/sec   Loss 0.8505   LearningRate 0.0159   Epoch: 12   Global Step: 200560   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:28:52,432-Speed 3298.27 samples/sec   Loss 0.8730   LearningRate 0.0159   Epoch: 12   Global Step: 200570   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:28:55,505-Speed 3333.42 samples/sec   Loss 0.8847   LearningRate 0.0159   Epoch: 12   Global Step: 200580   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-11 20:28:58,584-Speed 3326.54 samples/sec   Loss 0.9114   LearningRate 0.0159   Epoch: 12   Global Step: 200590   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-11 20:29:01,680-Speed 3308.25 samples/sec   Loss 0.9134   LearningRate 0.0159   Epoch: 12   Global Step: 200600   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-11 20:29:04,756-Speed 3330.14 samples/sec   Loss 0.9101   LearningRate 0.0159   Epoch: 12   Global Step: 200610   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-11 20:29:07,846-Speed 3313.91 samples/sec   Loss 0.8888   LearningRate 0.0159   Epoch: 12   Global Step: 200620   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-11 20:29:11,099-Speed 3148.88 samples/sec   Loss 0.9047   LearningRate 0.0159   Epoch: 12   Global Step: 200630   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-11 20:29:14,196-Speed 3307.10 samples/sec   Loss 0.8467   LearningRate 0.0159   Epoch: 12   Global Step: 200640   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-11 20:29:17,294-Speed 3306.66 samples/sec   Loss 0.8935   LearningRate 0.0159   Epoch: 12   Global Step: 200650   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-11 20:29:20,433-Speed 3262.63 samples/sec   Loss 0.8962   LearningRate 0.0159   Epoch: 12   Global Step: 200660   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-11 20:29:23,515-Speed 3322.96 samples/sec   Loss 0.8456   LearningRate 0.0159   Epoch: 12   Global Step: 200670   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-11 20:29:26,630-Speed 3287.98 samples/sec   Loss 0.8837   LearningRate 0.0159   Epoch: 12   Global Step: 200680   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:29:29,769-Speed 3263.05 samples/sec   Loss 0.8953   LearningRate 0.0159   Epoch: 12   Global Step: 200690   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:29:32,897-Speed 3274.69 samples/sec   Loss 0.8592   LearningRate 0.0159   Epoch: 12   Global Step: 200700   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:29:35,997-Speed 3304.02 samples/sec   Loss 0.8813   LearningRate 0.0159   Epoch: 12   Global Step: 200710   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:29:39,097-Speed 3304.44 samples/sec   Loss 0.8908   LearningRate 0.0159   Epoch: 12   Global Step: 200720   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:29:42,267-Speed 3230.62 samples/sec   Loss 0.9404   LearningRate 0.0159   Epoch: 12   Global Step: 200730   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:29:45,388-Speed 3282.11 samples/sec   Loss 0.9072   LearningRate 0.0159   Epoch: 12   Global Step: 200740   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:29:48,465-Speed 3329.11 samples/sec   Loss 0.8872   LearningRate 0.0159   Epoch: 12   Global Step: 200750   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:29:51,643-Speed 3221.81 samples/sec   Loss 0.9260   LearningRate 0.0159   Epoch: 12   Global Step: 200760   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:29:54,898-Speed 3147.21 samples/sec   Loss 0.8728   LearningRate 0.0159   Epoch: 12   Global Step: 200770   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:29:58,077-Speed 3221.78 samples/sec   Loss 0.8543   LearningRate 0.0159   Epoch: 12   Global Step: 200780   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-11 20:30:01,170-Speed 3311.30 samples/sec   Loss 0.8815   LearningRate 0.0159   Epoch: 12   Global Step: 200790   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-11 20:30:04,286-Speed 3286.56 samples/sec   Loss 0.8712   LearningRate 0.0159   Epoch: 12   Global Step: 200800   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-11 20:30:07,470-Speed 3217.60 samples/sec   Loss 0.8883   LearningRate 0.0159   Epoch: 12   Global Step: 200810   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:30:10,547-Speed 3328.76 samples/sec   Loss 0.8994   LearningRate 0.0159   Epoch: 12   Global Step: 200820   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:30:13,630-Speed 3321.72 samples/sec   Loss 0.9327   LearningRate 0.0159   Epoch: 12   Global Step: 200830   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:30:16,707-Speed 3329.28 samples/sec   Loss 0.8865   LearningRate 0.0159   Epoch: 12   Global Step: 200840   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:30:19,802-Speed 3308.50 samples/sec   Loss 0.8989   LearningRate 0.0159   Epoch: 12   Global Step: 200850   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:30:22,909-Speed 3296.70 samples/sec   Loss 0.9018   LearningRate 0.0159   Epoch: 12   Global Step: 200860   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:30:26,021-Speed 3291.63 samples/sec   Loss 0.8454   LearningRate 0.0159   Epoch: 12   Global Step: 200870   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:30:29,116-Speed 3309.31 samples/sec   Loss 0.8736   LearningRate 0.0159   Epoch: 12   Global Step: 200880   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:30:32,284-Speed 3232.50 samples/sec   Loss 0.9022   LearningRate 0.0159   Epoch: 12   Global Step: 200890   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:30:35,406-Speed 3281.08 samples/sec   Loss 0.8902   LearningRate 0.0159   Epoch: 12   Global Step: 200900   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:30:38,617-Speed 3189.66 samples/sec   Loss 0.9174   LearningRate 0.0159   Epoch: 12   Global Step: 200910   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-11 20:30:41,730-Speed 3290.01 samples/sec   Loss 0.9213   LearningRate 0.0158   Epoch: 12   Global Step: 200920   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-11 20:30:44,806-Speed 3330.08 samples/sec   Loss 0.9030   LearningRate 0.0158   Epoch: 12   Global Step: 200930   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-11 20:30:47,952-Speed 3256.03 samples/sec   Loss 0.8833   LearningRate 0.0158   Epoch: 12   Global Step: 200940   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-11 20:30:51,103-Speed 3250.21 samples/sec   Loss 0.9096   LearningRate 0.0158   Epoch: 12   Global Step: 200950   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-11 20:30:54,181-Speed 3326.98 samples/sec   Loss 0.9076   LearningRate 0.0158   Epoch: 12   Global Step: 200960   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-11 20:30:57,334-Speed 3248.76 samples/sec   Loss 0.9207   LearningRate 0.0158   Epoch: 12   Global Step: 200970   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-11 20:31:00,411-Speed 3328.80 samples/sec   Loss 0.8665   LearningRate 0.0158   Epoch: 12   Global Step: 200980   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-11 20:31:03,488-Speed 3329.32 samples/sec   Loss 0.9169   LearningRate 0.0158   Epoch: 12   Global Step: 200990   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-11 20:31:06,600-Speed 3303.19 samples/sec   Loss 0.8846   LearningRate 0.0158   Epoch: 12   Global Step: 201000   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-11 20:31:09,754-Speed 3246.86 samples/sec   Loss 0.8802   LearningRate 0.0158   Epoch: 12   Global Step: 201010   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-11 20:31:12,850-Speed 3308.58 samples/sec   Loss 0.8945   LearningRate 0.0158   Epoch: 12   Global Step: 201020   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-11 20:31:15,969-Speed 3283.53 samples/sec   Loss 0.8998   LearningRate 0.0158   Epoch: 12   Global Step: 201030   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-11 20:31:19,058-Speed 3316.05 samples/sec   Loss 0.8929   LearningRate 0.0158   Epoch: 12   Global Step: 201040   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-11 20:31:22,138-Speed 3324.51 samples/sec   Loss 0.8994   LearningRate 0.0158   Epoch: 12   Global Step: 201050   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-11 20:31:25,353-Speed 3186.58 samples/sec   Loss 0.9255   LearningRate 0.0158   Epoch: 12   Global Step: 201060   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-11 20:31:28,435-Speed 3323.31 samples/sec   Loss 0.8822   LearningRate 0.0158   Epoch: 12   Global Step: 201070   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-11 20:31:31,524-Speed 3316.19 samples/sec   Loss 0.9126   LearningRate 0.0158   Epoch: 12   Global Step: 201080   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:31:34,598-Speed 3331.18 samples/sec   Loss 0.8809   LearningRate 0.0158   Epoch: 12   Global Step: 201090   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:31:37,729-Speed 3271.66 samples/sec   Loss 0.9154   LearningRate 0.0158   Epoch: 12   Global Step: 201100   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:31:40,884-Speed 3245.77 samples/sec   Loss 0.9063   LearningRate 0.0158   Epoch: 12   Global Step: 201110   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:31:43,980-Speed 3308.12 samples/sec   Loss 0.8591   LearningRate 0.0158   Epoch: 12   Global Step: 201120   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:31:47,075-Speed 3309.90 samples/sec   Loss 0.9382   LearningRate 0.0158   Epoch: 12   Global Step: 201130   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:31:50,163-Speed 3317.01 samples/sec   Loss 0.8748   LearningRate 0.0158   Epoch: 12   Global Step: 201140   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:31:53,306-Speed 3258.94 samples/sec   Loss 0.9196   LearningRate 0.0158   Epoch: 12   Global Step: 201150   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:31:56,450-Speed 3257.70 samples/sec   Loss 0.8829   LearningRate 0.0158   Epoch: 12   Global Step: 201160   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:31:59,527-Speed 3328.21 samples/sec   Loss 0.8806   LearningRate 0.0158   Epoch: 12   Global Step: 201170   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:32:02,676-Speed 3252.96 samples/sec   Loss 0.9038   LearningRate 0.0158   Epoch: 12   Global Step: 201180   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-11 20:32:05,763-Speed 3317.76 samples/sec   Loss 0.9474   LearningRate 0.0158   Epoch: 12   Global Step: 201190   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-11 20:32:08,918-Speed 3246.93 samples/sec   Loss 0.9498   LearningRate 0.0158   Epoch: 12   Global Step: 201200   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-11 20:32:12,011-Speed 3311.32 samples/sec   Loss 0.8862   LearningRate 0.0158   Epoch: 12   Global Step: 201210   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-11 20:32:15,177-Speed 3235.00 samples/sec   Loss 0.9194   LearningRate 0.0158   Epoch: 12   Global Step: 201220   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-11 20:32:18,257-Speed 3325.45 samples/sec   Loss 0.9419   LearningRate 0.0158   Epoch: 12   Global Step: 201230   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-11 20:32:21,367-Speed 3293.79 samples/sec   Loss 0.8881   LearningRate 0.0158   Epoch: 12   Global Step: 201240   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-11 20:32:24,498-Speed 3271.59 samples/sec   Loss 0.9018   LearningRate 0.0158   Epoch: 12   Global Step: 201250   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-11 20:32:27,591-Speed 3310.47 samples/sec   Loss 0.9026   LearningRate 0.0158   Epoch: 12   Global Step: 201260   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-11 20:32:30,667-Speed 3330.38 samples/sec   Loss 0.9469   LearningRate 0.0158   Epoch: 12   Global Step: 201270   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-11 20:32:33,759-Speed 3312.46 samples/sec   Loss 0.9392   LearningRate 0.0158   Epoch: 12   Global Step: 201280   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-11 20:32:36,859-Speed 3304.04 samples/sec   Loss 0.8968   LearningRate 0.0158   Epoch: 12   Global Step: 201290   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-11 20:32:39,957-Speed 3306.31 samples/sec   Loss 0.9190   LearningRate 0.0158   Epoch: 12   Global Step: 201300   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-11 20:32:43,045-Speed 3316.31 samples/sec   Loss 0.8864   LearningRate 0.0158   Epoch: 12   Global Step: 201310   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-11 20:32:46,131-Speed 3318.81 samples/sec   Loss 0.9296   LearningRate 0.0158   Epoch: 12   Global Step: 201320   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-11 20:32:49,226-Speed 3310.42 samples/sec   Loss 0.9541   LearningRate 0.0158   Epoch: 12   Global Step: 201330   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-11 20:32:52,427-Speed 3199.65 samples/sec   Loss 0.8952   LearningRate 0.0157   Epoch: 12   Global Step: 201340   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-11 20:32:55,507-Speed 3324.75 samples/sec   Loss 0.9048   LearningRate 0.0157   Epoch: 12   Global Step: 201350   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-11 20:32:58,588-Speed 3324.59 samples/sec   Loss 0.9307   LearningRate 0.0157   Epoch: 12   Global Step: 201360   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-11 20:33:01,698-Speed 3293.38 samples/sec   Loss 0.9278   LearningRate 0.0157   Epoch: 12   Global Step: 201370   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-11 20:33:04,791-Speed 3313.46 samples/sec   Loss 0.8819   LearningRate 0.0157   Epoch: 12   Global Step: 201380   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-11 20:33:07,891-Speed 3304.43 samples/sec   Loss 0.9172   LearningRate 0.0157   Epoch: 12   Global Step: 201390   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-11 20:33:11,010-Speed 3283.56 samples/sec   Loss 0.9103   LearningRate 0.0157   Epoch: 12   Global Step: 201400   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-11 20:33:14,129-Speed 3283.96 samples/sec   Loss 0.9380   LearningRate 0.0157   Epoch: 12   Global Step: 201410   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:33:17,217-Speed 3317.50 samples/sec   Loss 0.9053   LearningRate 0.0157   Epoch: 12   Global Step: 201420   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:33:20,315-Speed 3306.14 samples/sec   Loss 0.9185   LearningRate 0.0157   Epoch: 12   Global Step: 201430   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:33:23,438-Speed 3279.45 samples/sec   Loss 0.9162   LearningRate 0.0157   Epoch: 12   Global Step: 201440   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:33:26,593-Speed 3246.19 samples/sec   Loss 0.9103   LearningRate 0.0157   Epoch: 12   Global Step: 201450   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:33:29,672-Speed 3326.92 samples/sec   Loss 0.9246   LearningRate 0.0157   Epoch: 12   Global Step: 201460   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:33:32,769-Speed 3306.28 samples/sec   Loss 0.9210   LearningRate 0.0157   Epoch: 12   Global Step: 201470   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:33:35,874-Speed 3299.91 samples/sec   Loss 0.8958   LearningRate 0.0157   Epoch: 12   Global Step: 201480   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:33:39,007-Speed 3268.37 samples/sec   Loss 0.9227   LearningRate 0.0157   Epoch: 12   Global Step: 201490   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:33:42,109-Speed 3301.71 samples/sec   Loss 0.9159   LearningRate 0.0157   Epoch: 12   Global Step: 201500   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:33:45,198-Speed 3315.72 samples/sec   Loss 0.9056   LearningRate 0.0157   Epoch: 12   Global Step: 201510   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-11 20:33:48,292-Speed 3310.84 samples/sec   Loss 0.8958   LearningRate 0.0157   Epoch: 12   Global Step: 201520   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-11 20:33:51,455-Speed 3238.17 samples/sec   Loss 0.9000   LearningRate 0.0157   Epoch: 12   Global Step: 201530   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-11 20:33:54,627-Speed 3229.21 samples/sec   Loss 0.8730   LearningRate 0.0157   Epoch: 12   Global Step: 201540   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-11 20:33:57,775-Speed 3252.66 samples/sec   Loss 0.9503   LearningRate 0.0157   Epoch: 12   Global Step: 201550   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-11 20:34:00,866-Speed 3314.50 samples/sec   Loss 0.9093   LearningRate 0.0157   Epoch: 12   Global Step: 201560   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-11 20:34:04,023-Speed 3244.15 samples/sec   Loss 0.9578   LearningRate 0.0157   Epoch: 12   Global Step: 201570   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-11 20:34:07,103-Speed 3325.56 samples/sec   Loss 0.8929   LearningRate 0.0157   Epoch: 12   Global Step: 201580   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-11 20:34:10,205-Speed 3302.28 samples/sec   Loss 0.9385   LearningRate 0.0157   Epoch: 12   Global Step: 201590   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-11 20:34:13,343-Speed 3263.62 samples/sec   Loss 0.9146   LearningRate 0.0157   Epoch: 12   Global Step: 201600   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-11 20:34:16,421-Speed 3327.06 samples/sec   Loss 0.9265   LearningRate 0.0157   Epoch: 12   Global Step: 201610   Fp16 Grad Scale: 262144   Required: 14 hours
Training: 2022-04-11 20:34:19,489-Speed 3338.90 samples/sec   Loss 0.9394   LearningRate 0.0157   Epoch: 12   Global Step: 201620   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-11 20:34:22,570-Speed 3324.22 samples/sec   Loss 0.9006   LearningRate 0.0157   Epoch: 12   Global Step: 201630   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-11 20:34:25,655-Speed 3319.59 samples/sec   Loss 0.9359   LearningRate 0.0157   Epoch: 12   Global Step: 201640   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-11 20:34:28,786-Speed 3271.79 samples/sec   Loss 0.8935   LearningRate 0.0157   Epoch: 12   Global Step: 201650   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-11 20:34:31,864-Speed 3327.93 samples/sec   Loss 0.9061   LearningRate 0.0157   Epoch: 12   Global Step: 201660   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-11 20:34:34,940-Speed 3329.76 samples/sec   Loss 0.9420   LearningRate 0.0157   Epoch: 12   Global Step: 201670   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-11 20:34:38,033-Speed 3312.04 samples/sec   Loss 0.9379   LearningRate 0.0157   Epoch: 12   Global Step: 201680   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:34:41,142-Speed 3294.59 samples/sec   Loss 0.9165   LearningRate 0.0157   Epoch: 12   Global Step: 201690   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:34:44,236-Speed 3309.31 samples/sec   Loss 0.9094   LearningRate 0.0157   Epoch: 12   Global Step: 201700   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:34:47,321-Speed 3320.88 samples/sec   Loss 0.9362   LearningRate 0.0157   Epoch: 12   Global Step: 201710   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:34:50,561-Speed 3160.87 samples/sec   Loss 0.9127   LearningRate 0.0157   Epoch: 12   Global Step: 201720   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:34:53,726-Speed 3235.65 samples/sec   Loss 0.9752   LearningRate 0.0157   Epoch: 12   Global Step: 201730   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:34:56,982-Speed 3146.04 samples/sec   Loss 0.9308   LearningRate 0.0157   Epoch: 12   Global Step: 201740   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:35:00,230-Speed 3153.57 samples/sec   Loss 0.9165   LearningRate 0.0157   Epoch: 12   Global Step: 201750   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:35:03,309-Speed 3327.40 samples/sec   Loss 0.9684   LearningRate 0.0157   Epoch: 12   Global Step: 201760   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:35:06,383-Speed 3331.60 samples/sec   Loss 0.8960   LearningRate 0.0156   Epoch: 12   Global Step: 201770   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:35:09,461-Speed 3327.82 samples/sec   Loss 0.9563   LearningRate 0.0156   Epoch: 12   Global Step: 201780   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-11 20:35:12,537-Speed 3329.73 samples/sec   Loss 0.8992   LearningRate 0.0156   Epoch: 12   Global Step: 201790   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-11 20:35:15,613-Speed 3329.64 samples/sec   Loss 0.8952   LearningRate 0.0156   Epoch: 12   Global Step: 201800   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-11 20:35:18,743-Speed 3271.85 samples/sec   Loss 0.9457   LearningRate 0.0156   Epoch: 12   Global Step: 201810   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-11 20:35:21,822-Speed 3326.59 samples/sec   Loss 0.9035   LearningRate 0.0156   Epoch: 12   Global Step: 201820   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-11 20:35:24,897-Speed 3331.26 samples/sec   Loss 0.9328   LearningRate 0.0156   Epoch: 12   Global Step: 201830   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-11 20:35:27,983-Speed 3319.08 samples/sec   Loss 0.9492   LearningRate 0.0156   Epoch: 12   Global Step: 201840   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-11 20:35:31,066-Speed 3322.03 samples/sec   Loss 0.9648   LearningRate 0.0156   Epoch: 12   Global Step: 201850   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-11 20:35:34,201-Speed 3266.71 samples/sec   Loss 0.8989   LearningRate 0.0156   Epoch: 12   Global Step: 201860   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-11 20:35:37,378-Speed 3224.31 samples/sec   Loss 0.9290   LearningRate 0.0156   Epoch: 12   Global Step: 201870   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-11 20:35:40,462-Speed 3321.20 samples/sec   Loss 0.9105   LearningRate 0.0156   Epoch: 12   Global Step: 201880   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-11 20:35:43,553-Speed 3312.99 samples/sec   Loss 0.9295   LearningRate 0.0156   Epoch: 12   Global Step: 201890   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-11 20:35:46,639-Speed 3319.62 samples/sec   Loss 0.9328   LearningRate 0.0156   Epoch: 12   Global Step: 201900   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-11 20:35:49,743-Speed 3299.78 samples/sec   Loss 0.9270   LearningRate 0.0156   Epoch: 12   Global Step: 201910   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-11 20:35:52,846-Speed 3301.07 samples/sec   Loss 0.9392   LearningRate 0.0156   Epoch: 12   Global Step: 201920   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-11 20:35:55,943-Speed 3306.35 samples/sec   Loss 0.9369   LearningRate 0.0156   Epoch: 12   Global Step: 201930   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-11 20:35:59,022-Speed 3327.09 samples/sec   Loss 0.9079   LearningRate 0.0156   Epoch: 12   Global Step: 201940   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-11 20:36:02,158-Speed 3265.65 samples/sec   Loss 0.9246   LearningRate 0.0156   Epoch: 12   Global Step: 201950   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-11 20:36:05,357-Speed 3202.30 samples/sec   Loss 0.8859   LearningRate 0.0156   Epoch: 12   Global Step: 201960   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-11 20:36:08,502-Speed 3256.62 samples/sec   Loss 0.9249   LearningRate 0.0156   Epoch: 12   Global Step: 201970   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-11 20:36:11,591-Speed 3315.93 samples/sec   Loss 0.9435   LearningRate 0.0156   Epoch: 12   Global Step: 201980   Fp16 Grad Scale: 262144   Required: 14 hours
Training: 2022-04-11 20:36:14,652-Speed 3345.80 samples/sec   Loss 0.9447   LearningRate 0.0156   Epoch: 12   Global Step: 201990   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:36:17,739-Speed 3318.37 samples/sec   Loss 0.9393   LearningRate 0.0156   Epoch: 12   Global Step: 202000   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:37:01,891-[lfw][202000]XNorm: 20.932021
Training: 2022-04-11 20:37:01,892-[lfw][202000]Accuracy-Flip: 0.99800+-0.00267
Training: 2022-04-11 20:37:01,893-[lfw][202000]Accuracy-Highest: 0.99817
Training: 2022-04-11 20:37:53,109-[cfp_fp][202000]XNorm: 21.647951
Training: 2022-04-11 20:37:53,110-[cfp_fp][202000]Accuracy-Flip: 0.98900+-0.00378
Training: 2022-04-11 20:37:53,110-[cfp_fp][202000]Accuracy-Highest: 0.99029
Training: 2022-04-11 20:38:37,351-[agedb_30][202000]XNorm: 22.332799
Training: 2022-04-11 20:38:37,352-[agedb_30][202000]Accuracy-Flip: 0.98317+-0.00647
Training: 2022-04-11 20:38:37,352-[agedb_30][202000]Accuracy-Highest: 0.98567
Training: 2022-04-11 20:38:40,440-Speed 71.76 samples/sec   Loss 0.9545   LearningRate 0.0156   Epoch: 12   Global Step: 202010   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:38:43,515-Speed 3330.62 samples/sec   Loss 0.9341   LearningRate 0.0156   Epoch: 12   Global Step: 202020   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:38:46,603-Speed 3316.58 samples/sec   Loss 0.9524   LearningRate 0.0156   Epoch: 12   Global Step: 202030   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:38:49,670-Speed 3340.05 samples/sec   Loss 0.9450   LearningRate 0.0156   Epoch: 12   Global Step: 202040   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:38:52,780-Speed 3293.15 samples/sec   Loss 0.9564   LearningRate 0.0156   Epoch: 12   Global Step: 202050   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:38:55,852-Speed 3333.24 samples/sec   Loss 0.9502   LearningRate 0.0156   Epoch: 12   Global Step: 202060   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:38:58,922-Speed 3336.95 samples/sec   Loss 0.9706   LearningRate 0.0156   Epoch: 12   Global Step: 202070   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:39:02,051-Speed 3273.09 samples/sec   Loss 0.9194   LearningRate 0.0156   Epoch: 12   Global Step: 202080   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:39:05,123-Speed 3334.68 samples/sec   Loss 0.9068   LearningRate 0.0156   Epoch: 12   Global Step: 202090   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-11 20:39:08,191-Speed 3337.82 samples/sec   Loss 0.9310   LearningRate 0.0156   Epoch: 12   Global Step: 202100   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-11 20:39:11,261-Speed 3336.29 samples/sec   Loss 0.9990   LearningRate 0.0156   Epoch: 12   Global Step: 202110   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-11 20:39:14,335-Speed 3332.42 samples/sec   Loss 0.9715   LearningRate 0.0156   Epoch: 12   Global Step: 202120   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-11 20:39:17,427-Speed 3312.37 samples/sec   Loss 0.8966   LearningRate 0.0156   Epoch: 12   Global Step: 202130   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-11 20:39:20,541-Speed 3289.15 samples/sec   Loss 0.9415   LearningRate 0.0156   Epoch: 12   Global Step: 202140   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-11 20:39:23,610-Speed 3337.80 samples/sec   Loss 0.9241   LearningRate 0.0156   Epoch: 12   Global Step: 202150   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:39:26,847-Speed 3163.75 samples/sec   Loss 0.9762   LearningRate 0.0156   Epoch: 12   Global Step: 202160   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:39:30,039-Speed 3209.16 samples/sec   Loss 0.9449   LearningRate 0.0156   Epoch: 12   Global Step: 202170   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:39:33,126-Speed 3318.10 samples/sec   Loss 0.9729   LearningRate 0.0156   Epoch: 12   Global Step: 202180   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:39:36,196-Speed 3335.29 samples/sec   Loss 0.9293   LearningRate 0.0155   Epoch: 12   Global Step: 202190   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:39:39,298-Speed 3302.50 samples/sec   Loss 0.9550   LearningRate 0.0155   Epoch: 12   Global Step: 202200   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:39:42,369-Speed 3334.53 samples/sec   Loss 0.9225   LearningRate 0.0155   Epoch: 12   Global Step: 202210   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:39:45,586-Speed 3184.00 samples/sec   Loss 0.9474   LearningRate 0.0155   Epoch: 12   Global Step: 202220   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:39:48,816-Speed 3170.51 samples/sec   Loss 0.9482   LearningRate 0.0155   Epoch: 12   Global Step: 202230   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:39:51,886-Speed 3337.27 samples/sec   Loss 0.9506   LearningRate 0.0155   Epoch: 12   Global Step: 202240   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:39:54,960-Speed 3331.44 samples/sec   Loss 0.9248   LearningRate 0.0155   Epoch: 12   Global Step: 202250   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-11 20:39:58,098-Speed 3264.39 samples/sec   Loss 0.9652   LearningRate 0.0155   Epoch: 12   Global Step: 202260   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:40:01,188-Speed 3314.28 samples/sec   Loss 0.9494   LearningRate 0.0155   Epoch: 12   Global Step: 202270   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:40:04,264-Speed 3330.23 samples/sec   Loss 0.9355   LearningRate 0.0155   Epoch: 12   Global Step: 202280   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:40:07,357-Speed 3310.78 samples/sec   Loss 0.9148   LearningRate 0.0155   Epoch: 12   Global Step: 202290   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:40:10,466-Speed 3295.01 samples/sec   Loss 0.9281   LearningRate 0.0155   Epoch: 12   Global Step: 202300   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:40:13,712-Speed 3155.44 samples/sec   Loss 0.9477   LearningRate 0.0155   Epoch: 12   Global Step: 202310   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:40:16,893-Speed 3219.72 samples/sec   Loss 0.9625   LearningRate 0.0155   Epoch: 12   Global Step: 202320   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:40:20,045-Speed 3248.73 samples/sec   Loss 0.9617   LearningRate 0.0155   Epoch: 12   Global Step: 202330   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:40:23,147-Speed 3302.94 samples/sec   Loss 0.8995   LearningRate 0.0155   Epoch: 12   Global Step: 202340   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:40:26,220-Speed 3332.51 samples/sec   Loss 0.9125   LearningRate 0.0155   Epoch: 12   Global Step: 202350   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:40:29,291-Speed 3335.72 samples/sec   Loss 0.9362   LearningRate 0.0155   Epoch: 12   Global Step: 202360   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-11 20:40:32,359-Speed 3337.58 samples/sec   Loss 0.9587   LearningRate 0.0155   Epoch: 12   Global Step: 202370   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-11 20:40:35,457-Speed 3306.71 samples/sec   Loss 1.0073   LearningRate 0.0155   Epoch: 12   Global Step: 202380   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-11 20:40:38,539-Speed 3323.34 samples/sec   Loss 0.9142   LearningRate 0.0155   Epoch: 12   Global Step: 202390   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-11 20:40:41,623-Speed 3320.45 samples/sec   Loss 0.9970   LearningRate 0.0155   Epoch: 12   Global Step: 202400   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:40:44,720-Speed 3307.85 samples/sec   Loss 0.9574   LearningRate 0.0155   Epoch: 12   Global Step: 202410   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:40:47,806-Speed 3318.91 samples/sec   Loss 0.9562   LearningRate 0.0155   Epoch: 12   Global Step: 202420   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:40:50,877-Speed 3335.41 samples/sec   Loss 0.9762   LearningRate 0.0155   Epoch: 12   Global Step: 202430   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:40:53,969-Speed 3312.95 samples/sec   Loss 0.9107   LearningRate 0.0155   Epoch: 12   Global Step: 202440   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:40:57,129-Speed 3240.67 samples/sec   Loss 0.9850   LearningRate 0.0155   Epoch: 12   Global Step: 202450   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:41:00,196-Speed 3339.11 samples/sec   Loss 0.9263   LearningRate 0.0155   Epoch: 12   Global Step: 202460   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:41:03,314-Speed 3285.36 samples/sec   Loss 0.9647   LearningRate 0.0155   Epoch: 12   Global Step: 202470   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:41:06,381-Speed 3339.60 samples/sec   Loss 0.9542   LearningRate 0.0155   Epoch: 12   Global Step: 202480   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:41:09,453-Speed 3333.21 samples/sec   Loss 0.9228   LearningRate 0.0155   Epoch: 12   Global Step: 202490   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:41:12,551-Speed 3306.23 samples/sec   Loss 0.9578   LearningRate 0.0155   Epoch: 12   Global Step: 202500   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-11 20:41:15,640-Speed 3316.26 samples/sec   Loss 0.9067   LearningRate 0.0155   Epoch: 12   Global Step: 202510   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-11 20:41:18,726-Speed 3319.37 samples/sec   Loss 0.9373   LearningRate 0.0155   Epoch: 12   Global Step: 202520   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-11 20:41:21,814-Speed 3317.19 samples/sec   Loss 0.9576   LearningRate 0.0155   Epoch: 12   Global Step: 202530   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-11 20:41:24,897-Speed 3321.73 samples/sec   Loss 0.9509   LearningRate 0.0155   Epoch: 12   Global Step: 202540   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-11 20:41:27,954-Speed 3350.01 samples/sec   Loss 0.9345   LearningRate 0.0155   Epoch: 12   Global Step: 202550   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:41:31,025-Speed 3334.76 samples/sec   Loss 0.9505   LearningRate 0.0155   Epoch: 12   Global Step: 202560   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:41:34,098-Speed 3333.92 samples/sec   Loss 0.9474   LearningRate 0.0155   Epoch: 12   Global Step: 202570   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:41:37,174-Speed 3328.81 samples/sec   Loss 1.0006   LearningRate 0.0155   Epoch: 12   Global Step: 202580   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:41:41,169-Speed 2563.88 samples/sec   Loss 0.9219   LearningRate 0.0155   Epoch: 12   Global Step: 202590   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:41:45,895-Speed 2167.43 samples/sec   Loss 0.9848   LearningRate 0.0155   Epoch: 12   Global Step: 202600   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:41:49,924-Speed 2541.94 samples/sec   Loss 0.9571   LearningRate 0.0154   Epoch: 12   Global Step: 202610   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:41:55,007-Speed 2014.61 samples/sec   Loss 0.9447   LearningRate 0.0154   Epoch: 12   Global Step: 202620   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:41:58,131-Speed 3279.02 samples/sec   Loss 0.9330   LearningRate 0.0154   Epoch: 12   Global Step: 202630   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:42:01,255-Speed 3278.97 samples/sec   Loss 0.9365   LearningRate 0.0154   Epoch: 12   Global Step: 202640   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:42:04,322-Speed 3339.11 samples/sec   Loss 0.9334   LearningRate 0.0154   Epoch: 12   Global Step: 202650   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-11 20:42:07,391-Speed 3338.25 samples/sec   Loss 0.9742   LearningRate 0.0154   Epoch: 12   Global Step: 202660   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-11 20:42:10,455-Speed 3341.82 samples/sec   Loss 1.0147   LearningRate 0.0154   Epoch: 12   Global Step: 202670   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-11 20:42:13,522-Speed 3339.52 samples/sec   Loss 0.9787   LearningRate 0.0154   Epoch: 12   Global Step: 202680   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-11 20:42:16,584-Speed 3345.74 samples/sec   Loss 1.0207   LearningRate 0.0154   Epoch: 12   Global Step: 202690   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-11 20:42:19,625-Speed 3367.83 samples/sec   Loss 0.9532   LearningRate 0.0154   Epoch: 12   Global Step: 202700   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-11 20:42:22,771-Speed 3257.43 samples/sec   Loss 0.9856   LearningRate 0.0154   Epoch: 12   Global Step: 202710   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-11 20:42:25,848-Speed 3328.54 samples/sec   Loss 0.9629   LearningRate 0.0154   Epoch: 12   Global Step: 202720   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-11 20:42:28,917-Speed 3336.89 samples/sec   Loss 0.9645   LearningRate 0.0154   Epoch: 12   Global Step: 202730   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-11 20:42:31,993-Speed 3330.03 samples/sec   Loss 0.9999   LearningRate 0.0154   Epoch: 12   Global Step: 202740   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-11 20:42:35,054-Speed 3345.93 samples/sec   Loss 0.9695   LearningRate 0.0154   Epoch: 12   Global Step: 202750   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-11 20:42:38,130-Speed 3329.86 samples/sec   Loss 0.9794   LearningRate 0.0154   Epoch: 12   Global Step: 202760   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-11 20:42:41,192-Speed 3344.54 samples/sec   Loss 0.9665   LearningRate 0.0154   Epoch: 12   Global Step: 202770   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-11 20:42:44,256-Speed 3343.22 samples/sec   Loss 0.9661   LearningRate 0.0154   Epoch: 12   Global Step: 202780   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-11 20:42:47,324-Speed 3339.06 samples/sec   Loss 0.9813   LearningRate 0.0154   Epoch: 12   Global Step: 202790   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-11 20:42:50,385-Speed 3347.41 samples/sec   Loss 0.9806   LearningRate 0.0154   Epoch: 12   Global Step: 202800   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:42:53,481-Speed 3307.19 samples/sec   Loss 0.9367   LearningRate 0.0154   Epoch: 12   Global Step: 202810   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:42:56,566-Speed 3320.96 samples/sec   Loss 0.9516   LearningRate 0.0154   Epoch: 12   Global Step: 202820   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:42:59,633-Speed 3339.93 samples/sec   Loss 0.9424   LearningRate 0.0154   Epoch: 12   Global Step: 202830   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:43:02,701-Speed 3338.11 samples/sec   Loss 0.9917   LearningRate 0.0154   Epoch: 12   Global Step: 202840   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:43:05,765-Speed 3342.59 samples/sec   Loss 0.9561   LearningRate 0.0154   Epoch: 12   Global Step: 202850   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:43:08,954-Speed 3211.78 samples/sec   Loss 0.9485   LearningRate 0.0154   Epoch: 12   Global Step: 202860   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:43:12,057-Speed 3301.23 samples/sec   Loss 0.9794   LearningRate 0.0154   Epoch: 12   Global Step: 202870   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:43:15,119-Speed 3344.47 samples/sec   Loss 0.9462   LearningRate 0.0154   Epoch: 12   Global Step: 202880   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:43:18,185-Speed 3340.14 samples/sec   Loss 1.0026   LearningRate 0.0154   Epoch: 12   Global Step: 202890   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:43:21,241-Speed 3352.28 samples/sec   Loss 0.9583   LearningRate 0.0154   Epoch: 12   Global Step: 202900   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:43:24,311-Speed 3335.84 samples/sec   Loss 0.9858   LearningRate 0.0154   Epoch: 12   Global Step: 202910   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:43:27,396-Speed 3320.45 samples/sec   Loss 0.9396   LearningRate 0.0154   Epoch: 12   Global Step: 202920   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:43:30,482-Speed 3318.98 samples/sec   Loss 1.0492   LearningRate 0.0154   Epoch: 12   Global Step: 202930   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:43:33,554-Speed 3334.16 samples/sec   Loss 0.9499   LearningRate 0.0154   Epoch: 12   Global Step: 202940   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:43:36,618-Speed 3342.98 samples/sec   Loss 0.9531   LearningRate 0.0154   Epoch: 12   Global Step: 202950   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:43:39,777-Speed 3242.50 samples/sec   Loss 0.9575   LearningRate 0.0154   Epoch: 12   Global Step: 202960   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:43:43,009-Speed 3168.63 samples/sec   Loss 0.9779   LearningRate 0.0154   Epoch: 12   Global Step: 202970   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:43:46,249-Speed 3160.78 samples/sec   Loss 0.9751   LearningRate 0.0154   Epoch: 12   Global Step: 202980   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:43:49,500-Speed 3150.86 samples/sec   Loss 0.9892   LearningRate 0.0154   Epoch: 12   Global Step: 202990   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:43:52,565-Speed 3342.01 samples/sec   Loss 0.9510   LearningRate 0.0154   Epoch: 12   Global Step: 203000   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-11 20:43:55,632-Speed 3340.40 samples/sec   Loss 0.9625   LearningRate 0.0154   Epoch: 12   Global Step: 203010   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-11 20:43:58,694-Speed 3344.63 samples/sec   Loss 0.9986   LearningRate 0.0154   Epoch: 12   Global Step: 203020   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-11 20:44:01,766-Speed 3334.15 samples/sec   Loss 0.9896   LearningRate 0.0154   Epoch: 12   Global Step: 203030   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-11 20:44:04,936-Speed 3230.53 samples/sec   Loss 0.9778   LearningRate 0.0153   Epoch: 12   Global Step: 203040   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-11 20:44:08,019-Speed 3322.45 samples/sec   Loss 0.9545   LearningRate 0.0153   Epoch: 12   Global Step: 203050   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-11 20:44:11,088-Speed 3336.96 samples/sec   Loss 0.9729   LearningRate 0.0153   Epoch: 12   Global Step: 203060   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-11 20:44:14,187-Speed 3305.68 samples/sec   Loss 0.9407   LearningRate 0.0153   Epoch: 12   Global Step: 203070   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-11 20:44:17,250-Speed 3343.66 samples/sec   Loss 0.9483   LearningRate 0.0153   Epoch: 12   Global Step: 203080   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-11 20:44:20,340-Speed 3314.71 samples/sec   Loss 0.9760   LearningRate 0.0153   Epoch: 12   Global Step: 203090   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-11 20:44:23,395-Speed 3353.18 samples/sec   Loss 0.9584   LearningRate 0.0153   Epoch: 12   Global Step: 203100   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-11 20:44:26,458-Speed 3344.06 samples/sec   Loss 0.9653   LearningRate 0.0153   Epoch: 12   Global Step: 203110   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-11 20:44:29,527-Speed 3337.16 samples/sec   Loss 0.9621   LearningRate 0.0153   Epoch: 12   Global Step: 203120   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-11 20:44:32,592-Speed 3340.99 samples/sec   Loss 0.9643   LearningRate 0.0153   Epoch: 12   Global Step: 203130   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-11 20:44:35,675-Speed 3322.09 samples/sec   Loss 0.9396   LearningRate 0.0153   Epoch: 12   Global Step: 203140   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-11 20:44:38,766-Speed 3314.38 samples/sec   Loss 0.9683   LearningRate 0.0153   Epoch: 12   Global Step: 203150   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-11 20:44:41,832-Speed 3339.78 samples/sec   Loss 0.9364   LearningRate 0.0153   Epoch: 12   Global Step: 203160   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-11 20:44:44,898-Speed 3341.16 samples/sec   Loss 0.9495   LearningRate 0.0153   Epoch: 12   Global Step: 203170   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-11 20:44:47,953-Speed 3352.59 samples/sec   Loss 0.9745   LearningRate 0.0153   Epoch: 12   Global Step: 203180   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:44:51,019-Speed 3341.57 samples/sec   Loss 1.0057   LearningRate 0.0153   Epoch: 12   Global Step: 203190   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:44:54,110-Speed 3313.35 samples/sec   Loss 1.0016   LearningRate 0.0153   Epoch: 12   Global Step: 203200   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:44:57,177-Speed 3339.17 samples/sec   Loss 0.9751   LearningRate 0.0153   Epoch: 12   Global Step: 203210   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:45:00,256-Speed 3326.18 samples/sec   Loss 0.9432   LearningRate 0.0153   Epoch: 12   Global Step: 203220   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:45:03,333-Speed 3329.05 samples/sec   Loss 1.0163   LearningRate 0.0153   Epoch: 12   Global Step: 203230   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:45:06,405-Speed 3333.60 samples/sec   Loss 0.9876   LearningRate 0.0153   Epoch: 12   Global Step: 203240   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:45:09,479-Speed 3332.82 samples/sec   Loss 0.9841   LearningRate 0.0153   Epoch: 12   Global Step: 203250   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:45:12,544-Speed 3341.25 samples/sec   Loss 0.9524   LearningRate 0.0153   Epoch: 12   Global Step: 203260   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:45:15,617-Speed 3333.58 samples/sec   Loss 1.0174   LearningRate 0.0153   Epoch: 12   Global Step: 203270   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:45:18,681-Speed 3342.57 samples/sec   Loss 0.9689   LearningRate 0.0153   Epoch: 12   Global Step: 203280   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-11 20:45:21,766-Speed 3320.36 samples/sec   Loss 0.9919   LearningRate 0.0153   Epoch: 12   Global Step: 203290   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-11 20:45:24,930-Speed 3237.01 samples/sec   Loss 0.9772   LearningRate 0.0153   Epoch: 12   Global Step: 203300   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-11 20:45:28,028-Speed 3306.17 samples/sec   Loss 0.9275   LearningRate 0.0153   Epoch: 12   Global Step: 203310   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-11 20:45:31,083-Speed 3352.65 samples/sec   Loss 1.0191   LearningRate 0.0153   Epoch: 12   Global Step: 203320   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:45:34,150-Speed 3339.05 samples/sec   Loss 0.9520   LearningRate 0.0153   Epoch: 12   Global Step: 203330   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:45:37,236-Speed 3319.44 samples/sec   Loss 0.9697   LearningRate 0.0153   Epoch: 12   Global Step: 203340   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:45:40,300-Speed 3343.51 samples/sec   Loss 1.0156   LearningRate 0.0153   Epoch: 12   Global Step: 203350   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:45:43,471-Speed 3230.03 samples/sec   Loss 1.0028   LearningRate 0.0153   Epoch: 12   Global Step: 203360   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:45:46,587-Speed 3286.80 samples/sec   Loss 0.9451   LearningRate 0.0153   Epoch: 12   Global Step: 203370   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:45:49,701-Speed 3288.61 samples/sec   Loss 0.9869   LearningRate 0.0153   Epoch: 12   Global Step: 203380   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:45:52,851-Speed 3251.25 samples/sec   Loss 1.0051   LearningRate 0.0153   Epoch: 12   Global Step: 203390   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:45:55,924-Speed 3333.63 samples/sec   Loss 1.0019   LearningRate 0.0153   Epoch: 12   Global Step: 203400   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:45:58,998-Speed 3332.12 samples/sec   Loss 1.0124   LearningRate 0.0153   Epoch: 12   Global Step: 203410   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:46:02,065-Speed 3339.43 samples/sec   Loss 0.9688   LearningRate 0.0153   Epoch: 12   Global Step: 203420   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-11 20:46:05,210-Speed 3256.69 samples/sec   Loss 1.0231   LearningRate 0.0153   Epoch: 12   Global Step: 203430   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-11 20:46:08,289-Speed 3326.80 samples/sec   Loss 0.9920   LearningRate 0.0153   Epoch: 12   Global Step: 203440   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-11 20:46:11,359-Speed 3336.02 samples/sec   Loss 0.9428   LearningRate 0.0153   Epoch: 12   Global Step: 203450   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-11 20:46:14,443-Speed 3321.89 samples/sec   Loss 0.9882   LearningRate 0.0152   Epoch: 12   Global Step: 203460   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-11 20:46:17,513-Speed 3335.31 samples/sec   Loss 0.9786   LearningRate 0.0152   Epoch: 12   Global Step: 203470   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-11 20:46:20,646-Speed 3269.33 samples/sec   Loss 0.9570   LearningRate 0.0152   Epoch: 12   Global Step: 203480   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-11 20:46:23,771-Speed 3277.77 samples/sec   Loss 0.9544   LearningRate 0.0152   Epoch: 12   Global Step: 203490   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-11 20:46:26,831-Speed 3347.06 samples/sec   Loss 0.9984   LearningRate 0.0152   Epoch: 12   Global Step: 203500   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:46:29,903-Speed 3334.97 samples/sec   Loss 0.9471   LearningRate 0.0152   Epoch: 12   Global Step: 203510   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:46:32,988-Speed 3319.07 samples/sec   Loss 1.0333   LearningRate 0.0152   Epoch: 12   Global Step: 203520   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:46:36,062-Speed 3332.67 samples/sec   Loss 1.0008   LearningRate 0.0152   Epoch: 12   Global Step: 203530   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:46:39,131-Speed 3337.33 samples/sec   Loss 0.9743   LearningRate 0.0152   Epoch: 12   Global Step: 203540   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:46:42,196-Speed 3342.07 samples/sec   Loss 1.0256   LearningRate 0.0152   Epoch: 12   Global Step: 203550   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:46:45,280-Speed 3321.04 samples/sec   Loss 0.9783   LearningRate 0.0152   Epoch: 12   Global Step: 203560   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:46:48,400-Speed 3282.45 samples/sec   Loss 0.9905   LearningRate 0.0152   Epoch: 12   Global Step: 203570   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:46:51,483-Speed 3322.19 samples/sec   Loss 1.0243   LearningRate 0.0152   Epoch: 12   Global Step: 203580   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:46:54,568-Speed 3319.76 samples/sec   Loss 0.9438   LearningRate 0.0152   Epoch: 12   Global Step: 203590   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:46:57,635-Speed 3340.20 samples/sec   Loss 0.9770   LearningRate 0.0152   Epoch: 12   Global Step: 203600   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-11 20:47:00,835-Speed 3200.41 samples/sec   Loss 0.9759   LearningRate 0.0152   Epoch: 12   Global Step: 203610   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-11 20:47:03,931-Speed 3308.71 samples/sec   Loss 1.0138   LearningRate 0.0152   Epoch: 12   Global Step: 203620   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-11 20:47:07,033-Speed 3301.58 samples/sec   Loss 1.0381   LearningRate 0.0152   Epoch: 12   Global Step: 203630   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-11 20:47:10,132-Speed 3305.01 samples/sec   Loss 1.0085   LearningRate 0.0152   Epoch: 12   Global Step: 203640   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-11 20:47:13,242-Speed 3293.17 samples/sec   Loss 0.9988   LearningRate 0.0152   Epoch: 12   Global Step: 203650   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-11 20:47:16,330-Speed 3316.13 samples/sec   Loss 0.9927   LearningRate 0.0152   Epoch: 12   Global Step: 203660   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-11 20:47:19,414-Speed 3321.83 samples/sec   Loss 1.0146   LearningRate 0.0152   Epoch: 12   Global Step: 203670   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-11 20:47:22,522-Speed 3295.50 samples/sec   Loss 1.0496   LearningRate 0.0152   Epoch: 12   Global Step: 203680   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-11 20:47:25,588-Speed 3340.35 samples/sec   Loss 0.9941   LearningRate 0.0152   Epoch: 12   Global Step: 203690   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-11 20:47:28,685-Speed 3307.13 samples/sec   Loss 0.9897   LearningRate 0.0152   Epoch: 12   Global Step: 203700   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-11 20:47:31,740-Speed 3353.31 samples/sec   Loss 1.0026   LearningRate 0.0152   Epoch: 12   Global Step: 203710   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:47:34,853-Speed 3289.66 samples/sec   Loss 0.9808   LearningRate 0.0152   Epoch: 12   Global Step: 203720   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:47:37,953-Speed 3305.02 samples/sec   Loss 1.0373   LearningRate 0.0152   Epoch: 12   Global Step: 203730   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:47:41,056-Speed 3300.33 samples/sec   Loss 0.9887   LearningRate 0.0152   Epoch: 12   Global Step: 203740   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:47:44,132-Speed 3328.92 samples/sec   Loss 1.0065   LearningRate 0.0152   Epoch: 12   Global Step: 203750   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:47:47,203-Speed 3335.54 samples/sec   Loss 0.9918   LearningRate 0.0152   Epoch: 12   Global Step: 203760   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:47:50,272-Speed 3338.00 samples/sec   Loss 0.9912   LearningRate 0.0152   Epoch: 12   Global Step: 203770   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:47:53,370-Speed 3305.86 samples/sec   Loss 0.9955   LearningRate 0.0152   Epoch: 12   Global Step: 203780   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:47:56,458-Speed 3316.50 samples/sec   Loss 0.9664   LearningRate 0.0152   Epoch: 12   Global Step: 203790   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:47:59,556-Speed 3306.77 samples/sec   Loss 0.9647   LearningRate 0.0152   Epoch: 12   Global Step: 203800   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:48:02,628-Speed 3334.11 samples/sec   Loss 1.0002   LearningRate 0.0152   Epoch: 12   Global Step: 203810   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-11 20:48:05,693-Speed 3341.85 samples/sec   Loss 0.9986   LearningRate 0.0152   Epoch: 12   Global Step: 203820   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-11 20:48:08,764-Speed 3334.81 samples/sec   Loss 1.0024   LearningRate 0.0152   Epoch: 12   Global Step: 203830   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-11 20:48:11,836-Speed 3334.21 samples/sec   Loss 0.9737   LearningRate 0.0152   Epoch: 12   Global Step: 203840   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-11 20:48:14,990-Speed 3246.76 samples/sec   Loss 1.0010   LearningRate 0.0152   Epoch: 12   Global Step: 203850   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-11 20:48:18,087-Speed 3308.05 samples/sec   Loss 1.0071   LearningRate 0.0152   Epoch: 12   Global Step: 203860   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-11 20:48:21,151-Speed 3343.27 samples/sec   Loss 1.0082   LearningRate 0.0152   Epoch: 12   Global Step: 203870   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-11 20:48:24,205-Speed 3352.70 samples/sec   Loss 1.0433   LearningRate 0.0152   Epoch: 12   Global Step: 203880   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:48:27,294-Speed 3316.09 samples/sec   Loss 1.0096   LearningRate 0.0151   Epoch: 12   Global Step: 203890   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:48:30,380-Speed 3319.05 samples/sec   Loss 0.9686   LearningRate 0.0151   Epoch: 12   Global Step: 203900   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:48:33,452-Speed 3334.28 samples/sec   Loss 1.0156   LearningRate 0.0151   Epoch: 12   Global Step: 203910   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:48:36,538-Speed 3318.73 samples/sec   Loss 0.9968   LearningRate 0.0151   Epoch: 12   Global Step: 203920   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:48:39,613-Speed 3330.66 samples/sec   Loss 1.0048   LearningRate 0.0151   Epoch: 12   Global Step: 203930   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:48:42,700-Speed 3318.24 samples/sec   Loss 0.9688   LearningRate 0.0151   Epoch: 12   Global Step: 203940   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:48:45,784-Speed 3321.43 samples/sec   Loss 1.0073   LearningRate 0.0151   Epoch: 12   Global Step: 203950   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:48:48,886-Speed 3301.84 samples/sec   Loss 1.0075   LearningRate 0.0151   Epoch: 12   Global Step: 203960   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:48:51,974-Speed 3317.39 samples/sec   Loss 0.9666   LearningRate 0.0151   Epoch: 12   Global Step: 203970   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:48:55,206-Speed 3168.61 samples/sec   Loss 0.9946   LearningRate 0.0151   Epoch: 12   Global Step: 203980   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-11 20:48:58,323-Speed 3285.93 samples/sec   Loss 1.0245   LearningRate 0.0151   Epoch: 12   Global Step: 203990   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-11 20:49:01,393-Speed 3336.50 samples/sec   Loss 1.0021   LearningRate 0.0151   Epoch: 12   Global Step: 204000   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-11 20:49:45,850-[lfw][204000]XNorm: 22.520686
Training: 2022-04-11 20:49:45,851-[lfw][204000]Accuracy-Flip: 0.99817+-0.00273
Training: 2022-04-11 20:49:45,851-[lfw][204000]Accuracy-Highest: 0.99817
Training: 2022-04-11 20:50:37,443-[cfp_fp][204000]XNorm: 22.766375
Training: 2022-04-11 20:50:37,444-[cfp_fp][204000]Accuracy-Flip: 0.99086+-0.00543
Training: 2022-04-11 20:50:37,444-[cfp_fp][204000]Accuracy-Highest: 0.99086
Training: 2022-04-11 20:51:21,923-[agedb_30][204000]XNorm: 23.431673
Training: 2022-04-11 20:51:21,924-[agedb_30][204000]Accuracy-Flip: 0.98267+-0.00764
Training: 2022-04-11 20:51:21,924-[agedb_30][204000]Accuracy-Highest: 0.98567
Training: 2022-04-11 20:51:25,001-Speed 71.31 samples/sec   Loss 0.9767   LearningRate 0.0151   Epoch: 12   Global Step: 204010   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:51:28,064-Speed 3343.00 samples/sec   Loss 1.0326   LearningRate 0.0151   Epoch: 12   Global Step: 204020   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:51:31,132-Speed 3339.45 samples/sec   Loss 0.9868   LearningRate 0.0151   Epoch: 12   Global Step: 204030   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:51:34,191-Speed 3347.88 samples/sec   Loss 1.0034   LearningRate 0.0151   Epoch: 12   Global Step: 204040   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:51:37,252-Speed 3346.30 samples/sec   Loss 1.0221   LearningRate 0.0151   Epoch: 12   Global Step: 204050   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:51:40,314-Speed 3344.69 samples/sec   Loss 1.0518   LearningRate 0.0151   Epoch: 12   Global Step: 204060   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:51:43,385-Speed 3335.26 samples/sec   Loss 0.9798   LearningRate 0.0151   Epoch: 12   Global Step: 204070   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:51:46,468-Speed 3321.88 samples/sec   Loss 1.0014   LearningRate 0.0151   Epoch: 12   Global Step: 204080   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:51:49,547-Speed 3326.98 samples/sec   Loss 0.9886   LearningRate 0.0151   Epoch: 12   Global Step: 204090   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:51:52,634-Speed 3318.18 samples/sec   Loss 1.0127   LearningRate 0.0151   Epoch: 12   Global Step: 204100   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:51:55,699-Speed 3341.72 samples/sec   Loss 0.9986   LearningRate 0.0151   Epoch: 12   Global Step: 204110   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-11 20:51:58,755-Speed 3351.13 samples/sec   Loss 1.0565   LearningRate 0.0151   Epoch: 12   Global Step: 204120   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-11 20:52:01,816-Speed 3346.87 samples/sec   Loss 1.0345   LearningRate 0.0151   Epoch: 12   Global Step: 204130   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-11 20:52:04,900-Speed 3320.86 samples/sec   Loss 1.0161   LearningRate 0.0151   Epoch: 12   Global Step: 204140   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-11 20:52:07,953-Speed 3354.32 samples/sec   Loss 1.0268   LearningRate 0.0151   Epoch: 12   Global Step: 204150   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:52:11,186-Speed 3168.59 samples/sec   Loss 1.0046   LearningRate 0.0151   Epoch: 12   Global Step: 204160   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:52:14,285-Speed 3304.75 samples/sec   Loss 1.0579   LearningRate 0.0151   Epoch: 12   Global Step: 204170   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:52:17,363-Speed 3327.67 samples/sec   Loss 0.9597   LearningRate 0.0151   Epoch: 12   Global Step: 204180   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:52:20,477-Speed 3288.93 samples/sec   Loss 1.0070   LearningRate 0.0151   Epoch: 12   Global Step: 204190   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:52:23,574-Speed 3307.53 samples/sec   Loss 0.9642   LearningRate 0.0151   Epoch: 12   Global Step: 204200   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:52:26,650-Speed 3329.41 samples/sec   Loss 1.0341   LearningRate 0.0151   Epoch: 12   Global Step: 204210   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:52:29,713-Speed 3344.38 samples/sec   Loss 1.0153   LearningRate 0.0151   Epoch: 12   Global Step: 204220   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:52:32,802-Speed 3315.94 samples/sec   Loss 1.0038   LearningRate 0.0151   Epoch: 12   Global Step: 204230   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:52:35,870-Speed 3337.96 samples/sec   Loss 0.9779   LearningRate 0.0151   Epoch: 12   Global Step: 204240   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:52:38,929-Speed 3347.95 samples/sec   Loss 0.9902   LearningRate 0.0151   Epoch: 12   Global Step: 204250   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-11 20:52:42,005-Speed 3329.97 samples/sec   Loss 0.9765   LearningRate 0.0151   Epoch: 12   Global Step: 204260   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-11 20:52:45,068-Speed 3343.56 samples/sec   Loss 0.9742   LearningRate 0.0151   Epoch: 12   Global Step: 204270   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:52:48,156-Speed 3318.32 samples/sec   Loss 0.9703   LearningRate 0.0151   Epoch: 12   Global Step: 204280   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:52:51,218-Speed 3344.51 samples/sec   Loss 1.0488   LearningRate 0.0151   Epoch: 12   Global Step: 204290   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:52:54,305-Speed 3318.31 samples/sec   Loss 1.1031   LearningRate 0.0151   Epoch: 12   Global Step: 204300   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:52:57,379-Speed 3331.65 samples/sec   Loss 1.0270   LearningRate 0.0151   Epoch: 12   Global Step: 204310   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:53:00,487-Speed 3296.14 samples/sec   Loss 1.0048   LearningRate 0.0150   Epoch: 12   Global Step: 204320   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:53:03,613-Speed 3275.82 samples/sec   Loss 1.0300   LearningRate 0.0150   Epoch: 12   Global Step: 204330   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:53:06,707-Speed 3310.34 samples/sec   Loss 0.9945   LearningRate 0.0150   Epoch: 12   Global Step: 204340   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:53:09,811-Speed 3300.08 samples/sec   Loss 1.0029   LearningRate 0.0150   Epoch: 12   Global Step: 204350   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:53:12,875-Speed 3343.15 samples/sec   Loss 0.9716   LearningRate 0.0150   Epoch: 12   Global Step: 204360   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:53:15,954-Speed 3326.77 samples/sec   Loss 1.0272   LearningRate 0.0150   Epoch: 12   Global Step: 204370   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-11 20:53:19,032-Speed 3328.24 samples/sec   Loss 0.9640   LearningRate 0.0150   Epoch: 12   Global Step: 204380   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-11 20:53:22,087-Speed 3351.62 samples/sec   Loss 1.0302   LearningRate 0.0150   Epoch: 12   Global Step: 204390   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:53:25,148-Speed 3346.90 samples/sec   Loss 1.0206   LearningRate 0.0150   Epoch: 12   Global Step: 204400   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:53:28,230-Speed 3323.39 samples/sec   Loss 0.9871   LearningRate 0.0150   Epoch: 12   Global Step: 204410   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:53:31,292-Speed 3344.87 samples/sec   Loss 1.0133   LearningRate 0.0150   Epoch: 12   Global Step: 204420   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:53:34,418-Speed 3276.11 samples/sec   Loss 0.9761   LearningRate 0.0150   Epoch: 12   Global Step: 204430   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:53:37,550-Speed 3270.74 samples/sec   Loss 1.0349   LearningRate 0.0150   Epoch: 12   Global Step: 204440   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:53:40,783-Speed 3167.20 samples/sec   Loss 1.0221   LearningRate 0.0150   Epoch: 12   Global Step: 204450   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:53:43,888-Speed 3299.56 samples/sec   Loss 1.0088   LearningRate 0.0150   Epoch: 12   Global Step: 204460   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:53:46,994-Speed 3297.44 samples/sec   Loss 1.0314   LearningRate 0.0150   Epoch: 12   Global Step: 204470   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:53:50,098-Speed 3299.67 samples/sec   Loss 1.0271   LearningRate 0.0150   Epoch: 12   Global Step: 204480   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:53:53,171-Speed 3333.19 samples/sec   Loss 1.0241   LearningRate 0.0150   Epoch: 12   Global Step: 204490   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-11 20:53:56,234-Speed 3344.46 samples/sec   Loss 0.9995   LearningRate 0.0150   Epoch: 12   Global Step: 204500   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-11 20:53:59,303-Speed 3336.48 samples/sec   Loss 1.0379   LearningRate 0.0150   Epoch: 12   Global Step: 204510   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-11 20:54:02,378-Speed 3331.18 samples/sec   Loss 1.0252   LearningRate 0.0150   Epoch: 12   Global Step: 204520   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-11 20:54:05,452-Speed 3331.65 samples/sec   Loss 1.0518   LearningRate 0.0150   Epoch: 12   Global Step: 204530   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-11 20:54:08,551-Speed 3305.95 samples/sec   Loss 1.0350   LearningRate 0.0150   Epoch: 12   Global Step: 204540   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-11 20:54:11,612-Speed 3345.64 samples/sec   Loss 1.0235   LearningRate 0.0150   Epoch: 12   Global Step: 204550   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-11 20:54:14,672-Speed 3347.36 samples/sec   Loss 1.0362   LearningRate 0.0150   Epoch: 12   Global Step: 204560   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-11 20:54:17,737-Speed 3341.71 samples/sec   Loss 1.0011   LearningRate 0.0150   Epoch: 12   Global Step: 204570   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-11 20:54:20,801-Speed 3343.09 samples/sec   Loss 0.9992   LearningRate 0.0150   Epoch: 12   Global Step: 204580   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-11 20:54:23,863-Speed 3344.43 samples/sec   Loss 1.0656   LearningRate 0.0150   Epoch: 12   Global Step: 204590   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-11 20:54:26,936-Speed 3333.27 samples/sec   Loss 1.0039   LearningRate 0.0150   Epoch: 12   Global Step: 204600   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-11 20:54:29,994-Speed 3349.95 samples/sec   Loss 0.9942   LearningRate 0.0150   Epoch: 12   Global Step: 204610   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-11 20:54:33,096-Speed 3301.71 samples/sec   Loss 1.0419   LearningRate 0.0150   Epoch: 12   Global Step: 204620   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-11 20:54:36,156-Speed 3347.38 samples/sec   Loss 1.0241   LearningRate 0.0150   Epoch: 12   Global Step: 204630   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-11 20:54:39,215-Speed 3347.71 samples/sec   Loss 0.9970   LearningRate 0.0150   Epoch: 12   Global Step: 204640   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:54:42,278-Speed 3344.16 samples/sec   Loss 1.0253   LearningRate 0.0150   Epoch: 12   Global Step: 204650   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:54:45,343-Speed 3341.19 samples/sec   Loss 0.9925   LearningRate 0.0150   Epoch: 12   Global Step: 204660   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:54:48,406-Speed 3344.38 samples/sec   Loss 1.0600   LearningRate 0.0150   Epoch: 12   Global Step: 204670   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:54:51,466-Speed 3347.52 samples/sec   Loss 0.9970   LearningRate 0.0150   Epoch: 12   Global Step: 204680   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:54:54,528-Speed 3344.73 samples/sec   Loss 1.0396   LearningRate 0.0150   Epoch: 12   Global Step: 204690   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:54:57,600-Speed 3333.30 samples/sec   Loss 1.0252   LearningRate 0.0150   Epoch: 12   Global Step: 204700   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:55:00,673-Speed 3333.61 samples/sec   Loss 1.0342   LearningRate 0.0150   Epoch: 12   Global Step: 204710   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:55:03,733-Speed 3346.69 samples/sec   Loss 1.0202   LearningRate 0.0150   Epoch: 12   Global Step: 204720   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:55:06,808-Speed 3331.31 samples/sec   Loss 1.0243   LearningRate 0.0150   Epoch: 12   Global Step: 204730   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:55:09,871-Speed 3344.31 samples/sec   Loss 0.9898   LearningRate 0.0150   Epoch: 12   Global Step: 204740   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-11 20:55:12,959-Speed 3316.70 samples/sec   Loss 1.0585   LearningRate 0.0149   Epoch: 12   Global Step: 204750   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-11 20:55:16,036-Speed 3328.51 samples/sec   Loss 0.9811   LearningRate 0.0149   Epoch: 12   Global Step: 204760   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-11 20:55:19,115-Speed 3326.93 samples/sec   Loss 1.0068   LearningRate 0.0149   Epoch: 12   Global Step: 204770   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-11 20:55:22,177-Speed 3344.53 samples/sec   Loss 1.0177   LearningRate 0.0149   Epoch: 12   Global Step: 204780   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-11 20:55:25,258-Speed 3324.82 samples/sec   Loss 0.9974   LearningRate 0.0149   Epoch: 12   Global Step: 204790   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-11 20:55:28,323-Speed 3341.79 samples/sec   Loss 0.9718   LearningRate 0.0149   Epoch: 12   Global Step: 204800   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-11 20:55:31,394-Speed 3335.04 samples/sec   Loss 1.0002   LearningRate 0.0149   Epoch: 12   Global Step: 204810   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-11 20:55:34,466-Speed 3334.79 samples/sec   Loss 1.0352   LearningRate 0.0149   Epoch: 12   Global Step: 204820   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-11 20:55:37,631-Speed 3235.33 samples/sec   Loss 1.0496   LearningRate 0.0149   Epoch: 12   Global Step: 204830   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:55:40,733-Speed 3302.73 samples/sec   Loss 0.9959   LearningRate 0.0149   Epoch: 12   Global Step: 204840   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:55:43,799-Speed 3339.78 samples/sec   Loss 1.0480   LearningRate 0.0149   Epoch: 12   Global Step: 204850   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:55:46,896-Speed 3307.39 samples/sec   Loss 1.0355   LearningRate 0.0149   Epoch: 12   Global Step: 204860   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:55:49,963-Speed 3339.51 samples/sec   Loss 1.0294   LearningRate 0.0149   Epoch: 12   Global Step: 204870   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:55:53,040-Speed 3328.82 samples/sec   Loss 1.0241   LearningRate 0.0149   Epoch: 12   Global Step: 204880   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:55:56,106-Speed 3340.45 samples/sec   Loss 1.0010   LearningRate 0.0149   Epoch: 12   Global Step: 204890   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:55:59,210-Speed 3299.49 samples/sec   Loss 1.0433   LearningRate 0.0149   Epoch: 12   Global Step: 204900   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:56:02,295-Speed 3320.50 samples/sec   Loss 1.0020   LearningRate 0.0149   Epoch: 12   Global Step: 204910   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:56:05,414-Speed 3283.59 samples/sec   Loss 1.0237   LearningRate 0.0149   Epoch: 12   Global Step: 204920   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:56:08,506-Speed 3312.43 samples/sec   Loss 0.9845   LearningRate 0.0149   Epoch: 12   Global Step: 204930   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-11 20:56:11,594-Speed 3317.54 samples/sec   Loss 1.0411   LearningRate 0.0149   Epoch: 12   Global Step: 204940   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-11 20:56:14,662-Speed 3338.53 samples/sec   Loss 1.0620   LearningRate 0.0149   Epoch: 12   Global Step: 204950   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-11 20:56:17,724-Speed 3345.10 samples/sec   Loss 1.0183   LearningRate 0.0149   Epoch: 12   Global Step: 204960   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-11 20:56:20,788-Speed 3342.10 samples/sec   Loss 1.0155   LearningRate 0.0149   Epoch: 12   Global Step: 204970   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-11 20:56:23,867-Speed 3326.97 samples/sec   Loss 1.0288   LearningRate 0.0149   Epoch: 12   Global Step: 204980   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-11 20:56:26,944-Speed 3328.71 samples/sec   Loss 1.0097   LearningRate 0.0149   Epoch: 12   Global Step: 204990   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-11 20:56:30,007-Speed 3343.81 samples/sec   Loss 1.0274   LearningRate 0.0149   Epoch: 12   Global Step: 205000   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-11 20:56:33,124-Speed 3285.86 samples/sec   Loss 1.0353   LearningRate 0.0149   Epoch: 12   Global Step: 205010   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-11 20:56:36,206-Speed 3323.32 samples/sec   Loss 0.9826   LearningRate 0.0149   Epoch: 12   Global Step: 205020   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-11 20:56:39,291-Speed 3320.12 samples/sec   Loss 1.0329   LearningRate 0.0149   Epoch: 12   Global Step: 205030   Fp16 Grad Scale: 262144   Required: 14 hours
Training: 2022-04-11 20:56:42,423-Speed 3270.27 samples/sec   Loss 1.0147   LearningRate 0.0149   Epoch: 12   Global Step: 205040   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-11 20:56:45,490-Speed 3339.09 samples/sec   Loss 1.0502   LearningRate 0.0149   Epoch: 12   Global Step: 205050   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-11 20:56:48,568-Speed 3327.86 samples/sec   Loss 1.0716   LearningRate 0.0149   Epoch: 12   Global Step: 205060   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-11 20:56:51,631-Speed 3343.44 samples/sec   Loss 1.0358   LearningRate 0.0149   Epoch: 12   Global Step: 205070   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-11 20:56:54,723-Speed 3313.34 samples/sec   Loss 1.0327   LearningRate 0.0149   Epoch: 12   Global Step: 205080   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:56:57,790-Speed 3339.39 samples/sec   Loss 1.0317   LearningRate 0.0149   Epoch: 12   Global Step: 205090   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:57:00,860-Speed 3336.28 samples/sec   Loss 1.0347   LearningRate 0.0149   Epoch: 12   Global Step: 205100   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:57:03,926-Speed 3340.64 samples/sec   Loss 1.0326   LearningRate 0.0149   Epoch: 12   Global Step: 205110   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:57:07,078-Speed 3249.82 samples/sec   Loss 1.0308   LearningRate 0.0149   Epoch: 12   Global Step: 205120   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:57:10,199-Speed 3282.22 samples/sec   Loss 1.0275   LearningRate 0.0149   Epoch: 12   Global Step: 205130   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:57:13,271-Speed 3333.30 samples/sec   Loss 1.0225   LearningRate 0.0149   Epoch: 12   Global Step: 205140   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:57:16,344-Speed 3333.65 samples/sec   Loss 1.0409   LearningRate 0.0149   Epoch: 12   Global Step: 205150   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:57:19,567-Speed 3177.14 samples/sec   Loss 1.0444   LearningRate 0.0149   Epoch: 12   Global Step: 205160   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:57:22,691-Speed 3278.66 samples/sec   Loss 1.0157   LearningRate 0.0149   Epoch: 12   Global Step: 205170   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:57:25,763-Speed 3334.98 samples/sec   Loss 1.0391   LearningRate 0.0149   Epoch: 12   Global Step: 205180   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-11 20:57:28,836-Speed 3332.78 samples/sec   Loss 1.0197   LearningRate 0.0148   Epoch: 12   Global Step: 205190   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-11 20:57:31,904-Speed 3338.39 samples/sec   Loss 0.9973   LearningRate 0.0148   Epoch: 12   Global Step: 205200   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-11 20:57:34,976-Speed 3334.56 samples/sec   Loss 1.0576   LearningRate 0.0148   Epoch: 12   Global Step: 205210   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-11 20:57:38,040-Speed 3343.23 samples/sec   Loss 1.0477   LearningRate 0.0148   Epoch: 12   Global Step: 205220   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:57:41,100-Speed 3347.24 samples/sec   Loss 1.1013   LearningRate 0.0148   Epoch: 12   Global Step: 205230   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-11 20:57:44,177-Speed 3328.49 samples/sec   Loss 1.0534   LearningRate 0.0148   Epoch: 12   Global Step: 205240   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-11 20:57:47,268-Speed 3313.70 samples/sec   Loss 1.0410   LearningRate 0.0148   Epoch: 12   Global Step: 205250   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-11 20:57:50,348-Speed 3325.10 samples/sec   Loss 1.0273   LearningRate 0.0148   Epoch: 12   Global Step: 205260   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-11 20:57:53,415-Speed 3340.11 samples/sec   Loss 1.0078   LearningRate 0.0148   Epoch: 12   Global Step: 205270   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-11 20:57:56,490-Speed 3331.07 samples/sec   Loss 1.0290   LearningRate 0.0148   Epoch: 12   Global Step: 205280   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-11 20:57:59,571-Speed 3323.84 samples/sec   Loss 1.0258   LearningRate 0.0148   Epoch: 12   Global Step: 205290   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-11 20:58:02,650-Speed 3326.12 samples/sec   Loss 1.0486   LearningRate 0.0148   Epoch: 12   Global Step: 205300   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-11 20:58:05,733-Speed 3323.14 samples/sec   Loss 1.0479   LearningRate 0.0148   Epoch: 12   Global Step: 205310   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-11 20:58:08,823-Speed 3314.31 samples/sec   Loss 1.0032   LearningRate 0.0148   Epoch: 12   Global Step: 205320   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-11 20:58:11,913-Speed 3314.53 samples/sec   Loss 0.9854   LearningRate 0.0148   Epoch: 12   Global Step: 205330   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:58:14,984-Speed 3334.89 samples/sec   Loss 1.0396   LearningRate 0.0148   Epoch: 12   Global Step: 205340   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:58:18,058-Speed 3332.02 samples/sec   Loss 1.0310   LearningRate 0.0148   Epoch: 12   Global Step: 205350   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:58:21,181-Speed 3280.33 samples/sec   Loss 0.9709   LearningRate 0.0148   Epoch: 12   Global Step: 205360   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:58:24,252-Speed 3335.41 samples/sec   Loss 1.0746   LearningRate 0.0148   Epoch: 12   Global Step: 205370   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:58:27,321-Speed 3337.36 samples/sec   Loss 1.0191   LearningRate 0.0148   Epoch: 12   Global Step: 205380   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:58:30,385-Speed 3342.63 samples/sec   Loss 1.0154   LearningRate 0.0148   Epoch: 12   Global Step: 205390   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:58:33,538-Speed 3248.02 samples/sec   Loss 1.0166   LearningRate 0.0148   Epoch: 12   Global Step: 205400   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:58:36,626-Speed 3316.76 samples/sec   Loss 1.0225   LearningRate 0.0148   Epoch: 12   Global Step: 205410   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:58:39,730-Speed 3300.12 samples/sec   Loss 1.0179   LearningRate 0.0148   Epoch: 12   Global Step: 205420   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:58:42,821-Speed 3314.20 samples/sec   Loss 1.0816   LearningRate 0.0148   Epoch: 12   Global Step: 205430   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-11 20:58:45,909-Speed 3316.23 samples/sec   Loss 1.0324   LearningRate 0.0148   Epoch: 12   Global Step: 205440   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-11 20:58:49,148-Speed 3162.22 samples/sec   Loss 1.0637   LearningRate 0.0148   Epoch: 12   Global Step: 205450   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-11 20:58:52,256-Speed 3295.34 samples/sec   Loss 1.0456   LearningRate 0.0148   Epoch: 12   Global Step: 205460   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-11 20:58:55,325-Speed 3338.22 samples/sec   Loss 1.0349   LearningRate 0.0148   Epoch: 12   Global Step: 205470   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-11 20:58:58,444-Speed 3283.34 samples/sec   Loss 1.0182   LearningRate 0.0148   Epoch: 12   Global Step: 205480   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-11 20:59:01,620-Speed 3224.57 samples/sec   Loss 1.0510   LearningRate 0.0148   Epoch: 12   Global Step: 205490   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-11 20:59:04,709-Speed 3315.91 samples/sec   Loss 1.0147   LearningRate 0.0148   Epoch: 12   Global Step: 205500   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-11 20:59:07,798-Speed 3315.79 samples/sec   Loss 1.0564   LearningRate 0.0148   Epoch: 12   Global Step: 205510   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-11 20:59:10,906-Speed 3295.84 samples/sec   Loss 1.0426   LearningRate 0.0148   Epoch: 12   Global Step: 205520   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-11 20:59:13,981-Speed 3331.72 samples/sec   Loss 0.9891   LearningRate 0.0148   Epoch: 12   Global Step: 205530   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-11 20:59:17,068-Speed 3317.04 samples/sec   Loss 1.0560   LearningRate 0.0148   Epoch: 12   Global Step: 205540   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-11 20:59:20,159-Speed 3313.47 samples/sec   Loss 1.0770   LearningRate 0.0148   Epoch: 12   Global Step: 205550   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-11 20:59:23,381-Speed 3179.39 samples/sec   Loss 1.0755   LearningRate 0.0148   Epoch: 12   Global Step: 205560   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-11 20:59:26,587-Speed 3194.60 samples/sec   Loss 1.0519   LearningRate 0.0148   Epoch: 12   Global Step: 205570   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-11 20:59:29,645-Speed 3348.91 samples/sec   Loss 1.0363   LearningRate 0.0148   Epoch: 12   Global Step: 205580   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:59:32,725-Speed 3325.71 samples/sec   Loss 1.0253   LearningRate 0.0148   Epoch: 12   Global Step: 205590   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:59:35,799-Speed 3331.69 samples/sec   Loss 1.0536   LearningRate 0.0148   Epoch: 12   Global Step: 205600   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:59:38,899-Speed 3304.84 samples/sec   Loss 1.0148   LearningRate 0.0148   Epoch: 12   Global Step: 205610   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:59:42,007-Speed 3295.27 samples/sec   Loss 1.0427   LearningRate 0.0147   Epoch: 12   Global Step: 205620   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:59:45,085-Speed 3328.23 samples/sec   Loss 1.0360   LearningRate 0.0147   Epoch: 12   Global Step: 205630   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:59:48,152-Speed 3338.96 samples/sec   Loss 1.0489   LearningRate 0.0147   Epoch: 12   Global Step: 205640   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:59:51,220-Speed 3338.17 samples/sec   Loss 1.0759   LearningRate 0.0147   Epoch: 12   Global Step: 205650   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:59:54,310-Speed 3314.60 samples/sec   Loss 1.0100   LearningRate 0.0147   Epoch: 12   Global Step: 205660   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 20:59:57,377-Speed 3340.04 samples/sec   Loss 1.0615   LearningRate 0.0147   Epoch: 12   Global Step: 205670   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 21:00:00,448-Speed 3335.35 samples/sec   Loss 1.0618   LearningRate 0.0147   Epoch: 12   Global Step: 205680   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-11 21:00:03,505-Speed 3350.24 samples/sec   Loss 1.0397   LearningRate 0.0147   Epoch: 12   Global Step: 205690   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-11 21:00:06,573-Speed 3338.18 samples/sec   Loss 1.0724   LearningRate 0.0147   Epoch: 12   Global Step: 205700   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-11 21:00:09,651-Speed 3328.45 samples/sec   Loss 1.0160   LearningRate 0.0147   Epoch: 12   Global Step: 205710   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-11 21:00:12,719-Speed 3337.94 samples/sec   Loss 0.9844   LearningRate 0.0147   Epoch: 12   Global Step: 205720   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-11 21:00:15,813-Speed 3311.00 samples/sec   Loss 1.0138   LearningRate 0.0147   Epoch: 12   Global Step: 205730   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-11 21:00:18,898-Speed 3320.07 samples/sec   Loss 1.0185   LearningRate 0.0147   Epoch: 12   Global Step: 205740   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-11 21:00:21,976-Speed 3327.52 samples/sec   Loss 1.0310   LearningRate 0.0147   Epoch: 12   Global Step: 205750   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-11 21:00:25,044-Speed 3337.88 samples/sec   Loss 1.0444   LearningRate 0.0147   Epoch: 12   Global Step: 205760   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-11 21:00:28,135-Speed 3314.05 samples/sec   Loss 1.0317   LearningRate 0.0147   Epoch: 12   Global Step: 205770   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-11 21:00:31,203-Speed 3338.72 samples/sec   Loss 1.0619   LearningRate 0.0147   Epoch: 12   Global Step: 205780   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-11 21:00:34,318-Speed 3288.46 samples/sec   Loss 1.0494   LearningRate 0.0147   Epoch: 12   Global Step: 205790   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-11 21:00:37,461-Speed 3258.25 samples/sec   Loss 1.0674   LearningRate 0.0147   Epoch: 12   Global Step: 205800   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-11 21:00:40,536-Speed 3331.09 samples/sec   Loss 1.1143   LearningRate 0.0147   Epoch: 12   Global Step: 205810   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-11 21:00:43,614-Speed 3327.09 samples/sec   Loss 1.0272   LearningRate 0.0147   Epoch: 12   Global Step: 205820   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-11 21:00:46,684-Speed 3336.63 samples/sec   Loss 1.0327   LearningRate 0.0147   Epoch: 12   Global Step: 205830   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-11 21:00:49,751-Speed 3339.57 samples/sec   Loss 1.0727   LearningRate 0.0147   Epoch: 12   Global Step: 205840   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-11 21:00:52,822-Speed 3334.69 samples/sec   Loss 1.0432   LearningRate 0.0147   Epoch: 12   Global Step: 205850   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-11 21:00:55,888-Speed 3340.88 samples/sec   Loss 1.0871   LearningRate 0.0147   Epoch: 12   Global Step: 205860   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-11 21:00:58,958-Speed 3337.33 samples/sec   Loss 1.0350   LearningRate 0.0147   Epoch: 12   Global Step: 205870   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-11 21:01:02,046-Speed 3316.00 samples/sec   Loss 1.0406   LearningRate 0.0147   Epoch: 12   Global Step: 205880   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-11 21:01:05,114-Speed 3338.30 samples/sec   Loss 1.0101   LearningRate 0.0147   Epoch: 12   Global Step: 205890   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 21:01:08,184-Speed 3337.19 samples/sec   Loss 1.0513   LearningRate 0.0147   Epoch: 12   Global Step: 205900   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 21:01:11,264-Speed 3325.21 samples/sec   Loss 1.0212   LearningRate 0.0147   Epoch: 12   Global Step: 205910   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 21:01:14,335-Speed 3334.62 samples/sec   Loss 1.0813   LearningRate 0.0147   Epoch: 12   Global Step: 205920   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 21:01:17,417-Speed 3323.92 samples/sec   Loss 1.0390   LearningRate 0.0147   Epoch: 12   Global Step: 205930   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 21:01:20,496-Speed 3326.51 samples/sec   Loss 1.0986   LearningRate 0.0147   Epoch: 12   Global Step: 205940   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 21:01:23,573-Speed 3327.96 samples/sec   Loss 1.0556   LearningRate 0.0147   Epoch: 12   Global Step: 205950   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 21:01:26,645-Speed 3335.53 samples/sec   Loss 1.0382   LearningRate 0.0147   Epoch: 12   Global Step: 205960   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 21:01:29,716-Speed 3334.99 samples/sec   Loss 1.0417   LearningRate 0.0147   Epoch: 12   Global Step: 205970   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 21:01:32,784-Speed 3338.60 samples/sec   Loss 1.0038   LearningRate 0.0147   Epoch: 12   Global Step: 205980   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 21:01:35,855-Speed 3335.11 samples/sec   Loss 1.0434   LearningRate 0.0147   Epoch: 12   Global Step: 205990   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-11 21:01:38,923-Speed 3337.68 samples/sec   Loss 1.0169   LearningRate 0.0147   Epoch: 12   Global Step: 206000   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-11 21:02:23,213-[lfw][206000]XNorm: 20.972038
Training: 2022-04-11 21:02:23,213-[lfw][206000]Accuracy-Flip: 0.99800+-0.00287
Training: 2022-04-11 21:02:23,214-[lfw][206000]Accuracy-Highest: 0.99817
Training: 2022-04-11 21:03:14,663-[cfp_fp][206000]XNorm: 21.625598
Training: 2022-04-11 21:03:14,664-[cfp_fp][206000]Accuracy-Flip: 0.98814+-0.00511
Training: 2022-04-11 21:03:14,664-[cfp_fp][206000]Accuracy-Highest: 0.99086
Training: 2022-04-11 21:03:58,824-[agedb_30][206000]XNorm: 22.098682
Training: 2022-04-11 21:03:58,825-[agedb_30][206000]Accuracy-Flip: 0.98450+-0.00633
Training: 2022-04-11 21:03:58,825-[agedb_30][206000]Accuracy-Highest: 0.98567
Training: 2022-04-11 21:04:01,910-Speed 71.62 samples/sec   Loss 1.0609   LearningRate 0.0147   Epoch: 12   Global Step: 206010   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-11 21:04:05,032-Speed 3279.97 samples/sec   Loss 1.0437   LearningRate 0.0147   Epoch: 12   Global Step: 206020   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-11 21:04:08,256-Speed 3177.26 samples/sec   Loss 1.0301   LearningRate 0.0147   Epoch: 12   Global Step: 206030   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-11 21:04:11,365-Speed 3294.67 samples/sec   Loss 1.0375   LearningRate 0.0147   Epoch: 12   Global Step: 206040   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-11 21:04:14,423-Speed 3348.98 samples/sec   Loss 1.0343   LearningRate 0.0146   Epoch: 12   Global Step: 206050   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-11 21:04:17,553-Speed 3272.48 samples/sec   Loss 1.0188   LearningRate 0.0146   Epoch: 12   Global Step: 206060   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-11 21:04:20,638-Speed 3320.72 samples/sec   Loss 1.0149   LearningRate 0.0146   Epoch: 12   Global Step: 206070   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-11 21:04:23,703-Speed 3342.09 samples/sec   Loss 1.0112   LearningRate 0.0146   Epoch: 12   Global Step: 206080   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-11 21:04:26,766-Speed 3343.13 samples/sec   Loss 1.0483   LearningRate 0.0146   Epoch: 12   Global Step: 206090   Fp16 Grad Scale: 262144   Required: 14 hours
Training: 2022-04-11 21:04:29,850-Speed 3321.41 samples/sec   Loss 1.0452   LearningRate 0.0146   Epoch: 12   Global Step: 206100   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-11 21:04:32,945-Speed 3308.88 samples/sec   Loss 1.0512   LearningRate 0.0146   Epoch: 12   Global Step: 206110   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-11 21:04:36,039-Speed 3310.99 samples/sec   Loss 1.0538   LearningRate 0.0146   Epoch: 12   Global Step: 206120   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-11 21:04:39,098-Speed 3347.42 samples/sec   Loss 1.0415   LearningRate 0.0146   Epoch: 12   Global Step: 206130   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-11 21:04:42,166-Speed 3338.85 samples/sec   Loss 1.0495   LearningRate 0.0146   Epoch: 12   Global Step: 206140   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-11 21:04:45,227-Speed 3346.22 samples/sec   Loss 1.0668   LearningRate 0.0146   Epoch: 12   Global Step: 206150   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-11 21:04:48,315-Speed 3316.63 samples/sec   Loss 1.0993   LearningRate 0.0146   Epoch: 12   Global Step: 206160   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-11 21:04:51,537-Speed 3178.99 samples/sec   Loss 1.0358   LearningRate 0.0146   Epoch: 12   Global Step: 206170   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-11 21:04:54,661-Speed 3278.98 samples/sec   Loss 1.0606   LearningRate 0.0146   Epoch: 12   Global Step: 206180   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-11 21:04:57,718-Speed 3350.37 samples/sec   Loss 1.0661   LearningRate 0.0146   Epoch: 12   Global Step: 206190   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-11 21:05:00,813-Speed 3309.42 samples/sec   Loss 1.0627   LearningRate 0.0146   Epoch: 12   Global Step: 206200   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-11 21:05:03,870-Speed 3350.70 samples/sec   Loss 1.0361   LearningRate 0.0146   Epoch: 12   Global Step: 206210   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 21:05:06,933-Speed 3343.55 samples/sec   Loss 1.0197   LearningRate 0.0146   Epoch: 12   Global Step: 206220   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 21:05:09,989-Speed 3350.93 samples/sec   Loss 1.0278   LearningRate 0.0146   Epoch: 12   Global Step: 206230   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 21:05:13,046-Speed 3350.92 samples/sec   Loss 1.0653   LearningRate 0.0146   Epoch: 12   Global Step: 206240   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 21:05:16,111-Speed 3341.91 samples/sec   Loss 1.0260   LearningRate 0.0146   Epoch: 12   Global Step: 206250   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 21:05:19,173-Speed 3345.51 samples/sec   Loss 1.0602   LearningRate 0.0146   Epoch: 12   Global Step: 206260   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 21:05:22,238-Speed 3341.07 samples/sec   Loss 1.0605   LearningRate 0.0146   Epoch: 12   Global Step: 206270   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 21:05:25,348-Speed 3293.64 samples/sec   Loss 1.0447   LearningRate 0.0146   Epoch: 12   Global Step: 206280   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-11 21:05:28,459-Speed 3292.57 samples/sec   Loss 1.0203   LearningRate 0.0146   Epoch: 12   Global Step: 206290   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:05:31,596-Speed 3264.33 samples/sec   Loss 0.9853   LearningRate 0.0146   Epoch: 12   Global Step: 206300   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:05:34,653-Speed 3350.74 samples/sec   Loss 1.0433   LearningRate 0.0146   Epoch: 12   Global Step: 206310   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 21:05:37,779-Speed 3276.79 samples/sec   Loss 1.0234   LearningRate 0.0146   Epoch: 12   Global Step: 206320   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 21:05:40,930-Speed 3250.20 samples/sec   Loss 1.0554   LearningRate 0.0146   Epoch: 12   Global Step: 206330   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 21:05:43,998-Speed 3339.12 samples/sec   Loss 1.0779   LearningRate 0.0146   Epoch: 12   Global Step: 206340   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 21:05:47,086-Speed 3316.14 samples/sec   Loss 1.0580   LearningRate 0.0146   Epoch: 12   Global Step: 206350   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 21:05:50,145-Speed 3348.54 samples/sec   Loss 1.0686   LearningRate 0.0146   Epoch: 12   Global Step: 206360   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 21:05:53,270-Speed 3277.49 samples/sec   Loss 1.0364   LearningRate 0.0146   Epoch: 12   Global Step: 206370   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 21:05:56,476-Speed 3194.87 samples/sec   Loss 1.0694   LearningRate 0.0146   Epoch: 12   Global Step: 206380   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 21:05:59,593-Speed 3285.98 samples/sec   Loss 1.0394   LearningRate 0.0146   Epoch: 12   Global Step: 206390   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 21:06:02,689-Speed 3308.19 samples/sec   Loss 1.0710   LearningRate 0.0146   Epoch: 12   Global Step: 206400   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 21:06:05,752-Speed 3343.82 samples/sec   Loss 1.0480   LearningRate 0.0146   Epoch: 12   Global Step: 206410   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:06:08,817-Speed 3341.30 samples/sec   Loss 1.0755   LearningRate 0.0146   Epoch: 12   Global Step: 206420   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:06:11,878-Speed 3351.00 samples/sec   Loss 1.0218   LearningRate 0.0146   Epoch: 12   Global Step: 206430   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:06:14,933-Speed 3352.85 samples/sec   Loss 1.0247   LearningRate 0.0146   Epoch: 12   Global Step: 206440   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:06:17,990-Speed 3349.76 samples/sec   Loss 1.0438   LearningRate 0.0146   Epoch: 12   Global Step: 206450   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:06:21,050-Speed 3347.45 samples/sec   Loss 1.0216   LearningRate 0.0146   Epoch: 12   Global Step: 206460   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:06:24,105-Speed 3352.29 samples/sec   Loss 1.0182   LearningRate 0.0146   Epoch: 12   Global Step: 206470   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:06:27,168-Speed 3344.62 samples/sec   Loss 1.0226   LearningRate 0.0146   Epoch: 12   Global Step: 206480   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:06:30,226-Speed 3348.95 samples/sec   Loss 1.0530   LearningRate 0.0145   Epoch: 12   Global Step: 206490   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:06:33,303-Speed 3328.80 samples/sec   Loss 1.0683   LearningRate 0.0145   Epoch: 12   Global Step: 206500   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:06:36,438-Speed 3267.18 samples/sec   Loss 1.0576   LearningRate 0.0145   Epoch: 12   Global Step: 206510   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 21:06:39,503-Speed 3341.35 samples/sec   Loss 1.0765   LearningRate 0.0145   Epoch: 12   Global Step: 206520   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 21:06:42,641-Speed 3263.88 samples/sec   Loss 1.0747   LearningRate 0.0145   Epoch: 12   Global Step: 206530   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 21:06:45,726-Speed 3320.63 samples/sec   Loss 1.0281   LearningRate 0.0145   Epoch: 12   Global Step: 206540   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 21:06:48,797-Speed 3334.46 samples/sec   Loss 1.0452   LearningRate 0.0145   Epoch: 12   Global Step: 206550   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:06:51,857-Speed 3347.24 samples/sec   Loss 1.0801   LearningRate 0.0145   Epoch: 12   Global Step: 206560   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:06:54,909-Speed 3356.40 samples/sec   Loss 1.0564   LearningRate 0.0145   Epoch: 12   Global Step: 206570   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:06:57,982-Speed 3332.93 samples/sec   Loss 1.0145   LearningRate 0.0145   Epoch: 12   Global Step: 206580   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:07:01,046-Speed 3342.84 samples/sec   Loss 1.0724   LearningRate 0.0145   Epoch: 12   Global Step: 206590   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:07:04,103-Speed 3349.99 samples/sec   Loss 1.0847   LearningRate 0.0145   Epoch: 12   Global Step: 206600   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-11 21:07:07,220-Speed 3286.26 samples/sec   Loss 1.0542   LearningRate 0.0145   Epoch: 12   Global Step: 206610   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-11 21:07:10,284-Speed 3343.53 samples/sec   Loss 1.0268   LearningRate 0.0145   Epoch: 12   Global Step: 206620   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-11 21:07:13,346-Speed 3345.04 samples/sec   Loss 1.0712   LearningRate 0.0145   Epoch: 12   Global Step: 206630   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-11 21:07:16,401-Speed 3351.55 samples/sec   Loss 1.0874   LearningRate 0.0145   Epoch: 12   Global Step: 206640   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-11 21:07:19,477-Speed 3330.12 samples/sec   Loss 1.0709   LearningRate 0.0145   Epoch: 12   Global Step: 206650   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-11 21:07:22,554-Speed 3328.82 samples/sec   Loss 1.1001   LearningRate 0.0145   Epoch: 12   Global Step: 206660   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-11 21:07:25,638-Speed 3320.98 samples/sec   Loss 1.0762   LearningRate 0.0145   Epoch: 12   Global Step: 206670   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-11 21:07:28,734-Speed 3308.34 samples/sec   Loss 1.0337   LearningRate 0.0145   Epoch: 12   Global Step: 206680   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-11 21:07:31,812-Speed 3327.57 samples/sec   Loss 1.0865   LearningRate 0.0145   Epoch: 12   Global Step: 206690   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-11 21:07:34,871-Speed 3348.74 samples/sec   Loss 1.0882   LearningRate 0.0145   Epoch: 12   Global Step: 206700   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:07:37,946-Speed 3330.70 samples/sec   Loss 1.0694   LearningRate 0.0145   Epoch: 12   Global Step: 206710   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:07:41,059-Speed 3289.52 samples/sec   Loss 1.0765   LearningRate 0.0145   Epoch: 12   Global Step: 206720   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:07:44,122-Speed 3344.00 samples/sec   Loss 1.1167   LearningRate 0.0145   Epoch: 12   Global Step: 206730   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:07:47,183-Speed 3346.33 samples/sec   Loss 1.0380   LearningRate 0.0145   Epoch: 12   Global Step: 206740   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:07:50,247-Speed 3343.26 samples/sec   Loss 1.1005   LearningRate 0.0145   Epoch: 12   Global Step: 206750   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:07:53,353-Speed 3297.26 samples/sec   Loss 1.0478   LearningRate 0.0145   Epoch: 12   Global Step: 206760   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:07:56,424-Speed 3335.60 samples/sec   Loss 1.0770   LearningRate 0.0145   Epoch: 12   Global Step: 206770   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:07:59,525-Speed 3302.69 samples/sec   Loss 1.0285   LearningRate 0.0145   Epoch: 12   Global Step: 206780   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:08:02,598-Speed 3332.77 samples/sec   Loss 1.0538   LearningRate 0.0145   Epoch: 12   Global Step: 206790   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:08:05,790-Speed 3208.66 samples/sec   Loss 1.0484   LearningRate 0.0145   Epoch: 12   Global Step: 206800   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 21:08:08,940-Speed 3252.30 samples/sec   Loss 1.0848   LearningRate 0.0145   Epoch: 12   Global Step: 206810   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 21:08:12,003-Speed 3343.70 samples/sec   Loss 1.0580   LearningRate 0.0145   Epoch: 12   Global Step: 206820   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 21:08:15,077-Speed 3332.02 samples/sec   Loss 1.0971   LearningRate 0.0145   Epoch: 12   Global Step: 206830   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 21:08:18,226-Speed 3252.39 samples/sec   Loss 1.0520   LearningRate 0.0145   Epoch: 12   Global Step: 206840   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 21:08:21,314-Speed 3316.50 samples/sec   Loss 1.0534   LearningRate 0.0145   Epoch: 12   Global Step: 206850   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 21:08:24,453-Speed 3262.69 samples/sec   Loss 1.0383   LearningRate 0.0145   Epoch: 12   Global Step: 206860   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 21:08:27,550-Speed 3307.10 samples/sec   Loss 1.0667   LearningRate 0.0145   Epoch: 12   Global Step: 206870   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 21:08:30,626-Speed 3330.82 samples/sec   Loss 1.0626   LearningRate 0.0145   Epoch: 12   Global Step: 206880   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:08:33,703-Speed 3327.96 samples/sec   Loss 1.0470   LearningRate 0.0145   Epoch: 12   Global Step: 206890   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:08:36,865-Speed 3239.55 samples/sec   Loss 1.0934   LearningRate 0.0145   Epoch: 12   Global Step: 206900   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:08:40,079-Speed 3186.64 samples/sec   Loss 1.0356   LearningRate 0.0145   Epoch: 12   Global Step: 206910   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:08:43,181-Speed 3301.67 samples/sec   Loss 1.0505   LearningRate 0.0145   Epoch: 12   Global Step: 206920   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:08:46,347-Speed 3235.22 samples/sec   Loss 1.0297   LearningRate 0.0144   Epoch: 12   Global Step: 206930   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:08:49,418-Speed 3335.22 samples/sec   Loss 1.0752   LearningRate 0.0144   Epoch: 12   Global Step: 206940   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:08:52,480-Speed 3345.45 samples/sec   Loss 1.0742   LearningRate 0.0144   Epoch: 12   Global Step: 206950   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:08:55,554-Speed 3331.69 samples/sec   Loss 1.0749   LearningRate 0.0144   Epoch: 12   Global Step: 206960   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:08:58,624-Speed 3336.56 samples/sec   Loss 1.0792   LearningRate 0.0144   Epoch: 12   Global Step: 206970   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:09:01,686-Speed 3344.28 samples/sec   Loss 1.0731   LearningRate 0.0144   Epoch: 12   Global Step: 206980   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 21:09:04,754-Speed 3338.68 samples/sec   Loss 1.0463   LearningRate 0.0144   Epoch: 12   Global Step: 206990   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 21:09:07,828-Speed 3332.09 samples/sec   Loss 1.0758   LearningRate 0.0144   Epoch: 12   Global Step: 207000   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 21:09:10,882-Speed 3353.82 samples/sec   Loss 1.0657   LearningRate 0.0144   Epoch: 12   Global Step: 207010   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:09:13,948-Speed 3340.80 samples/sec   Loss 1.0741   LearningRate 0.0144   Epoch: 12   Global Step: 207020   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:09:17,015-Speed 3338.69 samples/sec   Loss 1.0995   LearningRate 0.0144   Epoch: 12   Global Step: 207030   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:09:20,080-Speed 3342.34 samples/sec   Loss 1.0676   LearningRate 0.0144   Epoch: 12   Global Step: 207040   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:09:23,139-Speed 3348.63 samples/sec   Loss 1.0863   LearningRate 0.0144   Epoch: 12   Global Step: 207050   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:09:26,208-Speed 3337.54 samples/sec   Loss 1.0296   LearningRate 0.0144   Epoch: 12   Global Step: 207060   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:09:29,288-Speed 3324.84 samples/sec   Loss 1.0596   LearningRate 0.0144   Epoch: 12   Global Step: 207070   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:09:32,354-Speed 3340.64 samples/sec   Loss 1.0500   LearningRate 0.0144   Epoch: 12   Global Step: 207080   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:09:35,420-Speed 3341.14 samples/sec   Loss 1.0394   LearningRate 0.0144   Epoch: 12   Global Step: 207090   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:09:38,486-Speed 3340.22 samples/sec   Loss 1.0808   LearningRate 0.0144   Epoch: 12   Global Step: 207100   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:09:41,564-Speed 3327.27 samples/sec   Loss 1.0510   LearningRate 0.0144   Epoch: 12   Global Step: 207110   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 21:09:44,630-Speed 3341.45 samples/sec   Loss 1.0877   LearningRate 0.0144   Epoch: 12   Global Step: 207120   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 21:09:47,808-Speed 3223.08 samples/sec   Loss 1.0813   LearningRate 0.0144   Epoch: 12   Global Step: 207130   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:09:50,896-Speed 3316.49 samples/sec   Loss 1.0216   LearningRate 0.0144   Epoch: 12   Global Step: 207140   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:09:53,970-Speed 3331.99 samples/sec   Loss 1.0682   LearningRate 0.0144   Epoch: 12   Global Step: 207150   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:09:57,074-Speed 3300.40 samples/sec   Loss 1.0501   LearningRate 0.0144   Epoch: 12   Global Step: 207160   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:10:00,160-Speed 3318.83 samples/sec   Loss 1.0934   LearningRate 0.0144   Epoch: 12   Global Step: 207170   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:10:03,238-Speed 3327.13 samples/sec   Loss 1.0740   LearningRate 0.0144   Epoch: 12   Global Step: 207180   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:10:06,303-Speed 3341.67 samples/sec   Loss 1.0452   LearningRate 0.0144   Epoch: 12   Global Step: 207190   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:10:09,368-Speed 3342.23 samples/sec   Loss 1.0574   LearningRate 0.0144   Epoch: 12   Global Step: 207200   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:10:12,480-Speed 3291.17 samples/sec   Loss 1.0916   LearningRate 0.0144   Epoch: 12   Global Step: 207210   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:10:15,706-Speed 3175.13 samples/sec   Loss 1.0047   LearningRate 0.0144   Epoch: 12   Global Step: 207220   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:10:18,836-Speed 3272.18 samples/sec   Loss 1.0239   LearningRate 0.0144   Epoch: 12   Global Step: 207230   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 21:10:21,908-Speed 3334.36 samples/sec   Loss 1.0827   LearningRate 0.0144   Epoch: 12   Global Step: 207240   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 21:10:24,972-Speed 3342.47 samples/sec   Loss 1.0324   LearningRate 0.0144   Epoch: 12   Global Step: 207250   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 21:10:28,101-Speed 3273.13 samples/sec   Loss 1.0429   LearningRate 0.0144   Epoch: 12   Global Step: 207260   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 21:10:31,206-Speed 3298.66 samples/sec   Loss 1.1019   LearningRate 0.0144   Epoch: 12   Global Step: 207270   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 21:10:34,282-Speed 3329.76 samples/sec   Loss 1.0652   LearningRate 0.0144   Epoch: 12   Global Step: 207280   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 21:10:37,365-Speed 3323.33 samples/sec   Loss 1.0640   LearningRate 0.0144   Epoch: 12   Global Step: 207290   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 21:10:40,458-Speed 3310.78 samples/sec   Loss 1.0917   LearningRate 0.0144   Epoch: 12   Global Step: 207300   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 21:10:43,526-Speed 3338.96 samples/sec   Loss 1.0485   LearningRate 0.0144   Epoch: 12   Global Step: 207310   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 21:10:46,591-Speed 3341.24 samples/sec   Loss 1.0733   LearningRate 0.0144   Epoch: 12   Global Step: 207320   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 21:10:49,642-Speed 3356.67 samples/sec   Loss 1.0682   LearningRate 0.0144   Epoch: 12   Global Step: 207330   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 21:10:52,711-Speed 3338.43 samples/sec   Loss 1.0578   LearningRate 0.0144   Epoch: 12   Global Step: 207340   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:10:55,780-Speed 3336.66 samples/sec   Loss 1.0752   LearningRate 0.0144   Epoch: 12   Global Step: 207350   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:10:58,842-Speed 3344.88 samples/sec   Loss 1.0543   LearningRate 0.0144   Epoch: 12   Global Step: 207360   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:11:01,924-Speed 3323.66 samples/sec   Loss 1.0526   LearningRate 0.0143   Epoch: 12   Global Step: 207370   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:11:04,991-Speed 3340.16 samples/sec   Loss 1.0669   LearningRate 0.0143   Epoch: 12   Global Step: 207380   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:11:08,137-Speed 3255.33 samples/sec   Loss 1.1056   LearningRate 0.0143   Epoch: 12   Global Step: 207390   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:11:11,212-Speed 3330.41 samples/sec   Loss 1.0917   LearningRate 0.0143   Epoch: 12   Global Step: 207400   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:11:14,311-Speed 3305.01 samples/sec   Loss 1.0733   LearningRate 0.0143   Epoch: 12   Global Step: 207410   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:11:17,414-Speed 3301.20 samples/sec   Loss 1.0680   LearningRate 0.0143   Epoch: 12   Global Step: 207420   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:11:20,513-Speed 3304.47 samples/sec   Loss 1.0671   LearningRate 0.0143   Epoch: 12   Global Step: 207430   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:11:23,579-Speed 3340.34 samples/sec   Loss 1.0350   LearningRate 0.0143   Epoch: 12   Global Step: 207440   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 21:11:26,655-Speed 3330.30 samples/sec   Loss 1.0805   LearningRate 0.0143   Epoch: 12   Global Step: 207450   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 21:11:29,722-Speed 3339.65 samples/sec   Loss 1.0752   LearningRate 0.0143   Epoch: 12   Global Step: 207460   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 21:11:32,786-Speed 3342.60 samples/sec   Loss 1.0546   LearningRate 0.0143   Epoch: 12   Global Step: 207470   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 21:11:35,849-Speed 3343.59 samples/sec   Loss 1.0543   LearningRate 0.0143   Epoch: 12   Global Step: 207480   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 21:11:38,902-Speed 3354.74 samples/sec   Loss 1.0936   LearningRate 0.0143   Epoch: 12   Global Step: 207490   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:11:41,986-Speed 3321.93 samples/sec   Loss 1.0590   LearningRate 0.0143   Epoch: 12   Global Step: 207500   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:11:45,049-Speed 3343.72 samples/sec   Loss 1.0715   LearningRate 0.0143   Epoch: 12   Global Step: 207510   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:11:48,120-Speed 3334.50 samples/sec   Loss 1.0673   LearningRate 0.0143   Epoch: 12   Global Step: 207520   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:11:51,195-Speed 3331.27 samples/sec   Loss 1.0456   LearningRate 0.0143   Epoch: 12   Global Step: 207530   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:11:54,274-Speed 3325.98 samples/sec   Loss 1.0188   LearningRate 0.0143   Epoch: 12   Global Step: 207540   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:11:57,352-Speed 3327.70 samples/sec   Loss 1.0809   LearningRate 0.0143   Epoch: 12   Global Step: 207550   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:12:00,420-Speed 3338.75 samples/sec   Loss 1.0488   LearningRate 0.0143   Epoch: 12   Global Step: 207560   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:12:03,504-Speed 3320.99 samples/sec   Loss 1.0459   LearningRate 0.0143   Epoch: 12   Global Step: 207570   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:12:06,571-Speed 3340.44 samples/sec   Loss 1.0853   LearningRate 0.0143   Epoch: 12   Global Step: 207580   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:12:09,639-Speed 3337.63 samples/sec   Loss 1.0655   LearningRate 0.0143   Epoch: 12   Global Step: 207590   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 21:12:12,717-Speed 3327.18 samples/sec   Loss 1.0973   LearningRate 0.0143   Epoch: 12   Global Step: 207600   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 21:12:15,792-Speed 3331.85 samples/sec   Loss 1.0656   LearningRate 0.0143   Epoch: 12   Global Step: 207610   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 21:12:18,873-Speed 3323.52 samples/sec   Loss 1.0150   LearningRate 0.0143   Epoch: 12   Global Step: 207620   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 21:12:21,965-Speed 3312.79 samples/sec   Loss 1.0879   LearningRate 0.0143   Epoch: 12   Global Step: 207630   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 21:12:25,033-Speed 3339.29 samples/sec   Loss 1.0928   LearningRate 0.0143   Epoch: 12   Global Step: 207640   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 21:12:28,096-Speed 3343.16 samples/sec   Loss 1.1188   LearningRate 0.0143   Epoch: 12   Global Step: 207650   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 21:12:31,155-Speed 3349.10 samples/sec   Loss 1.0814   LearningRate 0.0143   Epoch: 12   Global Step: 207660   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:12:34,228-Speed 3333.15 samples/sec   Loss 1.1089   LearningRate 0.0143   Epoch: 12   Global Step: 207670   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:12:37,298-Speed 3335.24 samples/sec   Loss 1.0982   LearningRate 0.0143   Epoch: 12   Global Step: 207680   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:12:40,362-Speed 3343.15 samples/sec   Loss 1.0994   LearningRate 0.0143   Epoch: 12   Global Step: 207690   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:12:43,430-Speed 3338.16 samples/sec   Loss 1.0839   LearningRate 0.0143   Epoch: 12   Global Step: 207700   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:12:46,493-Speed 3343.85 samples/sec   Loss 1.0188   LearningRate 0.0143   Epoch: 12   Global Step: 207710   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:12:49,570-Speed 3329.09 samples/sec   Loss 1.0523   LearningRate 0.0143   Epoch: 12   Global Step: 207720   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:12:52,649-Speed 3327.29 samples/sec   Loss 1.0719   LearningRate 0.0143   Epoch: 12   Global Step: 207730   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:12:55,712-Speed 3343.71 samples/sec   Loss 1.0807   LearningRate 0.0143   Epoch: 12   Global Step: 207740   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:12:58,779-Speed 3339.51 samples/sec   Loss 1.0812   LearningRate 0.0143   Epoch: 12   Global Step: 207750   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:13:01,852-Speed 3332.19 samples/sec   Loss 1.0950   LearningRate 0.0143   Epoch: 12   Global Step: 207760   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 21:13:04,928-Speed 3329.95 samples/sec   Loss 1.1061   LearningRate 0.0143   Epoch: 12   Global Step: 207770   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 21:13:08,027-Speed 3304.95 samples/sec   Loss 1.1185   LearningRate 0.0143   Epoch: 12   Global Step: 207780   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 21:13:11,144-Speed 3286.59 samples/sec   Loss 1.0341   LearningRate 0.0143   Epoch: 12   Global Step: 207790   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 21:13:14,245-Speed 3301.97 samples/sec   Loss 1.0665   LearningRate 0.0143   Epoch: 12   Global Step: 207800   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 21:13:17,373-Speed 3274.63 samples/sec   Loss 1.0787   LearningRate 0.0142   Epoch: 12   Global Step: 207810   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 21:13:20,457-Speed 3321.51 samples/sec   Loss 1.0671   LearningRate 0.0142   Epoch: 12   Global Step: 207820   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 21:13:23,524-Speed 3339.93 samples/sec   Loss 1.1040   LearningRate 0.0142   Epoch: 12   Global Step: 207830   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 21:13:26,579-Speed 3352.14 samples/sec   Loss 1.0430   LearningRate 0.0142   Epoch: 12   Global Step: 207840   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:13:29,648-Speed 3337.41 samples/sec   Loss 1.0450   LearningRate 0.0142   Epoch: 12   Global Step: 207850   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:13:32,735-Speed 3318.04 samples/sec   Loss 1.0801   LearningRate 0.0142   Epoch: 12   Global Step: 207860   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:13:35,857-Speed 3280.75 samples/sec   Loss 1.0710   LearningRate 0.0142   Epoch: 12   Global Step: 207870   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:13:38,927-Speed 3335.92 samples/sec   Loss 1.0675   LearningRate 0.0142   Epoch: 12   Global Step: 207880   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:13:42,091-Speed 3236.60 samples/sec   Loss 1.1067   LearningRate 0.0142   Epoch: 12   Global Step: 207890   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:13:45,196-Speed 3299.54 samples/sec   Loss 1.0422   LearningRate 0.0142   Epoch: 12   Global Step: 207900   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:13:48,289-Speed 3311.66 samples/sec   Loss 1.0999   LearningRate 0.0142   Epoch: 12   Global Step: 207910   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:13:51,381-Speed 3312.09 samples/sec   Loss 1.1181   LearningRate 0.0142   Epoch: 12   Global Step: 207920   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:13:54,477-Speed 3308.81 samples/sec   Loss 1.0806   LearningRate 0.0142   Epoch: 12   Global Step: 207930   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:13:57,540-Speed 3344.19 samples/sec   Loss 1.0880   LearningRate 0.0142   Epoch: 12   Global Step: 207940   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 21:14:00,609-Speed 3336.56 samples/sec   Loss 1.0686   LearningRate 0.0142   Epoch: 12   Global Step: 207950   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 21:14:03,691-Speed 3324.02 samples/sec   Loss 1.0683   LearningRate 0.0142   Epoch: 12   Global Step: 207960   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 21:14:06,762-Speed 3334.89 samples/sec   Loss 1.0765   LearningRate 0.0142   Epoch: 12   Global Step: 207970   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 21:14:09,898-Speed 3265.82 samples/sec   Loss 1.0278   LearningRate 0.0142   Epoch: 12   Global Step: 207980   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 21:14:12,978-Speed 3325.48 samples/sec   Loss 1.1104   LearningRate 0.0142   Epoch: 12   Global Step: 207990   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 21:14:16,110-Speed 3271.02 samples/sec   Loss 1.0944   LearningRate 0.0142   Epoch: 12   Global Step: 208000   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 21:15:00,323-[lfw][208000]XNorm: 21.421805
Training: 2022-04-11 21:15:00,324-[lfw][208000]Accuracy-Flip: 0.99800+-0.00296
Training: 2022-04-11 21:15:00,324-[lfw][208000]Accuracy-Highest: 0.99817
Training: 2022-04-11 21:15:51,600-[cfp_fp][208000]XNorm: 21.511615
Training: 2022-04-11 21:15:51,601-[cfp_fp][208000]Accuracy-Flip: 0.99043+-0.00456
Training: 2022-04-11 21:15:51,601-[cfp_fp][208000]Accuracy-Highest: 0.99086
Training: 2022-04-11 21:16:35,746-[agedb_30][208000]XNorm: 22.039902
Training: 2022-04-11 21:16:35,746-[agedb_30][208000]Accuracy-Flip: 0.98417+-0.00620
Training: 2022-04-11 21:16:35,747-[agedb_30][208000]Accuracy-Highest: 0.98567
Training: 2022-04-11 21:16:38,810-Speed 71.76 samples/sec   Loss 1.0908   LearningRate 0.0142   Epoch: 12   Global Step: 208010   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 21:16:41,868-Speed 3349.78 samples/sec   Loss 1.0840   LearningRate 0.0142   Epoch: 12   Global Step: 208020   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 21:16:44,925-Speed 3349.86 samples/sec   Loss 1.0457   LearningRate 0.0142   Epoch: 12   Global Step: 208030   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 21:16:47,991-Speed 3341.40 samples/sec   Loss 1.0887   LearningRate 0.0142   Epoch: 12   Global Step: 208040   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 21:16:51,047-Speed 3351.38 samples/sec   Loss 1.0774   LearningRate 0.0142   Epoch: 12   Global Step: 208050   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 21:16:54,112-Speed 3341.18 samples/sec   Loss 1.0640   LearningRate 0.0142   Epoch: 12   Global Step: 208060   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 21:16:57,169-Speed 3349.98 samples/sec   Loss 1.0992   LearningRate 0.0142   Epoch: 12   Global Step: 208070   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 21:17:00,227-Speed 3350.66 samples/sec   Loss 1.0982   LearningRate 0.0142   Epoch: 12   Global Step: 208080   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 21:17:03,288-Speed 3345.70 samples/sec   Loss 1.1237   LearningRate 0.0142   Epoch: 12   Global Step: 208090   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 21:17:06,366-Speed 3327.98 samples/sec   Loss 1.0741   LearningRate 0.0142   Epoch: 12   Global Step: 208100   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 21:17:09,504-Speed 3263.13 samples/sec   Loss 1.0650   LearningRate 0.0142   Epoch: 12   Global Step: 208110   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 21:17:12,577-Speed 3333.17 samples/sec   Loss 1.0566   LearningRate 0.0142   Epoch: 12   Global Step: 208120   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 21:17:15,648-Speed 3335.84 samples/sec   Loss 1.0628   LearningRate 0.0142   Epoch: 12   Global Step: 208130   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 21:17:18,722-Speed 3331.84 samples/sec   Loss 1.1243   LearningRate 0.0142   Epoch: 12   Global Step: 208140   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 21:17:21,806-Speed 3320.22 samples/sec   Loss 1.0654   LearningRate 0.0142   Epoch: 12   Global Step: 208150   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 21:17:24,875-Speed 3338.27 samples/sec   Loss 1.0760   LearningRate 0.0142   Epoch: 12   Global Step: 208160   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 21:17:27,952-Speed 3328.37 samples/sec   Loss 1.0771   LearningRate 0.0142   Epoch: 12   Global Step: 208170   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 21:17:31,039-Speed 3318.33 samples/sec   Loss 1.0706   LearningRate 0.0142   Epoch: 12   Global Step: 208180   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:17:34,187-Speed 3253.19 samples/sec   Loss 1.0881   LearningRate 0.0142   Epoch: 12   Global Step: 208190   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:17:37,257-Speed 3336.35 samples/sec   Loss 1.0725   LearningRate 0.0142   Epoch: 12   Global Step: 208200   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:17:40,347-Speed 3314.03 samples/sec   Loss 1.0524   LearningRate 0.0142   Epoch: 12   Global Step: 208210   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:17:43,409-Speed 3345.66 samples/sec   Loss 1.0864   LearningRate 0.0142   Epoch: 12   Global Step: 208220   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:17:46,474-Speed 3341.91 samples/sec   Loss 1.1231   LearningRate 0.0142   Epoch: 12   Global Step: 208230   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:17:49,563-Speed 3315.28 samples/sec   Loss 1.0798   LearningRate 0.0142   Epoch: 12   Global Step: 208240   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:17:52,631-Speed 3338.71 samples/sec   Loss 1.0880   LearningRate 0.0141   Epoch: 12   Global Step: 208250   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:17:55,695-Speed 3343.55 samples/sec   Loss 1.0673   LearningRate 0.0141   Epoch: 12   Global Step: 208260   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:17:58,771-Speed 3329.46 samples/sec   Loss 1.1013   LearningRate 0.0141   Epoch: 12   Global Step: 208270   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:18:01,848-Speed 3328.85 samples/sec   Loss 1.0507   LearningRate 0.0141   Epoch: 12   Global Step: 208280   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 21:18:04,910-Speed 3344.32 samples/sec   Loss 1.0538   LearningRate 0.0141   Epoch: 12   Global Step: 208290   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 21:18:07,975-Speed 3341.86 samples/sec   Loss 1.0660   LearningRate 0.0141   Epoch: 12   Global Step: 208300   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 21:18:11,038-Speed 3344.53 samples/sec   Loss 1.0743   LearningRate 0.0141   Epoch: 12   Global Step: 208310   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 21:18:14,106-Speed 3338.20 samples/sec   Loss 1.1104   LearningRate 0.0141   Epoch: 12   Global Step: 208320   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 21:18:17,171-Speed 3341.78 samples/sec   Loss 1.0939   LearningRate 0.0141   Epoch: 12   Global Step: 208330   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 21:18:20,230-Speed 3348.99 samples/sec   Loss 1.1007   LearningRate 0.0141   Epoch: 12   Global Step: 208340   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 21:18:23,292-Speed 3344.98 samples/sec   Loss 1.0726   LearningRate 0.0141   Epoch: 12   Global Step: 208350   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:18:26,350-Speed 3349.28 samples/sec   Loss 1.0744   LearningRate 0.0141   Epoch: 12   Global Step: 208360   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:18:29,416-Speed 3339.48 samples/sec   Loss 1.0722   LearningRate 0.0141   Epoch: 12   Global Step: 208370   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:18:32,482-Speed 3341.84 samples/sec   Loss 1.1099   LearningRate 0.0141   Epoch: 12   Global Step: 208380   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:18:35,549-Speed 3338.44 samples/sec   Loss 1.1279   LearningRate 0.0141   Epoch: 12   Global Step: 208390   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:18:38,626-Speed 3328.73 samples/sec   Loss 1.1031   LearningRate 0.0141   Epoch: 12   Global Step: 208400   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:18:41,819-Speed 3207.95 samples/sec   Loss 1.0765   LearningRate 0.0141   Epoch: 12   Global Step: 208410   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:18:44,969-Speed 3252.52 samples/sec   Loss 1.0756   LearningRate 0.0141   Epoch: 12   Global Step: 208420   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:18:48,052-Speed 3322.36 samples/sec   Loss 1.1097   LearningRate 0.0141   Epoch: 12   Global Step: 208430   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:18:51,223-Speed 3230.14 samples/sec   Loss 1.0551   LearningRate 0.0141   Epoch: 12   Global Step: 208440   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:18:54,349-Speed 3276.27 samples/sec   Loss 1.0672   LearningRate 0.0141   Epoch: 12   Global Step: 208450   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 21:18:57,482-Speed 3269.13 samples/sec   Loss 1.0653   LearningRate 0.0141   Epoch: 12   Global Step: 208460   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 21:19:00,545-Speed 3343.44 samples/sec   Loss 1.1193   LearningRate 0.0141   Epoch: 12   Global Step: 208470   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 21:19:03,651-Speed 3297.36 samples/sec   Loss 1.0778   LearningRate 0.0141   Epoch: 12   Global Step: 208480   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 21:19:06,842-Speed 3209.61 samples/sec   Loss 1.0900   LearningRate 0.0141   Epoch: 12   Global Step: 208490   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 21:19:10,046-Speed 3197.21 samples/sec   Loss 1.1091   LearningRate 0.0141   Epoch: 12   Global Step: 208500   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 21:19:13,154-Speed 3296.46 samples/sec   Loss 1.0858   LearningRate 0.0141   Epoch: 12   Global Step: 208510   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 21:19:16,228-Speed 3331.77 samples/sec   Loss 1.0803   LearningRate 0.0141   Epoch: 12   Global Step: 208520   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 21:19:19,289-Speed 3345.83 samples/sec   Loss 1.0500   LearningRate 0.0141   Epoch: 12   Global Step: 208530   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 21:19:22,398-Speed 3295.08 samples/sec   Loss 1.1086   LearningRate 0.0141   Epoch: 12   Global Step: 208540   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 21:19:25,448-Speed 3357.29 samples/sec   Loss 1.0467   LearningRate 0.0141   Epoch: 12   Global Step: 208550   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 21:19:28,514-Speed 3341.41 samples/sec   Loss 1.0397   LearningRate 0.0141   Epoch: 12   Global Step: 208560   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 21:19:31,573-Speed 3348.32 samples/sec   Loss 1.1112   LearningRate 0.0141   Epoch: 12   Global Step: 208570   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 21:19:34,639-Speed 3339.58 samples/sec   Loss 1.0974   LearningRate 0.0141   Epoch: 12   Global Step: 208580   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 21:19:37,713-Speed 3332.58 samples/sec   Loss 1.1107   LearningRate 0.0141   Epoch: 12   Global Step: 208590   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 21:19:40,806-Speed 3312.07 samples/sec   Loss 1.0579   LearningRate 0.0141   Epoch: 12   Global Step: 208600   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 21:19:43,871-Speed 3341.82 samples/sec   Loss 1.0528   LearningRate 0.0141   Epoch: 12   Global Step: 208610   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 21:19:46,993-Speed 3280.13 samples/sec   Loss 1.0564   LearningRate 0.0141   Epoch: 12   Global Step: 208620   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 21:19:50,075-Speed 3323.77 samples/sec   Loss 1.0806   LearningRate 0.0141   Epoch: 12   Global Step: 208630   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 21:19:53,150-Speed 3330.11 samples/sec   Loss 1.0284   LearningRate 0.0141   Epoch: 12   Global Step: 208640   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 21:19:56,203-Speed 3355.15 samples/sec   Loss 1.0685   LearningRate 0.0141   Epoch: 12   Global Step: 208650   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 21:19:59,350-Speed 3254.17 samples/sec   Loss 1.1245   LearningRate 0.0141   Epoch: 12   Global Step: 208660   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 21:20:02,424-Speed 3332.56 samples/sec   Loss 1.0799   LearningRate 0.0141   Epoch: 12   Global Step: 208670   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 21:20:05,486-Speed 3344.62 samples/sec   Loss 1.0483   LearningRate 0.0141   Epoch: 12   Global Step: 208680   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 21:20:08,549-Speed 3343.79 samples/sec   Loss 1.1060   LearningRate 0.0141   Epoch: 12   Global Step: 208690   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 21:20:11,618-Speed 3337.19 samples/sec   Loss 1.1137   LearningRate 0.0140   Epoch: 12   Global Step: 208700   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 21:20:14,690-Speed 3334.94 samples/sec   Loss 1.1028   LearningRate 0.0140   Epoch: 12   Global Step: 208710   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 21:20:17,752-Speed 3344.91 samples/sec   Loss 1.0586   LearningRate 0.0140   Epoch: 12   Global Step: 208720   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 21:20:20,868-Speed 3287.02 samples/sec   Loss 1.0596   LearningRate 0.0140   Epoch: 12   Global Step: 208730   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 21:20:23,940-Speed 3334.30 samples/sec   Loss 1.1071   LearningRate 0.0140   Epoch: 12   Global Step: 208740   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 21:20:26,998-Speed 3349.22 samples/sec   Loss 1.0435   LearningRate 0.0140   Epoch: 12   Global Step: 208750   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 21:20:30,079-Speed 3324.14 samples/sec   Loss 1.0429   LearningRate 0.0140   Epoch: 12   Global Step: 208760   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 21:20:33,151-Speed 3333.64 samples/sec   Loss 1.0837   LearningRate 0.0140   Epoch: 12   Global Step: 208770   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 21:20:36,224-Speed 3333.20 samples/sec   Loss 1.0591   LearningRate 0.0140   Epoch: 12   Global Step: 208780   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 21:20:39,404-Speed 3221.19 samples/sec   Loss 1.1029   LearningRate 0.0140   Epoch: 12   Global Step: 208790   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 21:20:42,461-Speed 3351.00 samples/sec   Loss 1.0666   LearningRate 0.0140   Epoch: 12   Global Step: 208800   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:20:45,529-Speed 3338.03 samples/sec   Loss 1.0765   LearningRate 0.0140   Epoch: 12   Global Step: 208810   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:20:48,604-Speed 3330.65 samples/sec   Loss 1.0890   LearningRate 0.0140   Epoch: 12   Global Step: 208820   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:20:51,668-Speed 3342.96 samples/sec   Loss 1.0908   LearningRate 0.0140   Epoch: 12   Global Step: 208830   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:20:54,771-Speed 3300.43 samples/sec   Loss 1.0780   LearningRate 0.0140   Epoch: 12   Global Step: 208840   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:20:57,839-Speed 3339.01 samples/sec   Loss 1.1294   LearningRate 0.0140   Epoch: 12   Global Step: 208850   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:21:00,904-Speed 3341.27 samples/sec   Loss 1.0981   LearningRate 0.0140   Epoch: 12   Global Step: 208860   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:21:03,974-Speed 3336.36 samples/sec   Loss 1.0721   LearningRate 0.0140   Epoch: 12   Global Step: 208870   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:21:07,040-Speed 3341.37 samples/sec   Loss 1.0891   LearningRate 0.0140   Epoch: 12   Global Step: 208880   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:21:10,105-Speed 3341.03 samples/sec   Loss 1.0714   LearningRate 0.0140   Epoch: 12   Global Step: 208890   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:21:13,178-Speed 3332.97 samples/sec   Loss 1.0747   LearningRate 0.0140   Epoch: 12   Global Step: 208900   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 21:21:16,253-Speed 3330.80 samples/sec   Loss 1.0729   LearningRate 0.0140   Epoch: 12   Global Step: 208910   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 21:21:19,336-Speed 3322.26 samples/sec   Loss 1.0443   LearningRate 0.0140   Epoch: 12   Global Step: 208920   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 21:21:22,452-Speed 3287.68 samples/sec   Loss 1.1085   LearningRate 0.0140   Epoch: 12   Global Step: 208930   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 21:21:25,522-Speed 3336.01 samples/sec   Loss 1.1293   LearningRate 0.0140   Epoch: 12   Global Step: 208940   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 21:21:28,654-Speed 3270.47 samples/sec   Loss 1.0997   LearningRate 0.0140   Epoch: 12   Global Step: 208950   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 21:21:31,832-Speed 3223.04 samples/sec   Loss 1.0565   LearningRate 0.0140   Epoch: 12   Global Step: 208960   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 21:21:34,948-Speed 3287.12 samples/sec   Loss 1.1165   LearningRate 0.0140   Epoch: 12   Global Step: 208970   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 21:21:38,025-Speed 3328.24 samples/sec   Loss 1.0363   LearningRate 0.0140   Epoch: 12   Global Step: 208980   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 21:21:41,091-Speed 3341.41 samples/sec   Loss 1.0905   LearningRate 0.0140   Epoch: 12   Global Step: 208990   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 21:21:44,154-Speed 3342.82 samples/sec   Loss 1.0881   LearningRate 0.0140   Epoch: 12   Global Step: 209000   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:21:47,222-Speed 3338.83 samples/sec   Loss 1.0277   LearningRate 0.0140   Epoch: 12   Global Step: 209010   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:21:50,354-Speed 3269.94 samples/sec   Loss 1.0803   LearningRate 0.0140   Epoch: 12   Global Step: 209020   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:21:53,490-Speed 3266.92 samples/sec   Loss 1.1269   LearningRate 0.0140   Epoch: 12   Global Step: 209030   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:21:56,554-Speed 3342.81 samples/sec   Loss 1.0992   LearningRate 0.0140   Epoch: 12   Global Step: 209040   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:21:59,617-Speed 3343.52 samples/sec   Loss 1.1014   LearningRate 0.0140   Epoch: 12   Global Step: 209050   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:22:02,687-Speed 3336.59 samples/sec   Loss 1.1110   LearningRate 0.0140   Epoch: 12   Global Step: 209060   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:22:05,781-Speed 3310.52 samples/sec   Loss 1.0896   LearningRate 0.0140   Epoch: 12   Global Step: 209070   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:22:08,853-Speed 3333.04 samples/sec   Loss 1.0555   LearningRate 0.0140   Epoch: 12   Global Step: 209080   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:22:12,045-Speed 3208.91 samples/sec   Loss 1.0801   LearningRate 0.0140   Epoch: 12   Global Step: 209090   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:22:15,101-Speed 3352.34 samples/sec   Loss 1.1198   LearningRate 0.0140   Epoch: 12   Global Step: 209100   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:22:18,173-Speed 3334.05 samples/sec   Loss 1.0910   LearningRate 0.0140   Epoch: 12   Global Step: 209110   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:22:21,253-Speed 3325.13 samples/sec   Loss 1.0744   LearningRate 0.0140   Epoch: 12   Global Step: 209120   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:22:24,323-Speed 3336.56 samples/sec   Loss 1.0634   LearningRate 0.0140   Epoch: 12   Global Step: 209130   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:22:27,469-Speed 3255.09 samples/sec   Loss 1.0719   LearningRate 0.0139   Epoch: 12   Global Step: 209140   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:22:30,583-Speed 3289.28 samples/sec   Loss 1.0555   LearningRate 0.0139   Epoch: 12   Global Step: 209150   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:22:33,653-Speed 3337.01 samples/sec   Loss 1.1312   LearningRate 0.0139   Epoch: 12   Global Step: 209160   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:22:36,727-Speed 3331.89 samples/sec   Loss 1.0701   LearningRate 0.0139   Epoch: 12   Global Step: 209170   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:22:39,790-Speed 3343.24 samples/sec   Loss 1.0656   LearningRate 0.0139   Epoch: 12   Global Step: 209180   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:22:42,873-Speed 3322.46 samples/sec   Loss 1.0657   LearningRate 0.0139   Epoch: 12   Global Step: 209190   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:22:45,968-Speed 3308.74 samples/sec   Loss 1.0576   LearningRate 0.0139   Epoch: 12   Global Step: 209200   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 21:22:49,051-Speed 3322.45 samples/sec   Loss 1.0809   LearningRate 0.0139   Epoch: 12   Global Step: 209210   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 21:22:52,261-Speed 3190.66 samples/sec   Loss 1.1202   LearningRate 0.0139   Epoch: 12   Global Step: 209220   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 21:22:55,412-Speed 3251.05 samples/sec   Loss 1.0936   LearningRate 0.0139   Epoch: 12   Global Step: 209230   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 21:22:58,492-Speed 3325.50 samples/sec   Loss 1.0549   LearningRate 0.0139   Epoch: 12   Global Step: 209240   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 21:23:01,556-Speed 3343.15 samples/sec   Loss 1.0498   LearningRate 0.0139   Epoch: 12   Global Step: 209250   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 21:23:04,620-Speed 3342.21 samples/sec   Loss 1.0769   LearningRate 0.0139   Epoch: 12   Global Step: 209260   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 21:23:07,698-Speed 3327.52 samples/sec   Loss 1.1180   LearningRate 0.0139   Epoch: 12   Global Step: 209270   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 21:23:10,749-Speed 3356.44 samples/sec   Loss 1.0800   LearningRate 0.0139   Epoch: 12   Global Step: 209280   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:23:13,846-Speed 3307.36 samples/sec   Loss 1.0682   LearningRate 0.0139   Epoch: 12   Global Step: 209290   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:23:16,949-Speed 3301.63 samples/sec   Loss 1.1107   LearningRate 0.0139   Epoch: 12   Global Step: 209300   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:23:20,036-Speed 3317.71 samples/sec   Loss 1.1105   LearningRate 0.0139   Epoch: 12   Global Step: 209310   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:23:23,103-Speed 3339.57 samples/sec   Loss 1.0427   LearningRate 0.0139   Epoch: 12   Global Step: 209320   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:23:26,184-Speed 3324.18 samples/sec   Loss 1.0723   LearningRate 0.0139   Epoch: 12   Global Step: 209330   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:23:29,260-Speed 3329.97 samples/sec   Loss 1.0937   LearningRate 0.0139   Epoch: 12   Global Step: 209340   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:23:32,412-Speed 3249.14 samples/sec   Loss 1.0718   LearningRate 0.0139   Epoch: 12   Global Step: 209350   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:23:35,637-Speed 3176.32 samples/sec   Loss 1.0981   LearningRate 0.0139   Epoch: 12   Global Step: 209360   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:23:38,713-Speed 3329.53 samples/sec   Loss 1.0784   LearningRate 0.0139   Epoch: 12   Global Step: 209370   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:23:41,784-Speed 3335.01 samples/sec   Loss 1.0745   LearningRate 0.0139   Epoch: 12   Global Step: 209380   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 21:23:44,861-Speed 3329.27 samples/sec   Loss 1.0907   LearningRate 0.0139   Epoch: 12   Global Step: 209390   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 21:23:47,918-Speed 3350.13 samples/sec   Loss 1.1275   LearningRate 0.0139   Epoch: 12   Global Step: 209400   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:23:50,983-Speed 3342.21 samples/sec   Loss 1.0922   LearningRate 0.0139   Epoch: 12   Global Step: 209410   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:23:54,050-Speed 3339.14 samples/sec   Loss 1.0899   LearningRate 0.0139   Epoch: 12   Global Step: 209420   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:23:57,120-Speed 3336.73 samples/sec   Loss 1.0806   LearningRate 0.0139   Epoch: 12   Global Step: 209430   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:24:00,188-Speed 3338.47 samples/sec   Loss 1.0518   LearningRate 0.0139   Epoch: 12   Global Step: 209440   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:24:03,255-Speed 3339.09 samples/sec   Loss 1.0919   LearningRate 0.0139   Epoch: 12   Global Step: 209450   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:24:06,338-Speed 3322.59 samples/sec   Loss 1.0722   LearningRate 0.0139   Epoch: 12   Global Step: 209460   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:24:09,412-Speed 3332.02 samples/sec   Loss 1.0501   LearningRate 0.0139   Epoch: 12   Global Step: 209470   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:24:12,491-Speed 3326.48 samples/sec   Loss 1.0850   LearningRate 0.0139   Epoch: 12   Global Step: 209480   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:24:15,563-Speed 3333.47 samples/sec   Loss 1.0742   LearningRate 0.0139   Epoch: 12   Global Step: 209490   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:24:18,632-Speed 3337.34 samples/sec   Loss 1.1085   LearningRate 0.0139   Epoch: 12   Global Step: 209500   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 21:24:21,706-Speed 3332.11 samples/sec   Loss 1.0495   LearningRate 0.0139   Epoch: 12   Global Step: 209510   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 21:24:24,768-Speed 3345.13 samples/sec   Loss 1.1065   LearningRate 0.0139   Epoch: 12   Global Step: 209520   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 21:24:27,834-Speed 3340.99 samples/sec   Loss 1.0752   LearningRate 0.0139   Epoch: 12   Global Step: 209530   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 21:24:30,919-Speed 3319.30 samples/sec   Loss 1.0986   LearningRate 0.0139   Epoch: 12   Global Step: 209540   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 21:24:33,985-Speed 3341.24 samples/sec   Loss 1.0962   LearningRate 0.0139   Epoch: 12   Global Step: 209550   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:24:37,052-Speed 3339.65 samples/sec   Loss 1.1107   LearningRate 0.0139   Epoch: 12   Global Step: 209560   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:24:40,137-Speed 3320.40 samples/sec   Loss 1.1107   LearningRate 0.0139   Epoch: 12   Global Step: 209570   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:24:43,243-Speed 3297.29 samples/sec   Loss 1.1037   LearningRate 0.0139   Epoch: 12   Global Step: 209580   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:24:46,321-Speed 3327.04 samples/sec   Loss 1.0949   LearningRate 0.0138   Epoch: 12   Global Step: 209590   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:24:49,396-Speed 3331.60 samples/sec   Loss 1.1201   LearningRate 0.0138   Epoch: 12   Global Step: 209600   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:24:52,481-Speed 3319.20 samples/sec   Loss 1.1494   LearningRate 0.0138   Epoch: 12   Global Step: 209610   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:24:55,651-Speed 3231.07 samples/sec   Loss 1.1149   LearningRate 0.0138   Epoch: 12   Global Step: 209620   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:24:58,733-Speed 3323.52 samples/sec   Loss 1.0624   LearningRate 0.0138   Epoch: 12   Global Step: 209630   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:25:01,801-Speed 3338.85 samples/sec   Loss 1.0530   LearningRate 0.0138   Epoch: 12   Global Step: 209640   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:25:04,966-Speed 3237.21 samples/sec   Loss 1.0489   LearningRate 0.0138   Epoch: 12   Global Step: 209650   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 21:25:08,069-Speed 3300.85 samples/sec   Loss 1.0776   LearningRate 0.0138   Epoch: 12   Global Step: 209660   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 21:25:11,203-Speed 3267.66 samples/sec   Loss 1.0777   LearningRate 0.0138   Epoch: 12   Global Step: 209670   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 21:25:14,337-Speed 3268.84 samples/sec   Loss 1.1317   LearningRate 0.0138   Epoch: 12   Global Step: 209680   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 21:25:17,432-Speed 3308.36 samples/sec   Loss 1.1124   LearningRate 0.0138   Epoch: 12   Global Step: 209690   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 21:25:20,520-Speed 3316.78 samples/sec   Loss 1.0621   LearningRate 0.0138   Epoch: 12   Global Step: 209700   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:25:23,671-Speed 3250.76 samples/sec   Loss 1.0454   LearningRate 0.0138   Epoch: 12   Global Step: 209710   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:25:26,747-Speed 3330.18 samples/sec   Loss 1.1047   LearningRate 0.0138   Epoch: 12   Global Step: 209720   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:25:29,832-Speed 3320.61 samples/sec   Loss 1.0586   LearningRate 0.0138   Epoch: 12   Global Step: 209730   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:25:32,917-Speed 3320.09 samples/sec   Loss 1.0845   LearningRate 0.0138   Epoch: 12   Global Step: 209740   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:25:36,001-Speed 3321.08 samples/sec   Loss 1.0431   LearningRate 0.0138   Epoch: 12   Global Step: 209750   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:25:39,087-Speed 3318.30 samples/sec   Loss 1.1489   LearningRate 0.0138   Epoch: 12   Global Step: 209760   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-11 21:25:42,166-Speed 3326.57 samples/sec   Loss 1.0748   LearningRate 0.0138   Epoch: 12   Global Step: 209770   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-11 21:25:45,238-Speed 3334.10 samples/sec   Loss 1.0851   LearningRate 0.0138   Epoch: 12   Global Step: 209780   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-11 21:25:48,301-Speed 3343.64 samples/sec   Loss 1.0972   LearningRate 0.0138   Epoch: 12   Global Step: 209790   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-11 21:25:51,384-Speed 3322.48 samples/sec   Loss 1.0852   LearningRate 0.0138   Epoch: 12   Global Step: 209800   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-11 21:25:54,450-Speed 3340.81 samples/sec   Loss 1.0713   LearningRate 0.0138   Epoch: 12   Global Step: 209810   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-11 21:25:57,517-Speed 3340.10 samples/sec   Loss 1.0684   LearningRate 0.0138   Epoch: 12   Global Step: 209820   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-11 21:26:00,590-Speed 3332.87 samples/sec   Loss 1.0968   LearningRate 0.0138   Epoch: 12   Global Step: 209830   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-11 21:26:03,661-Speed 3335.27 samples/sec   Loss 1.0882   LearningRate 0.0138   Epoch: 12   Global Step: 209840   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-11 21:26:06,744-Speed 3322.33 samples/sec   Loss 1.0790   LearningRate 0.0138   Epoch: 12   Global Step: 209850   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-11 21:26:09,871-Speed 3275.31 samples/sec   Loss 1.0895   LearningRate 0.0138   Epoch: 12   Global Step: 209860   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:26:12,945-Speed 3331.48 samples/sec   Loss 1.0868   LearningRate 0.0138   Epoch: 12   Global Step: 209870   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:26:16,009-Speed 3343.04 samples/sec   Loss 1.0929   LearningRate 0.0138   Epoch: 12   Global Step: 209880   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:26:19,083-Speed 3331.18 samples/sec   Loss 1.1155   LearningRate 0.0138   Epoch: 12   Global Step: 209890   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:26:22,160-Speed 3329.27 samples/sec   Loss 1.0853   LearningRate 0.0138   Epoch: 12   Global Step: 209900   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:26:25,227-Speed 3340.06 samples/sec   Loss 1.1014   LearningRate 0.0138   Epoch: 12   Global Step: 209910   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:26:28,288-Speed 3345.33 samples/sec   Loss 1.0890   LearningRate 0.0138   Epoch: 12   Global Step: 209920   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:26:31,359-Speed 3335.06 samples/sec   Loss 1.1177   LearningRate 0.0138   Epoch: 12   Global Step: 209930   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:26:34,490-Speed 3271.33 samples/sec   Loss 1.0741   LearningRate 0.0138   Epoch: 12   Global Step: 209940   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:26:37,603-Speed 3290.67 samples/sec   Loss 1.0846   LearningRate 0.0138   Epoch: 12   Global Step: 209950   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:26:40,707-Speed 3299.68 samples/sec   Loss 1.0980   LearningRate 0.0138   Epoch: 12   Global Step: 209960   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 21:26:43,831-Speed 3278.37 samples/sec   Loss 1.0913   LearningRate 0.0138   Epoch: 12   Global Step: 209970   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 21:26:46,955-Speed 3278.54 samples/sec   Loss 1.1046   LearningRate 0.0138   Epoch: 12   Global Step: 209980   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 21:26:50,061-Speed 3298.00 samples/sec   Loss 1.0747   LearningRate 0.0138   Epoch: 12   Global Step: 209990   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 21:26:53,133-Speed 3334.53 samples/sec   Loss 1.0967   LearningRate 0.0138   Epoch: 12   Global Step: 210000   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 21:27:37,212-[lfw][210000]XNorm: 22.479736
Training: 2022-04-11 21:27:37,212-[lfw][210000]Accuracy-Flip: 0.99767+-0.00249
Training: 2022-04-11 21:27:37,213-[lfw][210000]Accuracy-Highest: 0.99817
Training: 2022-04-11 21:28:28,408-[cfp_fp][210000]XNorm: 23.074610
Training: 2022-04-11 21:28:28,408-[cfp_fp][210000]Accuracy-Flip: 0.98957+-0.00550
Training: 2022-04-11 21:28:28,409-[cfp_fp][210000]Accuracy-Highest: 0.99086
Training: 2022-04-11 21:29:12,923-[agedb_30][210000]XNorm: 23.605374
Training: 2022-04-11 21:29:12,923-[agedb_30][210000]Accuracy-Flip: 0.98500+-0.00749
Training: 2022-04-11 21:29:12,924-[agedb_30][210000]Accuracy-Highest: 0.98567
Training: 2022-04-11 21:29:15,989-Speed 71.68 samples/sec   Loss 1.0717   LearningRate 0.0138   Epoch: 12   Global Step: 210010   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 21:29:19,047-Speed 3349.59 samples/sec   Loss 1.1044   LearningRate 0.0138   Epoch: 12   Global Step: 210020   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 21:29:22,144-Speed 3307.39 samples/sec   Loss 1.1058   LearningRate 0.0138   Epoch: 12   Global Step: 210030   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:29:25,268-Speed 3278.55 samples/sec   Loss 1.0862   LearningRate 0.0137   Epoch: 12   Global Step: 210040   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:29:28,329-Speed 3345.62 samples/sec   Loss 1.1000   LearningRate 0.0137   Epoch: 12   Global Step: 210050   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:29:31,424-Speed 3309.49 samples/sec   Loss 1.0991   LearningRate 0.0137   Epoch: 12   Global Step: 210060   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:29:34,492-Speed 3338.11 samples/sec   Loss 1.0844   LearningRate 0.0137   Epoch: 12   Global Step: 210070   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:29:37,564-Speed 3333.73 samples/sec   Loss 1.1008   LearningRate 0.0137   Epoch: 12   Global Step: 210080   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:29:40,688-Speed 3279.27 samples/sec   Loss 1.1230   LearningRate 0.0137   Epoch: 12   Global Step: 210090   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:29:43,765-Speed 3328.32 samples/sec   Loss 1.0609   LearningRate 0.0137   Epoch: 12   Global Step: 210100   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:29:46,911-Speed 3255.62 samples/sec   Loss 1.0699   LearningRate 0.0137   Epoch: 12   Global Step: 210110   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:29:49,995-Speed 3321.70 samples/sec   Loss 1.1040   LearningRate 0.0137   Epoch: 12   Global Step: 210120   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:29:53,063-Speed 3337.41 samples/sec   Loss 1.0581   LearningRate 0.0137   Epoch: 12   Global Step: 210130   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 21:29:56,128-Speed 3342.72 samples/sec   Loss 1.0783   LearningRate 0.0137   Epoch: 12   Global Step: 210140   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 21:29:59,193-Speed 3341.28 samples/sec   Loss 1.1069   LearningRate 0.0137   Epoch: 12   Global Step: 210150   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 21:30:02,266-Speed 3332.86 samples/sec   Loss 1.0762   LearningRate 0.0137   Epoch: 12   Global Step: 210160   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 21:30:05,330-Speed 3342.84 samples/sec   Loss 1.1299   LearningRate 0.0137   Epoch: 12   Global Step: 210170   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 21:30:08,408-Speed 3327.44 samples/sec   Loss 1.1074   LearningRate 0.0137   Epoch: 12   Global Step: 210180   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 21:30:11,467-Speed 3349.05 samples/sec   Loss 1.0832   LearningRate 0.0137   Epoch: 12   Global Step: 210190   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 21:30:14,649-Speed 3218.11 samples/sec   Loss 1.0950   LearningRate 0.0137   Epoch: 12   Global Step: 210200   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 21:30:17,723-Speed 3332.72 samples/sec   Loss 1.0660   LearningRate 0.0137   Epoch: 12   Global Step: 210210   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 21:30:20,791-Speed 3338.51 samples/sec   Loss 1.1036   LearningRate 0.0137   Epoch: 12   Global Step: 210220   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 21:30:23,886-Speed 3308.42 samples/sec   Loss 1.0941   LearningRate 0.0137   Epoch: 12   Global Step: 210230   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 21:30:26,958-Speed 3334.80 samples/sec   Loss 1.0575   LearningRate 0.0137   Epoch: 12   Global Step: 210240   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 21:30:30,055-Speed 3306.55 samples/sec   Loss 1.1090   LearningRate 0.0137   Epoch: 12   Global Step: 210250   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 21:30:33,140-Speed 3320.82 samples/sec   Loss 1.1184   LearningRate 0.0137   Epoch: 12   Global Step: 210260   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 21:30:36,198-Speed 3348.92 samples/sec   Loss 1.1173   LearningRate 0.0137   Epoch: 12   Global Step: 210270   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 21:30:39,269-Speed 3336.22 samples/sec   Loss 1.0829   LearningRate 0.0137   Epoch: 12   Global Step: 210280   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 21:30:42,331-Speed 3344.62 samples/sec   Loss 1.0775   LearningRate 0.0137   Epoch: 12   Global Step: 210290   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 21:30:45,411-Speed 3325.51 samples/sec   Loss 1.0885   LearningRate 0.0137   Epoch: 12   Global Step: 210300   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 21:30:48,478-Speed 3338.66 samples/sec   Loss 1.1646   LearningRate 0.0137   Epoch: 12   Global Step: 210310   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 21:30:51,529-Speed 3357.33 samples/sec   Loss 1.0929   LearningRate 0.0137   Epoch: 12   Global Step: 210320   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:30:54,597-Speed 3338.79 samples/sec   Loss 1.0809   LearningRate 0.0137   Epoch: 12   Global Step: 210330   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:30:57,661-Speed 3343.12 samples/sec   Loss 1.1273   LearningRate 0.0137   Epoch: 12   Global Step: 210340   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:31:00,736-Speed 3329.90 samples/sec   Loss 1.0969   LearningRate 0.0137   Epoch: 12   Global Step: 210350   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:31:03,804-Speed 3338.72 samples/sec   Loss 1.0921   LearningRate 0.0137   Epoch: 12   Global Step: 210360   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:31:06,860-Speed 3351.74 samples/sec   Loss 1.0701   LearningRate 0.0137   Epoch: 12   Global Step: 210370   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:31:09,933-Speed 3332.94 samples/sec   Loss 1.1249   LearningRate 0.0137   Epoch: 12   Global Step: 210380   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:31:13,011-Speed 3328.26 samples/sec   Loss 1.1260   LearningRate 0.0137   Epoch: 12   Global Step: 210390   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:31:16,074-Speed 3343.11 samples/sec   Loss 1.0641   LearningRate 0.0137   Epoch: 12   Global Step: 210400   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:31:19,155-Speed 3324.40 samples/sec   Loss 1.0748   LearningRate 0.0137   Epoch: 12   Global Step: 210410   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:31:22,214-Speed 3348.78 samples/sec   Loss 1.1083   LearningRate 0.0137   Epoch: 12   Global Step: 210420   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 21:31:25,266-Speed 3355.50 samples/sec   Loss 1.1356   LearningRate 0.0137   Epoch: 12   Global Step: 210430   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:31:28,344-Speed 3328.48 samples/sec   Loss 1.1097   LearningRate 0.0137   Epoch: 12   Global Step: 210440   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:31:31,408-Speed 3342.94 samples/sec   Loss 1.0977   LearningRate 0.0137   Epoch: 12   Global Step: 210450   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:31:34,467-Speed 3347.79 samples/sec   Loss 1.1236   LearningRate 0.0137   Epoch: 12   Global Step: 210460   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:31:37,536-Speed 3337.17 samples/sec   Loss 1.1666   LearningRate 0.0137   Epoch: 12   Global Step: 210470   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:31:40,612-Speed 3329.73 samples/sec   Loss 1.1124   LearningRate 0.0137   Epoch: 12   Global Step: 210480   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:31:43,671-Speed 3348.16 samples/sec   Loss 1.1346   LearningRate 0.0136   Epoch: 12   Global Step: 210490   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:31:46,744-Speed 3333.45 samples/sec   Loss 1.1162   LearningRate 0.0136   Epoch: 12   Global Step: 210500   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:31:49,828-Speed 3321.21 samples/sec   Loss 1.1506   LearningRate 0.0136   Epoch: 12   Global Step: 210510   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:31:52,893-Speed 3341.60 samples/sec   Loss 1.0953   LearningRate 0.0136   Epoch: 12   Global Step: 210520   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:31:55,961-Speed 3338.41 samples/sec   Loss 1.1514   LearningRate 0.0136   Epoch: 12   Global Step: 210530   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 21:31:59,060-Speed 3305.35 samples/sec   Loss 1.1142   LearningRate 0.0136   Epoch: 12   Global Step: 210540   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 21:32:02,127-Speed 3339.10 samples/sec   Loss 1.1083   LearningRate 0.0136   Epoch: 12   Global Step: 210550   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 21:32:05,193-Speed 3340.87 samples/sec   Loss 1.0828   LearningRate 0.0136   Epoch: 12   Global Step: 210560   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 21:32:08,236-Speed 3366.41 samples/sec   Loss 1.1170   LearningRate 0.0136   Epoch: 12   Global Step: 210570   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-11 21:32:11,300-Speed 3342.20 samples/sec   Loss 1.1032   LearningRate 0.0136   Epoch: 12   Global Step: 210580   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-11 21:32:14,360-Speed 3347.41 samples/sec   Loss 1.0918   LearningRate 0.0136   Epoch: 12   Global Step: 210590   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-11 21:32:17,432-Speed 3334.20 samples/sec   Loss 1.1134   LearningRate 0.0136   Epoch: 12   Global Step: 210600   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-11 21:32:20,503-Speed 3335.03 samples/sec   Loss 1.0833   LearningRate 0.0136   Epoch: 12   Global Step: 210610   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-11 21:32:23,617-Speed 3288.75 samples/sec   Loss 1.0441   LearningRate 0.0136   Epoch: 12   Global Step: 210620   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-11 21:32:26,686-Speed 3338.23 samples/sec   Loss 1.0863   LearningRate 0.0136   Epoch: 12   Global Step: 210630   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-11 21:32:29,746-Speed 3346.70 samples/sec   Loss 1.0565   LearningRate 0.0136   Epoch: 12   Global Step: 210640   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-11 21:32:32,818-Speed 3333.99 samples/sec   Loss 1.0737   LearningRate 0.0136   Epoch: 12   Global Step: 210650   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-11 21:32:35,877-Speed 3347.96 samples/sec   Loss 1.1053   LearningRate 0.0136   Epoch: 12   Global Step: 210660   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-11 21:32:38,955-Speed 3327.80 samples/sec   Loss 1.1215   LearningRate 0.0136   Epoch: 12   Global Step: 210670   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:32:42,029-Speed 3331.92 samples/sec   Loss 1.0930   LearningRate 0.0136   Epoch: 12   Global Step: 210680   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:32:45,097-Speed 3337.92 samples/sec   Loss 1.0929   LearningRate 0.0136   Epoch: 12   Global Step: 210690   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:32:48,174-Speed 3328.85 samples/sec   Loss 1.0747   LearningRate 0.0136   Epoch: 12   Global Step: 210700   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:32:51,242-Speed 3338.94 samples/sec   Loss 1.0718   LearningRate 0.0136   Epoch: 12   Global Step: 210710   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:32:54,299-Speed 3350.28 samples/sec   Loss 1.0763   LearningRate 0.0136   Epoch: 12   Global Step: 210720   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:32:57,366-Speed 3339.82 samples/sec   Loss 1.1316   LearningRate 0.0136   Epoch: 12   Global Step: 210730   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:33:00,429-Speed 3344.40 samples/sec   Loss 1.1010   LearningRate 0.0136   Epoch: 12   Global Step: 210740   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:33:03,492-Speed 3343.29 samples/sec   Loss 1.0950   LearningRate 0.0136   Epoch: 12   Global Step: 210750   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:33:06,564-Speed 3334.45 samples/sec   Loss 1.0933   LearningRate 0.0136   Epoch: 12   Global Step: 210760   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:33:09,649-Speed 3320.02 samples/sec   Loss 1.0918   LearningRate 0.0136   Epoch: 12   Global Step: 210770   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 21:33:12,721-Speed 3333.96 samples/sec   Loss 1.1095   LearningRate 0.0136   Epoch: 12   Global Step: 210780   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 21:33:15,794-Speed 3333.63 samples/sec   Loss 1.0866   LearningRate 0.0136   Epoch: 12   Global Step: 210790   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 21:33:18,852-Speed 3348.64 samples/sec   Loss 1.1301   LearningRate 0.0136   Epoch: 12   Global Step: 210800   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 21:33:21,912-Speed 3347.05 samples/sec   Loss 1.1281   LearningRate 0.0136   Epoch: 12   Global Step: 210810   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 21:33:24,985-Speed 3333.31 samples/sec   Loss 1.1173   LearningRate 0.0136   Epoch: 12   Global Step: 210820   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 21:33:28,045-Speed 3347.48 samples/sec   Loss 1.1551   LearningRate 0.0136   Epoch: 12   Global Step: 210830   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 21:33:31,244-Speed 3201.46 samples/sec   Loss 1.1159   LearningRate 0.0136   Epoch: 12   Global Step: 210840   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 21:33:34,474-Speed 3171.28 samples/sec   Loss 1.1378   LearningRate 0.0136   Epoch: 12   Global Step: 210850   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 21:33:37,635-Speed 3239.75 samples/sec   Loss 1.1072   LearningRate 0.0136   Epoch: 12   Global Step: 210860   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:33:40,706-Speed 3335.79 samples/sec   Loss 1.0979   LearningRate 0.0136   Epoch: 12   Global Step: 210870   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:33:43,778-Speed 3334.09 samples/sec   Loss 1.0770   LearningRate 0.0136   Epoch: 12   Global Step: 210880   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:33:46,845-Speed 3339.12 samples/sec   Loss 1.0781   LearningRate 0.0136   Epoch: 12   Global Step: 210890   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:33:49,913-Speed 3338.15 samples/sec   Loss 1.1469   LearningRate 0.0136   Epoch: 12   Global Step: 210900   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:33:52,990-Speed 3328.69 samples/sec   Loss 1.1127   LearningRate 0.0136   Epoch: 12   Global Step: 210910   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:33:56,053-Speed 3343.82 samples/sec   Loss 1.1474   LearningRate 0.0136   Epoch: 12   Global Step: 210920   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:33:59,254-Speed 3200.00 samples/sec   Loss 1.1611   LearningRate 0.0136   Epoch: 12   Global Step: 210930   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:34:02,340-Speed 3318.59 samples/sec   Loss 1.0889   LearningRate 0.0135   Epoch: 12   Global Step: 210940   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:34:05,411-Speed 3335.35 samples/sec   Loss 1.0322   LearningRate 0.0135   Epoch: 12   Global Step: 210950   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:34:08,476-Speed 3341.89 samples/sec   Loss 1.1321   LearningRate 0.0135   Epoch: 12   Global Step: 210960   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 21:34:11,559-Speed 3322.32 samples/sec   Loss 1.0583   LearningRate 0.0135   Epoch: 12   Global Step: 210970   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 21:34:14,625-Speed 3340.83 samples/sec   Loss 1.1160   LearningRate 0.0135   Epoch: 12   Global Step: 210980   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 21:34:17,693-Speed 3338.17 samples/sec   Loss 1.1132   LearningRate 0.0135   Epoch: 12   Global Step: 210990   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 21:34:20,744-Speed 3356.95 samples/sec   Loss 1.1244   LearningRate 0.0135   Epoch: 12   Global Step: 211000   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:34:23,807-Speed 3344.35 samples/sec   Loss 1.1259   LearningRate 0.0135   Epoch: 12   Global Step: 211010   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:34:26,867-Speed 3347.84 samples/sec   Loss 1.1029   LearningRate 0.0135   Epoch: 12   Global Step: 211020   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:34:29,966-Speed 3304.47 samples/sec   Loss 1.1148   LearningRate 0.0135   Epoch: 12   Global Step: 211030   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:34:33,085-Speed 3283.58 samples/sec   Loss 1.0857   LearningRate 0.0135   Epoch: 12   Global Step: 211040   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:34:36,155-Speed 3336.62 samples/sec   Loss 1.1219   LearningRate 0.0135   Epoch: 12   Global Step: 211050   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:34:39,221-Speed 3340.46 samples/sec   Loss 1.0999   LearningRate 0.0135   Epoch: 12   Global Step: 211060   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:34:42,285-Speed 3343.08 samples/sec   Loss 1.0791   LearningRate 0.0135   Epoch: 12   Global Step: 211070   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:34:45,369-Speed 3321.88 samples/sec   Loss 1.1145   LearningRate 0.0135   Epoch: 12   Global Step: 211080   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:34:48,434-Speed 3341.01 samples/sec   Loss 1.1229   LearningRate 0.0135   Epoch: 12   Global Step: 211090   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:34:51,498-Speed 3342.77 samples/sec   Loss 1.1111   LearningRate 0.0135   Epoch: 12   Global Step: 211100   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 21:34:54,564-Speed 3340.53 samples/sec   Loss 1.1328   LearningRate 0.0135   Epoch: 12   Global Step: 211110   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 21:34:57,631-Speed 3339.51 samples/sec   Loss 1.1065   LearningRate 0.0135   Epoch: 12   Global Step: 211120   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 21:35:00,700-Speed 3338.41 samples/sec   Loss 1.0312   LearningRate 0.0135   Epoch: 12   Global Step: 211130   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 21:35:03,785-Speed 3319.45 samples/sec   Loss 1.1018   LearningRate 0.0135   Epoch: 12   Global Step: 211140   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 21:35:06,873-Speed 3316.47 samples/sec   Loss 1.1241   LearningRate 0.0135   Epoch: 12   Global Step: 211150   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 21:35:09,939-Speed 3341.27 samples/sec   Loss 1.0732   LearningRate 0.0135   Epoch: 12   Global Step: 211160   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 21:35:13,012-Speed 3332.62 samples/sec   Loss 1.0865   LearningRate 0.0135   Epoch: 12   Global Step: 211170   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 21:35:16,071-Speed 3347.92 samples/sec   Loss 1.0848   LearningRate 0.0135   Epoch: 12   Global Step: 211180   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 21:35:19,140-Speed 3337.87 samples/sec   Loss 1.0619   LearningRate 0.0135   Epoch: 12   Global Step: 211190   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 21:35:22,206-Speed 3340.12 samples/sec   Loss 1.1053   LearningRate 0.0135   Epoch: 12   Global Step: 211200   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 21:35:25,294-Speed 3317.10 samples/sec   Loss 1.0968   LearningRate 0.0135   Epoch: 12   Global Step: 211210   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:35:28,533-Speed 3161.95 samples/sec   Loss 1.1038   LearningRate 0.0135   Epoch: 12   Global Step: 211220   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:35:31,607-Speed 3332.23 samples/sec   Loss 1.0748   LearningRate 0.0135   Epoch: 12   Global Step: 211230   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:35:34,716-Speed 3294.41 samples/sec   Loss 1.0938   LearningRate 0.0135   Epoch: 12   Global Step: 211240   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:35:37,852-Speed 3266.30 samples/sec   Loss 1.0830   LearningRate 0.0135   Epoch: 12   Global Step: 211250   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:35:40,935-Speed 3322.10 samples/sec   Loss 1.1053   LearningRate 0.0135   Epoch: 12   Global Step: 211260   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:35:44,008-Speed 3333.91 samples/sec   Loss 1.0837   LearningRate 0.0135   Epoch: 12   Global Step: 211270   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:35:47,075-Speed 3339.39 samples/sec   Loss 1.1085   LearningRate 0.0135   Epoch: 12   Global Step: 211280   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:35:50,140-Speed 3341.49 samples/sec   Loss 1.0763   LearningRate 0.0135   Epoch: 12   Global Step: 211290   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:35:53,209-Speed 3336.95 samples/sec   Loss 1.0592   LearningRate 0.0135   Epoch: 12   Global Step: 211300   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:35:56,329-Speed 3283.45 samples/sec   Loss 1.1161   LearningRate 0.0135   Epoch: 12   Global Step: 211310   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 21:35:59,401-Speed 3334.19 samples/sec   Loss 1.1070   LearningRate 0.0135   Epoch: 12   Global Step: 211320   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:36:02,484-Speed 3322.03 samples/sec   Loss 1.0915   LearningRate 0.0135   Epoch: 12   Global Step: 211330   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:36:05,549-Speed 3341.50 samples/sec   Loss 1.1026   LearningRate 0.0135   Epoch: 12   Global Step: 211340   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:36:08,614-Speed 3341.34 samples/sec   Loss 1.0788   LearningRate 0.0135   Epoch: 12   Global Step: 211350   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:36:11,678-Speed 3342.96 samples/sec   Loss 1.0877   LearningRate 0.0135   Epoch: 12   Global Step: 211360   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:36:14,754-Speed 3330.65 samples/sec   Loss 1.1180   LearningRate 0.0135   Epoch: 12   Global Step: 211370   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:36:17,824-Speed 3335.69 samples/sec   Loss 1.0605   LearningRate 0.0135   Epoch: 12   Global Step: 211380   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:36:20,993-Speed 3231.78 samples/sec   Loss 1.0955   LearningRate 0.0135   Epoch: 12   Global Step: 211390   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:36:24,071-Speed 3327.81 samples/sec   Loss 1.1648   LearningRate 0.0134   Epoch: 12   Global Step: 211400   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:36:27,149-Speed 3327.67 samples/sec   Loss 1.1176   LearningRate 0.0134   Epoch: 12   Global Step: 211410   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:36:30,260-Speed 3292.80 samples/sec   Loss 1.0996   LearningRate 0.0134   Epoch: 12   Global Step: 211420   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 21:36:33,323-Speed 3344.10 samples/sec   Loss 1.1141   LearningRate 0.0134   Epoch: 12   Global Step: 211430   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 21:36:36,432-Speed 3293.85 samples/sec   Loss 1.1115   LearningRate 0.0134   Epoch: 12   Global Step: 211440   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:36:40,303-Speed 2645.52 samples/sec   Loss 1.1178   LearningRate 0.0134   Epoch: 12   Global Step: 211450   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:36:43,409-Speed 3298.03 samples/sec   Loss 1.0949   LearningRate 0.0134   Epoch: 12   Global Step: 211460   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:36:46,541-Speed 3271.07 samples/sec   Loss 1.1277   LearningRate 0.0134   Epoch: 12   Global Step: 211470   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:36:49,634-Speed 3311.57 samples/sec   Loss 1.1369   LearningRate 0.0134   Epoch: 12   Global Step: 211480   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:36:52,695-Speed 3346.26 samples/sec   Loss 1.0710   LearningRate 0.0134   Epoch: 12   Global Step: 211490   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:36:55,816-Speed 3281.82 samples/sec   Loss 1.0981   LearningRate 0.0134   Epoch: 12   Global Step: 211500   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:36:58,878-Speed 3344.38 samples/sec   Loss 1.1290   LearningRate 0.0134   Epoch: 12   Global Step: 211510   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:37:01,938-Speed 3347.93 samples/sec   Loss 1.1148   LearningRate 0.0134   Epoch: 12   Global Step: 211520   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:37:05,016-Speed 3327.36 samples/sec   Loss 1.0704   LearningRate 0.0134   Epoch: 12   Global Step: 211530   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:37:08,084-Speed 3338.58 samples/sec   Loss 1.1113   LearningRate 0.0134   Epoch: 12   Global Step: 211540   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 21:37:11,174-Speed 3314.78 samples/sec   Loss 1.0656   LearningRate 0.0134   Epoch: 12   Global Step: 211550   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 21:37:14,243-Speed 3337.72 samples/sec   Loss 1.0939   LearningRate 0.0134   Epoch: 12   Global Step: 211560   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 21:37:17,322-Speed 3325.95 samples/sec   Loss 1.1140   LearningRate 0.0134   Epoch: 12   Global Step: 211570   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 21:37:20,402-Speed 3325.54 samples/sec   Loss 1.0915   LearningRate 0.0134   Epoch: 12   Global Step: 211580   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 21:37:23,466-Speed 3342.47 samples/sec   Loss 1.0645   LearningRate 0.0134   Epoch: 12   Global Step: 211590   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 21:37:26,550-Speed 3321.67 samples/sec   Loss 1.0651   LearningRate 0.0134   Epoch: 12   Global Step: 211600   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 21:37:29,615-Speed 3341.33 samples/sec   Loss 1.0435   LearningRate 0.0134   Epoch: 12   Global Step: 211610   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 21:37:32,683-Speed 3339.33 samples/sec   Loss 1.0767   LearningRate 0.0134   Epoch: 12   Global Step: 211620   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 21:37:35,754-Speed 3334.54 samples/sec   Loss 1.0869   LearningRate 0.0134   Epoch: 12   Global Step: 211630   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 21:37:38,811-Speed 3350.94 samples/sec   Loss 1.1351   LearningRate 0.0134   Epoch: 12   Global Step: 211640   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 21:37:41,886-Speed 3329.85 samples/sec   Loss 1.1234   LearningRate 0.0134   Epoch: 12   Global Step: 211650   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 21:37:45,016-Speed 3272.91 samples/sec   Loss 1.0829   LearningRate 0.0134   Epoch: 12   Global Step: 211660   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 21:37:48,125-Speed 3294.10 samples/sec   Loss 1.0765   LearningRate 0.0134   Epoch: 12   Global Step: 211670   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 21:37:51,206-Speed 3323.85 samples/sec   Loss 1.0879   LearningRate 0.0134   Epoch: 12   Global Step: 211680   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 21:37:54,294-Speed 3317.98 samples/sec   Loss 1.0728   LearningRate 0.0134   Epoch: 12   Global Step: 211690   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 21:37:57,558-Speed 3138.16 samples/sec   Loss 1.1324   LearningRate 0.0134   Epoch: 12   Global Step: 211700   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 21:38:00,653-Speed 3309.54 samples/sec   Loss 1.0638   LearningRate 0.0134   Epoch: 12   Global Step: 211710   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 21:38:03,741-Speed 3316.24 samples/sec   Loss 1.1094   LearningRate 0.0134   Epoch: 12   Global Step: 211720   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 21:38:06,810-Speed 3337.77 samples/sec   Loss 1.1281   LearningRate 0.0134   Epoch: 12   Global Step: 211730   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 21:38:09,868-Speed 3348.90 samples/sec   Loss 1.0485   LearningRate 0.0134   Epoch: 12   Global Step: 211740   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 21:38:12,952-Speed 3320.64 samples/sec   Loss 1.1294   LearningRate 0.0134   Epoch: 12   Global Step: 211750   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 21:38:16,019-Speed 3340.25 samples/sec   Loss 1.1041   LearningRate 0.0134   Epoch: 12   Global Step: 211760   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:38:19,090-Speed 3335.06 samples/sec   Loss 1.1260   LearningRate 0.0134   Epoch: 12   Global Step: 211770   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:38:22,162-Speed 3333.93 samples/sec   Loss 1.1223   LearningRate 0.0134   Epoch: 12   Global Step: 211780   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:38:25,231-Speed 3337.91 samples/sec   Loss 1.0694   LearningRate 0.0134   Epoch: 12   Global Step: 211790   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:38:28,300-Speed 3336.98 samples/sec   Loss 1.0995   LearningRate 0.0134   Epoch: 12   Global Step: 211800   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:38:31,366-Speed 3340.81 samples/sec   Loss 1.0956   LearningRate 0.0134   Epoch: 12   Global Step: 211810   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:38:34,434-Speed 3338.73 samples/sec   Loss 1.1040   LearningRate 0.0134   Epoch: 12   Global Step: 211820   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:38:37,503-Speed 3336.96 samples/sec   Loss 1.1398   LearningRate 0.0134   Epoch: 12   Global Step: 211830   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:38:40,624-Speed 3282.00 samples/sec   Loss 1.0805   LearningRate 0.0134   Epoch: 12   Global Step: 211840   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:38:43,724-Speed 3303.09 samples/sec   Loss 1.1456   LearningRate 0.0134   Epoch: 12   Global Step: 211850   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:38:46,830-Speed 3297.63 samples/sec   Loss 1.1054   LearningRate 0.0133   Epoch: 12   Global Step: 211860   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 21:38:49,956-Speed 3276.73 samples/sec   Loss 1.1010   LearningRate 0.0133   Epoch: 12   Global Step: 211870   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:38:53,025-Speed 3337.76 samples/sec   Loss 1.1154   LearningRate 0.0133   Epoch: 12   Global Step: 211880   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:38:56,091-Speed 3340.60 samples/sec   Loss 1.1429   LearningRate 0.0133   Epoch: 12   Global Step: 211890   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:38:59,170-Speed 3326.27 samples/sec   Loss 1.1250   LearningRate 0.0133   Epoch: 12   Global Step: 211900   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:39:02,266-Speed 3308.30 samples/sec   Loss 1.0991   LearningRate 0.0133   Epoch: 12   Global Step: 211910   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:39:05,330-Speed 3343.10 samples/sec   Loss 1.1217   LearningRate 0.0133   Epoch: 12   Global Step: 211920   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:39:08,396-Speed 3340.91 samples/sec   Loss 1.1299   LearningRate 0.0133   Epoch: 12   Global Step: 211930   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:39:11,472-Speed 3329.50 samples/sec   Loss 1.1167   LearningRate 0.0133   Epoch: 12   Global Step: 211940   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:39:14,547-Speed 3331.51 samples/sec   Loss 1.0905   LearningRate 0.0133   Epoch: 12   Global Step: 211950   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:39:17,612-Speed 3341.74 samples/sec   Loss 1.1097   LearningRate 0.0133   Epoch: 12   Global Step: 211960   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:39:20,678-Speed 3340.43 samples/sec   Loss 1.1281   LearningRate 0.0133   Epoch: 12   Global Step: 211970   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 21:39:23,741-Speed 3343.77 samples/sec   Loss 1.0990   LearningRate 0.0133   Epoch: 12   Global Step: 211980   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 21:39:26,809-Speed 3338.82 samples/sec   Loss 1.1411   LearningRate 0.0133   Epoch: 12   Global Step: 211990   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 21:39:29,885-Speed 3329.41 samples/sec   Loss 1.1661   LearningRate 0.0133   Epoch: 12   Global Step: 212000   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 21:40:13,852-[lfw][212000]XNorm: 21.705825
Training: 2022-04-11 21:40:13,852-[lfw][212000]Accuracy-Flip: 0.99817+-0.00273
Training: 2022-04-11 21:40:13,853-[lfw][212000]Accuracy-Highest: 0.99817
Training: 2022-04-11 21:41:05,041-[cfp_fp][212000]XNorm: 22.322761
Training: 2022-04-11 21:41:05,042-[cfp_fp][212000]Accuracy-Flip: 0.98943+-0.00492
Training: 2022-04-11 21:41:05,042-[cfp_fp][212000]Accuracy-Highest: 0.99086
Training: 2022-04-11 21:41:49,012-[agedb_30][212000]XNorm: 22.973783
Training: 2022-04-11 21:41:49,013-[agedb_30][212000]Accuracy-Flip: 0.98433+-0.00564
Training: 2022-04-11 21:41:49,013-[agedb_30][212000]Accuracy-Highest: 0.98567
Training: 2022-04-11 21:41:52,116-Speed 72.00 samples/sec   Loss 1.1449   LearningRate 0.0133   Epoch: 12   Global Step: 212010   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 21:41:55,207-Speed 3312.65 samples/sec   Loss 1.0925   LearningRate 0.0133   Epoch: 12   Global Step: 212020   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 21:41:58,293-Speed 3320.06 samples/sec   Loss 1.0774   LearningRate 0.0133   Epoch: 12   Global Step: 212030   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 21:42:01,344-Speed 3356.92 samples/sec   Loss 1.1573   LearningRate 0.0133   Epoch: 12   Global Step: 212040   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 21:42:04,413-Speed 3337.16 samples/sec   Loss 1.1373   LearningRate 0.0133   Epoch: 12   Global Step: 212050   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 21:42:07,486-Speed 3332.98 samples/sec   Loss 1.0999   LearningRate 0.0133   Epoch: 12   Global Step: 212060   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 21:42:10,553-Speed 3339.93 samples/sec   Loss 1.1292   LearningRate 0.0133   Epoch: 12   Global Step: 212070   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 21:42:13,610-Speed 3349.85 samples/sec   Loss 1.1853   LearningRate 0.0133   Epoch: 12   Global Step: 212080   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 21:42:16,669-Speed 3348.96 samples/sec   Loss 1.1140   LearningRate 0.0133   Epoch: 12   Global Step: 212090   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 21:42:19,748-Speed 3325.91 samples/sec   Loss 1.0955   LearningRate 0.0133   Epoch: 12   Global Step: 212100   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 21:42:22,873-Speed 3277.02 samples/sec   Loss 1.1432   LearningRate 0.0133   Epoch: 12   Global Step: 212110   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 21:42:25,934-Speed 3346.42 samples/sec   Loss 1.1433   LearningRate 0.0133   Epoch: 12   Global Step: 212120   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:42:29,028-Speed 3310.25 samples/sec   Loss 1.1497   LearningRate 0.0133   Epoch: 12   Global Step: 212130   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:42:32,094-Speed 3340.62 samples/sec   Loss 1.1133   LearningRate 0.0133   Epoch: 12   Global Step: 212140   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:42:35,179-Speed 3320.69 samples/sec   Loss 1.1128   LearningRate 0.0133   Epoch: 12   Global Step: 212150   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:42:38,285-Speed 3297.21 samples/sec   Loss 1.0852   LearningRate 0.0133   Epoch: 12   Global Step: 212160   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:42:41,408-Speed 3280.26 samples/sec   Loss 1.0960   LearningRate 0.0133   Epoch: 12   Global Step: 212170   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:42:44,484-Speed 3329.50 samples/sec   Loss 1.0970   LearningRate 0.0133   Epoch: 12   Global Step: 212180   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:42:47,549-Speed 3341.55 samples/sec   Loss 1.1370   LearningRate 0.0133   Epoch: 12   Global Step: 212190   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:42:50,624-Speed 3330.44 samples/sec   Loss 1.0980   LearningRate 0.0133   Epoch: 12   Global Step: 212200   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:42:53,686-Speed 3345.64 samples/sec   Loss 1.0929   LearningRate 0.0133   Epoch: 12   Global Step: 212210   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:42:56,799-Speed 3290.00 samples/sec   Loss 1.1186   LearningRate 0.0133   Epoch: 12   Global Step: 212220   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 21:42:59,875-Speed 3329.36 samples/sec   Loss 1.0850   LearningRate 0.0133   Epoch: 12   Global Step: 212230   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 21:43:03,000-Speed 3277.74 samples/sec   Loss 1.1191   LearningRate 0.0133   Epoch: 12   Global Step: 212240   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 21:43:06,219-Speed 3182.56 samples/sec   Loss 1.1056   LearningRate 0.0133   Epoch: 12   Global Step: 212250   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 21:43:09,353-Speed 3267.35 samples/sec   Loss 1.0718   LearningRate 0.0133   Epoch: 12   Global Step: 212260   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:43:12,445-Speed 3313.28 samples/sec   Loss 1.1390   LearningRate 0.0133   Epoch: 12   Global Step: 212270   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:43:15,530-Speed 3319.41 samples/sec   Loss 1.0846   LearningRate 0.0133   Epoch: 12   Global Step: 212280   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:43:18,614-Speed 3321.94 samples/sec   Loss 1.0662   LearningRate 0.0133   Epoch: 12   Global Step: 212290   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:43:21,706-Speed 3312.43 samples/sec   Loss 1.0835   LearningRate 0.0133   Epoch: 12   Global Step: 212300   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:43:24,771-Speed 3341.66 samples/sec   Loss 1.1200   LearningRate 0.0132   Epoch: 12   Global Step: 212310   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:43:27,864-Speed 3311.07 samples/sec   Loss 1.0967   LearningRate 0.0132   Epoch: 12   Global Step: 212320   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:43:30,940-Speed 3330.24 samples/sec   Loss 1.1058   LearningRate 0.0132   Epoch: 12   Global Step: 212330   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:43:34,072-Speed 3270.69 samples/sec   Loss 1.1016   LearningRate 0.0132   Epoch: 12   Global Step: 212340   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:43:37,217-Speed 3256.66 samples/sec   Loss 1.0842   LearningRate 0.0132   Epoch: 12   Global Step: 212350   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:43:40,292-Speed 3330.11 samples/sec   Loss 1.1300   LearningRate 0.0132   Epoch: 12   Global Step: 212360   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 21:43:43,402-Speed 3293.89 samples/sec   Loss 1.1483   LearningRate 0.0132   Epoch: 12   Global Step: 212370   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 21:43:46,496-Speed 3310.56 samples/sec   Loss 1.1125   LearningRate 0.0132   Epoch: 12   Global Step: 212380   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 21:43:49,629-Speed 3269.27 samples/sec   Loss 1.0942   LearningRate 0.0132   Epoch: 12   Global Step: 212390   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 21:43:52,698-Speed 3336.31 samples/sec   Loss 1.1578   LearningRate 0.0132   Epoch: 12   Global Step: 212400   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 21:43:55,850-Speed 3250.17 samples/sec   Loss 1.1030   LearningRate 0.0132   Epoch: 12   Global Step: 212410   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 21:43:58,938-Speed 3316.83 samples/sec   Loss 1.0380   LearningRate 0.0132   Epoch: 12   Global Step: 212420   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 21:44:01,998-Speed 3347.88 samples/sec   Loss 1.1260   LearningRate 0.0132   Epoch: 12   Global Step: 212430   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 21:44:05,059-Speed 3345.30 samples/sec   Loss 1.0933   LearningRate 0.0132   Epoch: 12   Global Step: 212440   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 21:44:08,126-Speed 3339.41 samples/sec   Loss 1.1256   LearningRate 0.0132   Epoch: 12   Global Step: 212450   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:44:11,187-Speed 3346.46 samples/sec   Loss 1.0670   LearningRate 0.0132   Epoch: 12   Global Step: 212460   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:44:14,291-Speed 3300.02 samples/sec   Loss 1.0737   LearningRate 0.0132   Epoch: 12   Global Step: 212470   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:44:17,376-Speed 3320.26 samples/sec   Loss 1.0967   LearningRate 0.0132   Epoch: 12   Global Step: 212480   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:44:20,470-Speed 3310.10 samples/sec   Loss 1.1027   LearningRate 0.0132   Epoch: 12   Global Step: 212490   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:44:23,544-Speed 3331.07 samples/sec   Loss 1.0901   LearningRate 0.0132   Epoch: 12   Global Step: 212500   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:44:26,644-Speed 3304.75 samples/sec   Loss 1.1008   LearningRate 0.0132   Epoch: 12   Global Step: 212510   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:44:29,704-Speed 3347.88 samples/sec   Loss 1.1183   LearningRate 0.0132   Epoch: 12   Global Step: 212520   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:44:32,761-Speed 3350.23 samples/sec   Loss 1.0798   LearningRate 0.0132   Epoch: 12   Global Step: 212530   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:44:35,828-Speed 3339.59 samples/sec   Loss 1.1430   LearningRate 0.0132   Epoch: 12   Global Step: 212540   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:44:38,900-Speed 3333.93 samples/sec   Loss 1.1468   LearningRate 0.0132   Epoch: 12   Global Step: 212550   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 21:44:41,979-Speed 3325.98 samples/sec   Loss 1.0973   LearningRate 0.0132   Epoch: 12   Global Step: 212560   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 21:44:45,154-Speed 3226.67 samples/sec   Loss 1.1530   LearningRate 0.0132   Epoch: 12   Global Step: 212570   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 21:44:48,259-Speed 3298.52 samples/sec   Loss 1.1476   LearningRate 0.0132   Epoch: 12   Global Step: 212580   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:44:51,337-Speed 3327.51 samples/sec   Loss 1.1284   LearningRate 0.0132   Epoch: 12   Global Step: 212590   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:44:54,399-Speed 3345.20 samples/sec   Loss 1.1466   LearningRate 0.0132   Epoch: 12   Global Step: 212600   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:44:57,496-Speed 3307.11 samples/sec   Loss 1.0772   LearningRate 0.0132   Epoch: 12   Global Step: 212610   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:45:00,572-Speed 3330.43 samples/sec   Loss 1.0764   LearningRate 0.0132   Epoch: 12   Global Step: 212620   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:45:03,647-Speed 3330.89 samples/sec   Loss 1.0966   LearningRate 0.0132   Epoch: 12   Global Step: 212630   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:45:06,730-Speed 3322.13 samples/sec   Loss 1.0853   LearningRate 0.0132   Epoch: 12   Global Step: 212640   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-11 21:45:09,799-Speed 3337.26 samples/sec   Loss 1.1071   LearningRate 0.0132   Epoch: 12   Global Step: 212650   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-11 21:45:12,867-Speed 3338.59 samples/sec   Loss 1.1048   LearningRate 0.0132   Epoch: 12   Global Step: 212660   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-11 21:45:15,936-Speed 3337.38 samples/sec   Loss 1.0960   LearningRate 0.0132   Epoch: 12   Global Step: 212670   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-11 21:45:19,069-Speed 3269.15 samples/sec   Loss 1.1004   LearningRate 0.0132   Epoch: 12   Global Step: 212680   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-11 21:45:22,157-Speed 3316.79 samples/sec   Loss 1.0882   LearningRate 0.0132   Epoch: 12   Global Step: 212690   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-11 21:45:25,223-Speed 3340.63 samples/sec   Loss 1.1029   LearningRate 0.0132   Epoch: 12   Global Step: 212700   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-11 21:45:28,286-Speed 3344.81 samples/sec   Loss 1.1392   LearningRate 0.0132   Epoch: 12   Global Step: 212710   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-11 21:45:31,348-Speed 3345.01 samples/sec   Loss 1.1178   LearningRate 0.0132   Epoch: 12   Global Step: 212720   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-11 21:45:34,407-Speed 3347.42 samples/sec   Loss 1.1156   LearningRate 0.0132   Epoch: 12   Global Step: 212730   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-11 21:45:37,470-Speed 3344.50 samples/sec   Loss 1.1134   LearningRate 0.0132   Epoch: 12   Global Step: 212740   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:45:40,532-Speed 3344.30 samples/sec   Loss 1.1411   LearningRate 0.0132   Epoch: 12   Global Step: 212750   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:45:43,631-Speed 3305.72 samples/sec   Loss 1.1205   LearningRate 0.0132   Epoch: 12   Global Step: 212760   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:45:46,719-Speed 3316.20 samples/sec   Loss 1.0870   LearningRate 0.0131   Epoch: 12   Global Step: 212770   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:45:49,945-Speed 3175.80 samples/sec   Loss 1.1008   LearningRate 0.0131   Epoch: 12   Global Step: 212780   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:45:53,172-Speed 3173.43 samples/sec   Loss 1.0873   LearningRate 0.0131   Epoch: 12   Global Step: 212790   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:45:56,289-Speed 3286.45 samples/sec   Loss 1.1272   LearningRate 0.0131   Epoch: 12   Global Step: 212800   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:45:59,418-Speed 3273.61 samples/sec   Loss 1.1223   LearningRate 0.0131   Epoch: 12   Global Step: 212810   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:46:02,560-Speed 3259.96 samples/sec   Loss 1.0830   LearningRate 0.0131   Epoch: 12   Global Step: 212820   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:46:05,647-Speed 3316.87 samples/sec   Loss 1.1094   LearningRate 0.0131   Epoch: 12   Global Step: 212830   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:46:08,709-Speed 3346.19 samples/sec   Loss 1.0980   LearningRate 0.0131   Epoch: 12   Global Step: 212840   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 21:46:11,773-Speed 3342.54 samples/sec   Loss 1.1036   LearningRate 0.0131   Epoch: 12   Global Step: 212850   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 21:46:14,942-Speed 3231.74 samples/sec   Loss 1.1122   LearningRate 0.0131   Epoch: 12   Global Step: 212860   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 21:46:18,019-Speed 3329.29 samples/sec   Loss 1.0986   LearningRate 0.0131   Epoch: 12   Global Step: 212870   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 21:46:21,089-Speed 3336.64 samples/sec   Loss 1.0911   LearningRate 0.0131   Epoch: 12   Global Step: 212880   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 21:46:24,156-Speed 3339.18 samples/sec   Loss 1.0943   LearningRate 0.0131   Epoch: 12   Global Step: 212890   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 21:46:27,226-Speed 3336.18 samples/sec   Loss 1.0805   LearningRate 0.0131   Epoch: 12   Global Step: 212900   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 21:46:30,302-Speed 3329.66 samples/sec   Loss 1.0963   LearningRate 0.0131   Epoch: 12   Global Step: 212910   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 21:46:33,392-Speed 3314.95 samples/sec   Loss 1.1451   LearningRate 0.0131   Epoch: 12   Global Step: 212920   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 21:46:36,482-Speed 3314.78 samples/sec   Loss 1.1252   LearningRate 0.0131   Epoch: 12   Global Step: 212930   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 21:46:39,555-Speed 3332.66 samples/sec   Loss 1.0846   LearningRate 0.0131   Epoch: 12   Global Step: 212940   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-04-11 21:46:42,598-Speed 3366.86 samples/sec   Loss 1.1297   LearningRate 0.0131   Epoch: 12   Global Step: 212950   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:46:45,660-Speed 3345.11 samples/sec   Loss 1.0744   LearningRate 0.0131   Epoch: 12   Global Step: 212960   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:46:48,722-Speed 3344.15 samples/sec   Loss 1.1172   LearningRate 0.0131   Epoch: 12   Global Step: 212970   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:46:51,782-Speed 3347.74 samples/sec   Loss 1.1368   LearningRate 0.0131   Epoch: 12   Global Step: 212980   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:46:54,845-Speed 3343.49 samples/sec   Loss 1.0948   LearningRate 0.0131   Epoch: 12   Global Step: 212990   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:46:57,907-Speed 3345.31 samples/sec   Loss 1.1447   LearningRate 0.0131   Epoch: 12   Global Step: 213000   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:47:00,983-Speed 3330.06 samples/sec   Loss 1.0720   LearningRate 0.0131   Epoch: 12   Global Step: 213010   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:47:04,048-Speed 3341.06 samples/sec   Loss 1.1250   LearningRate 0.0131   Epoch: 12   Global Step: 213020   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:47:07,138-Speed 3314.60 samples/sec   Loss 1.1213   LearningRate 0.0131   Epoch: 12   Global Step: 213030   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:47:10,206-Speed 3338.44 samples/sec   Loss 1.1083   LearningRate 0.0131   Epoch: 12   Global Step: 213040   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:47:13,317-Speed 3293.19 samples/sec   Loss 1.1415   LearningRate 0.0131   Epoch: 12   Global Step: 213050   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 21:47:16,369-Speed 3355.04 samples/sec   Loss 1.0984   LearningRate 0.0131   Epoch: 12   Global Step: 213060   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:47:19,441-Speed 3334.41 samples/sec   Loss 1.1212   LearningRate 0.0131   Epoch: 12   Global Step: 213070   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:47:22,551-Speed 3293.40 samples/sec   Loss 1.1814   LearningRate 0.0131   Epoch: 12   Global Step: 213080   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:47:25,623-Speed 3334.17 samples/sec   Loss 1.1173   LearningRate 0.0131   Epoch: 12   Global Step: 213090   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:47:28,688-Speed 3341.88 samples/sec   Loss 1.0916   LearningRate 0.0131   Epoch: 12   Global Step: 213100   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:47:31,759-Speed 3334.62 samples/sec   Loss 1.1468   LearningRate 0.0131   Epoch: 12   Global Step: 213110   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:47:34,828-Speed 3337.05 samples/sec   Loss 1.1132   LearningRate 0.0131   Epoch: 12   Global Step: 213120   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:47:37,900-Speed 3334.62 samples/sec   Loss 1.1279   LearningRate 0.0131   Epoch: 12   Global Step: 213130   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:47:40,984-Speed 3321.58 samples/sec   Loss 1.0977   LearningRate 0.0131   Epoch: 12   Global Step: 213140   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:47:44,042-Speed 3349.15 samples/sec   Loss 1.1337   LearningRate 0.0131   Epoch: 12   Global Step: 213150   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-11 21:47:47,109-Speed 3339.18 samples/sec   Loss 1.1423   LearningRate 0.0131   Epoch: 12   Global Step: 213160   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-11 21:47:50,192-Speed 3322.60 samples/sec   Loss 1.1396   LearningRate 0.0131   Epoch: 12   Global Step: 213170   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-11 21:47:53,257-Speed 3341.07 samples/sec   Loss 1.0828   LearningRate 0.0131   Epoch: 12   Global Step: 213180   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-11 21:47:56,340-Speed 3322.27 samples/sec   Loss 1.1149   LearningRate 0.0131   Epoch: 12   Global Step: 213190   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-11 21:47:59,409-Speed 3337.44 samples/sec   Loss 1.0437   LearningRate 0.0131   Epoch: 12   Global Step: 213200   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-11 21:48:02,482-Speed 3333.35 samples/sec   Loss 1.0829   LearningRate 0.0131   Epoch: 12   Global Step: 213210   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-11 21:48:05,543-Speed 3346.59 samples/sec   Loss 1.0919   LearningRate 0.0131   Epoch: 12   Global Step: 213220   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-11 21:48:08,606-Speed 3343.40 samples/sec   Loss 1.1008   LearningRate 0.0130   Epoch: 12   Global Step: 213230   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-11 21:48:11,668-Speed 3344.93 samples/sec   Loss 1.0944   LearningRate 0.0130   Epoch: 12   Global Step: 213240   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-11 21:48:14,727-Speed 3348.49 samples/sec   Loss 1.1280   LearningRate 0.0130   Epoch: 12   Global Step: 213250   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:48:17,835-Speed 3295.59 samples/sec   Loss 1.0917   LearningRate 0.0130   Epoch: 12   Global Step: 213260   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:48:20,893-Speed 3349.28 samples/sec   Loss 1.0897   LearningRate 0.0130   Epoch: 12   Global Step: 213270   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:48:24,011-Speed 3284.95 samples/sec   Loss 1.0920   LearningRate 0.0130   Epoch: 12   Global Step: 213280   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:48:27,123-Speed 3290.59 samples/sec   Loss 1.0744   LearningRate 0.0130   Epoch: 12   Global Step: 213290   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:48:30,201-Speed 3327.22 samples/sec   Loss 1.0647   LearningRate 0.0130   Epoch: 12   Global Step: 213300   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:48:33,261-Speed 3348.57 samples/sec   Loss 1.1191   LearningRate 0.0130   Epoch: 12   Global Step: 213310   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:48:36,320-Speed 3347.84 samples/sec   Loss 1.1006   LearningRate 0.0130   Epoch: 12   Global Step: 213320   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:48:39,388-Speed 3338.39 samples/sec   Loss 1.1098   LearningRate 0.0130   Epoch: 12   Global Step: 213330   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:48:42,450-Speed 3345.23 samples/sec   Loss 1.0612   LearningRate 0.0130   Epoch: 12   Global Step: 213340   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:48:45,526-Speed 3329.03 samples/sec   Loss 1.0950   LearningRate 0.0130   Epoch: 12   Global Step: 213350   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 21:48:48,610-Speed 3321.91 samples/sec   Loss 1.0154   LearningRate 0.0130   Epoch: 12   Global Step: 213360   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 21:48:51,676-Speed 3340.79 samples/sec   Loss 1.0824   LearningRate 0.0130   Epoch: 12   Global Step: 213370   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 21:48:54,776-Speed 3303.10 samples/sec   Loss 1.0869   LearningRate 0.0130   Epoch: 12   Global Step: 213380   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 21:48:57,876-Speed 3304.41 samples/sec   Loss 1.0956   LearningRate 0.0130   Epoch: 12   Global Step: 213390   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 21:49:00,944-Speed 3338.21 samples/sec   Loss 1.0896   LearningRate 0.0130   Epoch: 12   Global Step: 213400   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:49:04,026-Speed 3323.47 samples/sec   Loss 1.1234   LearningRate 0.0130   Epoch: 12   Global Step: 213410   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:49:07,104-Speed 3328.06 samples/sec   Loss 1.0806   LearningRate 0.0130   Epoch: 12   Global Step: 213420   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:49:10,227-Speed 3279.24 samples/sec   Loss 1.0809   LearningRate 0.0130   Epoch: 12   Global Step: 213430   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:49:13,314-Speed 3317.99 samples/sec   Loss 1.0961   LearningRate 0.0130   Epoch: 12   Global Step: 213440   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:49:16,400-Speed 3318.35 samples/sec   Loss 1.0841   LearningRate 0.0130   Epoch: 12   Global Step: 213450   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:49:19,480-Speed 3325.66 samples/sec   Loss 1.1102   LearningRate 0.0130   Epoch: 12   Global Step: 213460   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:49:22,568-Speed 3316.61 samples/sec   Loss 1.1026   LearningRate 0.0130   Epoch: 12   Global Step: 213470   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:49:25,659-Speed 3313.68 samples/sec   Loss 1.1165   LearningRate 0.0130   Epoch: 12   Global Step: 213480   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:49:28,738-Speed 3326.99 samples/sec   Loss 1.1299   LearningRate 0.0130   Epoch: 12   Global Step: 213490   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:49:31,809-Speed 3335.01 samples/sec   Loss 1.0928   LearningRate 0.0130   Epoch: 12   Global Step: 213500   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 21:49:34,889-Speed 3325.80 samples/sec   Loss 1.0820   LearningRate 0.0130   Epoch: 12   Global Step: 213510   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 21:49:37,970-Speed 3323.78 samples/sec   Loss 1.0728   LearningRate 0.0130   Epoch: 12   Global Step: 213520   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 21:49:41,040-Speed 3336.61 samples/sec   Loss 1.1114   LearningRate 0.0130   Epoch: 12   Global Step: 213530   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 21:49:44,117-Speed 3328.65 samples/sec   Loss 1.1090   LearningRate 0.0130   Epoch: 12   Global Step: 213540   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 21:49:47,215-Speed 3306.13 samples/sec   Loss 1.1026   LearningRate 0.0130   Epoch: 12   Global Step: 213550   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 21:49:50,319-Speed 3299.18 samples/sec   Loss 1.0844   LearningRate 0.0130   Epoch: 12   Global Step: 213560   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 21:49:53,387-Speed 3339.25 samples/sec   Loss 1.0893   LearningRate 0.0130   Epoch: 12   Global Step: 213570   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 21:49:56,489-Speed 3301.92 samples/sec   Loss 1.1484   LearningRate 0.0130   Epoch: 12   Global Step: 213580   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 21:49:59,554-Speed 3341.24 samples/sec   Loss 1.1045   LearningRate 0.0130   Epoch: 12   Global Step: 213590   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:50:02,662-Speed 3295.26 samples/sec   Loss 1.1283   LearningRate 0.0130   Epoch: 12   Global Step: 213600   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:50:05,798-Speed 3266.31 samples/sec   Loss 1.0703   LearningRate 0.0130   Epoch: 12   Global Step: 213610   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:50:08,900-Speed 3302.06 samples/sec   Loss 1.0964   LearningRate 0.0130   Epoch: 12   Global Step: 213620   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:50:11,995-Speed 3309.41 samples/sec   Loss 1.1386   LearningRate 0.0130   Epoch: 12   Global Step: 213630   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:50:15,065-Speed 3335.77 samples/sec   Loss 1.1354   LearningRate 0.0130   Epoch: 12   Global Step: 213640   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:50:18,142-Speed 3328.85 samples/sec   Loss 1.0750   LearningRate 0.0130   Epoch: 12   Global Step: 213650   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:50:21,215-Speed 3333.44 samples/sec   Loss 1.0972   LearningRate 0.0130   Epoch: 12   Global Step: 213660   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:50:24,475-Speed 3141.39 samples/sec   Loss 1.1109   LearningRate 0.0130   Epoch: 12   Global Step: 213670   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:50:27,570-Speed 3309.47 samples/sec   Loss 1.0966   LearningRate 0.0130   Epoch: 12   Global Step: 213680   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:50:30,641-Speed 3334.95 samples/sec   Loss 1.1273   LearningRate 0.0130   Epoch: 12   Global Step: 213690   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:50:33,714-Speed 3332.98 samples/sec   Loss 1.0977   LearningRate 0.0129   Epoch: 12   Global Step: 213700   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:50:36,790-Speed 3330.54 samples/sec   Loss 1.1626   LearningRate 0.0129   Epoch: 12   Global Step: 213710   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:50:39,857-Speed 3339.06 samples/sec   Loss 1.0859   LearningRate 0.0129   Epoch: 12   Global Step: 213720   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:50:42,968-Speed 3292.28 samples/sec   Loss 1.1025   LearningRate 0.0129   Epoch: 12   Global Step: 213730   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:50:46,122-Speed 3247.29 samples/sec   Loss 1.1194   LearningRate 0.0129   Epoch: 12   Global Step: 213740   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:50:49,215-Speed 3312.32 samples/sec   Loss 1.1007   LearningRate 0.0129   Epoch: 12   Global Step: 213750   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:50:52,341-Speed 3276.29 samples/sec   Loss 1.0370   LearningRate 0.0129   Epoch: 12   Global Step: 213760   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:50:55,409-Speed 3338.92 samples/sec   Loss 1.1331   LearningRate 0.0129   Epoch: 12   Global Step: 213770   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:50:58,562-Speed 3247.69 samples/sec   Loss 1.1037   LearningRate 0.0129   Epoch: 12   Global Step: 213780   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:51:01,689-Speed 3275.12 samples/sec   Loss 1.1713   LearningRate 0.0129   Epoch: 12   Global Step: 213790   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 21:51:04,766-Speed 3328.98 samples/sec   Loss 1.1254   LearningRate 0.0129   Epoch: 12   Global Step: 213800   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 21:51:07,841-Speed 3331.36 samples/sec   Loss 1.0933   LearningRate 0.0129   Epoch: 12   Global Step: 213810   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 21:51:10,921-Speed 3325.02 samples/sec   Loss 1.1066   LearningRate 0.0129   Epoch: 12   Global Step: 213820   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 21:51:14,045-Speed 3279.21 samples/sec   Loss 1.1392   LearningRate 0.0129   Epoch: 12   Global Step: 213830   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 21:51:17,118-Speed 3333.02 samples/sec   Loss 1.1353   LearningRate 0.0129   Epoch: 12   Global Step: 213840   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 21:51:20,241-Speed 3280.23 samples/sec   Loss 1.1215   LearningRate 0.0129   Epoch: 12   Global Step: 213850   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 21:51:23,398-Speed 3244.06 samples/sec   Loss 1.1166   LearningRate 0.0129   Epoch: 12   Global Step: 213860   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 21:51:26,466-Speed 3338.40 samples/sec   Loss 1.1145   LearningRate 0.0129   Epoch: 12   Global Step: 213870   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 21:51:29,541-Speed 3330.21 samples/sec   Loss 1.1207   LearningRate 0.0129   Epoch: 12   Global Step: 213880   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 21:51:32,611-Speed 3336.02 samples/sec   Loss 1.1241   LearningRate 0.0129   Epoch: 12   Global Step: 213890   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-04-11 21:51:35,682-Speed 3335.63 samples/sec   Loss 1.1042   LearningRate 0.0129   Epoch: 12   Global Step: 213900   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 21:51:38,763-Speed 3323.81 samples/sec   Loss 1.0792   LearningRate 0.0129   Epoch: 12   Global Step: 213910   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 21:51:41,864-Speed 3304.10 samples/sec   Loss 1.1121   LearningRate 0.0129   Epoch: 12   Global Step: 213920   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 21:51:44,969-Speed 3298.80 samples/sec   Loss 1.0890   LearningRate 0.0129   Epoch: 12   Global Step: 213930   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 21:51:48,079-Speed 3292.90 samples/sec   Loss 1.0888   LearningRate 0.0129   Epoch: 12   Global Step: 213940   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 21:51:51,222-Speed 3259.18 samples/sec   Loss 1.0950   LearningRate 0.0129   Epoch: 12   Global Step: 213950   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 21:51:54,287-Speed 3341.44 samples/sec   Loss 1.0884   LearningRate 0.0129   Epoch: 12   Global Step: 213960   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:51:57,356-Speed 3336.50 samples/sec   Loss 1.0991   LearningRate 0.0129   Epoch: 12   Global Step: 213970   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:52:00,450-Speed 3310.90 samples/sec   Loss 1.1083   LearningRate 0.0129   Epoch: 12   Global Step: 213980   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:52:03,558-Speed 3295.21 samples/sec   Loss 1.1321   LearningRate 0.0129   Epoch: 12   Global Step: 213990   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:52:06,653-Speed 3309.34 samples/sec   Loss 1.1112   LearningRate 0.0129   Epoch: 12   Global Step: 214000   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:52:50,809-[lfw][214000]XNorm: 19.967534
Training: 2022-04-11 21:52:50,810-[lfw][214000]Accuracy-Flip: 0.99800+-0.00267
Training: 2022-04-11 21:52:50,810-[lfw][214000]Accuracy-Highest: 0.99817
Training: 2022-04-11 21:53:42,379-[cfp_fp][214000]XNorm: 20.262101
Training: 2022-04-11 21:53:42,379-[cfp_fp][214000]Accuracy-Flip: 0.98943+-0.00384
Training: 2022-04-11 21:53:42,380-[cfp_fp][214000]Accuracy-Highest: 0.99086
Training: 2022-04-11 21:54:26,656-[agedb_30][214000]XNorm: 21.104116
Training: 2022-04-11 21:54:26,657-[agedb_30][214000]Accuracy-Flip: 0.98467+-0.00657
Training: 2022-04-11 21:54:26,657-[agedb_30][214000]Accuracy-Highest: 0.98567
Training: 2022-04-11 21:54:29,735-Speed 71.57 samples/sec   Loss 1.1240   LearningRate 0.0129   Epoch: 12   Global Step: 214010   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:54:32,835-Speed 3304.85 samples/sec   Loss 1.1034   LearningRate 0.0129   Epoch: 12   Global Step: 214020   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:54:35,902-Speed 3338.82 samples/sec   Loss 1.1296   LearningRate 0.0129   Epoch: 12   Global Step: 214030   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:54:38,969-Speed 3339.21 samples/sec   Loss 1.1284   LearningRate 0.0129   Epoch: 12   Global Step: 214040   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:54:42,034-Speed 3341.73 samples/sec   Loss 1.0909   LearningRate 0.0129   Epoch: 12   Global Step: 214050   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:54:45,095-Speed 3346.32 samples/sec   Loss 1.0800   LearningRate 0.0129   Epoch: 12   Global Step: 214060   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 21:54:48,155-Speed 3346.84 samples/sec   Loss 1.0946   LearningRate 0.0129   Epoch: 12   Global Step: 214070   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 21:54:51,224-Speed 3337.97 samples/sec   Loss 1.0368   LearningRate 0.0129   Epoch: 12   Global Step: 214080   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 21:54:54,288-Speed 3342.85 samples/sec   Loss 1.0952   LearningRate 0.0129   Epoch: 12   Global Step: 214090   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 21:54:57,344-Speed 3351.49 samples/sec   Loss 1.0895   LearningRate 0.0129   Epoch: 12   Global Step: 214100   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 21:55:00,400-Speed 3351.51 samples/sec   Loss 1.1258   LearningRate 0.0129   Epoch: 12   Global Step: 214110   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 21:55:03,492-Speed 3312.48 samples/sec   Loss 1.0946   LearningRate 0.0129   Epoch: 12   Global Step: 214120   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 21:55:06,600-Speed 3295.64 samples/sec   Loss 1.1093   LearningRate 0.0129   Epoch: 12   Global Step: 214130   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 21:55:09,676-Speed 3329.82 samples/sec   Loss 1.1080   LearningRate 0.0129   Epoch: 12   Global Step: 214140   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 21:55:12,799-Speed 3279.28 samples/sec   Loss 1.1276   LearningRate 0.0129   Epoch: 12   Global Step: 214150   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 21:55:15,862-Speed 3344.04 samples/sec   Loss 1.1100   LearningRate 0.0128   Epoch: 12   Global Step: 214160   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 21:55:18,925-Speed 3343.21 samples/sec   Loss 1.1272   LearningRate 0.0128   Epoch: 12   Global Step: 214170   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 21:55:22,031-Speed 3298.48 samples/sec   Loss 1.1500   LearningRate 0.0128   Epoch: 12   Global Step: 214180   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:55:25,078-Speed 3360.64 samples/sec   Loss 1.0882   LearningRate 0.0128   Epoch: 12   Global Step: 214190   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-11 21:55:28,135-Speed 3350.98 samples/sec   Loss 1.0814   LearningRate 0.0128   Epoch: 12   Global Step: 214200   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-11 21:55:31,201-Speed 3340.80 samples/sec   Loss 1.0878   LearningRate 0.0128   Epoch: 12   Global Step: 214210   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-11 21:55:34,258-Speed 3350.29 samples/sec   Loss 1.1309   LearningRate 0.0128   Epoch: 12   Global Step: 214220   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-11 21:55:37,321-Speed 3343.27 samples/sec   Loss 1.1208   LearningRate 0.0128   Epoch: 12   Global Step: 214230   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-11 21:55:40,373-Speed 3356.95 samples/sec   Loss 1.0897   LearningRate 0.0128   Epoch: 12   Global Step: 214240   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-11 21:55:43,425-Speed 3354.87 samples/sec   Loss 1.0412   LearningRate 0.0128   Epoch: 12   Global Step: 214250   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-11 21:55:46,485-Speed 3347.42 samples/sec   Loss 1.1148   LearningRate 0.0128   Epoch: 12   Global Step: 214260   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-11 21:55:49,553-Speed 3338.91 samples/sec   Loss 1.0993   LearningRate 0.0128   Epoch: 12   Global Step: 214270   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-11 21:55:52,620-Speed 3339.28 samples/sec   Loss 1.1255   LearningRate 0.0128   Epoch: 12   Global Step: 214280   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-11 21:55:55,666-Speed 3363.23 samples/sec   Loss 1.1065   LearningRate 0.0128   Epoch: 12   Global Step: 214290   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-11 21:55:58,722-Speed 3350.67 samples/sec   Loss 1.1269   LearningRate 0.0128   Epoch: 12   Global Step: 214300   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-11 21:56:01,784-Speed 3344.69 samples/sec   Loss 1.1044   LearningRate 0.0128   Epoch: 12   Global Step: 214310   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-11 21:56:04,865-Speed 3324.53 samples/sec   Loss 1.0993   LearningRate 0.0128   Epoch: 12   Global Step: 214320   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-11 21:56:07,977-Speed 3291.00 samples/sec   Loss 1.1192   LearningRate 0.0128   Epoch: 12   Global Step: 214330   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-11 21:56:11,046-Speed 3337.99 samples/sec   Loss 1.1522   LearningRate 0.0128   Epoch: 12   Global Step: 214340   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-11 21:56:14,114-Speed 3338.68 samples/sec   Loss 1.0712   LearningRate 0.0128   Epoch: 12   Global Step: 214350   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-11 21:56:17,185-Speed 3335.24 samples/sec   Loss 1.0953   LearningRate 0.0128   Epoch: 12   Global Step: 214360   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-11 21:56:20,247-Speed 3345.43 samples/sec   Loss 1.1061   LearningRate 0.0128   Epoch: 12   Global Step: 214370   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-11 21:56:23,305-Speed 3348.83 samples/sec   Loss 1.1114   LearningRate 0.0128   Epoch: 12   Global Step: 214380   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-11 21:56:26,358-Speed 3354.84 samples/sec   Loss 1.1122   LearningRate 0.0128   Epoch: 12   Global Step: 214390   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:56:29,420-Speed 3344.86 samples/sec   Loss 1.0789   LearningRate 0.0128   Epoch: 12   Global Step: 214400   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:56:32,479-Speed 3348.58 samples/sec   Loss 1.0992   LearningRate 0.0128   Epoch: 12   Global Step: 214410   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:56:35,548-Speed 3337.30 samples/sec   Loss 1.0701   LearningRate 0.0128   Epoch: 12   Global Step: 214420   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:56:38,668-Speed 3283.01 samples/sec   Loss 1.1490   LearningRate 0.0128   Epoch: 12   Global Step: 214430   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:56:41,736-Speed 3338.33 samples/sec   Loss 1.1051   LearningRate 0.0128   Epoch: 12   Global Step: 214440   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:56:44,796-Speed 3347.76 samples/sec   Loss 1.1019   LearningRate 0.0128   Epoch: 12   Global Step: 214450   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:56:47,857-Speed 3345.25 samples/sec   Loss 1.0651   LearningRate 0.0128   Epoch: 12   Global Step: 214460   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:56:50,963-Speed 3298.05 samples/sec   Loss 1.1369   LearningRate 0.0128   Epoch: 12   Global Step: 214470   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:56:54,018-Speed 3352.59 samples/sec   Loss 1.1029   LearningRate 0.0128   Epoch: 12   Global Step: 214480   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:56:57,073-Speed 3352.88 samples/sec   Loss 1.0971   LearningRate 0.0128   Epoch: 12   Global Step: 214490   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 21:57:00,137-Speed 3342.35 samples/sec   Loss 1.1061   LearningRate 0.0128   Epoch: 12   Global Step: 214500   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 21:57:03,216-Speed 3326.50 samples/sec   Loss 1.0969   LearningRate 0.0128   Epoch: 12   Global Step: 214510   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 21:57:06,288-Speed 3333.97 samples/sec   Loss 1.1073   LearningRate 0.0128   Epoch: 12   Global Step: 214520   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:57:09,341-Speed 3355.08 samples/sec   Loss 1.1004   LearningRate 0.0128   Epoch: 12   Global Step: 214530   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:57:12,411-Speed 3335.97 samples/sec   Loss 1.0714   LearningRate 0.0128   Epoch: 12   Global Step: 214540   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:57:15,467-Speed 3351.75 samples/sec   Loss 1.1261   LearningRate 0.0128   Epoch: 12   Global Step: 214550   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:57:18,648-Speed 3219.75 samples/sec   Loss 1.1172   LearningRate 0.0128   Epoch: 12   Global Step: 214560   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:57:21,706-Speed 3350.08 samples/sec   Loss 1.0836   LearningRate 0.0128   Epoch: 12   Global Step: 214570   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:57:24,771-Speed 3340.68 samples/sec   Loss 1.1018   LearningRate 0.0128   Epoch: 12   Global Step: 214580   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:57:27,830-Speed 3348.72 samples/sec   Loss 1.0973   LearningRate 0.0128   Epoch: 12   Global Step: 214590   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:57:30,915-Speed 3319.59 samples/sec   Loss 1.1333   LearningRate 0.0128   Epoch: 12   Global Step: 214600   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:57:34,001-Speed 3319.08 samples/sec   Loss 1.1063   LearningRate 0.0128   Epoch: 12   Global Step: 214610   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:57:37,113-Speed 3291.79 samples/sec   Loss 1.1105   LearningRate 0.0128   Epoch: 12   Global Step: 214620   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 21:57:40,194-Speed 3324.96 samples/sec   Loss 1.1336   LearningRate 0.0127   Epoch: 12   Global Step: 214630   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 21:57:43,258-Speed 3341.73 samples/sec   Loss 1.1287   LearningRate 0.0127   Epoch: 12   Global Step: 214640   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 21:57:46,321-Speed 3343.77 samples/sec   Loss 1.1053   LearningRate 0.0127   Epoch: 12   Global Step: 214650   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 21:57:49,391-Speed 3336.96 samples/sec   Loss 1.1309   LearningRate 0.0127   Epoch: 12   Global Step: 214660   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 21:57:52,454-Speed 3343.92 samples/sec   Loss 1.0790   LearningRate 0.0127   Epoch: 12   Global Step: 214670   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 21:57:55,526-Speed 3334.09 samples/sec   Loss 1.0939   LearningRate 0.0127   Epoch: 12   Global Step: 214680   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 21:57:58,586-Speed 3346.16 samples/sec   Loss 1.1018   LearningRate 0.0127   Epoch: 12   Global Step: 214690   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:58:01,642-Speed 3352.35 samples/sec   Loss 1.1029   LearningRate 0.0127   Epoch: 12   Global Step: 214700   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:58:04,715-Speed 3333.34 samples/sec   Loss 1.1100   LearningRate 0.0127   Epoch: 12   Global Step: 214710   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:58:07,771-Speed 3350.69 samples/sec   Loss 1.0682   LearningRate 0.0127   Epoch: 12   Global Step: 214720   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:58:10,834-Speed 3344.19 samples/sec   Loss 1.1598   LearningRate 0.0127   Epoch: 12   Global Step: 214730   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:58:13,892-Speed 3349.04 samples/sec   Loss 1.1225   LearningRate 0.0127   Epoch: 12   Global Step: 214740   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:58:16,950-Speed 3349.20 samples/sec   Loss 1.1207   LearningRate 0.0127   Epoch: 12   Global Step: 214750   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:58:20,013-Speed 3343.88 samples/sec   Loss 1.1285   LearningRate 0.0127   Epoch: 12   Global Step: 214760   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:58:23,080-Speed 3340.61 samples/sec   Loss 1.0978   LearningRate 0.0127   Epoch: 12   Global Step: 214770   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:58:26,152-Speed 3332.98 samples/sec   Loss 1.0764   LearningRate 0.0127   Epoch: 12   Global Step: 214780   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:58:29,210-Speed 3349.76 samples/sec   Loss 1.1177   LearningRate 0.0127   Epoch: 12   Global Step: 214790   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 21:58:32,279-Speed 3337.20 samples/sec   Loss 1.1479   LearningRate 0.0127   Epoch: 12   Global Step: 214800   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 21:58:35,341-Speed 3345.81 samples/sec   Loss 1.0815   LearningRate 0.0127   Epoch: 12   Global Step: 214810   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 21:58:38,414-Speed 3333.23 samples/sec   Loss 1.1500   LearningRate 0.0127   Epoch: 12   Global Step: 214820   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 21:58:41,492-Speed 3326.82 samples/sec   Loss 1.1137   LearningRate 0.0127   Epoch: 12   Global Step: 214830   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 21:58:44,560-Speed 3338.94 samples/sec   Loss 1.1120   LearningRate 0.0127   Epoch: 12   Global Step: 214840   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 21:58:47,616-Speed 3351.21 samples/sec   Loss 1.1406   LearningRate 0.0127   Epoch: 12   Global Step: 214850   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:58:50,675-Speed 3348.36 samples/sec   Loss 1.1336   LearningRate 0.0127   Epoch: 12   Global Step: 214860   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:58:53,741-Speed 3340.31 samples/sec   Loss 1.0894   LearningRate 0.0127   Epoch: 12   Global Step: 214870   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:58:56,863-Speed 3281.04 samples/sec   Loss 1.1366   LearningRate 0.0127   Epoch: 12   Global Step: 214880   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:58:59,977-Speed 3288.93 samples/sec   Loss 1.1180   LearningRate 0.0127   Epoch: 12   Global Step: 214890   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:59:03,041-Speed 3342.79 samples/sec   Loss 1.1136   LearningRate 0.0127   Epoch: 12   Global Step: 214900   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:59:06,117-Speed 3329.98 samples/sec   Loss 1.1415   LearningRate 0.0127   Epoch: 12   Global Step: 214910   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:59:09,175-Speed 3348.96 samples/sec   Loss 1.1606   LearningRate 0.0127   Epoch: 12   Global Step: 214920   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:59:12,239-Speed 3342.73 samples/sec   Loss 1.0839   LearningRate 0.0127   Epoch: 12   Global Step: 214930   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:59:15,297-Speed 3349.52 samples/sec   Loss 1.0830   LearningRate 0.0127   Epoch: 12   Global Step: 214940   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:59:18,419-Speed 3280.44 samples/sec   Loss 1.1154   LearningRate 0.0127   Epoch: 12   Global Step: 214950   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 21:59:21,528-Speed 3294.55 samples/sec   Loss 1.0749   LearningRate 0.0127   Epoch: 12   Global Step: 214960   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 21:59:24,582-Speed 3354.08 samples/sec   Loss 1.1120   LearningRate 0.0127   Epoch: 12   Global Step: 214970   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:59:27,646-Speed 3343.12 samples/sec   Loss 1.0725   LearningRate 0.0127   Epoch: 12   Global Step: 214980   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:59:30,702-Speed 3351.53 samples/sec   Loss 1.1006   LearningRate 0.0127   Epoch: 12   Global Step: 214990   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:59:33,760-Speed 3349.04 samples/sec   Loss 1.0955   LearningRate 0.0127   Epoch: 12   Global Step: 215000   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:59:36,909-Speed 3252.30 samples/sec   Loss 1.0730   LearningRate 0.0127   Epoch: 12   Global Step: 215010   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:59:39,998-Speed 3316.17 samples/sec   Loss 1.0734   LearningRate 0.0127   Epoch: 12   Global Step: 215020   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:59:43,054-Speed 3351.60 samples/sec   Loss 1.1379   LearningRate 0.0127   Epoch: 12   Global Step: 215030   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:59:46,114-Speed 3347.18 samples/sec   Loss 1.1003   LearningRate 0.0127   Epoch: 12   Global Step: 215040   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:59:49,179-Speed 3341.38 samples/sec   Loss 1.1092   LearningRate 0.0127   Epoch: 12   Global Step: 215050   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:59:52,239-Speed 3347.11 samples/sec   Loss 1.0938   LearningRate 0.0127   Epoch: 12   Global Step: 215060   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 21:59:55,339-Speed 3304.43 samples/sec   Loss 1.0976   LearningRate 0.0127   Epoch: 12   Global Step: 215070   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 21:59:58,412-Speed 3332.97 samples/sec   Loss 1.1011   LearningRate 0.0127   Epoch: 12   Global Step: 215080   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 22:00:01,473-Speed 3345.66 samples/sec   Loss 1.1554   LearningRate 0.0127   Epoch: 12   Global Step: 215090   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 22:00:04,535-Speed 3344.75 samples/sec   Loss 1.1251   LearningRate 0.0126   Epoch: 12   Global Step: 215100   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 22:00:07,599-Speed 3343.58 samples/sec   Loss 1.1024   LearningRate 0.0126   Epoch: 12   Global Step: 215110   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 22:00:10,656-Speed 3350.10 samples/sec   Loss 1.1291   LearningRate 0.0126   Epoch: 12   Global Step: 215120   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 22:00:13,730-Speed 3331.77 samples/sec   Loss 1.1560   LearningRate 0.0126   Epoch: 12   Global Step: 215130   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 22:00:16,791-Speed 3345.62 samples/sec   Loss 1.1008   LearningRate 0.0126   Epoch: 12   Global Step: 215140   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 22:00:19,866-Speed 3331.61 samples/sec   Loss 1.1184   LearningRate 0.0126   Epoch: 12   Global Step: 215150   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 22:00:22,932-Speed 3340.82 samples/sec   Loss 1.0855   LearningRate 0.0126   Epoch: 12   Global Step: 215160   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 22:00:25,992-Speed 3346.64 samples/sec   Loss 1.1049   LearningRate 0.0126   Epoch: 12   Global Step: 215170   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 22:00:29,057-Speed 3341.37 samples/sec   Loss 1.1012   LearningRate 0.0126   Epoch: 12   Global Step: 215180   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 22:00:32,115-Speed 3349.95 samples/sec   Loss 1.0758   LearningRate 0.0126   Epoch: 12   Global Step: 215190   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 22:00:35,176-Speed 3346.00 samples/sec   Loss 1.0843   LearningRate 0.0126   Epoch: 12   Global Step: 215200   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 22:00:38,241-Speed 3341.73 samples/sec   Loss 1.0452   LearningRate 0.0126   Epoch: 12   Global Step: 215210   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 22:00:41,301-Speed 3347.38 samples/sec   Loss 1.1188   LearningRate 0.0126   Epoch: 12   Global Step: 215220   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 22:00:44,367-Speed 3341.01 samples/sec   Loss 1.1108   LearningRate 0.0126   Epoch: 12   Global Step: 215230   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 22:00:47,434-Speed 3338.63 samples/sec   Loss 1.1098   LearningRate 0.0126   Epoch: 12   Global Step: 215240   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 22:00:50,495-Speed 3346.68 samples/sec   Loss 1.1447   LearningRate 0.0126   Epoch: 12   Global Step: 215250   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 22:00:53,550-Speed 3352.09 samples/sec   Loss 1.0675   LearningRate 0.0126   Epoch: 12   Global Step: 215260   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 22:00:56,673-Speed 3280.49 samples/sec   Loss 1.1207   LearningRate 0.0126   Epoch: 12   Global Step: 215270   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 22:00:59,758-Speed 3319.81 samples/sec   Loss 1.1073   LearningRate 0.0126   Epoch: 12   Global Step: 215280   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 22:01:02,832-Speed 3331.54 samples/sec   Loss 1.1113   LearningRate 0.0126   Epoch: 12   Global Step: 215290   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 22:01:05,900-Speed 3338.59 samples/sec   Loss 1.1007   LearningRate 0.0126   Epoch: 12   Global Step: 215300   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 22:01:09,001-Speed 3302.48 samples/sec   Loss 1.0770   LearningRate 0.0126   Epoch: 12   Global Step: 215310   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 22:01:12,058-Speed 3350.65 samples/sec   Loss 1.1019   LearningRate 0.0126   Epoch: 12   Global Step: 215320   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 22:01:15,116-Speed 3350.15 samples/sec   Loss 1.0942   LearningRate 0.0126   Epoch: 12   Global Step: 215330   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 22:01:18,189-Speed 3332.86 samples/sec   Loss 1.0854   LearningRate 0.0126   Epoch: 12   Global Step: 215340   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 22:01:21,406-Speed 3183.50 samples/sec   Loss 1.0940   LearningRate 0.0126   Epoch: 12   Global Step: 215350   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 22:01:24,469-Speed 3343.56 samples/sec   Loss 1.0961   LearningRate 0.0126   Epoch: 12   Global Step: 215360   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 22:01:27,534-Speed 3342.29 samples/sec   Loss 1.1310   LearningRate 0.0126   Epoch: 12   Global Step: 215370   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 22:01:30,597-Speed 3343.78 samples/sec   Loss 1.1070   LearningRate 0.0126   Epoch: 12   Global Step: 215380   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 22:01:33,668-Speed 3334.58 samples/sec   Loss 1.1147   LearningRate 0.0126   Epoch: 12   Global Step: 215390   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 22:01:36,748-Speed 3325.86 samples/sec   Loss 1.1090   LearningRate 0.0126   Epoch: 12   Global Step: 215400   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 22:01:39,814-Speed 3340.38 samples/sec   Loss 1.1236   LearningRate 0.0126   Epoch: 12   Global Step: 215410   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 22:01:42,955-Speed 3261.18 samples/sec   Loss 1.0700   LearningRate 0.0126   Epoch: 12   Global Step: 215420   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 22:01:46,020-Speed 3341.68 samples/sec   Loss 1.1009   LearningRate 0.0126   Epoch: 12   Global Step: 215430   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 22:01:49,080-Speed 3346.83 samples/sec   Loss 1.1156   LearningRate 0.0126   Epoch: 12   Global Step: 215440   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 22:01:52,162-Speed 3324.10 samples/sec   Loss 1.1585   LearningRate 0.0126   Epoch: 12   Global Step: 215450   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 22:01:55,238-Speed 3328.91 samples/sec   Loss 1.0896   LearningRate 0.0126   Epoch: 12   Global Step: 215460   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 22:01:58,332-Speed 3310.30 samples/sec   Loss 1.1106   LearningRate 0.0126   Epoch: 12   Global Step: 215470   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 22:02:01,407-Speed 3331.68 samples/sec   Loss 1.0831   LearningRate 0.0126   Epoch: 12   Global Step: 215480   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 22:02:04,469-Speed 3344.61 samples/sec   Loss 1.0649   LearningRate 0.0126   Epoch: 12   Global Step: 215490   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 22:02:07,538-Speed 3337.54 samples/sec   Loss 1.1186   LearningRate 0.0126   Epoch: 12   Global Step: 215500   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 22:02:10,607-Speed 3337.12 samples/sec   Loss 1.1226   LearningRate 0.0126   Epoch: 12   Global Step: 215510   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 22:02:13,674-Speed 3340.14 samples/sec   Loss 1.0972   LearningRate 0.0126   Epoch: 12   Global Step: 215520   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 22:02:16,770-Speed 3307.86 samples/sec   Loss 1.1088   LearningRate 0.0126   Epoch: 12   Global Step: 215530   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 22:02:19,874-Speed 3300.09 samples/sec   Loss 1.1460   LearningRate 0.0126   Epoch: 12   Global Step: 215540   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 22:02:22,949-Speed 3330.50 samples/sec   Loss 1.1235   LearningRate 0.0126   Epoch: 12   Global Step: 215550   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 22:02:26,018-Speed 3337.85 samples/sec   Loss 1.1462   LearningRate 0.0126   Epoch: 12   Global Step: 215560   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 22:02:29,130-Speed 3291.39 samples/sec   Loss 1.1146   LearningRate 0.0125   Epoch: 12   Global Step: 215570   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 22:02:32,219-Speed 3315.46 samples/sec   Loss 1.1153   LearningRate 0.0125   Epoch: 12   Global Step: 215580   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 22:02:35,321-Speed 3301.31 samples/sec   Loss 1.1075   LearningRate 0.0125   Epoch: 12   Global Step: 215590   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 22:02:38,415-Speed 3311.08 samples/sec   Loss 1.1340   LearningRate 0.0125   Epoch: 12   Global Step: 215600   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 22:02:41,488-Speed 3332.63 samples/sec   Loss 1.0822   LearningRate 0.0125   Epoch: 12   Global Step: 215610   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 22:02:44,558-Speed 3336.72 samples/sec   Loss 1.0437   LearningRate 0.0125   Epoch: 12   Global Step: 215620   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 22:02:47,625-Speed 3339.22 samples/sec   Loss 1.0582   LearningRate 0.0125   Epoch: 12   Global Step: 215630   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 22:02:50,707-Speed 3323.32 samples/sec   Loss 1.1205   LearningRate 0.0125   Epoch: 12   Global Step: 215640   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 22:02:53,780-Speed 3332.89 samples/sec   Loss 1.1332   LearningRate 0.0125   Epoch: 12   Global Step: 215650   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 22:02:56,843-Speed 3344.25 samples/sec   Loss 1.1240   LearningRate 0.0125   Epoch: 12   Global Step: 215660   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 22:02:59,909-Speed 3340.08 samples/sec   Loss 1.0454   LearningRate 0.0125   Epoch: 12   Global Step: 215670   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 22:03:02,975-Speed 3340.77 samples/sec   Loss 1.1069   LearningRate 0.0125   Epoch: 12   Global Step: 215680   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 22:03:06,050-Speed 3330.73 samples/sec   Loss 1.1511   LearningRate 0.0125   Epoch: 12   Global Step: 215690   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 22:03:09,116-Speed 3340.73 samples/sec   Loss 1.0947   LearningRate 0.0125   Epoch: 12   Global Step: 215700   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 22:03:12,182-Speed 3340.69 samples/sec   Loss 1.0898   LearningRate 0.0125   Epoch: 12   Global Step: 215710   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 22:03:15,255-Speed 3332.94 samples/sec   Loss 1.1186   LearningRate 0.0125   Epoch: 12   Global Step: 215720   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 22:03:18,338-Speed 3322.12 samples/sec   Loss 1.0703   LearningRate 0.0125   Epoch: 12   Global Step: 215730   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 22:03:21,388-Speed 3358.56 samples/sec   Loss 1.0880   LearningRate 0.0125   Epoch: 12   Global Step: 215740   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 22:03:24,449-Speed 3346.43 samples/sec   Loss 1.1038   LearningRate 0.0125   Epoch: 12   Global Step: 215750   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 22:03:27,515-Speed 3342.00 samples/sec   Loss 1.1100   LearningRate 0.0125   Epoch: 12   Global Step: 215760   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 22:03:30,579-Speed 3342.36 samples/sec   Loss 1.0519   LearningRate 0.0125   Epoch: 12   Global Step: 215770   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 22:03:33,642-Speed 3344.20 samples/sec   Loss 1.0834   LearningRate 0.0125   Epoch: 12   Global Step: 215780   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:03:36,714-Speed 3333.70 samples/sec   Loss 1.0589   LearningRate 0.0125   Epoch: 12   Global Step: 215790   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:03:39,812-Speed 3306.80 samples/sec   Loss 1.1446   LearningRate 0.0125   Epoch: 12   Global Step: 215800   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:03:42,911-Speed 3304.00 samples/sec   Loss 1.0806   LearningRate 0.0125   Epoch: 12   Global Step: 215810   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:03:45,975-Speed 3343.33 samples/sec   Loss 1.1845   LearningRate 0.0125   Epoch: 12   Global Step: 215820   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:03:49,062-Speed 3317.98 samples/sec   Loss 1.1336   LearningRate 0.0125   Epoch: 12   Global Step: 215830   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:03:52,126-Speed 3343.20 samples/sec   Loss 1.0981   LearningRate 0.0125   Epoch: 12   Global Step: 215840   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:03:55,189-Speed 3343.86 samples/sec   Loss 1.1118   LearningRate 0.0125   Epoch: 12   Global Step: 215850   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:03:58,258-Speed 3337.12 samples/sec   Loss 1.1302   LearningRate 0.0125   Epoch: 12   Global Step: 215860   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:04:01,392-Speed 3268.68 samples/sec   Loss 1.1135   LearningRate 0.0125   Epoch: 12   Global Step: 215870   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:04:04,461-Speed 3337.09 samples/sec   Loss 1.1064   LearningRate 0.0125   Epoch: 12   Global Step: 215880   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 22:04:07,530-Speed 3336.95 samples/sec   Loss 1.1109   LearningRate 0.0125   Epoch: 12   Global Step: 215890   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:04:10,627-Speed 3307.48 samples/sec   Loss 1.1123   LearningRate 0.0125   Epoch: 12   Global Step: 215900   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:04:13,750-Speed 3279.12 samples/sec   Loss 1.0874   LearningRate 0.0125   Epoch: 12   Global Step: 215910   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:04:16,834-Speed 3322.12 samples/sec   Loss 1.1119   LearningRate 0.0125   Epoch: 12   Global Step: 215920   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:04:20,000-Speed 3234.85 samples/sec   Loss 1.0912   LearningRate 0.0125   Epoch: 12   Global Step: 215930   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:04:23,077-Speed 3328.50 samples/sec   Loss 1.1238   LearningRate 0.0125   Epoch: 12   Global Step: 215940   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:04:26,140-Speed 3343.54 samples/sec   Loss 1.0894   LearningRate 0.0125   Epoch: 12   Global Step: 215950   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:04:29,212-Speed 3333.85 samples/sec   Loss 1.1040   LearningRate 0.0125   Epoch: 12   Global Step: 215960   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:04:32,291-Speed 3326.48 samples/sec   Loss 1.1293   LearningRate 0.0125   Epoch: 12   Global Step: 215970   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:04:35,368-Speed 3328.55 samples/sec   Loss 1.0989   LearningRate 0.0125   Epoch: 12   Global Step: 215980   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:04:38,460-Speed 3312.59 samples/sec   Loss 1.1203   LearningRate 0.0125   Epoch: 12   Global Step: 215990   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:04:41,533-Speed 3333.98 samples/sec   Loss 1.1078   LearningRate 0.0125   Epoch: 12   Global Step: 216000   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:05:25,216-[lfw][216000]XNorm: 21.793292
Training: 2022-04-11 22:05:25,216-[lfw][216000]Accuracy-Flip: 0.99767+-0.00281
Training: 2022-04-11 22:05:25,217-[lfw][216000]Accuracy-Highest: 0.99817
Training: 2022-04-11 22:06:15,847-[cfp_fp][216000]XNorm: 22.098081
Training: 2022-04-11 22:06:15,848-[cfp_fp][216000]Accuracy-Flip: 0.99000+-0.00456
Training: 2022-04-11 22:06:15,848-[cfp_fp][216000]Accuracy-Highest: 0.99086
Training: 2022-04-11 22:06:59,427-[agedb_30][216000]XNorm: 22.682130
Training: 2022-04-11 22:06:59,427-[agedb_30][216000]Accuracy-Flip: 0.98450+-0.00633
Training: 2022-04-11 22:06:59,428-[agedb_30][216000]Accuracy-Highest: 0.98567
Training: 2022-04-11 22:07:02,501-Speed 72.64 samples/sec   Loss 1.1060   LearningRate 0.0125   Epoch: 12   Global Step: 216010   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:07:05,565-Speed 3342.40 samples/sec   Loss 1.0716   LearningRate 0.0125   Epoch: 12   Global Step: 216020   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:07:08,629-Speed 3343.38 samples/sec   Loss 1.0813   LearningRate 0.0125   Epoch: 12   Global Step: 216030   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-11 22:07:11,681-Speed 3355.10 samples/sec   Loss 1.1203   LearningRate 0.0124   Epoch: 12   Global Step: 216040   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-11 22:07:14,755-Speed 3332.37 samples/sec   Loss 1.0801   LearningRate 0.0124   Epoch: 12   Global Step: 216050   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-11 22:07:17,946-Speed 3209.28 samples/sec   Loss 1.1071   LearningRate 0.0124   Epoch: 12   Global Step: 216060   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-11 22:07:21,084-Speed 3263.89 samples/sec   Loss 1.1160   LearningRate 0.0124   Epoch: 12   Global Step: 216070   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-11 22:07:24,231-Speed 3254.73 samples/sec   Loss 1.1191   LearningRate 0.0124   Epoch: 12   Global Step: 216080   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-11 22:07:27,322-Speed 3314.18 samples/sec   Loss 1.0941   LearningRate 0.0124   Epoch: 12   Global Step: 216090   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-11 22:07:30,499-Speed 3223.01 samples/sec   Loss 1.0999   LearningRate 0.0124   Epoch: 12   Global Step: 216100   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-11 22:07:33,589-Speed 3315.40 samples/sec   Loss 1.0523   LearningRate 0.0124   Epoch: 12   Global Step: 216110   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-11 22:07:36,666-Speed 3328.85 samples/sec   Loss 1.1512   LearningRate 0.0124   Epoch: 12   Global Step: 216120   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-11 22:07:39,821-Speed 3246.76 samples/sec   Loss 1.0853   LearningRate 0.0124   Epoch: 12   Global Step: 216130   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:07:42,931-Speed 3293.15 samples/sec   Loss 1.0755   LearningRate 0.0124   Epoch: 12   Global Step: 216140   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:07:46,019-Speed 3316.37 samples/sec   Loss 1.0862   LearningRate 0.0124   Epoch: 12   Global Step: 216150   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:07:49,083-Speed 3343.17 samples/sec   Loss 1.1012   LearningRate 0.0124   Epoch: 12   Global Step: 216160   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:07:52,152-Speed 3336.80 samples/sec   Loss 1.0654   LearningRate 0.0124   Epoch: 12   Global Step: 216170   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:07:55,222-Speed 3336.31 samples/sec   Loss 1.1142   LearningRate 0.0124   Epoch: 12   Global Step: 216180   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:07:58,287-Speed 3341.66 samples/sec   Loss 1.1307   LearningRate 0.0124   Epoch: 12   Global Step: 216190   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:08:01,436-Speed 3252.83 samples/sec   Loss 1.1052   LearningRate 0.0124   Epoch: 12   Global Step: 216200   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:08:04,498-Speed 3345.22 samples/sec   Loss 1.1342   LearningRate 0.0124   Epoch: 12   Global Step: 216210   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:08:07,574-Speed 3329.71 samples/sec   Loss 1.1239   LearningRate 0.0124   Epoch: 12   Global Step: 216220   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:08:10,650-Speed 3329.73 samples/sec   Loss 1.1588   LearningRate 0.0124   Epoch: 12   Global Step: 216230   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 22:08:13,744-Speed 3310.56 samples/sec   Loss 1.1375   LearningRate 0.0124   Epoch: 12   Global Step: 216240   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 22:08:16,870-Speed 3276.26 samples/sec   Loss 1.1032   LearningRate 0.0124   Epoch: 12   Global Step: 216250   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 22:08:19,962-Speed 3312.41 samples/sec   Loss 1.1119   LearningRate 0.0124   Epoch: 12   Global Step: 216260   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 22:08:23,024-Speed 3345.27 samples/sec   Loss 1.1207   LearningRate 0.0124   Epoch: 12   Global Step: 216270   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 22:08:26,100-Speed 3329.77 samples/sec   Loss 1.1577   LearningRate 0.0124   Epoch: 12   Global Step: 216280   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 22:08:29,160-Speed 3347.17 samples/sec   Loss 1.0944   LearningRate 0.0124   Epoch: 12   Global Step: 216290   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 22:08:32,230-Speed 3336.58 samples/sec   Loss 1.1309   LearningRate 0.0124   Epoch: 12   Global Step: 216300   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 22:08:35,286-Speed 3351.02 samples/sec   Loss 1.0721   LearningRate 0.0124   Epoch: 12   Global Step: 216310   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:08:38,354-Speed 3338.94 samples/sec   Loss 1.0960   LearningRate 0.0124   Epoch: 12   Global Step: 216320   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:08:41,427-Speed 3332.23 samples/sec   Loss 1.1131   LearningRate 0.0124   Epoch: 12   Global Step: 216330   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:08:44,508-Speed 3324.89 samples/sec   Loss 1.0973   LearningRate 0.0124   Epoch: 12   Global Step: 216340   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:08:47,600-Speed 3312.10 samples/sec   Loss 1.1213   LearningRate 0.0124   Epoch: 12   Global Step: 216350   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:08:50,675-Speed 3330.65 samples/sec   Loss 1.0644   LearningRate 0.0124   Epoch: 12   Global Step: 216360   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:08:53,741-Speed 3340.74 samples/sec   Loss 1.0803   LearningRate 0.0124   Epoch: 12   Global Step: 216370   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:08:56,809-Speed 3339.54 samples/sec   Loss 1.1017   LearningRate 0.0124   Epoch: 12   Global Step: 216380   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:08:59,940-Speed 3270.62 samples/sec   Loss 1.0774   LearningRate 0.0124   Epoch: 12   Global Step: 216390   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:09:03,088-Speed 3253.39 samples/sec   Loss 1.1193   LearningRate 0.0124   Epoch: 12   Global Step: 216400   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:09:06,190-Speed 3302.76 samples/sec   Loss 1.1425   LearningRate 0.0124   Epoch: 12   Global Step: 216410   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 22:09:09,260-Speed 3335.57 samples/sec   Loss 1.0931   LearningRate 0.0124   Epoch: 12   Global Step: 216420   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 22:09:12,325-Speed 3342.15 samples/sec   Loss 1.0963   LearningRate 0.0124   Epoch: 12   Global Step: 216430   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 22:09:15,415-Speed 3314.18 samples/sec   Loss 1.1194   LearningRate 0.0124   Epoch: 12   Global Step: 216440   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 22:09:18,489-Speed 3331.83 samples/sec   Loss 1.1105   LearningRate 0.0124   Epoch: 12   Global Step: 216450   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 22:09:21,581-Speed 3313.07 samples/sec   Loss 1.1053   LearningRate 0.0124   Epoch: 12   Global Step: 216460   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 22:09:24,666-Speed 3320.77 samples/sec   Loss 1.0912   LearningRate 0.0124   Epoch: 12   Global Step: 216470   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:09:27,765-Speed 3304.02 samples/sec   Loss 1.1169   LearningRate 0.0124   Epoch: 12   Global Step: 216480   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:09:30,849-Speed 3321.21 samples/sec   Loss 1.1148   LearningRate 0.0124   Epoch: 12   Global Step: 216490   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:09:33,956-Speed 3296.38 samples/sec   Loss 1.0836   LearningRate 0.0124   Epoch: 12   Global Step: 216500   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:09:37,026-Speed 3336.97 samples/sec   Loss 1.1043   LearningRate 0.0123   Epoch: 12   Global Step: 216510   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:09:40,094-Speed 3338.66 samples/sec   Loss 1.0627   LearningRate 0.0123   Epoch: 12   Global Step: 216520   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:09:43,174-Speed 3324.67 samples/sec   Loss 1.1475   LearningRate 0.0123   Epoch: 12   Global Step: 216530   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:09:46,254-Speed 3325.48 samples/sec   Loss 1.1080   LearningRate 0.0123   Epoch: 12   Global Step: 216540   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:09:49,323-Speed 3337.83 samples/sec   Loss 1.1627   LearningRate 0.0123   Epoch: 12   Global Step: 216550   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:09:52,389-Speed 3341.41 samples/sec   Loss 1.0849   LearningRate 0.0123   Epoch: 12   Global Step: 216560   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:09:55,460-Speed 3334.44 samples/sec   Loss 1.1285   LearningRate 0.0123   Epoch: 12   Global Step: 216570   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 22:09:58,526-Speed 3341.47 samples/sec   Loss 1.0802   LearningRate 0.0123   Epoch: 12   Global Step: 216580   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 22:10:01,588-Speed 3344.09 samples/sec   Loss 1.1192   LearningRate 0.0123   Epoch: 12   Global Step: 216590   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:10:04,649-Speed 3345.85 samples/sec   Loss 1.0773   LearningRate 0.0123   Epoch: 12   Global Step: 216600   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:10:07,794-Speed 3257.38 samples/sec   Loss 1.1323   LearningRate 0.0123   Epoch: 12   Global Step: 216610   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:10:10,857-Speed 3343.33 samples/sec   Loss 1.1321   LearningRate 0.0123   Epoch: 12   Global Step: 216620   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:10:13,943-Speed 3318.95 samples/sec   Loss 1.1636   LearningRate 0.0123   Epoch: 12   Global Step: 216630   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:10:17,022-Speed 3326.98 samples/sec   Loss 1.1140   LearningRate 0.0123   Epoch: 12   Global Step: 216640   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:10:20,133-Speed 3292.64 samples/sec   Loss 1.1261   LearningRate 0.0123   Epoch: 12   Global Step: 216650   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:10:23,193-Speed 3346.63 samples/sec   Loss 1.1112   LearningRate 0.0123   Epoch: 12   Global Step: 216660   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:10:26,260-Speed 3340.59 samples/sec   Loss 1.1060   LearningRate 0.0123   Epoch: 12   Global Step: 216670   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:10:29,341-Speed 3323.74 samples/sec   Loss 1.1280   LearningRate 0.0123   Epoch: 12   Global Step: 216680   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:10:32,438-Speed 3307.62 samples/sec   Loss 1.0815   LearningRate 0.0123   Epoch: 12   Global Step: 216690   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 22:10:35,498-Speed 3346.78 samples/sec   Loss 1.1366   LearningRate 0.0123   Epoch: 12   Global Step: 216700   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 22:10:38,563-Speed 3341.69 samples/sec   Loss 1.0826   LearningRate 0.0123   Epoch: 12   Global Step: 216710   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 22:10:41,636-Speed 3333.07 samples/sec   Loss 1.0596   LearningRate 0.0123   Epoch: 12   Global Step: 216720   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 22:10:44,717-Speed 3324.79 samples/sec   Loss 1.0720   LearningRate 0.0123   Epoch: 12   Global Step: 216730   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 22:10:47,785-Speed 3338.09 samples/sec   Loss 1.1188   LearningRate 0.0123   Epoch: 12   Global Step: 216740   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 22:10:50,858-Speed 3333.44 samples/sec   Loss 1.1179   LearningRate 0.0123   Epoch: 12   Global Step: 216750   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 22:10:53,965-Speed 3295.53 samples/sec   Loss 1.0858   LearningRate 0.0123   Epoch: 12   Global Step: 216760   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 22:10:57,040-Speed 3331.08 samples/sec   Loss 1.0546   LearningRate 0.0123   Epoch: 12   Global Step: 216770   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 22:11:00,102-Speed 3345.02 samples/sec   Loss 1.0845   LearningRate 0.0123   Epoch: 12   Global Step: 216780   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 22:11:03,160-Speed 3349.46 samples/sec   Loss 1.1065   LearningRate 0.0123   Epoch: 12   Global Step: 216790   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 22:11:06,242-Speed 3323.54 samples/sec   Loss 1.0653   LearningRate 0.0123   Epoch: 12   Global Step: 216800   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 22:11:09,324-Speed 3323.37 samples/sec   Loss 1.1481   LearningRate 0.0123   Epoch: 12   Global Step: 216810   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 22:11:12,407-Speed 3322.98 samples/sec   Loss 1.1032   LearningRate 0.0123   Epoch: 12   Global Step: 216820   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 22:11:15,468-Speed 3345.92 samples/sec   Loss 1.0825   LearningRate 0.0123   Epoch: 12   Global Step: 216830   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 22:11:18,536-Speed 3338.45 samples/sec   Loss 1.1068   LearningRate 0.0123   Epoch: 12   Global Step: 216840   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 22:11:21,605-Speed 3337.13 samples/sec   Loss 1.1102   LearningRate 0.0123   Epoch: 12   Global Step: 216850   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 22:11:24,669-Speed 3342.89 samples/sec   Loss 1.0479   LearningRate 0.0123   Epoch: 12   Global Step: 216860   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:11:27,733-Speed 3342.09 samples/sec   Loss 1.1120   LearningRate 0.0123   Epoch: 12   Global Step: 216870   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:11:30,799-Speed 3341.45 samples/sec   Loss 1.0579   LearningRate 0.0123   Epoch: 12   Global Step: 216880   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:11:33,871-Speed 3334.01 samples/sec   Loss 1.0563   LearningRate 0.0123   Epoch: 12   Global Step: 216890   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:11:36,936-Speed 3341.73 samples/sec   Loss 1.0888   LearningRate 0.0123   Epoch: 12   Global Step: 216900   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:11:40,002-Speed 3341.13 samples/sec   Loss 1.1016   LearningRate 0.0123   Epoch: 12   Global Step: 216910   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:11:43,066-Speed 3343.15 samples/sec   Loss 1.0850   LearningRate 0.0123   Epoch: 12   Global Step: 216920   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:11:46,142-Speed 3330.24 samples/sec   Loss 1.1121   LearningRate 0.0123   Epoch: 12   Global Step: 216930   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:11:49,220-Speed 3327.25 samples/sec   Loss 1.1049   LearningRate 0.0123   Epoch: 12   Global Step: 216940   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:11:52,328-Speed 3295.89 samples/sec   Loss 1.0691   LearningRate 0.0123   Epoch: 12   Global Step: 216950   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:11:55,406-Speed 3327.53 samples/sec   Loss 1.0647   LearningRate 0.0123   Epoch: 12   Global Step: 216960   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 22:11:58,486-Speed 3324.59 samples/sec   Loss 1.1040   LearningRate 0.0123   Epoch: 12   Global Step: 216970   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 22:12:02,105-Speed 2830.44 samples/sec   Loss 1.1160   LearningRate 0.0123   Epoch: 12   Global Step: 216980   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 22:12:39,786-Speed 271.76 samples/sec   Loss 0.8364   LearningRate 0.0122   Epoch: 13   Global Step: 216990   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 22:12:42,917-Speed 3272.35 samples/sec   Loss 0.7389   LearningRate 0.0122   Epoch: 13   Global Step: 217000   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 22:12:46,054-Speed 3263.88 samples/sec   Loss 0.7175   LearningRate 0.0122   Epoch: 13   Global Step: 217010   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 22:12:49,128-Speed 3333.00 samples/sec   Loss 0.7101   LearningRate 0.0122   Epoch: 13   Global Step: 217020   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 22:12:52,285-Speed 3244.13 samples/sec   Loss 0.7064   LearningRate 0.0122   Epoch: 13   Global Step: 217030   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 22:12:55,345-Speed 3346.54 samples/sec   Loss 0.7091   LearningRate 0.0122   Epoch: 13   Global Step: 217040   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 22:12:58,430-Speed 3320.60 samples/sec   Loss 0.7158   LearningRate 0.0122   Epoch: 13   Global Step: 217050   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 22:13:01,589-Speed 3241.97 samples/sec   Loss 0.7062   LearningRate 0.0122   Epoch: 13   Global Step: 217060   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-04-11 22:13:04,644-Speed 3353.03 samples/sec   Loss 0.7129   LearningRate 0.0122   Epoch: 13   Global Step: 217070   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 22:13:07,710-Speed 3340.79 samples/sec   Loss 0.7227   LearningRate 0.0122   Epoch: 13   Global Step: 217080   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 22:13:10,782-Speed 3333.70 samples/sec   Loss 0.6992   LearningRate 0.0122   Epoch: 13   Global Step: 217090   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 22:13:13,851-Speed 3337.97 samples/sec   Loss 0.7165   LearningRate 0.0122   Epoch: 13   Global Step: 217100   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 22:13:16,923-Speed 3334.13 samples/sec   Loss 0.6783   LearningRate 0.0122   Epoch: 13   Global Step: 217110   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 22:13:20,010-Speed 3317.53 samples/sec   Loss 0.7235   LearningRate 0.0122   Epoch: 13   Global Step: 217120   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 22:13:23,139-Speed 3273.54 samples/sec   Loss 0.6861   LearningRate 0.0122   Epoch: 13   Global Step: 217130   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 22:13:26,234-Speed 3309.37 samples/sec   Loss 0.6988   LearningRate 0.0122   Epoch: 13   Global Step: 217140   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 22:13:29,325-Speed 3313.24 samples/sec   Loss 0.7061   LearningRate 0.0122   Epoch: 13   Global Step: 217150   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 22:13:32,407-Speed 3323.99 samples/sec   Loss 0.6770   LearningRate 0.0122   Epoch: 13   Global Step: 217160   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 22:13:35,663-Speed 3145.72 samples/sec   Loss 0.7001   LearningRate 0.0122   Epoch: 13   Global Step: 217170   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 22:13:38,733-Speed 3335.31 samples/sec   Loss 0.6987   LearningRate 0.0122   Epoch: 13   Global Step: 217180   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 22:13:42,291-Speed 2878.81 samples/sec   Loss 0.6825   LearningRate 0.0122   Epoch: 13   Global Step: 217190   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 22:13:45,551-Speed 3141.90 samples/sec   Loss 0.6691   LearningRate 0.0122   Epoch: 13   Global Step: 217200   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 22:13:48,641-Speed 3314.52 samples/sec   Loss 0.7057   LearningRate 0.0122   Epoch: 13   Global Step: 217210   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 22:13:51,740-Speed 3305.71 samples/sec   Loss 0.6937   LearningRate 0.0122   Epoch: 13   Global Step: 217220   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 22:13:54,809-Speed 3337.60 samples/sec   Loss 0.6952   LearningRate 0.0122   Epoch: 13   Global Step: 217230   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:13:57,869-Speed 3346.64 samples/sec   Loss 0.7030   LearningRate 0.0122   Epoch: 13   Global Step: 217240   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:14:00,951-Speed 3323.89 samples/sec   Loss 0.7107   LearningRate 0.0122   Epoch: 13   Global Step: 217250   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:14:04,025-Speed 3331.45 samples/sec   Loss 0.7199   LearningRate 0.0122   Epoch: 13   Global Step: 217260   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:14:07,154-Speed 3273.33 samples/sec   Loss 0.6938   LearningRate 0.0122   Epoch: 13   Global Step: 217270   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:14:10,252-Speed 3306.22 samples/sec   Loss 0.6970   LearningRate 0.0122   Epoch: 13   Global Step: 217280   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:14:13,340-Speed 3316.54 samples/sec   Loss 0.7021   LearningRate 0.0122   Epoch: 13   Global Step: 217290   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:14:16,413-Speed 3333.86 samples/sec   Loss 0.6693   LearningRate 0.0122   Epoch: 13   Global Step: 217300   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:14:19,482-Speed 3337.64 samples/sec   Loss 0.7503   LearningRate 0.0122   Epoch: 13   Global Step: 217310   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:14:22,544-Speed 3345.02 samples/sec   Loss 0.7114   LearningRate 0.0122   Epoch: 13   Global Step: 217320   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:14:25,606-Speed 3345.52 samples/sec   Loss 0.6765   LearningRate 0.0122   Epoch: 13   Global Step: 217330   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 22:14:28,666-Speed 3346.24 samples/sec   Loss 0.6865   LearningRate 0.0122   Epoch: 13   Global Step: 217340   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 22:14:31,725-Speed 3348.22 samples/sec   Loss 0.7141   LearningRate 0.0122   Epoch: 13   Global Step: 217350   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 22:14:34,809-Speed 3321.71 samples/sec   Loss 0.7207   LearningRate 0.0122   Epoch: 13   Global Step: 217360   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 22:14:37,884-Speed 3330.86 samples/sec   Loss 0.7019   LearningRate 0.0122   Epoch: 13   Global Step: 217370   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 22:14:40,967-Speed 3321.32 samples/sec   Loss 0.7054   LearningRate 0.0122   Epoch: 13   Global Step: 217380   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 22:14:44,084-Speed 3286.63 samples/sec   Loss 0.7074   LearningRate 0.0122   Epoch: 13   Global Step: 217390   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 22:14:47,196-Speed 3291.86 samples/sec   Loss 0.6876   LearningRate 0.0122   Epoch: 13   Global Step: 217400   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 22:14:50,267-Speed 3334.69 samples/sec   Loss 0.7260   LearningRate 0.0122   Epoch: 13   Global Step: 217410   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:14:53,337-Speed 3336.38 samples/sec   Loss 0.7145   LearningRate 0.0122   Epoch: 13   Global Step: 217420   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:14:56,433-Speed 3308.08 samples/sec   Loss 0.7055   LearningRate 0.0122   Epoch: 13   Global Step: 217430   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:14:59,524-Speed 3313.31 samples/sec   Loss 0.6859   LearningRate 0.0122   Epoch: 13   Global Step: 217440   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:15:02,609-Speed 3320.91 samples/sec   Loss 0.6763   LearningRate 0.0122   Epoch: 13   Global Step: 217450   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:15:05,673-Speed 3341.95 samples/sec   Loss 0.6941   LearningRate 0.0122   Epoch: 13   Global Step: 217460   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:15:08,743-Speed 3336.46 samples/sec   Loss 0.7055   LearningRate 0.0121   Epoch: 13   Global Step: 217470   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:15:11,800-Speed 3350.51 samples/sec   Loss 0.7205   LearningRate 0.0121   Epoch: 13   Global Step: 217480   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:15:14,901-Speed 3303.79 samples/sec   Loss 0.6995   LearningRate 0.0121   Epoch: 13   Global Step: 217490   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:15:17,975-Speed 3331.43 samples/sec   Loss 0.7083   LearningRate 0.0121   Epoch: 13   Global Step: 217500   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:15:21,061-Speed 3319.10 samples/sec   Loss 0.6643   LearningRate 0.0121   Epoch: 13   Global Step: 217510   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 22:15:24,139-Speed 3327.26 samples/sec   Loss 0.7348   LearningRate 0.0121   Epoch: 13   Global Step: 217520   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 22:15:27,206-Speed 3340.04 samples/sec   Loss 0.6984   LearningRate 0.0121   Epoch: 13   Global Step: 217530   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 22:15:30,296-Speed 3314.25 samples/sec   Loss 0.7043   LearningRate 0.0121   Epoch: 13   Global Step: 217540   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 22:15:33,469-Speed 3227.69 samples/sec   Loss 0.6972   LearningRate 0.0121   Epoch: 13   Global Step: 217550   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 22:15:36,612-Speed 3259.11 samples/sec   Loss 0.7045   LearningRate 0.0121   Epoch: 13   Global Step: 217560   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 22:15:39,674-Speed 3345.61 samples/sec   Loss 0.7068   LearningRate 0.0121   Epoch: 13   Global Step: 217570   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 22:15:42,747-Speed 3332.45 samples/sec   Loss 0.7223   LearningRate 0.0121   Epoch: 13   Global Step: 217580   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 22:15:45,807-Speed 3347.42 samples/sec   Loss 0.7059   LearningRate 0.0121   Epoch: 13   Global Step: 217590   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 22:15:48,877-Speed 3335.96 samples/sec   Loss 0.7029   LearningRate 0.0121   Epoch: 13   Global Step: 217600   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 22:15:51,997-Speed 3283.52 samples/sec   Loss 0.6975   LearningRate 0.0121   Epoch: 13   Global Step: 217610   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-04-11 22:15:55,053-Speed 3351.15 samples/sec   Loss 0.6871   LearningRate 0.0121   Epoch: 13   Global Step: 217620   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 22:15:58,119-Speed 3340.87 samples/sec   Loss 0.6902   LearningRate 0.0121   Epoch: 13   Global Step: 217630   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 22:16:01,198-Speed 3325.83 samples/sec   Loss 0.6823   LearningRate 0.0121   Epoch: 13   Global Step: 217640   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 22:16:04,265-Speed 3340.53 samples/sec   Loss 0.7363   LearningRate 0.0121   Epoch: 13   Global Step: 217650   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 22:16:07,323-Speed 3348.86 samples/sec   Loss 0.7248   LearningRate 0.0121   Epoch: 13   Global Step: 217660   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 22:16:10,384-Speed 3346.94 samples/sec   Loss 0.7014   LearningRate 0.0121   Epoch: 13   Global Step: 217670   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 22:16:13,441-Speed 3349.89 samples/sec   Loss 0.7150   LearningRate 0.0121   Epoch: 13   Global Step: 217680   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 22:16:16,509-Speed 3338.43 samples/sec   Loss 0.7600   LearningRate 0.0121   Epoch: 13   Global Step: 217690   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 22:16:19,578-Speed 3337.34 samples/sec   Loss 0.7304   LearningRate 0.0121   Epoch: 13   Global Step: 217700   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 22:16:22,640-Speed 3345.09 samples/sec   Loss 0.7345   LearningRate 0.0121   Epoch: 13   Global Step: 217710   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 22:16:25,685-Speed 3364.06 samples/sec   Loss 0.7011   LearningRate 0.0121   Epoch: 13   Global Step: 217720   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:16:28,770-Speed 3319.87 samples/sec   Loss 0.6817   LearningRate 0.0121   Epoch: 13   Global Step: 217730   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:16:31,910-Speed 3262.41 samples/sec   Loss 0.7089   LearningRate 0.0121   Epoch: 13   Global Step: 217740   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:16:34,990-Speed 3325.72 samples/sec   Loss 0.7405   LearningRate 0.0121   Epoch: 13   Global Step: 217750   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:16:38,076-Speed 3318.29 samples/sec   Loss 0.6867   LearningRate 0.0121   Epoch: 13   Global Step: 217760   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:16:41,190-Speed 3289.30 samples/sec   Loss 0.7045   LearningRate 0.0121   Epoch: 13   Global Step: 217770   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:16:44,357-Speed 3234.39 samples/sec   Loss 0.7132   LearningRate 0.0121   Epoch: 13   Global Step: 217780   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:16:47,449-Speed 3312.23 samples/sec   Loss 0.7283   LearningRate 0.0121   Epoch: 13   Global Step: 217790   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:16:50,657-Speed 3192.84 samples/sec   Loss 0.7251   LearningRate 0.0121   Epoch: 13   Global Step: 217800   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:16:53,774-Speed 3285.64 samples/sec   Loss 0.7477   LearningRate 0.0121   Epoch: 13   Global Step: 217810   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:16:56,846-Speed 3334.68 samples/sec   Loss 0.7114   LearningRate 0.0121   Epoch: 13   Global Step: 217820   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 22:16:59,959-Speed 3290.19 samples/sec   Loss 0.7522   LearningRate 0.0121   Epoch: 13   Global Step: 217830   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 22:17:03,043-Speed 3321.85 samples/sec   Loss 0.7388   LearningRate 0.0121   Epoch: 13   Global Step: 217840   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 22:17:06,128-Speed 3319.65 samples/sec   Loss 0.6943   LearningRate 0.0121   Epoch: 13   Global Step: 217850   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 22:17:09,195-Speed 3339.39 samples/sec   Loss 0.7005   LearningRate 0.0121   Epoch: 13   Global Step: 217860   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 22:17:12,264-Speed 3337.74 samples/sec   Loss 0.7036   LearningRate 0.0121   Epoch: 13   Global Step: 217870   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 22:17:15,321-Speed 3349.67 samples/sec   Loss 0.7253   LearningRate 0.0121   Epoch: 13   Global Step: 217880   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 22:17:18,410-Speed 3316.40 samples/sec   Loss 0.7083   LearningRate 0.0121   Epoch: 13   Global Step: 217890   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 22:17:21,512-Speed 3301.32 samples/sec   Loss 0.7124   LearningRate 0.0121   Epoch: 13   Global Step: 217900   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 22:17:24,689-Speed 3223.99 samples/sec   Loss 0.7157   LearningRate 0.0121   Epoch: 13   Global Step: 217910   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 22:17:27,753-Speed 3343.47 samples/sec   Loss 0.7474   LearningRate 0.0121   Epoch: 13   Global Step: 217920   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-04-11 22:17:30,832-Speed 3326.48 samples/sec   Loss 0.6943   LearningRate 0.0121   Epoch: 13   Global Step: 217930   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 22:17:33,920-Speed 3316.45 samples/sec   Loss 0.7062   LearningRate 0.0121   Epoch: 13   Global Step: 217940   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 22:17:36,996-Speed 3329.70 samples/sec   Loss 0.7171   LearningRate 0.0120   Epoch: 13   Global Step: 217950   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 22:17:40,139-Speed 3259.18 samples/sec   Loss 0.6919   LearningRate 0.0120   Epoch: 13   Global Step: 217960   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 22:17:43,367-Speed 3172.66 samples/sec   Loss 0.7198   LearningRate 0.0120   Epoch: 13   Global Step: 217970   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 22:17:46,508-Speed 3261.07 samples/sec   Loss 0.7434   LearningRate 0.0120   Epoch: 13   Global Step: 217980   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 22:17:49,576-Speed 3338.55 samples/sec   Loss 0.7315   LearningRate 0.0120   Epoch: 13   Global Step: 217990   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 22:17:52,698-Speed 3280.88 samples/sec   Loss 0.7047   LearningRate 0.0120   Epoch: 13   Global Step: 218000   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 22:18:36,686-[lfw][218000]XNorm: 20.087014
Training: 2022-04-11 22:18:36,686-[lfw][218000]Accuracy-Flip: 0.99783+-0.00259
Training: 2022-04-11 22:18:36,687-[lfw][218000]Accuracy-Highest: 0.99817
Training: 2022-04-11 22:19:27,605-[cfp_fp][218000]XNorm: 20.891124
Training: 2022-04-11 22:19:27,606-[cfp_fp][218000]Accuracy-Flip: 0.99100+-0.00414
Training: 2022-04-11 22:19:27,606-[cfp_fp][218000]Accuracy-Highest: 0.99100
Training: 2022-04-11 22:20:11,497-[agedb_30][218000]XNorm: 21.343681
Training: 2022-04-11 22:20:11,498-[agedb_30][218000]Accuracy-Flip: 0.98400+-0.00688
Training: 2022-04-11 22:20:11,498-[agedb_30][218000]Accuracy-Highest: 0.98567
Training: 2022-04-11 22:20:14,571-Speed 72.18 samples/sec   Loss 0.7104   LearningRate 0.0120   Epoch: 13   Global Step: 218010   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 22:20:17,647-Speed 3329.56 samples/sec   Loss 0.7471   LearningRate 0.0120   Epoch: 13   Global Step: 218020   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 22:20:20,868-Speed 3180.95 samples/sec   Loss 0.7412   LearningRate 0.0120   Epoch: 13   Global Step: 218030   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 22:20:24,030-Speed 3238.57 samples/sec   Loss 0.7161   LearningRate 0.0120   Epoch: 13   Global Step: 218040   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 22:20:27,126-Speed 3308.62 samples/sec   Loss 0.7007   LearningRate 0.0120   Epoch: 13   Global Step: 218050   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:20:30,182-Speed 3351.32 samples/sec   Loss 0.7165   LearningRate 0.0120   Epoch: 13   Global Step: 218060   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:20:33,258-Speed 3329.62 samples/sec   Loss 0.7173   LearningRate 0.0120   Epoch: 13   Global Step: 218070   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:20:36,319-Speed 3346.22 samples/sec   Loss 0.7249   LearningRate 0.0120   Epoch: 13   Global Step: 218080   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:20:39,381-Speed 3345.47 samples/sec   Loss 0.7290   LearningRate 0.0120   Epoch: 13   Global Step: 218090   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:20:42,495-Speed 3288.95 samples/sec   Loss 0.7267   LearningRate 0.0120   Epoch: 13   Global Step: 218100   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:20:45,604-Speed 3294.76 samples/sec   Loss 0.7262   LearningRate 0.0120   Epoch: 13   Global Step: 218110   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:20:48,748-Speed 3257.76 samples/sec   Loss 0.7351   LearningRate 0.0120   Epoch: 13   Global Step: 218120   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:20:51,802-Speed 3353.54 samples/sec   Loss 0.7166   LearningRate 0.0120   Epoch: 13   Global Step: 218130   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:20:54,898-Speed 3308.52 samples/sec   Loss 0.7290   LearningRate 0.0120   Epoch: 13   Global Step: 218140   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:20:57,956-Speed 3348.82 samples/sec   Loss 0.7218   LearningRate 0.0120   Epoch: 13   Global Step: 218150   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 22:21:01,070-Speed 3289.58 samples/sec   Loss 0.7134   LearningRate 0.0120   Epoch: 13   Global Step: 218160   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:21:04,126-Speed 3350.98 samples/sec   Loss 0.7097   LearningRate 0.0120   Epoch: 13   Global Step: 218170   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:21:07,190-Speed 3342.76 samples/sec   Loss 0.7528   LearningRate 0.0120   Epoch: 13   Global Step: 218180   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:21:10,292-Speed 3303.20 samples/sec   Loss 0.7515   LearningRate 0.0120   Epoch: 13   Global Step: 218190   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:21:13,365-Speed 3332.40 samples/sec   Loss 0.7042   LearningRate 0.0120   Epoch: 13   Global Step: 218200   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:21:16,422-Speed 3350.75 samples/sec   Loss 0.7206   LearningRate 0.0120   Epoch: 13   Global Step: 218210   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:21:19,530-Speed 3295.42 samples/sec   Loss 0.7255   LearningRate 0.0120   Epoch: 13   Global Step: 218220   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:21:22,661-Speed 3271.43 samples/sec   Loss 0.7102   LearningRate 0.0120   Epoch: 13   Global Step: 218230   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:21:25,821-Speed 3241.19 samples/sec   Loss 0.6803   LearningRate 0.0120   Epoch: 13   Global Step: 218240   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:21:28,931-Speed 3292.88 samples/sec   Loss 0.7085   LearningRate 0.0120   Epoch: 13   Global Step: 218250   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:21:31,994-Speed 3343.63 samples/sec   Loss 0.7158   LearningRate 0.0120   Epoch: 13   Global Step: 218260   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 22:21:35,051-Speed 3350.42 samples/sec   Loss 0.7469   LearningRate 0.0120   Epoch: 13   Global Step: 218270   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 22:21:38,117-Speed 3341.28 samples/sec   Loss 0.7295   LearningRate 0.0120   Epoch: 13   Global Step: 218280   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 22:21:41,170-Speed 3355.40 samples/sec   Loss 0.7549   LearningRate 0.0120   Epoch: 13   Global Step: 218290   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 22:21:44,235-Speed 3341.63 samples/sec   Loss 0.7111   LearningRate 0.0120   Epoch: 13   Global Step: 218300   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 22:21:47,295-Speed 3347.39 samples/sec   Loss 0.7226   LearningRate 0.0120   Epoch: 13   Global Step: 218310   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 22:21:50,357-Speed 3344.86 samples/sec   Loss 0.7437   LearningRate 0.0120   Epoch: 13   Global Step: 218320   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:21:53,414-Speed 3350.48 samples/sec   Loss 0.7411   LearningRate 0.0120   Epoch: 13   Global Step: 218330   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:21:56,476-Speed 3344.64 samples/sec   Loss 0.7167   LearningRate 0.0120   Epoch: 13   Global Step: 218340   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:21:59,540-Speed 3342.99 samples/sec   Loss 0.7654   LearningRate 0.0120   Epoch: 13   Global Step: 218350   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:22:02,597-Speed 3350.25 samples/sec   Loss 0.7334   LearningRate 0.0120   Epoch: 13   Global Step: 218360   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:22:05,658-Speed 3346.84 samples/sec   Loss 0.7528   LearningRate 0.0120   Epoch: 13   Global Step: 218370   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:22:08,710-Speed 3355.07 samples/sec   Loss 0.6913   LearningRate 0.0120   Epoch: 13   Global Step: 218380   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:22:11,769-Speed 3349.27 samples/sec   Loss 0.7322   LearningRate 0.0120   Epoch: 13   Global Step: 218390   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:22:14,836-Speed 3339.33 samples/sec   Loss 0.7288   LearningRate 0.0120   Epoch: 13   Global Step: 218400   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:22:17,894-Speed 3348.49 samples/sec   Loss 0.7244   LearningRate 0.0120   Epoch: 13   Global Step: 218410   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:22:20,941-Speed 3361.61 samples/sec   Loss 0.7414   LearningRate 0.0120   Epoch: 13   Global Step: 218420   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:22:24,011-Speed 3336.72 samples/sec   Loss 0.7672   LearningRate 0.0119   Epoch: 13   Global Step: 218430   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:22:27,117-Speed 3297.17 samples/sec   Loss 0.6996   LearningRate 0.0119   Epoch: 13   Global Step: 218440   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:22:30,185-Speed 3338.90 samples/sec   Loss 0.7229   LearningRate 0.0119   Epoch: 13   Global Step: 218450   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:22:33,271-Speed 3318.78 samples/sec   Loss 0.7342   LearningRate 0.0119   Epoch: 13   Global Step: 218460   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:22:36,340-Speed 3336.78 samples/sec   Loss 0.7585   LearningRate 0.0119   Epoch: 13   Global Step: 218470   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:22:39,400-Speed 3348.07 samples/sec   Loss 0.7473   LearningRate 0.0119   Epoch: 13   Global Step: 218480   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:22:42,457-Speed 3350.27 samples/sec   Loss 0.7298   LearningRate 0.0119   Epoch: 13   Global Step: 218490   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:22:45,565-Speed 3295.28 samples/sec   Loss 0.7053   LearningRate 0.0119   Epoch: 13   Global Step: 218500   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:22:48,627-Speed 3345.44 samples/sec   Loss 0.7529   LearningRate 0.0119   Epoch: 13   Global Step: 218510   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:22:51,686-Speed 3347.64 samples/sec   Loss 0.7595   LearningRate 0.0119   Epoch: 13   Global Step: 218520   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 22:22:54,773-Speed 3317.41 samples/sec   Loss 0.6892   LearningRate 0.0119   Epoch: 13   Global Step: 218530   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 22:22:57,839-Speed 3341.37 samples/sec   Loss 0.7474   LearningRate 0.0119   Epoch: 13   Global Step: 218540   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 22:23:00,913-Speed 3332.72 samples/sec   Loss 0.7533   LearningRate 0.0119   Epoch: 13   Global Step: 218550   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 22:23:04,045-Speed 3269.81 samples/sec   Loss 0.7128   LearningRate 0.0119   Epoch: 13   Global Step: 218560   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 22:23:07,101-Speed 3350.69 samples/sec   Loss 0.7310   LearningRate 0.0119   Epoch: 13   Global Step: 218570   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 22:23:10,163-Speed 3345.21 samples/sec   Loss 0.7327   LearningRate 0.0119   Epoch: 13   Global Step: 218580   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 22:23:13,235-Speed 3334.62 samples/sec   Loss 0.7188   LearningRate 0.0119   Epoch: 13   Global Step: 218590   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 22:23:16,297-Speed 3345.22 samples/sec   Loss 0.7284   LearningRate 0.0119   Epoch: 13   Global Step: 218600   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 22:23:19,347-Speed 3357.72 samples/sec   Loss 0.7270   LearningRate 0.0119   Epoch: 13   Global Step: 218610   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:23:22,409-Speed 3344.98 samples/sec   Loss 0.7339   LearningRate 0.0119   Epoch: 13   Global Step: 218620   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:23:25,466-Speed 3351.00 samples/sec   Loss 0.7399   LearningRate 0.0119   Epoch: 13   Global Step: 218630   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:23:28,528-Speed 3344.60 samples/sec   Loss 0.7166   LearningRate 0.0119   Epoch: 13   Global Step: 218640   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:23:31,585-Speed 3351.44 samples/sec   Loss 0.7566   LearningRate 0.0119   Epoch: 13   Global Step: 218650   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:23:34,644-Speed 3347.61 samples/sec   Loss 0.7448   LearningRate 0.0119   Epoch: 13   Global Step: 218660   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:23:37,719-Speed 3331.36 samples/sec   Loss 0.7325   LearningRate 0.0119   Epoch: 13   Global Step: 218670   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:23:40,911-Speed 3208.05 samples/sec   Loss 0.7349   LearningRate 0.0119   Epoch: 13   Global Step: 218680   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:23:43,975-Speed 3343.12 samples/sec   Loss 0.7656   LearningRate 0.0119   Epoch: 13   Global Step: 218690   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:23:47,045-Speed 3336.24 samples/sec   Loss 0.7329   LearningRate 0.0119   Epoch: 13   Global Step: 218700   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:23:50,116-Speed 3335.81 samples/sec   Loss 0.7510   LearningRate 0.0119   Epoch: 13   Global Step: 218710   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 22:23:53,196-Speed 3325.82 samples/sec   Loss 0.7707   LearningRate 0.0119   Epoch: 13   Global Step: 218720   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 22:23:56,257-Speed 3345.28 samples/sec   Loss 0.7537   LearningRate 0.0119   Epoch: 13   Global Step: 218730   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 22:23:59,380-Speed 3279.83 samples/sec   Loss 0.7239   LearningRate 0.0119   Epoch: 13   Global Step: 218740   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 22:24:02,437-Speed 3350.63 samples/sec   Loss 0.7292   LearningRate 0.0119   Epoch: 13   Global Step: 218750   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 22:24:05,492-Speed 3351.95 samples/sec   Loss 0.7440   LearningRate 0.0119   Epoch: 13   Global Step: 218760   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 22:24:08,576-Speed 3321.70 samples/sec   Loss 0.7371   LearningRate 0.0119   Epoch: 13   Global Step: 218770   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:24:11,649-Speed 3333.54 samples/sec   Loss 0.7383   LearningRate 0.0119   Epoch: 13   Global Step: 218780   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:24:14,716-Speed 3338.77 samples/sec   Loss 0.7523   LearningRate 0.0119   Epoch: 13   Global Step: 218790   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:24:17,785-Speed 3338.03 samples/sec   Loss 0.7525   LearningRate 0.0119   Epoch: 13   Global Step: 218800   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:24:20,911-Speed 3276.32 samples/sec   Loss 0.7249   LearningRate 0.0119   Epoch: 13   Global Step: 218810   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:24:23,967-Speed 3352.22 samples/sec   Loss 0.7260   LearningRate 0.0119   Epoch: 13   Global Step: 218820   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:24:27,029-Speed 3343.95 samples/sec   Loss 0.7108   LearningRate 0.0119   Epoch: 13   Global Step: 218830   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:24:30,089-Speed 3347.55 samples/sec   Loss 0.7449   LearningRate 0.0119   Epoch: 13   Global Step: 218840   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:24:33,147-Speed 3349.52 samples/sec   Loss 0.7216   LearningRate 0.0119   Epoch: 13   Global Step: 218850   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:24:36,202-Speed 3352.27 samples/sec   Loss 0.7225   LearningRate 0.0119   Epoch: 13   Global Step: 218860   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:24:39,301-Speed 3305.68 samples/sec   Loss 0.7239   LearningRate 0.0119   Epoch: 13   Global Step: 218870   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 22:24:42,360-Speed 3348.23 samples/sec   Loss 0.7645   LearningRate 0.0119   Epoch: 13   Global Step: 218880   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 22:24:45,471-Speed 3292.74 samples/sec   Loss 0.7300   LearningRate 0.0119   Epoch: 13   Global Step: 218890   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 22:24:48,582-Speed 3291.76 samples/sec   Loss 0.7392   LearningRate 0.0119   Epoch: 13   Global Step: 218900   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 22:24:51,662-Speed 3325.83 samples/sec   Loss 0.7675   LearningRate 0.0118   Epoch: 13   Global Step: 218910   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 22:24:54,784-Speed 3280.85 samples/sec   Loss 0.7164   LearningRate 0.0118   Epoch: 13   Global Step: 218920   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 22:24:58,004-Speed 3180.32 samples/sec   Loss 0.7381   LearningRate 0.0118   Epoch: 13   Global Step: 218930   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 22:25:02,003-Speed 2561.39 samples/sec   Loss 0.7745   LearningRate 0.0118   Epoch: 13   Global Step: 218940   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 22:25:05,075-Speed 3333.91 samples/sec   Loss 0.7440   LearningRate 0.0118   Epoch: 13   Global Step: 218950   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 22:25:11,144-Speed 1687.50 samples/sec   Loss 0.7523   LearningRate 0.0118   Epoch: 13   Global Step: 218960   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 22:25:14,998-Speed 2657.52 samples/sec   Loss 0.7727   LearningRate 0.0118   Epoch: 13   Global Step: 218970   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 22:25:18,068-Speed 3336.45 samples/sec   Loss 0.7128   LearningRate 0.0118   Epoch: 13   Global Step: 218980   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 22:25:21,196-Speed 3274.58 samples/sec   Loss 0.7371   LearningRate 0.0118   Epoch: 13   Global Step: 218990   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 22:25:24,273-Speed 3328.35 samples/sec   Loss 0.7426   LearningRate 0.0118   Epoch: 13   Global Step: 219000   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 22:25:27,327-Speed 3352.89 samples/sec   Loss 0.7573   LearningRate 0.0118   Epoch: 13   Global Step: 219010   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 22:25:30,388-Speed 3346.85 samples/sec   Loss 0.7589   LearningRate 0.0118   Epoch: 13   Global Step: 219020   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 22:25:33,457-Speed 3336.79 samples/sec   Loss 0.7119   LearningRate 0.0118   Epoch: 13   Global Step: 219030   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 22:25:36,535-Speed 3327.79 samples/sec   Loss 0.7699   LearningRate 0.0118   Epoch: 13   Global Step: 219040   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 22:25:39,641-Speed 3298.14 samples/sec   Loss 0.7288   LearningRate 0.0118   Epoch: 13   Global Step: 219050   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 22:25:42,847-Speed 3194.74 samples/sec   Loss 0.7653   LearningRate 0.0118   Epoch: 13   Global Step: 219060   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 22:25:45,928-Speed 3323.84 samples/sec   Loss 0.7873   LearningRate 0.0118   Epoch: 13   Global Step: 219070   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 22:25:49,019-Speed 3314.27 samples/sec   Loss 0.7125   LearningRate 0.0118   Epoch: 13   Global Step: 219080   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 22:25:52,193-Speed 3227.00 samples/sec   Loss 0.7418   LearningRate 0.0118   Epoch: 13   Global Step: 219090   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 22:25:55,291-Speed 3305.74 samples/sec   Loss 0.7552   LearningRate 0.0118   Epoch: 13   Global Step: 219100   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:25:58,412-Speed 3281.25 samples/sec   Loss 0.7564   LearningRate 0.0118   Epoch: 13   Global Step: 219110   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:26:01,480-Speed 3338.70 samples/sec   Loss 0.7345   LearningRate 0.0118   Epoch: 13   Global Step: 219120   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:26:04,551-Speed 3335.64 samples/sec   Loss 0.7320   LearningRate 0.0118   Epoch: 13   Global Step: 219130   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:26:07,650-Speed 3304.56 samples/sec   Loss 0.7187   LearningRate 0.0118   Epoch: 13   Global Step: 219140   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:26:10,722-Speed 3334.97 samples/sec   Loss 0.7972   LearningRate 0.0118   Epoch: 13   Global Step: 219150   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:26:13,836-Speed 3288.33 samples/sec   Loss 0.7532   LearningRate 0.0118   Epoch: 13   Global Step: 219160   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:26:16,897-Speed 3346.44 samples/sec   Loss 0.7369   LearningRate 0.0118   Epoch: 13   Global Step: 219170   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:26:20,000-Speed 3300.18 samples/sec   Loss 0.7644   LearningRate 0.0118   Epoch: 13   Global Step: 219180   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:26:23,222-Speed 3179.21 samples/sec   Loss 0.7608   LearningRate 0.0118   Epoch: 13   Global Step: 219190   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:26:26,381-Speed 3242.64 samples/sec   Loss 0.7131   LearningRate 0.0118   Epoch: 13   Global Step: 219200   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 22:26:29,471-Speed 3314.69 samples/sec   Loss 0.7384   LearningRate 0.0118   Epoch: 13   Global Step: 219210   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 22:26:32,552-Speed 3323.83 samples/sec   Loss 0.7378   LearningRate 0.0118   Epoch: 13   Global Step: 219220   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 22:26:35,641-Speed 3316.36 samples/sec   Loss 0.7367   LearningRate 0.0118   Epoch: 13   Global Step: 219230   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 22:26:38,707-Speed 3340.13 samples/sec   Loss 0.7294   LearningRate 0.0118   Epoch: 13   Global Step: 219240   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 22:26:41,768-Speed 3346.72 samples/sec   Loss 0.7576   LearningRate 0.0118   Epoch: 13   Global Step: 219250   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 22:26:44,840-Speed 3333.97 samples/sec   Loss 0.7531   LearningRate 0.0118   Epoch: 13   Global Step: 219260   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 22:26:47,917-Speed 3328.18 samples/sec   Loss 0.7098   LearningRate 0.0118   Epoch: 13   Global Step: 219270   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 22:26:51,014-Speed 3307.92 samples/sec   Loss 0.7687   LearningRate 0.0118   Epoch: 13   Global Step: 219280   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 22:26:54,061-Speed 3360.89 samples/sec   Loss 0.7023   LearningRate 0.0118   Epoch: 13   Global Step: 219290   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:26:57,131-Speed 3336.08 samples/sec   Loss 0.7748   LearningRate 0.0118   Epoch: 13   Global Step: 219300   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:27:00,197-Speed 3340.92 samples/sec   Loss 0.7354   LearningRate 0.0118   Epoch: 13   Global Step: 219310   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:27:03,355-Speed 3243.44 samples/sec   Loss 0.7685   LearningRate 0.0118   Epoch: 13   Global Step: 219320   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:27:06,538-Speed 3218.11 samples/sec   Loss 0.7841   LearningRate 0.0118   Epoch: 13   Global Step: 219330   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:27:09,720-Speed 3218.70 samples/sec   Loss 0.7766   LearningRate 0.0118   Epoch: 13   Global Step: 219340   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:27:12,880-Speed 3241.23 samples/sec   Loss 0.7402   LearningRate 0.0118   Epoch: 13   Global Step: 219350   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:27:15,956-Speed 3330.29 samples/sec   Loss 0.7429   LearningRate 0.0118   Epoch: 13   Global Step: 219360   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:27:19,033-Speed 3327.60 samples/sec   Loss 0.7294   LearningRate 0.0118   Epoch: 13   Global Step: 219370   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:27:22,189-Speed 3245.94 samples/sec   Loss 0.7297   LearningRate 0.0118   Epoch: 13   Global Step: 219380   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:27:25,443-Speed 3147.24 samples/sec   Loss 0.7900   LearningRate 0.0118   Epoch: 13   Global Step: 219390   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 22:27:28,501-Speed 3349.54 samples/sec   Loss 0.7726   LearningRate 0.0117   Epoch: 13   Global Step: 219400   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:27:31,582-Speed 3325.25 samples/sec   Loss 0.7657   LearningRate 0.0117   Epoch: 13   Global Step: 219410   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:27:34,748-Speed 3234.85 samples/sec   Loss 0.7286   LearningRate 0.0117   Epoch: 13   Global Step: 219420   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:27:37,811-Speed 3343.24 samples/sec   Loss 0.7462   LearningRate 0.0117   Epoch: 13   Global Step: 219430   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:27:40,908-Speed 3307.84 samples/sec   Loss 0.7748   LearningRate 0.0117   Epoch: 13   Global Step: 219440   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:27:43,969-Speed 3345.80 samples/sec   Loss 0.7376   LearningRate 0.0117   Epoch: 13   Global Step: 219450   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:27:47,040-Speed 3335.29 samples/sec   Loss 0.7387   LearningRate 0.0117   Epoch: 13   Global Step: 219460   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:27:50,143-Speed 3300.72 samples/sec   Loss 0.7419   LearningRate 0.0117   Epoch: 13   Global Step: 219470   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:27:53,226-Speed 3322.46 samples/sec   Loss 0.7560   LearningRate 0.0117   Epoch: 13   Global Step: 219480   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:27:56,299-Speed 3332.86 samples/sec   Loss 0.7624   LearningRate 0.0117   Epoch: 13   Global Step: 219490   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:27:59,389-Speed 3315.29 samples/sec   Loss 0.7891   LearningRate 0.0117   Epoch: 13   Global Step: 219500   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 22:28:02,485-Speed 3307.30 samples/sec   Loss 0.7981   LearningRate 0.0117   Epoch: 13   Global Step: 219510   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 22:28:05,618-Speed 3269.79 samples/sec   Loss 0.7694   LearningRate 0.0117   Epoch: 13   Global Step: 219520   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 22:28:08,749-Speed 3270.62 samples/sec   Loss 0.7619   LearningRate 0.0117   Epoch: 13   Global Step: 219530   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 22:28:11,825-Speed 3330.21 samples/sec   Loss 0.7809   LearningRate 0.0117   Epoch: 13   Global Step: 219540   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:28:14,903-Speed 3327.74 samples/sec   Loss 0.7823   LearningRate 0.0117   Epoch: 13   Global Step: 219550   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:28:18,043-Speed 3261.88 samples/sec   Loss 0.7962   LearningRate 0.0117   Epoch: 13   Global Step: 219560   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:28:21,268-Speed 3176.10 samples/sec   Loss 0.7537   LearningRate 0.0117   Epoch: 13   Global Step: 219570   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:28:24,493-Speed 3176.10 samples/sec   Loss 0.7698   LearningRate 0.0117   Epoch: 13   Global Step: 219580   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:28:27,640-Speed 3254.14 samples/sec   Loss 0.7579   LearningRate 0.0117   Epoch: 13   Global Step: 219590   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:28:30,856-Speed 3185.00 samples/sec   Loss 0.7625   LearningRate 0.0117   Epoch: 13   Global Step: 219600   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:28:33,964-Speed 3294.98 samples/sec   Loss 0.7790   LearningRate 0.0117   Epoch: 13   Global Step: 219610   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:28:37,080-Speed 3287.02 samples/sec   Loss 0.7836   LearningRate 0.0117   Epoch: 13   Global Step: 219620   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:28:40,201-Speed 3282.42 samples/sec   Loss 0.7472   LearningRate 0.0117   Epoch: 13   Global Step: 219630   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:28:43,267-Speed 3340.48 samples/sec   Loss 0.7495   LearningRate 0.0117   Epoch: 13   Global Step: 219640   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 22:28:46,352-Speed 3320.11 samples/sec   Loss 0.8047   LearningRate 0.0117   Epoch: 13   Global Step: 219650   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 22:28:49,433-Speed 3323.65 samples/sec   Loss 0.7821   LearningRate 0.0117   Epoch: 13   Global Step: 219660   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 22:28:52,571-Speed 3264.46 samples/sec   Loss 0.7619   LearningRate 0.0117   Epoch: 13   Global Step: 219670   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 22:28:55,695-Speed 3278.94 samples/sec   Loss 0.7847   LearningRate 0.0117   Epoch: 13   Global Step: 219680   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 22:28:58,766-Speed 3335.12 samples/sec   Loss 0.7636   LearningRate 0.0117   Epoch: 13   Global Step: 219690   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 22:29:01,878-Speed 3290.91 samples/sec   Loss 0.7918   LearningRate 0.0117   Epoch: 13   Global Step: 219700   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 22:29:04,953-Speed 3331.71 samples/sec   Loss 0.7698   LearningRate 0.0117   Epoch: 13   Global Step: 219710   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 22:29:08,029-Speed 3329.55 samples/sec   Loss 0.7858   LearningRate 0.0117   Epoch: 13   Global Step: 219720   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 22:29:11,083-Speed 3353.97 samples/sec   Loss 0.7673   LearningRate 0.0117   Epoch: 13   Global Step: 219730   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:29:14,150-Speed 3338.55 samples/sec   Loss 0.7298   LearningRate 0.0117   Epoch: 13   Global Step: 219740   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:29:17,261-Speed 3293.46 samples/sec   Loss 0.7713   LearningRate 0.0117   Epoch: 13   Global Step: 219750   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:29:20,326-Speed 3341.57 samples/sec   Loss 0.7726   LearningRate 0.0117   Epoch: 13   Global Step: 219760   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:29:23,396-Speed 3335.83 samples/sec   Loss 0.7949   LearningRate 0.0117   Epoch: 13   Global Step: 219770   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:29:26,476-Speed 3325.72 samples/sec   Loss 0.7541   LearningRate 0.0117   Epoch: 13   Global Step: 219780   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:29:29,550-Speed 3331.71 samples/sec   Loss 0.7517   LearningRate 0.0117   Epoch: 13   Global Step: 219790   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:29:32,616-Speed 3340.86 samples/sec   Loss 0.7557   LearningRate 0.0117   Epoch: 13   Global Step: 219800   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:29:35,697-Speed 3323.58 samples/sec   Loss 0.7337   LearningRate 0.0117   Epoch: 13   Global Step: 219810   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:29:38,766-Speed 3337.85 samples/sec   Loss 0.7792   LearningRate 0.0117   Epoch: 13   Global Step: 219820   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:29:41,846-Speed 3324.95 samples/sec   Loss 0.7389   LearningRate 0.0117   Epoch: 13   Global Step: 219830   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 22:29:44,954-Speed 3296.31 samples/sec   Loss 0.7396   LearningRate 0.0117   Epoch: 13   Global Step: 219840   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 22:29:48,024-Speed 3336.72 samples/sec   Loss 0.7533   LearningRate 0.0117   Epoch: 13   Global Step: 219850   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 22:29:51,091-Speed 3338.76 samples/sec   Loss 0.7831   LearningRate 0.0117   Epoch: 13   Global Step: 219860   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:29:54,157-Speed 3341.61 samples/sec   Loss 0.7687   LearningRate 0.0117   Epoch: 13   Global Step: 219870   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:29:57,224-Speed 3338.83 samples/sec   Loss 0.7733   LearningRate 0.0117   Epoch: 13   Global Step: 219880   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:30:00,291-Speed 3339.90 samples/sec   Loss 0.7725   LearningRate 0.0116   Epoch: 13   Global Step: 219890   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:30:03,360-Speed 3337.08 samples/sec   Loss 0.7624   LearningRate 0.0116   Epoch: 13   Global Step: 219900   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:30:06,464-Speed 3299.75 samples/sec   Loss 0.7817   LearningRate 0.0116   Epoch: 13   Global Step: 219910   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:30:09,604-Speed 3262.49 samples/sec   Loss 0.7563   LearningRate 0.0116   Epoch: 13   Global Step: 219920   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:30:12,676-Speed 3334.37 samples/sec   Loss 0.7715   LearningRate 0.0116   Epoch: 13   Global Step: 219930   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:30:15,776-Speed 3303.23 samples/sec   Loss 0.7905   LearningRate 0.0116   Epoch: 13   Global Step: 219940   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:30:18,841-Speed 3342.39 samples/sec   Loss 0.7775   LearningRate 0.0116   Epoch: 13   Global Step: 219950   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:30:21,950-Speed 3293.82 samples/sec   Loss 0.7720   LearningRate 0.0116   Epoch: 13   Global Step: 219960   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 22:30:25,027-Speed 3329.23 samples/sec   Loss 0.7693   LearningRate 0.0116   Epoch: 13   Global Step: 219970   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 22:30:28,099-Speed 3333.94 samples/sec   Loss 0.7414   LearningRate 0.0116   Epoch: 13   Global Step: 219980   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:30:31,172-Speed 3332.72 samples/sec   Loss 0.7473   LearningRate 0.0116   Epoch: 13   Global Step: 219990   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:30:34,244-Speed 3334.55 samples/sec   Loss 0.7771   LearningRate 0.0116   Epoch: 13   Global Step: 220000   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:31:18,537-[lfw][220000]XNorm: 22.635737
Training: 2022-04-11 22:31:18,538-[lfw][220000]Accuracy-Flip: 0.99783+-0.00259
Training: 2022-04-11 22:31:18,538-[lfw][220000]Accuracy-Highest: 0.99817
Training: 2022-04-11 22:32:10,033-[cfp_fp][220000]XNorm: 23.086130
Training: 2022-04-11 22:32:10,034-[cfp_fp][220000]Accuracy-Flip: 0.99043+-0.00452
Training: 2022-04-11 22:32:10,034-[cfp_fp][220000]Accuracy-Highest: 0.99100
Training: 2022-04-11 22:32:54,339-[agedb_30][220000]XNorm: 23.498052
Training: 2022-04-11 22:32:54,339-[agedb_30][220000]Accuracy-Flip: 0.98450+-0.00628
Training: 2022-04-11 22:32:54,340-[agedb_30][220000]Accuracy-Highest: 0.98567
Training: 2022-04-11 22:32:57,516-Speed 71.47 samples/sec   Loss 0.7683   LearningRate 0.0116   Epoch: 13   Global Step: 220010   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:33:00,601-Speed 3320.27 samples/sec   Loss 0.7828   LearningRate 0.0116   Epoch: 13   Global Step: 220020   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:33:03,659-Speed 3350.31 samples/sec   Loss 0.7621   LearningRate 0.0116   Epoch: 13   Global Step: 220030   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:33:06,762-Speed 3300.26 samples/sec   Loss 0.8025   LearningRate 0.0116   Epoch: 13   Global Step: 220040   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:33:09,831-Speed 3337.61 samples/sec   Loss 0.7746   LearningRate 0.0116   Epoch: 13   Global Step: 220050   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:33:13,003-Speed 3228.28 samples/sec   Loss 0.7965   LearningRate 0.0116   Epoch: 13   Global Step: 220060   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:33:16,110-Speed 3296.32 samples/sec   Loss 0.7950   LearningRate 0.0116   Epoch: 13   Global Step: 220070   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:33:19,179-Speed 3337.60 samples/sec   Loss 0.8002   LearningRate 0.0116   Epoch: 13   Global Step: 220080   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:33:22,261-Speed 3323.08 samples/sec   Loss 0.7565   LearningRate 0.0116   Epoch: 13   Global Step: 220090   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:33:25,321-Speed 3347.61 samples/sec   Loss 0.8015   LearningRate 0.0116   Epoch: 13   Global Step: 220100   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:33:28,401-Speed 3325.41 samples/sec   Loss 0.7767   LearningRate 0.0116   Epoch: 13   Global Step: 220110   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:33:31,515-Speed 3288.74 samples/sec   Loss 0.7650   LearningRate 0.0116   Epoch: 13   Global Step: 220120   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:33:34,605-Speed 3314.67 samples/sec   Loss 0.7849   LearningRate 0.0116   Epoch: 13   Global Step: 220130   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:33:37,667-Speed 3345.64 samples/sec   Loss 0.7767   LearningRate 0.0116   Epoch: 13   Global Step: 220140   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:33:40,736-Speed 3336.52 samples/sec   Loss 0.7372   LearningRate 0.0116   Epoch: 13   Global Step: 220150   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:33:43,865-Speed 3273.21 samples/sec   Loss 0.8255   LearningRate 0.0116   Epoch: 13   Global Step: 220160   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:33:47,013-Speed 3254.29 samples/sec   Loss 0.7713   LearningRate 0.0116   Epoch: 13   Global Step: 220170   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:33:50,137-Speed 3279.09 samples/sec   Loss 0.7748   LearningRate 0.0116   Epoch: 13   Global Step: 220180   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 22:33:53,293-Speed 3244.57 samples/sec   Loss 0.8063   LearningRate 0.0116   Epoch: 13   Global Step: 220190   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 22:33:56,352-Speed 3348.82 samples/sec   Loss 0.7717   LearningRate 0.0116   Epoch: 13   Global Step: 220200   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 22:33:59,555-Speed 3197.06 samples/sec   Loss 0.8156   LearningRate 0.0116   Epoch: 13   Global Step: 220210   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 22:34:02,688-Speed 3269.64 samples/sec   Loss 0.7546   LearningRate 0.0116   Epoch: 13   Global Step: 220220   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 22:34:05,750-Speed 3344.70 samples/sec   Loss 0.7832   LearningRate 0.0116   Epoch: 13   Global Step: 220230   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 22:34:08,828-Speed 3327.96 samples/sec   Loss 0.7775   LearningRate 0.0116   Epoch: 13   Global Step: 220240   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 22:34:11,888-Speed 3346.38 samples/sec   Loss 0.7648   LearningRate 0.0116   Epoch: 13   Global Step: 220250   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 22:34:14,954-Speed 3340.34 samples/sec   Loss 0.7845   LearningRate 0.0116   Epoch: 13   Global Step: 220260   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 22:34:18,022-Speed 3339.57 samples/sec   Loss 0.7798   LearningRate 0.0116   Epoch: 13   Global Step: 220270   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 22:34:21,108-Speed 3318.63 samples/sec   Loss 0.7779   LearningRate 0.0116   Epoch: 13   Global Step: 220280   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 22:34:24,273-Speed 3236.11 samples/sec   Loss 0.7655   LearningRate 0.0116   Epoch: 13   Global Step: 220290   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:34:27,367-Speed 3310.20 samples/sec   Loss 0.7701   LearningRate 0.0116   Epoch: 13   Global Step: 220300   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:34:30,472-Speed 3298.97 samples/sec   Loss 0.7583   LearningRate 0.0116   Epoch: 13   Global Step: 220310   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:34:33,618-Speed 3255.35 samples/sec   Loss 0.7603   LearningRate 0.0116   Epoch: 13   Global Step: 220320   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:34:36,762-Speed 3257.42 samples/sec   Loss 0.8287   LearningRate 0.0116   Epoch: 13   Global Step: 220330   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:34:39,853-Speed 3313.41 samples/sec   Loss 0.7819   LearningRate 0.0116   Epoch: 13   Global Step: 220340   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:34:42,913-Speed 3347.09 samples/sec   Loss 0.7761   LearningRate 0.0116   Epoch: 13   Global Step: 220350   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:34:46,002-Speed 3316.34 samples/sec   Loss 0.8118   LearningRate 0.0116   Epoch: 13   Global Step: 220360   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:34:49,152-Speed 3252.00 samples/sec   Loss 0.7771   LearningRate 0.0116   Epoch: 13   Global Step: 220370   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:34:52,251-Speed 3304.79 samples/sec   Loss 0.7938   LearningRate 0.0115   Epoch: 13   Global Step: 220380   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:34:55,325-Speed 3331.59 samples/sec   Loss 0.7851   LearningRate 0.0115   Epoch: 13   Global Step: 220390   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 22:34:58,398-Speed 3333.06 samples/sec   Loss 0.7881   LearningRate 0.0115   Epoch: 13   Global Step: 220400   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 22:35:01,477-Speed 3326.26 samples/sec   Loss 0.7751   LearningRate 0.0115   Epoch: 13   Global Step: 220410   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 22:35:04,587-Speed 3293.96 samples/sec   Loss 0.8287   LearningRate 0.0115   Epoch: 13   Global Step: 220420   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 22:35:07,658-Speed 3334.29 samples/sec   Loss 0.7731   LearningRate 0.0115   Epoch: 13   Global Step: 220430   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 22:35:10,720-Speed 3345.32 samples/sec   Loss 0.7844   LearningRate 0.0115   Epoch: 13   Global Step: 220440   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 22:35:13,819-Speed 3305.60 samples/sec   Loss 0.7862   LearningRate 0.0115   Epoch: 13   Global Step: 220450   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 22:35:16,904-Speed 3319.59 samples/sec   Loss 0.7767   LearningRate 0.0115   Epoch: 13   Global Step: 220460   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 22:35:19,983-Speed 3326.56 samples/sec   Loss 0.8081   LearningRate 0.0115   Epoch: 13   Global Step: 220470   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:35:23,067-Speed 3321.36 samples/sec   Loss 0.7809   LearningRate 0.0115   Epoch: 13   Global Step: 220480   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:35:26,132-Speed 3341.09 samples/sec   Loss 0.7420   LearningRate 0.0115   Epoch: 13   Global Step: 220490   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:35:29,252-Speed 3283.04 samples/sec   Loss 0.7897   LearningRate 0.0115   Epoch: 13   Global Step: 220500   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:35:32,353-Speed 3303.23 samples/sec   Loss 0.7898   LearningRate 0.0115   Epoch: 13   Global Step: 220510   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:35:35,436-Speed 3322.09 samples/sec   Loss 0.8025   LearningRate 0.0115   Epoch: 13   Global Step: 220520   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:35:38,511-Speed 3330.71 samples/sec   Loss 0.8072   LearningRate 0.0115   Epoch: 13   Global Step: 220530   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:35:41,604-Speed 3312.13 samples/sec   Loss 0.7816   LearningRate 0.0115   Epoch: 13   Global Step: 220540   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:35:44,738-Speed 3268.13 samples/sec   Loss 0.7917   LearningRate 0.0115   Epoch: 13   Global Step: 220550   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:35:47,837-Speed 3304.68 samples/sec   Loss 0.7746   LearningRate 0.0115   Epoch: 13   Global Step: 220560   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:35:50,911-Speed 3331.60 samples/sec   Loss 0.7724   LearningRate 0.0115   Epoch: 13   Global Step: 220570   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 22:35:53,977-Speed 3341.24 samples/sec   Loss 0.7661   LearningRate 0.0115   Epoch: 13   Global Step: 220580   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 22:35:57,038-Speed 3345.87 samples/sec   Loss 0.7848   LearningRate 0.0115   Epoch: 13   Global Step: 220590   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 22:36:00,097-Speed 3347.76 samples/sec   Loss 0.8059   LearningRate 0.0115   Epoch: 13   Global Step: 220600   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 22:36:03,226-Speed 3273.45 samples/sec   Loss 0.8007   LearningRate 0.0115   Epoch: 13   Global Step: 220610   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:36:06,305-Speed 3326.39 samples/sec   Loss 0.8061   LearningRate 0.0115   Epoch: 13   Global Step: 220620   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:36:09,387-Speed 3324.32 samples/sec   Loss 0.7569   LearningRate 0.0115   Epoch: 13   Global Step: 220630   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:36:12,481-Speed 3309.69 samples/sec   Loss 0.7784   LearningRate 0.0115   Epoch: 13   Global Step: 220640   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:36:15,575-Speed 3310.84 samples/sec   Loss 0.7628   LearningRate 0.0115   Epoch: 13   Global Step: 220650   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:36:18,705-Speed 3271.34 samples/sec   Loss 0.8006   LearningRate 0.0115   Epoch: 13   Global Step: 220660   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:36:21,843-Speed 3264.11 samples/sec   Loss 0.7860   LearningRate 0.0115   Epoch: 13   Global Step: 220670   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:36:24,906-Speed 3343.79 samples/sec   Loss 0.7744   LearningRate 0.0115   Epoch: 13   Global Step: 220680   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:36:28,044-Speed 3264.45 samples/sec   Loss 0.8087   LearningRate 0.0115   Epoch: 13   Global Step: 220690   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:36:31,147-Speed 3300.33 samples/sec   Loss 0.8163   LearningRate 0.0115   Epoch: 13   Global Step: 220700   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:36:34,214-Speed 3339.90 samples/sec   Loss 0.7891   LearningRate 0.0115   Epoch: 13   Global Step: 220710   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 22:36:37,275-Speed 3346.31 samples/sec   Loss 0.8149   LearningRate 0.0115   Epoch: 13   Global Step: 220720   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 22:36:40,333-Speed 3349.50 samples/sec   Loss 0.7725   LearningRate 0.0115   Epoch: 13   Global Step: 220730   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 22:36:43,418-Speed 3319.95 samples/sec   Loss 0.7643   LearningRate 0.0115   Epoch: 13   Global Step: 220740   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 22:36:46,575-Speed 3243.90 samples/sec   Loss 0.7992   LearningRate 0.0115   Epoch: 13   Global Step: 220750   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 22:36:49,653-Speed 3328.01 samples/sec   Loss 0.7625   LearningRate 0.0115   Epoch: 13   Global Step: 220760   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 22:36:52,720-Speed 3339.29 samples/sec   Loss 0.8165   LearningRate 0.0115   Epoch: 13   Global Step: 220770   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 22:36:55,855-Speed 3266.94 samples/sec   Loss 0.7758   LearningRate 0.0115   Epoch: 13   Global Step: 220780   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 22:36:58,953-Speed 3306.12 samples/sec   Loss 0.7552   LearningRate 0.0115   Epoch: 13   Global Step: 220790   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 22:37:02,045-Speed 3313.34 samples/sec   Loss 0.7745   LearningRate 0.0115   Epoch: 13   Global Step: 220800   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 22:37:05,132-Speed 3317.66 samples/sec   Loss 0.8006   LearningRate 0.0115   Epoch: 13   Global Step: 220810   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 22:37:08,194-Speed 3345.45 samples/sec   Loss 0.7688   LearningRate 0.0115   Epoch: 13   Global Step: 220820   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 22:37:11,305-Speed 3292.11 samples/sec   Loss 0.7745   LearningRate 0.0115   Epoch: 13   Global Step: 220830   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 22:37:14,385-Speed 3325.35 samples/sec   Loss 0.7962   LearningRate 0.0115   Epoch: 13   Global Step: 220840   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 22:37:17,499-Speed 3288.74 samples/sec   Loss 0.8374   LearningRate 0.0115   Epoch: 13   Global Step: 220850   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:37:20,567-Speed 3338.87 samples/sec   Loss 0.7679   LearningRate 0.0115   Epoch: 13   Global Step: 220860   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:37:23,635-Speed 3337.41 samples/sec   Loss 0.7896   LearningRate 0.0114   Epoch: 13   Global Step: 220870   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:37:26,789-Speed 3248.32 samples/sec   Loss 0.7806   LearningRate 0.0114   Epoch: 13   Global Step: 220880   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:37:29,889-Speed 3303.88 samples/sec   Loss 0.7688   LearningRate 0.0114   Epoch: 13   Global Step: 220890   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:37:33,026-Speed 3264.59 samples/sec   Loss 0.7591   LearningRate 0.0114   Epoch: 13   Global Step: 220900   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:37:36,119-Speed 3311.61 samples/sec   Loss 0.8115   LearningRate 0.0114   Epoch: 13   Global Step: 220910   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:37:39,190-Speed 3334.86 samples/sec   Loss 0.8012   LearningRate 0.0114   Epoch: 13   Global Step: 220920   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:37:42,255-Speed 3342.35 samples/sec   Loss 0.7824   LearningRate 0.0114   Epoch: 13   Global Step: 220930   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:37:45,325-Speed 3335.94 samples/sec   Loss 0.8102   LearningRate 0.0114   Epoch: 13   Global Step: 220940   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:37:48,411-Speed 3319.42 samples/sec   Loss 0.8169   LearningRate 0.0114   Epoch: 13   Global Step: 220950   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 22:37:51,493-Speed 3322.42 samples/sec   Loss 0.8367   LearningRate 0.0114   Epoch: 13   Global Step: 220960   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 22:37:54,560-Speed 3339.78 samples/sec   Loss 0.8040   LearningRate 0.0114   Epoch: 13   Global Step: 220970   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 22:37:57,738-Speed 3223.08 samples/sec   Loss 0.7886   LearningRate 0.0114   Epoch: 13   Global Step: 220980   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 22:38:00,873-Speed 3267.72 samples/sec   Loss 0.8195   LearningRate 0.0114   Epoch: 13   Global Step: 220990   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 22:38:03,939-Speed 3339.59 samples/sec   Loss 0.7761   LearningRate 0.0114   Epoch: 13   Global Step: 221000   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-11 22:38:07,091-Speed 3250.00 samples/sec   Loss 0.7997   LearningRate 0.0114   Epoch: 13   Global Step: 221010   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-11 22:38:10,177-Speed 3318.35 samples/sec   Loss 0.7981   LearningRate 0.0114   Epoch: 13   Global Step: 221020   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-11 22:38:13,329-Speed 3249.76 samples/sec   Loss 0.7830   LearningRate 0.0114   Epoch: 13   Global Step: 221030   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-11 22:38:16,402-Speed 3332.52 samples/sec   Loss 0.8118   LearningRate 0.0114   Epoch: 13   Global Step: 221040   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-11 22:38:19,474-Speed 3334.47 samples/sec   Loss 0.7994   LearningRate 0.0114   Epoch: 13   Global Step: 221050   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-11 22:38:22,544-Speed 3336.93 samples/sec   Loss 0.8342   LearningRate 0.0114   Epoch: 13   Global Step: 221060   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-11 22:38:25,606-Speed 3345.29 samples/sec   Loss 0.7917   LearningRate 0.0114   Epoch: 13   Global Step: 221070   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-11 22:38:28,674-Speed 3338.04 samples/sec   Loss 0.7967   LearningRate 0.0114   Epoch: 13   Global Step: 221080   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-11 22:38:31,806-Speed 3269.44 samples/sec   Loss 0.7912   LearningRate 0.0114   Epoch: 13   Global Step: 221090   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-11 22:38:34,912-Speed 3298.26 samples/sec   Loss 0.8045   LearningRate 0.0114   Epoch: 13   Global Step: 221100   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:38:37,986-Speed 3331.47 samples/sec   Loss 0.8252   LearningRate 0.0114   Epoch: 13   Global Step: 221110   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:38:41,064-Speed 3328.11 samples/sec   Loss 0.7993   LearningRate 0.0114   Epoch: 13   Global Step: 221120   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:38:44,126-Speed 3344.12 samples/sec   Loss 0.7882   LearningRate 0.0114   Epoch: 13   Global Step: 221130   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:38:47,198-Speed 3334.60 samples/sec   Loss 0.8243   LearningRate 0.0114   Epoch: 13   Global Step: 221140   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:38:50,271-Speed 3333.37 samples/sec   Loss 0.7786   LearningRate 0.0114   Epoch: 13   Global Step: 221150   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-11 22:38:53,338-Speed 3338.86 samples/sec   Loss 0.8077   LearningRate 0.0114   Epoch: 13   Global Step: 221160   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-11 22:38:56,553-Speed 3186.28 samples/sec   Loss 0.8070   LearningRate 0.0114   Epoch: 13   Global Step: 221170   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-11 22:38:59,735-Speed 3218.13 samples/sec   Loss 0.8005   LearningRate 0.0114   Epoch: 13   Global Step: 221180   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-11 22:39:02,852-Speed 3286.75 samples/sec   Loss 0.8063   LearningRate 0.0114   Epoch: 13   Global Step: 221190   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-11 22:39:05,935-Speed 3321.80 samples/sec   Loss 0.7720   LearningRate 0.0114   Epoch: 13   Global Step: 221200   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-11 22:39:09,000-Speed 3342.07 samples/sec   Loss 0.8121   LearningRate 0.0114   Epoch: 13   Global Step: 221210   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-11 22:39:12,092-Speed 3312.46 samples/sec   Loss 0.7946   LearningRate 0.0114   Epoch: 13   Global Step: 221220   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-11 22:39:15,221-Speed 3272.88 samples/sec   Loss 0.7842   LearningRate 0.0114   Epoch: 13   Global Step: 221230   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-11 22:39:18,321-Speed 3304.17 samples/sec   Loss 0.7649   LearningRate 0.0114   Epoch: 13   Global Step: 221240   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-11 22:39:21,396-Speed 3331.31 samples/sec   Loss 0.7660   LearningRate 0.0114   Epoch: 13   Global Step: 221250   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:39:24,472-Speed 3329.68 samples/sec   Loss 0.7867   LearningRate 0.0114   Epoch: 13   Global Step: 221260   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:39:27,627-Speed 3246.43 samples/sec   Loss 0.8336   LearningRate 0.0114   Epoch: 13   Global Step: 221270   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:39:30,700-Speed 3332.96 samples/sec   Loss 0.7994   LearningRate 0.0114   Epoch: 13   Global Step: 221280   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:39:33,762-Speed 3344.81 samples/sec   Loss 0.8243   LearningRate 0.0114   Epoch: 13   Global Step: 221290   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:39:36,878-Speed 3286.93 samples/sec   Loss 0.8515   LearningRate 0.0114   Epoch: 13   Global Step: 221300   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:39:39,942-Speed 3342.50 samples/sec   Loss 0.7982   LearningRate 0.0114   Epoch: 13   Global Step: 221310   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:39:43,044-Speed 3301.71 samples/sec   Loss 0.7796   LearningRate 0.0114   Epoch: 13   Global Step: 221320   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:39:46,123-Speed 3326.65 samples/sec   Loss 0.7894   LearningRate 0.0114   Epoch: 13   Global Step: 221330   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:39:49,191-Speed 3339.14 samples/sec   Loss 0.7680   LearningRate 0.0114   Epoch: 13   Global Step: 221340   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:39:52,268-Speed 3327.65 samples/sec   Loss 0.7945   LearningRate 0.0114   Epoch: 13   Global Step: 221350   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 22:39:55,398-Speed 3272.46 samples/sec   Loss 0.8194   LearningRate 0.0113   Epoch: 13   Global Step: 221360   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 22:39:58,495-Speed 3307.07 samples/sec   Loss 0.8096   LearningRate 0.0113   Epoch: 13   Global Step: 221370   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 22:40:01,578-Speed 3322.44 samples/sec   Loss 0.7844   LearningRate 0.0113   Epoch: 13   Global Step: 221380   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 22:40:04,653-Speed 3331.02 samples/sec   Loss 0.8114   LearningRate 0.0113   Epoch: 13   Global Step: 221390   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 22:40:07,782-Speed 3273.39 samples/sec   Loss 0.7797   LearningRate 0.0113   Epoch: 13   Global Step: 221400   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 22:40:10,933-Speed 3250.87 samples/sec   Loss 0.7963   LearningRate 0.0113   Epoch: 13   Global Step: 221410   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 22:40:14,015-Speed 3323.60 samples/sec   Loss 0.7981   LearningRate 0.0113   Epoch: 13   Global Step: 221420   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 22:40:17,123-Speed 3295.41 samples/sec   Loss 0.7743   LearningRate 0.0113   Epoch: 13   Global Step: 221430   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 22:40:20,206-Speed 3321.64 samples/sec   Loss 0.8165   LearningRate 0.0113   Epoch: 13   Global Step: 221440   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 22:40:23,336-Speed 3272.37 samples/sec   Loss 0.7873   LearningRate 0.0113   Epoch: 13   Global Step: 221450   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 22:40:26,405-Speed 3337.54 samples/sec   Loss 0.8310   LearningRate 0.0113   Epoch: 13   Global Step: 221460   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 22:40:29,490-Speed 3319.69 samples/sec   Loss 0.7973   LearningRate 0.0113   Epoch: 13   Global Step: 221470   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 22:40:32,572-Speed 3323.34 samples/sec   Loss 0.8283   LearningRate 0.0113   Epoch: 13   Global Step: 221480   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 22:40:35,640-Speed 3338.17 samples/sec   Loss 0.7813   LearningRate 0.0113   Epoch: 13   Global Step: 221490   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 22:40:38,724-Speed 3321.44 samples/sec   Loss 0.8144   LearningRate 0.0113   Epoch: 13   Global Step: 221500   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 22:40:41,789-Speed 3341.86 samples/sec   Loss 0.7412   LearningRate 0.0113   Epoch: 13   Global Step: 221510   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 22:40:44,881-Speed 3312.58 samples/sec   Loss 0.7879   LearningRate 0.0113   Epoch: 13   Global Step: 221520   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 22:40:47,948-Speed 3339.37 samples/sec   Loss 0.7860   LearningRate 0.0113   Epoch: 13   Global Step: 221530   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 22:40:51,014-Speed 3340.15 samples/sec   Loss 0.8089   LearningRate 0.0113   Epoch: 13   Global Step: 221540   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 22:40:54,220-Speed 3195.34 samples/sec   Loss 0.8072   LearningRate 0.0113   Epoch: 13   Global Step: 221550   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 22:40:57,324-Speed 3299.67 samples/sec   Loss 0.7688   LearningRate 0.0113   Epoch: 13   Global Step: 221560   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 22:41:00,478-Speed 3247.05 samples/sec   Loss 0.7896   LearningRate 0.0113   Epoch: 13   Global Step: 221570   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:41:03,544-Speed 3340.39 samples/sec   Loss 0.8049   LearningRate 0.0113   Epoch: 13   Global Step: 221580   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:41:06,609-Speed 3342.20 samples/sec   Loss 0.7707   LearningRate 0.0113   Epoch: 13   Global Step: 221590   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:41:09,733-Speed 3279.20 samples/sec   Loss 0.8499   LearningRate 0.0113   Epoch: 13   Global Step: 221600   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:41:12,921-Speed 3211.96 samples/sec   Loss 0.8085   LearningRate 0.0113   Epoch: 13   Global Step: 221610   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:41:15,992-Speed 3335.27 samples/sec   Loss 0.8260   LearningRate 0.0113   Epoch: 13   Global Step: 221620   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:41:19,064-Speed 3334.86 samples/sec   Loss 0.8337   LearningRate 0.0113   Epoch: 13   Global Step: 221630   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:41:22,261-Speed 3202.82 samples/sec   Loss 0.7772   LearningRate 0.0113   Epoch: 13   Global Step: 221640   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:41:25,467-Speed 3195.32 samples/sec   Loss 0.8110   LearningRate 0.0113   Epoch: 13   Global Step: 221650   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:41:28,644-Speed 3223.10 samples/sec   Loss 0.8444   LearningRate 0.0113   Epoch: 13   Global Step: 221660   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:41:31,719-Speed 3331.85 samples/sec   Loss 0.7979   LearningRate 0.0113   Epoch: 13   Global Step: 221670   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:41:34,797-Speed 3327.00 samples/sec   Loss 0.8060   LearningRate 0.0113   Epoch: 13   Global Step: 221680   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:41:37,927-Speed 3272.98 samples/sec   Loss 0.7891   LearningRate 0.0113   Epoch: 13   Global Step: 221690   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:41:40,993-Speed 3340.61 samples/sec   Loss 0.7908   LearningRate 0.0113   Epoch: 13   Global Step: 221700   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:41:44,065-Speed 3333.73 samples/sec   Loss 0.7679   LearningRate 0.0113   Epoch: 13   Global Step: 221710   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:41:47,146-Speed 3324.12 samples/sec   Loss 0.8385   LearningRate 0.0113   Epoch: 13   Global Step: 221720   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:41:50,290-Speed 3257.54 samples/sec   Loss 0.8093   LearningRate 0.0113   Epoch: 13   Global Step: 221730   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:41:53,429-Speed 3263.45 samples/sec   Loss 0.8112   LearningRate 0.0113   Epoch: 13   Global Step: 221740   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:41:56,501-Speed 3333.73 samples/sec   Loss 0.7622   LearningRate 0.0113   Epoch: 13   Global Step: 221750   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:41:59,581-Speed 3326.17 samples/sec   Loss 0.7736   LearningRate 0.0113   Epoch: 13   Global Step: 221760   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:42:02,642-Speed 3346.04 samples/sec   Loss 0.8229   LearningRate 0.0113   Epoch: 13   Global Step: 221770   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:42:05,709-Speed 3339.68 samples/sec   Loss 0.8053   LearningRate 0.0113   Epoch: 13   Global Step: 221780   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:42:08,773-Speed 3342.90 samples/sec   Loss 0.7662   LearningRate 0.0113   Epoch: 13   Global Step: 221790   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:42:11,837-Speed 3342.20 samples/sec   Loss 0.8074   LearningRate 0.0113   Epoch: 13   Global Step: 221800   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:42:14,918-Speed 3323.84 samples/sec   Loss 0.7827   LearningRate 0.0113   Epoch: 13   Global Step: 221810   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:42:18,034-Speed 3287.32 samples/sec   Loss 0.8063   LearningRate 0.0113   Epoch: 13   Global Step: 221820   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:42:21,237-Speed 3197.80 samples/sec   Loss 0.8321   LearningRate 0.0113   Epoch: 13   Global Step: 221830   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:42:24,436-Speed 3201.63 samples/sec   Loss 0.8030   LearningRate 0.0113   Epoch: 13   Global Step: 221840   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:42:27,567-Speed 3271.91 samples/sec   Loss 0.8245   LearningRate 0.0113   Epoch: 13   Global Step: 221850   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:42:30,674-Speed 3296.51 samples/sec   Loss 0.8291   LearningRate 0.0112   Epoch: 13   Global Step: 221860   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:42:33,740-Speed 3340.32 samples/sec   Loss 0.8272   LearningRate 0.0112   Epoch: 13   Global Step: 221870   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:42:36,805-Speed 3341.83 samples/sec   Loss 0.7971   LearningRate 0.0112   Epoch: 13   Global Step: 221880   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:42:39,935-Speed 3271.78 samples/sec   Loss 0.8283   LearningRate 0.0112   Epoch: 13   Global Step: 221890   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:42:43,013-Speed 3327.16 samples/sec   Loss 0.8051   LearningRate 0.0112   Epoch: 13   Global Step: 221900   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:42:46,085-Speed 3334.91 samples/sec   Loss 0.8744   LearningRate 0.0112   Epoch: 13   Global Step: 221910   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:42:49,149-Speed 3342.68 samples/sec   Loss 0.8198   LearningRate 0.0112   Epoch: 13   Global Step: 221920   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:42:52,239-Speed 3314.29 samples/sec   Loss 0.8660   LearningRate 0.0112   Epoch: 13   Global Step: 221930   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:42:55,311-Speed 3334.64 samples/sec   Loss 0.7843   LearningRate 0.0112   Epoch: 13   Global Step: 221940   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:42:58,389-Speed 3327.78 samples/sec   Loss 0.8502   LearningRate 0.0112   Epoch: 13   Global Step: 221950   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:43:01,467-Speed 3326.54 samples/sec   Loss 0.8172   LearningRate 0.0112   Epoch: 13   Global Step: 221960   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:43:04,562-Speed 3309.84 samples/sec   Loss 0.8283   LearningRate 0.0112   Epoch: 13   Global Step: 221970   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 22:43:07,644-Speed 3323.21 samples/sec   Loss 0.8243   LearningRate 0.0112   Epoch: 13   Global Step: 221980   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 22:43:10,743-Speed 3305.02 samples/sec   Loss 0.8482   LearningRate 0.0112   Epoch: 13   Global Step: 221990   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:43:13,839-Speed 3307.73 samples/sec   Loss 0.8179   LearningRate 0.0112   Epoch: 13   Global Step: 222000   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:43:57,552-[lfw][222000]XNorm: 21.454036
Training: 2022-04-11 22:43:57,553-[lfw][222000]Accuracy-Flip: 0.99800+-0.00267
Training: 2022-04-11 22:43:57,553-[lfw][222000]Accuracy-Highest: 0.99817
Training: 2022-04-11 22:44:48,075-[cfp_fp][222000]XNorm: 22.308540
Training: 2022-04-11 22:44:48,075-[cfp_fp][222000]Accuracy-Flip: 0.99129+-0.00364
Training: 2022-04-11 22:44:48,076-[cfp_fp][222000]Accuracy-Highest: 0.99129
Training: 2022-04-11 22:45:31,675-[agedb_30][222000]XNorm: 22.937775
Training: 2022-04-11 22:45:31,676-[agedb_30][222000]Accuracy-Flip: 0.98367+-0.00745
Training: 2022-04-11 22:45:31,676-[agedb_30][222000]Accuracy-Highest: 0.98567
Training: 2022-04-11 22:45:34,732-Speed 72.68 samples/sec   Loss 0.8371   LearningRate 0.0112   Epoch: 13   Global Step: 222010   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-11 22:45:37,815-Speed 3322.47 samples/sec   Loss 0.7893   LearningRate 0.0112   Epoch: 13   Global Step: 222020   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-11 22:45:40,881-Speed 3340.80 samples/sec   Loss 0.7812   LearningRate 0.0112   Epoch: 13   Global Step: 222030   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-11 22:45:43,938-Speed 3350.75 samples/sec   Loss 0.7728   LearningRate 0.0112   Epoch: 13   Global Step: 222040   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-11 22:45:47,015-Speed 3328.41 samples/sec   Loss 0.8329   LearningRate 0.0112   Epoch: 13   Global Step: 222050   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-11 22:45:50,182-Speed 3234.26 samples/sec   Loss 0.8237   LearningRate 0.0112   Epoch: 13   Global Step: 222060   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-11 22:45:53,312-Speed 3272.02 samples/sec   Loss 0.8257   LearningRate 0.0112   Epoch: 13   Global Step: 222070   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-11 22:45:56,429-Speed 3286.25 samples/sec   Loss 0.7938   LearningRate 0.0112   Epoch: 13   Global Step: 222080   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-11 22:45:59,585-Speed 3244.46 samples/sec   Loss 0.7986   LearningRate 0.0112   Epoch: 13   Global Step: 222090   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-11 22:46:02,766-Speed 3219.69 samples/sec   Loss 0.8008   LearningRate 0.0112   Epoch: 13   Global Step: 222100   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-11 22:46:06,007-Speed 3160.66 samples/sec   Loss 0.8194   LearningRate 0.0112   Epoch: 13   Global Step: 222110   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:46:09,068-Speed 3345.78 samples/sec   Loss 0.8320   LearningRate 0.0112   Epoch: 13   Global Step: 222120   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:46:12,185-Speed 3286.73 samples/sec   Loss 0.8450   LearningRate 0.0112   Epoch: 13   Global Step: 222130   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:46:15,320-Speed 3266.41 samples/sec   Loss 0.8305   LearningRate 0.0112   Epoch: 13   Global Step: 222140   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:46:18,441-Speed 3282.24 samples/sec   Loss 0.8094   LearningRate 0.0112   Epoch: 13   Global Step: 222150   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:46:21,528-Speed 3317.98 samples/sec   Loss 0.8056   LearningRate 0.0112   Epoch: 13   Global Step: 222160   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:46:24,649-Speed 3281.46 samples/sec   Loss 0.8286   LearningRate 0.0112   Epoch: 13   Global Step: 222170   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:46:27,832-Speed 3217.39 samples/sec   Loss 0.7989   LearningRate 0.0112   Epoch: 13   Global Step: 222180   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:46:31,050-Speed 3183.14 samples/sec   Loss 0.8179   LearningRate 0.0112   Epoch: 13   Global Step: 222190   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:46:34,174-Speed 3278.71 samples/sec   Loss 0.7898   LearningRate 0.0112   Epoch: 13   Global Step: 222200   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:46:37,233-Speed 3347.70 samples/sec   Loss 0.7949   LearningRate 0.0112   Epoch: 13   Global Step: 222210   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:46:40,345-Speed 3291.58 samples/sec   Loss 0.8149   LearningRate 0.0112   Epoch: 13   Global Step: 222220   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:46:43,493-Speed 3253.39 samples/sec   Loss 0.7828   LearningRate 0.0112   Epoch: 13   Global Step: 222230   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:46:46,591-Speed 3306.25 samples/sec   Loss 0.7932   LearningRate 0.0112   Epoch: 13   Global Step: 222240   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:46:49,759-Speed 3233.59 samples/sec   Loss 0.8097   LearningRate 0.0112   Epoch: 13   Global Step: 222250   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:46:52,855-Speed 3308.07 samples/sec   Loss 0.8232   LearningRate 0.0112   Epoch: 13   Global Step: 222260   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-11 22:46:56,032-Speed 3223.60 samples/sec   Loss 0.7864   LearningRate 0.0112   Epoch: 13   Global Step: 222270   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-11 22:46:59,202-Speed 3231.37 samples/sec   Loss 0.7755   LearningRate 0.0112   Epoch: 13   Global Step: 222280   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-11 22:47:02,330-Speed 3274.40 samples/sec   Loss 0.8005   LearningRate 0.0112   Epoch: 13   Global Step: 222290   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-11 22:47:05,407-Speed 3328.48 samples/sec   Loss 0.8170   LearningRate 0.0112   Epoch: 13   Global Step: 222300   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-11 22:47:08,500-Speed 3312.01 samples/sec   Loss 0.7921   LearningRate 0.0112   Epoch: 13   Global Step: 222310   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-11 22:47:11,678-Speed 3222.56 samples/sec   Loss 0.8014   LearningRate 0.0112   Epoch: 13   Global Step: 222320   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-11 22:47:14,798-Speed 3283.01 samples/sec   Loss 0.7940   LearningRate 0.0112   Epoch: 13   Global Step: 222330   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-11 22:47:17,899-Speed 3302.18 samples/sec   Loss 0.8075   LearningRate 0.0112   Epoch: 13   Global Step: 222340   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-11 22:47:20,970-Speed 3335.42 samples/sec   Loss 0.7924   LearningRate 0.0112   Epoch: 13   Global Step: 222350   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-11 22:47:24,058-Speed 3316.50 samples/sec   Loss 0.8180   LearningRate 0.0111   Epoch: 13   Global Step: 222360   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:47:27,148-Speed 3314.78 samples/sec   Loss 0.7883   LearningRate 0.0111   Epoch: 13   Global Step: 222370   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:47:30,235-Speed 3318.26 samples/sec   Loss 0.8095   LearningRate 0.0111   Epoch: 13   Global Step: 222380   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:47:33,330-Speed 3309.02 samples/sec   Loss 0.8179   LearningRate 0.0111   Epoch: 13   Global Step: 222390   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:47:36,400-Speed 3336.82 samples/sec   Loss 0.8378   LearningRate 0.0111   Epoch: 13   Global Step: 222400   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:47:39,500-Speed 3303.66 samples/sec   Loss 0.8137   LearningRate 0.0111   Epoch: 13   Global Step: 222410   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:47:42,623-Speed 3279.86 samples/sec   Loss 0.8076   LearningRate 0.0111   Epoch: 13   Global Step: 222420   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:47:45,698-Speed 3331.66 samples/sec   Loss 0.8184   LearningRate 0.0111   Epoch: 13   Global Step: 222430   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:47:48,807-Speed 3294.04 samples/sec   Loss 0.7930   LearningRate 0.0111   Epoch: 13   Global Step: 222440   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:47:51,925-Speed 3284.11 samples/sec   Loss 0.8432   LearningRate 0.0111   Epoch: 13   Global Step: 222450   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:47:54,993-Speed 3338.24 samples/sec   Loss 0.8301   LearningRate 0.0111   Epoch: 13   Global Step: 222460   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 22:47:58,065-Speed 3335.12 samples/sec   Loss 0.9093   LearningRate 0.0111   Epoch: 13   Global Step: 222470   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 22:48:01,171-Speed 3297.17 samples/sec   Loss 0.8141   LearningRate 0.0111   Epoch: 13   Global Step: 222480   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 22:48:04,246-Speed 3331.68 samples/sec   Loss 0.8548   LearningRate 0.0111   Epoch: 13   Global Step: 222490   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 22:48:07,346-Speed 3303.09 samples/sec   Loss 0.8142   LearningRate 0.0111   Epoch: 13   Global Step: 222500   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 22:48:10,424-Speed 3327.94 samples/sec   Loss 0.8128   LearningRate 0.0111   Epoch: 13   Global Step: 222510   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:48:13,525-Speed 3303.06 samples/sec   Loss 0.8471   LearningRate 0.0111   Epoch: 13   Global Step: 222520   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:48:16,684-Speed 3241.96 samples/sec   Loss 0.7858   LearningRate 0.0111   Epoch: 13   Global Step: 222530   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:48:19,751-Speed 3339.39 samples/sec   Loss 0.8155   LearningRate 0.0111   Epoch: 13   Global Step: 222540   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:48:22,826-Speed 3330.69 samples/sec   Loss 0.7905   LearningRate 0.0111   Epoch: 13   Global Step: 222550   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:48:25,901-Speed 3331.42 samples/sec   Loss 0.8399   LearningRate 0.0111   Epoch: 13   Global Step: 222560   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:48:28,990-Speed 3315.40 samples/sec   Loss 0.8289   LearningRate 0.0111   Epoch: 13   Global Step: 222570   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:48:32,135-Speed 3257.09 samples/sec   Loss 0.8333   LearningRate 0.0111   Epoch: 13   Global Step: 222580   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:48:35,204-Speed 3337.76 samples/sec   Loss 0.8325   LearningRate 0.0111   Epoch: 13   Global Step: 222590   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:48:38,320-Speed 3286.33 samples/sec   Loss 0.8424   LearningRate 0.0111   Epoch: 13   Global Step: 222600   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:48:41,406-Speed 3318.89 samples/sec   Loss 0.8212   LearningRate 0.0111   Epoch: 13   Global Step: 222610   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 22:48:44,478-Speed 3334.05 samples/sec   Loss 0.8004   LearningRate 0.0111   Epoch: 13   Global Step: 222620   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 22:48:47,621-Speed 3259.24 samples/sec   Loss 0.8032   LearningRate 0.0111   Epoch: 13   Global Step: 222630   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 22:48:50,759-Speed 3263.48 samples/sec   Loss 0.8108   LearningRate 0.0111   Epoch: 13   Global Step: 222640   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 22:48:53,957-Speed 3203.05 samples/sec   Loss 0.8421   LearningRate 0.0111   Epoch: 13   Global Step: 222650   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 22:48:57,022-Speed 3342.02 samples/sec   Loss 0.8553   LearningRate 0.0111   Epoch: 13   Global Step: 222660   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 22:49:00,191-Speed 3232.31 samples/sec   Loss 0.8472   LearningRate 0.0111   Epoch: 13   Global Step: 222670   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 22:49:03,316-Speed 3276.97 samples/sec   Loss 0.8244   LearningRate 0.0111   Epoch: 13   Global Step: 222680   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 22:49:06,406-Speed 3314.75 samples/sec   Loss 0.8341   LearningRate 0.0111   Epoch: 13   Global Step: 222690   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:49:09,493-Speed 3318.42 samples/sec   Loss 0.8394   LearningRate 0.0111   Epoch: 13   Global Step: 222700   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:49:12,563-Speed 3335.57 samples/sec   Loss 0.8503   LearningRate 0.0111   Epoch: 13   Global Step: 222710   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:49:15,666-Speed 3302.95 samples/sec   Loss 0.8438   LearningRate 0.0111   Epoch: 13   Global Step: 222720   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:49:18,808-Speed 3258.77 samples/sec   Loss 0.8481   LearningRate 0.0111   Epoch: 13   Global Step: 222730   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:49:21,963-Speed 3246.17 samples/sec   Loss 0.8275   LearningRate 0.0111   Epoch: 13   Global Step: 222740   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:49:25,069-Speed 3298.11 samples/sec   Loss 0.7834   LearningRate 0.0111   Epoch: 13   Global Step: 222750   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:49:28,177-Speed 3295.79 samples/sec   Loss 0.8199   LearningRate 0.0111   Epoch: 13   Global Step: 222760   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:49:31,263-Speed 3318.58 samples/sec   Loss 0.8132   LearningRate 0.0111   Epoch: 13   Global Step: 222770   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:49:34,385-Speed 3281.11 samples/sec   Loss 0.7876   LearningRate 0.0111   Epoch: 13   Global Step: 222780   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:49:37,469-Speed 3321.02 samples/sec   Loss 0.8269   LearningRate 0.0111   Epoch: 13   Global Step: 222790   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 22:49:40,542-Speed 3332.53 samples/sec   Loss 0.8013   LearningRate 0.0111   Epoch: 13   Global Step: 222800   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 22:49:43,608-Speed 3341.30 samples/sec   Loss 0.8311   LearningRate 0.0111   Epoch: 13   Global Step: 222810   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:49:46,744-Speed 3266.26 samples/sec   Loss 0.7993   LearningRate 0.0111   Epoch: 13   Global Step: 222820   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:49:49,862-Speed 3284.97 samples/sec   Loss 0.7933   LearningRate 0.0111   Epoch: 13   Global Step: 222830   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:49:52,929-Speed 3339.42 samples/sec   Loss 0.8098   LearningRate 0.0111   Epoch: 13   Global Step: 222840   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:49:56,057-Speed 3274.36 samples/sec   Loss 0.7703   LearningRate 0.0111   Epoch: 13   Global Step: 222850   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:49:59,256-Speed 3201.45 samples/sec   Loss 0.8165   LearningRate 0.0110   Epoch: 13   Global Step: 222860   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:50:02,364-Speed 3295.15 samples/sec   Loss 0.8325   LearningRate 0.0110   Epoch: 13   Global Step: 222870   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:50:05,440-Speed 3330.29 samples/sec   Loss 0.8518   LearningRate 0.0110   Epoch: 13   Global Step: 222880   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:50:08,551-Speed 3292.34 samples/sec   Loss 0.8218   LearningRate 0.0110   Epoch: 13   Global Step: 222890   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:50:11,749-Speed 3201.98 samples/sec   Loss 0.8187   LearningRate 0.0110   Epoch: 13   Global Step: 222900   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:50:14,845-Speed 3308.27 samples/sec   Loss 0.8351   LearningRate 0.0110   Epoch: 13   Global Step: 222910   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 22:50:17,909-Speed 3343.78 samples/sec   Loss 0.8404   LearningRate 0.0110   Epoch: 13   Global Step: 222920   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 22:50:20,988-Speed 3326.34 samples/sec   Loss 0.8006   LearningRate 0.0110   Epoch: 13   Global Step: 222930   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 22:50:24,141-Speed 3248.63 samples/sec   Loss 0.7918   LearningRate 0.0110   Epoch: 13   Global Step: 222940   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:50:27,207-Speed 3340.11 samples/sec   Loss 0.8479   LearningRate 0.0110   Epoch: 13   Global Step: 222950   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:50:30,292-Speed 3320.08 samples/sec   Loss 0.8078   LearningRate 0.0110   Epoch: 13   Global Step: 222960   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:50:33,362-Speed 3336.71 samples/sec   Loss 0.8605   LearningRate 0.0110   Epoch: 13   Global Step: 222970   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:50:36,457-Speed 3309.07 samples/sec   Loss 0.8435   LearningRate 0.0110   Epoch: 13   Global Step: 222980   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:50:39,597-Speed 3261.93 samples/sec   Loss 0.8268   LearningRate 0.0110   Epoch: 13   Global Step: 222990   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:50:42,673-Speed 3329.21 samples/sec   Loss 0.8615   LearningRate 0.0110   Epoch: 13   Global Step: 223000   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:50:45,746-Speed 3333.18 samples/sec   Loss 0.8409   LearningRate 0.0110   Epoch: 13   Global Step: 223010   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:50:48,865-Speed 3284.00 samples/sec   Loss 0.8574   LearningRate 0.0110   Epoch: 13   Global Step: 223020   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:50:52,028-Speed 3238.78 samples/sec   Loss 0.8454   LearningRate 0.0110   Epoch: 13   Global Step: 223030   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:50:55,098-Speed 3335.32 samples/sec   Loss 0.8596   LearningRate 0.0110   Epoch: 13   Global Step: 223040   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 22:50:58,189-Speed 3314.42 samples/sec   Loss 0.8211   LearningRate 0.0110   Epoch: 13   Global Step: 223050   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 22:51:01,346-Speed 3243.97 samples/sec   Loss 0.8684   LearningRate 0.0110   Epoch: 13   Global Step: 223060   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 22:51:04,439-Speed 3310.87 samples/sec   Loss 0.8508   LearningRate 0.0110   Epoch: 13   Global Step: 223070   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 22:51:07,544-Speed 3298.64 samples/sec   Loss 0.8278   LearningRate 0.0110   Epoch: 13   Global Step: 223080   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 22:51:10,689-Speed 3257.17 samples/sec   Loss 0.8778   LearningRate 0.0110   Epoch: 13   Global Step: 223090   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 22:51:13,808-Speed 3284.19 samples/sec   Loss 0.8020   LearningRate 0.0110   Epoch: 13   Global Step: 223100   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 22:51:16,902-Speed 3310.26 samples/sec   Loss 0.8136   LearningRate 0.0110   Epoch: 13   Global Step: 223110   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 22:51:19,969-Speed 3340.06 samples/sec   Loss 0.8086   LearningRate 0.0110   Epoch: 13   Global Step: 223120   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 22:51:23,040-Speed 3334.31 samples/sec   Loss 0.8451   LearningRate 0.0110   Epoch: 13   Global Step: 223130   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 22:51:26,120-Speed 3325.20 samples/sec   Loss 0.8242   LearningRate 0.0110   Epoch: 13   Global Step: 223140   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 22:51:29,181-Speed 3346.41 samples/sec   Loss 0.8310   LearningRate 0.0110   Epoch: 13   Global Step: 223150   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:51:32,236-Speed 3352.43 samples/sec   Loss 0.8377   LearningRate 0.0110   Epoch: 13   Global Step: 223160   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-11 22:51:35,327-Speed 3313.99 samples/sec   Loss 0.8257   LearningRate 0.0110   Epoch: 13   Global Step: 223170   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-11 22:51:38,409-Speed 3323.73 samples/sec   Loss 0.8161   LearningRate 0.0110   Epoch: 13   Global Step: 223180   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-11 22:51:41,480-Speed 3335.00 samples/sec   Loss 0.8472   LearningRate 0.0110   Epoch: 13   Global Step: 223190   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-11 22:51:44,614-Speed 3268.22 samples/sec   Loss 0.8487   LearningRate 0.0110   Epoch: 13   Global Step: 223200   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-11 22:51:47,783-Speed 3231.28 samples/sec   Loss 0.8184   LearningRate 0.0110   Epoch: 13   Global Step: 223210   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-11 22:51:50,922-Speed 3263.02 samples/sec   Loss 0.8137   LearningRate 0.0110   Epoch: 13   Global Step: 223220   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-11 22:51:54,003-Speed 3323.92 samples/sec   Loss 0.8181   LearningRate 0.0110   Epoch: 13   Global Step: 223230   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-11 22:51:57,073-Speed 3336.28 samples/sec   Loss 0.8340   LearningRate 0.0110   Epoch: 13   Global Step: 223240   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-11 22:52:00,148-Speed 3331.40 samples/sec   Loss 0.7909   LearningRate 0.0110   Epoch: 13   Global Step: 223250   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-11 22:52:03,232-Speed 3321.27 samples/sec   Loss 0.8326   LearningRate 0.0110   Epoch: 13   Global Step: 223260   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:52:06,303-Speed 3335.37 samples/sec   Loss 0.8452   LearningRate 0.0110   Epoch: 13   Global Step: 223270   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:52:09,369-Speed 3339.83 samples/sec   Loss 0.7922   LearningRate 0.0110   Epoch: 13   Global Step: 223280   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:52:12,460-Speed 3313.72 samples/sec   Loss 0.8368   LearningRate 0.0110   Epoch: 13   Global Step: 223290   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:52:15,537-Speed 3329.35 samples/sec   Loss 0.8594   LearningRate 0.0110   Epoch: 13   Global Step: 223300   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:52:18,619-Speed 3322.68 samples/sec   Loss 0.8139   LearningRate 0.0110   Epoch: 13   Global Step: 223310   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:52:21,788-Speed 3232.33 samples/sec   Loss 0.8679   LearningRate 0.0110   Epoch: 13   Global Step: 223320   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:52:24,855-Speed 3339.36 samples/sec   Loss 0.8149   LearningRate 0.0110   Epoch: 13   Global Step: 223330   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:52:27,935-Speed 3325.37 samples/sec   Loss 0.8230   LearningRate 0.0110   Epoch: 13   Global Step: 223340   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:52:31,054-Speed 3283.89 samples/sec   Loss 0.8279   LearningRate 0.0110   Epoch: 13   Global Step: 223350   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:52:34,129-Speed 3331.76 samples/sec   Loss 0.8319   LearningRate 0.0109   Epoch: 13   Global Step: 223360   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 22:52:37,223-Speed 3310.46 samples/sec   Loss 0.8241   LearningRate 0.0109   Epoch: 13   Global Step: 223370   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:52:40,343-Speed 3282.14 samples/sec   Loss 0.8203   LearningRate 0.0109   Epoch: 13   Global Step: 223380   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:52:43,509-Speed 3234.69 samples/sec   Loss 0.8393   LearningRate 0.0109   Epoch: 13   Global Step: 223390   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:52:46,603-Speed 3311.15 samples/sec   Loss 0.8303   LearningRate 0.0109   Epoch: 13   Global Step: 223400   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:52:49,725-Speed 3280.43 samples/sec   Loss 0.7911   LearningRate 0.0109   Epoch: 13   Global Step: 223410   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:52:52,809-Speed 3320.73 samples/sec   Loss 0.8368   LearningRate 0.0109   Epoch: 13   Global Step: 223420   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:52:55,892-Speed 3322.99 samples/sec   Loss 0.7899   LearningRate 0.0109   Epoch: 13   Global Step: 223430   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:52:59,017-Speed 3276.53 samples/sec   Loss 0.8105   LearningRate 0.0109   Epoch: 13   Global Step: 223440   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:53:02,121-Speed 3300.03 samples/sec   Loss 0.8115   LearningRate 0.0109   Epoch: 13   Global Step: 223450   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:53:05,201-Speed 3326.65 samples/sec   Loss 0.8042   LearningRate 0.0109   Epoch: 13   Global Step: 223460   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:53:08,293-Speed 3312.23 samples/sec   Loss 0.8638   LearningRate 0.0109   Epoch: 13   Global Step: 223470   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 22:53:11,416-Speed 3279.20 samples/sec   Loss 0.8730   LearningRate 0.0109   Epoch: 13   Global Step: 223480   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 22:53:14,555-Speed 3263.22 samples/sec   Loss 0.8423   LearningRate 0.0109   Epoch: 13   Global Step: 223490   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 22:53:17,695-Speed 3261.30 samples/sec   Loss 0.8224   LearningRate 0.0109   Epoch: 13   Global Step: 223500   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 22:53:20,765-Speed 3336.76 samples/sec   Loss 0.8083   LearningRate 0.0109   Epoch: 13   Global Step: 223510   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 22:53:23,839-Speed 3331.43 samples/sec   Loss 0.8415   LearningRate 0.0109   Epoch: 13   Global Step: 223520   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 22:53:26,928-Speed 3316.87 samples/sec   Loss 0.8469   LearningRate 0.0109   Epoch: 13   Global Step: 223530   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 22:53:30,000-Speed 3333.70 samples/sec   Loss 0.7894   LearningRate 0.0109   Epoch: 13   Global Step: 223540   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 22:53:33,081-Speed 3324.73 samples/sec   Loss 0.8003   LearningRate 0.0109   Epoch: 13   Global Step: 223550   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 22:53:36,129-Speed 3359.93 samples/sec   Loss 0.8228   LearningRate 0.0109   Epoch: 13   Global Step: 223560   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:53:39,196-Speed 3339.17 samples/sec   Loss 0.8248   LearningRate 0.0109   Epoch: 13   Global Step: 223570   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:53:42,275-Speed 3326.23 samples/sec   Loss 0.8176   LearningRate 0.0109   Epoch: 13   Global Step: 223580   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:53:45,435-Speed 3241.96 samples/sec   Loss 0.7901   LearningRate 0.0109   Epoch: 13   Global Step: 223590   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:53:48,613-Speed 3222.95 samples/sec   Loss 0.7970   LearningRate 0.0109   Epoch: 13   Global Step: 223600   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:53:51,728-Speed 3288.00 samples/sec   Loss 0.8290   LearningRate 0.0109   Epoch: 13   Global Step: 223610   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:53:54,874-Speed 3255.35 samples/sec   Loss 0.8262   LearningRate 0.0109   Epoch: 13   Global Step: 223620   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:53:58,023-Speed 3252.68 samples/sec   Loss 0.8182   LearningRate 0.0109   Epoch: 13   Global Step: 223630   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:54:01,111-Speed 3316.76 samples/sec   Loss 0.8194   LearningRate 0.0109   Epoch: 13   Global Step: 223640   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:54:04,192-Speed 3324.86 samples/sec   Loss 0.8387   LearningRate 0.0109   Epoch: 13   Global Step: 223650   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:54:07,273-Speed 3324.58 samples/sec   Loss 0.8274   LearningRate 0.0109   Epoch: 13   Global Step: 223660   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 22:54:10,344-Speed 3334.43 samples/sec   Loss 0.8224   LearningRate 0.0109   Epoch: 13   Global Step: 223670   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:54:13,418-Speed 3332.33 samples/sec   Loss 0.8260   LearningRate 0.0109   Epoch: 13   Global Step: 223680   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:54:16,572-Speed 3247.46 samples/sec   Loss 0.8386   LearningRate 0.0109   Epoch: 13   Global Step: 223690   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:54:19,684-Speed 3290.52 samples/sec   Loss 0.8522   LearningRate 0.0109   Epoch: 13   Global Step: 223700   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:54:22,959-Speed 3127.81 samples/sec   Loss 0.8390   LearningRate 0.0109   Epoch: 13   Global Step: 223710   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:54:26,107-Speed 3253.88 samples/sec   Loss 0.8462   LearningRate 0.0109   Epoch: 13   Global Step: 223720   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:54:29,196-Speed 3316.30 samples/sec   Loss 0.8329   LearningRate 0.0109   Epoch: 13   Global Step: 223730   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:54:32,335-Speed 3262.39 samples/sec   Loss 0.8659   LearningRate 0.0109   Epoch: 13   Global Step: 223740   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-11 22:54:35,413-Speed 3327.41 samples/sec   Loss 0.8699   LearningRate 0.0109   Epoch: 13   Global Step: 223750   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-11 22:54:38,524-Speed 3292.93 samples/sec   Loss 0.8328   LearningRate 0.0109   Epoch: 13   Global Step: 223760   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-11 22:54:41,601-Speed 3328.45 samples/sec   Loss 0.8313   LearningRate 0.0109   Epoch: 13   Global Step: 223770   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-11 22:54:44,694-Speed 3310.46 samples/sec   Loss 0.8201   LearningRate 0.0109   Epoch: 13   Global Step: 223780   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-11 22:54:47,796-Speed 3302.34 samples/sec   Loss 0.8720   LearningRate 0.0109   Epoch: 13   Global Step: 223790   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-11 22:54:50,870-Speed 3332.03 samples/sec   Loss 0.8354   LearningRate 0.0109   Epoch: 13   Global Step: 223800   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-11 22:54:53,956-Speed 3319.56 samples/sec   Loss 0.8482   LearningRate 0.0109   Epoch: 13   Global Step: 223810   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-11 22:54:57,042-Speed 3318.96 samples/sec   Loss 0.8015   LearningRate 0.0109   Epoch: 13   Global Step: 223820   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-11 22:55:00,121-Speed 3326.25 samples/sec   Loss 0.8672   LearningRate 0.0109   Epoch: 13   Global Step: 223830   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-11 22:55:03,190-Speed 3337.07 samples/sec   Loss 0.8385   LearningRate 0.0109   Epoch: 13   Global Step: 223840   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:55:06,274-Speed 3321.55 samples/sec   Loss 0.8434   LearningRate 0.0109   Epoch: 13   Global Step: 223850   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:55:09,365-Speed 3313.47 samples/sec   Loss 0.8092   LearningRate 0.0109   Epoch: 13   Global Step: 223860   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:55:12,438-Speed 3332.68 samples/sec   Loss 0.8165   LearningRate 0.0108   Epoch: 13   Global Step: 223870   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:55:15,512-Speed 3331.82 samples/sec   Loss 0.8563   LearningRate 0.0108   Epoch: 13   Global Step: 223880   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:55:18,624-Speed 3291.85 samples/sec   Loss 0.8489   LearningRate 0.0108   Epoch: 13   Global Step: 223890   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:55:21,730-Speed 3297.33 samples/sec   Loss 0.8679   LearningRate 0.0108   Epoch: 13   Global Step: 223900   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:55:24,799-Speed 3337.17 samples/sec   Loss 0.8769   LearningRate 0.0108   Epoch: 13   Global Step: 223910   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:55:27,883-Speed 3321.05 samples/sec   Loss 0.8080   LearningRate 0.0108   Epoch: 13   Global Step: 223920   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:55:30,955-Speed 3334.20 samples/sec   Loss 0.8307   LearningRate 0.0108   Epoch: 13   Global Step: 223930   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:55:34,063-Speed 3295.42 samples/sec   Loss 0.8861   LearningRate 0.0108   Epoch: 13   Global Step: 223940   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 22:55:37,170-Speed 3296.09 samples/sec   Loss 0.8563   LearningRate 0.0108   Epoch: 13   Global Step: 223950   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 22:55:40,243-Speed 3333.36 samples/sec   Loss 0.8099   LearningRate 0.0108   Epoch: 13   Global Step: 223960   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:55:43,324-Speed 3324.02 samples/sec   Loss 0.8311   LearningRate 0.0108   Epoch: 13   Global Step: 223970   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:55:46,397-Speed 3333.07 samples/sec   Loss 0.8153   LearningRate 0.0108   Epoch: 13   Global Step: 223980   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:55:49,472-Speed 3331.41 samples/sec   Loss 0.7945   LearningRate 0.0108   Epoch: 13   Global Step: 223990   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:55:52,595-Speed 3279.40 samples/sec   Loss 0.8591   LearningRate 0.0108   Epoch: 13   Global Step: 224000   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:56:35,974-[lfw][224000]XNorm: 21.873795
Training: 2022-04-11 22:56:35,975-[lfw][224000]Accuracy-Flip: 0.99767+-0.00249
Training: 2022-04-11 22:56:35,975-[lfw][224000]Accuracy-Highest: 0.99817
Training: 2022-04-11 22:57:26,356-[cfp_fp][224000]XNorm: 22.827671
Training: 2022-04-11 22:57:26,356-[cfp_fp][224000]Accuracy-Flip: 0.99057+-0.00457
Training: 2022-04-11 22:57:26,357-[cfp_fp][224000]Accuracy-Highest: 0.99129
Training: 2022-04-11 22:58:09,697-[agedb_30][224000]XNorm: 23.397329
Training: 2022-04-11 22:58:09,697-[agedb_30][224000]Accuracy-Flip: 0.98283+-0.00749
Training: 2022-04-11 22:58:09,698-[agedb_30][224000]Accuracy-Highest: 0.98567
Training: 2022-04-11 22:58:12,778-Speed 73.05 samples/sec   Loss 0.8495   LearningRate 0.0108   Epoch: 13   Global Step: 224010   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:58:15,870-Speed 3312.44 samples/sec   Loss 0.8088   LearningRate 0.0108   Epoch: 13   Global Step: 224020   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:58:18,981-Speed 3292.61 samples/sec   Loss 0.8238   LearningRate 0.0108   Epoch: 13   Global Step: 224030   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:58:22,078-Speed 3306.58 samples/sec   Loss 0.8522   LearningRate 0.0108   Epoch: 13   Global Step: 224040   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:58:25,200-Speed 3280.54 samples/sec   Loss 0.8191   LearningRate 0.0108   Epoch: 13   Global Step: 224050   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:58:28,281-Speed 3324.44 samples/sec   Loss 0.8530   LearningRate 0.0108   Epoch: 13   Global Step: 224060   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 22:58:31,362-Speed 3324.90 samples/sec   Loss 0.8417   LearningRate 0.0108   Epoch: 13   Global Step: 224070   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 22:58:34,479-Speed 3285.48 samples/sec   Loss 0.8356   LearningRate 0.0108   Epoch: 13   Global Step: 224080   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 22:58:37,568-Speed 3316.15 samples/sec   Loss 0.8428   LearningRate 0.0108   Epoch: 13   Global Step: 224090   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 22:58:40,641-Speed 3333.13 samples/sec   Loss 0.8582   LearningRate 0.0108   Epoch: 13   Global Step: 224100   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 22:58:43,706-Speed 3341.71 samples/sec   Loss 0.8513   LearningRate 0.0108   Epoch: 13   Global Step: 224110   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 22:58:46,768-Speed 3344.84 samples/sec   Loss 0.8494   LearningRate 0.0108   Epoch: 13   Global Step: 224120   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 22:58:49,832-Speed 3342.16 samples/sec   Loss 0.8233   LearningRate 0.0108   Epoch: 13   Global Step: 224130   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 22:58:52,929-Speed 3307.77 samples/sec   Loss 0.8658   LearningRate 0.0108   Epoch: 13   Global Step: 224140   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 22:58:56,047-Speed 3284.59 samples/sec   Loss 0.8142   LearningRate 0.0108   Epoch: 13   Global Step: 224150   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 22:58:59,109-Speed 3345.27 samples/sec   Loss 0.8791   LearningRate 0.0108   Epoch: 13   Global Step: 224160   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 22:59:02,209-Speed 3303.45 samples/sec   Loss 0.8098   LearningRate 0.0108   Epoch: 13   Global Step: 224170   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 22:59:05,276-Speed 3340.48 samples/sec   Loss 0.8550   LearningRate 0.0108   Epoch: 13   Global Step: 224180   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 22:59:08,362-Speed 3318.76 samples/sec   Loss 0.8398   LearningRate 0.0108   Epoch: 13   Global Step: 224190   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 22:59:11,444-Speed 3322.75 samples/sec   Loss 0.8917   LearningRate 0.0108   Epoch: 13   Global Step: 224200   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 22:59:14,559-Speed 3288.64 samples/sec   Loss 0.8304   LearningRate 0.0108   Epoch: 13   Global Step: 224210   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 22:59:17,641-Speed 3323.20 samples/sec   Loss 0.8550   LearningRate 0.0108   Epoch: 13   Global Step: 224220   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 22:59:20,721-Speed 3325.84 samples/sec   Loss 0.8460   LearningRate 0.0108   Epoch: 13   Global Step: 224230   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 22:59:23,906-Speed 3216.00 samples/sec   Loss 0.8382   LearningRate 0.0108   Epoch: 13   Global Step: 224240   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 22:59:27,065-Speed 3241.44 samples/sec   Loss 0.8487   LearningRate 0.0108   Epoch: 13   Global Step: 224250   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 22:59:30,211-Speed 3255.90 samples/sec   Loss 0.8329   LearningRate 0.0108   Epoch: 13   Global Step: 224260   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-04-11 22:59:33,294-Speed 3322.53 samples/sec   Loss 0.8404   LearningRate 0.0108   Epoch: 13   Global Step: 224270   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 22:59:36,523-Speed 3171.66 samples/sec   Loss 0.8401   LearningRate 0.0108   Epoch: 13   Global Step: 224280   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 22:59:39,586-Speed 3344.03 samples/sec   Loss 0.8680   LearningRate 0.0108   Epoch: 13   Global Step: 224290   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 22:59:42,697-Speed 3292.39 samples/sec   Loss 0.8336   LearningRate 0.0108   Epoch: 13   Global Step: 224300   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:59:45,822-Speed 3277.62 samples/sec   Loss 0.8359   LearningRate 0.0108   Epoch: 13   Global Step: 224310   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:59:48,896-Speed 3332.13 samples/sec   Loss 0.8471   LearningRate 0.0108   Epoch: 13   Global Step: 224320   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:59:51,968-Speed 3333.92 samples/sec   Loss 0.8513   LearningRate 0.0108   Epoch: 13   Global Step: 224330   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:59:55,074-Speed 3297.89 samples/sec   Loss 0.8462   LearningRate 0.0108   Epoch: 13   Global Step: 224340   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 22:59:58,170-Speed 3307.95 samples/sec   Loss 0.8289   LearningRate 0.0108   Epoch: 13   Global Step: 224350   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 23:00:01,234-Speed 3342.27 samples/sec   Loss 0.8381   LearningRate 0.0108   Epoch: 13   Global Step: 224360   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 23:00:04,303-Speed 3337.57 samples/sec   Loss 0.8134   LearningRate 0.0107   Epoch: 13   Global Step: 224370   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 23:00:07,362-Speed 3348.39 samples/sec   Loss 0.8430   LearningRate 0.0107   Epoch: 13   Global Step: 224380   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 23:00:10,495-Speed 3269.34 samples/sec   Loss 0.8389   LearningRate 0.0107   Epoch: 13   Global Step: 224390   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 23:00:13,556-Speed 3346.43 samples/sec   Loss 0.8573   LearningRate 0.0107   Epoch: 13   Global Step: 224400   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 23:00:16,636-Speed 3325.72 samples/sec   Loss 0.8169   LearningRate 0.0107   Epoch: 13   Global Step: 224410   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 23:00:19,746-Speed 3292.62 samples/sec   Loss 0.8146   LearningRate 0.0107   Epoch: 13   Global Step: 224420   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 23:00:22,863-Speed 3285.94 samples/sec   Loss 0.8693   LearningRate 0.0107   Epoch: 13   Global Step: 224430   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 23:00:26,014-Speed 3250.58 samples/sec   Loss 0.8403   LearningRate 0.0107   Epoch: 13   Global Step: 224440   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 23:00:29,088-Speed 3332.52 samples/sec   Loss 0.8264   LearningRate 0.0107   Epoch: 13   Global Step: 224450   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 23:00:32,150-Speed 3344.75 samples/sec   Loss 0.8225   LearningRate 0.0107   Epoch: 13   Global Step: 224460   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 23:00:35,219-Speed 3337.99 samples/sec   Loss 0.8788   LearningRate 0.0107   Epoch: 13   Global Step: 224470   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 23:00:38,295-Speed 3328.72 samples/sec   Loss 0.8177   LearningRate 0.0107   Epoch: 13   Global Step: 224480   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 23:00:41,361-Speed 3341.24 samples/sec   Loss 0.8119   LearningRate 0.0107   Epoch: 13   Global Step: 224490   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 23:00:44,483-Speed 3280.53 samples/sec   Loss 0.8173   LearningRate 0.0107   Epoch: 13   Global Step: 224500   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 23:00:47,609-Speed 3276.30 samples/sec   Loss 0.8610   LearningRate 0.0107   Epoch: 13   Global Step: 224510   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 23:00:50,692-Speed 3322.42 samples/sec   Loss 0.8947   LearningRate 0.0107   Epoch: 13   Global Step: 224520   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 23:00:53,852-Speed 3241.06 samples/sec   Loss 0.8390   LearningRate 0.0107   Epoch: 13   Global Step: 224530   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 23:00:56,922-Speed 3336.54 samples/sec   Loss 0.8621   LearningRate 0.0107   Epoch: 13   Global Step: 224540   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 23:01:00,078-Speed 3245.41 samples/sec   Loss 0.8242   LearningRate 0.0107   Epoch: 13   Global Step: 224550   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 23:01:03,175-Speed 3307.36 samples/sec   Loss 0.8352   LearningRate 0.0107   Epoch: 13   Global Step: 224560   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 23:01:06,242-Speed 3339.60 samples/sec   Loss 0.8259   LearningRate 0.0107   Epoch: 13   Global Step: 224570   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 23:01:09,299-Speed 3350.43 samples/sec   Loss 0.8173   LearningRate 0.0107   Epoch: 13   Global Step: 224580   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 23:01:12,381-Speed 3322.89 samples/sec   Loss 0.8431   LearningRate 0.0107   Epoch: 13   Global Step: 224590   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 23:01:15,464-Speed 3321.98 samples/sec   Loss 0.8015   LearningRate 0.0107   Epoch: 13   Global Step: 224600   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 23:01:18,530-Speed 3341.38 samples/sec   Loss 0.8654   LearningRate 0.0107   Epoch: 13   Global Step: 224610   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 23:01:21,601-Speed 3334.84 samples/sec   Loss 0.8246   LearningRate 0.0107   Epoch: 13   Global Step: 224620   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 23:01:24,737-Speed 3266.77 samples/sec   Loss 0.8258   LearningRate 0.0107   Epoch: 13   Global Step: 224630   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 23:01:27,804-Speed 3339.24 samples/sec   Loss 0.8762   LearningRate 0.0107   Epoch: 13   Global Step: 224640   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 23:01:30,868-Speed 3342.14 samples/sec   Loss 0.8387   LearningRate 0.0107   Epoch: 13   Global Step: 224650   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 23:01:33,940-Speed 3334.92 samples/sec   Loss 0.8356   LearningRate 0.0107   Epoch: 13   Global Step: 224660   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 23:01:37,003-Speed 3343.56 samples/sec   Loss 0.8550   LearningRate 0.0107   Epoch: 13   Global Step: 224670   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 23:01:40,076-Speed 3332.62 samples/sec   Loss 0.8599   LearningRate 0.0107   Epoch: 13   Global Step: 224680   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 23:01:43,153-Speed 3328.60 samples/sec   Loss 0.8176   LearningRate 0.0107   Epoch: 13   Global Step: 224690   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 23:01:46,222-Speed 3338.14 samples/sec   Loss 0.8611   LearningRate 0.0107   Epoch: 13   Global Step: 224700   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 23:01:49,300-Speed 3328.03 samples/sec   Loss 0.8139   LearningRate 0.0107   Epoch: 13   Global Step: 224710   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 23:01:52,447-Speed 3254.83 samples/sec   Loss 0.8123   LearningRate 0.0107   Epoch: 13   Global Step: 224720   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 23:01:55,533-Speed 3318.22 samples/sec   Loss 0.7947   LearningRate 0.0107   Epoch: 13   Global Step: 224730   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 23:01:58,651-Speed 3285.27 samples/sec   Loss 0.8660   LearningRate 0.0107   Epoch: 13   Global Step: 224740   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 23:02:01,722-Speed 3334.58 samples/sec   Loss 0.8532   LearningRate 0.0107   Epoch: 13   Global Step: 224750   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 23:02:04,803-Speed 3325.12 samples/sec   Loss 0.8137   LearningRate 0.0107   Epoch: 13   Global Step: 224760   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 23:02:07,956-Speed 3247.91 samples/sec   Loss 0.8648   LearningRate 0.0107   Epoch: 13   Global Step: 224770   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 23:02:11,051-Speed 3309.57 samples/sec   Loss 0.8460   LearningRate 0.0107   Epoch: 13   Global Step: 224780   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 23:02:14,165-Speed 3288.39 samples/sec   Loss 0.8453   LearningRate 0.0107   Epoch: 13   Global Step: 224790   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 23:02:17,277-Speed 3292.35 samples/sec   Loss 0.8475   LearningRate 0.0107   Epoch: 13   Global Step: 224800   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 23:02:20,358-Speed 3323.73 samples/sec   Loss 0.8419   LearningRate 0.0107   Epoch: 13   Global Step: 224810   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 23:02:23,427-Speed 3337.93 samples/sec   Loss 0.8408   LearningRate 0.0107   Epoch: 13   Global Step: 224820   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 23:02:26,497-Speed 3336.14 samples/sec   Loss 0.8307   LearningRate 0.0107   Epoch: 13   Global Step: 224830   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 23:02:29,629-Speed 3269.94 samples/sec   Loss 0.8215   LearningRate 0.0107   Epoch: 13   Global Step: 224840   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 23:02:32,732-Speed 3300.83 samples/sec   Loss 0.8763   LearningRate 0.0107   Epoch: 13   Global Step: 224850   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 23:02:35,829-Speed 3306.52 samples/sec   Loss 0.8814   LearningRate 0.0107   Epoch: 13   Global Step: 224860   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 23:02:38,913-Speed 3321.66 samples/sec   Loss 0.8199   LearningRate 0.0107   Epoch: 13   Global Step: 224870   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 23:02:42,038-Speed 3277.70 samples/sec   Loss 0.8612   LearningRate 0.0107   Epoch: 13   Global Step: 224880   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 23:02:45,216-Speed 3223.50 samples/sec   Loss 0.8181   LearningRate 0.0106   Epoch: 13   Global Step: 224890   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 23:02:48,289-Speed 3332.59 samples/sec   Loss 0.8227   LearningRate 0.0106   Epoch: 13   Global Step: 224900   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 23:02:51,384-Speed 3309.10 samples/sec   Loss 0.8715   LearningRate 0.0106   Epoch: 13   Global Step: 224910   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 23:02:54,453-Speed 3337.71 samples/sec   Loss 0.8591   LearningRate 0.0106   Epoch: 13   Global Step: 224920   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 23:02:57,539-Speed 3318.68 samples/sec   Loss 0.8513   LearningRate 0.0106   Epoch: 13   Global Step: 224930   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 23:03:00,653-Speed 3289.41 samples/sec   Loss 0.8485   LearningRate 0.0106   Epoch: 13   Global Step: 224940   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 23:03:03,740-Speed 3317.21 samples/sec   Loss 0.8624   LearningRate 0.0106   Epoch: 13   Global Step: 224950   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 23:03:06,804-Speed 3342.86 samples/sec   Loss 0.8646   LearningRate 0.0106   Epoch: 13   Global Step: 224960   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 23:03:09,869-Speed 3342.50 samples/sec   Loss 0.8252   LearningRate 0.0106   Epoch: 13   Global Step: 224970   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 23:03:12,958-Speed 3315.10 samples/sec   Loss 0.8373   LearningRate 0.0106   Epoch: 13   Global Step: 224980   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 23:03:16,039-Speed 3324.35 samples/sec   Loss 0.8413   LearningRate 0.0106   Epoch: 13   Global Step: 224990   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 23:03:19,113-Speed 3332.87 samples/sec   Loss 0.8468   LearningRate 0.0106   Epoch: 13   Global Step: 225000   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 23:03:22,179-Speed 3340.21 samples/sec   Loss 0.8380   LearningRate 0.0106   Epoch: 13   Global Step: 225010   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 23:03:25,272-Speed 3310.80 samples/sec   Loss 0.8478   LearningRate 0.0106   Epoch: 13   Global Step: 225020   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 23:03:28,359-Speed 3317.83 samples/sec   Loss 0.8498   LearningRate 0.0106   Epoch: 13   Global Step: 225030   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 23:03:31,514-Speed 3246.29 samples/sec   Loss 0.8542   LearningRate 0.0106   Epoch: 13   Global Step: 225040   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 23:03:34,649-Speed 3267.77 samples/sec   Loss 0.8415   LearningRate 0.0106   Epoch: 13   Global Step: 225050   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 23:03:37,781-Speed 3270.48 samples/sec   Loss 0.8696   LearningRate 0.0106   Epoch: 13   Global Step: 225060   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 23:03:40,881-Speed 3304.19 samples/sec   Loss 0.8429   LearningRate 0.0106   Epoch: 13   Global Step: 225070   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 23:03:43,986-Speed 3298.48 samples/sec   Loss 0.8429   LearningRate 0.0106   Epoch: 13   Global Step: 225080   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 23:03:47,138-Speed 3249.38 samples/sec   Loss 0.8560   LearningRate 0.0106   Epoch: 13   Global Step: 225090   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 23:03:50,268-Speed 3272.59 samples/sec   Loss 0.8308   LearningRate 0.0106   Epoch: 13   Global Step: 225100   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:03:53,374-Speed 3296.78 samples/sec   Loss 0.8877   LearningRate 0.0106   Epoch: 13   Global Step: 225110   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:03:56,460-Speed 3319.71 samples/sec   Loss 0.8414   LearningRate 0.0106   Epoch: 13   Global Step: 225120   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:03:59,557-Speed 3307.09 samples/sec   Loss 0.8277   LearningRate 0.0106   Epoch: 13   Global Step: 225130   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:04:02,685-Speed 3274.00 samples/sec   Loss 0.8723   LearningRate 0.0106   Epoch: 13   Global Step: 225140   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:04:05,754-Speed 3337.23 samples/sec   Loss 0.8462   LearningRate 0.0106   Epoch: 13   Global Step: 225150   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:04:08,869-Speed 3288.60 samples/sec   Loss 0.8656   LearningRate 0.0106   Epoch: 13   Global Step: 225160   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:04:11,984-Speed 3287.43 samples/sec   Loss 0.8659   LearningRate 0.0106   Epoch: 13   Global Step: 225170   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:04:15,041-Speed 3350.78 samples/sec   Loss 0.8553   LearningRate 0.0106   Epoch: 13   Global Step: 225180   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:04:18,112-Speed 3335.83 samples/sec   Loss 0.8856   LearningRate 0.0106   Epoch: 13   Global Step: 225190   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:04:21,244-Speed 3270.04 samples/sec   Loss 0.8404   LearningRate 0.0106   Epoch: 13   Global Step: 225200   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:04:24,358-Speed 3289.21 samples/sec   Loss 0.8769   LearningRate 0.0106   Epoch: 13   Global Step: 225210   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:04:27,446-Speed 3316.44 samples/sec   Loss 0.8151   LearningRate 0.0106   Epoch: 13   Global Step: 225220   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:04:30,519-Speed 3333.76 samples/sec   Loss 0.8376   LearningRate 0.0106   Epoch: 13   Global Step: 225230   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:04:33,596-Speed 3328.87 samples/sec   Loss 0.8708   LearningRate 0.0106   Epoch: 13   Global Step: 225240   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:04:36,804-Speed 3192.38 samples/sec   Loss 0.8346   LearningRate 0.0106   Epoch: 13   Global Step: 225250   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:04:39,921-Speed 3285.35 samples/sec   Loss 0.8162   LearningRate 0.0106   Epoch: 13   Global Step: 225260   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:04:43,129-Speed 3193.08 samples/sec   Loss 0.8391   LearningRate 0.0106   Epoch: 13   Global Step: 225270   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:04:46,242-Speed 3290.43 samples/sec   Loss 0.8614   LearningRate 0.0106   Epoch: 13   Global Step: 225280   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 23:04:49,446-Speed 3196.84 samples/sec   Loss 0.8392   LearningRate 0.0106   Epoch: 13   Global Step: 225290   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 23:04:52,551-Speed 3297.83 samples/sec   Loss 0.8608   LearningRate 0.0106   Epoch: 13   Global Step: 225300   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:04:55,657-Speed 3298.14 samples/sec   Loss 0.8850   LearningRate 0.0106   Epoch: 13   Global Step: 225310   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:04:58,710-Speed 3355.36 samples/sec   Loss 0.8420   LearningRate 0.0106   Epoch: 13   Global Step: 225320   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-11 23:05:01,785-Speed 3329.92 samples/sec   Loss 0.8524   LearningRate 0.0106   Epoch: 13   Global Step: 225330   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-11 23:05:04,870-Speed 3320.31 samples/sec   Loss 0.8698   LearningRate 0.0106   Epoch: 13   Global Step: 225340   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-11 23:05:07,945-Speed 3331.64 samples/sec   Loss 0.8533   LearningRate 0.0106   Epoch: 13   Global Step: 225350   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-11 23:05:11,010-Speed 3341.08 samples/sec   Loss 0.8340   LearningRate 0.0106   Epoch: 13   Global Step: 225360   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-11 23:05:14,084-Speed 3332.30 samples/sec   Loss 0.8074   LearningRate 0.0106   Epoch: 13   Global Step: 225370   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-11 23:05:17,249-Speed 3236.11 samples/sec   Loss 0.8462   LearningRate 0.0106   Epoch: 13   Global Step: 225380   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-11 23:05:20,357-Speed 3295.21 samples/sec   Loss 0.8421   LearningRate 0.0106   Epoch: 13   Global Step: 225390   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-11 23:05:23,438-Speed 3324.30 samples/sec   Loss 0.8440   LearningRate 0.0105   Epoch: 13   Global Step: 225400   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-11 23:05:26,501-Speed 3343.97 samples/sec   Loss 0.8946   LearningRate 0.0105   Epoch: 13   Global Step: 225410   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-11 23:05:29,579-Speed 3327.45 samples/sec   Loss 0.8705   LearningRate 0.0105   Epoch: 13   Global Step: 225420   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:05:32,665-Speed 3319.47 samples/sec   Loss 0.8479   LearningRate 0.0105   Epoch: 13   Global Step: 225430   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:05:35,738-Speed 3332.78 samples/sec   Loss 0.8459   LearningRate 0.0105   Epoch: 13   Global Step: 225440   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:05:38,809-Speed 3335.03 samples/sec   Loss 0.8298   LearningRate 0.0105   Epoch: 13   Global Step: 225450   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:05:41,879-Speed 3336.62 samples/sec   Loss 0.8125   LearningRate 0.0105   Epoch: 13   Global Step: 225460   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:05:45,100-Speed 3179.30 samples/sec   Loss 0.8450   LearningRate 0.0105   Epoch: 13   Global Step: 225470   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:05:48,272-Speed 3229.12 samples/sec   Loss 0.8633   LearningRate 0.0105   Epoch: 13   Global Step: 225480   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:05:51,400-Speed 3273.77 samples/sec   Loss 0.8731   LearningRate 0.0105   Epoch: 13   Global Step: 225490   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:05:54,577-Speed 3224.99 samples/sec   Loss 0.8936   LearningRate 0.0105   Epoch: 13   Global Step: 225500   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:05:57,695-Speed 3284.78 samples/sec   Loss 0.8617   LearningRate 0.0105   Epoch: 13   Global Step: 225510   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:06:00,793-Speed 3306.50 samples/sec   Loss 0.8372   LearningRate 0.0105   Epoch: 13   Global Step: 225520   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 23:06:03,883-Speed 3315.09 samples/sec   Loss 0.8678   LearningRate 0.0105   Epoch: 13   Global Step: 225530   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 23:06:07,038-Speed 3246.43 samples/sec   Loss 0.8467   LearningRate 0.0105   Epoch: 13   Global Step: 225540   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 23:06:10,170-Speed 3270.38 samples/sec   Loss 0.8719   LearningRate 0.0105   Epoch: 13   Global Step: 225550   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 23:06:13,238-Speed 3338.04 samples/sec   Loss 0.8227   LearningRate 0.0105   Epoch: 13   Global Step: 225560   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 23:06:16,314-Speed 3330.12 samples/sec   Loss 0.8348   LearningRate 0.0105   Epoch: 13   Global Step: 225570   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 23:06:19,479-Speed 3235.34 samples/sec   Loss 0.8977   LearningRate 0.0105   Epoch: 13   Global Step: 225580   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 23:06:22,621-Speed 3260.47 samples/sec   Loss 0.8262   LearningRate 0.0105   Epoch: 13   Global Step: 225590   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 23:06:25,711-Speed 3315.02 samples/sec   Loss 0.7966   LearningRate 0.0105   Epoch: 13   Global Step: 225600   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 23:06:28,783-Speed 3333.87 samples/sec   Loss 0.8444   LearningRate 0.0105   Epoch: 13   Global Step: 225610   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 23:06:31,851-Speed 3338.14 samples/sec   Loss 0.8669   LearningRate 0.0105   Epoch: 13   Global Step: 225620   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 23:06:34,919-Speed 3338.91 samples/sec   Loss 0.8542   LearningRate 0.0105   Epoch: 13   Global Step: 225630   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 23:06:37,992-Speed 3333.11 samples/sec   Loss 0.8351   LearningRate 0.0105   Epoch: 13   Global Step: 225640   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 23:06:41,071-Speed 3325.53 samples/sec   Loss 0.8244   LearningRate 0.0105   Epoch: 13   Global Step: 225650   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 23:06:44,207-Speed 3266.20 samples/sec   Loss 0.8394   LearningRate 0.0105   Epoch: 13   Global Step: 225660   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 23:06:47,293-Speed 3319.08 samples/sec   Loss 0.8579   LearningRate 0.0105   Epoch: 13   Global Step: 225670   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 23:06:50,404-Speed 3293.40 samples/sec   Loss 0.8214   LearningRate 0.0105   Epoch: 13   Global Step: 225680   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 23:06:53,574-Speed 3230.34 samples/sec   Loss 0.8938   LearningRate 0.0105   Epoch: 13   Global Step: 225690   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 23:06:56,656-Speed 3323.50 samples/sec   Loss 0.8180   LearningRate 0.0105   Epoch: 13   Global Step: 225700   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:06:59,741-Speed 3320.36 samples/sec   Loss 0.8616   LearningRate 0.0105   Epoch: 13   Global Step: 225710   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:07:02,808-Speed 3338.75 samples/sec   Loss 0.8566   LearningRate 0.0105   Epoch: 13   Global Step: 225720   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:07:05,879-Speed 3335.34 samples/sec   Loss 0.8712   LearningRate 0.0105   Epoch: 13   Global Step: 225730   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:07:08,950-Speed 3335.84 samples/sec   Loss 0.8389   LearningRate 0.0105   Epoch: 13   Global Step: 225740   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:07:12,105-Speed 3246.57 samples/sec   Loss 0.8391   LearningRate 0.0105   Epoch: 13   Global Step: 225750   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:07:15,287-Speed 3218.77 samples/sec   Loss 0.8556   LearningRate 0.0105   Epoch: 13   Global Step: 225760   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:07:18,406-Speed 3284.44 samples/sec   Loss 0.8439   LearningRate 0.0105   Epoch: 13   Global Step: 225770   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:07:21,485-Speed 3326.06 samples/sec   Loss 0.8573   LearningRate 0.0105   Epoch: 13   Global Step: 225780   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:07:24,556-Speed 3334.92 samples/sec   Loss 0.8547   LearningRate 0.0105   Epoch: 13   Global Step: 225790   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:07:27,627-Speed 3335.81 samples/sec   Loss 0.8109   LearningRate 0.0105   Epoch: 13   Global Step: 225800   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 23:07:30,706-Speed 3326.35 samples/sec   Loss 0.8950   LearningRate 0.0105   Epoch: 13   Global Step: 225810   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 23:07:33,793-Speed 3317.02 samples/sec   Loss 0.8506   LearningRate 0.0105   Epoch: 13   Global Step: 225820   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:07:36,881-Speed 3317.82 samples/sec   Loss 0.8804   LearningRate 0.0105   Epoch: 13   Global Step: 225830   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:07:39,961-Speed 3324.84 samples/sec   Loss 0.8455   LearningRate 0.0105   Epoch: 13   Global Step: 225840   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:07:43,097-Speed 3266.42 samples/sec   Loss 0.8443   LearningRate 0.0105   Epoch: 13   Global Step: 225850   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:07:46,237-Speed 3262.00 samples/sec   Loss 0.8461   LearningRate 0.0105   Epoch: 13   Global Step: 225860   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:07:49,336-Speed 3304.41 samples/sec   Loss 0.8458   LearningRate 0.0105   Epoch: 13   Global Step: 225870   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:07:52,408-Speed 3334.22 samples/sec   Loss 0.8467   LearningRate 0.0105   Epoch: 13   Global Step: 225880   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:07:55,530-Speed 3281.46 samples/sec   Loss 0.8440   LearningRate 0.0105   Epoch: 13   Global Step: 225890   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:07:58,618-Speed 3315.84 samples/sec   Loss 0.8404   LearningRate 0.0105   Epoch: 13   Global Step: 225900   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:08:01,712-Speed 3310.74 samples/sec   Loss 0.8806   LearningRate 0.0104   Epoch: 13   Global Step: 225910   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:08:04,866-Speed 3247.33 samples/sec   Loss 0.8217   LearningRate 0.0104   Epoch: 13   Global Step: 225920   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 23:08:07,964-Speed 3306.22 samples/sec   Loss 0.8716   LearningRate 0.0104   Epoch: 13   Global Step: 225930   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 23:08:11,053-Speed 3316.37 samples/sec   Loss 0.8708   LearningRate 0.0104   Epoch: 13   Global Step: 225940   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 23:08:14,143-Speed 3314.67 samples/sec   Loss 0.8234   LearningRate 0.0104   Epoch: 13   Global Step: 225950   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 23:08:17,217-Speed 3332.05 samples/sec   Loss 0.8568   LearningRate 0.0104   Epoch: 13   Global Step: 225960   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 23:08:20,313-Speed 3307.77 samples/sec   Loss 0.8455   LearningRate 0.0104   Epoch: 13   Global Step: 225970   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 23:08:23,377-Speed 3342.35 samples/sec   Loss 0.8342   LearningRate 0.0104   Epoch: 13   Global Step: 225980   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 23:08:26,438-Speed 3346.20 samples/sec   Loss 0.8625   LearningRate 0.0104   Epoch: 13   Global Step: 225990   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 23:08:29,507-Speed 3337.82 samples/sec   Loss 0.8791   LearningRate 0.0104   Epoch: 13   Global Step: 226000   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 23:09:13,140-[lfw][226000]XNorm: 21.930860
Training: 2022-04-11 23:09:13,141-[lfw][226000]Accuracy-Flip: 0.99783+-0.00289
Training: 2022-04-11 23:09:13,141-[lfw][226000]Accuracy-Highest: 0.99817
Training: 2022-04-11 23:10:03,521-[cfp_fp][226000]XNorm: 22.590117
Training: 2022-04-11 23:10:03,521-[cfp_fp][226000]Accuracy-Flip: 0.99014+-0.00505
Training: 2022-04-11 23:10:03,521-[cfp_fp][226000]Accuracy-Highest: 0.99129
Training: 2022-04-11 23:10:46,873-[agedb_30][226000]XNorm: 23.119668
Training: 2022-04-11 23:10:46,874-[agedb_30][226000]Accuracy-Flip: 0.98483+-0.00630
Training: 2022-04-11 23:10:46,874-[agedb_30][226000]Accuracy-Highest: 0.98567
Training: 2022-04-11 23:10:50,001-Speed 72.89 samples/sec   Loss 0.8582   LearningRate 0.0104   Epoch: 13   Global Step: 226010   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 23:10:53,055-Speed 3353.97 samples/sec   Loss 0.8809   LearningRate 0.0104   Epoch: 13   Global Step: 226020   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 23:10:56,135-Speed 3325.42 samples/sec   Loss 0.8751   LearningRate 0.0104   Epoch: 13   Global Step: 226030   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 23:10:59,191-Speed 3351.37 samples/sec   Loss 0.8444   LearningRate 0.0104   Epoch: 13   Global Step: 226040   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 23:11:02,235-Speed 3365.19 samples/sec   Loss 0.8498   LearningRate 0.0104   Epoch: 13   Global Step: 226050   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:11:05,290-Speed 3352.07 samples/sec   Loss 0.8638   LearningRate 0.0104   Epoch: 13   Global Step: 226060   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:11:08,430-Speed 3262.02 samples/sec   Loss 0.8790   LearningRate 0.0104   Epoch: 13   Global Step: 226070   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:11:11,498-Speed 3338.75 samples/sec   Loss 0.8152   LearningRate 0.0104   Epoch: 13   Global Step: 226080   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:11:14,567-Speed 3336.36 samples/sec   Loss 0.8834   LearningRate 0.0104   Epoch: 13   Global Step: 226090   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:11:17,644-Speed 3328.81 samples/sec   Loss 0.8621   LearningRate 0.0104   Epoch: 13   Global Step: 226100   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:11:20,713-Speed 3336.86 samples/sec   Loss 0.8593   LearningRate 0.0104   Epoch: 13   Global Step: 226110   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:11:23,791-Speed 3328.44 samples/sec   Loss 0.8336   LearningRate 0.0104   Epoch: 13   Global Step: 226120   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:11:26,875-Speed 3321.67 samples/sec   Loss 0.8904   LearningRate 0.0104   Epoch: 13   Global Step: 226130   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:11:29,935-Speed 3347.05 samples/sec   Loss 0.8684   LearningRate 0.0104   Epoch: 13   Global Step: 226140   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:11:33,011-Speed 3329.32 samples/sec   Loss 0.8630   LearningRate 0.0104   Epoch: 13   Global Step: 226150   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 23:11:36,117-Speed 3297.32 samples/sec   Loss 0.8639   LearningRate 0.0104   Epoch: 13   Global Step: 226160   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 23:11:39,370-Speed 3148.83 samples/sec   Loss 0.8733   LearningRate 0.0104   Epoch: 13   Global Step: 226170   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 23:11:42,480-Speed 3293.31 samples/sec   Loss 0.8434   LearningRate 0.0104   Epoch: 13   Global Step: 226180   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-11 23:11:45,548-Speed 3338.46 samples/sec   Loss 0.8720   LearningRate 0.0104   Epoch: 13   Global Step: 226190   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-11 23:11:48,754-Speed 3195.05 samples/sec   Loss 0.8449   LearningRate 0.0104   Epoch: 13   Global Step: 226200   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-11 23:11:51,871-Speed 3285.27 samples/sec   Loss 0.8168   LearningRate 0.0104   Epoch: 13   Global Step: 226210   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-11 23:11:55,020-Speed 3253.44 samples/sec   Loss 0.8547   LearningRate 0.0104   Epoch: 13   Global Step: 226220   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-11 23:11:58,103-Speed 3322.68 samples/sec   Loss 0.8659   LearningRate 0.0104   Epoch: 13   Global Step: 226230   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-11 23:12:01,167-Speed 3341.90 samples/sec   Loss 0.8968   LearningRate 0.0104   Epoch: 13   Global Step: 226240   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-11 23:12:04,239-Speed 3334.30 samples/sec   Loss 0.8522   LearningRate 0.0104   Epoch: 13   Global Step: 226250   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-11 23:12:07,304-Speed 3342.30 samples/sec   Loss 0.8865   LearningRate 0.0104   Epoch: 13   Global Step: 226260   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-11 23:12:10,385-Speed 3323.27 samples/sec   Loss 0.8572   LearningRate 0.0104   Epoch: 13   Global Step: 226270   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-11 23:12:13,542-Speed 3245.28 samples/sec   Loss 0.8580   LearningRate 0.0104   Epoch: 13   Global Step: 226280   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:12:16,613-Speed 3335.12 samples/sec   Loss 0.8373   LearningRate 0.0104   Epoch: 13   Global Step: 226290   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:12:19,720-Speed 3296.14 samples/sec   Loss 0.9209   LearningRate 0.0104   Epoch: 13   Global Step: 226300   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:12:22,791-Speed 3335.39 samples/sec   Loss 0.8193   LearningRate 0.0104   Epoch: 13   Global Step: 226310   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:12:25,875-Speed 3321.16 samples/sec   Loss 0.8483   LearningRate 0.0104   Epoch: 13   Global Step: 226320   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:12:28,950-Speed 3330.63 samples/sec   Loss 0.8612   LearningRate 0.0104   Epoch: 13   Global Step: 226330   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:12:32,022-Speed 3334.02 samples/sec   Loss 0.8728   LearningRate 0.0104   Epoch: 13   Global Step: 226340   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:12:35,138-Speed 3287.33 samples/sec   Loss 0.8445   LearningRate 0.0104   Epoch: 13   Global Step: 226350   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:12:38,241-Speed 3300.59 samples/sec   Loss 0.8517   LearningRate 0.0104   Epoch: 13   Global Step: 226360   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:12:41,403-Speed 3238.64 samples/sec   Loss 0.8403   LearningRate 0.0104   Epoch: 13   Global Step: 226370   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:12:44,571-Speed 3234.10 samples/sec   Loss 0.8554   LearningRate 0.0104   Epoch: 13   Global Step: 226380   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 23:12:47,716-Speed 3256.68 samples/sec   Loss 0.8689   LearningRate 0.0104   Epoch: 13   Global Step: 226390   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 23:12:50,802-Speed 3319.25 samples/sec   Loss 0.8709   LearningRate 0.0104   Epoch: 13   Global Step: 226400   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 23:12:53,902-Speed 3303.83 samples/sec   Loss 0.8212   LearningRate 0.0104   Epoch: 13   Global Step: 226410   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 23:12:57,082-Speed 3220.58 samples/sec   Loss 0.8418   LearningRate 0.0104   Epoch: 13   Global Step: 226420   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 23:13:00,178-Speed 3308.15 samples/sec   Loss 0.8424   LearningRate 0.0103   Epoch: 13   Global Step: 226430   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 23:13:03,349-Speed 3230.85 samples/sec   Loss 0.8819   LearningRate 0.0103   Epoch: 13   Global Step: 226440   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 23:13:06,496-Speed 3253.87 samples/sec   Loss 0.8024   LearningRate 0.0103   Epoch: 13   Global Step: 226450   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 23:13:09,593-Speed 3307.10 samples/sec   Loss 0.8688   LearningRate 0.0103   Epoch: 13   Global Step: 226460   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 23:13:12,660-Speed 3340.84 samples/sec   Loss 0.8387   LearningRate 0.0103   Epoch: 13   Global Step: 226470   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 23:13:15,725-Speed 3341.94 samples/sec   Loss 0.8707   LearningRate 0.0103   Epoch: 13   Global Step: 226480   Fp16 Grad Scale: 262144   Required: 11 hours
Training: 2022-04-11 23:13:18,774-Speed 3358.67 samples/sec   Loss 0.8414   LearningRate 0.0103   Epoch: 13   Global Step: 226490   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 23:13:21,839-Speed 3341.46 samples/sec   Loss 0.8275   LearningRate 0.0103   Epoch: 13   Global Step: 226500   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:13:24,921-Speed 3323.33 samples/sec   Loss 0.8618   LearningRate 0.0103   Epoch: 13   Global Step: 226510   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:13:28,078-Speed 3244.92 samples/sec   Loss 0.8686   LearningRate 0.0103   Epoch: 13   Global Step: 226520   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:13:31,326-Speed 3152.66 samples/sec   Loss 0.8335   LearningRate 0.0103   Epoch: 13   Global Step: 226530   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:13:34,566-Speed 3161.27 samples/sec   Loss 0.8557   LearningRate 0.0103   Epoch: 13   Global Step: 226540   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:13:37,638-Speed 3334.25 samples/sec   Loss 0.8776   LearningRate 0.0103   Epoch: 13   Global Step: 226550   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:13:40,775-Speed 3265.53 samples/sec   Loss 0.8368   LearningRate 0.0103   Epoch: 13   Global Step: 226560   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:13:43,842-Speed 3339.07 samples/sec   Loss 0.8542   LearningRate 0.0103   Epoch: 13   Global Step: 226570   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:13:46,922-Speed 3325.20 samples/sec   Loss 0.8760   LearningRate 0.0103   Epoch: 13   Global Step: 226580   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:13:49,991-Speed 3337.23 samples/sec   Loss 0.8977   LearningRate 0.0103   Epoch: 13   Global Step: 226590   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:13:53,220-Speed 3172.15 samples/sec   Loss 0.8525   LearningRate 0.0103   Epoch: 13   Global Step: 226600   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 23:13:56,378-Speed 3243.24 samples/sec   Loss 0.8627   LearningRate 0.0103   Epoch: 13   Global Step: 226610   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 23:13:59,463-Speed 3319.86 samples/sec   Loss 0.8578   LearningRate 0.0103   Epoch: 13   Global Step: 226620   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:14:02,608-Speed 3256.97 samples/sec   Loss 0.8461   LearningRate 0.0103   Epoch: 13   Global Step: 226630   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:14:05,691-Speed 3322.39 samples/sec   Loss 0.8832   LearningRate 0.0103   Epoch: 13   Global Step: 226640   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:14:08,757-Speed 3341.23 samples/sec   Loss 0.8765   LearningRate 0.0103   Epoch: 13   Global Step: 226650   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:14:11,928-Speed 3229.52 samples/sec   Loss 0.8644   LearningRate 0.0103   Epoch: 13   Global Step: 226660   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:14:15,031-Speed 3300.95 samples/sec   Loss 0.8867   LearningRate 0.0103   Epoch: 13   Global Step: 226670   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:14:18,100-Speed 3337.17 samples/sec   Loss 0.8509   LearningRate 0.0103   Epoch: 13   Global Step: 226680   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:14:21,177-Speed 3328.46 samples/sec   Loss 0.8525   LearningRate 0.0103   Epoch: 13   Global Step: 226690   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:14:24,259-Speed 3323.50 samples/sec   Loss 0.8764   LearningRate 0.0103   Epoch: 13   Global Step: 226700   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:14:27,326-Speed 3339.57 samples/sec   Loss 0.8602   LearningRate 0.0103   Epoch: 13   Global Step: 226710   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:14:30,434-Speed 3295.52 samples/sec   Loss 0.8782   LearningRate 0.0103   Epoch: 13   Global Step: 226720   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 23:14:33,614-Speed 3221.46 samples/sec   Loss 0.8181   LearningRate 0.0103   Epoch: 13   Global Step: 226730   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 23:14:36,812-Speed 3202.17 samples/sec   Loss 0.8457   LearningRate 0.0103   Epoch: 13   Global Step: 226740   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 23:14:39,885-Speed 3333.08 samples/sec   Loss 0.8276   LearningRate 0.0103   Epoch: 13   Global Step: 226750   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 23:14:42,962-Speed 3328.97 samples/sec   Loss 0.8561   LearningRate 0.0103   Epoch: 13   Global Step: 226760   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 23:14:46,047-Speed 3320.06 samples/sec   Loss 0.8796   LearningRate 0.0103   Epoch: 13   Global Step: 226770   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 23:14:49,122-Speed 3330.16 samples/sec   Loss 0.8785   LearningRate 0.0103   Epoch: 13   Global Step: 226780   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 23:14:52,227-Speed 3299.14 samples/sec   Loss 0.8966   LearningRate 0.0103   Epoch: 13   Global Step: 226790   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 23:14:55,406-Speed 3222.24 samples/sec   Loss 0.8662   LearningRate 0.0103   Epoch: 13   Global Step: 226800   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 23:14:58,469-Speed 3343.46 samples/sec   Loss 0.8706   LearningRate 0.0103   Epoch: 13   Global Step: 226810   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 23:15:01,540-Speed 3335.33 samples/sec   Loss 0.8910   LearningRate 0.0103   Epoch: 13   Global Step: 226820   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 23:15:04,629-Speed 3315.92 samples/sec   Loss 0.8681   LearningRate 0.0103   Epoch: 13   Global Step: 226830   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 23:15:07,712-Speed 3322.28 samples/sec   Loss 0.8912   LearningRate 0.0103   Epoch: 13   Global Step: 226840   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 23:15:10,766-Speed 3353.35 samples/sec   Loss 0.8135   LearningRate 0.0103   Epoch: 13   Global Step: 226850   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:15:13,857-Speed 3313.90 samples/sec   Loss 0.8646   LearningRate 0.0103   Epoch: 13   Global Step: 226860   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:15:16,990-Speed 3269.76 samples/sec   Loss 0.8635   LearningRate 0.0103   Epoch: 13   Global Step: 226870   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:15:20,076-Speed 3318.03 samples/sec   Loss 0.8556   LearningRate 0.0103   Epoch: 13   Global Step: 226880   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:15:23,139-Speed 3343.98 samples/sec   Loss 0.8297   LearningRate 0.0103   Epoch: 13   Global Step: 226890   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:15:26,204-Speed 3342.06 samples/sec   Loss 0.8518   LearningRate 0.0103   Epoch: 13   Global Step: 226900   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:15:29,277-Speed 3333.50 samples/sec   Loss 0.8983   LearningRate 0.0103   Epoch: 13   Global Step: 226910   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-11 23:15:32,426-Speed 3252.62 samples/sec   Loss 0.8601   LearningRate 0.0103   Epoch: 13   Global Step: 226920   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-11 23:15:35,496-Speed 3335.86 samples/sec   Loss 0.8932   LearningRate 0.0103   Epoch: 13   Global Step: 226930   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-11 23:15:38,567-Speed 3334.93 samples/sec   Loss 0.8528   LearningRate 0.0103   Epoch: 13   Global Step: 226940   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-11 23:15:41,680-Speed 3290.81 samples/sec   Loss 0.8819   LearningRate 0.0102   Epoch: 13   Global Step: 226950   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-11 23:15:44,751-Speed 3334.93 samples/sec   Loss 0.8612   LearningRate 0.0102   Epoch: 13   Global Step: 226960   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-11 23:15:47,824-Speed 3332.35 samples/sec   Loss 0.8644   LearningRate 0.0102   Epoch: 13   Global Step: 226970   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-11 23:15:50,938-Speed 3289.01 samples/sec   Loss 0.8836   LearningRate 0.0102   Epoch: 13   Global Step: 226980   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-11 23:15:54,023-Speed 3320.70 samples/sec   Loss 0.8867   LearningRate 0.0102   Epoch: 13   Global Step: 226990   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-11 23:15:57,110-Speed 3318.13 samples/sec   Loss 0.8463   LearningRate 0.0102   Epoch: 13   Global Step: 227000   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-11 23:16:00,178-Speed 3338.02 samples/sec   Loss 0.8590   LearningRate 0.0102   Epoch: 13   Global Step: 227010   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:16:03,277-Speed 3305.18 samples/sec   Loss 0.8168   LearningRate 0.0102   Epoch: 13   Global Step: 227020   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:16:06,444-Speed 3234.31 samples/sec   Loss 0.8648   LearningRate 0.0102   Epoch: 13   Global Step: 227030   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:16:09,566-Speed 3280.11 samples/sec   Loss 0.8210   LearningRate 0.0102   Epoch: 13   Global Step: 227040   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:16:12,638-Speed 3334.81 samples/sec   Loss 0.8992   LearningRate 0.0102   Epoch: 13   Global Step: 227050   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:16:15,755-Speed 3285.59 samples/sec   Loss 0.8614   LearningRate 0.0102   Epoch: 13   Global Step: 227060   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:16:18,837-Speed 3323.48 samples/sec   Loss 0.8741   LearningRate 0.0102   Epoch: 13   Global Step: 227070   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:16:21,991-Speed 3246.61 samples/sec   Loss 0.8651   LearningRate 0.0102   Epoch: 13   Global Step: 227080   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:16:25,209-Speed 3183.24 samples/sec   Loss 0.8640   LearningRate 0.0102   Epoch: 13   Global Step: 227090   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:16:28,333-Speed 3278.63 samples/sec   Loss 0.8607   LearningRate 0.0102   Epoch: 13   Global Step: 227100   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:16:31,494-Speed 3240.24 samples/sec   Loss 0.8481   LearningRate 0.0102   Epoch: 13   Global Step: 227110   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 23:16:34,642-Speed 3253.43 samples/sec   Loss 0.8575   LearningRate 0.0102   Epoch: 13   Global Step: 227120   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 23:16:37,776-Speed 3268.43 samples/sec   Loss 0.8890   LearningRate 0.0102   Epoch: 13   Global Step: 227130   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 23:16:40,906-Speed 3272.64 samples/sec   Loss 0.8482   LearningRate 0.0102   Epoch: 13   Global Step: 227140   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:16:44,011-Speed 3298.74 samples/sec   Loss 0.8978   LearningRate 0.0102   Epoch: 13   Global Step: 227150   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:16:47,106-Speed 3308.58 samples/sec   Loss 0.8414   LearningRate 0.0102   Epoch: 13   Global Step: 227160   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:16:50,188-Speed 3323.77 samples/sec   Loss 0.8811   LearningRate 0.0102   Epoch: 13   Global Step: 227170   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:16:53,258-Speed 3336.62 samples/sec   Loss 0.8749   LearningRate 0.0102   Epoch: 13   Global Step: 227180   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:16:56,375-Speed 3285.62 samples/sec   Loss 0.8537   LearningRate 0.0102   Epoch: 13   Global Step: 227190   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:16:59,460-Speed 3319.90 samples/sec   Loss 0.8805   LearningRate 0.0102   Epoch: 13   Global Step: 227200   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-11 23:17:02,647-Speed 3213.35 samples/sec   Loss 0.8148   LearningRate 0.0102   Epoch: 13   Global Step: 227210   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-11 23:17:05,810-Speed 3238.46 samples/sec   Loss 0.8488   LearningRate 0.0102   Epoch: 13   Global Step: 227220   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-11 23:17:08,912-Speed 3301.91 samples/sec   Loss 0.8922   LearningRate 0.0102   Epoch: 13   Global Step: 227230   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-11 23:17:11,997-Speed 3320.38 samples/sec   Loss 0.8352   LearningRate 0.0102   Epoch: 13   Global Step: 227240   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-11 23:17:15,155-Speed 3243.65 samples/sec   Loss 0.8645   LearningRate 0.0102   Epoch: 13   Global Step: 227250   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-11 23:17:18,281-Speed 3275.81 samples/sec   Loss 0.8685   LearningRate 0.0102   Epoch: 13   Global Step: 227260   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-11 23:17:21,378-Speed 3306.76 samples/sec   Loss 0.8471   LearningRate 0.0102   Epoch: 13   Global Step: 227270   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-11 23:17:24,453-Speed 3331.14 samples/sec   Loss 0.8765   LearningRate 0.0102   Epoch: 13   Global Step: 227280   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-11 23:17:27,521-Speed 3338.42 samples/sec   Loss 0.8572   LearningRate 0.0102   Epoch: 13   Global Step: 227290   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-11 23:17:30,584-Speed 3343.62 samples/sec   Loss 0.8612   LearningRate 0.0102   Epoch: 13   Global Step: 227300   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:17:33,665-Speed 3324.36 samples/sec   Loss 0.9014   LearningRate 0.0102   Epoch: 13   Global Step: 227310   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:17:36,847-Speed 3219.07 samples/sec   Loss 0.8739   LearningRate 0.0102   Epoch: 13   Global Step: 227320   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:17:39,930-Speed 3322.04 samples/sec   Loss 0.8530   LearningRate 0.0102   Epoch: 13   Global Step: 227330   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:17:43,020-Speed 3315.68 samples/sec   Loss 0.8589   LearningRate 0.0102   Epoch: 13   Global Step: 227340   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:17:46,093-Speed 3332.84 samples/sec   Loss 0.8406   LearningRate 0.0102   Epoch: 13   Global Step: 227350   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:17:49,201-Speed 3295.29 samples/sec   Loss 0.8789   LearningRate 0.0102   Epoch: 13   Global Step: 227360   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:17:52,378-Speed 3223.34 samples/sec   Loss 0.7953   LearningRate 0.0102   Epoch: 13   Global Step: 227370   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:17:55,459-Speed 3324.20 samples/sec   Loss 0.8586   LearningRate 0.0102   Epoch: 13   Global Step: 227380   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:17:58,512-Speed 3355.27 samples/sec   Loss 0.8555   LearningRate 0.0102   Epoch: 13   Global Step: 227390   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-11 23:18:01,620-Speed 3295.77 samples/sec   Loss 0.8808   LearningRate 0.0102   Epoch: 13   Global Step: 227400   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-11 23:18:04,768-Speed 3253.16 samples/sec   Loss 0.8573   LearningRate 0.0102   Epoch: 13   Global Step: 227410   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-11 23:18:07,861-Speed 3310.88 samples/sec   Loss 0.8962   LearningRate 0.0102   Epoch: 13   Global Step: 227420   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-11 23:18:11,045-Speed 3217.19 samples/sec   Loss 0.8589   LearningRate 0.0102   Epoch: 13   Global Step: 227430   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-11 23:18:14,178-Speed 3270.12 samples/sec   Loss 0.8670   LearningRate 0.0102   Epoch: 13   Global Step: 227440   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-11 23:18:17,279-Speed 3302.42 samples/sec   Loss 0.8823   LearningRate 0.0102   Epoch: 13   Global Step: 227450   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-11 23:18:20,376-Speed 3307.07 samples/sec   Loss 0.8852   LearningRate 0.0102   Epoch: 13   Global Step: 227460   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-11 23:18:23,543-Speed 3233.55 samples/sec   Loss 0.8231   LearningRate 0.0101   Epoch: 13   Global Step: 227470   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-11 23:18:26,724-Speed 3220.32 samples/sec   Loss 0.8205   LearningRate 0.0101   Epoch: 13   Global Step: 227480   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-11 23:18:29,996-Speed 3130.34 samples/sec   Loss 0.8879   LearningRate 0.0101   Epoch: 13   Global Step: 227490   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:18:33,983-Speed 2568.33 samples/sec   Loss 0.8552   LearningRate 0.0101   Epoch: 13   Global Step: 227500   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:18:37,168-Speed 3216.07 samples/sec   Loss 0.8451   LearningRate 0.0101   Epoch: 13   Global Step: 227510   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:18:40,263-Speed 3310.40 samples/sec   Loss 0.8452   LearningRate 0.0101   Epoch: 13   Global Step: 227520   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:18:43,424-Speed 3239.71 samples/sec   Loss 0.8920   LearningRate 0.0101   Epoch: 13   Global Step: 227530   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:18:46,623-Speed 3201.75 samples/sec   Loss 0.8527   LearningRate 0.0101   Epoch: 13   Global Step: 227540   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:18:49,686-Speed 3344.21 samples/sec   Loss 0.8492   LearningRate 0.0101   Epoch: 13   Global Step: 227550   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:18:52,760-Speed 3331.45 samples/sec   Loss 0.8812   LearningRate 0.0101   Epoch: 13   Global Step: 227560   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:18:55,905-Speed 3256.82 samples/sec   Loss 0.8646   LearningRate 0.0101   Epoch: 13   Global Step: 227570   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:18:59,035-Speed 3272.57 samples/sec   Loss 0.8826   LearningRate 0.0101   Epoch: 13   Global Step: 227580   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:19:02,209-Speed 3226.00 samples/sec   Loss 0.8629   LearningRate 0.0101   Epoch: 13   Global Step: 227590   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:19:05,283-Speed 3332.58 samples/sec   Loss 0.8793   LearningRate 0.0101   Epoch: 13   Global Step: 227600   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:19:08,368-Speed 3320.01 samples/sec   Loss 0.8566   LearningRate 0.0101   Epoch: 13   Global Step: 227610   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:19:11,459-Speed 3314.14 samples/sec   Loss 0.8653   LearningRate 0.0101   Epoch: 13   Global Step: 227620   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:19:14,547-Speed 3317.05 samples/sec   Loss 0.8997   LearningRate 0.0101   Epoch: 13   Global Step: 227630   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:19:17,744-Speed 3203.63 samples/sec   Loss 0.8740   LearningRate 0.0101   Epoch: 13   Global Step: 227640   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:19:20,798-Speed 3352.84 samples/sec   Loss 0.9003   LearningRate 0.0101   Epoch: 13   Global Step: 227650   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-11 23:19:23,864-Speed 3341.47 samples/sec   Loss 0.8558   LearningRate 0.0101   Epoch: 13   Global Step: 227660   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-11 23:19:27,019-Speed 3245.61 samples/sec   Loss 0.8612   LearningRate 0.0101   Epoch: 13   Global Step: 227670   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-11 23:19:30,127-Speed 3295.39 samples/sec   Loss 0.8372   LearningRate 0.0101   Epoch: 13   Global Step: 227680   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-11 23:19:33,213-Speed 3318.59 samples/sec   Loss 0.8915   LearningRate 0.0101   Epoch: 13   Global Step: 227690   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-11 23:19:36,315-Speed 3302.68 samples/sec   Loss 0.9060   LearningRate 0.0101   Epoch: 13   Global Step: 227700   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-11 23:19:39,431-Speed 3286.70 samples/sec   Loss 0.8652   LearningRate 0.0101   Epoch: 13   Global Step: 227710   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-11 23:19:42,571-Speed 3262.85 samples/sec   Loss 0.8912   LearningRate 0.0101   Epoch: 13   Global Step: 227720   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-11 23:19:45,642-Speed 3334.70 samples/sec   Loss 0.8476   LearningRate 0.0101   Epoch: 13   Global Step: 227730   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-11 23:19:48,744-Speed 3301.76 samples/sec   Loss 0.8782   LearningRate 0.0101   Epoch: 13   Global Step: 227740   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-11 23:19:51,818-Speed 3331.25 samples/sec   Loss 0.8721   LearningRate 0.0101   Epoch: 13   Global Step: 227750   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:19:54,908-Speed 3314.70 samples/sec   Loss 0.8453   LearningRate 0.0101   Epoch: 13   Global Step: 227760   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:19:57,993-Speed 3320.19 samples/sec   Loss 0.8983   LearningRate 0.0101   Epoch: 13   Global Step: 227770   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:20:01,140-Speed 3255.05 samples/sec   Loss 0.8635   LearningRate 0.0101   Epoch: 13   Global Step: 227780   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:20:04,266-Speed 3276.41 samples/sec   Loss 0.8554   LearningRate 0.0101   Epoch: 13   Global Step: 227790   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:20:07,369-Speed 3301.07 samples/sec   Loss 0.8772   LearningRate 0.0101   Epoch: 13   Global Step: 227800   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:20:10,441-Speed 3333.82 samples/sec   Loss 0.8911   LearningRate 0.0101   Epoch: 13   Global Step: 227810   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:20:13,579-Speed 3264.40 samples/sec   Loss 0.9042   LearningRate 0.0101   Epoch: 13   Global Step: 227820   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:20:16,757-Speed 3222.15 samples/sec   Loss 0.8126   LearningRate 0.0101   Epoch: 13   Global Step: 227830   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:20:19,829-Speed 3333.93 samples/sec   Loss 0.8667   LearningRate 0.0101   Epoch: 13   Global Step: 227840   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:20:22,960-Speed 3271.55 samples/sec   Loss 0.8677   LearningRate 0.0101   Epoch: 13   Global Step: 227850   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 23:20:26,105-Speed 3257.03 samples/sec   Loss 0.8347   LearningRate 0.0101   Epoch: 13   Global Step: 227860   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 23:20:29,253-Speed 3253.26 samples/sec   Loss 0.8529   LearningRate 0.0101   Epoch: 13   Global Step: 227870   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 23:20:32,397-Speed 3257.62 samples/sec   Loss 0.8731   LearningRate 0.0101   Epoch: 13   Global Step: 227880   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 23:20:35,472-Speed 3331.24 samples/sec   Loss 0.8598   LearningRate 0.0101   Epoch: 13   Global Step: 227890   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 23:20:38,640-Speed 3233.13 samples/sec   Loss 0.8804   LearningRate 0.0101   Epoch: 13   Global Step: 227900   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 23:20:41,885-Speed 3156.61 samples/sec   Loss 0.8898   LearningRate 0.0101   Epoch: 13   Global Step: 227910   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 23:20:44,955-Speed 3335.80 samples/sec   Loss 0.8725   LearningRate 0.0101   Epoch: 13   Global Step: 227920   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 23:20:48,080-Speed 3277.29 samples/sec   Loss 0.8786   LearningRate 0.0101   Epoch: 13   Global Step: 227930   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:20:51,282-Speed 3198.83 samples/sec   Loss 0.8841   LearningRate 0.0101   Epoch: 13   Global Step: 227940   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:20:54,365-Speed 3321.91 samples/sec   Loss 0.8481   LearningRate 0.0101   Epoch: 13   Global Step: 227950   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:20:57,475-Speed 3293.52 samples/sec   Loss 0.8599   LearningRate 0.0101   Epoch: 13   Global Step: 227960   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:21:00,603-Speed 3275.16 samples/sec   Loss 0.8439   LearningRate 0.0101   Epoch: 13   Global Step: 227970   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:21:03,763-Speed 3240.54 samples/sec   Loss 0.8463   LearningRate 0.0101   Epoch: 13   Global Step: 227980   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:21:06,844-Speed 3324.26 samples/sec   Loss 0.9148   LearningRate 0.0101   Epoch: 13   Global Step: 227990   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:21:10,011-Speed 3234.70 samples/sec   Loss 0.8880   LearningRate 0.0100   Epoch: 13   Global Step: 228000   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:21:54,542-[lfw][228000]XNorm: 21.231028
Training: 2022-04-11 23:21:54,542-[lfw][228000]Accuracy-Flip: 0.99767+-0.00281
Training: 2022-04-11 23:21:54,543-[lfw][228000]Accuracy-Highest: 0.99817
Training: 2022-04-11 23:22:46,346-[cfp_fp][228000]XNorm: 21.554895
Training: 2022-04-11 23:22:46,346-[cfp_fp][228000]Accuracy-Flip: 0.99029+-0.00398
Training: 2022-04-11 23:22:46,347-[cfp_fp][228000]Accuracy-Highest: 0.99129
Training: 2022-04-11 23:23:30,788-[agedb_30][228000]XNorm: 22.315786
Training: 2022-04-11 23:23:30,789-[agedb_30][228000]Accuracy-Flip: 0.98533+-0.00678
Training: 2022-04-11 23:23:30,789-[agedb_30][228000]Accuracy-Highest: 0.98567
Training: 2022-04-11 23:23:33,860-Speed 71.19 samples/sec   Loss 0.8370   LearningRate 0.0100   Epoch: 13   Global Step: 228010   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:23:36,920-Speed 3347.20 samples/sec   Loss 0.8347   LearningRate 0.0100   Epoch: 13   Global Step: 228020   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:23:39,996-Speed 3329.77 samples/sec   Loss 0.8765   LearningRate 0.0100   Epoch: 13   Global Step: 228030   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 23:23:43,058-Speed 3345.68 samples/sec   Loss 0.8950   LearningRate 0.0100   Epoch: 13   Global Step: 228040   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 23:23:46,138-Speed 3324.78 samples/sec   Loss 0.8518   LearningRate 0.0100   Epoch: 13   Global Step: 228050   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 23:23:49,251-Speed 3290.73 samples/sec   Loss 0.8796   LearningRate 0.0100   Epoch: 13   Global Step: 228060   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 23:23:52,381-Speed 3271.85 samples/sec   Loss 0.8896   LearningRate 0.0100   Epoch: 13   Global Step: 228070   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 23:23:55,485-Speed 3300.11 samples/sec   Loss 0.8297   LearningRate 0.0100   Epoch: 13   Global Step: 228080   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 23:23:58,601-Speed 3287.21 samples/sec   Loss 0.8794   LearningRate 0.0100   Epoch: 13   Global Step: 228090   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 23:24:01,671-Speed 3336.36 samples/sec   Loss 0.8995   LearningRate 0.0100   Epoch: 13   Global Step: 228100   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 23:24:04,762-Speed 3313.82 samples/sec   Loss 0.8699   LearningRate 0.0100   Epoch: 13   Global Step: 228110   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 23:24:07,837-Speed 3329.74 samples/sec   Loss 0.8877   LearningRate 0.0100   Epoch: 13   Global Step: 228120   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 23:24:10,896-Speed 3348.99 samples/sec   Loss 0.8860   LearningRate 0.0100   Epoch: 13   Global Step: 228130   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 23:24:13,982-Speed 3318.50 samples/sec   Loss 0.8694   LearningRate 0.0100   Epoch: 13   Global Step: 228140   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:24:17,108-Speed 3276.57 samples/sec   Loss 0.8933   LearningRate 0.0100   Epoch: 13   Global Step: 228150   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:24:20,228-Speed 3283.40 samples/sec   Loss 0.8712   LearningRate 0.0100   Epoch: 13   Global Step: 228160   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:24:23,367-Speed 3262.85 samples/sec   Loss 0.8575   LearningRate 0.0100   Epoch: 13   Global Step: 228170   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:24:26,517-Speed 3251.61 samples/sec   Loss 0.8746   LearningRate 0.0100   Epoch: 13   Global Step: 228180   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:24:29,641-Speed 3279.25 samples/sec   Loss 0.8680   LearningRate 0.0100   Epoch: 13   Global Step: 228190   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:24:32,781-Speed 3261.79 samples/sec   Loss 0.9187   LearningRate 0.0100   Epoch: 13   Global Step: 228200   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:24:35,900-Speed 3284.07 samples/sec   Loss 0.8719   LearningRate 0.0100   Epoch: 13   Global Step: 228210   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:24:38,983-Speed 3321.87 samples/sec   Loss 0.8562   LearningRate 0.0100   Epoch: 13   Global Step: 228220   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:24:42,140-Speed 3244.08 samples/sec   Loss 0.9386   LearningRate 0.0100   Epoch: 13   Global Step: 228230   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:24:45,230-Speed 3314.91 samples/sec   Loss 0.8830   LearningRate 0.0100   Epoch: 13   Global Step: 228240   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:24:48,368-Speed 3264.62 samples/sec   Loss 0.8984   LearningRate 0.0100   Epoch: 13   Global Step: 228250   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:24:51,529-Speed 3239.63 samples/sec   Loss 0.8807   LearningRate 0.0100   Epoch: 13   Global Step: 228260   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:24:54,739-Speed 3191.29 samples/sec   Loss 0.8943   LearningRate 0.0100   Epoch: 13   Global Step: 228270   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:24:57,914-Speed 3225.50 samples/sec   Loss 0.8624   LearningRate 0.0100   Epoch: 13   Global Step: 228280   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:25:01,062-Speed 3253.76 samples/sec   Loss 0.8540   LearningRate 0.0100   Epoch: 13   Global Step: 228290   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:25:04,247-Speed 3215.84 samples/sec   Loss 0.8559   LearningRate 0.0100   Epoch: 13   Global Step: 228300   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:25:07,426-Speed 3220.96 samples/sec   Loss 0.8549   LearningRate 0.0100   Epoch: 13   Global Step: 228310   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:25:10,516-Speed 3315.45 samples/sec   Loss 0.8867   LearningRate 0.0100   Epoch: 13   Global Step: 228320   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:25:13,650-Speed 3268.87 samples/sec   Loss 0.9065   LearningRate 0.0100   Epoch: 13   Global Step: 228330   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:25:16,816-Speed 3234.50 samples/sec   Loss 0.8760   LearningRate 0.0100   Epoch: 13   Global Step: 228340   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 23:25:19,878-Speed 3345.23 samples/sec   Loss 0.8883   LearningRate 0.0100   Epoch: 13   Global Step: 228350   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:25:22,938-Speed 3347.78 samples/sec   Loss 0.8542   LearningRate 0.0100   Epoch: 13   Global Step: 228360   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-11 23:25:26,035-Speed 3307.01 samples/sec   Loss 0.8750   LearningRate 0.0100   Epoch: 13   Global Step: 228370   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-11 23:25:29,207-Speed 3229.06 samples/sec   Loss 0.8390   LearningRate 0.0100   Epoch: 13   Global Step: 228380   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-11 23:25:32,301-Speed 3310.26 samples/sec   Loss 0.8793   LearningRate 0.0100   Epoch: 13   Global Step: 228390   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-11 23:25:35,362-Speed 3345.35 samples/sec   Loss 0.8532   LearningRate 0.0100   Epoch: 13   Global Step: 228400   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-11 23:25:38,477-Speed 3287.91 samples/sec   Loss 0.8694   LearningRate 0.0100   Epoch: 13   Global Step: 228410   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-11 23:25:41,573-Speed 3308.80 samples/sec   Loss 0.9071   LearningRate 0.0100   Epoch: 13   Global Step: 228420   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-11 23:25:44,695-Speed 3280.23 samples/sec   Loss 0.8779   LearningRate 0.0100   Epoch: 13   Global Step: 228430   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-11 23:25:47,891-Speed 3205.24 samples/sec   Loss 0.8799   LearningRate 0.0100   Epoch: 13   Global Step: 228440   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-11 23:25:50,970-Speed 3325.65 samples/sec   Loss 0.8540   LearningRate 0.0100   Epoch: 13   Global Step: 228450   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-11 23:25:54,061-Speed 3314.20 samples/sec   Loss 0.8381   LearningRate 0.0100   Epoch: 13   Global Step: 228460   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:25:57,171-Speed 3292.68 samples/sec   Loss 0.8362   LearningRate 0.0100   Epoch: 13   Global Step: 228470   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:26:00,301-Speed 3273.33 samples/sec   Loss 0.8873   LearningRate 0.0100   Epoch: 13   Global Step: 228480   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:26:03,429-Speed 3274.26 samples/sec   Loss 0.8915   LearningRate 0.0100   Epoch: 13   Global Step: 228490   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:26:06,588-Speed 3241.81 samples/sec   Loss 0.8520   LearningRate 0.0100   Epoch: 13   Global Step: 228500   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:26:09,696-Speed 3295.44 samples/sec   Loss 0.8306   LearningRate 0.0100   Epoch: 13   Global Step: 228510   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:26:12,807-Speed 3293.09 samples/sec   Loss 0.8658   LearningRate 0.0100   Epoch: 13   Global Step: 228520   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:26:15,879-Speed 3333.14 samples/sec   Loss 0.8741   LearningRate 0.0099   Epoch: 13   Global Step: 228530   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:26:18,980-Speed 3303.02 samples/sec   Loss 0.8646   LearningRate 0.0099   Epoch: 13   Global Step: 228540   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:26:22,078-Speed 3306.08 samples/sec   Loss 0.8695   LearningRate 0.0099   Epoch: 13   Global Step: 228550   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:26:25,205-Speed 3276.10 samples/sec   Loss 0.8714   LearningRate 0.0099   Epoch: 13   Global Step: 228560   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 23:26:28,319-Speed 3288.28 samples/sec   Loss 0.8787   LearningRate 0.0099   Epoch: 13   Global Step: 228570   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 23:26:31,410-Speed 3314.34 samples/sec   Loss 0.8672   LearningRate 0.0099   Epoch: 13   Global Step: 228580   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 23:26:34,499-Speed 3316.24 samples/sec   Loss 0.8491   LearningRate 0.0099   Epoch: 13   Global Step: 228590   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:26:37,566-Speed 3339.10 samples/sec   Loss 0.8783   LearningRate 0.0099   Epoch: 13   Global Step: 228600   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:26:40,721-Speed 3246.48 samples/sec   Loss 0.9021   LearningRate 0.0099   Epoch: 13   Global Step: 228610   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:26:43,866-Speed 3256.73 samples/sec   Loss 0.8795   LearningRate 0.0099   Epoch: 13   Global Step: 228620   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:26:46,945-Speed 3325.93 samples/sec   Loss 0.8293   LearningRate 0.0099   Epoch: 13   Global Step: 228630   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:26:50,018-Speed 3332.55 samples/sec   Loss 0.8735   LearningRate 0.0099   Epoch: 13   Global Step: 228640   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:26:53,125-Speed 3297.22 samples/sec   Loss 0.8419   LearningRate 0.0099   Epoch: 13   Global Step: 228650   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:26:56,235-Speed 3293.24 samples/sec   Loss 0.8755   LearningRate 0.0099   Epoch: 13   Global Step: 228660   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:26:59,311-Speed 3329.81 samples/sec   Loss 0.8582   LearningRate 0.0099   Epoch: 13   Global Step: 228670   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:27:02,393-Speed 3323.06 samples/sec   Loss 0.8645   LearningRate 0.0099   Epoch: 13   Global Step: 228680   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:27:05,604-Speed 3190.00 samples/sec   Loss 0.8826   LearningRate 0.0099   Epoch: 13   Global Step: 228690   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 23:27:08,774-Speed 3231.17 samples/sec   Loss 0.8934   LearningRate 0.0099   Epoch: 13   Global Step: 228700   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:27:11,843-Speed 3337.01 samples/sec   Loss 0.8511   LearningRate 0.0099   Epoch: 13   Global Step: 228710   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:27:14,928-Speed 3320.04 samples/sec   Loss 0.8588   LearningRate 0.0099   Epoch: 13   Global Step: 228720   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:27:18,000-Speed 3334.04 samples/sec   Loss 0.8857   LearningRate 0.0099   Epoch: 13   Global Step: 228730   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:27:21,084-Speed 3320.81 samples/sec   Loss 0.8906   LearningRate 0.0099   Epoch: 13   Global Step: 228740   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:27:24,153-Speed 3337.54 samples/sec   Loss 0.8873   LearningRate 0.0099   Epoch: 13   Global Step: 228750   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:27:27,216-Speed 3344.01 samples/sec   Loss 0.8667   LearningRate 0.0099   Epoch: 13   Global Step: 228760   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:27:30,309-Speed 3311.97 samples/sec   Loss 0.8927   LearningRate 0.0099   Epoch: 13   Global Step: 228770   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:27:33,398-Speed 3315.55 samples/sec   Loss 0.8622   LearningRate 0.0099   Epoch: 13   Global Step: 228780   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:27:36,499-Speed 3302.61 samples/sec   Loss 0.8422   LearningRate 0.0099   Epoch: 13   Global Step: 228790   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:27:39,602-Speed 3300.51 samples/sec   Loss 0.8809   LearningRate 0.0099   Epoch: 13   Global Step: 228800   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 23:27:42,751-Speed 3252.94 samples/sec   Loss 0.8711   LearningRate 0.0099   Epoch: 13   Global Step: 228810   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 23:27:45,829-Speed 3327.80 samples/sec   Loss 0.8733   LearningRate 0.0099   Epoch: 13   Global Step: 228820   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 23:27:48,897-Speed 3337.71 samples/sec   Loss 0.8359   LearningRate 0.0099   Epoch: 13   Global Step: 228830   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 23:27:51,963-Speed 3341.29 samples/sec   Loss 0.8658   LearningRate 0.0099   Epoch: 13   Global Step: 228840   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 23:27:55,034-Speed 3335.13 samples/sec   Loss 0.8623   LearningRate 0.0099   Epoch: 13   Global Step: 228850   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 23:27:58,106-Speed 3333.77 samples/sec   Loss 0.8945   LearningRate 0.0099   Epoch: 13   Global Step: 228860   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 23:28:01,186-Speed 3325.79 samples/sec   Loss 0.8832   LearningRate 0.0099   Epoch: 13   Global Step: 228870   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 23:28:04,250-Speed 3342.81 samples/sec   Loss 0.8804   LearningRate 0.0099   Epoch: 13   Global Step: 228880   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 23:28:07,327-Speed 3328.10 samples/sec   Loss 0.8675   LearningRate 0.0099   Epoch: 13   Global Step: 228890   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 23:28:10,397-Speed 3336.25 samples/sec   Loss 0.8138   LearningRate 0.0099   Epoch: 13   Global Step: 228900   Fp16 Grad Scale: 262144   Required: 11 hours
Training: 2022-04-11 23:28:13,539-Speed 3260.48 samples/sec   Loss 0.8552   LearningRate 0.0099   Epoch: 13   Global Step: 228910   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 23:28:16,636-Speed 3306.26 samples/sec   Loss 0.8677   LearningRate 0.0099   Epoch: 13   Global Step: 228920   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 23:28:19,711-Speed 3331.29 samples/sec   Loss 0.8324   LearningRate 0.0099   Epoch: 13   Global Step: 228930   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 23:28:22,826-Speed 3288.84 samples/sec   Loss 0.8663   LearningRate 0.0099   Epoch: 13   Global Step: 228940   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 23:28:25,917-Speed 3313.02 samples/sec   Loss 0.8908   LearningRate 0.0099   Epoch: 13   Global Step: 228950   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 23:28:29,003-Speed 3318.88 samples/sec   Loss 0.8696   LearningRate 0.0099   Epoch: 13   Global Step: 228960   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 23:28:32,078-Speed 3331.41 samples/sec   Loss 0.8478   LearningRate 0.0099   Epoch: 13   Global Step: 228970   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 23:28:35,148-Speed 3335.61 samples/sec   Loss 0.8512   LearningRate 0.0099   Epoch: 13   Global Step: 228980   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 23:28:38,231-Speed 3322.21 samples/sec   Loss 0.8918   LearningRate 0.0099   Epoch: 13   Global Step: 228990   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 23:28:41,301-Speed 3336.85 samples/sec   Loss 0.8532   LearningRate 0.0099   Epoch: 13   Global Step: 229000   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 23:28:44,434-Speed 3268.32 samples/sec   Loss 0.8284   LearningRate 0.0099   Epoch: 13   Global Step: 229010   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 23:28:47,532-Speed 3307.33 samples/sec   Loss 0.8719   LearningRate 0.0099   Epoch: 13   Global Step: 229020   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:28:50,593-Speed 3346.05 samples/sec   Loss 0.8794   LearningRate 0.0099   Epoch: 13   Global Step: 229030   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:28:53,661-Speed 3338.24 samples/sec   Loss 0.8562   LearningRate 0.0099   Epoch: 13   Global Step: 229040   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:28:56,742-Speed 3324.43 samples/sec   Loss 0.8448   LearningRate 0.0099   Epoch: 13   Global Step: 229050   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:28:59,817-Speed 3330.23 samples/sec   Loss 0.8966   LearningRate 0.0098   Epoch: 13   Global Step: 229060   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:29:02,898-Speed 3324.11 samples/sec   Loss 0.9003   LearningRate 0.0098   Epoch: 13   Global Step: 229070   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:29:05,971-Speed 3332.84 samples/sec   Loss 0.8839   LearningRate 0.0098   Epoch: 13   Global Step: 229080   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:29:09,056-Speed 3320.47 samples/sec   Loss 0.8733   LearningRate 0.0098   Epoch: 13   Global Step: 229090   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:29:12,138-Speed 3324.13 samples/sec   Loss 0.8649   LearningRate 0.0098   Epoch: 13   Global Step: 229100   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:29:15,264-Speed 3276.25 samples/sec   Loss 0.8981   LearningRate 0.0098   Epoch: 13   Global Step: 229110   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:29:18,346-Speed 3323.16 samples/sec   Loss 0.8911   LearningRate 0.0098   Epoch: 13   Global Step: 229120   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 23:29:21,422-Speed 3329.74 samples/sec   Loss 0.8849   LearningRate 0.0098   Epoch: 13   Global Step: 229130   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 23:29:24,552-Speed 3271.79 samples/sec   Loss 0.8579   LearningRate 0.0098   Epoch: 13   Global Step: 229140   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 23:29:27,763-Speed 3190.10 samples/sec   Loss 0.8499   LearningRate 0.0098   Epoch: 13   Global Step: 229150   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 23:29:30,845-Speed 3323.14 samples/sec   Loss 0.8809   LearningRate 0.0098   Epoch: 13   Global Step: 229160   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 23:29:34,014-Speed 3231.87 samples/sec   Loss 0.8977   LearningRate 0.0098   Epoch: 13   Global Step: 229170   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 23:29:37,217-Speed 3197.45 samples/sec   Loss 0.8511   LearningRate 0.0098   Epoch: 13   Global Step: 229180   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 23:29:40,287-Speed 3337.12 samples/sec   Loss 0.8965   LearningRate 0.0098   Epoch: 13   Global Step: 229190   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 23:29:43,510-Speed 3177.83 samples/sec   Loss 0.8677   LearningRate 0.0098   Epoch: 13   Global Step: 229200   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 23:29:46,581-Speed 3335.10 samples/sec   Loss 0.8830   LearningRate 0.0098   Epoch: 13   Global Step: 229210   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 23:29:49,625-Speed 3365.16 samples/sec   Loss 0.8597   LearningRate 0.0098   Epoch: 13   Global Step: 229220   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:29:52,741-Speed 3286.54 samples/sec   Loss 0.8616   LearningRate 0.0098   Epoch: 13   Global Step: 229230   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:29:55,938-Speed 3203.56 samples/sec   Loss 0.8192   LearningRate 0.0098   Epoch: 13   Global Step: 229240   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:29:59,055-Speed 3285.53 samples/sec   Loss 0.8640   LearningRate 0.0098   Epoch: 13   Global Step: 229250   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:30:02,132-Speed 3329.53 samples/sec   Loss 0.8711   LearningRate 0.0098   Epoch: 13   Global Step: 229260   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:30:05,227-Speed 3309.61 samples/sec   Loss 0.8389   LearningRate 0.0098   Epoch: 13   Global Step: 229270   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:30:08,305-Speed 3327.49 samples/sec   Loss 0.8918   LearningRate 0.0098   Epoch: 13   Global Step: 229280   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:30:11,402-Speed 3307.34 samples/sec   Loss 0.8629   LearningRate 0.0098   Epoch: 13   Global Step: 229290   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:30:14,632-Speed 3171.19 samples/sec   Loss 0.8542   LearningRate 0.0098   Epoch: 13   Global Step: 229300   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:30:17,778-Speed 3255.24 samples/sec   Loss 0.8642   LearningRate 0.0098   Epoch: 13   Global Step: 229310   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:30:20,853-Speed 3330.83 samples/sec   Loss 0.8780   LearningRate 0.0098   Epoch: 13   Global Step: 229320   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 23:30:23,923-Speed 3335.93 samples/sec   Loss 0.8491   LearningRate 0.0098   Epoch: 13   Global Step: 229330   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 23:30:26,985-Speed 3345.35 samples/sec   Loss 0.8828   LearningRate 0.0098   Epoch: 13   Global Step: 229340   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:30:30,063-Speed 3327.13 samples/sec   Loss 0.9372   LearningRate 0.0098   Epoch: 13   Global Step: 229350   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:30:33,170-Speed 3296.22 samples/sec   Loss 0.8886   LearningRate 0.0098   Epoch: 13   Global Step: 229360   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:30:36,247-Speed 3329.60 samples/sec   Loss 0.8438   LearningRate 0.0098   Epoch: 13   Global Step: 229370   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:30:39,327-Speed 3324.88 samples/sec   Loss 0.8740   LearningRate 0.0098   Epoch: 13   Global Step: 229380   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:30:42,417-Speed 3314.79 samples/sec   Loss 0.8943   LearningRate 0.0098   Epoch: 13   Global Step: 229390   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:30:45,488-Speed 3335.64 samples/sec   Loss 0.8939   LearningRate 0.0098   Epoch: 13   Global Step: 229400   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:30:48,592-Speed 3299.54 samples/sec   Loss 0.8657   LearningRate 0.0098   Epoch: 13   Global Step: 229410   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:30:51,670-Speed 3326.78 samples/sec   Loss 0.8345   LearningRate 0.0098   Epoch: 13   Global Step: 229420   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:30:54,814-Speed 3259.12 samples/sec   Loss 0.8729   LearningRate 0.0098   Epoch: 13   Global Step: 229430   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:30:57,899-Speed 3319.85 samples/sec   Loss 0.8686   LearningRate 0.0098   Epoch: 13   Global Step: 229440   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 23:31:01,050-Speed 3250.37 samples/sec   Loss 0.8568   LearningRate 0.0098   Epoch: 13   Global Step: 229450   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:31:04,175-Speed 3278.15 samples/sec   Loss 0.8645   LearningRate 0.0098   Epoch: 13   Global Step: 229460   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:31:07,310-Speed 3266.98 samples/sec   Loss 0.8754   LearningRate 0.0098   Epoch: 13   Global Step: 229470   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:31:10,377-Speed 3339.30 samples/sec   Loss 0.8632   LearningRate 0.0098   Epoch: 13   Global Step: 229480   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:31:13,474-Speed 3306.94 samples/sec   Loss 0.8645   LearningRate 0.0098   Epoch: 13   Global Step: 229490   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:31:16,588-Speed 3288.86 samples/sec   Loss 0.8926   LearningRate 0.0098   Epoch: 13   Global Step: 229500   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:31:19,687-Speed 3305.10 samples/sec   Loss 0.9452   LearningRate 0.0098   Epoch: 13   Global Step: 229510   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:31:22,868-Speed 3220.26 samples/sec   Loss 0.8637   LearningRate 0.0098   Epoch: 13   Global Step: 229520   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:31:25,939-Speed 3335.44 samples/sec   Loss 0.9076   LearningRate 0.0098   Epoch: 13   Global Step: 229530   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:31:29,021-Speed 3323.10 samples/sec   Loss 0.8741   LearningRate 0.0098   Epoch: 13   Global Step: 229540   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:31:32,099-Speed 3327.11 samples/sec   Loss 0.9002   LearningRate 0.0098   Epoch: 13   Global Step: 229550   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:31:35,189-Speed 3315.12 samples/sec   Loss 0.8476   LearningRate 0.0098   Epoch: 13   Global Step: 229560   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:31:38,271-Speed 3322.74 samples/sec   Loss 0.8843   LearningRate 0.0098   Epoch: 13   Global Step: 229570   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:31:41,338-Speed 3339.55 samples/sec   Loss 0.8865   LearningRate 0.0098   Epoch: 13   Global Step: 229580   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:31:44,408-Speed 3336.28 samples/sec   Loss 0.8689   LearningRate 0.0097   Epoch: 13   Global Step: 229590   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:31:47,488-Speed 3325.05 samples/sec   Loss 0.8829   LearningRate 0.0097   Epoch: 13   Global Step: 229600   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:31:50,589-Speed 3303.04 samples/sec   Loss 0.8731   LearningRate 0.0097   Epoch: 13   Global Step: 229610   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:31:53,669-Speed 3325.40 samples/sec   Loss 0.8699   LearningRate 0.0097   Epoch: 13   Global Step: 229620   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:31:56,754-Speed 3320.57 samples/sec   Loss 0.8565   LearningRate 0.0097   Epoch: 13   Global Step: 229630   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:31:59,869-Speed 3287.79 samples/sec   Loss 0.8446   LearningRate 0.0097   Epoch: 13   Global Step: 229640   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:32:02,970-Speed 3302.35 samples/sec   Loss 0.8750   LearningRate 0.0097   Epoch: 13   Global Step: 229650   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 23:32:06,038-Speed 3339.06 samples/sec   Loss 0.8543   LearningRate 0.0097   Epoch: 13   Global Step: 229660   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 23:32:09,120-Speed 3323.02 samples/sec   Loss 0.8921   LearningRate 0.0097   Epoch: 13   Global Step: 229670   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 23:32:12,190-Speed 3336.16 samples/sec   Loss 0.8798   LearningRate 0.0097   Epoch: 13   Global Step: 229680   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:32:15,252-Speed 3344.74 samples/sec   Loss 0.9116   LearningRate 0.0097   Epoch: 13   Global Step: 229690   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:32:18,334-Speed 3323.73 samples/sec   Loss 0.8516   LearningRate 0.0097   Epoch: 13   Global Step: 229700   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:32:21,400-Speed 3340.48 samples/sec   Loss 0.8558   LearningRate 0.0097   Epoch: 13   Global Step: 229710   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:32:24,485-Speed 3320.94 samples/sec   Loss 0.8910   LearningRate 0.0097   Epoch: 13   Global Step: 229720   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:32:27,580-Speed 3309.07 samples/sec   Loss 0.9236   LearningRate 0.0097   Epoch: 13   Global Step: 229730   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:32:30,691-Speed 3292.10 samples/sec   Loss 0.9238   LearningRate 0.0097   Epoch: 13   Global Step: 229740   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:32:33,883-Speed 3208.83 samples/sec   Loss 0.8409   LearningRate 0.0097   Epoch: 13   Global Step: 229750   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:32:36,962-Speed 3325.80 samples/sec   Loss 0.8780   LearningRate 0.0097   Epoch: 13   Global Step: 229760   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:32:40,041-Speed 3326.53 samples/sec   Loss 0.8856   LearningRate 0.0097   Epoch: 13   Global Step: 229770   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:32:43,106-Speed 3341.99 samples/sec   Loss 0.8737   LearningRate 0.0097   Epoch: 13   Global Step: 229780   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 23:32:46,188-Speed 3323.39 samples/sec   Loss 0.8956   LearningRate 0.0097   Epoch: 13   Global Step: 229790   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 23:32:49,320-Speed 3270.05 samples/sec   Loss 0.8740   LearningRate 0.0097   Epoch: 13   Global Step: 229800   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 23:32:52,471-Speed 3251.14 samples/sec   Loss 0.8709   LearningRate 0.0097   Epoch: 13   Global Step: 229810   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 23:32:55,544-Speed 3332.88 samples/sec   Loss 0.8400   LearningRate 0.0097   Epoch: 13   Global Step: 229820   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 23:32:58,617-Speed 3332.70 samples/sec   Loss 0.8564   LearningRate 0.0097   Epoch: 13   Global Step: 229830   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 23:33:01,685-Speed 3338.19 samples/sec   Loss 0.8249   LearningRate 0.0097   Epoch: 13   Global Step: 229840   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 23:33:04,762-Speed 3328.42 samples/sec   Loss 0.8542   LearningRate 0.0097   Epoch: 13   Global Step: 229850   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 23:33:07,837-Speed 3331.27 samples/sec   Loss 0.8548   LearningRate 0.0097   Epoch: 13   Global Step: 229860   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 23:33:10,915-Speed 3327.39 samples/sec   Loss 0.8713   LearningRate 0.0097   Epoch: 13   Global Step: 229870   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 23:33:14,001-Speed 3319.72 samples/sec   Loss 0.8808   LearningRate 0.0097   Epoch: 13   Global Step: 229880   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 23:33:17,059-Speed 3349.10 samples/sec   Loss 0.8609   LearningRate 0.0097   Epoch: 13   Global Step: 229890   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:33:20,138-Speed 3325.99 samples/sec   Loss 0.8594   LearningRate 0.0097   Epoch: 13   Global Step: 229900   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:33:23,210-Speed 3334.33 samples/sec   Loss 0.8614   LearningRate 0.0097   Epoch: 13   Global Step: 229910   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:33:26,282-Speed 3334.51 samples/sec   Loss 0.8807   LearningRate 0.0097   Epoch: 13   Global Step: 229920   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:33:29,367-Speed 3319.89 samples/sec   Loss 0.8752   LearningRate 0.0097   Epoch: 13   Global Step: 229930   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:33:32,471-Speed 3299.45 samples/sec   Loss 0.8871   LearningRate 0.0097   Epoch: 13   Global Step: 229940   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:33:35,538-Speed 3339.19 samples/sec   Loss 0.8513   LearningRate 0.0097   Epoch: 13   Global Step: 229950   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:33:38,627-Speed 3315.47 samples/sec   Loss 0.8686   LearningRate 0.0097   Epoch: 13   Global Step: 229960   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:33:41,712-Speed 3321.32 samples/sec   Loss 0.8534   LearningRate 0.0097   Epoch: 13   Global Step: 229970   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:33:44,921-Speed 3191.42 samples/sec   Loss 0.8699   LearningRate 0.0097   Epoch: 13   Global Step: 229980   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:33:48,037-Speed 3286.65 samples/sec   Loss 0.8695   LearningRate 0.0097   Epoch: 13   Global Step: 229990   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:33:51,197-Speed 3241.02 samples/sec   Loss 0.8706   LearningRate 0.0097   Epoch: 13   Global Step: 230000   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:34:35,070-[lfw][230000]XNorm: 22.659072
Training: 2022-04-11 23:34:35,070-[lfw][230000]Accuracy-Flip: 0.99750+-0.00261
Training: 2022-04-11 23:34:35,071-[lfw][230000]Accuracy-Highest: 0.99817
Training: 2022-04-11 23:35:26,039-[cfp_fp][230000]XNorm: 22.934451
Training: 2022-04-11 23:35:26,039-[cfp_fp][230000]Accuracy-Flip: 0.99057+-0.00430
Training: 2022-04-11 23:35:26,040-[cfp_fp][230000]Accuracy-Highest: 0.99129
Training: 2022-04-11 23:36:09,941-[agedb_30][230000]XNorm: 23.436146
Training: 2022-04-11 23:36:09,942-[agedb_30][230000]Accuracy-Flip: 0.98533+-0.00674
Training: 2022-04-11 23:36:09,942-[agedb_30][230000]Accuracy-Highest: 0.98567
Training: 2022-04-11 23:36:13,085-Speed 72.17 samples/sec   Loss 0.8507   LearningRate 0.0097   Epoch: 13   Global Step: 230010   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:36:16,277-Speed 3208.34 samples/sec   Loss 0.8498   LearningRate 0.0097   Epoch: 13   Global Step: 230020   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:36:19,337-Speed 3347.59 samples/sec   Loss 0.8580   LearningRate 0.0097   Epoch: 13   Global Step: 230030   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:36:22,439-Speed 3300.96 samples/sec   Loss 0.8504   LearningRate 0.0097   Epoch: 13   Global Step: 230040   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:36:25,536-Speed 3307.07 samples/sec   Loss 0.8518   LearningRate 0.0097   Epoch: 13   Global Step: 230050   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:36:28,658-Speed 3281.06 samples/sec   Loss 0.8860   LearningRate 0.0097   Epoch: 13   Global Step: 230060   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:36:31,743-Speed 3319.66 samples/sec   Loss 0.8163   LearningRate 0.0097   Epoch: 13   Global Step: 230070   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:36:34,825-Speed 3324.12 samples/sec   Loss 0.8595   LearningRate 0.0097   Epoch: 13   Global Step: 230080   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:36:37,978-Speed 3248.57 samples/sec   Loss 0.9038   LearningRate 0.0097   Epoch: 13   Global Step: 230090   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 23:36:41,183-Speed 3195.64 samples/sec   Loss 0.8318   LearningRate 0.0097   Epoch: 13   Global Step: 230100   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 23:36:44,297-Speed 3289.02 samples/sec   Loss 0.8299   LearningRate 0.0097   Epoch: 13   Global Step: 230110   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 23:36:47,364-Speed 3338.84 samples/sec   Loss 0.8613   LearningRate 0.0097   Epoch: 13   Global Step: 230120   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 23:36:50,431-Speed 3339.36 samples/sec   Loss 0.8439   LearningRate 0.0096   Epoch: 13   Global Step: 230130   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 23:36:53,509-Speed 3327.93 samples/sec   Loss 0.8940   LearningRate 0.0096   Epoch: 13   Global Step: 230140   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:36:56,584-Speed 3331.11 samples/sec   Loss 0.8522   LearningRate 0.0096   Epoch: 13   Global Step: 230150   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:36:59,683-Speed 3304.54 samples/sec   Loss 0.8983   LearningRate 0.0096   Epoch: 13   Global Step: 230160   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:37:02,822-Speed 3263.57 samples/sec   Loss 0.9116   LearningRate 0.0096   Epoch: 13   Global Step: 230170   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:37:05,919-Speed 3306.98 samples/sec   Loss 0.8952   LearningRate 0.0096   Epoch: 13   Global Step: 230180   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:37:09,002-Speed 3321.43 samples/sec   Loss 0.8891   LearningRate 0.0096   Epoch: 13   Global Step: 230190   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:37:12,098-Speed 3308.17 samples/sec   Loss 0.8538   LearningRate 0.0096   Epoch: 13   Global Step: 230200   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:37:15,167-Speed 3337.63 samples/sec   Loss 0.8563   LearningRate 0.0096   Epoch: 13   Global Step: 230210   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:37:18,243-Speed 3329.99 samples/sec   Loss 0.8748   LearningRate 0.0096   Epoch: 13   Global Step: 230220   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:37:21,318-Speed 3330.36 samples/sec   Loss 0.8514   LearningRate 0.0096   Epoch: 13   Global Step: 230230   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:37:24,391-Speed 3333.68 samples/sec   Loss 0.9117   LearningRate 0.0096   Epoch: 13   Global Step: 230240   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 23:37:27,486-Speed 3309.59 samples/sec   Loss 0.8996   LearningRate 0.0096   Epoch: 13   Global Step: 230250   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 23:37:30,620-Speed 3268.49 samples/sec   Loss 0.9086   LearningRate 0.0096   Epoch: 13   Global Step: 230260   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 23:37:33,684-Speed 3342.11 samples/sec   Loss 0.8545   LearningRate 0.0096   Epoch: 13   Global Step: 230270   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 23:37:36,772-Speed 3317.46 samples/sec   Loss 0.9155   LearningRate 0.0096   Epoch: 13   Global Step: 230280   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 23:37:39,839-Speed 3338.76 samples/sec   Loss 0.8489   LearningRate 0.0096   Epoch: 13   Global Step: 230290   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:37:42,962-Speed 3279.48 samples/sec   Loss 0.8758   LearningRate 0.0096   Epoch: 13   Global Step: 230300   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:37:46,153-Speed 3209.67 samples/sec   Loss 0.8380   LearningRate 0.0096   Epoch: 13   Global Step: 230310   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:37:49,242-Speed 3316.45 samples/sec   Loss 0.8382   LearningRate 0.0096   Epoch: 13   Global Step: 230320   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:37:52,355-Speed 3289.95 samples/sec   Loss 0.9073   LearningRate 0.0096   Epoch: 13   Global Step: 230330   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:37:55,424-Speed 3337.11 samples/sec   Loss 0.8926   LearningRate 0.0096   Epoch: 13   Global Step: 230340   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:37:58,539-Speed 3289.17 samples/sec   Loss 0.8739   LearningRate 0.0096   Epoch: 13   Global Step: 230350   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:38:01,604-Speed 3340.88 samples/sec   Loss 0.8548   LearningRate 0.0096   Epoch: 13   Global Step: 230360   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:38:04,701-Speed 3306.96 samples/sec   Loss 0.8606   LearningRate 0.0096   Epoch: 13   Global Step: 230370   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:38:07,851-Speed 3251.60 samples/sec   Loss 0.8849   LearningRate 0.0096   Epoch: 13   Global Step: 230380   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:38:11,096-Speed 3156.42 samples/sec   Loss 0.8877   LearningRate 0.0096   Epoch: 13   Global Step: 230390   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 23:38:14,332-Speed 3165.28 samples/sec   Loss 0.8741   LearningRate 0.0096   Epoch: 13   Global Step: 230400   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 23:38:17,540-Speed 3192.90 samples/sec   Loss 0.8600   LearningRate 0.0096   Epoch: 13   Global Step: 230410   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 23:38:20,670-Speed 3271.87 samples/sec   Loss 0.8400   LearningRate 0.0096   Epoch: 13   Global Step: 230420   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 23:38:23,738-Speed 3339.53 samples/sec   Loss 0.8423   LearningRate 0.0096   Epoch: 13   Global Step: 230430   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 23:38:26,797-Speed 3348.10 samples/sec   Loss 0.8817   LearningRate 0.0096   Epoch: 13   Global Step: 230440   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:38:29,884-Speed 3317.88 samples/sec   Loss 0.9290   LearningRate 0.0096   Epoch: 13   Global Step: 230450   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:38:33,060-Speed 3224.44 samples/sec   Loss 0.8696   LearningRate 0.0096   Epoch: 13   Global Step: 230460   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:38:36,250-Speed 3210.75 samples/sec   Loss 0.8972   LearningRate 0.0096   Epoch: 13   Global Step: 230470   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:38:39,445-Speed 3205.59 samples/sec   Loss 0.9008   LearningRate 0.0096   Epoch: 13   Global Step: 230480   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:38:42,692-Speed 3153.91 samples/sec   Loss 0.8751   LearningRate 0.0096   Epoch: 13   Global Step: 230490   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:38:45,791-Speed 3305.86 samples/sec   Loss 0.8501   LearningRate 0.0096   Epoch: 13   Global Step: 230500   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:38:48,934-Speed 3258.52 samples/sec   Loss 0.8939   LearningRate 0.0096   Epoch: 13   Global Step: 230510   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:38:52,019-Speed 3320.24 samples/sec   Loss 0.9150   LearningRate 0.0096   Epoch: 13   Global Step: 230520   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:38:55,088-Speed 3337.86 samples/sec   Loss 0.8917   LearningRate 0.0096   Epoch: 13   Global Step: 230530   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:38:58,162-Speed 3331.63 samples/sec   Loss 0.8522   LearningRate 0.0096   Epoch: 13   Global Step: 230540   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 23:39:01,250-Speed 3317.12 samples/sec   Loss 0.8587   LearningRate 0.0096   Epoch: 13   Global Step: 230550   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 23:39:04,351-Speed 3301.95 samples/sec   Loss 0.8804   LearningRate 0.0096   Epoch: 13   Global Step: 230560   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 23:39:07,422-Speed 3335.61 samples/sec   Loss 0.8847   LearningRate 0.0096   Epoch: 13   Global Step: 230570   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 23:39:10,506-Speed 3321.52 samples/sec   Loss 0.9154   LearningRate 0.0096   Epoch: 13   Global Step: 230580   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 23:39:13,598-Speed 3312.12 samples/sec   Loss 0.9166   LearningRate 0.0096   Epoch: 13   Global Step: 230590   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 23:39:16,686-Speed 3316.96 samples/sec   Loss 0.8934   LearningRate 0.0096   Epoch: 13   Global Step: 230600   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 23:39:19,792-Speed 3297.61 samples/sec   Loss 0.8874   LearningRate 0.0096   Epoch: 13   Global Step: 230610   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 23:39:22,858-Speed 3340.94 samples/sec   Loss 0.8391   LearningRate 0.0096   Epoch: 13   Global Step: 230620   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 23:39:25,930-Speed 3334.79 samples/sec   Loss 0.8839   LearningRate 0.0096   Epoch: 13   Global Step: 230630   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 23:39:28,999-Speed 3337.05 samples/sec   Loss 0.8900   LearningRate 0.0096   Epoch: 13   Global Step: 230640   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:39:32,207-Speed 3192.43 samples/sec   Loss 0.8633   LearningRate 0.0096   Epoch: 13   Global Step: 230650   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:39:35,377-Speed 3231.10 samples/sec   Loss 0.8691   LearningRate 0.0095   Epoch: 13   Global Step: 230660   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:39:38,496-Speed 3284.36 samples/sec   Loss 0.9026   LearningRate 0.0095   Epoch: 13   Global Step: 230670   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:39:41,570-Speed 3331.88 samples/sec   Loss 0.8877   LearningRate 0.0095   Epoch: 13   Global Step: 230680   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:39:44,725-Speed 3246.12 samples/sec   Loss 0.8963   LearningRate 0.0095   Epoch: 13   Global Step: 230690   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:39:47,789-Speed 3343.09 samples/sec   Loss 0.8709   LearningRate 0.0095   Epoch: 13   Global Step: 230700   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:39:50,924-Speed 3267.44 samples/sec   Loss 0.8833   LearningRate 0.0095   Epoch: 13   Global Step: 230710   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:39:54,009-Speed 3319.41 samples/sec   Loss 0.8870   LearningRate 0.0095   Epoch: 13   Global Step: 230720   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:39:57,095-Speed 3319.55 samples/sec   Loss 0.8468   LearningRate 0.0095   Epoch: 13   Global Step: 230730   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:40:00,164-Speed 3336.83 samples/sec   Loss 0.8665   LearningRate 0.0095   Epoch: 13   Global Step: 230740   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 23:40:03,227-Speed 3343.58 samples/sec   Loss 0.8789   LearningRate 0.0095   Epoch: 13   Global Step: 230750   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:40:06,298-Speed 3335.83 samples/sec   Loss 0.8802   LearningRate 0.0095   Epoch: 13   Global Step: 230760   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:40:09,389-Speed 3313.55 samples/sec   Loss 0.8851   LearningRate 0.0095   Epoch: 13   Global Step: 230770   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:40:12,479-Speed 3314.14 samples/sec   Loss 0.8937   LearningRate 0.0095   Epoch: 13   Global Step: 230780   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:40:15,573-Speed 3310.42 samples/sec   Loss 0.9129   LearningRate 0.0095   Epoch: 13   Global Step: 230790   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:40:18,651-Speed 3328.37 samples/sec   Loss 0.8804   LearningRate 0.0095   Epoch: 13   Global Step: 230800   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:40:21,726-Speed 3330.08 samples/sec   Loss 0.8951   LearningRate 0.0095   Epoch: 13   Global Step: 230810   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:40:24,833-Speed 3296.91 samples/sec   Loss 0.8768   LearningRate 0.0095   Epoch: 13   Global Step: 230820   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:40:28,000-Speed 3234.41 samples/sec   Loss 0.8967   LearningRate 0.0095   Epoch: 13   Global Step: 230830   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:40:31,073-Speed 3332.63 samples/sec   Loss 0.8915   LearningRate 0.0095   Epoch: 13   Global Step: 230840   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:40:34,141-Speed 3338.34 samples/sec   Loss 0.8562   LearningRate 0.0095   Epoch: 13   Global Step: 230850   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 23:40:37,210-Speed 3336.93 samples/sec   Loss 0.8688   LearningRate 0.0095   Epoch: 13   Global Step: 230860   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 23:40:40,327-Speed 3286.31 samples/sec   Loss 0.9021   LearningRate 0.0095   Epoch: 13   Global Step: 230870   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 23:40:43,438-Speed 3292.11 samples/sec   Loss 0.9129   LearningRate 0.0095   Epoch: 13   Global Step: 230880   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 23:40:46,550-Speed 3291.15 samples/sec   Loss 0.8680   LearningRate 0.0095   Epoch: 13   Global Step: 230890   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 23:40:49,633-Speed 3322.51 samples/sec   Loss 0.8703   LearningRate 0.0095   Epoch: 13   Global Step: 230900   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 23:40:52,716-Speed 3322.20 samples/sec   Loss 0.9106   LearningRate 0.0095   Epoch: 13   Global Step: 230910   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 23:40:55,801-Speed 3320.52 samples/sec   Loss 0.8784   LearningRate 0.0095   Epoch: 13   Global Step: 230920   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 23:40:58,889-Speed 3316.20 samples/sec   Loss 0.9043   LearningRate 0.0095   Epoch: 13   Global Step: 230930   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 23:41:01,957-Speed 3338.61 samples/sec   Loss 0.8900   LearningRate 0.0095   Epoch: 13   Global Step: 230940   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 23:41:05,017-Speed 3347.74 samples/sec   Loss 0.8525   LearningRate 0.0095   Epoch: 13   Global Step: 230950   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 23:41:08,105-Speed 3316.39 samples/sec   Loss 0.9089   LearningRate 0.0095   Epoch: 13   Global Step: 230960   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 23:41:11,195-Speed 3314.92 samples/sec   Loss 0.8663   LearningRate 0.0095   Epoch: 13   Global Step: 230970   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:41:14,282-Speed 3317.67 samples/sec   Loss 0.8863   LearningRate 0.0095   Epoch: 13   Global Step: 230980   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:41:17,356-Speed 3331.63 samples/sec   Loss 0.8657   LearningRate 0.0095   Epoch: 13   Global Step: 230990   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:41:20,437-Speed 3324.43 samples/sec   Loss 0.8767   LearningRate 0.0095   Epoch: 13   Global Step: 231000   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:41:23,527-Speed 3315.11 samples/sec   Loss 0.8818   LearningRate 0.0095   Epoch: 13   Global Step: 231010   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:41:26,600-Speed 3332.33 samples/sec   Loss 0.9204   LearningRate 0.0095   Epoch: 13   Global Step: 231020   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:41:29,698-Speed 3306.74 samples/sec   Loss 0.8771   LearningRate 0.0095   Epoch: 13   Global Step: 231030   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:41:32,838-Speed 3262.68 samples/sec   Loss 0.8845   LearningRate 0.0095   Epoch: 13   Global Step: 231040   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:41:36,023-Speed 3215.54 samples/sec   Loss 0.8430   LearningRate 0.0095   Epoch: 13   Global Step: 231050   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:41:39,215-Speed 3208.79 samples/sec   Loss 0.8872   LearningRate 0.0095   Epoch: 13   Global Step: 231060   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:41:42,324-Speed 3294.67 samples/sec   Loss 0.8717   LearningRate 0.0095   Epoch: 13   Global Step: 231070   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 23:41:45,442-Speed 3284.86 samples/sec   Loss 0.8682   LearningRate 0.0095   Epoch: 13   Global Step: 231080   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 23:41:48,629-Speed 3212.81 samples/sec   Loss 0.8762   LearningRate 0.0095   Epoch: 13   Global Step: 231090   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 23:41:51,761-Speed 3270.45 samples/sec   Loss 0.8512   LearningRate 0.0095   Epoch: 13   Global Step: 231100   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 23:41:54,833-Speed 3334.20 samples/sec   Loss 0.8896   LearningRate 0.0095   Epoch: 13   Global Step: 231110   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:41:57,948-Speed 3288.99 samples/sec   Loss 0.9308   LearningRate 0.0095   Epoch: 13   Global Step: 231120   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:42:01,065-Speed 3286.11 samples/sec   Loss 0.8817   LearningRate 0.0095   Epoch: 13   Global Step: 231130   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:42:04,200-Speed 3266.73 samples/sec   Loss 0.8600   LearningRate 0.0095   Epoch: 13   Global Step: 231140   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:42:07,296-Speed 3307.67 samples/sec   Loss 0.8929   LearningRate 0.0095   Epoch: 13   Global Step: 231150   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:42:10,415-Speed 3283.96 samples/sec   Loss 0.8597   LearningRate 0.0095   Epoch: 13   Global Step: 231160   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:42:13,513-Speed 3306.85 samples/sec   Loss 0.8890   LearningRate 0.0095   Epoch: 13   Global Step: 231170   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:42:16,604-Speed 3313.25 samples/sec   Loss 0.8696   LearningRate 0.0095   Epoch: 13   Global Step: 231180   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:42:19,694-Speed 3314.10 samples/sec   Loss 0.8547   LearningRate 0.0095   Epoch: 13   Global Step: 231190   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:42:22,767-Speed 3334.14 samples/sec   Loss 0.8521   LearningRate 0.0095   Epoch: 13   Global Step: 231200   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:42:25,838-Speed 3334.83 samples/sec   Loss 0.8571   LearningRate 0.0094   Epoch: 13   Global Step: 231210   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 23:42:28,913-Speed 3331.32 samples/sec   Loss 0.8151   LearningRate 0.0094   Epoch: 13   Global Step: 231220   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 23:42:31,977-Speed 3342.91 samples/sec   Loss 0.8706   LearningRate 0.0094   Epoch: 13   Global Step: 231230   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:42:35,050-Speed 3331.89 samples/sec   Loss 0.8746   LearningRate 0.0094   Epoch: 13   Global Step: 231240   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:42:38,152-Speed 3301.98 samples/sec   Loss 0.9013   LearningRate 0.0094   Epoch: 13   Global Step: 231250   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:42:41,363-Speed 3190.52 samples/sec   Loss 0.8818   LearningRate 0.0094   Epoch: 13   Global Step: 231260   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:42:44,511-Speed 3253.52 samples/sec   Loss 0.8608   LearningRate 0.0094   Epoch: 13   Global Step: 231270   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:42:47,599-Speed 3315.93 samples/sec   Loss 0.9318   LearningRate 0.0094   Epoch: 13   Global Step: 231280   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:42:50,675-Speed 3330.76 samples/sec   Loss 0.8861   LearningRate 0.0094   Epoch: 13   Global Step: 231290   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:42:53,787-Speed 3291.51 samples/sec   Loss 0.9204   LearningRate 0.0094   Epoch: 13   Global Step: 231300   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:42:56,964-Speed 3223.05 samples/sec   Loss 0.8602   LearningRate 0.0094   Epoch: 13   Global Step: 231310   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:43:00,096-Speed 3270.33 samples/sec   Loss 0.8658   LearningRate 0.0094   Epoch: 13   Global Step: 231320   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:43:03,163-Speed 3339.33 samples/sec   Loss 0.8319   LearningRate 0.0094   Epoch: 13   Global Step: 231330   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 23:43:06,218-Speed 3352.66 samples/sec   Loss 0.8430   LearningRate 0.0094   Epoch: 13   Global Step: 231340   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:43:09,322-Speed 3300.00 samples/sec   Loss 0.8445   LearningRate 0.0094   Epoch: 13   Global Step: 231350   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:43:12,513-Speed 3209.74 samples/sec   Loss 0.8563   LearningRate 0.0094   Epoch: 13   Global Step: 231360   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:43:15,625-Speed 3291.00 samples/sec   Loss 0.8490   LearningRate 0.0094   Epoch: 13   Global Step: 231370   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:43:18,709-Speed 3321.31 samples/sec   Loss 0.8533   LearningRate 0.0094   Epoch: 13   Global Step: 231380   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:43:21,805-Speed 3309.08 samples/sec   Loss 0.8642   LearningRate 0.0094   Epoch: 13   Global Step: 231390   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:43:24,884-Speed 3326.02 samples/sec   Loss 0.8425   LearningRate 0.0094   Epoch: 13   Global Step: 231400   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:43:27,949-Speed 3341.65 samples/sec   Loss 0.8860   LearningRate 0.0094   Epoch: 13   Global Step: 231410   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:43:31,028-Speed 3326.89 samples/sec   Loss 0.8786   LearningRate 0.0094   Epoch: 13   Global Step: 231420   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:43:34,118-Speed 3314.13 samples/sec   Loss 0.8651   LearningRate 0.0094   Epoch: 13   Global Step: 231430   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:43:37,250-Speed 3270.53 samples/sec   Loss 0.8580   LearningRate 0.0094   Epoch: 13   Global Step: 231440   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 23:43:40,322-Speed 3334.39 samples/sec   Loss 0.8438   LearningRate 0.0094   Epoch: 13   Global Step: 231450   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 23:43:43,500-Speed 3222.44 samples/sec   Loss 0.9245   LearningRate 0.0094   Epoch: 13   Global Step: 231460   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 23:43:46,582-Speed 3323.26 samples/sec   Loss 0.9200   LearningRate 0.0094   Epoch: 13   Global Step: 231470   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 23:43:49,669-Speed 3318.09 samples/sec   Loss 0.8488   LearningRate 0.0094   Epoch: 13   Global Step: 231480   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 23:43:52,745-Speed 3330.32 samples/sec   Loss 0.8806   LearningRate 0.0094   Epoch: 13   Global Step: 231490   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 23:43:55,815-Speed 3336.37 samples/sec   Loss 0.8850   LearningRate 0.0094   Epoch: 13   Global Step: 231500   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 23:43:58,892-Speed 3328.13 samples/sec   Loss 0.8541   LearningRate 0.0094   Epoch: 13   Global Step: 231510   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 23:44:01,965-Speed 3333.05 samples/sec   Loss 0.8823   LearningRate 0.0094   Epoch: 13   Global Step: 231520   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 23:44:05,128-Speed 3238.44 samples/sec   Loss 0.8854   LearningRate 0.0094   Epoch: 13   Global Step: 231530   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:44:08,261-Speed 3268.16 samples/sec   Loss 0.8614   LearningRate 0.0094   Epoch: 13   Global Step: 231540   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:44:11,492-Speed 3170.98 samples/sec   Loss 0.8565   LearningRate 0.0094   Epoch: 13   Global Step: 231550   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:44:14,619-Speed 3275.51 samples/sec   Loss 0.8567   LearningRate 0.0094   Epoch: 13   Global Step: 231560   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:44:17,688-Speed 3336.85 samples/sec   Loss 0.8821   LearningRate 0.0094   Epoch: 13   Global Step: 231570   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:44:20,772-Speed 3321.77 samples/sec   Loss 0.8549   LearningRate 0.0094   Epoch: 13   Global Step: 231580   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:44:23,850-Speed 3327.57 samples/sec   Loss 0.8718   LearningRate 0.0094   Epoch: 13   Global Step: 231590   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:44:26,923-Speed 3332.90 samples/sec   Loss 0.8300   LearningRate 0.0094   Epoch: 13   Global Step: 231600   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:44:30,048-Speed 3277.88 samples/sec   Loss 0.9284   LearningRate 0.0094   Epoch: 13   Global Step: 231610   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:44:33,115-Speed 3338.87 samples/sec   Loss 0.8284   LearningRate 0.0094   Epoch: 13   Global Step: 231620   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:44:36,180-Speed 3342.23 samples/sec   Loss 0.8746   LearningRate 0.0094   Epoch: 13   Global Step: 231630   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 23:44:39,254-Speed 3331.38 samples/sec   Loss 0.8773   LearningRate 0.0094   Epoch: 13   Global Step: 231640   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 23:44:42,342-Speed 3317.44 samples/sec   Loss 0.9108   LearningRate 0.0094   Epoch: 13   Global Step: 231650   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 23:44:45,398-Speed 3352.86 samples/sec   Loss 0.8728   LearningRate 0.0094   Epoch: 13   Global Step: 231660   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:44:48,474-Speed 3329.92 samples/sec   Loss 0.8685   LearningRate 0.0094   Epoch: 13   Global Step: 231670   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:44:51,572-Speed 3305.75 samples/sec   Loss 0.8786   LearningRate 0.0094   Epoch: 13   Global Step: 231680   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:44:54,642-Speed 3335.68 samples/sec   Loss 0.8752   LearningRate 0.0094   Epoch: 13   Global Step: 231690   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:44:57,707-Speed 3342.54 samples/sec   Loss 0.8575   LearningRate 0.0094   Epoch: 13   Global Step: 231700   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:45:00,790-Speed 3321.95 samples/sec   Loss 0.8523   LearningRate 0.0094   Epoch: 13   Global Step: 231710   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:45:03,860-Speed 3337.34 samples/sec   Loss 0.8835   LearningRate 0.0094   Epoch: 13   Global Step: 231720   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:45:06,936-Speed 3329.16 samples/sec   Loss 0.8874   LearningRate 0.0094   Epoch: 13   Global Step: 231730   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:45:09,989-Speed 3354.62 samples/sec   Loss 0.8134   LearningRate 0.0094   Epoch: 13   Global Step: 231740   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-11 23:45:13,055-Speed 3340.53 samples/sec   Loss 0.8865   LearningRate 0.0093   Epoch: 13   Global Step: 231750   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-11 23:45:16,132-Speed 3329.51 samples/sec   Loss 0.8885   LearningRate 0.0093   Epoch: 13   Global Step: 231760   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-11 23:45:19,208-Speed 3329.27 samples/sec   Loss 0.8601   LearningRate 0.0093   Epoch: 13   Global Step: 231770   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-11 23:45:22,310-Speed 3302.15 samples/sec   Loss 0.8483   LearningRate 0.0093   Epoch: 13   Global Step: 231780   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-11 23:45:25,377-Speed 3338.72 samples/sec   Loss 0.8729   LearningRate 0.0093   Epoch: 13   Global Step: 231790   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-11 23:45:28,455-Speed 3328.49 samples/sec   Loss 0.8734   LearningRate 0.0093   Epoch: 13   Global Step: 231800   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-11 23:45:31,536-Speed 3323.92 samples/sec   Loss 0.8622   LearningRate 0.0093   Epoch: 13   Global Step: 231810   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-11 23:45:34,636-Speed 3304.85 samples/sec   Loss 0.9068   LearningRate 0.0093   Epoch: 13   Global Step: 231820   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-11 23:45:37,702-Speed 3339.93 samples/sec   Loss 0.8709   LearningRate 0.0093   Epoch: 13   Global Step: 231830   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-11 23:45:40,768-Speed 3341.03 samples/sec   Loss 0.8721   LearningRate 0.0093   Epoch: 13   Global Step: 231840   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:45:43,836-Speed 3337.87 samples/sec   Loss 0.8322   LearningRate 0.0093   Epoch: 13   Global Step: 231850   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:45:46,904-Speed 3338.23 samples/sec   Loss 0.8944   LearningRate 0.0093   Epoch: 13   Global Step: 231860   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:45:50,008-Speed 3299.57 samples/sec   Loss 0.8380   LearningRate 0.0093   Epoch: 13   Global Step: 231870   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:45:53,096-Speed 3317.14 samples/sec   Loss 0.9300   LearningRate 0.0093   Epoch: 13   Global Step: 231880   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:45:56,192-Speed 3308.81 samples/sec   Loss 0.9046   LearningRate 0.0093   Epoch: 13   Global Step: 231890   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:45:59,259-Speed 3340.01 samples/sec   Loss 0.8395   LearningRate 0.0093   Epoch: 13   Global Step: 231900   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:46:02,333-Speed 3331.27 samples/sec   Loss 0.8937   LearningRate 0.0093   Epoch: 13   Global Step: 231910   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:46:05,406-Speed 3333.04 samples/sec   Loss 0.8611   LearningRate 0.0093   Epoch: 13   Global Step: 231920   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:46:08,642-Speed 3165.85 samples/sec   Loss 0.8582   LearningRate 0.0093   Epoch: 13   Global Step: 231930   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:46:11,718-Speed 3328.94 samples/sec   Loss 0.8576   LearningRate 0.0093   Epoch: 13   Global Step: 231940   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 23:46:14,871-Speed 3248.92 samples/sec   Loss 0.8978   LearningRate 0.0093   Epoch: 13   Global Step: 231950   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 23:46:18,112-Speed 3159.51 samples/sec   Loss 0.8578   LearningRate 0.0093   Epoch: 13   Global Step: 231960   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 23:46:21,228-Speed 3288.00 samples/sec   Loss 0.8400   LearningRate 0.0093   Epoch: 13   Global Step: 231970   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 23:46:24,360-Speed 3269.75 samples/sec   Loss 0.8942   LearningRate 0.0093   Epoch: 13   Global Step: 231980   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:46:27,474-Speed 3289.74 samples/sec   Loss 0.8970   LearningRate 0.0093   Epoch: 13   Global Step: 231990   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:46:30,613-Speed 3262.65 samples/sec   Loss 0.9040   LearningRate 0.0093   Epoch: 13   Global Step: 232000   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:47:14,231-[lfw][232000]XNorm: 20.313571
Training: 2022-04-11 23:47:14,232-[lfw][232000]Accuracy-Flip: 0.99783+-0.00259
Training: 2022-04-11 23:47:14,232-[lfw][232000]Accuracy-Highest: 0.99817
Training: 2022-04-11 23:48:05,160-[cfp_fp][232000]XNorm: 20.729209
Training: 2022-04-11 23:48:05,161-[cfp_fp][232000]Accuracy-Flip: 0.98943+-0.00405
Training: 2022-04-11 23:48:05,161-[cfp_fp][232000]Accuracy-Highest: 0.99129
Training: 2022-04-11 23:48:49,019-[agedb_30][232000]XNorm: 21.398841
Training: 2022-04-11 23:48:49,019-[agedb_30][232000]Accuracy-Flip: 0.98533+-0.00686
Training: 2022-04-11 23:48:49,020-[agedb_30][232000]Accuracy-Highest: 0.98567
Training: 2022-04-11 23:48:52,082-Speed 72.38 samples/sec   Loss 0.8366   LearningRate 0.0093   Epoch: 13   Global Step: 232010   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:48:55,141-Speed 3347.80 samples/sec   Loss 0.8638   LearningRate 0.0093   Epoch: 13   Global Step: 232020   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:48:58,298-Speed 3244.21 samples/sec   Loss 0.9113   LearningRate 0.0093   Epoch: 13   Global Step: 232030   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:49:01,447-Speed 3252.87 samples/sec   Loss 0.8803   LearningRate 0.0093   Epoch: 13   Global Step: 232040   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:49:04,514-Speed 3338.78 samples/sec   Loss 0.8667   LearningRate 0.0093   Epoch: 13   Global Step: 232050   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:49:07,626-Speed 3291.76 samples/sec   Loss 0.8670   LearningRate 0.0093   Epoch: 13   Global Step: 232060   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:49:10,769-Speed 3258.45 samples/sec   Loss 0.8657   LearningRate 0.0093   Epoch: 13   Global Step: 232070   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:49:13,935-Speed 3235.51 samples/sec   Loss 0.8313   LearningRate 0.0093   Epoch: 13   Global Step: 232080   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 23:49:17,086-Speed 3250.31 samples/sec   Loss 0.8387   LearningRate 0.0093   Epoch: 13   Global Step: 232090   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 23:49:20,172-Speed 3318.45 samples/sec   Loss 0.8531   LearningRate 0.0093   Epoch: 13   Global Step: 232100   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 23:49:23,295-Speed 3280.05 samples/sec   Loss 0.8857   LearningRate 0.0093   Epoch: 13   Global Step: 232110   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 23:49:26,362-Speed 3339.41 samples/sec   Loss 0.8855   LearningRate 0.0093   Epoch: 13   Global Step: 232120   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 23:49:29,490-Speed 3274.07 samples/sec   Loss 0.9164   LearningRate 0.0093   Epoch: 13   Global Step: 232130   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:49:32,669-Speed 3221.98 samples/sec   Loss 0.8645   LearningRate 0.0093   Epoch: 13   Global Step: 232140   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:49:35,881-Speed 3188.64 samples/sec   Loss 0.8895   LearningRate 0.0093   Epoch: 13   Global Step: 232150   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:49:39,014-Speed 3269.13 samples/sec   Loss 0.8650   LearningRate 0.0093   Epoch: 13   Global Step: 232160   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:49:42,113-Speed 3305.35 samples/sec   Loss 0.8743   LearningRate 0.0093   Epoch: 13   Global Step: 232170   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:49:45,270-Speed 3244.81 samples/sec   Loss 0.8797   LearningRate 0.0093   Epoch: 13   Global Step: 232180   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:49:48,349-Speed 3326.30 samples/sec   Loss 0.9002   LearningRate 0.0093   Epoch: 13   Global Step: 232190   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:49:51,447-Speed 3306.47 samples/sec   Loss 0.8619   LearningRate 0.0093   Epoch: 13   Global Step: 232200   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:49:54,511-Speed 3342.20 samples/sec   Loss 0.8417   LearningRate 0.0093   Epoch: 13   Global Step: 232210   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:49:57,581-Speed 3337.05 samples/sec   Loss 0.8503   LearningRate 0.0093   Epoch: 13   Global Step: 232220   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:50:00,653-Speed 3334.14 samples/sec   Loss 0.8818   LearningRate 0.0093   Epoch: 13   Global Step: 232230   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 23:50:03,736-Speed 3321.69 samples/sec   Loss 0.8802   LearningRate 0.0093   Epoch: 13   Global Step: 232240   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 23:50:06,805-Speed 3337.10 samples/sec   Loss 0.8495   LearningRate 0.0093   Epoch: 13   Global Step: 232250   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 23:50:09,959-Speed 3247.09 samples/sec   Loss 0.8822   LearningRate 0.0093   Epoch: 13   Global Step: 232260   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 23:50:13,111-Speed 3250.27 samples/sec   Loss 0.8909   LearningRate 0.0093   Epoch: 13   Global Step: 232270   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 23:50:16,190-Speed 3326.26 samples/sec   Loss 0.8528   LearningRate 0.0093   Epoch: 13   Global Step: 232280   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:50:19,267-Speed 3328.79 samples/sec   Loss 0.9059   LearningRate 0.0093   Epoch: 13   Global Step: 232290   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:50:22,330-Speed 3343.63 samples/sec   Loss 0.9016   LearningRate 0.0092   Epoch: 13   Global Step: 232300   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:50:25,425-Speed 3309.16 samples/sec   Loss 0.9156   LearningRate 0.0092   Epoch: 13   Global Step: 232310   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:50:28,508-Speed 3322.50 samples/sec   Loss 0.8669   LearningRate 0.0092   Epoch: 13   Global Step: 232320   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:50:31,599-Speed 3313.27 samples/sec   Loss 0.8506   LearningRate 0.0092   Epoch: 13   Global Step: 232330   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:50:34,667-Speed 3338.81 samples/sec   Loss 0.8618   LearningRate 0.0092   Epoch: 13   Global Step: 232340   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:50:37,773-Speed 3297.52 samples/sec   Loss 0.8477   LearningRate 0.0092   Epoch: 13   Global Step: 232350   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:50:40,861-Speed 3317.32 samples/sec   Loss 0.8604   LearningRate 0.0092   Epoch: 13   Global Step: 232360   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:50:43,940-Speed 3326.04 samples/sec   Loss 0.8316   LearningRate 0.0092   Epoch: 13   Global Step: 232370   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:50:47,013-Speed 3333.01 samples/sec   Loss 0.8895   LearningRate 0.0092   Epoch: 13   Global Step: 232380   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 23:50:50,089-Speed 3330.16 samples/sec   Loss 0.8750   LearningRate 0.0092   Epoch: 13   Global Step: 232390   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 23:50:53,163-Speed 3331.34 samples/sec   Loss 0.8758   LearningRate 0.0092   Epoch: 13   Global Step: 232400   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:50:56,245-Speed 3323.89 samples/sec   Loss 0.8856   LearningRate 0.0092   Epoch: 13   Global Step: 232410   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:50:59,328-Speed 3322.07 samples/sec   Loss 0.9311   LearningRate 0.0092   Epoch: 13   Global Step: 232420   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:51:02,432-Speed 3299.43 samples/sec   Loss 0.8725   LearningRate 0.0092   Epoch: 13   Global Step: 232430   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:51:05,505-Speed 3332.78 samples/sec   Loss 0.8592   LearningRate 0.0092   Epoch: 13   Global Step: 232440   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:51:08,639-Speed 3268.56 samples/sec   Loss 0.8701   LearningRate 0.0092   Epoch: 13   Global Step: 232450   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:51:11,752-Speed 3289.91 samples/sec   Loss 0.8741   LearningRate 0.0092   Epoch: 13   Global Step: 232460   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:51:14,826-Speed 3332.21 samples/sec   Loss 0.8749   LearningRate 0.0092   Epoch: 13   Global Step: 232470   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:51:17,914-Speed 3317.18 samples/sec   Loss 0.9210   LearningRate 0.0092   Epoch: 13   Global Step: 232480   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:51:21,008-Speed 3309.60 samples/sec   Loss 0.8790   LearningRate 0.0092   Epoch: 13   Global Step: 232490   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:51:24,111-Speed 3301.31 samples/sec   Loss 0.8736   LearningRate 0.0092   Epoch: 13   Global Step: 232500   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 23:51:27,232-Speed 3282.14 samples/sec   Loss 0.8773   LearningRate 0.0092   Epoch: 13   Global Step: 232510   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 23:51:30,337-Speed 3298.38 samples/sec   Loss 0.8452   LearningRate 0.0092   Epoch: 13   Global Step: 232520   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 23:51:33,410-Speed 3333.40 samples/sec   Loss 0.8947   LearningRate 0.0092   Epoch: 13   Global Step: 232530   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 23:51:36,496-Speed 3318.97 samples/sec   Loss 0.8387   LearningRate 0.0092   Epoch: 13   Global Step: 232540   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 23:51:39,634-Speed 3263.42 samples/sec   Loss 0.9034   LearningRate 0.0092   Epoch: 13   Global Step: 232550   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 23:51:42,719-Speed 3320.45 samples/sec   Loss 0.8945   LearningRate 0.0092   Epoch: 13   Global Step: 232560   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 23:51:45,830-Speed 3292.79 samples/sec   Loss 0.8391   LearningRate 0.0092   Epoch: 13   Global Step: 232570   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 23:51:48,927-Speed 3306.50 samples/sec   Loss 0.8641   LearningRate 0.0092   Epoch: 13   Global Step: 232580   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 23:51:52,011-Speed 3321.27 samples/sec   Loss 0.8367   LearningRate 0.0092   Epoch: 13   Global Step: 232590   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 23:51:55,142-Speed 3271.12 samples/sec   Loss 0.8625   LearningRate 0.0092   Epoch: 13   Global Step: 232600   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 23:51:58,403-Speed 3141.33 samples/sec   Loss 0.8692   LearningRate 0.0092   Epoch: 13   Global Step: 232610   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 23:52:01,603-Speed 3200.32 samples/sec   Loss 0.8664   LearningRate 0.0092   Epoch: 13   Global Step: 232620   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:52:04,705-Speed 3302.41 samples/sec   Loss 0.8564   LearningRate 0.0092   Epoch: 13   Global Step: 232630   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:52:07,859-Speed 3247.20 samples/sec   Loss 0.9001   LearningRate 0.0092   Epoch: 13   Global Step: 232640   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:52:10,962-Speed 3301.47 samples/sec   Loss 0.9009   LearningRate 0.0092   Epoch: 13   Global Step: 232650   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:52:14,057-Speed 3308.50 samples/sec   Loss 0.8908   LearningRate 0.0092   Epoch: 13   Global Step: 232660   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:52:17,140-Speed 3322.18 samples/sec   Loss 0.8658   LearningRate 0.0092   Epoch: 13   Global Step: 232670   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:52:20,298-Speed 3244.00 samples/sec   Loss 0.8550   LearningRate 0.0092   Epoch: 13   Global Step: 232680   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:52:23,366-Speed 3338.06 samples/sec   Loss 0.8713   LearningRate 0.0092   Epoch: 13   Global Step: 232690   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:52:26,445-Speed 3326.92 samples/sec   Loss 0.8698   LearningRate 0.0092   Epoch: 13   Global Step: 232700   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:52:29,563-Speed 3284.02 samples/sec   Loss 0.8916   LearningRate 0.0092   Epoch: 13   Global Step: 232710   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:52:32,640-Speed 3328.95 samples/sec   Loss 0.8807   LearningRate 0.0092   Epoch: 13   Global Step: 232720   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 23:52:35,755-Speed 3288.16 samples/sec   Loss 0.8821   LearningRate 0.0092   Epoch: 13   Global Step: 232730   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 23:52:38,867-Speed 3291.33 samples/sec   Loss 0.8685   LearningRate 0.0092   Epoch: 13   Global Step: 232740   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 23:52:41,961-Speed 3310.26 samples/sec   Loss 0.9027   LearningRate 0.0092   Epoch: 13   Global Step: 232750   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 23:52:45,094-Speed 3270.15 samples/sec   Loss 0.8612   LearningRate 0.0092   Epoch: 13   Global Step: 232760   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 23:52:48,201-Speed 3295.82 samples/sec   Loss 0.8220   LearningRate 0.0092   Epoch: 13   Global Step: 232770   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 23:52:51,295-Speed 3310.69 samples/sec   Loss 0.9146   LearningRate 0.0092   Epoch: 13   Global Step: 232780   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 23:52:54,438-Speed 3258.30 samples/sec   Loss 0.8769   LearningRate 0.0092   Epoch: 13   Global Step: 232790   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 23:52:57,507-Speed 3336.95 samples/sec   Loss 0.8899   LearningRate 0.0092   Epoch: 13   Global Step: 232800   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 23:53:00,582-Speed 3332.26 samples/sec   Loss 0.9064   LearningRate 0.0092   Epoch: 13   Global Step: 232810   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 23:53:03,649-Speed 3339.29 samples/sec   Loss 0.8650   LearningRate 0.0092   Epoch: 13   Global Step: 232820   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 23:53:06,718-Speed 3337.69 samples/sec   Loss 0.8587   LearningRate 0.0092   Epoch: 13   Global Step: 232830   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 23:53:09,784-Speed 3339.97 samples/sec   Loss 0.8712   LearningRate 0.0092   Epoch: 13   Global Step: 232840   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 23:53:12,857-Speed 3333.31 samples/sec   Loss 0.9147   LearningRate 0.0091   Epoch: 13   Global Step: 232850   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 23:53:15,922-Speed 3341.65 samples/sec   Loss 0.8821   LearningRate 0.0091   Epoch: 13   Global Step: 232860   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 23:53:19,019-Speed 3307.14 samples/sec   Loss 0.8380   LearningRate 0.0091   Epoch: 13   Global Step: 232870   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 23:53:22,110-Speed 3313.36 samples/sec   Loss 0.8636   LearningRate 0.0091   Epoch: 13   Global Step: 232880   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 23:53:25,173-Speed 3343.17 samples/sec   Loss 0.8571   LearningRate 0.0091   Epoch: 13   Global Step: 232890   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:53:28,252-Speed 3327.35 samples/sec   Loss 0.8696   LearningRate 0.0091   Epoch: 13   Global Step: 232900   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:53:31,304-Speed 3355.97 samples/sec   Loss 0.8555   LearningRate 0.0091   Epoch: 13   Global Step: 232910   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-11 23:53:34,383-Speed 3326.25 samples/sec   Loss 0.8851   LearningRate 0.0091   Epoch: 13   Global Step: 232920   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-11 23:53:37,458-Speed 3331.37 samples/sec   Loss 0.8468   LearningRate 0.0091   Epoch: 13   Global Step: 232930   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-11 23:53:40,529-Speed 3334.46 samples/sec   Loss 0.8975   LearningRate 0.0091   Epoch: 13   Global Step: 232940   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-11 23:53:43,600-Speed 3335.54 samples/sec   Loss 0.8577   LearningRate 0.0091   Epoch: 13   Global Step: 232950   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-11 23:53:46,821-Speed 3179.60 samples/sec   Loss 0.8843   LearningRate 0.0091   Epoch: 13   Global Step: 232960   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-11 23:53:50,060-Speed 3162.50 samples/sec   Loss 0.8721   LearningRate 0.0091   Epoch: 13   Global Step: 232970   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-11 23:53:53,256-Speed 3205.01 samples/sec   Loss 0.8557   LearningRate 0.0091   Epoch: 13   Global Step: 232980   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-11 23:53:56,351-Speed 3309.28 samples/sec   Loss 0.8679   LearningRate 0.0091   Epoch: 13   Global Step: 232990   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-11 23:53:59,486-Speed 3266.33 samples/sec   Loss 0.8664   LearningRate 0.0091   Epoch: 13   Global Step: 233000   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-11 23:54:02,623-Speed 3265.43 samples/sec   Loss 0.8481   LearningRate 0.0091   Epoch: 13   Global Step: 233010   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:54:05,689-Speed 3340.47 samples/sec   Loss 0.8553   LearningRate 0.0091   Epoch: 13   Global Step: 233020   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:54:08,795-Speed 3297.49 samples/sec   Loss 0.9079   LearningRate 0.0091   Epoch: 13   Global Step: 233030   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:54:11,847-Speed 3356.77 samples/sec   Loss 0.8803   LearningRate 0.0091   Epoch: 13   Global Step: 233040   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-11 23:54:14,930-Speed 3321.54 samples/sec   Loss 0.8914   LearningRate 0.0091   Epoch: 13   Global Step: 233050   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-11 23:54:18,012-Speed 3323.52 samples/sec   Loss 0.8647   LearningRate 0.0091   Epoch: 13   Global Step: 233060   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-11 23:54:21,101-Speed 3316.01 samples/sec   Loss 0.8774   LearningRate 0.0091   Epoch: 13   Global Step: 233070   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-11 23:54:24,234-Speed 3269.43 samples/sec   Loss 0.8745   LearningRate 0.0091   Epoch: 13   Global Step: 233080   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-11 23:54:27,307-Speed 3332.94 samples/sec   Loss 0.8819   LearningRate 0.0091   Epoch: 13   Global Step: 233090   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-11 23:54:30,389-Speed 3322.90 samples/sec   Loss 0.8990   LearningRate 0.0091   Epoch: 13   Global Step: 233100   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-11 23:54:33,462-Speed 3333.45 samples/sec   Loss 0.8511   LearningRate 0.0091   Epoch: 13   Global Step: 233110   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-11 23:54:36,534-Speed 3333.88 samples/sec   Loss 0.8656   LearningRate 0.0091   Epoch: 13   Global Step: 233120   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-11 23:54:39,606-Speed 3334.11 samples/sec   Loss 0.8711   LearningRate 0.0091   Epoch: 13   Global Step: 233130   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-11 23:54:42,675-Speed 3337.13 samples/sec   Loss 0.8589   LearningRate 0.0091   Epoch: 13   Global Step: 233140   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:54:45,743-Speed 3338.39 samples/sec   Loss 0.8428   LearningRate 0.0091   Epoch: 13   Global Step: 233150   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:54:48,911-Speed 3232.94 samples/sec   Loss 0.8645   LearningRate 0.0091   Epoch: 13   Global Step: 233160   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:54:52,021-Speed 3294.33 samples/sec   Loss 0.8742   LearningRate 0.0091   Epoch: 13   Global Step: 233170   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:54:55,116-Speed 3309.53 samples/sec   Loss 0.8918   LearningRate 0.0091   Epoch: 13   Global Step: 233180   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:54:58,206-Speed 3313.73 samples/sec   Loss 0.8684   LearningRate 0.0091   Epoch: 13   Global Step: 233190   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:55:01,326-Speed 3282.83 samples/sec   Loss 0.8628   LearningRate 0.0091   Epoch: 13   Global Step: 233200   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:55:04,423-Speed 3307.35 samples/sec   Loss 0.8713   LearningRate 0.0091   Epoch: 13   Global Step: 233210   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:55:07,510-Speed 3317.51 samples/sec   Loss 0.8384   LearningRate 0.0091   Epoch: 13   Global Step: 233220   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:55:10,625-Speed 3288.79 samples/sec   Loss 0.8688   LearningRate 0.0091   Epoch: 13   Global Step: 233230   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:55:13,717-Speed 3312.80 samples/sec   Loss 0.8860   LearningRate 0.0091   Epoch: 13   Global Step: 233240   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 23:55:16,863-Speed 3255.12 samples/sec   Loss 0.8937   LearningRate 0.0091   Epoch: 13   Global Step: 233250   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 23:55:20,023-Speed 3241.14 samples/sec   Loss 0.8646   LearningRate 0.0091   Epoch: 13   Global Step: 233260   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 23:55:23,103-Speed 3325.13 samples/sec   Loss 0.8272   LearningRate 0.0091   Epoch: 13   Global Step: 233270   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 23:55:26,172-Speed 3338.27 samples/sec   Loss 0.8730   LearningRate 0.0091   Epoch: 13   Global Step: 233280   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 23:55:29,255-Speed 3321.94 samples/sec   Loss 0.8374   LearningRate 0.0091   Epoch: 13   Global Step: 233290   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 23:55:32,342-Speed 3317.07 samples/sec   Loss 0.8627   LearningRate 0.0091   Epoch: 13   Global Step: 233300   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 23:55:35,471-Speed 3273.50 samples/sec   Loss 0.8761   LearningRate 0.0091   Epoch: 13   Global Step: 233310   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 23:55:38,549-Speed 3328.42 samples/sec   Loss 0.9098   LearningRate 0.0091   Epoch: 13   Global Step: 233320   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 23:55:41,672-Speed 3279.61 samples/sec   Loss 0.8628   LearningRate 0.0091   Epoch: 13   Global Step: 233330   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 23:55:44,729-Speed 3350.87 samples/sec   Loss 0.8529   LearningRate 0.0091   Epoch: 13   Global Step: 233340   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 23:55:47,804-Speed 3330.82 samples/sec   Loss 0.8820   LearningRate 0.0091   Epoch: 13   Global Step: 233350   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 23:55:50,870-Speed 3340.44 samples/sec   Loss 0.8760   LearningRate 0.0091   Epoch: 13   Global Step: 233360   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 23:55:53,940-Speed 3335.57 samples/sec   Loss 0.8671   LearningRate 0.0091   Epoch: 13   Global Step: 233370   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 23:55:57,015-Speed 3331.11 samples/sec   Loss 0.8461   LearningRate 0.0091   Epoch: 13   Global Step: 233380   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 23:56:00,092-Speed 3328.74 samples/sec   Loss 0.8672   LearningRate 0.0091   Epoch: 13   Global Step: 233390   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 23:56:03,234-Speed 3259.75 samples/sec   Loss 0.8875   LearningRate 0.0090   Epoch: 13   Global Step: 233400   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 23:56:06,373-Speed 3262.71 samples/sec   Loss 0.8898   LearningRate 0.0090   Epoch: 13   Global Step: 233410   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 23:56:09,450-Speed 3329.35 samples/sec   Loss 0.8628   LearningRate 0.0090   Epoch: 13   Global Step: 233420   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 23:56:12,568-Speed 3284.77 samples/sec   Loss 0.8796   LearningRate 0.0090   Epoch: 13   Global Step: 233430   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 23:56:15,681-Speed 3289.96 samples/sec   Loss 0.8869   LearningRate 0.0090   Epoch: 13   Global Step: 233440   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 23:56:18,779-Speed 3306.42 samples/sec   Loss 0.8735   LearningRate 0.0090   Epoch: 13   Global Step: 233450   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 23:56:22,009-Speed 3170.82 samples/sec   Loss 0.8382   LearningRate 0.0090   Epoch: 13   Global Step: 233460   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:56:25,140-Speed 3271.51 samples/sec   Loss 0.8782   LearningRate 0.0090   Epoch: 13   Global Step: 233470   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:56:28,260-Speed 3282.19 samples/sec   Loss 0.8976   LearningRate 0.0090   Epoch: 13   Global Step: 233480   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:56:31,337-Speed 3328.90 samples/sec   Loss 0.8480   LearningRate 0.0090   Epoch: 13   Global Step: 233490   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:56:34,486-Speed 3252.33 samples/sec   Loss 0.8934   LearningRate 0.0090   Epoch: 13   Global Step: 233500   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:56:37,569-Speed 3323.05 samples/sec   Loss 0.8894   LearningRate 0.0090   Epoch: 13   Global Step: 233510   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:56:40,770-Speed 3199.67 samples/sec   Loss 0.8721   LearningRate 0.0090   Epoch: 13   Global Step: 233520   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:56:43,915-Speed 3256.29 samples/sec   Loss 0.8635   LearningRate 0.0090   Epoch: 13   Global Step: 233530   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:56:46,997-Speed 3323.23 samples/sec   Loss 0.8618   LearningRate 0.0090   Epoch: 13   Global Step: 233540   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:56:50,077-Speed 3324.82 samples/sec   Loss 0.8839   LearningRate 0.0090   Epoch: 13   Global Step: 233550   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:56:53,176-Speed 3304.92 samples/sec   Loss 0.9055   LearningRate 0.0090   Epoch: 13   Global Step: 233560   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 23:56:56,246-Speed 3337.04 samples/sec   Loss 0.8514   LearningRate 0.0090   Epoch: 13   Global Step: 233570   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 23:56:59,313-Speed 3339.30 samples/sec   Loss 0.8896   LearningRate 0.0090   Epoch: 13   Global Step: 233580   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:57:02,406-Speed 3311.85 samples/sec   Loss 0.8963   LearningRate 0.0090   Epoch: 13   Global Step: 233590   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:57:05,478-Speed 3333.35 samples/sec   Loss 0.8729   LearningRate 0.0090   Epoch: 13   Global Step: 233600   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:57:08,551-Speed 3333.81 samples/sec   Loss 0.8906   LearningRate 0.0090   Epoch: 13   Global Step: 233610   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:57:11,636-Speed 3319.78 samples/sec   Loss 0.8514   LearningRate 0.0090   Epoch: 13   Global Step: 233620   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:57:14,709-Speed 3333.41 samples/sec   Loss 0.8562   LearningRate 0.0090   Epoch: 13   Global Step: 233630   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:57:17,845-Speed 3266.11 samples/sec   Loss 0.8660   LearningRate 0.0090   Epoch: 13   Global Step: 233640   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:57:21,002-Speed 3243.89 samples/sec   Loss 0.8621   LearningRate 0.0090   Epoch: 13   Global Step: 233650   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:57:24,136-Speed 3267.44 samples/sec   Loss 0.8617   LearningRate 0.0090   Epoch: 13   Global Step: 233660   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:57:27,946-Speed 2688.58 samples/sec   Loss 0.9047   LearningRate 0.0090   Epoch: 13   Global Step: 233670   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:57:59,228-Speed 327.36 samples/sec   Loss 0.6917   LearningRate 0.0090   Epoch: 14   Global Step: 233680   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-11 23:58:02,805-Speed 2864.07 samples/sec   Loss 0.5409   LearningRate 0.0090   Epoch: 14   Global Step: 233690   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-11 23:58:05,878-Speed 3332.66 samples/sec   Loss 0.5129   LearningRate 0.0090   Epoch: 14   Global Step: 233700   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-11 23:58:08,967-Speed 3315.60 samples/sec   Loss 0.5298   LearningRate 0.0090   Epoch: 14   Global Step: 233710   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-11 23:58:12,108-Speed 3260.97 samples/sec   Loss 0.5212   LearningRate 0.0090   Epoch: 14   Global Step: 233720   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-11 23:58:15,232-Speed 3278.59 samples/sec   Loss 0.5131   LearningRate 0.0090   Epoch: 14   Global Step: 233730   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-11 23:58:18,301-Speed 3337.45 samples/sec   Loss 0.5228   LearningRate 0.0090   Epoch: 14   Global Step: 233740   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-11 23:58:21,404-Speed 3299.86 samples/sec   Loss 0.5495   LearningRate 0.0090   Epoch: 14   Global Step: 233750   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-11 23:58:24,479-Speed 3330.82 samples/sec   Loss 0.5245   LearningRate 0.0090   Epoch: 14   Global Step: 233760   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-11 23:58:27,559-Speed 3325.80 samples/sec   Loss 0.5612   LearningRate 0.0090   Epoch: 14   Global Step: 233770   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-11 23:58:30,785-Speed 3175.40 samples/sec   Loss 0.5319   LearningRate 0.0090   Epoch: 14   Global Step: 233780   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:58:33,951-Speed 3234.88 samples/sec   Loss 0.5377   LearningRate 0.0090   Epoch: 14   Global Step: 233790   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:58:37,067-Speed 3287.09 samples/sec   Loss 0.5126   LearningRate 0.0090   Epoch: 14   Global Step: 233800   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:58:40,229-Speed 3238.79 samples/sec   Loss 0.5003   LearningRate 0.0090   Epoch: 14   Global Step: 233810   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:58:43,409-Speed 3220.61 samples/sec   Loss 0.5312   LearningRate 0.0090   Epoch: 14   Global Step: 233820   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:58:46,493-Speed 3321.01 samples/sec   Loss 0.5342   LearningRate 0.0090   Epoch: 14   Global Step: 233830   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:58:49,582-Speed 3316.66 samples/sec   Loss 0.5300   LearningRate 0.0090   Epoch: 14   Global Step: 233840   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:58:52,655-Speed 3333.25 samples/sec   Loss 0.5446   LearningRate 0.0090   Epoch: 14   Global Step: 233850   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:58:55,745-Speed 3314.23 samples/sec   Loss 0.5471   LearningRate 0.0090   Epoch: 14   Global Step: 233860   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:58:58,823-Speed 3327.33 samples/sec   Loss 0.5074   LearningRate 0.0090   Epoch: 14   Global Step: 233870   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 23:59:02,188-Speed 3043.78 samples/sec   Loss 0.5357   LearningRate 0.0090   Epoch: 14   Global Step: 233880   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 23:59:06,132-Speed 2596.73 samples/sec   Loss 0.5149   LearningRate 0.0090   Epoch: 14   Global Step: 233890   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 23:59:09,217-Speed 3319.73 samples/sec   Loss 0.5107   LearningRate 0.0090   Epoch: 14   Global Step: 233900   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 23:59:12,343-Speed 3277.10 samples/sec   Loss 0.5494   LearningRate 0.0090   Epoch: 14   Global Step: 233910   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 23:59:15,421-Speed 3327.96 samples/sec   Loss 0.5151   LearningRate 0.0090   Epoch: 14   Global Step: 233920   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 23:59:18,499-Speed 3327.78 samples/sec   Loss 0.5233   LearningRate 0.0090   Epoch: 14   Global Step: 233930   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 23:59:21,609-Speed 3293.26 samples/sec   Loss 0.5072   LearningRate 0.0090   Epoch: 14   Global Step: 233940   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 23:59:24,703-Speed 3310.48 samples/sec   Loss 0.5615   LearningRate 0.0090   Epoch: 14   Global Step: 233950   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 23:59:27,811-Speed 3295.34 samples/sec   Loss 0.5389   LearningRate 0.0089   Epoch: 14   Global Step: 233960   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 23:59:30,921-Speed 3293.47 samples/sec   Loss 0.5247   LearningRate 0.0089   Epoch: 14   Global Step: 233970   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 23:59:34,002-Speed 3323.56 samples/sec   Loss 0.5509   LearningRate 0.0089   Epoch: 14   Global Step: 233980   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 23:59:37,153-Speed 3250.44 samples/sec   Loss 0.5183   LearningRate 0.0089   Epoch: 14   Global Step: 233990   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 23:59:40,253-Speed 3303.86 samples/sec   Loss 0.5472   LearningRate 0.0089   Epoch: 14   Global Step: 234000   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-12 00:00:24,572-[lfw][234000]XNorm: 21.716343
Training: 2022-04-12 00:00:24,573-[lfw][234000]Accuracy-Flip: 0.99750+-0.00239
Training: 2022-04-12 00:00:24,573-[lfw][234000]Accuracy-Highest: 0.99817
Training: 2022-04-12 00:01:15,741-[cfp_fp][234000]XNorm: 22.475299
Training: 2022-04-12 00:01:15,741-[cfp_fp][234000]Accuracy-Flip: 0.98971+-0.00388
Training: 2022-04-12 00:01:15,742-[cfp_fp][234000]Accuracy-Highest: 0.99129
Training: 2022-04-12 00:01:59,820-[agedb_30][234000]XNorm: 22.797648
Training: 2022-04-12 00:01:59,821-[agedb_30][234000]Accuracy-Flip: 0.98317+-0.00617
Training: 2022-04-12 00:01:59,821-[agedb_30][234000]Accuracy-Highest: 0.98567
Training: 2022-04-12 00:02:02,884-Speed 71.79 samples/sec   Loss 0.5591   LearningRate 0.0089   Epoch: 14   Global Step: 234010   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-12 00:02:05,987-Speed 3300.77 samples/sec   Loss 0.5259   LearningRate 0.0089   Epoch: 14   Global Step: 234020   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-12 00:02:09,070-Speed 3322.54 samples/sec   Loss 0.5153   LearningRate 0.0089   Epoch: 14   Global Step: 234030   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-12 00:02:12,129-Speed 3348.09 samples/sec   Loss 0.5080   LearningRate 0.0089   Epoch: 14   Global Step: 234040   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-12 00:02:15,188-Speed 3348.22 samples/sec   Loss 0.5133   LearningRate 0.0089   Epoch: 14   Global Step: 234050   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-12 00:02:18,272-Speed 3320.43 samples/sec   Loss 0.5519   LearningRate 0.0089   Epoch: 14   Global Step: 234060   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-12 00:02:21,332-Speed 3347.60 samples/sec   Loss 0.5035   LearningRate 0.0089   Epoch: 14   Global Step: 234070   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-12 00:02:24,407-Speed 3330.51 samples/sec   Loss 0.5241   LearningRate 0.0089   Epoch: 14   Global Step: 234080   Fp16 Grad Scale: 262144   Required: 11 hours
Training: 2022-04-12 00:02:27,461-Speed 3354.53 samples/sec   Loss 0.5497   LearningRate 0.0089   Epoch: 14   Global Step: 234090   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-12 00:02:30,512-Speed 3357.24 samples/sec   Loss 0.5352   LearningRate 0.0089   Epoch: 14   Global Step: 234100   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-12 00:02:33,575-Speed 3342.91 samples/sec   Loss 0.5360   LearningRate 0.0089   Epoch: 14   Global Step: 234110   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-12 00:02:36,638-Speed 3344.55 samples/sec   Loss 0.5276   LearningRate 0.0089   Epoch: 14   Global Step: 234120   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-12 00:02:39,751-Speed 3289.64 samples/sec   Loss 0.5204   LearningRate 0.0089   Epoch: 14   Global Step: 234130   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-12 00:02:42,818-Speed 3339.83 samples/sec   Loss 0.5374   LearningRate 0.0089   Epoch: 14   Global Step: 234140   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-12 00:02:45,887-Speed 3337.17 samples/sec   Loss 0.5384   LearningRate 0.0089   Epoch: 14   Global Step: 234150   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-12 00:02:48,973-Speed 3319.49 samples/sec   Loss 0.5144   LearningRate 0.0089   Epoch: 14   Global Step: 234160   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-12 00:02:52,065-Speed 3312.53 samples/sec   Loss 0.5087   LearningRate 0.0089   Epoch: 14   Global Step: 234170   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-12 00:02:55,139-Speed 3331.89 samples/sec   Loss 0.5431   LearningRate 0.0089   Epoch: 14   Global Step: 234180   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-12 00:02:58,329-Speed 3210.52 samples/sec   Loss 0.5297   LearningRate 0.0089   Epoch: 14   Global Step: 234190   Fp16 Grad Scale: 262144   Required: 11 hours
Training: 2022-04-12 00:03:01,405-Speed 3329.62 samples/sec   Loss 0.5192   LearningRate 0.0089   Epoch: 14   Global Step: 234200   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-12 00:03:04,479-Speed 3331.46 samples/sec   Loss 0.5243   LearningRate 0.0089   Epoch: 14   Global Step: 234210   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-12 00:03:07,572-Speed 3311.84 samples/sec   Loss 0.5408   LearningRate 0.0089   Epoch: 14   Global Step: 234220   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-12 00:03:10,648-Speed 3330.00 samples/sec   Loss 0.5328   LearningRate 0.0089   Epoch: 14   Global Step: 234230   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-12 00:03:13,707-Speed 3348.75 samples/sec   Loss 0.5337   LearningRate 0.0089   Epoch: 14   Global Step: 234240   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-12 00:03:16,802-Speed 3308.72 samples/sec   Loss 0.5437   LearningRate 0.0089   Epoch: 14   Global Step: 234250   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-12 00:03:19,865-Speed 3344.51 samples/sec   Loss 0.5327   LearningRate 0.0089   Epoch: 14   Global Step: 234260   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-12 00:03:22,925-Speed 3347.05 samples/sec   Loss 0.5050   LearningRate 0.0089   Epoch: 14   Global Step: 234270   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-12 00:03:25,999-Speed 3331.96 samples/sec   Loss 0.5692   LearningRate 0.0089   Epoch: 14   Global Step: 234280   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-12 00:03:29,067-Speed 3338.35 samples/sec   Loss 0.5593   LearningRate 0.0089   Epoch: 14   Global Step: 234290   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-12 00:03:32,128-Speed 3345.63 samples/sec   Loss 0.5651   LearningRate 0.0089   Epoch: 14   Global Step: 234300   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-12 00:03:35,180-Speed 3355.74 samples/sec   Loss 0.5140   LearningRate 0.0089   Epoch: 14   Global Step: 234310   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-12 00:03:38,283-Speed 3301.23 samples/sec   Loss 0.5627   LearningRate 0.0089   Epoch: 14   Global Step: 234320   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-12 00:03:41,338-Speed 3352.40 samples/sec   Loss 0.5285   LearningRate 0.0089   Epoch: 14   Global Step: 234330   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-12 00:03:44,445-Speed 3296.95 samples/sec   Loss 0.5311   LearningRate 0.0089   Epoch: 14   Global Step: 234340   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-12 00:03:47,529-Speed 3321.69 samples/sec   Loss 0.5355   LearningRate 0.0089   Epoch: 14   Global Step: 234350   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-12 00:03:50,591-Speed 3344.55 samples/sec   Loss 0.5459   LearningRate 0.0089   Epoch: 14   Global Step: 234360   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-12 00:03:53,667-Speed 3329.36 samples/sec   Loss 0.5203   LearningRate 0.0089   Epoch: 14   Global Step: 234370   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-12 00:03:56,725-Speed 3349.70 samples/sec   Loss 0.5395   LearningRate 0.0089   Epoch: 14   Global Step: 234380   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-12 00:03:59,800-Speed 3331.10 samples/sec   Loss 0.5533   LearningRate 0.0089   Epoch: 14   Global Step: 234390   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-12 00:04:02,866-Speed 3340.10 samples/sec   Loss 0.5483   LearningRate 0.0089   Epoch: 14   Global Step: 234400   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-12 00:04:05,936-Speed 3335.68 samples/sec   Loss 0.5211   LearningRate 0.0089   Epoch: 14   Global Step: 234410   Fp16 Grad Scale: 262144   Required: 11 hours
Training: 2022-04-12 00:04:08,981-Speed 3364.55 samples/sec   Loss 0.5317   LearningRate 0.0089   Epoch: 14   Global Step: 234420   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-12 00:04:12,028-Speed 3361.01 samples/sec   Loss 0.5112   LearningRate 0.0089   Epoch: 14   Global Step: 234430   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-12 00:04:15,085-Speed 3350.22 samples/sec   Loss 0.5709   LearningRate 0.0089   Epoch: 14   Global Step: 234440   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-12 00:04:18,166-Speed 3324.35 samples/sec   Loss 0.5164   LearningRate 0.0089   Epoch: 14   Global Step: 234450   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-12 00:04:21,224-Speed 3349.31 samples/sec   Loss 0.5411   LearningRate 0.0089   Epoch: 14   Global Step: 234460   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-12 00:04:24,286-Speed 3344.91 samples/sec   Loss 0.5385   LearningRate 0.0089   Epoch: 14   Global Step: 234470   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-12 00:04:27,363-Speed 3328.91 samples/sec   Loss 0.5670   LearningRate 0.0089   Epoch: 14   Global Step: 234480   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-12 00:04:30,422-Speed 3348.15 samples/sec   Loss 0.5379   LearningRate 0.0089   Epoch: 14   Global Step: 234490   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-12 00:04:33,503-Speed 3324.89 samples/sec   Loss 0.5258   LearningRate 0.0089   Epoch: 14   Global Step: 234500   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-12 00:04:36,564-Speed 3345.93 samples/sec   Loss 0.5363   LearningRate 0.0089   Epoch: 14   Global Step: 234510   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-12 00:04:39,628-Speed 3343.16 samples/sec   Loss 0.5311   LearningRate 0.0088   Epoch: 14   Global Step: 234520   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-12 00:04:42,682-Speed 3353.09 samples/sec   Loss 0.5377   LearningRate 0.0088   Epoch: 14   Global Step: 234530   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-12 00:04:45,740-Speed 3349.12 samples/sec   Loss 0.5476   LearningRate 0.0088   Epoch: 14   Global Step: 234540   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-12 00:04:48,847-Speed 3296.83 samples/sec   Loss 0.5320   LearningRate 0.0088   Epoch: 14   Global Step: 234550   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-12 00:04:52,074-Speed 3173.40 samples/sec   Loss 0.5133   LearningRate 0.0088   Epoch: 14   Global Step: 234560   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-12 00:04:55,158-Speed 3320.85 samples/sec   Loss 0.5181   LearningRate 0.0088   Epoch: 14   Global Step: 234570   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-12 00:04:58,227-Speed 3338.30 samples/sec   Loss 0.5092   LearningRate 0.0088   Epoch: 14   Global Step: 234580   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-12 00:05:01,291-Speed 3342.80 samples/sec   Loss 0.5193   LearningRate 0.0088   Epoch: 14   Global Step: 234590   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-12 00:05:04,357-Speed 3340.59 samples/sec   Loss 0.5222   LearningRate 0.0088   Epoch: 14   Global Step: 234600   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-12 00:05:07,414-Speed 3351.08 samples/sec   Loss 0.5144   LearningRate 0.0088   Epoch: 14   Global Step: 234610   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-12 00:05:10,469-Speed 3351.78 samples/sec   Loss 0.4975   LearningRate 0.0088   Epoch: 14   Global Step: 234620   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:05:13,528-Speed 3348.61 samples/sec   Loss 0.5488   LearningRate 0.0088   Epoch: 14   Global Step: 234630   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:05:16,592-Speed 3342.73 samples/sec   Loss 0.5349   LearningRate 0.0088   Epoch: 14   Global Step: 234640   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:05:19,667-Speed 3330.18 samples/sec   Loss 0.5301   LearningRate 0.0088   Epoch: 14   Global Step: 234650   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:05:22,735-Speed 3339.25 samples/sec   Loss 0.5437   LearningRate 0.0088   Epoch: 14   Global Step: 234660   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:05:25,790-Speed 3352.31 samples/sec   Loss 0.5264   LearningRate 0.0088   Epoch: 14   Global Step: 234670   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-12 00:05:28,998-Speed 3193.00 samples/sec   Loss 0.5235   LearningRate 0.0088   Epoch: 14   Global Step: 234680   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-12 00:05:32,067-Speed 3336.97 samples/sec   Loss 0.5327   LearningRate 0.0088   Epoch: 14   Global Step: 234690   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-12 00:05:35,145-Speed 3328.37 samples/sec   Loss 0.5940   LearningRate 0.0088   Epoch: 14   Global Step: 234700   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:05:38,203-Speed 3348.43 samples/sec   Loss 0.5444   LearningRate 0.0088   Epoch: 14   Global Step: 234710   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:05:41,273-Speed 3336.80 samples/sec   Loss 0.5469   LearningRate 0.0088   Epoch: 14   Global Step: 234720   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:05:44,329-Speed 3351.22 samples/sec   Loss 0.5126   LearningRate 0.0088   Epoch: 14   Global Step: 234730   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:05:47,389-Speed 3347.79 samples/sec   Loss 0.5269   LearningRate 0.0088   Epoch: 14   Global Step: 234740   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:05:50,466-Speed 3327.69 samples/sec   Loss 0.5162   LearningRate 0.0088   Epoch: 14   Global Step: 234750   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:05:53,542-Speed 3330.34 samples/sec   Loss 0.5641   LearningRate 0.0088   Epoch: 14   Global Step: 234760   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:05:56,693-Speed 3250.03 samples/sec   Loss 0.5425   LearningRate 0.0088   Epoch: 14   Global Step: 234770   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:05:59,763-Speed 3336.26 samples/sec   Loss 0.5287   LearningRate 0.0088   Epoch: 14   Global Step: 234780   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:06:02,826-Speed 3343.96 samples/sec   Loss 0.5376   LearningRate 0.0088   Epoch: 14   Global Step: 234790   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:06:05,891-Speed 3341.85 samples/sec   Loss 0.5668   LearningRate 0.0088   Epoch: 14   Global Step: 234800   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-12 00:06:08,939-Speed 3360.27 samples/sec   Loss 0.5571   LearningRate 0.0088   Epoch: 14   Global Step: 234810   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:06:12,014-Speed 3330.80 samples/sec   Loss 0.5655   LearningRate 0.0088   Epoch: 14   Global Step: 234820   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:06:15,077-Speed 3344.36 samples/sec   Loss 0.5256   LearningRate 0.0088   Epoch: 14   Global Step: 234830   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:06:18,178-Speed 3302.37 samples/sec   Loss 0.5391   LearningRate 0.0088   Epoch: 14   Global Step: 234840   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:06:21,267-Speed 3315.77 samples/sec   Loss 0.5637   LearningRate 0.0088   Epoch: 14   Global Step: 234850   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:06:24,345-Speed 3327.90 samples/sec   Loss 0.5400   LearningRate 0.0088   Epoch: 14   Global Step: 234860   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:06:27,454-Speed 3295.01 samples/sec   Loss 0.5235   LearningRate 0.0088   Epoch: 14   Global Step: 234870   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:06:30,526-Speed 3333.16 samples/sec   Loss 0.5188   LearningRate 0.0088   Epoch: 14   Global Step: 234880   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:06:33,583-Speed 3351.04 samples/sec   Loss 0.5291   LearningRate 0.0088   Epoch: 14   Global Step: 234890   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:06:36,637-Speed 3353.61 samples/sec   Loss 0.5456   LearningRate 0.0088   Epoch: 14   Global Step: 234900   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:06:39,706-Speed 3336.64 samples/sec   Loss 0.5306   LearningRate 0.0088   Epoch: 14   Global Step: 234910   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-12 00:06:42,798-Speed 3312.97 samples/sec   Loss 0.5202   LearningRate 0.0088   Epoch: 14   Global Step: 234920   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:06:46,023-Speed 3175.83 samples/sec   Loss 0.5099   LearningRate 0.0088   Epoch: 14   Global Step: 234930   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:06:49,161-Speed 3264.12 samples/sec   Loss 0.5222   LearningRate 0.0088   Epoch: 14   Global Step: 234940   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:06:52,263-Speed 3301.72 samples/sec   Loss 0.5371   LearningRate 0.0088   Epoch: 14   Global Step: 234950   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:06:55,385-Speed 3280.45 samples/sec   Loss 0.5462   LearningRate 0.0088   Epoch: 14   Global Step: 234960   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:06:58,464-Speed 3327.22 samples/sec   Loss 0.5400   LearningRate 0.0088   Epoch: 14   Global Step: 234970   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:07:01,528-Speed 3341.93 samples/sec   Loss 0.5205   LearningRate 0.0088   Epoch: 14   Global Step: 234980   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:07:04,609-Speed 3324.77 samples/sec   Loss 0.5390   LearningRate 0.0088   Epoch: 14   Global Step: 234990   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:07:07,671-Speed 3344.67 samples/sec   Loss 0.5262   LearningRate 0.0088   Epoch: 14   Global Step: 235000   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:07:10,741-Speed 3336.03 samples/sec   Loss 0.5627   LearningRate 0.0088   Epoch: 14   Global Step: 235010   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:07:13,802-Speed 3345.84 samples/sec   Loss 0.5507   LearningRate 0.0088   Epoch: 14   Global Step: 235020   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-12 00:07:16,860-Speed 3350.06 samples/sec   Loss 0.5608   LearningRate 0.0088   Epoch: 14   Global Step: 235030   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-12 00:07:19,946-Speed 3319.60 samples/sec   Loss 0.5423   LearningRate 0.0088   Epoch: 14   Global Step: 235040   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-12 00:07:23,026-Speed 3324.86 samples/sec   Loss 0.5606   LearningRate 0.0088   Epoch: 14   Global Step: 235050   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-12 00:07:26,159-Speed 3269.48 samples/sec   Loss 0.5031   LearningRate 0.0088   Epoch: 14   Global Step: 235060   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-12 00:07:29,216-Speed 3350.58 samples/sec   Loss 0.5550   LearningRate 0.0088   Epoch: 14   Global Step: 235070   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-12 00:07:32,277-Speed 3345.82 samples/sec   Loss 0.5299   LearningRate 0.0087   Epoch: 14   Global Step: 235080   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-12 00:07:35,390-Speed 3289.78 samples/sec   Loss 0.5455   LearningRate 0.0087   Epoch: 14   Global Step: 235090   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-12 00:07:38,482-Speed 3312.40 samples/sec   Loss 0.5358   LearningRate 0.0087   Epoch: 14   Global Step: 235100   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-12 00:07:41,542-Speed 3347.26 samples/sec   Loss 0.5624   LearningRate 0.0087   Epoch: 14   Global Step: 235110   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-12 00:07:44,615-Speed 3332.91 samples/sec   Loss 0.5280   LearningRate 0.0087   Epoch: 14   Global Step: 235120   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-12 00:07:47,687-Speed 3334.98 samples/sec   Loss 0.5682   LearningRate 0.0087   Epoch: 14   Global Step: 235130   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-12 00:07:50,745-Speed 3349.06 samples/sec   Loss 0.5628   LearningRate 0.0087   Epoch: 14   Global Step: 235140   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-12 00:07:53,846-Speed 3302.32 samples/sec   Loss 0.5516   LearningRate 0.0087   Epoch: 14   Global Step: 235150   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-12 00:07:56,919-Speed 3332.79 samples/sec   Loss 0.5572   LearningRate 0.0087   Epoch: 14   Global Step: 235160   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-12 00:08:00,032-Speed 3290.73 samples/sec   Loss 0.5366   LearningRate 0.0087   Epoch: 14   Global Step: 235170   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-12 00:08:03,090-Speed 3349.45 samples/sec   Loss 0.5313   LearningRate 0.0087   Epoch: 14   Global Step: 235180   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-12 00:08:06,164-Speed 3331.78 samples/sec   Loss 0.5431   LearningRate 0.0087   Epoch: 14   Global Step: 235190   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-12 00:08:09,221-Speed 3350.73 samples/sec   Loss 0.5439   LearningRate 0.0087   Epoch: 14   Global Step: 235200   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-12 00:08:12,289-Speed 3338.31 samples/sec   Loss 0.5259   LearningRate 0.0087   Epoch: 14   Global Step: 235210   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-12 00:08:15,412-Speed 3279.12 samples/sec   Loss 0.5136   LearningRate 0.0087   Epoch: 14   Global Step: 235220   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-12 00:08:18,476-Speed 3343.57 samples/sec   Loss 0.5402   LearningRate 0.0087   Epoch: 14   Global Step: 235230   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-12 00:08:21,538-Speed 3345.04 samples/sec   Loss 0.5406   LearningRate 0.0087   Epoch: 14   Global Step: 235240   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-12 00:08:24,623-Speed 3319.97 samples/sec   Loss 0.5855   LearningRate 0.0087   Epoch: 14   Global Step: 235250   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-12 00:08:27,739-Speed 3287.11 samples/sec   Loss 0.5584   LearningRate 0.0087   Epoch: 14   Global Step: 235260   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-12 00:08:30,974-Speed 3165.90 samples/sec   Loss 0.5459   LearningRate 0.0087   Epoch: 14   Global Step: 235270   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-12 00:08:34,029-Speed 3352.12 samples/sec   Loss 0.5679   LearningRate 0.0087   Epoch: 14   Global Step: 235280   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-12 00:08:37,101-Speed 3334.80 samples/sec   Loss 0.5350   LearningRate 0.0087   Epoch: 14   Global Step: 235290   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-12 00:08:41,021-Speed 2612.62 samples/sec   Loss 0.5439   LearningRate 0.0087   Epoch: 14   Global Step: 235300   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-12 00:08:47,787-Speed 1513.62 samples/sec   Loss 0.5479   LearningRate 0.0087   Epoch: 14   Global Step: 235310   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-12 00:08:51,757-Speed 2579.29 samples/sec   Loss 0.5398   LearningRate 0.0087   Epoch: 14   Global Step: 235320   Fp16 Grad Scale: 262144   Required: 10 hours
Training: 2022-04-12 00:08:54,799-Speed 3367.43 samples/sec   Loss 0.5569   LearningRate 0.0087   Epoch: 14   Global Step: 235330   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:08:57,897-Speed 3306.47 samples/sec   Loss 0.5434   LearningRate 0.0087   Epoch: 14   Global Step: 235340   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:09:00,977-Speed 3325.31 samples/sec   Loss 0.5607   LearningRate 0.0087   Epoch: 14   Global Step: 235350   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:09:04,034-Speed 3350.24 samples/sec   Loss 0.5467   LearningRate 0.0087   Epoch: 14   Global Step: 235360   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:09:07,099-Speed 3342.31 samples/sec   Loss 0.5456   LearningRate 0.0087   Epoch: 14   Global Step: 235370   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:09:10,186-Speed 3318.31 samples/sec   Loss 0.5477   LearningRate 0.0087   Epoch: 14   Global Step: 235380   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:09:13,254-Speed 3338.18 samples/sec   Loss 0.5409   LearningRate 0.0087   Epoch: 14   Global Step: 235390   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:09:16,356-Speed 3301.25 samples/sec   Loss 0.5339   LearningRate 0.0087   Epoch: 14   Global Step: 235400   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:09:19,457-Speed 3302.95 samples/sec   Loss 0.5820   LearningRate 0.0087   Epoch: 14   Global Step: 235410   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:09:22,534-Speed 3328.98 samples/sec   Loss 0.5389   LearningRate 0.0087   Epoch: 14   Global Step: 235420   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:09:25,621-Speed 3317.05 samples/sec   Loss 0.5713   LearningRate 0.0087   Epoch: 14   Global Step: 235430   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-12 00:09:28,694-Speed 3333.67 samples/sec   Loss 0.5638   LearningRate 0.0087   Epoch: 14   Global Step: 235440   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-12 00:09:31,791-Speed 3307.55 samples/sec   Loss 0.5701   LearningRate 0.0087   Epoch: 14   Global Step: 235450   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-12 00:09:34,856-Speed 3341.18 samples/sec   Loss 0.5235   LearningRate 0.0087   Epoch: 14   Global Step: 235460   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-12 00:09:37,933-Speed 3329.57 samples/sec   Loss 0.5412   LearningRate 0.0087   Epoch: 14   Global Step: 235470   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-12 00:09:40,980-Speed 3360.38 samples/sec   Loss 0.5716   LearningRate 0.0087   Epoch: 14   Global Step: 235480   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:09:44,058-Speed 3328.42 samples/sec   Loss 0.5642   LearningRate 0.0087   Epoch: 14   Global Step: 235490   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:09:47,169-Speed 3292.09 samples/sec   Loss 0.5577   LearningRate 0.0087   Epoch: 14   Global Step: 235500   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:09:50,273-Speed 3299.30 samples/sec   Loss 0.5569   LearningRate 0.0087   Epoch: 14   Global Step: 235510   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:09:53,359-Speed 3319.00 samples/sec   Loss 0.5787   LearningRate 0.0087   Epoch: 14   Global Step: 235520   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:09:56,426-Speed 3340.02 samples/sec   Loss 0.5526   LearningRate 0.0087   Epoch: 14   Global Step: 235530   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:09:59,501-Speed 3330.48 samples/sec   Loss 0.5647   LearningRate 0.0087   Epoch: 14   Global Step: 235540   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:10:02,589-Speed 3317.29 samples/sec   Loss 0.5471   LearningRate 0.0087   Epoch: 14   Global Step: 235550   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:10:05,650-Speed 3345.27 samples/sec   Loss 0.5443   LearningRate 0.0087   Epoch: 14   Global Step: 235560   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:10:08,740-Speed 3314.84 samples/sec   Loss 0.5404   LearningRate 0.0087   Epoch: 14   Global Step: 235570   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:10:11,859-Speed 3283.83 samples/sec   Loss 0.5710   LearningRate 0.0087   Epoch: 14   Global Step: 235580   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-12 00:10:14,978-Speed 3284.56 samples/sec   Loss 0.5541   LearningRate 0.0087   Epoch: 14   Global Step: 235590   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:10:18,076-Speed 3305.07 samples/sec   Loss 0.5492   LearningRate 0.0087   Epoch: 14   Global Step: 235600   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:10:21,198-Speed 3281.23 samples/sec   Loss 0.5608   LearningRate 0.0087   Epoch: 14   Global Step: 235610   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:10:24,390-Speed 3208.54 samples/sec   Loss 0.5805   LearningRate 0.0087   Epoch: 14   Global Step: 235620   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:10:27,464-Speed 3332.58 samples/sec   Loss 0.5704   LearningRate 0.0087   Epoch: 14   Global Step: 235630   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:10:30,658-Speed 3206.76 samples/sec   Loss 0.5590   LearningRate 0.0087   Epoch: 14   Global Step: 235640   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:10:33,721-Speed 3343.26 samples/sec   Loss 0.5712   LearningRate 0.0086   Epoch: 14   Global Step: 235650   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:10:36,784-Speed 3344.26 samples/sec   Loss 0.5658   LearningRate 0.0086   Epoch: 14   Global Step: 235660   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:10:39,865-Speed 3323.69 samples/sec   Loss 0.5675   LearningRate 0.0086   Epoch: 14   Global Step: 235670   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:10:42,944-Speed 3326.75 samples/sec   Loss 0.5730   LearningRate 0.0086   Epoch: 14   Global Step: 235680   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:10:46,013-Speed 3337.12 samples/sec   Loss 0.5377   LearningRate 0.0086   Epoch: 14   Global Step: 235690   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-12 00:10:49,128-Speed 3288.02 samples/sec   Loss 0.5701   LearningRate 0.0086   Epoch: 14   Global Step: 235700   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-12 00:10:52,214-Speed 3320.08 samples/sec   Loss 0.5484   LearningRate 0.0086   Epoch: 14   Global Step: 235710   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-12 00:10:55,342-Speed 3274.36 samples/sec   Loss 0.5547   LearningRate 0.0086   Epoch: 14   Global Step: 235720   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-12 00:10:58,540-Speed 3202.61 samples/sec   Loss 0.5626   LearningRate 0.0086   Epoch: 14   Global Step: 235730   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-12 00:11:01,608-Speed 3337.72 samples/sec   Loss 0.5735   LearningRate 0.0086   Epoch: 14   Global Step: 235740   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-12 00:11:04,681-Speed 3333.23 samples/sec   Loss 0.5749   LearningRate 0.0086   Epoch: 14   Global Step: 235750   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-12 00:11:07,739-Speed 3350.08 samples/sec   Loss 0.5695   LearningRate 0.0086   Epoch: 14   Global Step: 235760   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:11:10,853-Speed 3288.57 samples/sec   Loss 0.5719   LearningRate 0.0086   Epoch: 14   Global Step: 235770   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:11:13,960-Speed 3296.08 samples/sec   Loss 0.5574   LearningRate 0.0086   Epoch: 14   Global Step: 235780   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:11:17,041-Speed 3324.55 samples/sec   Loss 0.5880   LearningRate 0.0086   Epoch: 14   Global Step: 235790   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:11:20,102-Speed 3346.43 samples/sec   Loss 0.5590   LearningRate 0.0086   Epoch: 14   Global Step: 235800   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:11:23,169-Speed 3339.76 samples/sec   Loss 0.5640   LearningRate 0.0086   Epoch: 14   Global Step: 235810   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:11:26,311-Speed 3259.31 samples/sec   Loss 0.5796   LearningRate 0.0086   Epoch: 14   Global Step: 235820   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:11:29,411-Speed 3304.10 samples/sec   Loss 0.5498   LearningRate 0.0086   Epoch: 14   Global Step: 235830   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:11:32,496-Speed 3319.76 samples/sec   Loss 0.5419   LearningRate 0.0086   Epoch: 14   Global Step: 235840   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:11:35,561-Speed 3341.65 samples/sec   Loss 0.5282   LearningRate 0.0086   Epoch: 14   Global Step: 235850   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:11:38,623-Speed 3345.41 samples/sec   Loss 0.5740   LearningRate 0.0086   Epoch: 14   Global Step: 235860   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-12 00:11:41,722-Speed 3304.68 samples/sec   Loss 0.5447   LearningRate 0.0086   Epoch: 14   Global Step: 235870   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-12 00:11:44,790-Speed 3338.82 samples/sec   Loss 0.5502   LearningRate 0.0086   Epoch: 14   Global Step: 235880   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-12 00:11:47,860-Speed 3336.55 samples/sec   Loss 0.5805   LearningRate 0.0086   Epoch: 14   Global Step: 235890   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-12 00:11:50,986-Speed 3276.30 samples/sec   Loss 0.5308   LearningRate 0.0086   Epoch: 14   Global Step: 235900   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-12 00:11:54,070-Speed 3321.36 samples/sec   Loss 0.5974   LearningRate 0.0086   Epoch: 14   Global Step: 235910   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-12 00:11:57,129-Speed 3347.93 samples/sec   Loss 0.5558   LearningRate 0.0086   Epoch: 14   Global Step: 235920   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:12:00,209-Speed 3325.24 samples/sec   Loss 0.5351   LearningRate 0.0086   Epoch: 14   Global Step: 235930   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:12:03,287-Speed 3327.88 samples/sec   Loss 0.5619   LearningRate 0.0086   Epoch: 14   Global Step: 235940   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:12:06,384-Speed 3306.83 samples/sec   Loss 0.5498   LearningRate 0.0086   Epoch: 14   Global Step: 235950   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:12:09,530-Speed 3256.39 samples/sec   Loss 0.5545   LearningRate 0.0086   Epoch: 14   Global Step: 235960   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:12:12,618-Speed 3316.60 samples/sec   Loss 0.5418   LearningRate 0.0086   Epoch: 14   Global Step: 235970   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:12:15,742-Speed 3278.89 samples/sec   Loss 0.5647   LearningRate 0.0086   Epoch: 14   Global Step: 235980   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:12:18,859-Speed 3285.47 samples/sec   Loss 0.5626   LearningRate 0.0086   Epoch: 14   Global Step: 235990   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:12:21,968-Speed 3294.01 samples/sec   Loss 0.5524   LearningRate 0.0086   Epoch: 14   Global Step: 236000   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:13:06,107-[lfw][236000]XNorm: 21.929284
Training: 2022-04-12 00:13:06,108-[lfw][236000]Accuracy-Flip: 0.99767+-0.00281
Training: 2022-04-12 00:13:06,108-[lfw][236000]Accuracy-Highest: 0.99817
Training: 2022-04-12 00:13:57,391-[cfp_fp][236000]XNorm: 22.959506
Training: 2022-04-12 00:13:57,391-[cfp_fp][236000]Accuracy-Flip: 0.99071+-0.00351
Training: 2022-04-12 00:13:57,392-[cfp_fp][236000]Accuracy-Highest: 0.99129
Training: 2022-04-12 00:14:41,439-[agedb_30][236000]XNorm: 23.475656
Training: 2022-04-12 00:14:41,439-[agedb_30][236000]Accuracy-Flip: 0.98417+-0.00664
Training: 2022-04-12 00:14:41,440-[agedb_30][236000]Accuracy-Highest: 0.98567
Training: 2022-04-12 00:14:44,594-Speed 71.80 samples/sec   Loss 0.5658   LearningRate 0.0086   Epoch: 14   Global Step: 236010   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:14:47,659-Speed 3340.76 samples/sec   Loss 0.5937   LearningRate 0.0086   Epoch: 14   Global Step: 236020   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:14:50,714-Speed 3353.37 samples/sec   Loss 0.5812   LearningRate 0.0086   Epoch: 14   Global Step: 236030   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:14:53,787-Speed 3332.82 samples/sec   Loss 0.5337   LearningRate 0.0086   Epoch: 14   Global Step: 236040   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:14:56,881-Speed 3310.62 samples/sec   Loss 0.5583   LearningRate 0.0086   Epoch: 14   Global Step: 236050   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:14:59,955-Speed 3331.36 samples/sec   Loss 0.5506   LearningRate 0.0086   Epoch: 14   Global Step: 236060   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:15:03,046-Speed 3314.01 samples/sec   Loss 0.5680   LearningRate 0.0086   Epoch: 14   Global Step: 236070   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:15:06,156-Speed 3293.43 samples/sec   Loss 0.5589   LearningRate 0.0086   Epoch: 14   Global Step: 236080   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:15:09,228-Speed 3333.56 samples/sec   Loss 0.5671   LearningRate 0.0086   Epoch: 14   Global Step: 236090   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:15:12,343-Speed 3287.99 samples/sec   Loss 0.5151   LearningRate 0.0086   Epoch: 14   Global Step: 236100   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:15:15,403-Speed 3347.78 samples/sec   Loss 0.5927   LearningRate 0.0086   Epoch: 14   Global Step: 236110   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:15:18,480-Speed 3328.19 samples/sec   Loss 0.5819   LearningRate 0.0086   Epoch: 14   Global Step: 236120   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-12 00:15:21,575-Speed 3309.38 samples/sec   Loss 0.5780   LearningRate 0.0086   Epoch: 14   Global Step: 236130   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-12 00:15:24,637-Speed 3345.40 samples/sec   Loss 0.5694   LearningRate 0.0086   Epoch: 14   Global Step: 236140   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-12 00:15:27,725-Speed 3316.87 samples/sec   Loss 0.5808   LearningRate 0.0086   Epoch: 14   Global Step: 236150   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-12 00:15:30,880-Speed 3246.41 samples/sec   Loss 0.5948   LearningRate 0.0086   Epoch: 14   Global Step: 236160   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-12 00:15:33,984-Speed 3298.89 samples/sec   Loss 0.5672   LearningRate 0.0086   Epoch: 14   Global Step: 236170   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-12 00:15:37,149-Speed 3237.68 samples/sec   Loss 0.5579   LearningRate 0.0086   Epoch: 14   Global Step: 236180   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-12 00:15:40,214-Speed 3341.75 samples/sec   Loss 0.5720   LearningRate 0.0086   Epoch: 14   Global Step: 236190   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-12 00:15:43,281-Speed 3340.44 samples/sec   Loss 0.5605   LearningRate 0.0086   Epoch: 14   Global Step: 236200   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-12 00:15:46,348-Speed 3338.73 samples/sec   Loss 0.5818   LearningRate 0.0085   Epoch: 14   Global Step: 236210   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-12 00:15:49,535-Speed 3214.15 samples/sec   Loss 0.5585   LearningRate 0.0085   Epoch: 14   Global Step: 236220   Fp16 Grad Scale: 262144   Required: 10 hours
Training: 2022-04-12 00:15:52,613-Speed 3327.01 samples/sec   Loss 0.5683   LearningRate 0.0085   Epoch: 14   Global Step: 236230   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-12 00:15:55,682-Speed 3337.43 samples/sec   Loss 0.5777   LearningRate 0.0085   Epoch: 14   Global Step: 236240   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-12 00:15:58,760-Speed 3327.85 samples/sec   Loss 0.5857   LearningRate 0.0085   Epoch: 14   Global Step: 236250   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-12 00:16:01,829-Speed 3336.62 samples/sec   Loss 0.5585   LearningRate 0.0085   Epoch: 14   Global Step: 236260   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-12 00:16:04,905-Speed 3330.34 samples/sec   Loss 0.5467   LearningRate 0.0085   Epoch: 14   Global Step: 236270   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-12 00:16:07,978-Speed 3333.26 samples/sec   Loss 0.5769   LearningRate 0.0085   Epoch: 14   Global Step: 236280   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-12 00:16:11,043-Speed 3342.09 samples/sec   Loss 0.5678   LearningRate 0.0085   Epoch: 14   Global Step: 236290   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-12 00:16:14,125-Speed 3322.26 samples/sec   Loss 0.5677   LearningRate 0.0085   Epoch: 14   Global Step: 236300   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-12 00:16:17,191-Speed 3341.05 samples/sec   Loss 0.5709   LearningRate 0.0085   Epoch: 14   Global Step: 236310   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-12 00:16:20,306-Speed 3287.81 samples/sec   Loss 0.5383   LearningRate 0.0085   Epoch: 14   Global Step: 236320   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-12 00:16:23,382-Speed 3329.66 samples/sec   Loss 0.5805   LearningRate 0.0085   Epoch: 14   Global Step: 236330   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-12 00:16:26,507-Speed 3277.98 samples/sec   Loss 0.5421   LearningRate 0.0085   Epoch: 14   Global Step: 236340   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-12 00:16:29,583-Speed 3329.43 samples/sec   Loss 0.5776   LearningRate 0.0085   Epoch: 14   Global Step: 236350   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-12 00:16:32,651-Speed 3339.06 samples/sec   Loss 0.5748   LearningRate 0.0085   Epoch: 14   Global Step: 236360   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-12 00:16:35,735-Speed 3321.04 samples/sec   Loss 0.5652   LearningRate 0.0085   Epoch: 14   Global Step: 236370   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-12 00:16:38,801-Speed 3340.21 samples/sec   Loss 0.5806   LearningRate 0.0085   Epoch: 14   Global Step: 236380   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-12 00:16:41,857-Speed 3352.07 samples/sec   Loss 0.5726   LearningRate 0.0085   Epoch: 14   Global Step: 236390   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-12 00:16:44,905-Speed 3360.22 samples/sec   Loss 0.5539   LearningRate 0.0085   Epoch: 14   Global Step: 236400   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:16:47,978-Speed 3332.68 samples/sec   Loss 0.5743   LearningRate 0.0085   Epoch: 14   Global Step: 236410   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:16:51,046-Speed 3338.31 samples/sec   Loss 0.5499   LearningRate 0.0085   Epoch: 14   Global Step: 236420   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:16:54,219-Speed 3227.69 samples/sec   Loss 0.5724   LearningRate 0.0085   Epoch: 14   Global Step: 236430   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:16:57,280-Speed 3346.96 samples/sec   Loss 0.5211   LearningRate 0.0085   Epoch: 14   Global Step: 236440   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:17:00,341-Speed 3345.87 samples/sec   Loss 0.5939   LearningRate 0.0085   Epoch: 14   Global Step: 236450   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:17:03,407-Speed 3341.03 samples/sec   Loss 0.5649   LearningRate 0.0085   Epoch: 14   Global Step: 236460   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:17:06,558-Speed 3250.52 samples/sec   Loss 0.5682   LearningRate 0.0085   Epoch: 14   Global Step: 236470   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:17:09,636-Speed 3326.94 samples/sec   Loss 0.5418   LearningRate 0.0085   Epoch: 14   Global Step: 236480   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:17:12,721-Speed 3320.44 samples/sec   Loss 0.5550   LearningRate 0.0085   Epoch: 14   Global Step: 236490   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:17:15,796-Speed 3330.01 samples/sec   Loss 0.5752   LearningRate 0.0085   Epoch: 14   Global Step: 236500   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-12 00:17:18,889-Speed 3311.43 samples/sec   Loss 0.5729   LearningRate 0.0085   Epoch: 14   Global Step: 236510   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-12 00:17:21,975-Speed 3319.04 samples/sec   Loss 0.5649   LearningRate 0.0085   Epoch: 14   Global Step: 236520   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-12 00:17:25,029-Speed 3354.21 samples/sec   Loss 0.5506   LearningRate 0.0085   Epoch: 14   Global Step: 236530   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-12 00:17:28,085-Speed 3351.87 samples/sec   Loss 0.5588   LearningRate 0.0085   Epoch: 14   Global Step: 236540   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-12 00:17:31,157-Speed 3333.70 samples/sec   Loss 0.5775   LearningRate 0.0085   Epoch: 14   Global Step: 236550   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-12 00:17:34,231-Speed 3332.22 samples/sec   Loss 0.5824   LearningRate 0.0085   Epoch: 14   Global Step: 236560   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-12 00:17:37,300-Speed 3337.01 samples/sec   Loss 0.5512   LearningRate 0.0085   Epoch: 14   Global Step: 236570   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-12 00:17:40,373-Speed 3333.22 samples/sec   Loss 0.5523   LearningRate 0.0085   Epoch: 14   Global Step: 236580   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-12 00:17:43,484-Speed 3292.24 samples/sec   Loss 0.5596   LearningRate 0.0085   Epoch: 14   Global Step: 236590   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-12 00:17:46,538-Speed 3353.37 samples/sec   Loss 0.5861   LearningRate 0.0085   Epoch: 14   Global Step: 236600   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-12 00:17:49,614-Speed 3329.81 samples/sec   Loss 0.5641   LearningRate 0.0085   Epoch: 14   Global Step: 236610   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-12 00:17:52,671-Speed 3350.72 samples/sec   Loss 0.5810   LearningRate 0.0085   Epoch: 14   Global Step: 236620   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-12 00:17:55,736-Speed 3342.21 samples/sec   Loss 0.5585   LearningRate 0.0085   Epoch: 14   Global Step: 236630   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-12 00:17:58,813-Speed 3328.08 samples/sec   Loss 0.5598   LearningRate 0.0085   Epoch: 14   Global Step: 236640   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-12 00:18:01,901-Speed 3316.65 samples/sec   Loss 0.5338   LearningRate 0.0085   Epoch: 14   Global Step: 236650   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-12 00:18:04,975-Speed 3332.39 samples/sec   Loss 0.5972   LearningRate 0.0085   Epoch: 14   Global Step: 236660   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-12 00:18:08,032-Speed 3350.05 samples/sec   Loss 0.5905   LearningRate 0.0085   Epoch: 14   Global Step: 236670   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-12 00:18:11,092-Speed 3347.17 samples/sec   Loss 0.5766   LearningRate 0.0085   Epoch: 14   Global Step: 236680   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-12 00:18:14,170-Speed 3327.60 samples/sec   Loss 0.5487   LearningRate 0.0085   Epoch: 14   Global Step: 236690   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-12 00:18:17,248-Speed 3327.77 samples/sec   Loss 0.5704   LearningRate 0.0085   Epoch: 14   Global Step: 236700   Fp16 Grad Scale: 262144   Required: 10 hours
Training: 2022-04-12 00:18:20,346-Speed 3306.22 samples/sec   Loss 0.5764   LearningRate 0.0085   Epoch: 14   Global Step: 236710   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-12 00:18:23,425-Speed 3326.85 samples/sec   Loss 0.5912   LearningRate 0.0085   Epoch: 14   Global Step: 236720   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-12 00:18:26,501-Speed 3329.75 samples/sec   Loss 0.5665   LearningRate 0.0085   Epoch: 14   Global Step: 236730   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-12 00:18:29,565-Speed 3342.09 samples/sec   Loss 0.5763   LearningRate 0.0085   Epoch: 14   Global Step: 236740   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-12 00:18:32,622-Speed 3351.35 samples/sec   Loss 0.5776   LearningRate 0.0085   Epoch: 14   Global Step: 236750   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-12 00:18:35,671-Speed 3358.57 samples/sec   Loss 0.5845   LearningRate 0.0085   Epoch: 14   Global Step: 236760   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:18:38,740-Speed 3337.05 samples/sec   Loss 0.5569   LearningRate 0.0085   Epoch: 14   Global Step: 236770   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:18:41,923-Speed 3218.16 samples/sec   Loss 0.5690   LearningRate 0.0085   Epoch: 14   Global Step: 236780   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:18:45,090-Speed 3234.05 samples/sec   Loss 0.5490   LearningRate 0.0084   Epoch: 14   Global Step: 236790   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:18:48,169-Speed 3327.29 samples/sec   Loss 0.5793   LearningRate 0.0084   Epoch: 14   Global Step: 236800   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:18:51,243-Speed 3331.62 samples/sec   Loss 0.5760   LearningRate 0.0084   Epoch: 14   Global Step: 236810   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:18:54,332-Speed 3315.77 samples/sec   Loss 0.5737   LearningRate 0.0084   Epoch: 14   Global Step: 236820   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:18:57,410-Speed 3327.30 samples/sec   Loss 0.5801   LearningRate 0.0084   Epoch: 14   Global Step: 236830   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:19:00,481-Speed 3335.63 samples/sec   Loss 0.5607   LearningRate 0.0084   Epoch: 14   Global Step: 236840   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:19:03,537-Speed 3350.55 samples/sec   Loss 0.6085   LearningRate 0.0084   Epoch: 14   Global Step: 236850   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:19:06,699-Speed 3239.57 samples/sec   Loss 0.5871   LearningRate 0.0084   Epoch: 14   Global Step: 236860   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-12 00:19:09,860-Speed 3239.70 samples/sec   Loss 0.5793   LearningRate 0.0084   Epoch: 14   Global Step: 236870   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-12 00:19:12,998-Speed 3264.86 samples/sec   Loss 0.5630   LearningRate 0.0084   Epoch: 14   Global Step: 236880   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-12 00:19:16,046-Speed 3359.95 samples/sec   Loss 0.5646   LearningRate 0.0084   Epoch: 14   Global Step: 236890   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:19:19,116-Speed 3336.25 samples/sec   Loss 0.5380   LearningRate 0.0084   Epoch: 14   Global Step: 236900   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:19:22,200-Speed 3321.23 samples/sec   Loss 0.6059   LearningRate 0.0084   Epoch: 14   Global Step: 236910   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:19:25,280-Speed 3325.69 samples/sec   Loss 0.6063   LearningRate 0.0084   Epoch: 14   Global Step: 236920   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:19:28,337-Speed 3349.80 samples/sec   Loss 0.5806   LearningRate 0.0084   Epoch: 14   Global Step: 236930   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:19:31,392-Speed 3352.75 samples/sec   Loss 0.5817   LearningRate 0.0084   Epoch: 14   Global Step: 236940   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:19:34,462-Speed 3336.34 samples/sec   Loss 0.5700   LearningRate 0.0084   Epoch: 14   Global Step: 236950   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:19:37,554-Speed 3313.41 samples/sec   Loss 0.5558   LearningRate 0.0084   Epoch: 14   Global Step: 236960   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:19:40,615-Speed 3345.76 samples/sec   Loss 0.5274   LearningRate 0.0084   Epoch: 14   Global Step: 236970   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:19:43,674-Speed 3347.96 samples/sec   Loss 0.5601   LearningRate 0.0084   Epoch: 14   Global Step: 236980   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:19:46,740-Speed 3340.54 samples/sec   Loss 0.5899   LearningRate 0.0084   Epoch: 14   Global Step: 236990   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-12 00:19:49,805-Speed 3341.81 samples/sec   Loss 0.5776   LearningRate 0.0084   Epoch: 14   Global Step: 237000   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-12 00:19:52,885-Speed 3325.53 samples/sec   Loss 0.5695   LearningRate 0.0084   Epoch: 14   Global Step: 237010   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-12 00:19:55,954-Speed 3337.14 samples/sec   Loss 0.5709   LearningRate 0.0084   Epoch: 14   Global Step: 237020   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-12 00:19:59,107-Speed 3247.95 samples/sec   Loss 0.5548   LearningRate 0.0084   Epoch: 14   Global Step: 237030   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-12 00:20:02,191-Speed 3321.84 samples/sec   Loss 0.5828   LearningRate 0.0084   Epoch: 14   Global Step: 237040   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-12 00:20:05,321-Speed 3271.91 samples/sec   Loss 0.5606   LearningRate 0.0084   Epoch: 14   Global Step: 237050   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:20:08,424-Speed 3301.14 samples/sec   Loss 0.5936   LearningRate 0.0084   Epoch: 14   Global Step: 237060   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:20:11,536-Speed 3291.87 samples/sec   Loss 0.5394   LearningRate 0.0084   Epoch: 14   Global Step: 237070   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:20:14,591-Speed 3351.72 samples/sec   Loss 0.5911   LearningRate 0.0084   Epoch: 14   Global Step: 237080   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:20:17,652-Speed 3346.76 samples/sec   Loss 0.5942   LearningRate 0.0084   Epoch: 14   Global Step: 237090   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:20:20,710-Speed 3348.52 samples/sec   Loss 0.5614   LearningRate 0.0084   Epoch: 14   Global Step: 237100   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:20:23,782-Speed 3334.47 samples/sec   Loss 0.6009   LearningRate 0.0084   Epoch: 14   Global Step: 237110   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:20:26,924-Speed 3259.57 samples/sec   Loss 0.5764   LearningRate 0.0084   Epoch: 14   Global Step: 237120   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:20:29,994-Speed 3336.17 samples/sec   Loss 0.5808   LearningRate 0.0084   Epoch: 14   Global Step: 237130   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:20:33,086-Speed 3312.60 samples/sec   Loss 0.5898   LearningRate 0.0084   Epoch: 14   Global Step: 237140   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:20:36,169-Speed 3322.74 samples/sec   Loss 0.5580   LearningRate 0.0084   Epoch: 14   Global Step: 237150   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-12 00:20:39,228-Speed 3348.67 samples/sec   Loss 0.5709   LearningRate 0.0084   Epoch: 14   Global Step: 237160   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:20:42,291-Speed 3343.14 samples/sec   Loss 0.5696   LearningRate 0.0084   Epoch: 14   Global Step: 237170   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:20:45,408-Speed 3286.53 samples/sec   Loss 0.5917   LearningRate 0.0084   Epoch: 14   Global Step: 237180   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:20:48,544-Speed 3266.04 samples/sec   Loss 0.5921   LearningRate 0.0084   Epoch: 14   Global Step: 237190   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:20:51,608-Speed 3341.97 samples/sec   Loss 0.6028   LearningRate 0.0084   Epoch: 14   Global Step: 237200   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:20:54,683-Speed 3330.70 samples/sec   Loss 0.5713   LearningRate 0.0084   Epoch: 14   Global Step: 237210   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:20:57,768-Speed 3320.81 samples/sec   Loss 0.5537   LearningRate 0.0084   Epoch: 14   Global Step: 237220   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:21:00,863-Speed 3308.88 samples/sec   Loss 0.5603   LearningRate 0.0084   Epoch: 14   Global Step: 237230   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:21:04,124-Speed 3141.18 samples/sec   Loss 0.5773   LearningRate 0.0084   Epoch: 14   Global Step: 237240   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:21:07,269-Speed 3256.74 samples/sec   Loss 0.5888   LearningRate 0.0084   Epoch: 14   Global Step: 237250   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:21:10,417-Speed 3253.95 samples/sec   Loss 0.5919   LearningRate 0.0084   Epoch: 14   Global Step: 237260   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-12 00:21:13,474-Speed 3350.51 samples/sec   Loss 0.5483   LearningRate 0.0084   Epoch: 14   Global Step: 237270   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-12 00:21:16,525-Speed 3356.80 samples/sec   Loss 0.5716   LearningRate 0.0084   Epoch: 14   Global Step: 237280   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:21:19,585-Speed 3346.57 samples/sec   Loss 0.5449   LearningRate 0.0084   Epoch: 14   Global Step: 237290   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:21:22,643-Speed 3349.20 samples/sec   Loss 0.5801   LearningRate 0.0084   Epoch: 14   Global Step: 237300   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:21:25,709-Speed 3340.90 samples/sec   Loss 0.5704   LearningRate 0.0084   Epoch: 14   Global Step: 237310   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:21:28,785-Speed 3329.68 samples/sec   Loss 0.5852   LearningRate 0.0084   Epoch: 14   Global Step: 237320   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:21:31,841-Speed 3352.13 samples/sec   Loss 0.5734   LearningRate 0.0084   Epoch: 14   Global Step: 237330   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:21:34,909-Speed 3338.17 samples/sec   Loss 0.5722   LearningRate 0.0084   Epoch: 14   Global Step: 237340   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:21:38,010-Speed 3303.34 samples/sec   Loss 0.6191   LearningRate 0.0084   Epoch: 14   Global Step: 237350   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:21:41,147-Speed 3264.67 samples/sec   Loss 0.5514   LearningRate 0.0083   Epoch: 14   Global Step: 237360   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:21:44,270-Speed 3279.41 samples/sec   Loss 0.5962   LearningRate 0.0083   Epoch: 14   Global Step: 237370   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:21:47,438-Speed 3233.23 samples/sec   Loss 0.5826   LearningRate 0.0083   Epoch: 14   Global Step: 237380   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-12 00:21:50,512-Speed 3331.95 samples/sec   Loss 0.5635   LearningRate 0.0083   Epoch: 14   Global Step: 237390   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-12 00:21:53,579-Speed 3339.91 samples/sec   Loss 0.5691   LearningRate 0.0083   Epoch: 14   Global Step: 237400   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-12 00:21:56,665-Speed 3318.79 samples/sec   Loss 0.6206   LearningRate 0.0083   Epoch: 14   Global Step: 237410   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-12 00:21:59,731-Speed 3341.06 samples/sec   Loss 0.6160   LearningRate 0.0083   Epoch: 14   Global Step: 237420   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-12 00:22:02,798-Speed 3339.42 samples/sec   Loss 0.5959   LearningRate 0.0083   Epoch: 14   Global Step: 237430   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-12 00:22:05,855-Speed 3349.85 samples/sec   Loss 0.6071   LearningRate 0.0083   Epoch: 14   Global Step: 237440   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-12 00:22:08,907-Speed 3355.77 samples/sec   Loss 0.6102   LearningRate 0.0083   Epoch: 14   Global Step: 237450   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:22:11,984-Speed 3328.94 samples/sec   Loss 0.5717   LearningRate 0.0083   Epoch: 14   Global Step: 237460   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:22:15,055-Speed 3334.94 samples/sec   Loss 0.5561   LearningRate 0.0083   Epoch: 14   Global Step: 237470   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:22:18,124-Speed 3337.57 samples/sec   Loss 0.6047   LearningRate 0.0083   Epoch: 14   Global Step: 237480   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:22:21,184-Speed 3347.46 samples/sec   Loss 0.5933   LearningRate 0.0083   Epoch: 14   Global Step: 237490   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:22:24,249-Speed 3342.36 samples/sec   Loss 0.5777   LearningRate 0.0083   Epoch: 14   Global Step: 237500   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:22:27,341-Speed 3312.40 samples/sec   Loss 0.5386   LearningRate 0.0083   Epoch: 14   Global Step: 237510   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:22:30,403-Speed 3344.36 samples/sec   Loss 0.6006   LearningRate 0.0083   Epoch: 14   Global Step: 237520   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:22:33,477-Speed 3331.99 samples/sec   Loss 0.5998   LearningRate 0.0083   Epoch: 14   Global Step: 237530   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:22:36,545-Speed 3338.03 samples/sec   Loss 0.6058   LearningRate 0.0083   Epoch: 14   Global Step: 237540   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:22:39,758-Speed 3188.04 samples/sec   Loss 0.5832   LearningRate 0.0083   Epoch: 14   Global Step: 237550   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-12 00:22:42,866-Speed 3295.65 samples/sec   Loss 0.5626   LearningRate 0.0083   Epoch: 14   Global Step: 237560   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-12 00:22:45,936-Speed 3336.44 samples/sec   Loss 0.5809   LearningRate 0.0083   Epoch: 14   Global Step: 237570   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-12 00:22:49,064-Speed 3275.04 samples/sec   Loss 0.5977   LearningRate 0.0083   Epoch: 14   Global Step: 237580   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-12 00:22:52,128-Speed 3342.34 samples/sec   Loss 0.5584   LearningRate 0.0083   Epoch: 14   Global Step: 237590   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-12 00:22:55,190-Speed 3345.04 samples/sec   Loss 0.5507   LearningRate 0.0083   Epoch: 14   Global Step: 237600   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-12 00:22:58,269-Speed 3326.37 samples/sec   Loss 0.5949   LearningRate 0.0083   Epoch: 14   Global Step: 237610   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-12 00:23:01,334-Speed 3341.46 samples/sec   Loss 0.5933   LearningRate 0.0083   Epoch: 14   Global Step: 237620   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-12 00:23:04,410-Speed 3329.94 samples/sec   Loss 0.5966   LearningRate 0.0083   Epoch: 14   Global Step: 237630   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-12 00:23:07,473-Speed 3344.02 samples/sec   Loss 0.5712   LearningRate 0.0083   Epoch: 14   Global Step: 237640   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-12 00:23:10,543-Speed 3335.92 samples/sec   Loss 0.6166   LearningRate 0.0083   Epoch: 14   Global Step: 237650   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-12 00:23:13,625-Speed 3323.83 samples/sec   Loss 0.5857   LearningRate 0.0083   Epoch: 14   Global Step: 237660   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-12 00:23:16,755-Speed 3272.53 samples/sec   Loss 0.5693   LearningRate 0.0083   Epoch: 14   Global Step: 237670   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-12 00:23:19,825-Speed 3336.00 samples/sec   Loss 0.5896   LearningRate 0.0083   Epoch: 14   Global Step: 237680   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-12 00:23:22,888-Speed 3344.60 samples/sec   Loss 0.5709   LearningRate 0.0083   Epoch: 14   Global Step: 237690   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-12 00:23:26,044-Speed 3244.58 samples/sec   Loss 0.5508   LearningRate 0.0083   Epoch: 14   Global Step: 237700   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-12 00:23:29,128-Speed 3321.60 samples/sec   Loss 0.5643   LearningRate 0.0083   Epoch: 14   Global Step: 237710   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-12 00:23:32,254-Speed 3276.16 samples/sec   Loss 0.5669   LearningRate 0.0083   Epoch: 14   Global Step: 237720   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-12 00:23:35,343-Speed 3315.82 samples/sec   Loss 0.5929   LearningRate 0.0083   Epoch: 14   Global Step: 237730   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-12 00:23:38,413-Speed 3336.41 samples/sec   Loss 0.5994   LearningRate 0.0083   Epoch: 14   Global Step: 237740   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:23:41,477-Speed 3342.32 samples/sec   Loss 0.5767   LearningRate 0.0083   Epoch: 14   Global Step: 237750   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:23:44,538-Speed 3346.41 samples/sec   Loss 0.5847   LearningRate 0.0083   Epoch: 14   Global Step: 237760   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:23:47,621-Speed 3321.82 samples/sec   Loss 0.5634   LearningRate 0.0083   Epoch: 14   Global Step: 237770   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:23:50,698-Speed 3329.48 samples/sec   Loss 0.5769   LearningRate 0.0083   Epoch: 14   Global Step: 237780   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:23:53,761-Speed 3343.89 samples/sec   Loss 0.6006   LearningRate 0.0083   Epoch: 14   Global Step: 237790   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:23:56,806-Speed 3362.75 samples/sec   Loss 0.5871   LearningRate 0.0083   Epoch: 14   Global Step: 237800   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-12 00:23:59,865-Speed 3347.96 samples/sec   Loss 0.5946   LearningRate 0.0083   Epoch: 14   Global Step: 237810   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-12 00:24:02,943-Speed 3327.91 samples/sec   Loss 0.5774   LearningRate 0.0083   Epoch: 14   Global Step: 237820   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-12 00:24:06,017-Speed 3331.61 samples/sec   Loss 0.5873   LearningRate 0.0083   Epoch: 14   Global Step: 237830   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-12 00:24:09,078-Speed 3346.33 samples/sec   Loss 0.5749   LearningRate 0.0083   Epoch: 14   Global Step: 237840   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-12 00:24:12,154-Speed 3330.35 samples/sec   Loss 0.5939   LearningRate 0.0083   Epoch: 14   Global Step: 237850   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-12 00:24:15,215-Speed 3346.39 samples/sec   Loss 0.5856   LearningRate 0.0083   Epoch: 14   Global Step: 237860   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-12 00:24:18,304-Speed 3315.51 samples/sec   Loss 0.5695   LearningRate 0.0083   Epoch: 14   Global Step: 237870   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-12 00:24:21,374-Speed 3335.97 samples/sec   Loss 0.6043   LearningRate 0.0083   Epoch: 14   Global Step: 237880   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-12 00:24:24,445-Speed 3335.30 samples/sec   Loss 0.5752   LearningRate 0.0083   Epoch: 14   Global Step: 237890   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-12 00:24:27,523-Speed 3327.22 samples/sec   Loss 0.6038   LearningRate 0.0083   Epoch: 14   Global Step: 237900   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:24:30,583-Speed 3347.20 samples/sec   Loss 0.5927   LearningRate 0.0083   Epoch: 14   Global Step: 237910   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:24:33,679-Speed 3308.51 samples/sec   Loss 0.5966   LearningRate 0.0083   Epoch: 14   Global Step: 237920   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:24:36,741-Speed 3345.15 samples/sec   Loss 0.6241   LearningRate 0.0083   Epoch: 14   Global Step: 237930   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:24:39,802-Speed 3346.26 samples/sec   Loss 0.5854   LearningRate 0.0082   Epoch: 14   Global Step: 237940   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:24:42,886-Speed 3321.09 samples/sec   Loss 0.5802   LearningRate 0.0082   Epoch: 14   Global Step: 237950   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:24:45,952-Speed 3340.51 samples/sec   Loss 0.6067   LearningRate 0.0082   Epoch: 14   Global Step: 237960   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:24:49,018-Speed 3340.49 samples/sec   Loss 0.5911   LearningRate 0.0082   Epoch: 14   Global Step: 237970   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:24:52,101-Speed 3322.72 samples/sec   Loss 0.6363   LearningRate 0.0082   Epoch: 14   Global Step: 237980   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:24:55,165-Speed 3342.08 samples/sec   Loss 0.5910   LearningRate 0.0082   Epoch: 14   Global Step: 237990   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:24:58,253-Speed 3316.56 samples/sec   Loss 0.5964   LearningRate 0.0082   Epoch: 14   Global Step: 238000   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-12 00:25:41,889-[lfw][238000]XNorm: 22.044165
Training: 2022-04-12 00:25:41,889-[lfw][238000]Accuracy-Flip: 0.99767+-0.00281
Training: 2022-04-12 00:25:41,890-[lfw][238000]Accuracy-Highest: 0.99817
Training: 2022-04-12 00:26:32,765-[cfp_fp][238000]XNorm: 22.945901
Training: 2022-04-12 00:26:32,766-[cfp_fp][238000]Accuracy-Flip: 0.99186+-0.00373
Training: 2022-04-12 00:26:32,766-[cfp_fp][238000]Accuracy-Highest: 0.99186
Training: 2022-04-12 00:27:16,779-[agedb_30][238000]XNorm: 23.365036
Training: 2022-04-12 00:27:16,780-[agedb_30][238000]Accuracy-Flip: 0.98533+-0.00605
Training: 2022-04-12 00:27:16,780-[agedb_30][238000]Accuracy-Highest: 0.98567
Training: 2022-04-12 00:27:19,874-Speed 72.31 samples/sec   Loss 0.5862   LearningRate 0.0082   Epoch: 14   Global Step: 238010   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-12 00:27:22,925-Speed 3355.93 samples/sec   Loss 0.5826   LearningRate 0.0082   Epoch: 14   Global Step: 238020   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-12 00:27:25,978-Speed 3355.58 samples/sec   Loss 0.6006   LearningRate 0.0082   Epoch: 14   Global Step: 238030   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-12 00:27:29,026-Speed 3359.75 samples/sec   Loss 0.5684   LearningRate 0.0082   Epoch: 14   Global Step: 238040   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-12 00:27:32,086-Speed 3347.12 samples/sec   Loss 0.5927   LearningRate 0.0082   Epoch: 14   Global Step: 238050   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-12 00:27:35,156-Speed 3336.20 samples/sec   Loss 0.5976   LearningRate 0.0082   Epoch: 14   Global Step: 238060   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-12 00:27:38,212-Speed 3352.62 samples/sec   Loss 0.5771   LearningRate 0.0082   Epoch: 14   Global Step: 238070   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-12 00:27:41,276-Speed 3342.33 samples/sec   Loss 0.5786   LearningRate 0.0082   Epoch: 14   Global Step: 238080   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-12 00:27:44,359-Speed 3321.88 samples/sec   Loss 0.5782   LearningRate 0.0082   Epoch: 14   Global Step: 238090   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:27:47,476-Speed 3286.27 samples/sec   Loss 0.5933   LearningRate 0.0082   Epoch: 14   Global Step: 238100   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:27:50,550-Speed 3332.05 samples/sec   Loss 0.5946   LearningRate 0.0082   Epoch: 14   Global Step: 238110   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:27:53,679-Speed 3273.42 samples/sec   Loss 0.5781   LearningRate 0.0082   Epoch: 14   Global Step: 238120   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:27:56,745-Speed 3339.77 samples/sec   Loss 0.5860   LearningRate 0.0082   Epoch: 14   Global Step: 238130   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:27:59,803-Speed 3349.30 samples/sec   Loss 0.5776   LearningRate 0.0082   Epoch: 14   Global Step: 238140   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:28:02,873-Speed 3336.15 samples/sec   Loss 0.6043   LearningRate 0.0082   Epoch: 14   Global Step: 238150   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:28:05,980-Speed 3297.06 samples/sec   Loss 0.5942   LearningRate 0.0082   Epoch: 14   Global Step: 238160   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:28:09,050-Speed 3336.29 samples/sec   Loss 0.6174   LearningRate 0.0082   Epoch: 14   Global Step: 238170   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:28:12,112-Speed 3345.36 samples/sec   Loss 0.6068   LearningRate 0.0082   Epoch: 14   Global Step: 238180   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:28:15,179-Speed 3338.84 samples/sec   Loss 0.5887   LearningRate 0.0082   Epoch: 14   Global Step: 238190   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-12 00:28:18,244-Speed 3341.96 samples/sec   Loss 0.5962   LearningRate 0.0082   Epoch: 14   Global Step: 238200   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-12 00:28:21,310-Speed 3340.82 samples/sec   Loss 0.5850   LearningRate 0.0082   Epoch: 14   Global Step: 238210   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-12 00:28:24,383-Speed 3333.02 samples/sec   Loss 0.6563   LearningRate 0.0082   Epoch: 14   Global Step: 238220   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-12 00:28:27,454-Speed 3335.41 samples/sec   Loss 0.6125   LearningRate 0.0082   Epoch: 14   Global Step: 238230   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-12 00:28:30,532-Speed 3327.32 samples/sec   Loss 0.6111   LearningRate 0.0082   Epoch: 14   Global Step: 238240   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-12 00:28:33,605-Speed 3333.65 samples/sec   Loss 0.5683   LearningRate 0.0082   Epoch: 14   Global Step: 238250   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-12 00:28:36,683-Speed 3327.74 samples/sec   Loss 0.5915   LearningRate 0.0082   Epoch: 14   Global Step: 238260   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-12 00:28:39,748-Speed 3340.78 samples/sec   Loss 0.5792   LearningRate 0.0082   Epoch: 14   Global Step: 238270   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-12 00:28:42,816-Speed 3339.21 samples/sec   Loss 0.5967   LearningRate 0.0082   Epoch: 14   Global Step: 238280   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-12 00:28:45,874-Speed 3349.17 samples/sec   Loss 0.5667   LearningRate 0.0082   Epoch: 14   Global Step: 238290   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-12 00:28:48,937-Speed 3344.12 samples/sec   Loss 0.5858   LearningRate 0.0082   Epoch: 14   Global Step: 238300   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-12 00:28:52,007-Speed 3336.01 samples/sec   Loss 0.5701   LearningRate 0.0082   Epoch: 14   Global Step: 238310   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-12 00:28:55,081-Speed 3332.02 samples/sec   Loss 0.5977   LearningRate 0.0082   Epoch: 14   Global Step: 238320   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-12 00:28:58,167-Speed 3318.32 samples/sec   Loss 0.6035   LearningRate 0.0082   Epoch: 14   Global Step: 238330   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-12 00:29:01,259-Speed 3313.33 samples/sec   Loss 0.5907   LearningRate 0.0082   Epoch: 14   Global Step: 238340   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-12 00:29:04,462-Speed 3198.31 samples/sec   Loss 0.5951   LearningRate 0.0082   Epoch: 14   Global Step: 238350   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-12 00:29:07,542-Speed 3324.45 samples/sec   Loss 0.6086   LearningRate 0.0082   Epoch: 14   Global Step: 238360   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-12 00:29:10,623-Speed 3324.84 samples/sec   Loss 0.5848   LearningRate 0.0082   Epoch: 14   Global Step: 238370   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-12 00:29:13,700-Speed 3329.03 samples/sec   Loss 0.5796   LearningRate 0.0082   Epoch: 14   Global Step: 238380   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-12 00:29:16,773-Speed 3332.68 samples/sec   Loss 0.5752   LearningRate 0.0082   Epoch: 14   Global Step: 238390   Fp16 Grad Scale: 262144   Required: 10 hours
Training: 2022-04-12 00:29:19,824-Speed 3356.83 samples/sec   Loss 0.5778   LearningRate 0.0082   Epoch: 14   Global Step: 238400   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-12 00:29:22,892-Speed 3337.64 samples/sec   Loss 0.5583   LearningRate 0.0082   Epoch: 14   Global Step: 238410   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-12 00:29:25,952-Speed 3347.39 samples/sec   Loss 0.5937   LearningRate 0.0082   Epoch: 14   Global Step: 238420   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-12 00:29:29,052-Speed 3304.72 samples/sec   Loss 0.5683   LearningRate 0.0082   Epoch: 14   Global Step: 238430   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-12 00:29:32,112-Speed 3347.77 samples/sec   Loss 0.6033   LearningRate 0.0082   Epoch: 14   Global Step: 238440   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-12 00:29:35,176-Speed 3343.46 samples/sec   Loss 0.6086   LearningRate 0.0082   Epoch: 14   Global Step: 238450   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-12 00:29:38,251-Speed 3330.01 samples/sec   Loss 0.5803   LearningRate 0.0082   Epoch: 14   Global Step: 238460   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-12 00:29:41,311-Speed 3347.81 samples/sec   Loss 0.5734   LearningRate 0.0082   Epoch: 14   Global Step: 238470   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-12 00:29:44,379-Speed 3338.33 samples/sec   Loss 0.6143   LearningRate 0.0082   Epoch: 14   Global Step: 238480   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-12 00:29:47,432-Speed 3355.03 samples/sec   Loss 0.6227   LearningRate 0.0082   Epoch: 14   Global Step: 238490   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:29:50,504-Speed 3333.82 samples/sec   Loss 0.6213   LearningRate 0.0082   Epoch: 14   Global Step: 238500   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-12 00:29:53,589-Speed 3320.32 samples/sec   Loss 0.6031   LearningRate 0.0082   Epoch: 14   Global Step: 238510   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-12 00:29:56,673-Speed 3321.25 samples/sec   Loss 0.6295   LearningRate 0.0082   Epoch: 14   Global Step: 238520   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-12 00:29:59,784-Speed 3292.44 samples/sec   Loss 0.5999   LearningRate 0.0081   Epoch: 14   Global Step: 238530   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-12 00:30:02,866-Speed 3322.87 samples/sec   Loss 0.6234   LearningRate 0.0081   Epoch: 14   Global Step: 238540   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-12 00:30:05,974-Speed 3295.56 samples/sec   Loss 0.5966   LearningRate 0.0081   Epoch: 14   Global Step: 238550   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-12 00:30:09,035-Speed 3345.74 samples/sec   Loss 0.5895   LearningRate 0.0081   Epoch: 14   Global Step: 238560   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-12 00:30:12,099-Speed 3344.19 samples/sec   Loss 0.5890   LearningRate 0.0081   Epoch: 14   Global Step: 238570   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-12 00:30:15,204-Speed 3298.61 samples/sec   Loss 0.6283   LearningRate 0.0081   Epoch: 14   Global Step: 238580   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-12 00:30:18,274-Speed 3336.31 samples/sec   Loss 0.5942   LearningRate 0.0081   Epoch: 14   Global Step: 238590   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-12 00:30:21,419-Speed 3256.35 samples/sec   Loss 0.5717   LearningRate 0.0081   Epoch: 14   Global Step: 238600   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:30:24,497-Speed 3328.36 samples/sec   Loss 0.5817   LearningRate 0.0081   Epoch: 14   Global Step: 238610   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:30:27,612-Speed 3288.21 samples/sec   Loss 0.5894   LearningRate 0.0081   Epoch: 14   Global Step: 238620   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:30:30,696-Speed 3320.58 samples/sec   Loss 0.5741   LearningRate 0.0081   Epoch: 14   Global Step: 238630   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:30:33,780-Speed 3321.80 samples/sec   Loss 0.6012   LearningRate 0.0081   Epoch: 14   Global Step: 238640   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:30:36,848-Speed 3337.52 samples/sec   Loss 0.6175   LearningRate 0.0081   Epoch: 14   Global Step: 238650   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:30:39,925-Speed 3328.42 samples/sec   Loss 0.6101   LearningRate 0.0081   Epoch: 14   Global Step: 238660   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:30:42,999-Speed 3332.21 samples/sec   Loss 0.5875   LearningRate 0.0081   Epoch: 14   Global Step: 238670   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:30:46,078-Speed 3326.52 samples/sec   Loss 0.6058   LearningRate 0.0081   Epoch: 14   Global Step: 238680   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:30:49,144-Speed 3340.18 samples/sec   Loss 0.6167   LearningRate 0.0081   Epoch: 14   Global Step: 238690   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:30:52,210-Speed 3341.58 samples/sec   Loss 0.5994   LearningRate 0.0081   Epoch: 14   Global Step: 238700   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:30:55,339-Speed 3272.88 samples/sec   Loss 0.6059   LearningRate 0.0081   Epoch: 14   Global Step: 238710   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:30:58,424-Speed 3319.98 samples/sec   Loss 0.6209   LearningRate 0.0081   Epoch: 14   Global Step: 238720   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:31:01,491-Speed 3339.80 samples/sec   Loss 0.6238   LearningRate 0.0081   Epoch: 14   Global Step: 238730   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:31:04,552-Speed 3346.14 samples/sec   Loss 0.5856   LearningRate 0.0081   Epoch: 14   Global Step: 238740   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:31:07,616-Speed 3342.78 samples/sec   Loss 0.5851   LearningRate 0.0081   Epoch: 14   Global Step: 238750   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:31:10,685-Speed 3336.94 samples/sec   Loss 0.5778   LearningRate 0.0081   Epoch: 14   Global Step: 238760   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:31:13,751-Speed 3340.69 samples/sec   Loss 0.6143   LearningRate 0.0081   Epoch: 14   Global Step: 238770   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:31:16,830-Speed 3326.66 samples/sec   Loss 0.6339   LearningRate 0.0081   Epoch: 14   Global Step: 238780   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:31:19,977-Speed 3255.15 samples/sec   Loss 0.5816   LearningRate 0.0081   Epoch: 14   Global Step: 238790   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:31:23,043-Speed 3340.75 samples/sec   Loss 0.6206   LearningRate 0.0081   Epoch: 14   Global Step: 238800   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-12 00:31:26,163-Speed 3282.14 samples/sec   Loss 0.6037   LearningRate 0.0081   Epoch: 14   Global Step: 238810   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-12 00:31:29,262-Speed 3305.49 samples/sec   Loss 0.6054   LearningRate 0.0081   Epoch: 14   Global Step: 238820   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-12 00:31:32,470-Speed 3192.60 samples/sec   Loss 0.5674   LearningRate 0.0081   Epoch: 14   Global Step: 238830   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-12 00:31:35,535-Speed 3341.75 samples/sec   Loss 0.5532   LearningRate 0.0081   Epoch: 14   Global Step: 238840   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-12 00:31:38,611-Speed 3329.31 samples/sec   Loss 0.5880   LearningRate 0.0081   Epoch: 14   Global Step: 238850   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-12 00:31:41,707-Speed 3308.54 samples/sec   Loss 0.5868   LearningRate 0.0081   Epoch: 14   Global Step: 238860   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-12 00:31:44,790-Speed 3322.50 samples/sec   Loss 0.6134   LearningRate 0.0081   Epoch: 14   Global Step: 238870   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-12 00:31:47,861-Speed 3335.82 samples/sec   Loss 0.5978   LearningRate 0.0081   Epoch: 14   Global Step: 238880   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:31:50,929-Speed 3338.22 samples/sec   Loss 0.6023   LearningRate 0.0081   Epoch: 14   Global Step: 238890   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:31:53,994-Speed 3340.98 samples/sec   Loss 0.5945   LearningRate 0.0081   Epoch: 14   Global Step: 238900   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:31:57,057-Speed 3344.04 samples/sec   Loss 0.6031   LearningRate 0.0081   Epoch: 14   Global Step: 238910   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:32:00,173-Speed 3287.67 samples/sec   Loss 0.5919   LearningRate 0.0081   Epoch: 14   Global Step: 238920   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:32:03,291-Speed 3284.77 samples/sec   Loss 0.5682   LearningRate 0.0081   Epoch: 14   Global Step: 238930   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:32:06,380-Speed 3314.96 samples/sec   Loss 0.5801   LearningRate 0.0081   Epoch: 14   Global Step: 238940   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:32:09,460-Speed 3325.18 samples/sec   Loss 0.6265   LearningRate 0.0081   Epoch: 14   Global Step: 238950   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:32:12,571-Speed 3292.61 samples/sec   Loss 0.6057   LearningRate 0.0081   Epoch: 14   Global Step: 238960   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:32:15,637-Speed 3341.50 samples/sec   Loss 0.6197   LearningRate 0.0081   Epoch: 14   Global Step: 238970   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:32:18,698-Speed 3345.94 samples/sec   Loss 0.6016   LearningRate 0.0081   Epoch: 14   Global Step: 238980   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-12 00:32:21,790-Speed 3311.91 samples/sec   Loss 0.5838   LearningRate 0.0081   Epoch: 14   Global Step: 238990   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-12 00:32:24,859-Speed 3337.54 samples/sec   Loss 0.5858   LearningRate 0.0081   Epoch: 14   Global Step: 239000   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-12 00:32:27,947-Speed 3317.24 samples/sec   Loss 0.6400   LearningRate 0.0081   Epoch: 14   Global Step: 239010   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-12 00:32:31,016-Speed 3336.84 samples/sec   Loss 0.5957   LearningRate 0.0081   Epoch: 14   Global Step: 239020   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-12 00:32:34,097-Speed 3324.10 samples/sec   Loss 0.6026   LearningRate 0.0081   Epoch: 14   Global Step: 239030   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-12 00:32:37,288-Speed 3210.04 samples/sec   Loss 0.6062   LearningRate 0.0081   Epoch: 14   Global Step: 239040   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-12 00:32:40,393-Speed 3299.11 samples/sec   Loss 0.6069   LearningRate 0.0081   Epoch: 14   Global Step: 239050   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-12 00:32:43,462-Speed 3337.48 samples/sec   Loss 0.5832   LearningRate 0.0081   Epoch: 14   Global Step: 239060   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-12 00:32:46,533-Speed 3334.93 samples/sec   Loss 0.6073   LearningRate 0.0081   Epoch: 14   Global Step: 239070   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-12 00:32:49,591-Speed 3349.10 samples/sec   Loss 0.6142   LearningRate 0.0081   Epoch: 14   Global Step: 239080   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-12 00:32:52,754-Speed 3237.93 samples/sec   Loss 0.6177   LearningRate 0.0081   Epoch: 14   Global Step: 239090   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-12 00:32:55,848-Speed 3310.52 samples/sec   Loss 0.6059   LearningRate 0.0081   Epoch: 14   Global Step: 239100   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-12 00:32:58,921-Speed 3333.50 samples/sec   Loss 0.6044   LearningRate 0.0080   Epoch: 14   Global Step: 239110   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-12 00:33:02,041-Speed 3282.56 samples/sec   Loss 0.6080   LearningRate 0.0080   Epoch: 14   Global Step: 239120   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-12 00:33:05,254-Speed 3188.02 samples/sec   Loss 0.5914   LearningRate 0.0080   Epoch: 14   Global Step: 239130   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-12 00:33:08,398-Speed 3258.14 samples/sec   Loss 0.5803   LearningRate 0.0080   Epoch: 14   Global Step: 239140   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-12 00:33:11,514-Speed 3286.11 samples/sec   Loss 0.6177   LearningRate 0.0080   Epoch: 14   Global Step: 239150   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-12 00:33:14,589-Speed 3330.70 samples/sec   Loss 0.6162   LearningRate 0.0080   Epoch: 14   Global Step: 239160   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-12 00:33:17,719-Speed 3272.53 samples/sec   Loss 0.5859   LearningRate 0.0080   Epoch: 14   Global Step: 239170   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:33:20,819-Speed 3304.23 samples/sec   Loss 0.5749   LearningRate 0.0080   Epoch: 14   Global Step: 239180   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:33:23,887-Speed 3338.05 samples/sec   Loss 0.6113   LearningRate 0.0080   Epoch: 14   Global Step: 239190   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:33:26,956-Speed 3337.28 samples/sec   Loss 0.6303   LearningRate 0.0080   Epoch: 14   Global Step: 239200   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:33:30,036-Speed 3326.55 samples/sec   Loss 0.6182   LearningRate 0.0080   Epoch: 14   Global Step: 239210   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:33:33,107-Speed 3335.28 samples/sec   Loss 0.5818   LearningRate 0.0080   Epoch: 14   Global Step: 239220   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:33:36,175-Speed 3339.03 samples/sec   Loss 0.5536   LearningRate 0.0080   Epoch: 14   Global Step: 239230   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:33:39,244-Speed 3336.83 samples/sec   Loss 0.5905   LearningRate 0.0080   Epoch: 14   Global Step: 239240   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:33:42,320-Speed 3329.91 samples/sec   Loss 0.6171   LearningRate 0.0080   Epoch: 14   Global Step: 239250   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:33:45,398-Speed 3327.18 samples/sec   Loss 0.6101   LearningRate 0.0080   Epoch: 14   Global Step: 239260   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:33:48,469-Speed 3335.45 samples/sec   Loss 0.6139   LearningRate 0.0080   Epoch: 14   Global Step: 239270   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-12 00:33:51,538-Speed 3336.87 samples/sec   Loss 0.5972   LearningRate 0.0080   Epoch: 14   Global Step: 239280   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-12 00:33:54,593-Speed 3352.61 samples/sec   Loss 0.6211   LearningRate 0.0080   Epoch: 14   Global Step: 239290   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:33:57,664-Speed 3335.11 samples/sec   Loss 0.6108   LearningRate 0.0080   Epoch: 14   Global Step: 239300   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:34:00,737-Speed 3333.28 samples/sec   Loss 0.6210   LearningRate 0.0080   Epoch: 14   Global Step: 239310   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:34:03,820-Speed 3322.72 samples/sec   Loss 0.5997   LearningRate 0.0080   Epoch: 14   Global Step: 239320   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:34:06,952-Speed 3270.18 samples/sec   Loss 0.5711   LearningRate 0.0080   Epoch: 14   Global Step: 239330   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:34:10,034-Speed 3323.04 samples/sec   Loss 0.5905   LearningRate 0.0080   Epoch: 14   Global Step: 239340   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:34:13,100-Speed 3340.44 samples/sec   Loss 0.5919   LearningRate 0.0080   Epoch: 14   Global Step: 239350   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:34:16,216-Speed 3287.66 samples/sec   Loss 0.6137   LearningRate 0.0080   Epoch: 14   Global Step: 239360   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:34:19,293-Speed 3328.41 samples/sec   Loss 0.6287   LearningRate 0.0080   Epoch: 14   Global Step: 239370   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:34:22,444-Speed 3250.73 samples/sec   Loss 0.6162   LearningRate 0.0080   Epoch: 14   Global Step: 239380   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:34:25,503-Speed 3347.89 samples/sec   Loss 0.6170   LearningRate 0.0080   Epoch: 14   Global Step: 239390   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:34:28,573-Speed 3336.26 samples/sec   Loss 0.6078   LearningRate 0.0080   Epoch: 14   Global Step: 239400   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:34:31,631-Speed 3349.67 samples/sec   Loss 0.6151   LearningRate 0.0080   Epoch: 14   Global Step: 239410   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-12 00:34:34,720-Speed 3316.07 samples/sec   Loss 0.5969   LearningRate 0.0080   Epoch: 14   Global Step: 239420   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-12 00:34:37,929-Speed 3191.73 samples/sec   Loss 0.6011   LearningRate 0.0080   Epoch: 14   Global Step: 239430   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-12 00:34:41,004-Speed 3330.86 samples/sec   Loss 0.5806   LearningRate 0.0080   Epoch: 14   Global Step: 239440   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-12 00:34:44,109-Speed 3298.77 samples/sec   Loss 0.6256   LearningRate 0.0080   Epoch: 14   Global Step: 239450   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-12 00:34:47,190-Speed 3323.48 samples/sec   Loss 0.5666   LearningRate 0.0080   Epoch: 14   Global Step: 239460   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-12 00:34:50,297-Speed 3296.59 samples/sec   Loss 0.6135   LearningRate 0.0080   Epoch: 14   Global Step: 239470   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-12 00:34:53,411-Speed 3289.37 samples/sec   Loss 0.6175   LearningRate 0.0080   Epoch: 14   Global Step: 239480   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-12 00:34:56,483-Speed 3334.77 samples/sec   Loss 0.5986   LearningRate 0.0080   Epoch: 14   Global Step: 239490   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-12 00:34:59,566-Speed 3322.12 samples/sec   Loss 0.6060   LearningRate 0.0080   Epoch: 14   Global Step: 239500   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-12 00:35:02,633-Speed 3339.37 samples/sec   Loss 0.6068   LearningRate 0.0080   Epoch: 14   Global Step: 239510   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:35:05,734-Speed 3302.49 samples/sec   Loss 0.6283   LearningRate 0.0080   Epoch: 14   Global Step: 239520   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:35:08,820-Speed 3318.90 samples/sec   Loss 0.6170   LearningRate 0.0080   Epoch: 14   Global Step: 239530   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:35:11,913-Speed 3311.41 samples/sec   Loss 0.6041   LearningRate 0.0080   Epoch: 14   Global Step: 239540   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:35:14,980-Speed 3339.10 samples/sec   Loss 0.6179   LearningRate 0.0080   Epoch: 14   Global Step: 239550   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:35:18,088-Speed 3295.78 samples/sec   Loss 0.5848   LearningRate 0.0080   Epoch: 14   Global Step: 239560   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:35:21,190-Speed 3301.92 samples/sec   Loss 0.6160   LearningRate 0.0080   Epoch: 14   Global Step: 239570   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:35:24,292-Speed 3302.32 samples/sec   Loss 0.6126   LearningRate 0.0080   Epoch: 14   Global Step: 239580   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:35:27,377-Speed 3319.81 samples/sec   Loss 0.6231   LearningRate 0.0080   Epoch: 14   Global Step: 239590   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:35:30,448-Speed 3335.76 samples/sec   Loss 0.6181   LearningRate 0.0080   Epoch: 14   Global Step: 239600   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:35:33,555-Speed 3295.98 samples/sec   Loss 0.6035   LearningRate 0.0080   Epoch: 14   Global Step: 239610   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-12 00:35:36,627-Speed 3334.60 samples/sec   Loss 0.6102   LearningRate 0.0080   Epoch: 14   Global Step: 239620   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-12 00:35:39,705-Speed 3327.02 samples/sec   Loss 0.5790   LearningRate 0.0080   Epoch: 14   Global Step: 239630   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-12 00:35:42,782-Speed 3328.80 samples/sec   Loss 0.6005   LearningRate 0.0080   Epoch: 14   Global Step: 239640   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-12 00:35:45,881-Speed 3305.67 samples/sec   Loss 0.5759   LearningRate 0.0080   Epoch: 14   Global Step: 239650   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-12 00:35:48,985-Speed 3299.47 samples/sec   Loss 0.6029   LearningRate 0.0080   Epoch: 14   Global Step: 239660   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-12 00:35:52,055-Speed 3336.36 samples/sec   Loss 0.5713   LearningRate 0.0080   Epoch: 14   Global Step: 239670   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-12 00:35:55,123-Speed 3339.11 samples/sec   Loss 0.5975   LearningRate 0.0080   Epoch: 14   Global Step: 239680   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-12 00:35:58,196-Speed 3332.08 samples/sec   Loss 0.5967   LearningRate 0.0080   Epoch: 14   Global Step: 239690   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-12 00:36:01,266-Speed 3336.56 samples/sec   Loss 0.5913   LearningRate 0.0079   Epoch: 14   Global Step: 239700   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-12 00:36:04,326-Speed 3347.45 samples/sec   Loss 0.6168   LearningRate 0.0079   Epoch: 14   Global Step: 239710   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-12 00:36:07,469-Speed 3258.80 samples/sec   Loss 0.5979   LearningRate 0.0079   Epoch: 14   Global Step: 239720   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-12 00:36:10,646-Speed 3223.56 samples/sec   Loss 0.6203   LearningRate 0.0079   Epoch: 14   Global Step: 239730   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-12 00:36:13,715-Speed 3337.79 samples/sec   Loss 0.6043   LearningRate 0.0079   Epoch: 14   Global Step: 239740   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-12 00:36:16,891-Speed 3225.16 samples/sec   Loss 0.6140   LearningRate 0.0079   Epoch: 14   Global Step: 239750   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-12 00:36:19,994-Speed 3300.51 samples/sec   Loss 0.6280   LearningRate 0.0079   Epoch: 14   Global Step: 239760   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-12 00:36:23,153-Speed 3242.67 samples/sec   Loss 0.5938   LearningRate 0.0079   Epoch: 14   Global Step: 239770   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-12 00:36:26,320-Speed 3234.07 samples/sec   Loss 0.5638   LearningRate 0.0079   Epoch: 14   Global Step: 239780   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-12 00:36:29,442-Speed 3280.24 samples/sec   Loss 0.6066   LearningRate 0.0079   Epoch: 14   Global Step: 239790   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-12 00:36:32,519-Speed 3328.54 samples/sec   Loss 0.6016   LearningRate 0.0079   Epoch: 14   Global Step: 239800   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-12 00:36:35,615-Speed 3308.30 samples/sec   Loss 0.6047   LearningRate 0.0079   Epoch: 14   Global Step: 239810   Fp16 Grad Scale: 262144   Required: 10 hours
Training: 2022-04-12 00:36:38,678-Speed 3344.08 samples/sec   Loss 0.6005   LearningRate 0.0079   Epoch: 14   Global Step: 239820   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-12 00:36:41,782-Speed 3300.33 samples/sec   Loss 0.6453   LearningRate 0.0079   Epoch: 14   Global Step: 239830   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-12 00:36:44,840-Speed 3349.41 samples/sec   Loss 0.5821   LearningRate 0.0079   Epoch: 14   Global Step: 239840   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:36:47,924-Speed 3320.88 samples/sec   Loss 0.6153   LearningRate 0.0079   Epoch: 14   Global Step: 239850   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:36:51,003-Speed 3326.78 samples/sec   Loss 0.6020   LearningRate 0.0079   Epoch: 14   Global Step: 239860   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:36:54,070-Speed 3339.13 samples/sec   Loss 0.6375   LearningRate 0.0079   Epoch: 14   Global Step: 239870   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:36:57,136-Speed 3339.98 samples/sec   Loss 0.5991   LearningRate 0.0079   Epoch: 14   Global Step: 239880   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:37:00,229-Speed 3311.59 samples/sec   Loss 0.6161   LearningRate 0.0079   Epoch: 14   Global Step: 239890   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:37:03,308-Speed 3327.01 samples/sec   Loss 0.5899   LearningRate 0.0079   Epoch: 14   Global Step: 239900   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:37:06,377-Speed 3336.64 samples/sec   Loss 0.6275   LearningRate 0.0079   Epoch: 14   Global Step: 239910   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:37:09,454-Speed 3329.45 samples/sec   Loss 0.5923   LearningRate 0.0079   Epoch: 14   Global Step: 239920   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:37:12,532-Speed 3327.64 samples/sec   Loss 0.6182   LearningRate 0.0079   Epoch: 14   Global Step: 239930   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:37:15,604-Speed 3333.61 samples/sec   Loss 0.6090   LearningRate 0.0079   Epoch: 14   Global Step: 239940   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-12 00:37:18,698-Speed 3311.16 samples/sec   Loss 0.6082   LearningRate 0.0079   Epoch: 14   Global Step: 239950   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-12 00:37:21,794-Speed 3307.79 samples/sec   Loss 0.6055   LearningRate 0.0079   Epoch: 14   Global Step: 239960   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-12 00:37:24,871-Speed 3328.02 samples/sec   Loss 0.5944   LearningRate 0.0079   Epoch: 14   Global Step: 239970   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-12 00:37:27,953-Speed 3323.49 samples/sec   Loss 0.5998   LearningRate 0.0079   Epoch: 14   Global Step: 239980   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-12 00:37:31,039-Speed 3318.74 samples/sec   Loss 0.6173   LearningRate 0.0079   Epoch: 14   Global Step: 239990   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-12 00:37:34,107-Speed 3339.11 samples/sec   Loss 0.6158   LearningRate 0.0079   Epoch: 14   Global Step: 240000   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-12 00:38:18,078-[lfw][240000]XNorm: 19.739302
Training: 2022-04-12 00:38:18,079-[lfw][240000]Accuracy-Flip: 0.99783+-0.00259
Training: 2022-04-12 00:38:18,079-[lfw][240000]Accuracy-Highest: 0.99817
Training: 2022-04-12 00:39:09,671-[cfp_fp][240000]XNorm: 20.434585
Training: 2022-04-12 00:39:09,672-[cfp_fp][240000]Accuracy-Flip: 0.99043+-0.00414
Training: 2022-04-12 00:39:09,673-[cfp_fp][240000]Accuracy-Highest: 0.99186
Training: 2022-04-12 00:39:53,750-[agedb_30][240000]XNorm: 21.355657
Training: 2022-04-12 00:39:53,751-[agedb_30][240000]Accuracy-Flip: 0.98500+-0.00592
Training: 2022-04-12 00:39:53,751-[agedb_30][240000]Accuracy-Highest: 0.98567
Training: 2022-04-12 00:39:56,851-Speed 71.74 samples/sec   Loss 0.6018   LearningRate 0.0079   Epoch: 14   Global Step: 240010   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-12 00:39:59,905-Speed 3353.49 samples/sec   Loss 0.6159   LearningRate 0.0079   Epoch: 14   Global Step: 240020   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-12 00:40:02,961-Speed 3351.98 samples/sec   Loss 0.6181   LearningRate 0.0079   Epoch: 14   Global Step: 240030   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-12 00:40:06,044-Speed 3322.32 samples/sec   Loss 0.6387   LearningRate 0.0079   Epoch: 14   Global Step: 240040   Fp16 Grad Scale: 262144   Required: 10 hours
Training: 2022-04-12 00:40:09,109-Speed 3341.87 samples/sec   Loss 0.5991   LearningRate 0.0079   Epoch: 14   Global Step: 240050   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-12 00:40:12,170-Speed 3345.99 samples/sec   Loss 0.6259   LearningRate 0.0079   Epoch: 14   Global Step: 240060   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-12 00:40:15,288-Speed 3285.02 samples/sec   Loss 0.6161   LearningRate 0.0079   Epoch: 14   Global Step: 240070   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-12 00:40:18,414-Speed 3276.65 samples/sec   Loss 0.6210   LearningRate 0.0079   Epoch: 14   Global Step: 240080   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-12 00:40:21,560-Speed 3255.64 samples/sec   Loss 0.5922   LearningRate 0.0079   Epoch: 14   Global Step: 240090   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-12 00:40:24,620-Speed 3346.43 samples/sec   Loss 0.5982   LearningRate 0.0079   Epoch: 14   Global Step: 240100   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-12 00:40:27,728-Speed 3296.13 samples/sec   Loss 0.6335   LearningRate 0.0079   Epoch: 14   Global Step: 240110   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-12 00:40:30,897-Speed 3232.25 samples/sec   Loss 0.5956   LearningRate 0.0079   Epoch: 14   Global Step: 240120   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-12 00:40:34,046-Speed 3252.17 samples/sec   Loss 0.6081   LearningRate 0.0079   Epoch: 14   Global Step: 240130   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-12 00:40:37,177-Speed 3271.52 samples/sec   Loss 0.6205   LearningRate 0.0079   Epoch: 14   Global Step: 240140   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:40:40,238-Speed 3346.02 samples/sec   Loss 0.5848   LearningRate 0.0079   Epoch: 14   Global Step: 240150   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:40:43,317-Speed 3326.37 samples/sec   Loss 0.6267   LearningRate 0.0079   Epoch: 14   Global Step: 240160   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:40:46,387-Speed 3336.66 samples/sec   Loss 0.5996   LearningRate 0.0079   Epoch: 14   Global Step: 240170   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:40:49,456-Speed 3336.74 samples/sec   Loss 0.5922   LearningRate 0.0079   Epoch: 14   Global Step: 240180   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:40:52,557-Speed 3303.44 samples/sec   Loss 0.5978   LearningRate 0.0079   Epoch: 14   Global Step: 240190   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:40:55,633-Speed 3328.78 samples/sec   Loss 0.6188   LearningRate 0.0079   Epoch: 14   Global Step: 240200   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:40:58,712-Speed 3326.95 samples/sec   Loss 0.6412   LearningRate 0.0079   Epoch: 14   Global Step: 240210   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:41:01,794-Speed 3323.35 samples/sec   Loss 0.6320   LearningRate 0.0079   Epoch: 14   Global Step: 240220   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:41:04,870-Speed 3330.02 samples/sec   Loss 0.6526   LearningRate 0.0079   Epoch: 14   Global Step: 240230   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:41:07,943-Speed 3333.66 samples/sec   Loss 0.6171   LearningRate 0.0079   Epoch: 14   Global Step: 240240   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-12 00:41:11,056-Speed 3289.50 samples/sec   Loss 0.5857   LearningRate 0.0079   Epoch: 14   Global Step: 240250   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:41:14,146-Speed 3315.07 samples/sec   Loss 0.6206   LearningRate 0.0079   Epoch: 14   Global Step: 240260   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:41:17,221-Speed 3330.57 samples/sec   Loss 0.6084   LearningRate 0.0079   Epoch: 14   Global Step: 240270   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:41:20,293-Speed 3333.74 samples/sec   Loss 0.5611   LearningRate 0.0079   Epoch: 14   Global Step: 240280   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:41:23,359-Speed 3340.16 samples/sec   Loss 0.5780   LearningRate 0.0079   Epoch: 14   Global Step: 240290   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:41:26,538-Speed 3222.85 samples/sec   Loss 0.6134   LearningRate 0.0078   Epoch: 14   Global Step: 240300   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:41:29,768-Speed 3170.42 samples/sec   Loss 0.6037   LearningRate 0.0078   Epoch: 14   Global Step: 240310   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:41:32,893-Speed 3277.95 samples/sec   Loss 0.6284   LearningRate 0.0078   Epoch: 14   Global Step: 240320   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:41:35,979-Speed 3319.21 samples/sec   Loss 0.6156   LearningRate 0.0078   Epoch: 14   Global Step: 240330   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:41:39,057-Speed 3327.34 samples/sec   Loss 0.6298   LearningRate 0.0078   Epoch: 14   Global Step: 240340   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:41:42,129-Speed 3333.58 samples/sec   Loss 0.5839   LearningRate 0.0078   Epoch: 14   Global Step: 240350   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-12 00:41:45,202-Speed 3333.06 samples/sec   Loss 0.5848   LearningRate 0.0078   Epoch: 14   Global Step: 240360   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-12 00:41:48,274-Speed 3334.17 samples/sec   Loss 0.6027   LearningRate 0.0078   Epoch: 14   Global Step: 240370   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-12 00:41:51,340-Speed 3340.86 samples/sec   Loss 0.6104   LearningRate 0.0078   Epoch: 14   Global Step: 240380   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-12 00:41:54,415-Speed 3331.38 samples/sec   Loss 0.6168   LearningRate 0.0078   Epoch: 14   Global Step: 240390   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-12 00:41:57,473-Speed 3349.04 samples/sec   Loss 0.6150   LearningRate 0.0078   Epoch: 14   Global Step: 240400   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-12 00:42:00,542-Speed 3337.68 samples/sec   Loss 0.5948   LearningRate 0.0078   Epoch: 14   Global Step: 240410   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-12 00:42:03,619-Speed 3328.01 samples/sec   Loss 0.6249   LearningRate 0.0078   Epoch: 14   Global Step: 240420   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-12 00:42:06,711-Speed 3312.57 samples/sec   Loss 0.5887   LearningRate 0.0078   Epoch: 14   Global Step: 240430   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-12 00:42:09,771-Speed 3346.94 samples/sec   Loss 0.6313   LearningRate 0.0078   Epoch: 14   Global Step: 240440   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-12 00:42:12,829-Speed 3349.51 samples/sec   Loss 0.6140   LearningRate 0.0078   Epoch: 14   Global Step: 240450   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-12 00:42:15,987-Speed 3243.63 samples/sec   Loss 0.5955   LearningRate 0.0078   Epoch: 14   Global Step: 240460   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-12 00:42:19,105-Speed 3285.16 samples/sec   Loss 0.6217   LearningRate 0.0078   Epoch: 14   Global Step: 240470   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-12 00:42:22,183-Speed 3328.13 samples/sec   Loss 0.6154   LearningRate 0.0078   Epoch: 14   Global Step: 240480   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-12 00:42:25,271-Speed 3316.59 samples/sec   Loss 0.6210   LearningRate 0.0078   Epoch: 14   Global Step: 240490   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-12 00:42:28,327-Speed 3350.83 samples/sec   Loss 0.6239   LearningRate 0.0078   Epoch: 14   Global Step: 240500   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-12 00:42:31,390-Speed 3344.66 samples/sec   Loss 0.6083   LearningRate 0.0078   Epoch: 14   Global Step: 240510   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-12 00:42:34,449-Speed 3347.87 samples/sec   Loss 0.6151   LearningRate 0.0078   Epoch: 14   Global Step: 240520   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-12 00:42:37,511-Speed 3344.83 samples/sec   Loss 0.6112   LearningRate 0.0078   Epoch: 14   Global Step: 240530   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-12 00:42:40,575-Speed 3342.27 samples/sec   Loss 0.6346   LearningRate 0.0078   Epoch: 14   Global Step: 240540   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-12 00:42:43,634-Speed 3348.83 samples/sec   Loss 0.6097   LearningRate 0.0078   Epoch: 14   Global Step: 240550   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:42:46,711-Speed 3329.15 samples/sec   Loss 0.6200   LearningRate 0.0078   Epoch: 14   Global Step: 240560   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:42:49,799-Speed 3316.36 samples/sec   Loss 0.6392   LearningRate 0.0078   Epoch: 14   Global Step: 240570   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:42:52,924-Speed 3278.06 samples/sec   Loss 0.5981   LearningRate 0.0078   Epoch: 14   Global Step: 240580   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:42:56,012-Speed 3316.50 samples/sec   Loss 0.6438   LearningRate 0.0078   Epoch: 14   Global Step: 240590   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:42:59,076-Speed 3342.67 samples/sec   Loss 0.6101   LearningRate 0.0078   Epoch: 14   Global Step: 240600   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:43:02,158-Speed 3323.68 samples/sec   Loss 0.6102   LearningRate 0.0078   Epoch: 14   Global Step: 240610   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:43:05,249-Speed 3312.69 samples/sec   Loss 0.6356   LearningRate 0.0078   Epoch: 14   Global Step: 240620   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:43:08,355-Speed 3298.10 samples/sec   Loss 0.6380   LearningRate 0.0078   Epoch: 14   Global Step: 240630   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:43:11,433-Speed 3326.86 samples/sec   Loss 0.6118   LearningRate 0.0078   Epoch: 14   Global Step: 240640   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:43:14,511-Speed 3327.86 samples/sec   Loss 0.5945   LearningRate 0.0078   Epoch: 14   Global Step: 240650   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-12 00:43:17,595-Speed 3321.18 samples/sec   Loss 0.5939   LearningRate 0.0078   Epoch: 14   Global Step: 240660   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-12 00:43:20,666-Speed 3335.72 samples/sec   Loss 0.6169   LearningRate 0.0078   Epoch: 14   Global Step: 240670   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-12 00:43:23,747-Speed 3323.76 samples/sec   Loss 0.6170   LearningRate 0.0078   Epoch: 14   Global Step: 240680   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-12 00:43:26,811-Speed 3342.79 samples/sec   Loss 0.6201   LearningRate 0.0078   Epoch: 14   Global Step: 240690   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-12 00:43:29,966-Speed 3246.53 samples/sec   Loss 0.5873   LearningRate 0.0078   Epoch: 14   Global Step: 240700   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-12 00:43:33,097-Speed 3270.67 samples/sec   Loss 0.5948   LearningRate 0.0078   Epoch: 14   Global Step: 240710   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-12 00:43:36,217-Speed 3283.41 samples/sec   Loss 0.6039   LearningRate 0.0078   Epoch: 14   Global Step: 240720   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-12 00:43:39,320-Speed 3300.09 samples/sec   Loss 0.5904   LearningRate 0.0078   Epoch: 14   Global Step: 240730   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-12 00:43:42,391-Speed 3336.40 samples/sec   Loss 0.6173   LearningRate 0.0078   Epoch: 14   Global Step: 240740   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-12 00:43:45,536-Speed 3256.25 samples/sec   Loss 0.6133   LearningRate 0.0078   Epoch: 14   Global Step: 240750   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-12 00:43:48,606-Speed 3336.40 samples/sec   Loss 0.6373   LearningRate 0.0078   Epoch: 14   Global Step: 240760   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-12 00:43:51,689-Speed 3322.40 samples/sec   Loss 0.5996   LearningRate 0.0078   Epoch: 14   Global Step: 240770   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-12 00:43:54,752-Speed 3342.98 samples/sec   Loss 0.6101   LearningRate 0.0078   Epoch: 14   Global Step: 240780   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-12 00:43:57,817-Speed 3342.17 samples/sec   Loss 0.5918   LearningRate 0.0078   Epoch: 14   Global Step: 240790   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-12 00:44:00,893-Speed 3330.07 samples/sec   Loss 0.6067   LearningRate 0.0078   Epoch: 14   Global Step: 240800   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-12 00:44:03,949-Speed 3351.35 samples/sec   Loss 0.6369   LearningRate 0.0078   Epoch: 14   Global Step: 240810   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:44:07,013-Speed 3343.35 samples/sec   Loss 0.5807   LearningRate 0.0078   Epoch: 14   Global Step: 240820   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:44:10,076-Speed 3343.49 samples/sec   Loss 0.6118   LearningRate 0.0078   Epoch: 14   Global Step: 240830   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:44:13,151-Speed 3330.58 samples/sec   Loss 0.6198   LearningRate 0.0078   Epoch: 14   Global Step: 240840   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:44:16,233-Speed 3323.98 samples/sec   Loss 0.6577   LearningRate 0.0078   Epoch: 14   Global Step: 240850   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:44:19,304-Speed 3334.24 samples/sec   Loss 0.6100   LearningRate 0.0078   Epoch: 14   Global Step: 240860   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:44:22,366-Speed 3345.68 samples/sec   Loss 0.6051   LearningRate 0.0078   Epoch: 14   Global Step: 240870   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:44:25,428-Speed 3344.50 samples/sec   Loss 0.6029   LearningRate 0.0078   Epoch: 14   Global Step: 240880   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:44:28,499-Speed 3335.03 samples/sec   Loss 0.6317   LearningRate 0.0077   Epoch: 14   Global Step: 240890   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:44:31,566-Speed 3339.29 samples/sec   Loss 0.6176   LearningRate 0.0077   Epoch: 14   Global Step: 240900   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:44:34,643-Speed 3328.83 samples/sec   Loss 0.5904   LearningRate 0.0077   Epoch: 14   Global Step: 240910   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-12 00:44:37,713-Speed 3336.34 samples/sec   Loss 0.6246   LearningRate 0.0077   Epoch: 14   Global Step: 240920   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-12 00:44:40,769-Speed 3351.39 samples/sec   Loss 0.6003   LearningRate 0.0077   Epoch: 14   Global Step: 240930   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:44:43,887-Speed 3285.66 samples/sec   Loss 0.6243   LearningRate 0.0077   Epoch: 14   Global Step: 240940   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:44:46,956-Speed 3337.25 samples/sec   Loss 0.6176   LearningRate 0.0077   Epoch: 14   Global Step: 240950   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:44:50,040-Speed 3321.20 samples/sec   Loss 0.6125   LearningRate 0.0077   Epoch: 14   Global Step: 240960   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:44:53,111-Speed 3334.32 samples/sec   Loss 0.6179   LearningRate 0.0077   Epoch: 14   Global Step: 240970   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:44:56,247-Speed 3266.06 samples/sec   Loss 0.6416   LearningRate 0.0077   Epoch: 14   Global Step: 240980   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:44:59,328-Speed 3324.43 samples/sec   Loss 0.6241   LearningRate 0.0077   Epoch: 14   Global Step: 240990   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:45:02,422-Speed 3310.89 samples/sec   Loss 0.5970   LearningRate 0.0077   Epoch: 14   Global Step: 241000   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:45:05,485-Speed 3343.91 samples/sec   Loss 0.6232   LearningRate 0.0077   Epoch: 14   Global Step: 241010   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:45:08,575-Speed 3314.03 samples/sec   Loss 0.6347   LearningRate 0.0077   Epoch: 14   Global Step: 241020   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:45:11,747-Speed 3230.00 samples/sec   Loss 0.6001   LearningRate 0.0077   Epoch: 14   Global Step: 241030   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-12 00:45:14,968-Speed 3179.10 samples/sec   Loss 0.6233   LearningRate 0.0077   Epoch: 14   Global Step: 241040   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:45:18,173-Speed 3196.06 samples/sec   Loss 0.6344   LearningRate 0.0077   Epoch: 14   Global Step: 241050   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:45:21,329-Speed 3244.54 samples/sec   Loss 0.5738   LearningRate 0.0077   Epoch: 14   Global Step: 241060   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:45:24,529-Speed 3200.90 samples/sec   Loss 0.6290   LearningRate 0.0077   Epoch: 14   Global Step: 241070   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:45:27,611-Speed 3324.02 samples/sec   Loss 0.6660   LearningRate 0.0077   Epoch: 14   Global Step: 241080   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:45:30,674-Speed 3343.56 samples/sec   Loss 0.5964   LearningRate 0.0077   Epoch: 14   Global Step: 241090   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:45:33,740-Speed 3340.30 samples/sec   Loss 0.6129   LearningRate 0.0077   Epoch: 14   Global Step: 241100   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:45:36,822-Speed 3323.85 samples/sec   Loss 0.6188   LearningRate 0.0077   Epoch: 14   Global Step: 241110   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:45:39,889-Speed 3339.26 samples/sec   Loss 0.6325   LearningRate 0.0077   Epoch: 14   Global Step: 241120   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:45:42,964-Speed 3330.74 samples/sec   Loss 0.6020   LearningRate 0.0077   Epoch: 14   Global Step: 241130   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:45:46,032-Speed 3338.26 samples/sec   Loss 0.5885   LearningRate 0.0077   Epoch: 14   Global Step: 241140   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-12 00:45:49,099-Speed 3339.80 samples/sec   Loss 0.6149   LearningRate 0.0077   Epoch: 14   Global Step: 241150   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-12 00:45:52,181-Speed 3322.83 samples/sec   Loss 0.5888   LearningRate 0.0077   Epoch: 14   Global Step: 241160   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-12 00:45:55,306-Speed 3278.05 samples/sec   Loss 0.6499   LearningRate 0.0077   Epoch: 14   Global Step: 241170   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-12 00:45:58,374-Speed 3337.91 samples/sec   Loss 0.6354   LearningRate 0.0077   Epoch: 14   Global Step: 241180   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-12 00:46:01,442-Speed 3338.81 samples/sec   Loss 0.6151   LearningRate 0.0077   Epoch: 14   Global Step: 241190   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-12 00:46:04,521-Speed 3326.11 samples/sec   Loss 0.6272   LearningRate 0.0077   Epoch: 14   Global Step: 241200   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-12 00:46:07,582-Speed 3345.98 samples/sec   Loss 0.6287   LearningRate 0.0077   Epoch: 14   Global Step: 241210   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-12 00:46:10,657-Speed 3330.55 samples/sec   Loss 0.6585   LearningRate 0.0077   Epoch: 14   Global Step: 241220   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-12 00:46:13,740-Speed 3322.89 samples/sec   Loss 0.6064   LearningRate 0.0077   Epoch: 14   Global Step: 241230   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-12 00:46:16,790-Speed 3358.00 samples/sec   Loss 0.6598   LearningRate 0.0077   Epoch: 14   Global Step: 241240   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-12 00:46:19,869-Speed 3326.81 samples/sec   Loss 0.6054   LearningRate 0.0077   Epoch: 14   Global Step: 241250   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-12 00:46:22,937-Speed 3338.33 samples/sec   Loss 0.5911   LearningRate 0.0077   Epoch: 14   Global Step: 241260   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-12 00:46:26,040-Speed 3300.78 samples/sec   Loss 0.5962   LearningRate 0.0077   Epoch: 14   Global Step: 241270   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-12 00:46:29,132-Speed 3312.37 samples/sec   Loss 0.6221   LearningRate 0.0077   Epoch: 14   Global Step: 241280   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-12 00:46:32,194-Speed 3344.93 samples/sec   Loss 0.6216   LearningRate 0.0077   Epoch: 14   Global Step: 241290   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-12 00:46:35,245-Speed 3357.38 samples/sec   Loss 0.6307   LearningRate 0.0077   Epoch: 14   Global Step: 241300   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:46:38,332-Speed 3317.94 samples/sec   Loss 0.6060   LearningRate 0.0077   Epoch: 14   Global Step: 241310   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:46:41,409-Speed 3328.08 samples/sec   Loss 0.6488   LearningRate 0.0077   Epoch: 14   Global Step: 241320   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:46:44,477-Speed 3338.10 samples/sec   Loss 0.6457   LearningRate 0.0077   Epoch: 14   Global Step: 241330   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:46:47,545-Speed 3338.66 samples/sec   Loss 0.6063   LearningRate 0.0077   Epoch: 14   Global Step: 241340   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:46:50,632-Speed 3318.31 samples/sec   Loss 0.6298   LearningRate 0.0077   Epoch: 14   Global Step: 241350   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:46:53,696-Speed 3342.97 samples/sec   Loss 0.6173   LearningRate 0.0077   Epoch: 14   Global Step: 241360   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:46:56,766-Speed 3336.37 samples/sec   Loss 0.5967   LearningRate 0.0077   Epoch: 14   Global Step: 241370   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:46:59,903-Speed 3264.29 samples/sec   Loss 0.6029   LearningRate 0.0077   Epoch: 14   Global Step: 241380   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:47:03,007-Speed 3299.39 samples/sec   Loss 0.6402   LearningRate 0.0077   Epoch: 14   Global Step: 241390   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:47:06,089-Speed 3323.75 samples/sec   Loss 0.6830   LearningRate 0.0077   Epoch: 14   Global Step: 241400   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-12 00:47:09,178-Speed 3315.86 samples/sec   Loss 0.6620   LearningRate 0.0077   Epoch: 14   Global Step: 241410   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-12 00:47:12,245-Speed 3339.39 samples/sec   Loss 0.6624   LearningRate 0.0077   Epoch: 14   Global Step: 241420   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-12 00:47:15,425-Speed 3220.49 samples/sec   Loss 0.6435   LearningRate 0.0077   Epoch: 14   Global Step: 241430   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-12 00:47:18,517-Speed 3312.95 samples/sec   Loss 0.6582   LearningRate 0.0077   Epoch: 14   Global Step: 241440   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-12 00:47:21,580-Speed 3343.99 samples/sec   Loss 0.6327   LearningRate 0.0077   Epoch: 14   Global Step: 241450   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-12 00:47:24,688-Speed 3295.68 samples/sec   Loss 0.6247   LearningRate 0.0077   Epoch: 14   Global Step: 241460   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-12 00:47:27,745-Speed 3350.50 samples/sec   Loss 0.6478   LearningRate 0.0077   Epoch: 14   Global Step: 241470   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:47:30,811-Speed 3339.87 samples/sec   Loss 0.5993   LearningRate 0.0077   Epoch: 14   Global Step: 241480   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:47:33,884-Speed 3333.62 samples/sec   Loss 0.6278   LearningRate 0.0076   Epoch: 14   Global Step: 241490   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:47:36,946-Speed 3345.24 samples/sec   Loss 0.6280   LearningRate 0.0076   Epoch: 14   Global Step: 241500   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:47:40,012-Speed 3340.96 samples/sec   Loss 0.6045   LearningRate 0.0076   Epoch: 14   Global Step: 241510   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:47:43,084-Speed 3335.51 samples/sec   Loss 0.6065   LearningRate 0.0076   Epoch: 14   Global Step: 241520   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:47:46,163-Speed 3326.46 samples/sec   Loss 0.6414   LearningRate 0.0076   Epoch: 14   Global Step: 241530   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:47:49,294-Speed 3271.49 samples/sec   Loss 0.6232   LearningRate 0.0076   Epoch: 14   Global Step: 241540   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:47:52,366-Speed 3333.87 samples/sec   Loss 0.6593   LearningRate 0.0076   Epoch: 14   Global Step: 241550   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:47:55,431-Speed 3342.03 samples/sec   Loss 0.6260   LearningRate 0.0076   Epoch: 14   Global Step: 241560   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:47:58,496-Speed 3341.84 samples/sec   Loss 0.5939   LearningRate 0.0076   Epoch: 14   Global Step: 241570   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-12 00:48:01,590-Speed 3310.06 samples/sec   Loss 0.6066   LearningRate 0.0076   Epoch: 14   Global Step: 241580   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-12 00:48:04,703-Speed 3289.75 samples/sec   Loss 0.6389   LearningRate 0.0076   Epoch: 14   Global Step: 241590   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:48:07,789-Speed 3319.80 samples/sec   Loss 0.6397   LearningRate 0.0076   Epoch: 14   Global Step: 241600   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-12 00:48:10,878-Speed 3314.98 samples/sec   Loss 0.6422   LearningRate 0.0076   Epoch: 14   Global Step: 241610   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-12 00:48:13,974-Speed 3309.13 samples/sec   Loss 0.6359   LearningRate 0.0076   Epoch: 14   Global Step: 241620   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-12 00:48:17,055-Speed 3324.28 samples/sec   Loss 0.6612   LearningRate 0.0076   Epoch: 14   Global Step: 241630   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-12 00:48:20,151-Speed 3308.28 samples/sec   Loss 0.6583   LearningRate 0.0076   Epoch: 14   Global Step: 241640   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-12 00:48:23,270-Speed 3283.72 samples/sec   Loss 0.6278   LearningRate 0.0076   Epoch: 14   Global Step: 241650   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-12 00:48:26,349-Speed 3326.17 samples/sec   Loss 0.6473   LearningRate 0.0076   Epoch: 14   Global Step: 241660   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-12 00:48:29,414-Speed 3341.90 samples/sec   Loss 0.5926   LearningRate 0.0076   Epoch: 14   Global Step: 241670   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-12 00:48:32,479-Speed 3341.00 samples/sec   Loss 0.6346   LearningRate 0.0076   Epoch: 14   Global Step: 241680   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-12 00:48:35,617-Speed 3264.07 samples/sec   Loss 0.6456   LearningRate 0.0076   Epoch: 14   Global Step: 241690   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-12 00:48:38,703-Speed 3319.90 samples/sec   Loss 0.5947   LearningRate 0.0076   Epoch: 14   Global Step: 241700   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:48:41,769-Speed 3340.05 samples/sec   Loss 0.6298   LearningRate 0.0076   Epoch: 14   Global Step: 241710   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:48:44,850-Speed 3324.63 samples/sec   Loss 0.6545   LearningRate 0.0076   Epoch: 14   Global Step: 241720   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:48:47,973-Speed 3279.29 samples/sec   Loss 0.6250   LearningRate 0.0076   Epoch: 14   Global Step: 241730   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:48:51,132-Speed 3242.54 samples/sec   Loss 0.6307   LearningRate 0.0076   Epoch: 14   Global Step: 241740   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:48:54,352-Speed 3180.41 samples/sec   Loss 0.6437   LearningRate 0.0076   Epoch: 14   Global Step: 241750   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:48:57,443-Speed 3314.03 samples/sec   Loss 0.6490   LearningRate 0.0076   Epoch: 14   Global Step: 241760   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:49:00,508-Speed 3340.84 samples/sec   Loss 0.6304   LearningRate 0.0076   Epoch: 14   Global Step: 241770   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:49:03,593-Speed 3320.81 samples/sec   Loss 0.6362   LearningRate 0.0076   Epoch: 14   Global Step: 241780   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:49:06,663-Speed 3335.94 samples/sec   Loss 0.6151   LearningRate 0.0076   Epoch: 14   Global Step: 241790   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:49:09,740-Speed 3328.47 samples/sec   Loss 0.6285   LearningRate 0.0076   Epoch: 14   Global Step: 241800   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-12 00:49:12,827-Speed 3318.15 samples/sec   Loss 0.6532   LearningRate 0.0076   Epoch: 14   Global Step: 241810   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-12 00:49:15,958-Speed 3271.37 samples/sec   Loss 0.6280   LearningRate 0.0076   Epoch: 14   Global Step: 241820   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-12 00:49:19,063-Speed 3298.37 samples/sec   Loss 0.6325   LearningRate 0.0076   Epoch: 14   Global Step: 241830   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-12 00:49:22,189-Speed 3276.83 samples/sec   Loss 0.6626   LearningRate 0.0076   Epoch: 14   Global Step: 241840   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-12 00:49:25,337-Speed 3253.18 samples/sec   Loss 0.6235   LearningRate 0.0076   Epoch: 14   Global Step: 241850   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-12 00:49:28,403-Speed 3341.51 samples/sec   Loss 0.6281   LearningRate 0.0076   Epoch: 14   Global Step: 241860   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-12 00:49:31,467-Speed 3342.80 samples/sec   Loss 0.6550   LearningRate 0.0076   Epoch: 14   Global Step: 241870   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-12 00:49:34,606-Speed 3262.79 samples/sec   Loss 0.6297   LearningRate 0.0076   Epoch: 14   Global Step: 241880   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-12 00:49:37,844-Speed 3162.98 samples/sec   Loss 0.6233   LearningRate 0.0076   Epoch: 14   Global Step: 241890   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-12 00:49:40,965-Speed 3281.07 samples/sec   Loss 0.6478   LearningRate 0.0076   Epoch: 14   Global Step: 241900   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-12 00:49:44,084-Speed 3284.04 samples/sec   Loss 0.6271   LearningRate 0.0076   Epoch: 14   Global Step: 241910   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-12 00:49:47,151-Speed 3339.60 samples/sec   Loss 0.6538   LearningRate 0.0076   Epoch: 14   Global Step: 241920   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-12 00:49:50,210-Speed 3347.75 samples/sec   Loss 0.6145   LearningRate 0.0076   Epoch: 14   Global Step: 241930   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-12 00:49:53,269-Speed 3349.47 samples/sec   Loss 0.6491   LearningRate 0.0076   Epoch: 14   Global Step: 241940   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:49:56,336-Speed 3339.38 samples/sec   Loss 0.6024   LearningRate 0.0076   Epoch: 14   Global Step: 241950   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:49:59,426-Speed 3314.15 samples/sec   Loss 0.6096   LearningRate 0.0076   Epoch: 14   Global Step: 241960   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:50:02,513-Speed 3318.22 samples/sec   Loss 0.6164   LearningRate 0.0076   Epoch: 14   Global Step: 241970   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:50:05,601-Speed 3317.01 samples/sec   Loss 0.6531   LearningRate 0.0076   Epoch: 14   Global Step: 241980   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:50:08,699-Speed 3305.34 samples/sec   Loss 0.6330   LearningRate 0.0076   Epoch: 14   Global Step: 241990   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:50:11,865-Speed 3234.82 samples/sec   Loss 0.6206   LearningRate 0.0076   Epoch: 14   Global Step: 242000   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:50:55,778-[lfw][242000]XNorm: 22.031279
Training: 2022-04-12 00:50:55,779-[lfw][242000]Accuracy-Flip: 0.99800+-0.00221
Training: 2022-04-12 00:50:55,780-[lfw][242000]Accuracy-Highest: 0.99817
Training: 2022-04-12 00:51:46,539-[cfp_fp][242000]XNorm: 22.196191
Training: 2022-04-12 00:51:46,539-[cfp_fp][242000]Accuracy-Flip: 0.99000+-0.00557
Training: 2022-04-12 00:51:46,540-[cfp_fp][242000]Accuracy-Highest: 0.99186
Training: 2022-04-12 00:52:30,190-[agedb_30][242000]XNorm: 22.915663
Training: 2022-04-12 00:52:30,191-[agedb_30][242000]Accuracy-Flip: 0.98433+-0.00559
Training: 2022-04-12 00:52:30,191-[agedb_30][242000]Accuracy-Highest: 0.98567
Training: 2022-04-12 00:52:33,259-Speed 72.42 samples/sec   Loss 0.6182   LearningRate 0.0076   Epoch: 14   Global Step: 242010   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:52:36,337-Speed 3327.91 samples/sec   Loss 0.6325   LearningRate 0.0076   Epoch: 14   Global Step: 242020   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:52:39,381-Speed 3365.07 samples/sec   Loss 0.6316   LearningRate 0.0076   Epoch: 14   Global Step: 242030   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:52:42,475-Speed 3310.24 samples/sec   Loss 0.6322   LearningRate 0.0076   Epoch: 14   Global Step: 242040   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-12 00:52:45,546-Speed 3334.21 samples/sec   Loss 0.6319   LearningRate 0.0076   Epoch: 14   Global Step: 242050   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-12 00:52:48,597-Speed 3357.56 samples/sec   Loss 0.6263   LearningRate 0.0076   Epoch: 14   Global Step: 242060   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-12 00:52:51,650-Speed 3354.59 samples/sec   Loss 0.6387   LearningRate 0.0076   Epoch: 14   Global Step: 242070   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-12 00:52:54,719-Speed 3337.79 samples/sec   Loss 0.5933   LearningRate 0.0076   Epoch: 14   Global Step: 242080   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-12 00:52:57,779-Speed 3347.10 samples/sec   Loss 0.6279   LearningRate 0.0076   Epoch: 14   Global Step: 242090   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-12 00:53:00,877-Speed 3305.40 samples/sec   Loss 0.6045   LearningRate 0.0075   Epoch: 14   Global Step: 242100   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-12 00:53:03,936-Speed 3348.75 samples/sec   Loss 0.6150   LearningRate 0.0075   Epoch: 14   Global Step: 242110   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-12 00:53:07,058-Speed 3280.34 samples/sec   Loss 0.6177   LearningRate 0.0075   Epoch: 14   Global Step: 242120   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-12 00:53:10,210-Speed 3249.42 samples/sec   Loss 0.6487   LearningRate 0.0075   Epoch: 14   Global Step: 242130   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-12 00:53:13,273-Speed 3344.19 samples/sec   Loss 0.6387   LearningRate 0.0075   Epoch: 14   Global Step: 242140   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:53:16,363-Speed 3314.62 samples/sec   Loss 0.6411   LearningRate 0.0075   Epoch: 14   Global Step: 242150   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:53:19,499-Speed 3266.06 samples/sec   Loss 0.6505   LearningRate 0.0075   Epoch: 14   Global Step: 242160   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:53:22,575-Speed 3329.99 samples/sec   Loss 0.6478   LearningRate 0.0075   Epoch: 14   Global Step: 242170   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:53:25,660-Speed 3320.15 samples/sec   Loss 0.6463   LearningRate 0.0075   Epoch: 14   Global Step: 242180   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:53:28,738-Speed 3327.46 samples/sec   Loss 0.6289   LearningRate 0.0075   Epoch: 14   Global Step: 242190   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:53:31,835-Speed 3306.68 samples/sec   Loss 0.6451   LearningRate 0.0075   Epoch: 14   Global Step: 242200   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:53:34,927-Speed 3312.16 samples/sec   Loss 0.6282   LearningRate 0.0075   Epoch: 14   Global Step: 242210   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:53:38,032-Speed 3299.25 samples/sec   Loss 0.5930   LearningRate 0.0075   Epoch: 14   Global Step: 242220   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:53:41,108-Speed 3329.90 samples/sec   Loss 0.6205   LearningRate 0.0075   Epoch: 14   Global Step: 242230   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:53:44,212-Speed 3299.99 samples/sec   Loss 0.6603   LearningRate 0.0075   Epoch: 14   Global Step: 242240   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-12 00:53:47,283-Speed 3334.82 samples/sec   Loss 0.6488   LearningRate 0.0075   Epoch: 14   Global Step: 242250   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-12 00:53:50,351-Speed 3338.25 samples/sec   Loss 0.6479   LearningRate 0.0075   Epoch: 14   Global Step: 242260   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-12 00:53:53,410-Speed 3348.79 samples/sec   Loss 0.6455   LearningRate 0.0075   Epoch: 14   Global Step: 242270   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:53:56,465-Speed 3351.89 samples/sec   Loss 0.6184   LearningRate 0.0075   Epoch: 14   Global Step: 242280   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:53:59,557-Speed 3312.46 samples/sec   Loss 0.6137   LearningRate 0.0075   Epoch: 14   Global Step: 242290   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:54:02,639-Speed 3323.88 samples/sec   Loss 0.6538   LearningRate 0.0075   Epoch: 14   Global Step: 242300   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:54:05,752-Speed 3290.04 samples/sec   Loss 0.6387   LearningRate 0.0075   Epoch: 14   Global Step: 242310   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:54:08,828-Speed 3329.33 samples/sec   Loss 0.6361   LearningRate 0.0075   Epoch: 14   Global Step: 242320   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:54:11,894-Speed 3340.81 samples/sec   Loss 0.6522   LearningRate 0.0075   Epoch: 14   Global Step: 242330   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:54:14,973-Speed 3327.34 samples/sec   Loss 0.6363   LearningRate 0.0075   Epoch: 14   Global Step: 242340   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:54:18,062-Speed 3314.69 samples/sec   Loss 0.6417   LearningRate 0.0075   Epoch: 14   Global Step: 242350   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:54:21,129-Speed 3340.07 samples/sec   Loss 0.6346   LearningRate 0.0075   Epoch: 14   Global Step: 242360   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:54:24,189-Speed 3346.79 samples/sec   Loss 0.6402   LearningRate 0.0075   Epoch: 14   Global Step: 242370   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-12 00:54:27,248-Speed 3348.60 samples/sec   Loss 0.6407   LearningRate 0.0075   Epoch: 14   Global Step: 242380   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:54:30,320-Speed 3333.28 samples/sec   Loss 0.6415   LearningRate 0.0075   Epoch: 14   Global Step: 242390   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:54:33,383-Speed 3344.17 samples/sec   Loss 0.6328   LearningRate 0.0075   Epoch: 14   Global Step: 242400   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:54:36,456-Speed 3333.50 samples/sec   Loss 0.6343   LearningRate 0.0075   Epoch: 14   Global Step: 242410   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:54:39,625-Speed 3232.14 samples/sec   Loss 0.6132   LearningRate 0.0075   Epoch: 14   Global Step: 242420   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:54:42,688-Speed 3343.78 samples/sec   Loss 0.6217   LearningRate 0.0075   Epoch: 14   Global Step: 242430   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:54:45,771-Speed 3322.61 samples/sec   Loss 0.6230   LearningRate 0.0075   Epoch: 14   Global Step: 242440   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:54:48,842-Speed 3334.49 samples/sec   Loss 0.6381   LearningRate 0.0075   Epoch: 14   Global Step: 242450   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:54:51,924-Speed 3323.70 samples/sec   Loss 0.6257   LearningRate 0.0075   Epoch: 14   Global Step: 242460   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:54:55,005-Speed 3324.20 samples/sec   Loss 0.6537   LearningRate 0.0075   Epoch: 14   Global Step: 242470   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:54:58,082-Speed 3328.07 samples/sec   Loss 0.6165   LearningRate 0.0075   Epoch: 14   Global Step: 242480   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-12 00:55:01,248-Speed 3235.62 samples/sec   Loss 0.6482   LearningRate 0.0075   Epoch: 14   Global Step: 242490   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-12 00:55:04,315-Speed 3339.16 samples/sec   Loss 0.6410   LearningRate 0.0075   Epoch: 14   Global Step: 242500   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-12 00:55:07,389-Speed 3332.54 samples/sec   Loss 0.6201   LearningRate 0.0075   Epoch: 14   Global Step: 242510   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:55:10,531-Speed 3258.96 samples/sec   Loss 0.6636   LearningRate 0.0075   Epoch: 14   Global Step: 242520   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:55:13,684-Speed 3249.42 samples/sec   Loss 0.6065   LearningRate 0.0075   Epoch: 14   Global Step: 242530   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:55:16,752-Speed 3338.16 samples/sec   Loss 0.6358   LearningRate 0.0075   Epoch: 14   Global Step: 242540   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:55:19,824-Speed 3333.80 samples/sec   Loss 0.6440   LearningRate 0.0075   Epoch: 14   Global Step: 242550   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:55:22,928-Speed 3299.60 samples/sec   Loss 0.6086   LearningRate 0.0075   Epoch: 14   Global Step: 242560   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:55:26,100-Speed 3228.89 samples/sec   Loss 0.6604   LearningRate 0.0075   Epoch: 14   Global Step: 242570   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:55:29,167-Speed 3340.12 samples/sec   Loss 0.6413   LearningRate 0.0075   Epoch: 14   Global Step: 242580   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:55:32,233-Speed 3340.70 samples/sec   Loss 0.6260   LearningRate 0.0075   Epoch: 14   Global Step: 242590   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:55:35,295-Speed 3344.19 samples/sec   Loss 0.6423   LearningRate 0.0075   Epoch: 14   Global Step: 242600   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:55:38,375-Speed 3325.60 samples/sec   Loss 0.6591   LearningRate 0.0075   Epoch: 14   Global Step: 242610   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-12 00:55:41,435-Speed 3347.37 samples/sec   Loss 0.6288   LearningRate 0.0075   Epoch: 14   Global Step: 242620   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:55:44,497-Speed 3345.40 samples/sec   Loss 0.6445   LearningRate 0.0075   Epoch: 14   Global Step: 242630   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:55:47,611-Speed 3288.24 samples/sec   Loss 0.6359   LearningRate 0.0075   Epoch: 14   Global Step: 242640   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:55:50,675-Speed 3343.43 samples/sec   Loss 0.6055   LearningRate 0.0075   Epoch: 14   Global Step: 242650   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:55:53,752-Speed 3328.17 samples/sec   Loss 0.6207   LearningRate 0.0075   Epoch: 14   Global Step: 242660   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:55:56,818-Speed 3341.37 samples/sec   Loss 0.6345   LearningRate 0.0075   Epoch: 14   Global Step: 242670   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:55:59,949-Speed 3271.32 samples/sec   Loss 0.6374   LearningRate 0.0075   Epoch: 14   Global Step: 242680   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:56:03,010-Speed 3345.13 samples/sec   Loss 0.6268   LearningRate 0.0075   Epoch: 14   Global Step: 242690   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:56:06,090-Speed 3325.58 samples/sec   Loss 0.6355   LearningRate 0.0075   Epoch: 14   Global Step: 242700   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:56:09,158-Speed 3338.53 samples/sec   Loss 0.6125   LearningRate 0.0074   Epoch: 14   Global Step: 242710   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:56:12,219-Speed 3346.27 samples/sec   Loss 0.6369   LearningRate 0.0074   Epoch: 14   Global Step: 242720   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-12 00:56:15,285-Speed 3339.90 samples/sec   Loss 0.6406   LearningRate 0.0074   Epoch: 14   Global Step: 242730   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-12 00:56:18,397-Speed 3291.06 samples/sec   Loss 0.6165   LearningRate 0.0074   Epoch: 14   Global Step: 242740   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-12 00:56:21,500-Speed 3301.07 samples/sec   Loss 0.6158   LearningRate 0.0074   Epoch: 14   Global Step: 242750   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-12 00:56:24,606-Speed 3298.07 samples/sec   Loss 0.5881   LearningRate 0.0074   Epoch: 14   Global Step: 242760   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-12 00:56:27,677-Speed 3335.09 samples/sec   Loss 0.6165   LearningRate 0.0074   Epoch: 14   Global Step: 242770   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-12 00:56:30,741-Speed 3342.74 samples/sec   Loss 0.6165   LearningRate 0.0074   Epoch: 14   Global Step: 242780   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-12 00:56:33,818-Speed 3328.87 samples/sec   Loss 0.6450   LearningRate 0.0074   Epoch: 14   Global Step: 242790   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-12 00:56:36,888-Speed 3335.85 samples/sec   Loss 0.6524   LearningRate 0.0074   Epoch: 14   Global Step: 242800   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-12 00:56:40,038-Speed 3251.77 samples/sec   Loss 0.6112   LearningRate 0.0074   Epoch: 14   Global Step: 242810   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-12 00:56:43,178-Speed 3262.04 samples/sec   Loss 0.6334   LearningRate 0.0074   Epoch: 14   Global Step: 242820   Fp16 Grad Scale: 262144   Required: 10 hours
Training: 2022-04-12 00:56:46,233-Speed 3352.45 samples/sec   Loss 0.6453   LearningRate 0.0074   Epoch: 14   Global Step: 242830   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-12 00:56:49,327-Speed 3310.74 samples/sec   Loss 0.6098   LearningRate 0.0074   Epoch: 14   Global Step: 242840   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-12 00:56:52,393-Speed 3340.06 samples/sec   Loss 0.6147   LearningRate 0.0074   Epoch: 14   Global Step: 242850   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-12 00:56:55,458-Speed 3341.70 samples/sec   Loss 0.6061   LearningRate 0.0074   Epoch: 14   Global Step: 242860   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-12 00:56:58,532-Speed 3332.62 samples/sec   Loss 0.6600   LearningRate 0.0074   Epoch: 14   Global Step: 242870   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-12 00:57:01,633-Speed 3302.80 samples/sec   Loss 0.5996   LearningRate 0.0074   Epoch: 14   Global Step: 242880   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:57:04,707-Speed 3331.31 samples/sec   Loss 0.6454   LearningRate 0.0074   Epoch: 14   Global Step: 242890   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:57:07,773-Speed 3340.89 samples/sec   Loss 0.6175   LearningRate 0.0074   Epoch: 14   Global Step: 242900   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:57:10,824-Speed 3356.48 samples/sec   Loss 0.6512   LearningRate 0.0074   Epoch: 14   Global Step: 242910   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-12 00:57:13,901-Speed 3329.41 samples/sec   Loss 0.6504   LearningRate 0.0074   Epoch: 14   Global Step: 242920   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-12 00:57:17,065-Speed 3236.33 samples/sec   Loss 0.6101   LearningRate 0.0074   Epoch: 14   Global Step: 242930   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-12 00:57:20,211-Speed 3256.05 samples/sec   Loss 0.6591   LearningRate 0.0074   Epoch: 14   Global Step: 242940   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-12 00:57:23,296-Speed 3320.30 samples/sec   Loss 0.6435   LearningRate 0.0074   Epoch: 14   Global Step: 242950   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-12 00:57:26,384-Speed 3316.89 samples/sec   Loss 0.6217   LearningRate 0.0074   Epoch: 14   Global Step: 242960   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-12 00:57:29,569-Speed 3215.62 samples/sec   Loss 0.6156   LearningRate 0.0074   Epoch: 14   Global Step: 242970   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-12 00:57:32,697-Speed 3274.84 samples/sec   Loss 0.6290   LearningRate 0.0074   Epoch: 14   Global Step: 242980   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-12 00:57:35,804-Speed 3295.87 samples/sec   Loss 0.6251   LearningRate 0.0074   Epoch: 14   Global Step: 242990   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-12 00:57:38,883-Speed 3326.59 samples/sec   Loss 0.5930   LearningRate 0.0074   Epoch: 14   Global Step: 243000   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-12 00:57:41,980-Speed 3307.85 samples/sec   Loss 0.6283   LearningRate 0.0074   Epoch: 14   Global Step: 243010   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:57:45,045-Speed 3340.93 samples/sec   Loss 0.6230   LearningRate 0.0074   Epoch: 14   Global Step: 243020   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:57:48,150-Speed 3299.08 samples/sec   Loss 0.6218   LearningRate 0.0074   Epoch: 14   Global Step: 243030   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:57:51,232-Speed 3324.05 samples/sec   Loss 0.6341   LearningRate 0.0074   Epoch: 14   Global Step: 243040   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:57:54,296-Speed 3342.78 samples/sec   Loss 0.6157   LearningRate 0.0074   Epoch: 14   Global Step: 243050   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:57:57,378-Speed 3322.88 samples/sec   Loss 0.6416   LearningRate 0.0074   Epoch: 14   Global Step: 243060   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:58:00,544-Speed 3235.38 samples/sec   Loss 0.6233   LearningRate 0.0074   Epoch: 14   Global Step: 243070   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:58:03,652-Speed 3294.41 samples/sec   Loss 0.6390   LearningRate 0.0074   Epoch: 14   Global Step: 243080   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-12 00:58:06,750-Speed 3307.42 samples/sec   Loss 0.6194   LearningRate 0.0074   Epoch: 14   Global Step: 243090   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-12 00:58:09,820-Speed 3335.37 samples/sec   Loss 0.6714   LearningRate 0.0074   Epoch: 14   Global Step: 243100   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-12 00:58:12,938-Speed 3285.25 samples/sec   Loss 0.6610   LearningRate 0.0074   Epoch: 14   Global Step: 243110   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-12 00:58:16,008-Speed 3336.19 samples/sec   Loss 0.6427   LearningRate 0.0074   Epoch: 14   Global Step: 243120   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-12 00:58:19,099-Speed 3313.79 samples/sec   Loss 0.6683   LearningRate 0.0074   Epoch: 14   Global Step: 243130   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-12 00:58:22,200-Speed 3302.93 samples/sec   Loss 0.6262   LearningRate 0.0074   Epoch: 14   Global Step: 243140   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-12 00:58:25,266-Speed 3340.75 samples/sec   Loss 0.6312   LearningRate 0.0074   Epoch: 14   Global Step: 243150   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-12 00:58:28,386-Speed 3282.86 samples/sec   Loss 0.6337   LearningRate 0.0074   Epoch: 14   Global Step: 243160   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-12 00:58:31,475-Speed 3315.72 samples/sec   Loss 0.6407   LearningRate 0.0074   Epoch: 14   Global Step: 243170   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-12 00:58:34,540-Speed 3341.65 samples/sec   Loss 0.6203   LearningRate 0.0074   Epoch: 14   Global Step: 243180   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:58:37,622-Speed 3323.04 samples/sec   Loss 0.6283   LearningRate 0.0074   Epoch: 14   Global Step: 243190   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:58:40,711-Speed 3315.64 samples/sec   Loss 0.6066   LearningRate 0.0074   Epoch: 14   Global Step: 243200   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:58:43,782-Speed 3335.80 samples/sec   Loss 0.6568   LearningRate 0.0074   Epoch: 14   Global Step: 243210   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:58:46,849-Speed 3339.36 samples/sec   Loss 0.6925   LearningRate 0.0074   Epoch: 14   Global Step: 243220   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:58:49,921-Speed 3333.63 samples/sec   Loss 0.6431   LearningRate 0.0074   Epoch: 14   Global Step: 243230   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:58:52,995-Speed 3331.87 samples/sec   Loss 0.6280   LearningRate 0.0074   Epoch: 14   Global Step: 243240   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:58:56,068-Speed 3332.98 samples/sec   Loss 0.6479   LearningRate 0.0074   Epoch: 14   Global Step: 243250   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:58:59,133-Speed 3341.71 samples/sec   Loss 0.6813   LearningRate 0.0074   Epoch: 14   Global Step: 243260   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:59:02,231-Speed 3306.29 samples/sec   Loss 0.6532   LearningRate 0.0074   Epoch: 14   Global Step: 243270   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:59:05,309-Speed 3327.68 samples/sec   Loss 0.6758   LearningRate 0.0074   Epoch: 14   Global Step: 243280   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-12 00:59:08,393-Speed 3320.99 samples/sec   Loss 0.6033   LearningRate 0.0074   Epoch: 14   Global Step: 243290   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-12 00:59:11,472-Speed 3326.70 samples/sec   Loss 0.6259   LearningRate 0.0074   Epoch: 14   Global Step: 243300   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-12 00:59:14,578-Speed 3297.69 samples/sec   Loss 0.6392   LearningRate 0.0074   Epoch: 14   Global Step: 243310   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-12 00:59:17,655-Speed 3328.24 samples/sec   Loss 0.6448   LearningRate 0.0073   Epoch: 14   Global Step: 243320   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-12 00:59:20,722-Speed 3339.55 samples/sec   Loss 0.6430   LearningRate 0.0073   Epoch: 14   Global Step: 243330   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-12 00:59:23,819-Speed 3307.92 samples/sec   Loss 0.6398   LearningRate 0.0073   Epoch: 14   Global Step: 243340   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-12 00:59:26,885-Speed 3340.21 samples/sec   Loss 0.6183   LearningRate 0.0073   Epoch: 14   Global Step: 243350   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-12 00:59:30,008-Speed 3279.75 samples/sec   Loss 0.6240   LearningRate 0.0073   Epoch: 14   Global Step: 243360   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-12 00:59:33,071-Speed 3343.71 samples/sec   Loss 0.6386   LearningRate 0.0073   Epoch: 14   Global Step: 243370   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-12 00:59:36,164-Speed 3310.93 samples/sec   Loss 0.6269   LearningRate 0.0073   Epoch: 14   Global Step: 243380   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:59:39,237-Speed 3333.84 samples/sec   Loss 0.6133   LearningRate 0.0073   Epoch: 14   Global Step: 243390   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:59:42,389-Speed 3248.84 samples/sec   Loss 0.6349   LearningRate 0.0073   Epoch: 14   Global Step: 243400   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:59:45,450-Speed 3346.14 samples/sec   Loss 0.6408   LearningRate 0.0073   Epoch: 14   Global Step: 243410   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:59:48,523-Speed 3332.73 samples/sec   Loss 0.6306   LearningRate 0.0073   Epoch: 14   Global Step: 243420   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 00:59:51,586-Speed 3343.98 samples/sec   Loss 0.6895   LearningRate 0.0073   Epoch: 14   Global Step: 243430   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-12 00:59:54,652-Speed 3341.11 samples/sec   Loss 0.6523   LearningRate 0.0073   Epoch: 14   Global Step: 243440   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-12 00:59:57,732-Speed 3325.40 samples/sec   Loss 0.6261   LearningRate 0.0073   Epoch: 14   Global Step: 243450   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-12 01:00:00,817-Speed 3320.08 samples/sec   Loss 0.6223   LearningRate 0.0073   Epoch: 14   Global Step: 243460   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-12 01:00:03,900-Speed 3321.86 samples/sec   Loss 0.6335   LearningRate 0.0073   Epoch: 14   Global Step: 243470   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-12 01:00:06,964-Speed 3343.10 samples/sec   Loss 0.6317   LearningRate 0.0073   Epoch: 14   Global Step: 243480   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-12 01:00:10,099-Speed 3267.48 samples/sec   Loss 0.6411   LearningRate 0.0073   Epoch: 14   Global Step: 243490   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-12 01:00:13,179-Speed 3325.04 samples/sec   Loss 0.6584   LearningRate 0.0073   Epoch: 14   Global Step: 243500   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-12 01:00:16,330-Speed 3250.18 samples/sec   Loss 0.6416   LearningRate 0.0073   Epoch: 14   Global Step: 243510   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-12 01:00:19,515-Speed 3215.82 samples/sec   Loss 0.6271   LearningRate 0.0073   Epoch: 14   Global Step: 243520   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-12 01:00:22,590-Speed 3331.03 samples/sec   Loss 0.6428   LearningRate 0.0073   Epoch: 14   Global Step: 243530   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 01:00:25,687-Speed 3307.18 samples/sec   Loss 0.6392   LearningRate 0.0073   Epoch: 14   Global Step: 243540   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 01:00:29,795-Speed 2493.10 samples/sec   Loss 0.6435   LearningRate 0.0073   Epoch: 14   Global Step: 243550   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 01:00:32,909-Speed 3289.49 samples/sec   Loss 0.6388   LearningRate 0.0073   Epoch: 14   Global Step: 243560   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 01:00:35,982-Speed 3332.93 samples/sec   Loss 0.6423   LearningRate 0.0073   Epoch: 14   Global Step: 243570   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 01:00:39,095-Speed 3289.89 samples/sec   Loss 0.6398   LearningRate 0.0073   Epoch: 14   Global Step: 243580   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 01:00:42,271-Speed 3224.75 samples/sec   Loss 0.6417   LearningRate 0.0073   Epoch: 14   Global Step: 243590   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 01:00:45,375-Speed 3299.58 samples/sec   Loss 0.6433   LearningRate 0.0073   Epoch: 14   Global Step: 243600   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 01:00:48,459-Speed 3320.91 samples/sec   Loss 0.6564   LearningRate 0.0073   Epoch: 14   Global Step: 243610   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 01:00:51,573-Speed 3289.14 samples/sec   Loss 0.6695   LearningRate 0.0073   Epoch: 14   Global Step: 243620   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 01:00:54,644-Speed 3335.25 samples/sec   Loss 0.6155   LearningRate 0.0073   Epoch: 14   Global Step: 243630   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-12 01:00:57,717-Speed 3332.74 samples/sec   Loss 0.6615   LearningRate 0.0073   Epoch: 14   Global Step: 243640   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-12 01:01:00,780-Speed 3344.52 samples/sec   Loss 0.6183   LearningRate 0.0073   Epoch: 14   Global Step: 243650   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 01:01:03,844-Speed 3342.11 samples/sec   Loss 0.6184   LearningRate 0.0073   Epoch: 14   Global Step: 243660   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 01:01:06,911-Speed 3339.46 samples/sec   Loss 0.6234   LearningRate 0.0073   Epoch: 14   Global Step: 243670   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 01:01:09,977-Speed 3341.08 samples/sec   Loss 0.6319   LearningRate 0.0073   Epoch: 14   Global Step: 243680   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 01:01:13,041-Speed 3342.59 samples/sec   Loss 0.6225   LearningRate 0.0073   Epoch: 14   Global Step: 243690   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 01:01:16,112-Speed 3335.02 samples/sec   Loss 0.6217   LearningRate 0.0073   Epoch: 14   Global Step: 243700   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 01:01:19,194-Speed 3323.71 samples/sec   Loss 0.6574   LearningRate 0.0073   Epoch: 14   Global Step: 243710   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 01:01:22,274-Speed 3325.74 samples/sec   Loss 0.6373   LearningRate 0.0073   Epoch: 14   Global Step: 243720   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 01:01:25,350-Speed 3329.84 samples/sec   Loss 0.6445   LearningRate 0.0073   Epoch: 14   Global Step: 243730   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 01:01:28,414-Speed 3341.88 samples/sec   Loss 0.6508   LearningRate 0.0073   Epoch: 14   Global Step: 243740   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 01:01:31,485-Speed 3335.67 samples/sec   Loss 0.6515   LearningRate 0.0073   Epoch: 14   Global Step: 243750   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-12 01:01:34,650-Speed 3235.59 samples/sec   Loss 0.6255   LearningRate 0.0073   Epoch: 14   Global Step: 243760   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-12 01:01:37,741-Speed 3313.81 samples/sec   Loss 0.6549   LearningRate 0.0073   Epoch: 14   Global Step: 243770   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-12 01:01:40,836-Speed 3309.89 samples/sec   Loss 0.6133   LearningRate 0.0073   Epoch: 14   Global Step: 243780   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-12 01:01:43,948-Speed 3290.58 samples/sec   Loss 0.6061   LearningRate 0.0073   Epoch: 14   Global Step: 243790   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 01:01:47,142-Speed 3206.52 samples/sec   Loss 0.6778   LearningRate 0.0073   Epoch: 14   Global Step: 243800   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 01:01:50,255-Speed 3290.92 samples/sec   Loss 0.5895   LearningRate 0.0073   Epoch: 14   Global Step: 243810   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 01:01:53,352-Speed 3306.60 samples/sec   Loss 0.6081   LearningRate 0.0073   Epoch: 14   Global Step: 243820   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 01:01:56,554-Speed 3199.11 samples/sec   Loss 0.6366   LearningRate 0.0073   Epoch: 14   Global Step: 243830   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 01:01:59,623-Speed 3337.38 samples/sec   Loss 0.6422   LearningRate 0.0073   Epoch: 14   Global Step: 243840   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 01:02:02,696-Speed 3333.55 samples/sec   Loss 0.6271   LearningRate 0.0073   Epoch: 14   Global Step: 243850   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 01:02:05,766-Speed 3336.16 samples/sec   Loss 0.6764   LearningRate 0.0073   Epoch: 14   Global Step: 243860   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 01:02:08,911-Speed 3255.88 samples/sec   Loss 0.6414   LearningRate 0.0073   Epoch: 14   Global Step: 243870   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 01:02:12,013-Speed 3302.49 samples/sec   Loss 0.6027   LearningRate 0.0073   Epoch: 14   Global Step: 243880   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 01:02:15,078-Speed 3341.17 samples/sec   Loss 0.6023   LearningRate 0.0073   Epoch: 14   Global Step: 243890   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-12 01:02:18,205-Speed 3275.49 samples/sec   Loss 0.6678   LearningRate 0.0073   Epoch: 14   Global Step: 243900   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-12 01:02:21,280-Speed 3330.66 samples/sec   Loss 0.6338   LearningRate 0.0073   Epoch: 14   Global Step: 243910   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 01:02:24,448-Speed 3233.14 samples/sec   Loss 0.6825   LearningRate 0.0073   Epoch: 14   Global Step: 243920   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 01:02:27,532-Speed 3321.79 samples/sec   Loss 0.6253   LearningRate 0.0073   Epoch: 14   Global Step: 243930   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 01:02:30,680-Speed 3253.18 samples/sec   Loss 0.6219   LearningRate 0.0072   Epoch: 14   Global Step: 243940   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 01:02:33,785-Speed 3298.95 samples/sec   Loss 0.6090   LearningRate 0.0072   Epoch: 14   Global Step: 243950   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 01:02:36,850-Speed 3341.85 samples/sec   Loss 0.6513   LearningRate 0.0072   Epoch: 14   Global Step: 243960   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:02:39,923-Speed 3332.36 samples/sec   Loss 0.6419   LearningRate 0.0072   Epoch: 14   Global Step: 243970   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:02:43,005-Speed 3323.91 samples/sec   Loss 0.6488   LearningRate 0.0072   Epoch: 14   Global Step: 243980   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:02:46,087-Speed 3323.16 samples/sec   Loss 0.6649   LearningRate 0.0072   Epoch: 14   Global Step: 243990   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:02:49,168-Speed 3323.97 samples/sec   Loss 0.6424   LearningRate 0.0072   Epoch: 14   Global Step: 244000   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:03:33,013-[lfw][244000]XNorm: 21.560836
Training: 2022-04-12 01:03:33,014-[lfw][244000]Accuracy-Flip: 0.99783+-0.00259
Training: 2022-04-12 01:03:33,014-[lfw][244000]Accuracy-Highest: 0.99817
Training: 2022-04-12 01:04:24,213-[cfp_fp][244000]XNorm: 22.774833
Training: 2022-04-12 01:04:24,214-[cfp_fp][244000]Accuracy-Flip: 0.99114+-0.00446
Training: 2022-04-12 01:04:24,214-[cfp_fp][244000]Accuracy-Highest: 0.99186
Training: 2022-04-12 01:05:08,062-[agedb_30][244000]XNorm: 23.298879
Training: 2022-04-12 01:05:08,062-[agedb_30][244000]Accuracy-Flip: 0.98500+-0.00738
Training: 2022-04-12 01:05:08,063-[agedb_30][244000]Accuracy-Highest: 0.98567
Training: 2022-04-12 01:05:11,130-Speed 72.13 samples/sec   Loss 0.6421   LearningRate 0.0072   Epoch: 14   Global Step: 244010   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-12 01:05:14,196-Speed 3340.52 samples/sec   Loss 0.6591   LearningRate 0.0072   Epoch: 14   Global Step: 244020   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-12 01:05:17,280-Speed 3321.02 samples/sec   Loss 0.6515   LearningRate 0.0072   Epoch: 14   Global Step: 244030   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-12 01:05:20,355-Speed 3330.55 samples/sec   Loss 0.6038   LearningRate 0.0072   Epoch: 14   Global Step: 244040   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-12 01:05:23,449-Speed 3310.45 samples/sec   Loss 0.6198   LearningRate 0.0072   Epoch: 14   Global Step: 244050   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-12 01:05:26,581-Speed 3270.97 samples/sec   Loss 0.6246   LearningRate 0.0072   Epoch: 14   Global Step: 244060   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-12 01:05:29,627-Speed 3362.33 samples/sec   Loss 0.6591   LearningRate 0.0072   Epoch: 14   Global Step: 244070   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-12 01:05:32,688-Speed 3345.89 samples/sec   Loss 0.6772   LearningRate 0.0072   Epoch: 14   Global Step: 244080   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-12 01:05:35,752-Speed 3342.93 samples/sec   Loss 0.6363   LearningRate 0.0072   Epoch: 14   Global Step: 244090   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-12 01:05:38,818-Speed 3340.69 samples/sec   Loss 0.6841   LearningRate 0.0072   Epoch: 14   Global Step: 244100   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-12 01:05:41,884-Speed 3340.43 samples/sec   Loss 0.6483   LearningRate 0.0072   Epoch: 14   Global Step: 244110   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-12 01:05:44,982-Speed 3306.61 samples/sec   Loss 0.6104   LearningRate 0.0072   Epoch: 14   Global Step: 244120   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-12 01:05:48,054-Speed 3333.67 samples/sec   Loss 0.6272   LearningRate 0.0072   Epoch: 14   Global Step: 244130   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-12 01:05:51,175-Speed 3281.63 samples/sec   Loss 0.6754   LearningRate 0.0072   Epoch: 14   Global Step: 244140   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-12 01:05:54,291-Speed 3287.34 samples/sec   Loss 0.6542   LearningRate 0.0072   Epoch: 14   Global Step: 244150   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-12 01:05:57,417-Speed 3276.29 samples/sec   Loss 0.6194   LearningRate 0.0072   Epoch: 14   Global Step: 244160   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-12 01:06:00,534-Speed 3285.95 samples/sec   Loss 0.6009   LearningRate 0.0072   Epoch: 14   Global Step: 244170   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-12 01:06:03,658-Speed 3278.48 samples/sec   Loss 0.6642   LearningRate 0.0072   Epoch: 14   Global Step: 244180   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:06:06,837-Speed 3222.68 samples/sec   Loss 0.6114   LearningRate 0.0072   Epoch: 14   Global Step: 244190   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:06:09,912-Speed 3330.26 samples/sec   Loss 0.6435   LearningRate 0.0072   Epoch: 14   Global Step: 244200   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:06:12,983-Speed 3334.87 samples/sec   Loss 0.6654   LearningRate 0.0072   Epoch: 14   Global Step: 244210   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:06:16,114-Speed 3271.57 samples/sec   Loss 0.6029   LearningRate 0.0072   Epoch: 14   Global Step: 244220   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:06:19,225-Speed 3292.89 samples/sec   Loss 0.6655   LearningRate 0.0072   Epoch: 14   Global Step: 244230   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:06:22,405-Speed 3220.15 samples/sec   Loss 0.6528   LearningRate 0.0072   Epoch: 14   Global Step: 244240   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:06:25,514-Speed 3294.38 samples/sec   Loss 0.6381   LearningRate 0.0072   Epoch: 14   Global Step: 244250   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:06:28,674-Speed 3241.34 samples/sec   Loss 0.6566   LearningRate 0.0072   Epoch: 14   Global Step: 244260   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:06:31,786-Speed 3291.97 samples/sec   Loss 0.6294   LearningRate 0.0072   Epoch: 14   Global Step: 244270   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:06:34,852-Speed 3340.09 samples/sec   Loss 0.6448   LearningRate 0.0072   Epoch: 14   Global Step: 244280   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-12 01:06:37,932-Speed 3325.70 samples/sec   Loss 0.6491   LearningRate 0.0072   Epoch: 14   Global Step: 244290   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-12 01:06:41,001-Speed 3337.14 samples/sec   Loss 0.6433   LearningRate 0.0072   Epoch: 14   Global Step: 244300   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:06:44,093-Speed 3312.11 samples/sec   Loss 0.6503   LearningRate 0.0072   Epoch: 14   Global Step: 244310   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:06:47,174-Speed 3324.42 samples/sec   Loss 0.6478   LearningRate 0.0072   Epoch: 14   Global Step: 244320   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:06:50,296-Speed 3281.62 samples/sec   Loss 0.6534   LearningRate 0.0072   Epoch: 14   Global Step: 244330   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:06:53,472-Speed 3224.65 samples/sec   Loss 0.6501   LearningRate 0.0072   Epoch: 14   Global Step: 244340   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:06:56,541-Speed 3337.42 samples/sec   Loss 0.6693   LearningRate 0.0072   Epoch: 14   Global Step: 244350   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:06:59,618-Speed 3327.72 samples/sec   Loss 0.6212   LearningRate 0.0072   Epoch: 14   Global Step: 244360   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:07:02,694-Speed 3330.11 samples/sec   Loss 0.6460   LearningRate 0.0072   Epoch: 14   Global Step: 244370   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:07:05,759-Speed 3341.34 samples/sec   Loss 0.6647   LearningRate 0.0072   Epoch: 14   Global Step: 244380   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:07:08,850-Speed 3314.49 samples/sec   Loss 0.6585   LearningRate 0.0072   Epoch: 14   Global Step: 244390   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:07:11,943-Speed 3310.91 samples/sec   Loss 0.6546   LearningRate 0.0072   Epoch: 14   Global Step: 244400   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-12 01:07:15,041-Speed 3305.94 samples/sec   Loss 0.5940   LearningRate 0.0072   Epoch: 14   Global Step: 244410   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-12 01:07:18,125-Speed 3321.33 samples/sec   Loss 0.6586   LearningRate 0.0072   Epoch: 14   Global Step: 244420   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-12 01:07:21,211-Speed 3319.14 samples/sec   Loss 0.6440   LearningRate 0.0072   Epoch: 14   Global Step: 244430   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-12 01:07:24,266-Speed 3352.66 samples/sec   Loss 0.6594   LearningRate 0.0072   Epoch: 14   Global Step: 244440   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:07:27,338-Speed 3334.07 samples/sec   Loss 0.6247   LearningRate 0.0072   Epoch: 14   Global Step: 244450   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:07:30,408-Speed 3336.15 samples/sec   Loss 0.6463   LearningRate 0.0072   Epoch: 14   Global Step: 244460   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:07:33,484-Speed 3329.79 samples/sec   Loss 0.6546   LearningRate 0.0072   Epoch: 14   Global Step: 244470   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:07:36,541-Speed 3350.27 samples/sec   Loss 0.6729   LearningRate 0.0072   Epoch: 14   Global Step: 244480   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:07:39,627-Speed 3319.18 samples/sec   Loss 0.6235   LearningRate 0.0072   Epoch: 14   Global Step: 244490   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:07:42,696-Speed 3337.48 samples/sec   Loss 0.6379   LearningRate 0.0072   Epoch: 14   Global Step: 244500   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:07:45,767-Speed 3335.11 samples/sec   Loss 0.6423   LearningRate 0.0072   Epoch: 14   Global Step: 244510   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:07:48,842-Speed 3331.38 samples/sec   Loss 0.6340   LearningRate 0.0072   Epoch: 14   Global Step: 244520   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:07:51,902-Speed 3346.76 samples/sec   Loss 0.6360   LearningRate 0.0072   Epoch: 14   Global Step: 244530   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:07:54,975-Speed 3332.85 samples/sec   Loss 0.6540   LearningRate 0.0072   Epoch: 14   Global Step: 244540   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-12 01:07:58,045-Speed 3336.83 samples/sec   Loss 0.6572   LearningRate 0.0072   Epoch: 14   Global Step: 244550   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-12 01:08:01,112-Speed 3339.07 samples/sec   Loss 0.6368   LearningRate 0.0071   Epoch: 14   Global Step: 244560   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-12 01:08:04,268-Speed 3245.16 samples/sec   Loss 0.6380   LearningRate 0.0071   Epoch: 14   Global Step: 244570   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-12 01:08:07,443-Speed 3225.62 samples/sec   Loss 0.6479   LearningRate 0.0071   Epoch: 14   Global Step: 244580   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-12 01:08:10,565-Speed 3280.79 samples/sec   Loss 0.6271   LearningRate 0.0071   Epoch: 14   Global Step: 244590   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-12 01:08:13,690-Speed 3278.41 samples/sec   Loss 0.6365   LearningRate 0.0071   Epoch: 14   Global Step: 244600   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-12 01:08:16,759-Speed 3336.63 samples/sec   Loss 0.6100   LearningRate 0.0071   Epoch: 14   Global Step: 244610   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-12 01:08:19,905-Speed 3255.84 samples/sec   Loss 0.6329   LearningRate 0.0071   Epoch: 14   Global Step: 244620   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-12 01:08:23,073-Speed 3233.37 samples/sec   Loss 0.6445   LearningRate 0.0071   Epoch: 14   Global Step: 244630   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-12 01:08:26,177-Speed 3299.87 samples/sec   Loss 0.6705   LearningRate 0.0071   Epoch: 14   Global Step: 244640   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-12 01:08:29,249-Speed 3333.54 samples/sec   Loss 0.6084   LearningRate 0.0071   Epoch: 14   Global Step: 244650   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-12 01:08:32,316-Speed 3339.49 samples/sec   Loss 0.6214   LearningRate 0.0071   Epoch: 14   Global Step: 244660   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-12 01:08:35,374-Speed 3349.13 samples/sec   Loss 0.6189   LearningRate 0.0071   Epoch: 14   Global Step: 244670   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:08:38,477-Speed 3300.87 samples/sec   Loss 0.6067   LearningRate 0.0071   Epoch: 14   Global Step: 244680   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:08:41,566-Speed 3315.72 samples/sec   Loss 0.6508   LearningRate 0.0071   Epoch: 14   Global Step: 244690   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:08:44,642-Speed 3330.86 samples/sec   Loss 0.6347   LearningRate 0.0071   Epoch: 14   Global Step: 244700   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:08:47,716-Speed 3331.83 samples/sec   Loss 0.6441   LearningRate 0.0071   Epoch: 14   Global Step: 244710   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-12 01:08:50,804-Speed 3316.49 samples/sec   Loss 0.6286   LearningRate 0.0071   Epoch: 14   Global Step: 244720   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-12 01:08:53,904-Speed 3303.79 samples/sec   Loss 0.6452   LearningRate 0.0071   Epoch: 14   Global Step: 244730   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-12 01:08:56,990-Speed 3319.10 samples/sec   Loss 0.6248   LearningRate 0.0071   Epoch: 14   Global Step: 244740   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-12 01:09:00,102-Speed 3291.40 samples/sec   Loss 0.6401   LearningRate 0.0071   Epoch: 14   Global Step: 244750   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-12 01:09:03,175-Speed 3331.88 samples/sec   Loss 0.6178   LearningRate 0.0071   Epoch: 14   Global Step: 244760   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-12 01:09:06,254-Speed 3327.53 samples/sec   Loss 0.6359   LearningRate 0.0071   Epoch: 14   Global Step: 244770   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-12 01:09:09,326-Speed 3333.72 samples/sec   Loss 0.6460   LearningRate 0.0071   Epoch: 14   Global Step: 244780   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-12 01:09:12,456-Speed 3272.73 samples/sec   Loss 0.6318   LearningRate 0.0071   Epoch: 14   Global Step: 244790   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-12 01:09:15,555-Speed 3304.87 samples/sec   Loss 0.6564   LearningRate 0.0071   Epoch: 14   Global Step: 244800   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-12 01:09:18,625-Speed 3336.05 samples/sec   Loss 0.6466   LearningRate 0.0071   Epoch: 14   Global Step: 244810   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:09:21,723-Speed 3305.80 samples/sec   Loss 0.6378   LearningRate 0.0071   Epoch: 14   Global Step: 244820   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:09:24,788-Speed 3342.63 samples/sec   Loss 0.6420   LearningRate 0.0071   Epoch: 14   Global Step: 244830   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:09:27,875-Speed 3317.36 samples/sec   Loss 0.6820   LearningRate 0.0071   Epoch: 14   Global Step: 244840   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:09:30,980-Speed 3298.71 samples/sec   Loss 0.6561   LearningRate 0.0071   Epoch: 14   Global Step: 244850   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:09:34,072-Speed 3312.42 samples/sec   Loss 0.6497   LearningRate 0.0071   Epoch: 14   Global Step: 244860   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:09:37,259-Speed 3214.38 samples/sec   Loss 0.6146   LearningRate 0.0071   Epoch: 14   Global Step: 244870   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:09:40,488-Speed 3171.68 samples/sec   Loss 0.6217   LearningRate 0.0071   Epoch: 14   Global Step: 244880   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:09:43,567-Speed 3326.12 samples/sec   Loss 0.6669   LearningRate 0.0071   Epoch: 14   Global Step: 244890   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:09:46,651-Speed 3321.84 samples/sec   Loss 0.6657   LearningRate 0.0071   Epoch: 14   Global Step: 244900   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:09:49,720-Speed 3336.63 samples/sec   Loss 0.6646   LearningRate 0.0071   Epoch: 14   Global Step: 244910   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-12 01:09:52,793-Speed 3333.25 samples/sec   Loss 0.6424   LearningRate 0.0071   Epoch: 14   Global Step: 244920   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-12 01:09:55,873-Speed 3324.97 samples/sec   Loss 0.6335   LearningRate 0.0071   Epoch: 14   Global Step: 244930   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-12 01:09:58,961-Speed 3317.56 samples/sec   Loss 0.6284   LearningRate 0.0071   Epoch: 14   Global Step: 244940   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-12 01:10:02,026-Speed 3341.90 samples/sec   Loss 0.6666   LearningRate 0.0071   Epoch: 14   Global Step: 244950   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-12 01:10:05,164-Speed 3263.19 samples/sec   Loss 0.6550   LearningRate 0.0071   Epoch: 14   Global Step: 244960   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:10:08,306-Speed 3260.58 samples/sec   Loss 0.6321   LearningRate 0.0071   Epoch: 14   Global Step: 244970   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:10:11,374-Speed 3337.61 samples/sec   Loss 0.6295   LearningRate 0.0071   Epoch: 14   Global Step: 244980   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:10:14,633-Speed 3143.03 samples/sec   Loss 0.6510   LearningRate 0.0071   Epoch: 14   Global Step: 244990   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:10:17,761-Speed 3274.89 samples/sec   Loss 0.6570   LearningRate 0.0071   Epoch: 14   Global Step: 245000   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:10:20,839-Speed 3327.38 samples/sec   Loss 0.6522   LearningRate 0.0071   Epoch: 14   Global Step: 245010   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:10:24,000-Speed 3239.60 samples/sec   Loss 0.6813   LearningRate 0.0071   Epoch: 14   Global Step: 245020   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:10:27,114-Speed 3290.41 samples/sec   Loss 0.6574   LearningRate 0.0071   Epoch: 14   Global Step: 245030   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:10:30,199-Speed 3320.06 samples/sec   Loss 0.6830   LearningRate 0.0071   Epoch: 14   Global Step: 245040   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:10:33,269-Speed 3336.12 samples/sec   Loss 0.6432   LearningRate 0.0071   Epoch: 14   Global Step: 245050   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:10:36,349-Speed 3325.68 samples/sec   Loss 0.6716   LearningRate 0.0071   Epoch: 14   Global Step: 245060   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-12 01:10:39,443-Speed 3309.73 samples/sec   Loss 0.6238   LearningRate 0.0071   Epoch: 14   Global Step: 245070   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-12 01:10:42,513-Speed 3336.69 samples/sec   Loss 0.6198   LearningRate 0.0071   Epoch: 14   Global Step: 245080   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:10:45,606-Speed 3310.63 samples/sec   Loss 0.6481   LearningRate 0.0071   Epoch: 14   Global Step: 245090   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:10:48,680-Speed 3331.96 samples/sec   Loss 0.6289   LearningRate 0.0071   Epoch: 14   Global Step: 245100   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:10:51,837-Speed 3244.65 samples/sec   Loss 0.6601   LearningRate 0.0071   Epoch: 14   Global Step: 245110   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:10:54,912-Speed 3330.99 samples/sec   Loss 0.6365   LearningRate 0.0071   Epoch: 14   Global Step: 245120   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:10:57,993-Speed 3324.28 samples/sec   Loss 0.6332   LearningRate 0.0071   Epoch: 14   Global Step: 245130   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:11:01,059-Speed 3340.45 samples/sec   Loss 0.6449   LearningRate 0.0071   Epoch: 14   Global Step: 245140   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:11:04,125-Speed 3341.15 samples/sec   Loss 0.6584   LearningRate 0.0071   Epoch: 14   Global Step: 245150   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:11:07,206-Speed 3324.35 samples/sec   Loss 0.6671   LearningRate 0.0071   Epoch: 14   Global Step: 245160   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:11:10,280-Speed 3331.18 samples/sec   Loss 0.6298   LearningRate 0.0071   Epoch: 14   Global Step: 245170   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:11:13,432-Speed 3250.51 samples/sec   Loss 0.6332   LearningRate 0.0071   Epoch: 14   Global Step: 245180   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:11:16,533-Speed 3302.89 samples/sec   Loss 0.6416   LearningRate 0.0070   Epoch: 14   Global Step: 245190   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:11:19,640-Speed 3296.11 samples/sec   Loss 0.6656   LearningRate 0.0070   Epoch: 14   Global Step: 245200   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:11:22,810-Speed 3230.62 samples/sec   Loss 0.5954   LearningRate 0.0070   Epoch: 14   Global Step: 245210   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:11:25,930-Speed 3283.72 samples/sec   Loss 0.6257   LearningRate 0.0070   Epoch: 14   Global Step: 245220   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:11:29,003-Speed 3332.90 samples/sec   Loss 0.6776   LearningRate 0.0070   Epoch: 14   Global Step: 245230   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:11:32,193-Speed 3210.17 samples/sec   Loss 0.6203   LearningRate 0.0070   Epoch: 14   Global Step: 245240   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:11:35,275-Speed 3323.69 samples/sec   Loss 0.6682   LearningRate 0.0070   Epoch: 14   Global Step: 245250   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:11:38,358-Speed 3321.19 samples/sec   Loss 0.6583   LearningRate 0.0070   Epoch: 14   Global Step: 245260   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:11:41,558-Speed 3200.83 samples/sec   Loss 0.6345   LearningRate 0.0070   Epoch: 14   Global Step: 245270   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:11:44,630-Speed 3334.58 samples/sec   Loss 0.6348   LearningRate 0.0070   Epoch: 14   Global Step: 245280   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-12 01:11:47,708-Speed 3327.49 samples/sec   Loss 0.6526   LearningRate 0.0070   Epoch: 14   Global Step: 245290   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:11:50,777-Speed 3337.71 samples/sec   Loss 0.6415   LearningRate 0.0070   Epoch: 14   Global Step: 245300   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:11:53,869-Speed 3312.21 samples/sec   Loss 0.6336   LearningRate 0.0070   Epoch: 14   Global Step: 245310   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:11:56,940-Speed 3334.79 samples/sec   Loss 0.6338   LearningRate 0.0070   Epoch: 14   Global Step: 245320   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:12:00,025-Speed 3322.00 samples/sec   Loss 0.6356   LearningRate 0.0070   Epoch: 14   Global Step: 245330   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:12:03,108-Speed 3322.03 samples/sec   Loss 0.6455   LearningRate 0.0070   Epoch: 14   Global Step: 245340   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:12:06,222-Speed 3289.42 samples/sec   Loss 0.6423   LearningRate 0.0070   Epoch: 14   Global Step: 245350   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:12:09,312-Speed 3314.63 samples/sec   Loss 0.6387   LearningRate 0.0070   Epoch: 14   Global Step: 245360   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:12:12,457-Speed 3257.00 samples/sec   Loss 0.6312   LearningRate 0.0070   Epoch: 14   Global Step: 245370   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:12:15,677-Speed 3180.14 samples/sec   Loss 0.6439   LearningRate 0.0070   Epoch: 14   Global Step: 245380   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:12:18,816-Speed 3264.41 samples/sec   Loss 0.6689   LearningRate 0.0070   Epoch: 14   Global Step: 245390   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-12 01:12:21,991-Speed 3225.47 samples/sec   Loss 0.6520   LearningRate 0.0070   Epoch: 14   Global Step: 245400   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-12 01:12:25,057-Speed 3341.21 samples/sec   Loss 0.6283   LearningRate 0.0070   Epoch: 14   Global Step: 245410   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-12 01:12:28,123-Speed 3340.25 samples/sec   Loss 0.6692   LearningRate 0.0070   Epoch: 14   Global Step: 245420   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-12 01:12:31,223-Speed 3303.61 samples/sec   Loss 0.6773   LearningRate 0.0070   Epoch: 14   Global Step: 245430   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-12 01:12:34,411-Speed 3212.39 samples/sec   Loss 0.6380   LearningRate 0.0070   Epoch: 14   Global Step: 245440   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-12 01:12:37,473-Speed 3344.92 samples/sec   Loss 0.6466   LearningRate 0.0070   Epoch: 14   Global Step: 245450   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-12 01:12:40,540-Speed 3339.60 samples/sec   Loss 0.6188   LearningRate 0.0070   Epoch: 14   Global Step: 245460   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-12 01:12:43,638-Speed 3306.37 samples/sec   Loss 0.6488   LearningRate 0.0070   Epoch: 14   Global Step: 245470   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-12 01:12:46,703-Speed 3342.71 samples/sec   Loss 0.6711   LearningRate 0.0070   Epoch: 14   Global Step: 245480   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-12 01:12:49,767-Speed 3342.46 samples/sec   Loss 0.6532   LearningRate 0.0070   Epoch: 14   Global Step: 245490   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:12:52,835-Speed 3338.04 samples/sec   Loss 0.6296   LearningRate 0.0070   Epoch: 14   Global Step: 245500   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:12:55,918-Speed 3322.77 samples/sec   Loss 0.6338   LearningRate 0.0070   Epoch: 14   Global Step: 245510   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:12:58,986-Speed 3337.69 samples/sec   Loss 0.6289   LearningRate 0.0070   Epoch: 14   Global Step: 245520   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:13:02,063-Speed 3328.69 samples/sec   Loss 0.6656   LearningRate 0.0070   Epoch: 14   Global Step: 245530   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:13:05,151-Speed 3317.47 samples/sec   Loss 0.6420   LearningRate 0.0070   Epoch: 14   Global Step: 245540   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:13:08,314-Speed 3237.24 samples/sec   Loss 0.6369   LearningRate 0.0070   Epoch: 14   Global Step: 245550   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:13:11,489-Speed 3226.31 samples/sec   Loss 0.6179   LearningRate 0.0070   Epoch: 14   Global Step: 245560   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:13:14,699-Speed 3191.16 samples/sec   Loss 0.6653   LearningRate 0.0070   Epoch: 14   Global Step: 245570   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:13:17,944-Speed 3155.96 samples/sec   Loss 0.6435   LearningRate 0.0070   Epoch: 14   Global Step: 245580   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:13:21,083-Speed 3263.08 samples/sec   Loss 0.6004   LearningRate 0.0070   Epoch: 14   Global Step: 245590   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-12 01:13:24,147-Speed 3343.27 samples/sec   Loss 0.6296   LearningRate 0.0070   Epoch: 14   Global Step: 245600   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:13:27,257-Speed 3293.08 samples/sec   Loss 0.6931   LearningRate 0.0070   Epoch: 14   Global Step: 245610   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:13:30,373-Speed 3286.56 samples/sec   Loss 0.6623   LearningRate 0.0070   Epoch: 14   Global Step: 245620   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:13:33,455-Speed 3323.42 samples/sec   Loss 0.6335   LearningRate 0.0070   Epoch: 14   Global Step: 245630   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:13:36,555-Speed 3304.39 samples/sec   Loss 0.6357   LearningRate 0.0070   Epoch: 14   Global Step: 245640   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:13:39,645-Speed 3314.81 samples/sec   Loss 0.6719   LearningRate 0.0070   Epoch: 14   Global Step: 245650   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:13:42,711-Speed 3340.04 samples/sec   Loss 0.6358   LearningRate 0.0070   Epoch: 14   Global Step: 245660   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:13:45,787-Speed 3329.58 samples/sec   Loss 0.6252   LearningRate 0.0070   Epoch: 14   Global Step: 245670   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:13:48,860-Speed 3333.95 samples/sec   Loss 0.6093   LearningRate 0.0070   Epoch: 14   Global Step: 245680   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:13:51,928-Speed 3337.55 samples/sec   Loss 0.6352   LearningRate 0.0070   Epoch: 14   Global Step: 245690   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:13:55,002-Speed 3332.08 samples/sec   Loss 0.6506   LearningRate 0.0070   Epoch: 14   Global Step: 245700   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-12 01:13:58,084-Speed 3323.49 samples/sec   Loss 0.6431   LearningRate 0.0070   Epoch: 14   Global Step: 245710   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-12 01:14:01,167-Speed 3321.79 samples/sec   Loss 0.6318   LearningRate 0.0070   Epoch: 14   Global Step: 245720   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-12 01:14:04,239-Speed 3334.51 samples/sec   Loss 0.6564   LearningRate 0.0070   Epoch: 14   Global Step: 245730   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-12 01:14:07,307-Speed 3338.38 samples/sec   Loss 0.6253   LearningRate 0.0070   Epoch: 14   Global Step: 245740   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-12 01:14:10,395-Speed 3317.32 samples/sec   Loss 0.6426   LearningRate 0.0070   Epoch: 14   Global Step: 245750   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-12 01:14:13,475-Speed 3324.87 samples/sec   Loss 0.6420   LearningRate 0.0070   Epoch: 14   Global Step: 245760   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-12 01:14:16,543-Speed 3338.54 samples/sec   Loss 0.6318   LearningRate 0.0070   Epoch: 14   Global Step: 245770   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-12 01:14:19,623-Speed 3325.23 samples/sec   Loss 0.6514   LearningRate 0.0070   Epoch: 14   Global Step: 245780   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-12 01:14:22,698-Speed 3331.29 samples/sec   Loss 0.6136   LearningRate 0.0070   Epoch: 14   Global Step: 245790   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-12 01:14:25,765-Speed 3338.74 samples/sec   Loss 0.6257   LearningRate 0.0070   Epoch: 14   Global Step: 245800   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-12 01:14:28,850-Speed 3320.41 samples/sec   Loss 0.6035   LearningRate 0.0070   Epoch: 14   Global Step: 245810   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-12 01:14:31,921-Speed 3335.24 samples/sec   Loss 0.6307   LearningRate 0.0069   Epoch: 14   Global Step: 245820   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-12 01:14:35,007-Speed 3319.19 samples/sec   Loss 0.6643   LearningRate 0.0069   Epoch: 14   Global Step: 245830   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-12 01:14:38,086-Speed 3326.49 samples/sec   Loss 0.6565   LearningRate 0.0069   Epoch: 14   Global Step: 245840   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-12 01:14:41,198-Speed 3291.70 samples/sec   Loss 0.6407   LearningRate 0.0069   Epoch: 14   Global Step: 245850   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-12 01:14:44,347-Speed 3251.77 samples/sec   Loss 0.6509   LearningRate 0.0069   Epoch: 14   Global Step: 245860   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:14:47,530-Speed 3218.00 samples/sec   Loss 0.6344   LearningRate 0.0069   Epoch: 14   Global Step: 245870   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:14:50,619-Speed 3315.79 samples/sec   Loss 0.6390   LearningRate 0.0069   Epoch: 14   Global Step: 245880   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:14:53,698-Speed 3327.13 samples/sec   Loss 0.6420   LearningRate 0.0069   Epoch: 14   Global Step: 245890   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:14:56,832-Speed 3268.38 samples/sec   Loss 0.6615   LearningRate 0.0069   Epoch: 14   Global Step: 245900   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:14:59,958-Speed 3275.89 samples/sec   Loss 0.6848   LearningRate 0.0069   Epoch: 14   Global Step: 245910   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:15:03,053-Speed 3309.95 samples/sec   Loss 0.6555   LearningRate 0.0069   Epoch: 14   Global Step: 245920   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:15:06,128-Speed 3330.43 samples/sec   Loss 0.6301   LearningRate 0.0069   Epoch: 14   Global Step: 245930   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:15:09,322-Speed 3207.23 samples/sec   Loss 0.6628   LearningRate 0.0069   Epoch: 14   Global Step: 245940   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:15:12,391-Speed 3337.01 samples/sec   Loss 0.6485   LearningRate 0.0069   Epoch: 14   Global Step: 245950   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:15:15,499-Speed 3295.31 samples/sec   Loss 0.6681   LearningRate 0.0069   Epoch: 14   Global Step: 245960   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-12 01:15:18,574-Speed 3330.80 samples/sec   Loss 0.6783   LearningRate 0.0069   Epoch: 14   Global Step: 245970   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-12 01:15:21,660-Speed 3318.61 samples/sec   Loss 0.6108   LearningRate 0.0069   Epoch: 14   Global Step: 245980   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-12 01:15:24,792-Speed 3271.24 samples/sec   Loss 0.6362   LearningRate 0.0069   Epoch: 14   Global Step: 245990   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-12 01:15:27,864-Speed 3333.93 samples/sec   Loss 0.6782   LearningRate 0.0069   Epoch: 14   Global Step: 246000   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-12 01:16:12,077-[lfw][246000]XNorm: 21.413617
Training: 2022-04-12 01:16:12,078-[lfw][246000]Accuracy-Flip: 0.99783+-0.00259
Training: 2022-04-12 01:16:12,078-[lfw][246000]Accuracy-Highest: 0.99817
Training: 2022-04-12 01:17:03,429-[cfp_fp][246000]XNorm: 22.522482
Training: 2022-04-12 01:17:03,430-[cfp_fp][246000]Accuracy-Flip: 0.99014+-0.00411
Training: 2022-04-12 01:17:03,430-[cfp_fp][246000]Accuracy-Highest: 0.99186
Training: 2022-04-12 01:17:47,652-[agedb_30][246000]XNorm: 22.873974
Training: 2022-04-12 01:17:47,653-[agedb_30][246000]Accuracy-Flip: 0.98467+-0.00763
Training: 2022-04-12 01:17:47,653-[agedb_30][246000]Accuracy-Highest: 0.98567
Training: 2022-04-12 01:17:50,749-Speed 71.67 samples/sec   Loss 0.6549   LearningRate 0.0069   Epoch: 14   Global Step: 246010   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-12 01:17:53,808-Speed 3349.08 samples/sec   Loss 0.6121   LearningRate 0.0069   Epoch: 14   Global Step: 246020   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-12 01:17:56,894-Speed 3318.69 samples/sec   Loss 0.6241   LearningRate 0.0069   Epoch: 14   Global Step: 246030   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-12 01:17:59,973-Speed 3326.34 samples/sec   Loss 0.6353   LearningRate 0.0069   Epoch: 14   Global Step: 246040   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-12 01:18:03,049-Speed 3329.69 samples/sec   Loss 0.6311   LearningRate 0.0069   Epoch: 14   Global Step: 246050   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-12 01:18:06,139-Speed 3314.31 samples/sec   Loss 0.6499   LearningRate 0.0069   Epoch: 14   Global Step: 246060   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-12 01:18:09,196-Speed 3350.51 samples/sec   Loss 0.6624   LearningRate 0.0069   Epoch: 14   Global Step: 246070   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-12 01:18:12,364-Speed 3233.13 samples/sec   Loss 0.6006   LearningRate 0.0069   Epoch: 14   Global Step: 246080   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-12 01:18:15,431-Speed 3339.58 samples/sec   Loss 0.6750   LearningRate 0.0069   Epoch: 14   Global Step: 246090   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:18:18,515-Speed 3320.98 samples/sec   Loss 0.6082   LearningRate 0.0069   Epoch: 14   Global Step: 246100   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:18:21,631-Speed 3287.14 samples/sec   Loss 0.6589   LearningRate 0.0069   Epoch: 14   Global Step: 246110   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:18:24,694-Speed 3343.88 samples/sec   Loss 0.6053   LearningRate 0.0069   Epoch: 14   Global Step: 246120   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:18:27,753-Speed 3348.93 samples/sec   Loss 0.6341   LearningRate 0.0069   Epoch: 14   Global Step: 246130   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:18:30,831-Speed 3327.76 samples/sec   Loss 0.6288   LearningRate 0.0069   Epoch: 14   Global Step: 246140   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:18:34,028-Speed 3203.58 samples/sec   Loss 0.6620   LearningRate 0.0069   Epoch: 14   Global Step: 246150   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:18:37,125-Speed 3306.39 samples/sec   Loss 0.6816   LearningRate 0.0069   Epoch: 14   Global Step: 246160   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:18:40,236-Speed 3292.91 samples/sec   Loss 0.6196   LearningRate 0.0069   Epoch: 14   Global Step: 246170   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:18:43,382-Speed 3255.37 samples/sec   Loss 0.6716   LearningRate 0.0069   Epoch: 14   Global Step: 246180   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:18:46,581-Speed 3201.74 samples/sec   Loss 0.6457   LearningRate 0.0069   Epoch: 14   Global Step: 246190   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-12 01:18:49,681-Speed 3304.88 samples/sec   Loss 0.6299   LearningRate 0.0069   Epoch: 14   Global Step: 246200   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-12 01:18:52,780-Speed 3304.67 samples/sec   Loss 0.6054   LearningRate 0.0069   Epoch: 14   Global Step: 246210   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-12 01:18:55,844-Speed 3342.46 samples/sec   Loss 0.6310   LearningRate 0.0069   Epoch: 14   Global Step: 246220   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-12 01:18:58,913-Speed 3337.68 samples/sec   Loss 0.6789   LearningRate 0.0069   Epoch: 14   Global Step: 246230   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-12 01:19:01,993-Speed 3325.03 samples/sec   Loss 0.6262   LearningRate 0.0069   Epoch: 14   Global Step: 246240   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-12 01:19:05,128-Speed 3267.73 samples/sec   Loss 0.6412   LearningRate 0.0069   Epoch: 14   Global Step: 246250   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-12 01:19:08,240-Speed 3290.97 samples/sec   Loss 0.6390   LearningRate 0.0069   Epoch: 14   Global Step: 246260   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-12 01:19:11,450-Speed 3191.08 samples/sec   Loss 0.6613   LearningRate 0.0069   Epoch: 14   Global Step: 246270   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-12 01:19:14,575-Speed 3278.19 samples/sec   Loss 0.6857   LearningRate 0.0069   Epoch: 14   Global Step: 246280   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-12 01:19:17,664-Speed 3315.72 samples/sec   Loss 0.6806   LearningRate 0.0069   Epoch: 14   Global Step: 246290   Fp16 Grad Scale: 262144   Required: 9 hours
Training: 2022-04-12 01:19:20,733-Speed 3337.06 samples/sec   Loss 0.6464   LearningRate 0.0069   Epoch: 14   Global Step: 246300   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-12 01:19:23,837-Speed 3299.09 samples/sec   Loss 0.6909   LearningRate 0.0069   Epoch: 14   Global Step: 246310   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:19:26,987-Speed 3252.10 samples/sec   Loss 0.6305   LearningRate 0.0069   Epoch: 14   Global Step: 246320   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:19:30,111-Speed 3278.50 samples/sec   Loss 0.6234   LearningRate 0.0069   Epoch: 14   Global Step: 246330   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:19:33,208-Speed 3306.84 samples/sec   Loss 0.6686   LearningRate 0.0069   Epoch: 14   Global Step: 246340   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:19:36,389-Speed 3220.06 samples/sec   Loss 0.6251   LearningRate 0.0069   Epoch: 14   Global Step: 246350   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:19:39,458-Speed 3337.00 samples/sec   Loss 0.6187   LearningRate 0.0069   Epoch: 14   Global Step: 246360   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:19:42,534-Speed 3329.96 samples/sec   Loss 0.6103   LearningRate 0.0069   Epoch: 14   Global Step: 246370   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:19:45,596-Speed 3345.59 samples/sec   Loss 0.6458   LearningRate 0.0069   Epoch: 14   Global Step: 246380   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:19:48,662-Speed 3339.88 samples/sec   Loss 0.6732   LearningRate 0.0069   Epoch: 14   Global Step: 246390   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:19:51,732-Speed 3336.18 samples/sec   Loss 0.6716   LearningRate 0.0069   Epoch: 14   Global Step: 246400   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:19:54,794-Speed 3345.58 samples/sec   Loss 0.6329   LearningRate 0.0069   Epoch: 14   Global Step: 246410   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-12 01:19:57,872-Speed 3327.45 samples/sec   Loss 0.6327   LearningRate 0.0069   Epoch: 14   Global Step: 246420   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-12 01:20:01,036-Speed 3237.14 samples/sec   Loss 0.6286   LearningRate 0.0069   Epoch: 14   Global Step: 246430   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-12 01:20:04,111-Speed 3331.24 samples/sec   Loss 0.6798   LearningRate 0.0069   Epoch: 14   Global Step: 246440   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-12 01:20:07,242-Speed 3270.67 samples/sec   Loss 0.6483   LearningRate 0.0069   Epoch: 14   Global Step: 246450   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-12 01:20:10,310-Speed 3338.65 samples/sec   Loss 0.6499   LearningRate 0.0068   Epoch: 14   Global Step: 246460   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:20:13,392-Speed 3323.80 samples/sec   Loss 0.6363   LearningRate 0.0068   Epoch: 14   Global Step: 246470   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:20:16,556-Speed 3236.53 samples/sec   Loss 0.6667   LearningRate 0.0068   Epoch: 14   Global Step: 246480   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:20:19,660-Speed 3299.93 samples/sec   Loss 0.6201   LearningRate 0.0068   Epoch: 14   Global Step: 246490   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:20:22,794-Speed 3267.46 samples/sec   Loss 0.6095   LearningRate 0.0068   Epoch: 14   Global Step: 246500   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:20:25,858-Speed 3343.50 samples/sec   Loss 0.6264   LearningRate 0.0068   Epoch: 14   Global Step: 246510   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:20:28,949-Speed 3313.28 samples/sec   Loss 0.6244   LearningRate 0.0068   Epoch: 14   Global Step: 246520   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:20:32,034-Speed 3320.31 samples/sec   Loss 0.6628   LearningRate 0.0068   Epoch: 14   Global Step: 246530   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:20:35,111-Speed 3328.63 samples/sec   Loss 0.6285   LearningRate 0.0068   Epoch: 14   Global Step: 246540   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:20:38,183-Speed 3334.14 samples/sec   Loss 0.6461   LearningRate 0.0068   Epoch: 14   Global Step: 246550   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:20:41,251-Speed 3338.40 samples/sec   Loss 0.6327   LearningRate 0.0068   Epoch: 14   Global Step: 246560   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-12 01:20:44,302-Speed 3358.23 samples/sec   Loss 0.6446   LearningRate 0.0068   Epoch: 14   Global Step: 246570   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:20:47,391-Speed 3315.87 samples/sec   Loss 0.6192   LearningRate 0.0068   Epoch: 14   Global Step: 246580   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:20:50,457-Speed 3341.02 samples/sec   Loss 0.6074   LearningRate 0.0068   Epoch: 14   Global Step: 246590   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:20:53,533-Speed 3329.79 samples/sec   Loss 0.6830   LearningRate 0.0068   Epoch: 14   Global Step: 246600   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:20:56,657-Speed 3278.36 samples/sec   Loss 0.6524   LearningRate 0.0068   Epoch: 14   Global Step: 246610   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:20:59,771-Speed 3289.42 samples/sec   Loss 0.6696   LearningRate 0.0068   Epoch: 14   Global Step: 246620   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:21:02,846-Speed 3331.10 samples/sec   Loss 0.6132   LearningRate 0.0068   Epoch: 14   Global Step: 246630   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-12 01:21:05,917-Speed 3334.90 samples/sec   Loss 0.6262   LearningRate 0.0068   Epoch: 14   Global Step: 246640   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-12 01:21:09,029-Speed 3291.23 samples/sec   Loss 0.6306   LearningRate 0.0068   Epoch: 14   Global Step: 246650   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-12 01:21:12,107-Speed 3327.63 samples/sec   Loss 0.6298   LearningRate 0.0068   Epoch: 14   Global Step: 246660   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-12 01:21:15,209-Speed 3301.62 samples/sec   Loss 0.6312   LearningRate 0.0068   Epoch: 14   Global Step: 246670   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-12 01:21:18,308-Speed 3305.08 samples/sec   Loss 0.6537   LearningRate 0.0068   Epoch: 14   Global Step: 246680   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-12 01:21:21,372-Speed 3343.05 samples/sec   Loss 0.6173   LearningRate 0.0068   Epoch: 14   Global Step: 246690   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-12 01:21:24,477-Speed 3298.05 samples/sec   Loss 0.6331   LearningRate 0.0068   Epoch: 14   Global Step: 246700   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-12 01:21:27,579-Speed 3302.72 samples/sec   Loss 0.6487   LearningRate 0.0068   Epoch: 14   Global Step: 246710   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-12 01:21:30,682-Speed 3300.19 samples/sec   Loss 0.6712   LearningRate 0.0068   Epoch: 14   Global Step: 246720   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-12 01:21:33,832-Speed 3252.27 samples/sec   Loss 0.6323   LearningRate 0.0068   Epoch: 14   Global Step: 246730   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:21:36,976-Speed 3257.25 samples/sec   Loss 0.6208   LearningRate 0.0068   Epoch: 14   Global Step: 246740   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:21:40,099-Speed 3279.01 samples/sec   Loss 0.6591   LearningRate 0.0068   Epoch: 14   Global Step: 246750   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:21:43,182-Speed 3322.29 samples/sec   Loss 0.6702   LearningRate 0.0068   Epoch: 14   Global Step: 246760   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:21:46,276-Speed 3310.50 samples/sec   Loss 0.6307   LearningRate 0.0068   Epoch: 14   Global Step: 246770   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:21:49,375-Speed 3305.47 samples/sec   Loss 0.6738   LearningRate 0.0068   Epoch: 14   Global Step: 246780   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:21:52,456-Speed 3324.37 samples/sec   Loss 0.6459   LearningRate 0.0068   Epoch: 14   Global Step: 246790   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:21:55,548-Speed 3312.30 samples/sec   Loss 0.6414   LearningRate 0.0068   Epoch: 14   Global Step: 246800   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:21:58,626-Speed 3327.37 samples/sec   Loss 0.6806   LearningRate 0.0068   Epoch: 14   Global Step: 246810   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:22:01,746-Speed 3283.78 samples/sec   Loss 0.6544   LearningRate 0.0068   Epoch: 14   Global Step: 246820   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:22:04,824-Speed 3326.55 samples/sec   Loss 0.6691   LearningRate 0.0068   Epoch: 14   Global Step: 246830   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-12 01:22:07,888-Speed 3343.34 samples/sec   Loss 0.6375   LearningRate 0.0068   Epoch: 14   Global Step: 246840   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-12 01:22:11,043-Speed 3246.17 samples/sec   Loss 0.6469   LearningRate 0.0068   Epoch: 14   Global Step: 246850   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-12 01:22:14,107-Speed 3343.28 samples/sec   Loss 0.6712   LearningRate 0.0068   Epoch: 14   Global Step: 246860   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:22:17,181-Speed 3331.00 samples/sec   Loss 0.6610   LearningRate 0.0068   Epoch: 14   Global Step: 246870   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:22:20,257-Speed 3330.56 samples/sec   Loss 0.6367   LearningRate 0.0068   Epoch: 14   Global Step: 246880   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:22:23,412-Speed 3246.91 samples/sec   Loss 0.6451   LearningRate 0.0068   Epoch: 14   Global Step: 246890   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:22:26,530-Speed 3284.20 samples/sec   Loss 0.6097   LearningRate 0.0068   Epoch: 14   Global Step: 246900   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:22:29,614-Speed 3321.30 samples/sec   Loss 0.6818   LearningRate 0.0068   Epoch: 14   Global Step: 246910   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:22:32,697-Speed 3322.24 samples/sec   Loss 0.6409   LearningRate 0.0068   Epoch: 14   Global Step: 246920   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:22:35,838-Speed 3260.83 samples/sec   Loss 0.6514   LearningRate 0.0068   Epoch: 14   Global Step: 246930   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:22:38,990-Speed 3249.37 samples/sec   Loss 0.6435   LearningRate 0.0068   Epoch: 14   Global Step: 246940   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:22:42,196-Speed 3194.00 samples/sec   Loss 0.6778   LearningRate 0.0068   Epoch: 14   Global Step: 246950   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:22:45,309-Speed 3291.16 samples/sec   Loss 0.6707   LearningRate 0.0068   Epoch: 14   Global Step: 246960   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-12 01:22:48,401-Speed 3312.46 samples/sec   Loss 0.6653   LearningRate 0.0068   Epoch: 14   Global Step: 246970   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-12 01:22:51,481-Speed 3325.07 samples/sec   Loss 0.6718   LearningRate 0.0068   Epoch: 14   Global Step: 246980   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-12 01:22:54,564-Speed 3322.27 samples/sec   Loss 0.6457   LearningRate 0.0068   Epoch: 14   Global Step: 246990   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-12 01:22:57,708-Speed 3258.29 samples/sec   Loss 0.6233   LearningRate 0.0068   Epoch: 14   Global Step: 247000   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:23:00,803-Speed 3309.42 samples/sec   Loss 0.6306   LearningRate 0.0068   Epoch: 14   Global Step: 247010   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:23:03,881-Speed 3327.70 samples/sec   Loss 0.6671   LearningRate 0.0068   Epoch: 14   Global Step: 247020   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:23:07,025-Speed 3256.63 samples/sec   Loss 0.6627   LearningRate 0.0068   Epoch: 14   Global Step: 247030   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:23:10,094-Speed 3338.09 samples/sec   Loss 0.6719   LearningRate 0.0068   Epoch: 14   Global Step: 247040   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:23:13,170-Speed 3329.33 samples/sec   Loss 0.6196   LearningRate 0.0068   Epoch: 14   Global Step: 247050   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:23:16,261-Speed 3314.02 samples/sec   Loss 0.6659   LearningRate 0.0068   Epoch: 14   Global Step: 247060   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:23:19,338-Speed 3329.22 samples/sec   Loss 0.6454   LearningRate 0.0068   Epoch: 14   Global Step: 247070   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:23:22,410-Speed 3333.30 samples/sec   Loss 0.6129   LearningRate 0.0068   Epoch: 14   Global Step: 247080   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:23:25,546-Speed 3267.66 samples/sec   Loss 0.6428   LearningRate 0.0068   Epoch: 14   Global Step: 247090   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:23:28,632-Speed 3318.82 samples/sec   Loss 0.6518   LearningRate 0.0067   Epoch: 14   Global Step: 247100   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-12 01:23:31,728-Speed 3308.03 samples/sec   Loss 0.6424   LearningRate 0.0067   Epoch: 14   Global Step: 247110   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:23:34,824-Speed 3308.38 samples/sec   Loss 0.6592   LearningRate 0.0067   Epoch: 14   Global Step: 247120   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:23:37,945-Speed 3281.17 samples/sec   Loss 0.6025   LearningRate 0.0067   Epoch: 14   Global Step: 247130   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:23:41,108-Speed 3238.71 samples/sec   Loss 0.6449   LearningRate 0.0067   Epoch: 14   Global Step: 247140   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:23:44,174-Speed 3340.92 samples/sec   Loss 0.6293   LearningRate 0.0067   Epoch: 14   Global Step: 247150   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:23:47,364-Speed 3210.40 samples/sec   Loss 0.6932   LearningRate 0.0067   Epoch: 14   Global Step: 247160   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:23:50,504-Speed 3261.96 samples/sec   Loss 0.6757   LearningRate 0.0067   Epoch: 14   Global Step: 247170   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:23:53,699-Speed 3205.65 samples/sec   Loss 0.6461   LearningRate 0.0067   Epoch: 14   Global Step: 247180   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:23:56,782-Speed 3321.89 samples/sec   Loss 0.6517   LearningRate 0.0067   Epoch: 14   Global Step: 247190   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:23:59,872-Speed 3314.26 samples/sec   Loss 0.6427   LearningRate 0.0067   Epoch: 14   Global Step: 247200   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:24:02,994-Speed 3281.45 samples/sec   Loss 0.6870   LearningRate 0.0067   Epoch: 14   Global Step: 247210   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-12 01:24:06,129-Speed 3266.86 samples/sec   Loss 0.6434   LearningRate 0.0067   Epoch: 14   Global Step: 247220   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-12 01:24:09,341-Speed 3188.98 samples/sec   Loss 0.6533   LearningRate 0.0067   Epoch: 14   Global Step: 247230   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-12 01:24:12,419-Speed 3327.94 samples/sec   Loss 0.6341   LearningRate 0.0067   Epoch: 14   Global Step: 247240   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-12 01:24:15,500-Speed 3324.57 samples/sec   Loss 0.6204   LearningRate 0.0067   Epoch: 14   Global Step: 247250   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-12 01:24:18,571-Speed 3334.07 samples/sec   Loss 0.6490   LearningRate 0.0067   Epoch: 14   Global Step: 247260   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-12 01:24:21,639-Speed 3338.51 samples/sec   Loss 0.6253   LearningRate 0.0067   Epoch: 14   Global Step: 247270   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-12 01:24:24,707-Speed 3338.41 samples/sec   Loss 0.6644   LearningRate 0.0067   Epoch: 14   Global Step: 247280   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-12 01:24:27,780-Speed 3333.51 samples/sec   Loss 0.6466   LearningRate 0.0067   Epoch: 14   Global Step: 247290   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-12 01:24:30,873-Speed 3311.12 samples/sec   Loss 0.6430   LearningRate 0.0067   Epoch: 14   Global Step: 247300   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-12 01:24:33,931-Speed 3349.64 samples/sec   Loss 0.6627   LearningRate 0.0067   Epoch: 14   Global Step: 247310   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:24:37,049-Speed 3284.79 samples/sec   Loss 0.6516   LearningRate 0.0067   Epoch: 14   Global Step: 247320   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:24:40,117-Speed 3338.87 samples/sec   Loss 0.6519   LearningRate 0.0067   Epoch: 14   Global Step: 247330   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:24:43,323-Speed 3194.99 samples/sec   Loss 0.6295   LearningRate 0.0067   Epoch: 14   Global Step: 247340   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-12 01:24:46,445-Speed 3280.64 samples/sec   Loss 0.6678   LearningRate 0.0067   Epoch: 14   Global Step: 247350   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-12 01:24:49,511-Speed 3339.81 samples/sec   Loss 0.6930   LearningRate 0.0067   Epoch: 14   Global Step: 247360   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-12 01:24:52,591-Speed 3326.19 samples/sec   Loss 0.6209   LearningRate 0.0067   Epoch: 14   Global Step: 247370   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-12 01:24:55,688-Speed 3307.39 samples/sec   Loss 0.6515   LearningRate 0.0067   Epoch: 14   Global Step: 247380   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-12 01:24:58,780-Speed 3312.48 samples/sec   Loss 0.6363   LearningRate 0.0067   Epoch: 14   Global Step: 247390   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-12 01:25:01,846-Speed 3340.03 samples/sec   Loss 0.6347   LearningRate 0.0067   Epoch: 14   Global Step: 247400   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-12 01:25:04,971-Speed 3277.76 samples/sec   Loss 0.6526   LearningRate 0.0067   Epoch: 14   Global Step: 247410   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-12 01:25:08,038-Speed 3339.86 samples/sec   Loss 0.6526   LearningRate 0.0067   Epoch: 14   Global Step: 247420   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-12 01:25:11,117-Speed 3326.53 samples/sec   Loss 0.6220   LearningRate 0.0067   Epoch: 14   Global Step: 247430   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-12 01:25:14,209-Speed 3312.29 samples/sec   Loss 0.6380   LearningRate 0.0067   Epoch: 14   Global Step: 247440   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:25:17,290-Speed 3324.41 samples/sec   Loss 0.6499   LearningRate 0.0067   Epoch: 14   Global Step: 247450   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:25:20,454-Speed 3236.40 samples/sec   Loss 0.6337   LearningRate 0.0067   Epoch: 14   Global Step: 247460   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:25:23,527-Speed 3333.16 samples/sec   Loss 0.6631   LearningRate 0.0067   Epoch: 14   Global Step: 247470   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:25:26,608-Speed 3325.08 samples/sec   Loss 0.6673   LearningRate 0.0067   Epoch: 14   Global Step: 247480   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:25:29,699-Speed 3313.07 samples/sec   Loss 0.6600   LearningRate 0.0067   Epoch: 14   Global Step: 247490   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:25:32,808-Speed 3294.67 samples/sec   Loss 0.6528   LearningRate 0.0067   Epoch: 14   Global Step: 247500   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:25:35,882-Speed 3331.93 samples/sec   Loss 0.6552   LearningRate 0.0067   Epoch: 14   Global Step: 247510   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:25:38,996-Speed 3288.48 samples/sec   Loss 0.6313   LearningRate 0.0067   Epoch: 14   Global Step: 247520   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:25:42,131-Speed 3267.18 samples/sec   Loss 0.6157   LearningRate 0.0067   Epoch: 14   Global Step: 247530   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:25:45,212-Speed 3325.16 samples/sec   Loss 0.6683   LearningRate 0.0067   Epoch: 14   Global Step: 247540   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-12 01:25:48,298-Speed 3318.73 samples/sec   Loss 0.6531   LearningRate 0.0067   Epoch: 14   Global Step: 247550   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-12 01:25:51,365-Speed 3339.08 samples/sec   Loss 0.6067   LearningRate 0.0067   Epoch: 14   Global Step: 247560   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:25:54,509-Speed 3257.97 samples/sec   Loss 0.6460   LearningRate 0.0067   Epoch: 14   Global Step: 247570   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:25:57,611-Speed 3302.18 samples/sec   Loss 0.6782   LearningRate 0.0067   Epoch: 14   Global Step: 247580   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:26:00,700-Speed 3315.47 samples/sec   Loss 0.6565   LearningRate 0.0067   Epoch: 14   Global Step: 247590   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:26:03,787-Speed 3317.89 samples/sec   Loss 0.6319   LearningRate 0.0067   Epoch: 14   Global Step: 247600   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:26:06,873-Speed 3318.69 samples/sec   Loss 0.6401   LearningRate 0.0067   Epoch: 14   Global Step: 247610   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:26:09,950-Speed 3329.05 samples/sec   Loss 0.6577   LearningRate 0.0067   Epoch: 14   Global Step: 247620   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:26:13,025-Speed 3330.85 samples/sec   Loss 0.6409   LearningRate 0.0067   Epoch: 14   Global Step: 247630   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:26:16,128-Speed 3300.37 samples/sec   Loss 0.6657   LearningRate 0.0067   Epoch: 14   Global Step: 247640   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:26:19,208-Speed 3325.60 samples/sec   Loss 0.6842   LearningRate 0.0067   Epoch: 14   Global Step: 247650   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:26:22,345-Speed 3265.70 samples/sec   Loss 0.6588   LearningRate 0.0067   Epoch: 14   Global Step: 247660   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-12 01:26:25,442-Speed 3306.74 samples/sec   Loss 0.6478   LearningRate 0.0067   Epoch: 14   Global Step: 247670   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-12 01:26:28,550-Speed 3295.47 samples/sec   Loss 0.6756   LearningRate 0.0067   Epoch: 14   Global Step: 247680   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-12 01:26:31,621-Speed 3335.56 samples/sec   Loss 0.6954   LearningRate 0.0067   Epoch: 14   Global Step: 247690   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:26:34,699-Speed 3327.64 samples/sec   Loss 0.6619   LearningRate 0.0067   Epoch: 14   Global Step: 247700   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:26:37,855-Speed 3245.19 samples/sec   Loss 0.6303   LearningRate 0.0067   Epoch: 14   Global Step: 247710   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:26:40,920-Speed 3340.72 samples/sec   Loss 0.6397   LearningRate 0.0067   Epoch: 14   Global Step: 247720   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:26:43,991-Speed 3336.02 samples/sec   Loss 0.6514   LearningRate 0.0067   Epoch: 14   Global Step: 247730   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:26:47,093-Speed 3301.12 samples/sec   Loss 0.6317   LearningRate 0.0066   Epoch: 14   Global Step: 247740   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:26:50,165-Speed 3334.45 samples/sec   Loss 0.6286   LearningRate 0.0066   Epoch: 14   Global Step: 247750   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:26:53,254-Speed 3316.44 samples/sec   Loss 0.6411   LearningRate 0.0066   Epoch: 14   Global Step: 247760   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:26:56,319-Speed 3340.59 samples/sec   Loss 0.6294   LearningRate 0.0066   Epoch: 14   Global Step: 247770   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:26:59,413-Speed 3310.87 samples/sec   Loss 0.6520   LearningRate 0.0066   Epoch: 14   Global Step: 247780   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:27:02,487-Speed 3332.37 samples/sec   Loss 0.6823   LearningRate 0.0066   Epoch: 14   Global Step: 247790   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-12 01:27:05,557-Speed 3335.80 samples/sec   Loss 0.6625   LearningRate 0.0066   Epoch: 14   Global Step: 247800   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-12 01:27:08,622-Speed 3341.72 samples/sec   Loss 0.6388   LearningRate 0.0066   Epoch: 14   Global Step: 247810   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:27:11,717-Speed 3309.34 samples/sec   Loss 0.6850   LearningRate 0.0066   Epoch: 14   Global Step: 247820   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:27:14,824-Speed 3296.81 samples/sec   Loss 0.6610   LearningRate 0.0066   Epoch: 14   Global Step: 247830   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:27:17,928-Speed 3299.76 samples/sec   Loss 0.6496   LearningRate 0.0066   Epoch: 14   Global Step: 247840   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:27:21,027-Speed 3304.42 samples/sec   Loss 0.6541   LearningRate 0.0066   Epoch: 14   Global Step: 247850   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:27:24,097-Speed 3336.76 samples/sec   Loss 0.6749   LearningRate 0.0066   Epoch: 14   Global Step: 247860   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:27:27,179-Speed 3323.12 samples/sec   Loss 0.6051   LearningRate 0.0066   Epoch: 14   Global Step: 247870   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:27:30,246-Speed 3338.88 samples/sec   Loss 0.6410   LearningRate 0.0066   Epoch: 14   Global Step: 247880   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:27:33,313-Speed 3339.75 samples/sec   Loss 0.6302   LearningRate 0.0066   Epoch: 14   Global Step: 247890   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:27:36,380-Speed 3340.05 samples/sec   Loss 0.6216   LearningRate 0.0066   Epoch: 14   Global Step: 247900   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:27:39,468-Speed 3316.62 samples/sec   Loss 0.6376   LearningRate 0.0066   Epoch: 14   Global Step: 247910   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-12 01:27:42,558-Speed 3314.71 samples/sec   Loss 0.6487   LearningRate 0.0066   Epoch: 14   Global Step: 247920   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-12 01:27:45,626-Speed 3338.37 samples/sec   Loss 0.6750   LearningRate 0.0066   Epoch: 14   Global Step: 247930   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-12 01:27:48,681-Speed 3353.14 samples/sec   Loss 0.6491   LearningRate 0.0066   Epoch: 14   Global Step: 247940   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:27:51,765-Speed 3321.02 samples/sec   Loss 0.6316   LearningRate 0.0066   Epoch: 14   Global Step: 247950   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:27:54,993-Speed 3172.73 samples/sec   Loss 0.6504   LearningRate 0.0066   Epoch: 14   Global Step: 247960   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:27:58,055-Speed 3344.95 samples/sec   Loss 0.6625   LearningRate 0.0066   Epoch: 14   Global Step: 247970   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:28:01,132-Speed 3328.89 samples/sec   Loss 0.6353   LearningRate 0.0066   Epoch: 14   Global Step: 247980   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:28:04,204-Speed 3333.99 samples/sec   Loss 0.6186   LearningRate 0.0066   Epoch: 14   Global Step: 247990   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-12 01:28:07,282-Speed 3327.63 samples/sec   Loss 0.6498   LearningRate 0.0066   Epoch: 14   Global Step: 248000   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-12 01:28:50,869-[lfw][248000]XNorm: 21.803173
Training: 2022-04-12 01:28:50,869-[lfw][248000]Accuracy-Flip: 0.99800+-0.00267
Training: 2022-04-12 01:28:50,870-[lfw][248000]Accuracy-Highest: 0.99817
Training: 2022-04-12 01:29:41,510-[cfp_fp][248000]XNorm: 23.176701
Training: 2022-04-12 01:29:41,510-[cfp_fp][248000]Accuracy-Flip: 0.99114+-0.00518
Training: 2022-04-12 01:29:41,511-[cfp_fp][248000]Accuracy-Highest: 0.99186
Training: 2022-04-12 01:30:25,064-[agedb_30][248000]XNorm: 23.649254
Training: 2022-04-12 01:30:25,064-[agedb_30][248000]Accuracy-Flip: 0.98450+-0.00723
Training: 2022-04-12 01:30:25,065-[agedb_30][248000]Accuracy-Highest: 0.98567
Training: 2022-04-12 01:30:28,183-Speed 72.68 samples/sec   Loss 0.6443   LearningRate 0.0066   Epoch: 14   Global Step: 248010   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-12 01:30:31,239-Speed 3351.69 samples/sec   Loss 0.6219   LearningRate 0.0066   Epoch: 14   Global Step: 248020   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-12 01:30:34,305-Speed 3340.69 samples/sec   Loss 0.6367   LearningRate 0.0066   Epoch: 14   Global Step: 248030   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-12 01:30:37,444-Speed 3263.28 samples/sec   Loss 0.6406   LearningRate 0.0066   Epoch: 14   Global Step: 248040   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-12 01:30:40,528-Speed 3320.21 samples/sec   Loss 0.6600   LearningRate 0.0066   Epoch: 14   Global Step: 248050   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-12 01:30:43,621-Speed 3311.95 samples/sec   Loss 0.6577   LearningRate 0.0066   Epoch: 14   Global Step: 248060   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-12 01:30:46,699-Speed 3327.27 samples/sec   Loss 0.6610   LearningRate 0.0066   Epoch: 14   Global Step: 248070   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-12 01:30:49,906-Speed 3194.58 samples/sec   Loss 0.6401   LearningRate 0.0066   Epoch: 14   Global Step: 248080   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-12 01:30:52,975-Speed 3337.18 samples/sec   Loss 0.6207   LearningRate 0.0066   Epoch: 14   Global Step: 248090   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:30:56,075-Speed 3303.33 samples/sec   Loss 0.6573   LearningRate 0.0066   Epoch: 14   Global Step: 248100   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:30:59,261-Speed 3215.52 samples/sec   Loss 0.6607   LearningRate 0.0066   Epoch: 14   Global Step: 248110   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:31:02,395-Speed 3267.24 samples/sec   Loss 0.6437   LearningRate 0.0066   Epoch: 14   Global Step: 248120   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:31:05,467-Speed 3334.52 samples/sec   Loss 0.6321   LearningRate 0.0066   Epoch: 14   Global Step: 248130   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:31:08,637-Speed 3230.88 samples/sec   Loss 0.6627   LearningRate 0.0066   Epoch: 14   Global Step: 248140   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:31:11,719-Speed 3323.33 samples/sec   Loss 0.6295   LearningRate 0.0066   Epoch: 14   Global Step: 248150   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:31:14,781-Speed 3344.80 samples/sec   Loss 0.6591   LearningRate 0.0066   Epoch: 14   Global Step: 248160   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:31:17,847-Speed 3341.44 samples/sec   Loss 0.6511   LearningRate 0.0066   Epoch: 14   Global Step: 248170   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:31:20,911-Speed 3342.01 samples/sec   Loss 0.6460   LearningRate 0.0066   Epoch: 14   Global Step: 248180   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:31:23,992-Speed 3325.22 samples/sec   Loss 0.6251   LearningRate 0.0066   Epoch: 14   Global Step: 248190   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-12 01:31:27,081-Speed 3314.97 samples/sec   Loss 0.6591   LearningRate 0.0066   Epoch: 14   Global Step: 248200   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-12 01:31:30,245-Speed 3237.60 samples/sec   Loss 0.6281   LearningRate 0.0066   Epoch: 14   Global Step: 248210   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-12 01:31:33,330-Speed 3319.17 samples/sec   Loss 0.6377   LearningRate 0.0066   Epoch: 14   Global Step: 248220   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-12 01:31:36,414-Speed 3336.42 samples/sec   Loss 0.6310   LearningRate 0.0066   Epoch: 14   Global Step: 248230   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-12 01:31:39,466-Speed 3355.64 samples/sec   Loss 0.6193   LearningRate 0.0066   Epoch: 14   Global Step: 248240   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:31:42,529-Speed 3343.92 samples/sec   Loss 0.6459   LearningRate 0.0066   Epoch: 14   Global Step: 248250   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:31:45,629-Speed 3304.27 samples/sec   Loss 0.6808   LearningRate 0.0066   Epoch: 14   Global Step: 248260   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:31:48,707-Speed 3327.23 samples/sec   Loss 0.6652   LearningRate 0.0066   Epoch: 14   Global Step: 248270   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:31:51,795-Speed 3316.56 samples/sec   Loss 0.6546   LearningRate 0.0066   Epoch: 14   Global Step: 248280   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:31:54,866-Speed 3335.88 samples/sec   Loss 0.6208   LearningRate 0.0066   Epoch: 14   Global Step: 248290   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:31:57,952-Speed 3318.51 samples/sec   Loss 0.6524   LearningRate 0.0066   Epoch: 14   Global Step: 248300   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:32:01,044-Speed 3312.39 samples/sec   Loss 0.6615   LearningRate 0.0066   Epoch: 14   Global Step: 248310   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:32:04,110-Speed 3341.16 samples/sec   Loss 0.6379   LearningRate 0.0066   Epoch: 14   Global Step: 248320   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:32:07,179-Speed 3338.16 samples/sec   Loss 0.6432   LearningRate 0.0066   Epoch: 14   Global Step: 248330   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:32:10,245-Speed 3339.46 samples/sec   Loss 0.6527   LearningRate 0.0066   Epoch: 14   Global Step: 248340   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-12 01:32:13,353-Speed 3295.42 samples/sec   Loss 0.6432   LearningRate 0.0066   Epoch: 14   Global Step: 248350   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-12 01:32:16,438-Speed 3320.28 samples/sec   Loss 0.6091   LearningRate 0.0066   Epoch: 14   Global Step: 248360   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-12 01:32:19,505-Speed 3339.40 samples/sec   Loss 0.6489   LearningRate 0.0066   Epoch: 14   Global Step: 248370   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-12 01:32:22,578-Speed 3333.63 samples/sec   Loss 0.6381   LearningRate 0.0066   Epoch: 14   Global Step: 248380   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-12 01:32:25,660-Speed 3322.61 samples/sec   Loss 0.6207   LearningRate 0.0065   Epoch: 14   Global Step: 248390   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-12 01:32:28,730-Speed 3336.90 samples/sec   Loss 0.6275   LearningRate 0.0065   Epoch: 14   Global Step: 248400   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-12 01:32:31,802-Speed 3334.44 samples/sec   Loss 0.6470   LearningRate 0.0065   Epoch: 14   Global Step: 248410   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-12 01:32:34,863-Speed 3345.28 samples/sec   Loss 0.6528   LearningRate 0.0065   Epoch: 14   Global Step: 248420   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:32:37,933-Speed 3337.20 samples/sec   Loss 0.6154   LearningRate 0.0065   Epoch: 14   Global Step: 248430   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-12 01:32:41,035-Speed 3301.55 samples/sec   Loss 0.6213   LearningRate 0.0065   Epoch: 14   Global Step: 248440   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-12 01:32:44,132-Speed 3306.46 samples/sec   Loss 0.6506   LearningRate 0.0065   Epoch: 14   Global Step: 248450   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-12 01:32:47,220-Speed 3316.90 samples/sec   Loss 0.6770   LearningRate 0.0065   Epoch: 14   Global Step: 248460   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-12 01:32:50,310-Speed 3315.05 samples/sec   Loss 0.6612   LearningRate 0.0065   Epoch: 14   Global Step: 248470   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-12 01:32:53,384-Speed 3331.75 samples/sec   Loss 0.6817   LearningRate 0.0065   Epoch: 14   Global Step: 248480   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-12 01:32:56,580-Speed 3204.18 samples/sec   Loss 0.6446   LearningRate 0.0065   Epoch: 14   Global Step: 248490   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-12 01:32:59,695-Speed 3289.06 samples/sec   Loss 0.6651   LearningRate 0.0065   Epoch: 14   Global Step: 248500   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-12 01:33:02,760-Speed 3341.79 samples/sec   Loss 0.6386   LearningRate 0.0065   Epoch: 14   Global Step: 248510   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-12 01:33:05,830-Speed 3335.61 samples/sec   Loss 0.6625   LearningRate 0.0065   Epoch: 14   Global Step: 248520   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-12 01:33:08,900-Speed 3336.81 samples/sec   Loss 0.6398   LearningRate 0.0065   Epoch: 14   Global Step: 248530   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:33:12,075-Speed 3225.33 samples/sec   Loss 0.6384   LearningRate 0.0065   Epoch: 14   Global Step: 248540   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:33:15,200-Speed 3277.43 samples/sec   Loss 0.6521   LearningRate 0.0065   Epoch: 14   Global Step: 248550   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:33:18,294-Speed 3310.97 samples/sec   Loss 0.7050   LearningRate 0.0065   Epoch: 14   Global Step: 248560   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:33:21,384-Speed 3314.34 samples/sec   Loss 0.6343   LearningRate 0.0065   Epoch: 14   Global Step: 248570   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:33:24,485-Speed 3302.86 samples/sec   Loss 0.6529   LearningRate 0.0065   Epoch: 14   Global Step: 248580   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:33:27,549-Speed 3343.74 samples/sec   Loss 0.6288   LearningRate 0.0065   Epoch: 14   Global Step: 248590   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:33:30,644-Speed 3308.81 samples/sec   Loss 0.6697   LearningRate 0.0065   Epoch: 14   Global Step: 248600   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:33:33,719-Speed 3330.29 samples/sec   Loss 0.6360   LearningRate 0.0065   Epoch: 14   Global Step: 248610   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:33:36,807-Speed 3317.32 samples/sec   Loss 0.6615   LearningRate 0.0065   Epoch: 14   Global Step: 248620   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:33:39,880-Speed 3333.34 samples/sec   Loss 0.6492   LearningRate 0.0065   Epoch: 14   Global Step: 248630   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-12 01:33:42,949-Speed 3336.39 samples/sec   Loss 0.6410   LearningRate 0.0065   Epoch: 14   Global Step: 248640   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-12 01:33:46,033-Speed 3321.40 samples/sec   Loss 0.6693   LearningRate 0.0065   Epoch: 14   Global Step: 248650   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-12 01:33:49,098-Speed 3341.51 samples/sec   Loss 0.6360   LearningRate 0.0065   Epoch: 14   Global Step: 248660   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-12 01:33:52,182-Speed 3321.22 samples/sec   Loss 0.6662   LearningRate 0.0065   Epoch: 14   Global Step: 248670   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:33:55,269-Speed 3317.97 samples/sec   Loss 0.6529   LearningRate 0.0065   Epoch: 14   Global Step: 248680   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:33:58,381-Speed 3291.63 samples/sec   Loss 0.6734   LearningRate 0.0065   Epoch: 14   Global Step: 248690   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:34:01,484-Speed 3300.29 samples/sec   Loss 0.6594   LearningRate 0.0065   Epoch: 14   Global Step: 248700   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:34:04,567-Speed 3322.36 samples/sec   Loss 0.5952   LearningRate 0.0065   Epoch: 14   Global Step: 248710   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:34:07,641-Speed 3331.54 samples/sec   Loss 0.6536   LearningRate 0.0065   Epoch: 14   Global Step: 248720   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:34:10,809-Speed 3233.22 samples/sec   Loss 0.6614   LearningRate 0.0065   Epoch: 14   Global Step: 248730   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:34:13,875-Speed 3340.55 samples/sec   Loss 0.6461   LearningRate 0.0065   Epoch: 14   Global Step: 248740   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:34:16,948-Speed 3333.11 samples/sec   Loss 0.6422   LearningRate 0.0065   Epoch: 14   Global Step: 248750   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:34:20,020-Speed 3334.84 samples/sec   Loss 0.6589   LearningRate 0.0065   Epoch: 14   Global Step: 248760   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:34:23,082-Speed 3344.52 samples/sec   Loss 0.6382   LearningRate 0.0065   Epoch: 14   Global Step: 248770   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-12 01:34:26,158-Speed 3329.30 samples/sec   Loss 0.6497   LearningRate 0.0065   Epoch: 14   Global Step: 248780   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-12 01:34:29,242-Speed 3321.09 samples/sec   Loss 0.6547   LearningRate 0.0065   Epoch: 14   Global Step: 248790   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-12 01:34:32,338-Speed 3308.27 samples/sec   Loss 0.6680   LearningRate 0.0065   Epoch: 14   Global Step: 248800   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-12 01:34:35,413-Speed 3330.74 samples/sec   Loss 0.6508   LearningRate 0.0065   Epoch: 14   Global Step: 248810   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-12 01:34:38,494-Speed 3324.72 samples/sec   Loss 0.6690   LearningRate 0.0065   Epoch: 14   Global Step: 248820   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-12 01:34:41,553-Speed 3348.41 samples/sec   Loss 0.6602   LearningRate 0.0065   Epoch: 14   Global Step: 248830   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:34:44,625-Speed 3333.92 samples/sec   Loss 0.6358   LearningRate 0.0065   Epoch: 14   Global Step: 248840   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:34:47,735-Speed 3293.17 samples/sec   Loss 0.6916   LearningRate 0.0065   Epoch: 14   Global Step: 248850   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:34:50,819-Speed 3321.73 samples/sec   Loss 0.6176   LearningRate 0.0065   Epoch: 14   Global Step: 248860   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:34:53,890-Speed 3334.45 samples/sec   Loss 0.6581   LearningRate 0.0065   Epoch: 14   Global Step: 248870   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:34:56,979-Speed 3316.48 samples/sec   Loss 0.6667   LearningRate 0.0065   Epoch: 14   Global Step: 248880   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:35:00,050-Speed 3334.93 samples/sec   Loss 0.6320   LearningRate 0.0065   Epoch: 14   Global Step: 248890   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:35:03,119-Speed 3337.44 samples/sec   Loss 0.6434   LearningRate 0.0065   Epoch: 14   Global Step: 248900   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:35:06,185-Speed 3340.23 samples/sec   Loss 0.6630   LearningRate 0.0065   Epoch: 14   Global Step: 248910   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:35:09,270-Speed 3319.84 samples/sec   Loss 0.6724   LearningRate 0.0065   Epoch: 14   Global Step: 248920   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:35:12,345-Speed 3331.01 samples/sec   Loss 0.6386   LearningRate 0.0065   Epoch: 14   Global Step: 248930   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-12 01:35:15,412-Speed 3340.08 samples/sec   Loss 0.6401   LearningRate 0.0065   Epoch: 14   Global Step: 248940   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-12 01:35:18,503-Speed 3313.48 samples/sec   Loss 0.6345   LearningRate 0.0065   Epoch: 14   Global Step: 248950   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-12 01:35:21,607-Speed 3299.84 samples/sec   Loss 0.6519   LearningRate 0.0065   Epoch: 14   Global Step: 248960   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-12 01:35:24,695-Speed 3316.24 samples/sec   Loss 0.6414   LearningRate 0.0065   Epoch: 14   Global Step: 248970   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-12 01:35:27,765-Speed 3336.61 samples/sec   Loss 0.6683   LearningRate 0.0065   Epoch: 14   Global Step: 248980   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:35:30,950-Speed 3215.88 samples/sec   Loss 0.6733   LearningRate 0.0065   Epoch: 14   Global Step: 248990   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:35:34,016-Speed 3340.61 samples/sec   Loss 0.6580   LearningRate 0.0065   Epoch: 14   Global Step: 249000   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:35:37,091-Speed 3329.89 samples/sec   Loss 0.6328   LearningRate 0.0065   Epoch: 14   Global Step: 249010   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:35:40,158-Speed 3340.28 samples/sec   Loss 0.6568   LearningRate 0.0065   Epoch: 14   Global Step: 249020   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:35:43,225-Speed 3339.27 samples/sec   Loss 0.6248   LearningRate 0.0065   Epoch: 14   Global Step: 249030   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:35:46,309-Speed 3320.92 samples/sec   Loss 0.6373   LearningRate 0.0065   Epoch: 14   Global Step: 249040   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:35:49,404-Speed 3310.34 samples/sec   Loss 0.6493   LearningRate 0.0064   Epoch: 14   Global Step: 249050   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:35:52,482-Speed 3326.82 samples/sec   Loss 0.6509   LearningRate 0.0064   Epoch: 14   Global Step: 249060   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:35:55,555-Speed 3333.35 samples/sec   Loss 0.6558   LearningRate 0.0064   Epoch: 14   Global Step: 249070   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:35:58,664-Speed 3294.64 samples/sec   Loss 0.6196   LearningRate 0.0064   Epoch: 14   Global Step: 249080   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-12 01:36:01,774-Speed 3292.32 samples/sec   Loss 0.6402   LearningRate 0.0064   Epoch: 14   Global Step: 249090   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-12 01:36:04,895-Speed 3281.98 samples/sec   Loss 0.6444   LearningRate 0.0064   Epoch: 14   Global Step: 249100   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-12 01:36:07,970-Speed 3331.26 samples/sec   Loss 0.6499   LearningRate 0.0064   Epoch: 14   Global Step: 249110   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-12 01:36:11,056-Speed 3319.74 samples/sec   Loss 0.6842   LearningRate 0.0064   Epoch: 14   Global Step: 249120   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-12 01:36:14,121-Speed 3341.38 samples/sec   Loss 0.6638   LearningRate 0.0064   Epoch: 14   Global Step: 249130   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-12 01:36:17,222-Speed 3302.44 samples/sec   Loss 0.6542   LearningRate 0.0064   Epoch: 14   Global Step: 249140   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-12 01:36:20,306-Speed 3321.37 samples/sec   Loss 0.6688   LearningRate 0.0064   Epoch: 14   Global Step: 249150   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-12 01:36:23,378-Speed 3333.99 samples/sec   Loss 0.6392   LearningRate 0.0064   Epoch: 14   Global Step: 249160   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-12 01:36:26,433-Speed 3352.63 samples/sec   Loss 0.6281   LearningRate 0.0064   Epoch: 14   Global Step: 249170   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:36:29,522-Speed 3316.05 samples/sec   Loss 0.6327   LearningRate 0.0064   Epoch: 14   Global Step: 249180   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:36:32,598-Speed 3328.91 samples/sec   Loss 0.6459   LearningRate 0.0064   Epoch: 14   Global Step: 249190   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:36:35,695-Speed 3308.00 samples/sec   Loss 0.6530   LearningRate 0.0064   Epoch: 14   Global Step: 249200   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:36:38,759-Speed 3342.30 samples/sec   Loss 0.6363   LearningRate 0.0064   Epoch: 14   Global Step: 249210   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:36:41,974-Speed 3186.09 samples/sec   Loss 0.6266   LearningRate 0.0064   Epoch: 14   Global Step: 249220   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:36:45,061-Speed 3317.22 samples/sec   Loss 0.6656   LearningRate 0.0064   Epoch: 14   Global Step: 249230   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:36:48,138-Speed 3328.94 samples/sec   Loss 0.6605   LearningRate 0.0064   Epoch: 14   Global Step: 249240   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:36:51,207-Speed 3337.63 samples/sec   Loss 0.6473   LearningRate 0.0064   Epoch: 14   Global Step: 249250   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:36:54,280-Speed 3332.65 samples/sec   Loss 0.6631   LearningRate 0.0064   Epoch: 14   Global Step: 249260   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:36:57,350-Speed 3336.82 samples/sec   Loss 0.6613   LearningRate 0.0064   Epoch: 14   Global Step: 249270   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-12 01:37:00,414-Speed 3342.63 samples/sec   Loss 0.6275   LearningRate 0.0064   Epoch: 14   Global Step: 249280   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-12 01:37:03,486-Speed 3333.72 samples/sec   Loss 0.6432   LearningRate 0.0064   Epoch: 14   Global Step: 249290   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-12 01:37:06,570-Speed 3321.35 samples/sec   Loss 0.6243   LearningRate 0.0064   Epoch: 14   Global Step: 249300   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-12 01:37:09,656-Speed 3319.39 samples/sec   Loss 0.6678   LearningRate 0.0064   Epoch: 14   Global Step: 249310   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-12 01:37:12,720-Speed 3342.73 samples/sec   Loss 0.6291   LearningRate 0.0064   Epoch: 14   Global Step: 249320   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-12 01:37:15,790-Speed 3336.94 samples/sec   Loss 0.6299   LearningRate 0.0064   Epoch: 14   Global Step: 249330   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-12 01:37:18,859-Speed 3336.89 samples/sec   Loss 0.6200   LearningRate 0.0064   Epoch: 14   Global Step: 249340   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-12 01:37:21,931-Speed 3333.94 samples/sec   Loss 0.6727   LearningRate 0.0064   Epoch: 14   Global Step: 249350   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-12 01:37:25,011-Speed 3325.19 samples/sec   Loss 0.6417   LearningRate 0.0064   Epoch: 14   Global Step: 249360   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-12 01:37:28,088-Speed 3328.48 samples/sec   Loss 0.6714   LearningRate 0.0064   Epoch: 14   Global Step: 249370   Fp16 Grad Scale: 262144   Required: 9 hours
Training: 2022-04-12 01:37:31,153-Speed 3342.66 samples/sec   Loss 0.6394   LearningRate 0.0064   Epoch: 14   Global Step: 249380   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-12 01:37:34,222-Speed 3337.38 samples/sec   Loss 0.6199   LearningRate 0.0064   Epoch: 14   Global Step: 249390   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-12 01:37:37,321-Speed 3305.39 samples/sec   Loss 0.6370   LearningRate 0.0064   Epoch: 14   Global Step: 249400   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-12 01:37:40,406-Speed 3320.25 samples/sec   Loss 0.6773   LearningRate 0.0064   Epoch: 14   Global Step: 249410   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-12 01:37:43,576-Speed 3230.36 samples/sec   Loss 0.6629   LearningRate 0.0064   Epoch: 14   Global Step: 249420   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-12 01:37:46,793-Speed 3183.62 samples/sec   Loss 0.6508   LearningRate 0.0064   Epoch: 14   Global Step: 249430   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-12 01:37:49,870-Speed 3328.95 samples/sec   Loss 0.6592   LearningRate 0.0064   Epoch: 14   Global Step: 249440   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-12 01:37:52,959-Speed 3315.91 samples/sec   Loss 0.6658   LearningRate 0.0064   Epoch: 14   Global Step: 249450   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-12 01:37:56,022-Speed 3344.20 samples/sec   Loss 0.6321   LearningRate 0.0064   Epoch: 14   Global Step: 249460   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:37:59,088-Speed 3340.49 samples/sec   Loss 0.6369   LearningRate 0.0064   Epoch: 14   Global Step: 249470   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:38:02,190-Speed 3301.58 samples/sec   Loss 0.6372   LearningRate 0.0064   Epoch: 14   Global Step: 249480   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:38:05,304-Speed 3289.30 samples/sec   Loss 0.6298   LearningRate 0.0064   Epoch: 14   Global Step: 249490   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:38:08,378-Speed 3331.99 samples/sec   Loss 0.6496   LearningRate 0.0064   Epoch: 14   Global Step: 249500   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:38:11,445-Speed 3339.20 samples/sec   Loss 0.6522   LearningRate 0.0064   Epoch: 14   Global Step: 249510   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:38:14,525-Speed 3325.42 samples/sec   Loss 0.6636   LearningRate 0.0064   Epoch: 14   Global Step: 249520   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:38:17,604-Speed 3327.02 samples/sec   Loss 0.6363   LearningRate 0.0064   Epoch: 14   Global Step: 249530   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:38:20,681-Speed 3328.04 samples/sec   Loss 0.6422   LearningRate 0.0064   Epoch: 14   Global Step: 249540   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:38:23,761-Speed 3325.62 samples/sec   Loss 0.6464   LearningRate 0.0064   Epoch: 14   Global Step: 249550   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:38:26,859-Speed 3305.94 samples/sec   Loss 0.7001   LearningRate 0.0064   Epoch: 14   Global Step: 249560   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-12 01:38:29,934-Speed 3331.02 samples/sec   Loss 0.6548   LearningRate 0.0064   Epoch: 14   Global Step: 249570   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-12 01:38:33,021-Speed 3318.39 samples/sec   Loss 0.6183   LearningRate 0.0064   Epoch: 14   Global Step: 249580   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-12 01:38:36,089-Speed 3337.69 samples/sec   Loss 0.6402   LearningRate 0.0064   Epoch: 14   Global Step: 249590   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-12 01:38:39,170-Speed 3324.29 samples/sec   Loss 0.6242   LearningRate 0.0064   Epoch: 14   Global Step: 249600   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-12 01:38:42,233-Speed 3343.88 samples/sec   Loss 0.6389   LearningRate 0.0064   Epoch: 14   Global Step: 249610   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-12 01:38:45,315-Speed 3323.86 samples/sec   Loss 0.6215   LearningRate 0.0064   Epoch: 14   Global Step: 249620   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-12 01:38:48,391-Speed 3329.45 samples/sec   Loss 0.6574   LearningRate 0.0064   Epoch: 14   Global Step: 249630   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-12 01:38:51,468-Speed 3329.09 samples/sec   Loss 0.6608   LearningRate 0.0064   Epoch: 14   Global Step: 249640   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-12 01:38:54,552-Speed 3320.33 samples/sec   Loss 0.6219   LearningRate 0.0064   Epoch: 14   Global Step: 249650   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-12 01:38:57,646-Speed 3310.95 samples/sec   Loss 0.6137   LearningRate 0.0064   Epoch: 14   Global Step: 249660   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-12 01:39:00,733-Speed 3318.67 samples/sec   Loss 0.6492   LearningRate 0.0064   Epoch: 14   Global Step: 249670   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-12 01:39:03,816-Speed 3321.47 samples/sec   Loss 0.6754   LearningRate 0.0064   Epoch: 14   Global Step: 249680   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-12 01:39:06,919-Speed 3300.76 samples/sec   Loss 0.6181   LearningRate 0.0064   Epoch: 14   Global Step: 249690   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-12 01:39:10,040-Speed 3281.66 samples/sec   Loss 0.6442   LearningRate 0.0064   Epoch: 14   Global Step: 249700   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-12 01:39:13,166-Speed 3277.13 samples/sec   Loss 0.6753   LearningRate 0.0063   Epoch: 14   Global Step: 249710   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:39:16,413-Speed 3153.77 samples/sec   Loss 0.6597   LearningRate 0.0063   Epoch: 14   Global Step: 249720   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:39:19,523-Speed 3293.68 samples/sec   Loss 0.6344   LearningRate 0.0063   Epoch: 14   Global Step: 249730   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:39:22,620-Speed 3307.67 samples/sec   Loss 0.6597   LearningRate 0.0063   Epoch: 14   Global Step: 249740   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:39:25,690-Speed 3336.10 samples/sec   Loss 0.6463   LearningRate 0.0063   Epoch: 14   Global Step: 249750   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:39:28,752-Speed 3344.97 samples/sec   Loss 0.6309   LearningRate 0.0063   Epoch: 14   Global Step: 249760   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-12 01:39:31,872-Speed 3282.49 samples/sec   Loss 0.6563   LearningRate 0.0063   Epoch: 14   Global Step: 249770   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-12 01:39:34,949-Speed 3328.76 samples/sec   Loss 0.6412   LearningRate 0.0063   Epoch: 14   Global Step: 249780   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-12 01:39:38,039-Speed 3314.40 samples/sec   Loss 0.6463   LearningRate 0.0063   Epoch: 14   Global Step: 249790   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-12 01:39:41,113-Speed 3332.55 samples/sec   Loss 0.6404   LearningRate 0.0063   Epoch: 14   Global Step: 249800   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-12 01:39:44,202-Speed 3314.77 samples/sec   Loss 0.6709   LearningRate 0.0063   Epoch: 14   Global Step: 249810   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-12 01:39:47,273-Speed 3336.41 samples/sec   Loss 0.6302   LearningRate 0.0063   Epoch: 14   Global Step: 249820   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-12 01:39:50,407-Speed 3267.75 samples/sec   Loss 0.6478   LearningRate 0.0063   Epoch: 14   Global Step: 249830   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-12 01:39:53,650-Speed 3157.99 samples/sec   Loss 0.6835   LearningRate 0.0063   Epoch: 14   Global Step: 249840   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-12 01:39:56,727-Speed 3328.99 samples/sec   Loss 0.6382   LearningRate 0.0063   Epoch: 14   Global Step: 249850   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-12 01:39:59,814-Speed 3317.79 samples/sec   Loss 0.6278   LearningRate 0.0063   Epoch: 14   Global Step: 249860   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:40:02,945-Speed 3271.60 samples/sec   Loss 0.6526   LearningRate 0.0063   Epoch: 14   Global Step: 249870   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:40:06,021-Speed 3330.04 samples/sec   Loss 0.6386   LearningRate 0.0063   Epoch: 14   Global Step: 249880   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:40:09,094-Speed 3332.54 samples/sec   Loss 0.6526   LearningRate 0.0063   Epoch: 14   Global Step: 249890   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:40:12,178-Speed 3320.60 samples/sec   Loss 0.6480   LearningRate 0.0063   Epoch: 14   Global Step: 249900   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:40:15,256-Speed 3327.55 samples/sec   Loss 0.6381   LearningRate 0.0063   Epoch: 14   Global Step: 249910   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:40:18,341-Speed 3319.91 samples/sec   Loss 0.6643   LearningRate 0.0063   Epoch: 14   Global Step: 249920   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:40:21,417-Speed 3330.25 samples/sec   Loss 0.6303   LearningRate 0.0063   Epoch: 14   Global Step: 249930   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:40:24,504-Speed 3318.46 samples/sec   Loss 0.6768   LearningRate 0.0063   Epoch: 14   Global Step: 249940   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:40:27,575-Speed 3335.15 samples/sec   Loss 0.6135   LearningRate 0.0063   Epoch: 14   Global Step: 249950   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:40:30,656-Speed 3323.54 samples/sec   Loss 0.6559   LearningRate 0.0063   Epoch: 14   Global Step: 249960   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-12 01:40:33,749-Speed 3312.15 samples/sec   Loss 0.6657   LearningRate 0.0063   Epoch: 14   Global Step: 249970   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-12 01:40:36,829-Speed 3325.18 samples/sec   Loss 0.6400   LearningRate 0.0063   Epoch: 14   Global Step: 249980   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-12 01:40:39,907-Speed 3327.42 samples/sec   Loss 0.6551   LearningRate 0.0063   Epoch: 14   Global Step: 249990   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-12 01:40:42,987-Speed 3325.81 samples/sec   Loss 0.6617   LearningRate 0.0063   Epoch: 14   Global Step: 250000   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-12 01:41:26,937-[lfw][250000]XNorm: 21.483194
Training: 2022-04-12 01:41:26,938-[lfw][250000]Accuracy-Flip: 0.99800+-0.00267
Training: 2022-04-12 01:41:26,938-[lfw][250000]Accuracy-Highest: 0.99817
Training: 2022-04-12 01:42:18,215-[cfp_fp][250000]XNorm: 22.757849
Training: 2022-04-12 01:42:18,215-[cfp_fp][250000]Accuracy-Flip: 0.99071+-0.00395
Training: 2022-04-12 01:42:18,216-[cfp_fp][250000]Accuracy-Highest: 0.99186
Training: 2022-04-12 01:43:02,434-[agedb_30][250000]XNorm: 23.345549
Training: 2022-04-12 01:43:02,435-[agedb_30][250000]Accuracy-Flip: 0.98450+-0.00506
Training: 2022-04-12 01:43:02,435-[agedb_30][250000]Accuracy-Highest: 0.98567
Training: 2022-04-12 01:43:05,508-Speed 71.85 samples/sec   Loss 0.6325   LearningRate 0.0063   Epoch: 14   Global Step: 250010   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-12 01:43:08,560-Speed 3355.37 samples/sec   Loss 0.6160   LearningRate 0.0063   Epoch: 14   Global Step: 250020   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-12 01:43:11,665-Speed 3298.64 samples/sec   Loss 0.6255   LearningRate 0.0063   Epoch: 14   Global Step: 250030   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-12 01:43:14,755-Speed 3314.17 samples/sec   Loss 0.6721   LearningRate 0.0063   Epoch: 14   Global Step: 250040   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-12 01:43:17,824-Speed 3338.55 samples/sec   Loss 0.6294   LearningRate 0.0063   Epoch: 14   Global Step: 250050   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-12 01:43:20,873-Speed 3359.29 samples/sec   Loss 0.5989   LearningRate 0.0063   Epoch: 14   Global Step: 250060   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-12 01:43:23,935-Speed 3345.08 samples/sec   Loss 0.6474   LearningRate 0.0063   Epoch: 14   Global Step: 250070   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-12 01:43:26,981-Speed 3363.42 samples/sec   Loss 0.6390   LearningRate 0.0063   Epoch: 14   Global Step: 250080   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:43:30,075-Speed 3310.02 samples/sec   Loss 0.6639   LearningRate 0.0063   Epoch: 14   Global Step: 250090   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:43:33,262-Speed 3213.77 samples/sec   Loss 0.6665   LearningRate 0.0063   Epoch: 14   Global Step: 250100   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:43:36,368-Speed 3296.81 samples/sec   Loss 0.6309   LearningRate 0.0063   Epoch: 14   Global Step: 250110   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:43:39,457-Speed 3315.63 samples/sec   Loss 0.6477   LearningRate 0.0063   Epoch: 14   Global Step: 250120   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:43:42,524-Speed 3339.57 samples/sec   Loss 0.6574   LearningRate 0.0063   Epoch: 14   Global Step: 250130   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:43:45,621-Speed 3308.40 samples/sec   Loss 0.6640   LearningRate 0.0063   Epoch: 14   Global Step: 250140   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:43:48,741-Speed 3282.00 samples/sec   Loss 0.6717   LearningRate 0.0063   Epoch: 14   Global Step: 250150   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:43:51,891-Speed 3251.28 samples/sec   Loss 0.6501   LearningRate 0.0063   Epoch: 14   Global Step: 250160   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:43:54,951-Speed 3347.03 samples/sec   Loss 0.6360   LearningRate 0.0063   Epoch: 14   Global Step: 250170   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:43:58,070-Speed 3283.91 samples/sec   Loss 0.6229   LearningRate 0.0063   Epoch: 14   Global Step: 250180   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-12 01:44:01,137-Speed 3340.41 samples/sec   Loss 0.6434   LearningRate 0.0063   Epoch: 14   Global Step: 250190   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-12 01:44:04,201-Speed 3342.36 samples/sec   Loss 0.6545   LearningRate 0.0063   Epoch: 14   Global Step: 250200   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:44:07,273-Speed 3333.71 samples/sec   Loss 0.6175   LearningRate 0.0063   Epoch: 14   Global Step: 250210   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:44:10,423-Speed 3251.37 samples/sec   Loss 0.6477   LearningRate 0.0063   Epoch: 14   Global Step: 250220   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:44:13,502-Speed 3326.64 samples/sec   Loss 0.6658   LearningRate 0.0063   Epoch: 14   Global Step: 250230   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:44:16,570-Speed 3338.57 samples/sec   Loss 0.6078   LearningRate 0.0063   Epoch: 14   Global Step: 250240   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:44:19,642-Speed 3333.99 samples/sec   Loss 0.6420   LearningRate 0.0063   Epoch: 14   Global Step: 250250   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:44:22,715-Speed 3333.67 samples/sec   Loss 0.6387   LearningRate 0.0063   Epoch: 14   Global Step: 250260   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:44:25,776-Speed 3345.29 samples/sec   Loss 0.6177   LearningRate 0.0063   Epoch: 14   Global Step: 250270   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:44:28,840-Speed 3343.00 samples/sec   Loss 0.6436   LearningRate 0.0063   Epoch: 14   Global Step: 250280   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:44:31,958-Speed 3285.15 samples/sec   Loss 0.6535   LearningRate 0.0063   Epoch: 14   Global Step: 250290   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:44:35,038-Speed 3324.92 samples/sec   Loss 0.6430   LearningRate 0.0063   Epoch: 14   Global Step: 250300   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-12 01:44:38,109-Speed 3335.70 samples/sec   Loss 0.6454   LearningRate 0.0063   Epoch: 14   Global Step: 250310   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-12 01:44:41,180-Speed 3335.48 samples/sec   Loss 0.6342   LearningRate 0.0063   Epoch: 14   Global Step: 250320   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:44:44,250-Speed 3336.59 samples/sec   Loss 0.6483   LearningRate 0.0063   Epoch: 14   Global Step: 250330   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:44:47,320-Speed 3336.28 samples/sec   Loss 0.6601   LearningRate 0.0063   Epoch: 14   Global Step: 250340   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:44:50,416-Speed 3307.35 samples/sec   Loss 0.6402   LearningRate 0.0063   Epoch: 14   Global Step: 250350   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:44:53,828-Speed 3001.68 samples/sec   Loss 0.6288   LearningRate 0.0063   Epoch: 14   Global Step: 250360   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:45:36,087-Speed 242.33 samples/sec   Loss 0.5261   LearningRate 0.0062   Epoch: 15   Global Step: 250370   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:45:39,676-Speed 2854.11 samples/sec   Loss 0.3958   LearningRate 0.0062   Epoch: 15   Global Step: 250380   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:45:42,790-Speed 3289.19 samples/sec   Loss 0.3581   LearningRate 0.0062   Epoch: 15   Global Step: 250390   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:45:45,900-Speed 3293.34 samples/sec   Loss 0.3862   LearningRate 0.0062   Epoch: 15   Global Step: 250400   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:45:48,996-Speed 3308.22 samples/sec   Loss 0.3747   LearningRate 0.0062   Epoch: 15   Global Step: 250410   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:45:52,091-Speed 3309.84 samples/sec   Loss 0.3923   LearningRate 0.0062   Epoch: 15   Global Step: 250420   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:45:55,268-Speed 3224.26 samples/sec   Loss 0.3955   LearningRate 0.0062   Epoch: 15   Global Step: 250430   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-12 01:45:58,432-Speed 3237.13 samples/sec   Loss 0.3671   LearningRate 0.0062   Epoch: 15   Global Step: 250440   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-12 01:46:01,494-Speed 3344.56 samples/sec   Loss 0.3491   LearningRate 0.0062   Epoch: 15   Global Step: 250450   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-12 01:46:04,577-Speed 3322.38 samples/sec   Loss 0.3742   LearningRate 0.0062   Epoch: 15   Global Step: 250460   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-12 01:46:07,653-Speed 3328.97 samples/sec   Loss 0.3740   LearningRate 0.0062   Epoch: 15   Global Step: 250470   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-12 01:46:10,724-Speed 3335.41 samples/sec   Loss 0.3798   LearningRate 0.0062   Epoch: 15   Global Step: 250480   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-12 01:46:13,790-Speed 3340.79 samples/sec   Loss 0.3665   LearningRate 0.0062   Epoch: 15   Global Step: 250490   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-12 01:46:16,861-Speed 3334.93 samples/sec   Loss 0.3489   LearningRate 0.0062   Epoch: 15   Global Step: 250500   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-12 01:46:19,930-Speed 3338.49 samples/sec   Loss 0.3647   LearningRate 0.0062   Epoch: 15   Global Step: 250510   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-12 01:46:23,080-Speed 3251.06 samples/sec   Loss 0.3725   LearningRate 0.0062   Epoch: 15   Global Step: 250520   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-12 01:46:26,421-Speed 3065.35 samples/sec   Loss 0.3842   LearningRate 0.0062   Epoch: 15   Global Step: 250530   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:46:29,528-Speed 3296.56 samples/sec   Loss 0.3617   LearningRate 0.0062   Epoch: 15   Global Step: 250540   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:46:32,616-Speed 3316.64 samples/sec   Loss 0.3792   LearningRate 0.0062   Epoch: 15   Global Step: 250550   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:46:35,738-Speed 3280.77 samples/sec   Loss 0.3776   LearningRate 0.0062   Epoch: 15   Global Step: 250560   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:46:38,901-Speed 3238.34 samples/sec   Loss 0.3800   LearningRate 0.0062   Epoch: 15   Global Step: 250570   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:46:42,490-Speed 2853.63 samples/sec   Loss 0.3654   LearningRate 0.0062   Epoch: 15   Global Step: 250580   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:46:45,557-Speed 3339.73 samples/sec   Loss 0.3604   LearningRate 0.0062   Epoch: 15   Global Step: 250590   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:46:48,979-Speed 2992.86 samples/sec   Loss 0.3640   LearningRate 0.0062   Epoch: 15   Global Step: 250600   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:46:52,042-Speed 3344.24 samples/sec   Loss 0.3708   LearningRate 0.0062   Epoch: 15   Global Step: 250610   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:46:55,132-Speed 3314.92 samples/sec   Loss 0.3598   LearningRate 0.0062   Epoch: 15   Global Step: 250620   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:46:58,215-Speed 3321.81 samples/sec   Loss 0.3800   LearningRate 0.0062   Epoch: 15   Global Step: 250630   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-12 01:47:01,280-Speed 3342.21 samples/sec   Loss 0.3706   LearningRate 0.0062   Epoch: 15   Global Step: 250640   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-12 01:47:04,377-Speed 3306.98 samples/sec   Loss 0.3842   LearningRate 0.0062   Epoch: 15   Global Step: 250650   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-12 01:47:07,525-Speed 3253.49 samples/sec   Loss 0.3717   LearningRate 0.0062   Epoch: 15   Global Step: 250660   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-12 01:47:10,600-Speed 3330.16 samples/sec   Loss 0.3598   LearningRate 0.0062   Epoch: 15   Global Step: 250670   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-12 01:47:13,658-Speed 3350.22 samples/sec   Loss 0.3948   LearningRate 0.0062   Epoch: 15   Global Step: 250680   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-12 01:47:16,754-Speed 3308.01 samples/sec   Loss 0.3822   LearningRate 0.0062   Epoch: 15   Global Step: 250690   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-12 01:47:19,820-Speed 3341.13 samples/sec   Loss 0.3685   LearningRate 0.0062   Epoch: 15   Global Step: 250700   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-12 01:47:22,888-Speed 3338.01 samples/sec   Loss 0.3660   LearningRate 0.0062   Epoch: 15   Global Step: 250710   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-12 01:47:25,948-Speed 3347.16 samples/sec   Loss 0.3696   LearningRate 0.0062   Epoch: 15   Global Step: 250720   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-12 01:47:29,096-Speed 3253.34 samples/sec   Loss 0.3538   LearningRate 0.0062   Epoch: 15   Global Step: 250730   Fp16 Grad Scale: 262144   Required: 9 hours
Training: 2022-04-12 01:47:32,195-Speed 3305.16 samples/sec   Loss 0.3916   LearningRate 0.0062   Epoch: 15   Global Step: 250740   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:47:35,254-Speed 3348.01 samples/sec   Loss 0.3730   LearningRate 0.0062   Epoch: 15   Global Step: 250750   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:47:38,312-Speed 3349.27 samples/sec   Loss 0.3625   LearningRate 0.0062   Epoch: 15   Global Step: 250760   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:47:41,374-Speed 3345.53 samples/sec   Loss 0.3678   LearningRate 0.0062   Epoch: 15   Global Step: 250770   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:47:44,432-Speed 3349.47 samples/sec   Loss 0.3870   LearningRate 0.0062   Epoch: 15   Global Step: 250780   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:47:47,542-Speed 3293.31 samples/sec   Loss 0.4089   LearningRate 0.0062   Epoch: 15   Global Step: 250790   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:47:50,615-Speed 3332.40 samples/sec   Loss 0.3706   LearningRate 0.0062   Epoch: 15   Global Step: 250800   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:47:53,669-Speed 3354.44 samples/sec   Loss 0.3719   LearningRate 0.0062   Epoch: 15   Global Step: 250810   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:47:56,789-Speed 3282.66 samples/sec   Loss 0.4042   LearningRate 0.0062   Epoch: 15   Global Step: 250820   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:47:59,848-Speed 3348.74 samples/sec   Loss 0.3807   LearningRate 0.0062   Epoch: 15   Global Step: 250830   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:48:02,903-Speed 3352.53 samples/sec   Loss 0.3794   LearningRate 0.0062   Epoch: 15   Global Step: 250840   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-12 01:48:05,964-Speed 3345.18 samples/sec   Loss 0.3498   LearningRate 0.0062   Epoch: 15   Global Step: 250850   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-12 01:48:09,037-Speed 3333.40 samples/sec   Loss 0.3728   LearningRate 0.0062   Epoch: 15   Global Step: 250860   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-12 01:48:12,094-Speed 3350.70 samples/sec   Loss 0.3686   LearningRate 0.0062   Epoch: 15   Global Step: 250870   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-12 01:48:15,146-Speed 3355.16 samples/sec   Loss 0.3841   LearningRate 0.0062   Epoch: 15   Global Step: 250880   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-12 01:48:18,211-Speed 3342.96 samples/sec   Loss 0.3701   LearningRate 0.0062   Epoch: 15   Global Step: 250890   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-12 01:48:21,263-Speed 3355.84 samples/sec   Loss 0.3693   LearningRate 0.0062   Epoch: 15   Global Step: 250900   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-12 01:48:24,343-Speed 3325.00 samples/sec   Loss 0.3707   LearningRate 0.0062   Epoch: 15   Global Step: 250910   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-12 01:48:27,500-Speed 3244.57 samples/sec   Loss 0.3845   LearningRate 0.0062   Epoch: 15   Global Step: 250920   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-12 01:48:30,558-Speed 3348.69 samples/sec   Loss 0.3640   LearningRate 0.0062   Epoch: 15   Global Step: 250930   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-12 01:48:33,616-Speed 3349.69 samples/sec   Loss 0.3745   LearningRate 0.0062   Epoch: 15   Global Step: 250940   Fp16 Grad Scale: 262144   Required: 9 hours
Training: 2022-04-12 01:48:36,714-Speed 3305.78 samples/sec   Loss 0.3693   LearningRate 0.0062   Epoch: 15   Global Step: 250950   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-12 01:48:39,774-Speed 3347.92 samples/sec   Loss 0.3794   LearningRate 0.0062   Epoch: 15   Global Step: 250960   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-12 01:48:42,841-Speed 3339.52 samples/sec   Loss 0.3675   LearningRate 0.0062   Epoch: 15   Global Step: 250970   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-12 01:48:45,897-Speed 3351.37 samples/sec   Loss 0.3751   LearningRate 0.0062   Epoch: 15   Global Step: 250980   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-12 01:48:48,978-Speed 3324.89 samples/sec   Loss 0.3637   LearningRate 0.0062   Epoch: 15   Global Step: 250990   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-12 01:48:52,050-Speed 3333.84 samples/sec   Loss 0.3653   LearningRate 0.0062   Epoch: 15   Global Step: 251000   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-12 01:48:55,163-Speed 3290.29 samples/sec   Loss 0.3664   LearningRate 0.0062   Epoch: 15   Global Step: 251010   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-12 01:48:58,224-Speed 3345.85 samples/sec   Loss 0.3845   LearningRate 0.0062   Epoch: 15   Global Step: 251020   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-12 01:49:01,280-Speed 3351.19 samples/sec   Loss 0.3682   LearningRate 0.0062   Epoch: 15   Global Step: 251030   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-12 01:49:04,337-Speed 3350.99 samples/sec   Loss 0.3516   LearningRate 0.0061   Epoch: 15   Global Step: 251040   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-12 01:49:07,399-Speed 3344.24 samples/sec   Loss 0.3672   LearningRate 0.0061   Epoch: 15   Global Step: 251050   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-12 01:49:10,476-Speed 3329.15 samples/sec   Loss 0.3841   LearningRate 0.0061   Epoch: 15   Global Step: 251060   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-12 01:49:13,539-Speed 3344.47 samples/sec   Loss 0.3823   LearningRate 0.0061   Epoch: 15   Global Step: 251070   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-12 01:49:16,636-Speed 3306.54 samples/sec   Loss 0.3674   LearningRate 0.0061   Epoch: 15   Global Step: 251080   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-12 01:49:19,731-Speed 3309.67 samples/sec   Loss 0.3707   LearningRate 0.0061   Epoch: 15   Global Step: 251090   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-12 01:49:22,811-Speed 3325.57 samples/sec   Loss 0.3908   LearningRate 0.0061   Epoch: 15   Global Step: 251100   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:49:25,867-Speed 3351.31 samples/sec   Loss 0.3689   LearningRate 0.0061   Epoch: 15   Global Step: 251110   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:49:28,932-Speed 3341.46 samples/sec   Loss 0.3930   LearningRate 0.0061   Epoch: 15   Global Step: 251120   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:49:31,991-Speed 3348.07 samples/sec   Loss 0.3665   LearningRate 0.0061   Epoch: 15   Global Step: 251130   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:49:35,063-Speed 3333.74 samples/sec   Loss 0.3665   LearningRate 0.0061   Epoch: 15   Global Step: 251140   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:49:38,119-Speed 3351.95 samples/sec   Loss 0.3697   LearningRate 0.0061   Epoch: 15   Global Step: 251150   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:49:41,182-Speed 3344.13 samples/sec   Loss 0.3567   LearningRate 0.0061   Epoch: 15   Global Step: 251160   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:49:44,257-Speed 3330.98 samples/sec   Loss 0.3970   LearningRate 0.0061   Epoch: 15   Global Step: 251170   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:49:47,327-Speed 3336.27 samples/sec   Loss 0.3891   LearningRate 0.0061   Epoch: 15   Global Step: 251180   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:49:50,383-Speed 3351.79 samples/sec   Loss 0.3591   LearningRate 0.0061   Epoch: 15   Global Step: 251190   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:49:53,539-Speed 3244.68 samples/sec   Loss 0.3630   LearningRate 0.0061   Epoch: 15   Global Step: 251200   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-12 01:49:56,622-Speed 3322.31 samples/sec   Loss 0.3885   LearningRate 0.0061   Epoch: 15   Global Step: 251210   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-12 01:49:59,746-Speed 3279.16 samples/sec   Loss 0.3698   LearningRate 0.0061   Epoch: 15   Global Step: 251220   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-12 01:50:02,834-Speed 3316.81 samples/sec   Loss 0.3762   LearningRate 0.0061   Epoch: 15   Global Step: 251230   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-12 01:50:05,882-Speed 3360.40 samples/sec   Loss 0.3610   LearningRate 0.0061   Epoch: 15   Global Step: 251240   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:50:08,950-Speed 3338.54 samples/sec   Loss 0.3705   LearningRate 0.0061   Epoch: 15   Global Step: 251250   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:50:12,045-Speed 3309.55 samples/sec   Loss 0.3731   LearningRate 0.0061   Epoch: 15   Global Step: 251260   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:50:15,175-Speed 3272.42 samples/sec   Loss 0.3816   LearningRate 0.0061   Epoch: 15   Global Step: 251270   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:50:18,235-Speed 3347.33 samples/sec   Loss 0.3662   LearningRate 0.0061   Epoch: 15   Global Step: 251280   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:50:21,331-Speed 3307.64 samples/sec   Loss 0.3993   LearningRate 0.0061   Epoch: 15   Global Step: 251290   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:50:24,419-Speed 3316.42 samples/sec   Loss 0.3666   LearningRate 0.0061   Epoch: 15   Global Step: 251300   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:50:27,539-Speed 3282.79 samples/sec   Loss 0.3888   LearningRate 0.0061   Epoch: 15   Global Step: 251310   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:50:30,613-Speed 3331.84 samples/sec   Loss 0.3758   LearningRate 0.0061   Epoch: 15   Global Step: 251320   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:50:33,675-Speed 3345.08 samples/sec   Loss 0.3754   LearningRate 0.0061   Epoch: 15   Global Step: 251330   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:50:36,777-Speed 3302.94 samples/sec   Loss 0.3789   LearningRate 0.0061   Epoch: 15   Global Step: 251340   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-12 01:50:39,845-Speed 3337.91 samples/sec   Loss 0.3533   LearningRate 0.0061   Epoch: 15   Global Step: 251350   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-12 01:50:42,939-Speed 3309.98 samples/sec   Loss 0.3928   LearningRate 0.0061   Epoch: 15   Global Step: 251360   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-12 01:50:46,001-Speed 3344.83 samples/sec   Loss 0.3728   LearningRate 0.0061   Epoch: 15   Global Step: 251370   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-12 01:50:49,087-Speed 3319.09 samples/sec   Loss 0.3858   LearningRate 0.0061   Epoch: 15   Global Step: 251380   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-12 01:50:52,227-Speed 3262.18 samples/sec   Loss 0.3644   LearningRate 0.0061   Epoch: 15   Global Step: 251390   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-12 01:50:55,291-Speed 3343.02 samples/sec   Loss 0.3836   LearningRate 0.0061   Epoch: 15   Global Step: 251400   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-12 01:50:58,353-Speed 3344.33 samples/sec   Loss 0.3902   LearningRate 0.0061   Epoch: 15   Global Step: 251410   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-12 01:51:01,419-Speed 3341.71 samples/sec   Loss 0.3658   LearningRate 0.0061   Epoch: 15   Global Step: 251420   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-12 01:51:04,481-Speed 3344.73 samples/sec   Loss 0.3748   LearningRate 0.0061   Epoch: 15   Global Step: 251430   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-12 01:51:07,537-Speed 3351.74 samples/sec   Loss 0.3991   LearningRate 0.0061   Epoch: 15   Global Step: 251440   Fp16 Grad Scale: 262144   Required: 9 hours
Training: 2022-04-12 01:51:10,692-Speed 3246.02 samples/sec   Loss 0.3582   LearningRate 0.0061   Epoch: 15   Global Step: 251450   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-12 01:51:13,837-Speed 3256.85 samples/sec   Loss 0.3720   LearningRate 0.0061   Epoch: 15   Global Step: 251460   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-12 01:51:16,894-Speed 3349.95 samples/sec   Loss 0.3782   LearningRate 0.0061   Epoch: 15   Global Step: 251470   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-12 01:51:19,984-Speed 3315.16 samples/sec   Loss 0.3910   LearningRate 0.0061   Epoch: 15   Global Step: 251480   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-12 01:51:23,143-Speed 3241.67 samples/sec   Loss 0.4027   LearningRate 0.0061   Epoch: 15   Global Step: 251490   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-12 01:51:26,247-Speed 3299.91 samples/sec   Loss 0.3816   LearningRate 0.0061   Epoch: 15   Global Step: 251500   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-12 01:51:29,315-Speed 3338.48 samples/sec   Loss 0.3802   LearningRate 0.0061   Epoch: 15   Global Step: 251510   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-12 01:51:32,383-Speed 3338.83 samples/sec   Loss 0.3895   LearningRate 0.0061   Epoch: 15   Global Step: 251520   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:51:35,485-Speed 3301.51 samples/sec   Loss 0.3913   LearningRate 0.0061   Epoch: 15   Global Step: 251530   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:51:38,628-Speed 3259.25 samples/sec   Loss 0.3712   LearningRate 0.0061   Epoch: 15   Global Step: 251540   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:51:41,754-Speed 3276.32 samples/sec   Loss 0.3911   LearningRate 0.0061   Epoch: 15   Global Step: 251550   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:51:44,826-Speed 3334.08 samples/sec   Loss 0.4026   LearningRate 0.0061   Epoch: 15   Global Step: 251560   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:51:47,917-Speed 3313.42 samples/sec   Loss 0.3908   LearningRate 0.0061   Epoch: 15   Global Step: 251570   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-12 01:51:50,985-Speed 3338.65 samples/sec   Loss 0.3854   LearningRate 0.0061   Epoch: 15   Global Step: 251580   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-12 01:51:54,049-Speed 3342.42 samples/sec   Loss 0.3859   LearningRate 0.0061   Epoch: 15   Global Step: 251590   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-12 01:51:57,137-Speed 3316.85 samples/sec   Loss 0.3860   LearningRate 0.0061   Epoch: 15   Global Step: 251600   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-12 01:52:00,205-Speed 3338.14 samples/sec   Loss 0.3871   LearningRate 0.0061   Epoch: 15   Global Step: 251610   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-12 01:52:03,272-Speed 3339.48 samples/sec   Loss 0.3591   LearningRate 0.0061   Epoch: 15   Global Step: 251620   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-12 01:52:06,343-Speed 3337.48 samples/sec   Loss 0.3678   LearningRate 0.0061   Epoch: 15   Global Step: 251630   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-12 01:52:09,423-Speed 3325.44 samples/sec   Loss 0.3775   LearningRate 0.0061   Epoch: 15   Global Step: 251640   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-12 01:52:14,589-Speed 1982.53 samples/sec   Loss 0.3923   LearningRate 0.0061   Epoch: 15   Global Step: 251650   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-12 01:52:18,476-Speed 2634.92 samples/sec   Loss 0.3842   LearningRate 0.0061   Epoch: 15   Global Step: 251660   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-12 01:52:23,252-Speed 2144.46 samples/sec   Loss 0.3854   LearningRate 0.0061   Epoch: 15   Global Step: 251670   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:52:26,486-Speed 3167.18 samples/sec   Loss 0.3573   LearningRate 0.0061   Epoch: 15   Global Step: 251680   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:52:29,620-Speed 3268.26 samples/sec   Loss 0.3719   LearningRate 0.0061   Epoch: 15   Global Step: 251690   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:52:32,713-Speed 3310.83 samples/sec   Loss 0.3991   LearningRate 0.0061   Epoch: 15   Global Step: 251700   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:52:35,781-Speed 3338.44 samples/sec   Loss 0.3737   LearningRate 0.0061   Epoch: 15   Global Step: 251710   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:52:38,860-Speed 3327.14 samples/sec   Loss 0.3777   LearningRate 0.0060   Epoch: 15   Global Step: 251720   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:52:41,972-Speed 3291.55 samples/sec   Loss 0.3825   LearningRate 0.0060   Epoch: 15   Global Step: 251730   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:52:45,038-Speed 3340.08 samples/sec   Loss 0.3729   LearningRate 0.0060   Epoch: 15   Global Step: 251740   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:52:48,112-Speed 3332.01 samples/sec   Loss 0.3719   LearningRate 0.0060   Epoch: 15   Global Step: 251750   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:52:51,193-Speed 3325.08 samples/sec   Loss 0.3741   LearningRate 0.0060   Epoch: 15   Global Step: 251760   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:52:54,289-Speed 3308.00 samples/sec   Loss 0.3711   LearningRate 0.0060   Epoch: 15   Global Step: 251770   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-12 01:52:57,352-Speed 3343.91 samples/sec   Loss 0.3725   LearningRate 0.0060   Epoch: 15   Global Step: 251780   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-12 01:53:00,487-Speed 3266.81 samples/sec   Loss 0.3875   LearningRate 0.0060   Epoch: 15   Global Step: 251790   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-12 01:53:03,651-Speed 3236.65 samples/sec   Loss 0.3631   LearningRate 0.0060   Epoch: 15   Global Step: 251800   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-12 01:53:06,841-Speed 3211.01 samples/sec   Loss 0.3763   LearningRate 0.0060   Epoch: 15   Global Step: 251810   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-12 01:53:10,076-Speed 3166.49 samples/sec   Loss 0.3632   LearningRate 0.0060   Epoch: 15   Global Step: 251820   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-12 01:53:13,245-Speed 3232.43 samples/sec   Loss 0.3528   LearningRate 0.0060   Epoch: 15   Global Step: 251830   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-12 01:53:16,310-Speed 3341.24 samples/sec   Loss 0.3848   LearningRate 0.0060   Epoch: 15   Global Step: 251840   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-12 01:53:19,363-Speed 3354.97 samples/sec   Loss 0.3640   LearningRate 0.0060   Epoch: 15   Global Step: 251850   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:53:22,433-Speed 3336.41 samples/sec   Loss 0.3978   LearningRate 0.0060   Epoch: 15   Global Step: 251860   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:53:25,522-Speed 3316.22 samples/sec   Loss 0.3554   LearningRate 0.0060   Epoch: 15   Global Step: 251870   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:53:28,632-Speed 3292.70 samples/sec   Loss 0.3796   LearningRate 0.0060   Epoch: 15   Global Step: 251880   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:53:31,704-Speed 3333.79 samples/sec   Loss 0.3728   LearningRate 0.0060   Epoch: 15   Global Step: 251890   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:53:34,783-Speed 3326.47 samples/sec   Loss 0.3965   LearningRate 0.0060   Epoch: 15   Global Step: 251900   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:53:37,864-Speed 3325.22 samples/sec   Loss 0.3870   LearningRate 0.0060   Epoch: 15   Global Step: 251910   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:53:40,943-Speed 3325.61 samples/sec   Loss 0.3668   LearningRate 0.0060   Epoch: 15   Global Step: 251920   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:53:44,015-Speed 3334.36 samples/sec   Loss 0.3812   LearningRate 0.0060   Epoch: 15   Global Step: 251930   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:53:47,149-Speed 3268.36 samples/sec   Loss 0.4000   LearningRate 0.0060   Epoch: 15   Global Step: 251940   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:53:50,297-Speed 3254.38 samples/sec   Loss 0.3991   LearningRate 0.0060   Epoch: 15   Global Step: 251950   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-12 01:53:53,406-Speed 3293.43 samples/sec   Loss 0.3755   LearningRate 0.0060   Epoch: 15   Global Step: 251960   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-12 01:53:56,491-Speed 3320.13 samples/sec   Loss 0.3896   LearningRate 0.0060   Epoch: 15   Global Step: 251970   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-12 01:53:59,573-Speed 3323.62 samples/sec   Loss 0.3757   LearningRate 0.0060   Epoch: 15   Global Step: 251980   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-12 01:54:02,641-Speed 3338.03 samples/sec   Loss 0.3690   LearningRate 0.0060   Epoch: 15   Global Step: 251990   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-12 01:54:05,708-Speed 3340.26 samples/sec   Loss 0.3862   LearningRate 0.0060   Epoch: 15   Global Step: 252000   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-12 01:54:50,192-[lfw][252000]XNorm: 21.841688
Training: 2022-04-12 01:54:50,193-[lfw][252000]Accuracy-Flip: 0.99733+-0.00260
Training: 2022-04-12 01:54:50,193-[lfw][252000]Accuracy-Highest: 0.99817
Training: 2022-04-12 01:55:41,798-[cfp_fp][252000]XNorm: 22.973996
Training: 2022-04-12 01:55:41,799-[cfp_fp][252000]Accuracy-Flip: 0.99114+-0.00473
Training: 2022-04-12 01:55:41,799-[cfp_fp][252000]Accuracy-Highest: 0.99186
Training: 2022-04-12 01:56:26,069-[agedb_30][252000]XNorm: 23.481039
Training: 2022-04-12 01:56:26,069-[agedb_30][252000]Accuracy-Flip: 0.98583+-0.00593
Training: 2022-04-12 01:56:26,070-[agedb_30][252000]Accuracy-Highest: 0.98583
Training: 2022-04-12 01:56:29,140-Speed 71.39 samples/sec   Loss 0.3652   LearningRate 0.0060   Epoch: 15   Global Step: 252010   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-12 01:56:32,267-Speed 3275.33 samples/sec   Loss 0.3706   LearningRate 0.0060   Epoch: 15   Global Step: 252020   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-12 01:56:35,341-Speed 3332.44 samples/sec   Loss 0.3805   LearningRate 0.0060   Epoch: 15   Global Step: 252030   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-12 01:56:38,401-Speed 3347.38 samples/sec   Loss 0.3693   LearningRate 0.0060   Epoch: 15   Global Step: 252040   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-12 01:56:41,471-Speed 3335.80 samples/sec   Loss 0.3583   LearningRate 0.0060   Epoch: 15   Global Step: 252050   Fp16 Grad Scale: 262144   Required: 9 hours
Training: 2022-04-12 01:56:44,517-Speed 3362.02 samples/sec   Loss 0.3893   LearningRate 0.0060   Epoch: 15   Global Step: 252060   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-12 01:56:47,619-Speed 3302.37 samples/sec   Loss 0.3678   LearningRate 0.0060   Epoch: 15   Global Step: 252070   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-12 01:56:50,730-Speed 3292.30 samples/sec   Loss 0.3890   LearningRate 0.0060   Epoch: 15   Global Step: 252080   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-12 01:56:53,831-Speed 3302.56 samples/sec   Loss 0.4033   LearningRate 0.0060   Epoch: 15   Global Step: 252090   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-12 01:56:56,899-Speed 3338.07 samples/sec   Loss 0.3841   LearningRate 0.0060   Epoch: 15   Global Step: 252100   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-12 01:56:59,973-Speed 3332.93 samples/sec   Loss 0.3923   LearningRate 0.0060   Epoch: 15   Global Step: 252110   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-12 01:57:03,036-Speed 3343.33 samples/sec   Loss 0.4084   LearningRate 0.0060   Epoch: 15   Global Step: 252120   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-12 01:57:06,103-Speed 3339.74 samples/sec   Loss 0.3910   LearningRate 0.0060   Epoch: 15   Global Step: 252130   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-12 01:57:09,168-Speed 3342.29 samples/sec   Loss 0.3838   LearningRate 0.0060   Epoch: 15   Global Step: 252140   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-12 01:57:12,405-Speed 3163.75 samples/sec   Loss 0.3795   LearningRate 0.0060   Epoch: 15   Global Step: 252150   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-12 01:57:15,476-Speed 3334.95 samples/sec   Loss 0.3937   LearningRate 0.0060   Epoch: 15   Global Step: 252160   Fp16 Grad Scale: 262144   Required: 9 hours
Training: 2022-04-12 01:57:18,526-Speed 3357.79 samples/sec   Loss 0.4080   LearningRate 0.0060   Epoch: 15   Global Step: 252170   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-12 01:57:21,598-Speed 3334.09 samples/sec   Loss 0.3746   LearningRate 0.0060   Epoch: 15   Global Step: 252180   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-12 01:57:24,794-Speed 3205.17 samples/sec   Loss 0.3624   LearningRate 0.0060   Epoch: 15   Global Step: 252190   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-12 01:57:28,020-Speed 3175.26 samples/sec   Loss 0.3822   LearningRate 0.0060   Epoch: 15   Global Step: 252200   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-12 01:57:31,226-Speed 3194.15 samples/sec   Loss 0.3945   LearningRate 0.0060   Epoch: 15   Global Step: 252210   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:57:34,340-Speed 3289.98 samples/sec   Loss 0.3901   LearningRate 0.0060   Epoch: 15   Global Step: 252220   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:57:37,497-Speed 3243.90 samples/sec   Loss 0.4053   LearningRate 0.0060   Epoch: 15   Global Step: 252230   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:57:40,625-Speed 3274.69 samples/sec   Loss 0.3981   LearningRate 0.0060   Epoch: 15   Global Step: 252240   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:57:43,746-Speed 3281.38 samples/sec   Loss 0.3794   LearningRate 0.0060   Epoch: 15   Global Step: 252250   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:57:46,879-Speed 3269.48 samples/sec   Loss 0.3764   LearningRate 0.0060   Epoch: 15   Global Step: 252260   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:57:50,003-Speed 3278.35 samples/sec   Loss 0.3770   LearningRate 0.0060   Epoch: 15   Global Step: 252270   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:57:53,204-Speed 3199.79 samples/sec   Loss 0.3899   LearningRate 0.0060   Epoch: 15   Global Step: 252280   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:57:56,263-Speed 3348.59 samples/sec   Loss 0.3829   LearningRate 0.0060   Epoch: 15   Global Step: 252290   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:57:59,327-Speed 3342.96 samples/sec   Loss 0.3707   LearningRate 0.0060   Epoch: 15   Global Step: 252300   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:58:02,387-Speed 3346.96 samples/sec   Loss 0.4002   LearningRate 0.0060   Epoch: 15   Global Step: 252310   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-12 01:58:05,450-Speed 3343.74 samples/sec   Loss 0.3932   LearningRate 0.0060   Epoch: 15   Global Step: 252320   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-12 01:58:08,550-Speed 3304.41 samples/sec   Loss 0.3811   LearningRate 0.0060   Epoch: 15   Global Step: 252330   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-12 01:58:11,693-Speed 3258.49 samples/sec   Loss 0.3907   LearningRate 0.0060   Epoch: 15   Global Step: 252340   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-12 01:58:14,765-Speed 3334.73 samples/sec   Loss 0.3878   LearningRate 0.0060   Epoch: 15   Global Step: 252350   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-12 01:58:17,823-Speed 3349.21 samples/sec   Loss 0.3520   LearningRate 0.0060   Epoch: 15   Global Step: 252360   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-12 01:58:20,887-Speed 3342.64 samples/sec   Loss 0.3972   LearningRate 0.0060   Epoch: 15   Global Step: 252370   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-12 01:58:23,946-Speed 3348.80 samples/sec   Loss 0.3868   LearningRate 0.0060   Epoch: 15   Global Step: 252380   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-12 01:58:27,050-Speed 3300.03 samples/sec   Loss 0.3687   LearningRate 0.0060   Epoch: 15   Global Step: 252390   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-12 01:58:30,117-Speed 3339.27 samples/sec   Loss 0.3719   LearningRate 0.0059   Epoch: 15   Global Step: 252400   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-12 01:58:33,223-Speed 3297.33 samples/sec   Loss 0.3647   LearningRate 0.0059   Epoch: 15   Global Step: 252410   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:58:36,288-Speed 3342.07 samples/sec   Loss 0.3497   LearningRate 0.0059   Epoch: 15   Global Step: 252420   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:58:39,378-Speed 3314.64 samples/sec   Loss 0.3912   LearningRate 0.0059   Epoch: 15   Global Step: 252430   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:58:42,491-Speed 3289.69 samples/sec   Loss 0.4001   LearningRate 0.0059   Epoch: 15   Global Step: 252440   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:58:45,615-Speed 3278.99 samples/sec   Loss 0.3727   LearningRate 0.0059   Epoch: 15   Global Step: 252450   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:58:48,710-Speed 3308.49 samples/sec   Loss 0.3939   LearningRate 0.0059   Epoch: 15   Global Step: 252460   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:58:51,822-Speed 3291.75 samples/sec   Loss 0.3726   LearningRate 0.0059   Epoch: 15   Global Step: 252470   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:58:55,020-Speed 3202.66 samples/sec   Loss 0.3587   LearningRate 0.0059   Epoch: 15   Global Step: 252480   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:58:58,081-Speed 3345.99 samples/sec   Loss 0.3953   LearningRate 0.0059   Epoch: 15   Global Step: 252490   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:59:01,141-Speed 3347.10 samples/sec   Loss 0.3921   LearningRate 0.0059   Epoch: 15   Global Step: 252500   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:59:04,201-Speed 3347.12 samples/sec   Loss 0.4028   LearningRate 0.0059   Epoch: 15   Global Step: 252510   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-12 01:59:07,262-Speed 3346.66 samples/sec   Loss 0.3666   LearningRate 0.0059   Epoch: 15   Global Step: 252520   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-12 01:59:10,326-Speed 3342.79 samples/sec   Loss 0.3846   LearningRate 0.0059   Epoch: 15   Global Step: 252530   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-12 01:59:13,384-Speed 3349.40 samples/sec   Loss 0.4113   LearningRate 0.0059   Epoch: 15   Global Step: 252540   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:59:16,443-Speed 3347.92 samples/sec   Loss 0.3889   LearningRate 0.0059   Epoch: 15   Global Step: 252550   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:59:19,521-Speed 3328.55 samples/sec   Loss 0.3969   LearningRate 0.0059   Epoch: 15   Global Step: 252560   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:59:22,581-Speed 3346.94 samples/sec   Loss 0.3517   LearningRate 0.0059   Epoch: 15   Global Step: 252570   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:59:25,661-Speed 3325.18 samples/sec   Loss 0.3663   LearningRate 0.0059   Epoch: 15   Global Step: 252580   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:59:28,722-Speed 3345.53 samples/sec   Loss 0.3915   LearningRate 0.0059   Epoch: 15   Global Step: 252590   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:59:31,857-Speed 3267.31 samples/sec   Loss 0.3815   LearningRate 0.0059   Epoch: 15   Global Step: 252600   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:59:34,943-Speed 3318.54 samples/sec   Loss 0.3688   LearningRate 0.0059   Epoch: 15   Global Step: 252610   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:59:38,088-Speed 3257.04 samples/sec   Loss 0.3758   LearningRate 0.0059   Epoch: 15   Global Step: 252620   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:59:41,226-Speed 3264.41 samples/sec   Loss 0.3760   LearningRate 0.0059   Epoch: 15   Global Step: 252630   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 01:59:44,307-Speed 3324.67 samples/sec   Loss 0.3832   LearningRate 0.0059   Epoch: 15   Global Step: 252640   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-12 01:59:47,377-Speed 3335.81 samples/sec   Loss 0.4311   LearningRate 0.0059   Epoch: 15   Global Step: 252650   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-12 01:59:50,446-Speed 3336.92 samples/sec   Loss 0.3968   LearningRate 0.0059   Epoch: 15   Global Step: 252660   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-12 01:59:53,509-Speed 3344.30 samples/sec   Loss 0.3765   LearningRate 0.0059   Epoch: 15   Global Step: 252670   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-12 01:59:56,569-Speed 3347.19 samples/sec   Loss 0.3756   LearningRate 0.0059   Epoch: 15   Global Step: 252680   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-12 01:59:59,652-Speed 3322.41 samples/sec   Loss 0.3871   LearningRate 0.0059   Epoch: 15   Global Step: 252690   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-12 02:00:02,717-Speed 3341.25 samples/sec   Loss 0.3888   LearningRate 0.0059   Epoch: 15   Global Step: 252700   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-12 02:00:05,774-Speed 3350.37 samples/sec   Loss 0.3856   LearningRate 0.0059   Epoch: 15   Global Step: 252710   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-12 02:00:08,866-Speed 3312.23 samples/sec   Loss 0.3817   LearningRate 0.0059   Epoch: 15   Global Step: 252720   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-12 02:00:11,962-Speed 3308.98 samples/sec   Loss 0.3778   LearningRate 0.0059   Epoch: 15   Global Step: 252730   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 02:00:15,029-Speed 3339.02 samples/sec   Loss 0.3981   LearningRate 0.0059   Epoch: 15   Global Step: 252740   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 02:00:18,093-Speed 3342.65 samples/sec   Loss 0.3828   LearningRate 0.0059   Epoch: 15   Global Step: 252750   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 02:00:21,198-Speed 3299.12 samples/sec   Loss 0.4009   LearningRate 0.0059   Epoch: 15   Global Step: 252760   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 02:00:24,274-Speed 3330.20 samples/sec   Loss 0.4235   LearningRate 0.0059   Epoch: 15   Global Step: 252770   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 02:00:27,354-Speed 3324.67 samples/sec   Loss 0.4033   LearningRate 0.0059   Epoch: 15   Global Step: 252780   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 02:00:30,418-Speed 3342.65 samples/sec   Loss 0.3921   LearningRate 0.0059   Epoch: 15   Global Step: 252790   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 02:00:33,487-Speed 3337.28 samples/sec   Loss 0.3899   LearningRate 0.0059   Epoch: 15   Global Step: 252800   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 02:00:36,571-Speed 3320.88 samples/sec   Loss 0.3703   LearningRate 0.0059   Epoch: 15   Global Step: 252810   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 02:00:39,646-Speed 3331.88 samples/sec   Loss 0.3785   LearningRate 0.0059   Epoch: 15   Global Step: 252820   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 02:00:42,736-Speed 3314.80 samples/sec   Loss 0.3992   LearningRate 0.0059   Epoch: 15   Global Step: 252830   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-12 02:00:45,807-Speed 3334.98 samples/sec   Loss 0.4110   LearningRate 0.0059   Epoch: 15   Global Step: 252840   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-12 02:00:48,958-Speed 3250.18 samples/sec   Loss 0.3870   LearningRate 0.0059   Epoch: 15   Global Step: 252850   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-12 02:00:52,043-Speed 3320.73 samples/sec   Loss 0.3813   LearningRate 0.0059   Epoch: 15   Global Step: 252860   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-12 02:00:55,174-Speed 3270.65 samples/sec   Loss 0.3889   LearningRate 0.0059   Epoch: 15   Global Step: 252870   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-12 02:00:58,304-Speed 3272.54 samples/sec   Loss 0.4156   LearningRate 0.0059   Epoch: 15   Global Step: 252880   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-12 02:01:01,386-Speed 3323.22 samples/sec   Loss 0.3979   LearningRate 0.0059   Epoch: 15   Global Step: 252890   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-12 02:01:04,445-Speed 3347.62 samples/sec   Loss 0.3892   LearningRate 0.0059   Epoch: 15   Global Step: 252900   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-12 02:01:07,513-Speed 3339.51 samples/sec   Loss 0.3947   LearningRate 0.0059   Epoch: 15   Global Step: 252910   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-12 02:01:10,576-Speed 3344.12 samples/sec   Loss 0.3846   LearningRate 0.0059   Epoch: 15   Global Step: 252920   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-12 02:01:13,662-Speed 3318.19 samples/sec   Loss 0.3919   LearningRate 0.0059   Epoch: 15   Global Step: 252930   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-12 02:01:16,738-Speed 3329.84 samples/sec   Loss 0.3890   LearningRate 0.0059   Epoch: 15   Global Step: 252940   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-12 02:01:19,813-Speed 3330.83 samples/sec   Loss 0.3733   LearningRate 0.0059   Epoch: 15   Global Step: 252950   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-12 02:01:22,984-Speed 3230.03 samples/sec   Loss 0.4182   LearningRate 0.0059   Epoch: 15   Global Step: 252960   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-12 02:01:26,056-Speed 3334.74 samples/sec   Loss 0.3995   LearningRate 0.0059   Epoch: 15   Global Step: 252970   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-12 02:01:29,162-Speed 3297.50 samples/sec   Loss 0.3815   LearningRate 0.0059   Epoch: 15   Global Step: 252980   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-12 02:01:32,247-Speed 3319.27 samples/sec   Loss 0.3880   LearningRate 0.0059   Epoch: 15   Global Step: 252990   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-12 02:01:35,421-Speed 3227.21 samples/sec   Loss 0.3883   LearningRate 0.0059   Epoch: 15   Global Step: 253000   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-12 02:01:38,503-Speed 3324.21 samples/sec   Loss 0.4067   LearningRate 0.0059   Epoch: 15   Global Step: 253010   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-12 02:01:41,641-Speed 3262.91 samples/sec   Loss 0.3855   LearningRate 0.0059   Epoch: 15   Global Step: 253020   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-12 02:01:44,791-Speed 3252.16 samples/sec   Loss 0.3595   LearningRate 0.0059   Epoch: 15   Global Step: 253030   Fp16 Grad Scale: 262144   Required: 9 hours
Training: 2022-04-12 02:01:47,839-Speed 3360.55 samples/sec   Loss 0.3878   LearningRate 0.0059   Epoch: 15   Global Step: 253040   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-12 02:01:50,926-Speed 3317.77 samples/sec   Loss 0.4066   LearningRate 0.0059   Epoch: 15   Global Step: 253050   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-12 02:01:53,987-Speed 3345.16 samples/sec   Loss 0.3672   LearningRate 0.0059   Epoch: 15   Global Step: 253060   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-12 02:01:57,087-Speed 3304.44 samples/sec   Loss 0.3968   LearningRate 0.0059   Epoch: 15   Global Step: 253070   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-12 02:02:00,290-Speed 3197.30 samples/sec   Loss 0.4062   LearningRate 0.0058   Epoch: 15   Global Step: 253080   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 02:02:03,521-Speed 3170.69 samples/sec   Loss 0.4143   LearningRate 0.0058   Epoch: 15   Global Step: 253090   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 02:02:06,678-Speed 3243.87 samples/sec   Loss 0.3697   LearningRate 0.0058   Epoch: 15   Global Step: 253100   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 02:02:09,806-Speed 3274.77 samples/sec   Loss 0.3988   LearningRate 0.0058   Epoch: 15   Global Step: 253110   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 02:02:12,874-Speed 3338.70 samples/sec   Loss 0.3864   LearningRate 0.0058   Epoch: 15   Global Step: 253120   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 02:02:16,085-Speed 3189.61 samples/sec   Loss 0.3688   LearningRate 0.0058   Epoch: 15   Global Step: 253130   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 02:02:19,169-Speed 3321.02 samples/sec   Loss 0.3912   LearningRate 0.0058   Epoch: 15   Global Step: 253140   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 02:02:22,271-Speed 3301.52 samples/sec   Loss 0.3886   LearningRate 0.0058   Epoch: 15   Global Step: 253150   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 02:02:25,450-Speed 3222.28 samples/sec   Loss 0.3726   LearningRate 0.0058   Epoch: 15   Global Step: 253160   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 02:02:28,540-Speed 3314.96 samples/sec   Loss 0.3758   LearningRate 0.0058   Epoch: 15   Global Step: 253170   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 02:02:31,703-Speed 3238.13 samples/sec   Loss 0.4064   LearningRate 0.0058   Epoch: 15   Global Step: 253180   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 02:02:34,919-Speed 3184.21 samples/sec   Loss 0.3900   LearningRate 0.0058   Epoch: 15   Global Step: 253190   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 02:02:37,998-Speed 3326.70 samples/sec   Loss 0.3911   LearningRate 0.0058   Epoch: 15   Global Step: 253200   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 02:02:41,092-Speed 3310.33 samples/sec   Loss 0.3880   LearningRate 0.0058   Epoch: 15   Global Step: 253210   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 02:02:44,160-Speed 3338.73 samples/sec   Loss 0.3907   LearningRate 0.0058   Epoch: 15   Global Step: 253220   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 02:02:47,229-Speed 3337.85 samples/sec   Loss 0.4160   LearningRate 0.0058   Epoch: 15   Global Step: 253230   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 02:02:50,298-Speed 3337.13 samples/sec   Loss 0.3905   LearningRate 0.0058   Epoch: 15   Global Step: 253240   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 02:02:53,364-Speed 3340.64 samples/sec   Loss 0.3958   LearningRate 0.0058   Epoch: 15   Global Step: 253250   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 02:02:56,424-Speed 3347.51 samples/sec   Loss 0.4100   LearningRate 0.0058   Epoch: 15   Global Step: 253260   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 02:02:59,519-Speed 3309.47 samples/sec   Loss 0.3980   LearningRate 0.0058   Epoch: 15   Global Step: 253270   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 02:03:02,584-Speed 3341.49 samples/sec   Loss 0.4122   LearningRate 0.0058   Epoch: 15   Global Step: 253280   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-12 02:03:05,646-Speed 3344.46 samples/sec   Loss 0.4104   LearningRate 0.0058   Epoch: 15   Global Step: 253290   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-12 02:03:08,709-Speed 3343.88 samples/sec   Loss 0.4000   LearningRate 0.0058   Epoch: 15   Global Step: 253300   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-12 02:03:11,871-Speed 3240.01 samples/sec   Loss 0.3749   LearningRate 0.0058   Epoch: 15   Global Step: 253310   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-12 02:03:14,961-Speed 3314.31 samples/sec   Loss 0.4145   LearningRate 0.0058   Epoch: 15   Global Step: 253320   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-12 02:03:18,155-Speed 3207.02 samples/sec   Loss 0.3857   LearningRate 0.0058   Epoch: 15   Global Step: 253330   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-12 02:03:21,231-Speed 3329.86 samples/sec   Loss 0.3815   LearningRate 0.0058   Epoch: 15   Global Step: 253340   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-12 02:03:24,459-Speed 3172.99 samples/sec   Loss 0.3892   LearningRate 0.0058   Epoch: 15   Global Step: 253350   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-12 02:03:27,515-Speed 3350.86 samples/sec   Loss 0.4110   LearningRate 0.0058   Epoch: 15   Global Step: 253360   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 02:03:30,580-Speed 3341.82 samples/sec   Loss 0.4010   LearningRate 0.0058   Epoch: 15   Global Step: 253370   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 02:03:33,668-Speed 3317.54 samples/sec   Loss 0.4012   LearningRate 0.0058   Epoch: 15   Global Step: 253380   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 02:03:36,760-Speed 3311.46 samples/sec   Loss 0.3737   LearningRate 0.0058   Epoch: 15   Global Step: 253390   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 02:03:39,827-Speed 3339.83 samples/sec   Loss 0.3695   LearningRate 0.0058   Epoch: 15   Global Step: 253400   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 02:03:42,897-Speed 3336.03 samples/sec   Loss 0.3960   LearningRate 0.0058   Epoch: 15   Global Step: 253410   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 02:03:45,967-Speed 3336.86 samples/sec   Loss 0.3571   LearningRate 0.0058   Epoch: 15   Global Step: 253420   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 02:03:49,034-Speed 3339.28 samples/sec   Loss 0.3928   LearningRate 0.0058   Epoch: 15   Global Step: 253430   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 02:03:52,193-Speed 3242.51 samples/sec   Loss 0.3914   LearningRate 0.0058   Epoch: 15   Global Step: 253440   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 02:03:55,274-Speed 3324.49 samples/sec   Loss 0.3982   LearningRate 0.0058   Epoch: 15   Global Step: 253450   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-12 02:03:58,408-Speed 3268.07 samples/sec   Loss 0.3761   LearningRate 0.0058   Epoch: 15   Global Step: 253460   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-12 02:04:01,486-Speed 3327.82 samples/sec   Loss 0.3864   LearningRate 0.0058   Epoch: 15   Global Step: 253470   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 02:04:04,667-Speed 3219.75 samples/sec   Loss 0.3801   LearningRate 0.0058   Epoch: 15   Global Step: 253480   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 02:04:07,746-Speed 3326.14 samples/sec   Loss 0.4127   LearningRate 0.0058   Epoch: 15   Global Step: 253490   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 02:04:10,812-Speed 3341.19 samples/sec   Loss 0.4094   LearningRate 0.0058   Epoch: 15   Global Step: 253500   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 02:04:13,901-Speed 3315.69 samples/sec   Loss 0.3872   LearningRate 0.0058   Epoch: 15   Global Step: 253510   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 02:04:16,997-Speed 3308.40 samples/sec   Loss 0.4025   LearningRate 0.0058   Epoch: 15   Global Step: 253520   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 02:04:20,059-Speed 3345.05 samples/sec   Loss 0.4023   LearningRate 0.0058   Epoch: 15   Global Step: 253530   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 02:04:23,170-Speed 3291.81 samples/sec   Loss 0.3902   LearningRate 0.0058   Epoch: 15   Global Step: 253540   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 02:04:26,248-Speed 3327.54 samples/sec   Loss 0.4037   LearningRate 0.0058   Epoch: 15   Global Step: 253550   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 02:04:29,384-Speed 3266.40 samples/sec   Loss 0.3876   LearningRate 0.0058   Epoch: 15   Global Step: 253560   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 02:04:32,454-Speed 3335.88 samples/sec   Loss 0.3783   LearningRate 0.0058   Epoch: 15   Global Step: 253570   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 02:04:35,566-Speed 3291.11 samples/sec   Loss 0.4016   LearningRate 0.0058   Epoch: 15   Global Step: 253580   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 02:04:38,713-Speed 3255.32 samples/sec   Loss 0.4102   LearningRate 0.0058   Epoch: 15   Global Step: 253590   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 02:04:41,784-Speed 3335.23 samples/sec   Loss 0.4132   LearningRate 0.0058   Epoch: 15   Global Step: 253600   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 02:04:44,849-Speed 3341.30 samples/sec   Loss 0.3989   LearningRate 0.0058   Epoch: 15   Global Step: 253610   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 02:04:47,916-Speed 3339.92 samples/sec   Loss 0.3838   LearningRate 0.0058   Epoch: 15   Global Step: 253620   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 02:04:50,978-Speed 3344.57 samples/sec   Loss 0.3767   LearningRate 0.0058   Epoch: 15   Global Step: 253630   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 02:04:54,078-Speed 3304.71 samples/sec   Loss 0.4152   LearningRate 0.0058   Epoch: 15   Global Step: 253640   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 02:04:57,168-Speed 3314.76 samples/sec   Loss 0.3978   LearningRate 0.0058   Epoch: 15   Global Step: 253650   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 02:05:00,281-Speed 3289.91 samples/sec   Loss 0.3829   LearningRate 0.0058   Epoch: 15   Global Step: 253660   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 02:05:03,371-Speed 3314.46 samples/sec   Loss 0.4263   LearningRate 0.0058   Epoch: 15   Global Step: 253670   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 02:05:06,450-Speed 3326.47 samples/sec   Loss 0.4236   LearningRate 0.0058   Epoch: 15   Global Step: 253680   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 02:05:09,539-Speed 3316.02 samples/sec   Loss 0.3714   LearningRate 0.0058   Epoch: 15   Global Step: 253690   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 02:05:12,635-Speed 3308.55 samples/sec   Loss 0.4056   LearningRate 0.0058   Epoch: 15   Global Step: 253700   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 02:05:15,720-Speed 3320.26 samples/sec   Loss 0.3876   LearningRate 0.0058   Epoch: 15   Global Step: 253710   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 02:05:18,884-Speed 3236.91 samples/sec   Loss 0.3982   LearningRate 0.0058   Epoch: 15   Global Step: 253720   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 02:05:22,009-Speed 3277.23 samples/sec   Loss 0.4166   LearningRate 0.0058   Epoch: 15   Global Step: 253730   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 02:05:25,097-Speed 3317.15 samples/sec   Loss 0.4116   LearningRate 0.0058   Epoch: 15   Global Step: 253740   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 02:05:28,178-Speed 3324.54 samples/sec   Loss 0.4064   LearningRate 0.0058   Epoch: 15   Global Step: 253750   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 02:05:31,316-Speed 3263.41 samples/sec   Loss 0.3759   LearningRate 0.0058   Epoch: 15   Global Step: 253760   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 02:05:34,400-Speed 3321.31 samples/sec   Loss 0.3898   LearningRate 0.0058   Epoch: 15   Global Step: 253770   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 02:05:37,511-Speed 3293.10 samples/sec   Loss 0.3851   LearningRate 0.0057   Epoch: 15   Global Step: 253780   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 02:05:40,669-Speed 3243.31 samples/sec   Loss 0.3828   LearningRate 0.0057   Epoch: 15   Global Step: 253790   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 02:05:43,743-Speed 3331.34 samples/sec   Loss 0.3872   LearningRate 0.0057   Epoch: 15   Global Step: 253800   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 02:05:46,826-Speed 3322.40 samples/sec   Loss 0.4022   LearningRate 0.0057   Epoch: 15   Global Step: 253810   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 02:05:49,894-Speed 3338.55 samples/sec   Loss 0.3751   LearningRate 0.0057   Epoch: 15   Global Step: 253820   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 02:05:52,971-Speed 3328.81 samples/sec   Loss 0.3887   LearningRate 0.0057   Epoch: 15   Global Step: 253830   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 02:05:56,050-Speed 3326.44 samples/sec   Loss 0.4184   LearningRate 0.0057   Epoch: 15   Global Step: 253840   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 02:05:59,203-Speed 3247.99 samples/sec   Loss 0.3843   LearningRate 0.0057   Epoch: 15   Global Step: 253850   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 02:06:02,284-Speed 3324.30 samples/sec   Loss 0.4043   LearningRate 0.0057   Epoch: 15   Global Step: 253860   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 02:06:05,358-Speed 3332.06 samples/sec   Loss 0.3972   LearningRate 0.0057   Epoch: 15   Global Step: 253870   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 02:06:08,427-Speed 3338.38 samples/sec   Loss 0.4252   LearningRate 0.0057   Epoch: 15   Global Step: 253880   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 02:06:11,491-Speed 3342.33 samples/sec   Loss 0.3973   LearningRate 0.0057   Epoch: 15   Global Step: 253890   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 02:06:14,643-Speed 3249.89 samples/sec   Loss 0.4057   LearningRate 0.0057   Epoch: 15   Global Step: 253900   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 02:06:17,706-Speed 3343.11 samples/sec   Loss 0.4062   LearningRate 0.0057   Epoch: 15   Global Step: 253910   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 02:06:20,795-Speed 3315.57 samples/sec   Loss 0.3850   LearningRate 0.0057   Epoch: 15   Global Step: 253920   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 02:06:23,890-Speed 3309.28 samples/sec   Loss 0.3837   LearningRate 0.0057   Epoch: 15   Global Step: 253930   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 02:06:26,970-Speed 3325.99 samples/sec   Loss 0.4076   LearningRate 0.0057   Epoch: 15   Global Step: 253940   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 02:06:30,061-Speed 3313.13 samples/sec   Loss 0.3966   LearningRate 0.0057   Epoch: 15   Global Step: 253950   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 02:06:33,165-Speed 3299.75 samples/sec   Loss 0.4206   LearningRate 0.0057   Epoch: 15   Global Step: 253960   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 02:06:36,246-Speed 3324.80 samples/sec   Loss 0.3803   LearningRate 0.0057   Epoch: 15   Global Step: 253970   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 02:06:39,316-Speed 3336.09 samples/sec   Loss 0.3954   LearningRate 0.0057   Epoch: 15   Global Step: 253980   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 02:06:42,402-Speed 3319.60 samples/sec   Loss 0.3865   LearningRate 0.0057   Epoch: 15   Global Step: 253990   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 02:06:45,503-Speed 3302.17 samples/sec   Loss 0.3980   LearningRate 0.0057   Epoch: 15   Global Step: 254000   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 02:07:29,507-[lfw][254000]XNorm: 20.600824
Training: 2022-04-12 02:07:29,507-[lfw][254000]Accuracy-Flip: 0.99783+-0.00259
Training: 2022-04-12 02:07:29,508-[lfw][254000]Accuracy-Highest: 0.99817
Training: 2022-04-12 02:08:20,672-[cfp_fp][254000]XNorm: 22.231478
Training: 2022-04-12 02:08:20,673-[cfp_fp][254000]Accuracy-Flip: 0.99071+-0.00492
Training: 2022-04-12 02:08:20,673-[cfp_fp][254000]Accuracy-Highest: 0.99186
Training: 2022-04-12 02:09:04,925-[agedb_30][254000]XNorm: 22.373559
Training: 2022-04-12 02:09:04,926-[agedb_30][254000]Accuracy-Flip: 0.98317+-0.00660
Training: 2022-04-12 02:09:04,926-[agedb_30][254000]Accuracy-Highest: 0.98583
Training: 2022-04-12 02:09:07,995-Speed 71.86 samples/sec   Loss 0.3760   LearningRate 0.0057   Epoch: 15   Global Step: 254010   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 02:09:11,045-Speed 3358.19 samples/sec   Loss 0.4111   LearningRate 0.0057   Epoch: 15   Global Step: 254020   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 02:09:14,118-Speed 3333.78 samples/sec   Loss 0.3882   LearningRate 0.0057   Epoch: 15   Global Step: 254030   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 02:09:17,171-Speed 3354.24 samples/sec   Loss 0.4249   LearningRate 0.0057   Epoch: 15   Global Step: 254040   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 02:09:20,323-Speed 3249.10 samples/sec   Loss 0.3982   LearningRate 0.0057   Epoch: 15   Global Step: 254050   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 02:09:23,380-Speed 3350.95 samples/sec   Loss 0.4027   LearningRate 0.0057   Epoch: 15   Global Step: 254060   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 02:09:26,444-Speed 3342.25 samples/sec   Loss 0.4112   LearningRate 0.0057   Epoch: 15   Global Step: 254070   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 02:09:29,516-Speed 3334.40 samples/sec   Loss 0.3784   LearningRate 0.0057   Epoch: 15   Global Step: 254080   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 02:09:32,606-Speed 3314.80 samples/sec   Loss 0.4134   LearningRate 0.0057   Epoch: 15   Global Step: 254090   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 02:09:35,811-Speed 3195.96 samples/sec   Loss 0.4128   LearningRate 0.0057   Epoch: 15   Global Step: 254100   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 02:09:38,889-Speed 3328.05 samples/sec   Loss 0.4137   LearningRate 0.0057   Epoch: 15   Global Step: 254110   Fp16 Grad Scale: 262144   Required: 8 hours
Training: 2022-04-12 02:09:41,942-Speed 3354.24 samples/sec   Loss 0.4092   LearningRate 0.0057   Epoch: 15   Global Step: 254120   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 02:09:45,106-Speed 3236.73 samples/sec   Loss 0.3931   LearningRate 0.0057   Epoch: 15   Global Step: 254130   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 02:09:48,255-Speed 3253.25 samples/sec   Loss 0.4212   LearningRate 0.0057   Epoch: 15   Global Step: 254140   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 02:09:51,400-Speed 3256.31 samples/sec   Loss 0.3924   LearningRate 0.0057   Epoch: 15   Global Step: 254150   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 02:09:54,464-Speed 3342.91 samples/sec   Loss 0.4304   LearningRate 0.0057   Epoch: 15   Global Step: 254160   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 02:09:57,531-Speed 3339.41 samples/sec   Loss 0.4092   LearningRate 0.0057   Epoch: 15   Global Step: 254170   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 02:10:00,593-Speed 3344.96 samples/sec   Loss 0.4045   LearningRate 0.0057   Epoch: 15   Global Step: 254180   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 02:10:03,676-Speed 3323.17 samples/sec   Loss 0.4047   LearningRate 0.0057   Epoch: 15   Global Step: 254190   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-12 02:10:06,741-Speed 3340.94 samples/sec   Loss 0.4240   LearningRate 0.0057   Epoch: 15   Global Step: 254200   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-12 02:10:09,822-Speed 3324.70 samples/sec   Loss 0.3934   LearningRate 0.0057   Epoch: 15   Global Step: 254210   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-12 02:10:12,882-Speed 3346.99 samples/sec   Loss 0.4225   LearningRate 0.0057   Epoch: 15   Global Step: 254220   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-12 02:10:15,939-Speed 3350.91 samples/sec   Loss 0.3783   LearningRate 0.0057   Epoch: 15   Global Step: 254230   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-12 02:10:18,997-Speed 3349.48 samples/sec   Loss 0.4107   LearningRate 0.0057   Epoch: 15   Global Step: 254240   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-12 02:10:22,073-Speed 3329.23 samples/sec   Loss 0.4068   LearningRate 0.0057   Epoch: 15   Global Step: 254250   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-12 02:10:25,134-Speed 3345.84 samples/sec   Loss 0.4377   LearningRate 0.0057   Epoch: 15   Global Step: 254260   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-12 02:10:28,200-Speed 3342.31 samples/sec   Loss 0.4004   LearningRate 0.0057   Epoch: 15   Global Step: 254270   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-12 02:10:31,286-Speed 3318.61 samples/sec   Loss 0.4071   LearningRate 0.0057   Epoch: 15   Global Step: 254280   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-12 02:10:34,344-Speed 3350.27 samples/sec   Loss 0.4260   LearningRate 0.0057   Epoch: 15   Global Step: 254290   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 02:10:37,420-Speed 3328.67 samples/sec   Loss 0.4067   LearningRate 0.0057   Epoch: 15   Global Step: 254300   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 02:10:40,503-Speed 3322.71 samples/sec   Loss 0.3836   LearningRate 0.0057   Epoch: 15   Global Step: 254310   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 02:10:43,574-Speed 3334.96 samples/sec   Loss 0.4193   LearningRate 0.0057   Epoch: 15   Global Step: 254320   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 02:10:46,680-Speed 3298.00 samples/sec   Loss 0.4203   LearningRate 0.0057   Epoch: 15   Global Step: 254330   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 02:10:49,752-Speed 3333.75 samples/sec   Loss 0.4007   LearningRate 0.0057   Epoch: 15   Global Step: 254340   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 02:10:52,807-Speed 3353.04 samples/sec   Loss 0.3830   LearningRate 0.0057   Epoch: 15   Global Step: 254350   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 02:10:55,898-Speed 3312.78 samples/sec   Loss 0.3971   LearningRate 0.0057   Epoch: 15   Global Step: 254360   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 02:10:59,034-Speed 3266.31 samples/sec   Loss 0.4151   LearningRate 0.0057   Epoch: 15   Global Step: 254370   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 02:11:02,111-Speed 3328.71 samples/sec   Loss 0.4013   LearningRate 0.0057   Epoch: 15   Global Step: 254380   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 02:11:05,180-Speed 3337.62 samples/sec   Loss 0.3853   LearningRate 0.0057   Epoch: 15   Global Step: 254390   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 02:11:08,262-Speed 3323.40 samples/sec   Loss 0.4187   LearningRate 0.0057   Epoch: 15   Global Step: 254400   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 02:11:11,320-Speed 3349.15 samples/sec   Loss 0.3906   LearningRate 0.0057   Epoch: 15   Global Step: 254410   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 02:11:14,391-Speed 3335.22 samples/sec   Loss 0.3931   LearningRate 0.0057   Epoch: 15   Global Step: 254420   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 02:11:17,450-Speed 3347.88 samples/sec   Loss 0.4049   LearningRate 0.0057   Epoch: 15   Global Step: 254430   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 02:11:20,628-Speed 3223.18 samples/sec   Loss 0.3900   LearningRate 0.0057   Epoch: 15   Global Step: 254440   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 02:11:23,674-Speed 3361.88 samples/sec   Loss 0.4159   LearningRate 0.0057   Epoch: 15   Global Step: 254450   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 02:11:26,760-Speed 3319.54 samples/sec   Loss 0.4077   LearningRate 0.0057   Epoch: 15   Global Step: 254460   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 02:11:29,832-Speed 3335.22 samples/sec   Loss 0.4123   LearningRate 0.0057   Epoch: 15   Global Step: 254470   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 02:11:32,904-Speed 3333.82 samples/sec   Loss 0.3841   LearningRate 0.0056   Epoch: 15   Global Step: 254480   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 02:11:35,959-Speed 3353.15 samples/sec   Loss 0.4062   LearningRate 0.0056   Epoch: 15   Global Step: 254490   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 02:11:39,056-Speed 3306.70 samples/sec   Loss 0.3951   LearningRate 0.0056   Epoch: 15   Global Step: 254500   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 02:11:42,189-Speed 3268.54 samples/sec   Loss 0.4183   LearningRate 0.0056   Epoch: 15   Global Step: 254510   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 02:11:45,256-Speed 3340.70 samples/sec   Loss 0.4241   LearningRate 0.0056   Epoch: 15   Global Step: 254520   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 02:11:48,311-Speed 3352.42 samples/sec   Loss 0.4587   LearningRate 0.0056   Epoch: 15   Global Step: 254530   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 02:11:51,387-Speed 3330.16 samples/sec   Loss 0.4042   LearningRate 0.0056   Epoch: 15   Global Step: 254540   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 02:11:54,447-Speed 3346.38 samples/sec   Loss 0.4132   LearningRate 0.0056   Epoch: 15   Global Step: 254550   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 02:11:57,504-Speed 3350.93 samples/sec   Loss 0.4092   LearningRate 0.0056   Epoch: 15   Global Step: 254560   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 02:12:00,563-Speed 3348.33 samples/sec   Loss 0.4240   LearningRate 0.0056   Epoch: 15   Global Step: 254570   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 02:12:03,634-Speed 3334.86 samples/sec   Loss 0.3997   LearningRate 0.0056   Epoch: 15   Global Step: 254580   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 02:12:06,703-Speed 3336.83 samples/sec   Loss 0.4147   LearningRate 0.0056   Epoch: 15   Global Step: 254590   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 02:12:09,805-Speed 3302.25 samples/sec   Loss 0.3974   LearningRate 0.0056   Epoch: 15   Global Step: 254600   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 02:12:12,862-Speed 3351.01 samples/sec   Loss 0.3882   LearningRate 0.0056   Epoch: 15   Global Step: 254610   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 02:12:15,925-Speed 3344.04 samples/sec   Loss 0.3974   LearningRate 0.0056   Epoch: 15   Global Step: 254620   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 02:12:18,989-Speed 3342.27 samples/sec   Loss 0.3915   LearningRate 0.0056   Epoch: 15   Global Step: 254630   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 02:12:22,143-Speed 3247.45 samples/sec   Loss 0.4029   LearningRate 0.0056   Epoch: 15   Global Step: 254640   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 02:12:25,203-Speed 3346.65 samples/sec   Loss 0.3964   LearningRate 0.0056   Epoch: 15   Global Step: 254650   Fp16 Grad Scale: 262144   Required: 8 hours
Training: 2022-04-12 02:12:28,257-Speed 3353.72 samples/sec   Loss 0.3748   LearningRate 0.0056   Epoch: 15   Global Step: 254660   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 02:12:31,339-Speed 3323.12 samples/sec   Loss 0.4183   LearningRate 0.0056   Epoch: 15   Global Step: 254670   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 02:12:34,439-Speed 3305.04 samples/sec   Loss 0.4155   LearningRate 0.0056   Epoch: 15   Global Step: 254680   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 02:12:37,577-Speed 3264.11 samples/sec   Loss 0.4002   LearningRate 0.0056   Epoch: 15   Global Step: 254690   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 02:12:40,640-Speed 3343.98 samples/sec   Loss 0.4157   LearningRate 0.0056   Epoch: 15   Global Step: 254700   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 02:12:43,706-Speed 3339.92 samples/sec   Loss 0.4243   LearningRate 0.0056   Epoch: 15   Global Step: 254710   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 02:12:46,769-Speed 3343.53 samples/sec   Loss 0.4263   LearningRate 0.0056   Epoch: 15   Global Step: 254720   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 02:12:49,835-Speed 3340.74 samples/sec   Loss 0.3993   LearningRate 0.0056   Epoch: 15   Global Step: 254730   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 02:12:52,884-Speed 3359.00 samples/sec   Loss 0.3995   LearningRate 0.0056   Epoch: 15   Global Step: 254740   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 02:12:55,968-Speed 3320.95 samples/sec   Loss 0.3902   LearningRate 0.0056   Epoch: 15   Global Step: 254750   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 02:12:59,028-Speed 3347.73 samples/sec   Loss 0.3831   LearningRate 0.0056   Epoch: 15   Global Step: 254760   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 02:13:02,102-Speed 3331.45 samples/sec   Loss 0.4170   LearningRate 0.0056   Epoch: 15   Global Step: 254770   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 02:13:05,161-Speed 3348.82 samples/sec   Loss 0.3911   LearningRate 0.0056   Epoch: 15   Global Step: 254780   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 02:13:08,220-Speed 3348.14 samples/sec   Loss 0.3943   LearningRate 0.0056   Epoch: 15   Global Step: 254790   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 02:13:11,319-Speed 3305.59 samples/sec   Loss 0.4218   LearningRate 0.0056   Epoch: 15   Global Step: 254800   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 02:13:14,401-Speed 3323.29 samples/sec   Loss 0.4163   LearningRate 0.0056   Epoch: 15   Global Step: 254810   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 02:13:17,520-Speed 3283.45 samples/sec   Loss 0.4215   LearningRate 0.0056   Epoch: 15   Global Step: 254820   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 02:13:20,634-Speed 3288.87 samples/sec   Loss 0.3973   LearningRate 0.0056   Epoch: 15   Global Step: 254830   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 02:13:23,776-Speed 3260.23 samples/sec   Loss 0.3849   LearningRate 0.0056   Epoch: 15   Global Step: 254840   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 02:13:26,887-Speed 3292.13 samples/sec   Loss 0.4004   LearningRate 0.0056   Epoch: 15   Global Step: 254850   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 02:13:29,991-Speed 3300.07 samples/sec   Loss 0.4080   LearningRate 0.0056   Epoch: 15   Global Step: 254860   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 02:13:33,105-Speed 3288.94 samples/sec   Loss 0.4008   LearningRate 0.0056   Epoch: 15   Global Step: 254870   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 02:13:36,238-Speed 3269.30 samples/sec   Loss 0.4055   LearningRate 0.0056   Epoch: 15   Global Step: 254880   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 02:13:39,391-Speed 3248.61 samples/sec   Loss 0.4238   LearningRate 0.0056   Epoch: 15   Global Step: 254890   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 02:13:42,602-Speed 3189.55 samples/sec   Loss 0.4060   LearningRate 0.0056   Epoch: 15   Global Step: 254900   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 02:13:45,757-Speed 3245.97 samples/sec   Loss 0.3906   LearningRate 0.0056   Epoch: 15   Global Step: 254910   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 02:13:48,935-Speed 3222.71 samples/sec   Loss 0.4197   LearningRate 0.0056   Epoch: 15   Global Step: 254920   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 02:13:52,055-Speed 3282.63 samples/sec   Loss 0.4127   LearningRate 0.0056   Epoch: 15   Global Step: 254930   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 02:13:55,135-Speed 3326.18 samples/sec   Loss 0.4115   LearningRate 0.0056   Epoch: 15   Global Step: 254940   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 02:13:58,244-Speed 3294.46 samples/sec   Loss 0.4071   LearningRate 0.0056   Epoch: 15   Global Step: 254950   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 02:14:01,316-Speed 3333.52 samples/sec   Loss 0.4308   LearningRate 0.0056   Epoch: 15   Global Step: 254960   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 02:14:04,384-Speed 3338.79 samples/sec   Loss 0.3891   LearningRate 0.0056   Epoch: 15   Global Step: 254970   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 02:14:07,445-Speed 3345.99 samples/sec   Loss 0.3894   LearningRate 0.0056   Epoch: 15   Global Step: 254980   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 02:14:10,572-Speed 3275.60 samples/sec   Loss 0.4187   LearningRate 0.0056   Epoch: 15   Global Step: 254990   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 02:14:13,639-Speed 3339.12 samples/sec   Loss 0.4095   LearningRate 0.0056   Epoch: 15   Global Step: 255000   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 02:14:16,742-Speed 3301.19 samples/sec   Loss 0.3988   LearningRate 0.0056   Epoch: 15   Global Step: 255010   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 02:14:19,807-Speed 3341.72 samples/sec   Loss 0.4131   LearningRate 0.0056   Epoch: 15   Global Step: 255020   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 02:14:22,889-Speed 3323.39 samples/sec   Loss 0.4206   LearningRate 0.0056   Epoch: 15   Global Step: 255030   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 02:14:25,948-Speed 3347.74 samples/sec   Loss 0.3862   LearningRate 0.0056   Epoch: 15   Global Step: 255040   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 02:14:29,009-Speed 3347.17 samples/sec   Loss 0.4075   LearningRate 0.0056   Epoch: 15   Global Step: 255050   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 02:14:32,067-Speed 3348.93 samples/sec   Loss 0.4171   LearningRate 0.0056   Epoch: 15   Global Step: 255060   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 02:14:35,136-Speed 3337.34 samples/sec   Loss 0.3875   LearningRate 0.0056   Epoch: 15   Global Step: 255070   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 02:14:38,206-Speed 3336.15 samples/sec   Loss 0.4017   LearningRate 0.0056   Epoch: 15   Global Step: 255080   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 02:14:41,269-Speed 3344.33 samples/sec   Loss 0.3973   LearningRate 0.0056   Epoch: 15   Global Step: 255090   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 02:14:44,335-Speed 3340.37 samples/sec   Loss 0.3832   LearningRate 0.0056   Epoch: 15   Global Step: 255100   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 02:14:47,401-Speed 3339.76 samples/sec   Loss 0.3869   LearningRate 0.0056   Epoch: 15   Global Step: 255110   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 02:14:50,466-Speed 3342.25 samples/sec   Loss 0.3809   LearningRate 0.0056   Epoch: 15   Global Step: 255120   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 02:14:53,528-Speed 3344.66 samples/sec   Loss 0.4133   LearningRate 0.0056   Epoch: 15   Global Step: 255130   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 02:14:56,585-Speed 3351.06 samples/sec   Loss 0.4063   LearningRate 0.0056   Epoch: 15   Global Step: 255140   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 02:14:59,679-Speed 3310.28 samples/sec   Loss 0.3956   LearningRate 0.0056   Epoch: 15   Global Step: 255150   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 02:15:02,754-Speed 3331.14 samples/sec   Loss 0.4444   LearningRate 0.0056   Epoch: 15   Global Step: 255160   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 02:15:05,834-Speed 3324.94 samples/sec   Loss 0.4007   LearningRate 0.0056   Epoch: 15   Global Step: 255170   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 02:15:08,980-Speed 3255.48 samples/sec   Loss 0.4147   LearningRate 0.0055   Epoch: 15   Global Step: 255180   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 02:15:12,080-Speed 3304.05 samples/sec   Loss 0.4125   LearningRate 0.0055   Epoch: 15   Global Step: 255190   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 02:15:15,195-Speed 3287.81 samples/sec   Loss 0.4237   LearningRate 0.0055   Epoch: 15   Global Step: 255200   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 02:15:18,372-Speed 3224.43 samples/sec   Loss 0.4026   LearningRate 0.0055   Epoch: 15   Global Step: 255210   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 02:15:21,430-Speed 3349.06 samples/sec   Loss 0.4011   LearningRate 0.0055   Epoch: 15   Global Step: 255220   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 02:15:24,489-Speed 3348.60 samples/sec   Loss 0.4114   LearningRate 0.0055   Epoch: 15   Global Step: 255230   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 02:15:27,551-Speed 3345.46 samples/sec   Loss 0.4175   LearningRate 0.0055   Epoch: 15   Global Step: 255240   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 02:15:30,615-Speed 3342.67 samples/sec   Loss 0.4165   LearningRate 0.0055   Epoch: 15   Global Step: 255250   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 02:15:33,684-Speed 3337.92 samples/sec   Loss 0.4270   LearningRate 0.0055   Epoch: 15   Global Step: 255260   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 02:15:36,803-Speed 3282.85 samples/sec   Loss 0.4001   LearningRate 0.0055   Epoch: 15   Global Step: 255270   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 02:15:39,893-Speed 3314.79 samples/sec   Loss 0.4017   LearningRate 0.0055   Epoch: 15   Global Step: 255280   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 02:15:42,976-Speed 3322.21 samples/sec   Loss 0.4065   LearningRate 0.0055   Epoch: 15   Global Step: 255290   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 02:15:46,040-Speed 3343.02 samples/sec   Loss 0.3983   LearningRate 0.0055   Epoch: 15   Global Step: 255300   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 02:15:49,115-Speed 3330.86 samples/sec   Loss 0.4085   LearningRate 0.0055   Epoch: 15   Global Step: 255310   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 02:15:52,265-Speed 3251.73 samples/sec   Loss 0.3944   LearningRate 0.0055   Epoch: 15   Global Step: 255320   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 02:15:55,366-Speed 3302.75 samples/sec   Loss 0.4201   LearningRate 0.0055   Epoch: 15   Global Step: 255330   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 02:15:58,493-Speed 3275.16 samples/sec   Loss 0.4154   LearningRate 0.0055   Epoch: 15   Global Step: 255340   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 02:16:01,584-Speed 3314.02 samples/sec   Loss 0.4353   LearningRate 0.0055   Epoch: 15   Global Step: 255350   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 02:16:04,646-Speed 3344.68 samples/sec   Loss 0.4178   LearningRate 0.0055   Epoch: 15   Global Step: 255360   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 02:16:07,706-Speed 3347.70 samples/sec   Loss 0.4062   LearningRate 0.0055   Epoch: 15   Global Step: 255370   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 02:16:10,799-Speed 3311.22 samples/sec   Loss 0.4062   LearningRate 0.0055   Epoch: 15   Global Step: 255380   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 02:16:13,861-Speed 3345.55 samples/sec   Loss 0.4117   LearningRate 0.0055   Epoch: 15   Global Step: 255390   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 02:16:16,944-Speed 3321.45 samples/sec   Loss 0.3978   LearningRate 0.0055   Epoch: 15   Global Step: 255400   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 02:16:20,013-Speed 3337.95 samples/sec   Loss 0.4106   LearningRate 0.0055   Epoch: 15   Global Step: 255410   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 02:16:23,076-Speed 3343.16 samples/sec   Loss 0.4211   LearningRate 0.0055   Epoch: 15   Global Step: 255420   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 02:16:26,160-Speed 3321.17 samples/sec   Loss 0.4179   LearningRate 0.0055   Epoch: 15   Global Step: 255430   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 02:16:29,222-Speed 3344.89 samples/sec   Loss 0.4143   LearningRate 0.0055   Epoch: 15   Global Step: 255440   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 02:16:32,357-Speed 3267.45 samples/sec   Loss 0.4393   LearningRate 0.0055   Epoch: 15   Global Step: 255450   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 02:16:35,437-Speed 3325.20 samples/sec   Loss 0.4094   LearningRate 0.0055   Epoch: 15   Global Step: 255460   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 02:16:38,526-Speed 3315.74 samples/sec   Loss 0.4168   LearningRate 0.0055   Epoch: 15   Global Step: 255470   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 02:16:41,592-Speed 3341.46 samples/sec   Loss 0.4212   LearningRate 0.0055   Epoch: 15   Global Step: 255480   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 02:16:44,658-Speed 3340.38 samples/sec   Loss 0.4175   LearningRate 0.0055   Epoch: 15   Global Step: 255490   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 02:16:47,715-Speed 3350.25 samples/sec   Loss 0.4137   LearningRate 0.0055   Epoch: 15   Global Step: 255500   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 02:16:50,819-Speed 3299.07 samples/sec   Loss 0.4236   LearningRate 0.0055   Epoch: 15   Global Step: 255510   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 02:16:53,983-Speed 3237.67 samples/sec   Loss 0.4226   LearningRate 0.0055   Epoch: 15   Global Step: 255520   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 02:16:57,098-Speed 3288.31 samples/sec   Loss 0.3882   LearningRate 0.0055   Epoch: 15   Global Step: 255530   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 02:17:00,196-Speed 3305.17 samples/sec   Loss 0.4124   LearningRate 0.0055   Epoch: 15   Global Step: 255540   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 02:17:03,282-Speed 3319.54 samples/sec   Loss 0.3827   LearningRate 0.0055   Epoch: 15   Global Step: 255550   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 02:17:06,364-Speed 3322.81 samples/sec   Loss 0.4058   LearningRate 0.0055   Epoch: 15   Global Step: 255560   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 02:17:09,474-Speed 3294.82 samples/sec   Loss 0.3911   LearningRate 0.0055   Epoch: 15   Global Step: 255570   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 02:17:12,620-Speed 3255.12 samples/sec   Loss 0.4210   LearningRate 0.0055   Epoch: 15   Global Step: 255580   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 02:17:15,699-Speed 3326.30 samples/sec   Loss 0.4259   LearningRate 0.0055   Epoch: 15   Global Step: 255590   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 02:17:18,764-Speed 3341.50 samples/sec   Loss 0.4266   LearningRate 0.0055   Epoch: 15   Global Step: 255600   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 02:17:21,833-Speed 3337.96 samples/sec   Loss 0.3798   LearningRate 0.0055   Epoch: 15   Global Step: 255610   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 02:17:24,983-Speed 3250.78 samples/sec   Loss 0.3815   LearningRate 0.0055   Epoch: 15   Global Step: 255620   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 02:17:28,090-Speed 3297.19 samples/sec   Loss 0.4303   LearningRate 0.0055   Epoch: 15   Global Step: 255630   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 02:17:31,173-Speed 3321.33 samples/sec   Loss 0.3902   LearningRate 0.0055   Epoch: 15   Global Step: 255640   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 02:17:34,237-Speed 3343.37 samples/sec   Loss 0.4254   LearningRate 0.0055   Epoch: 15   Global Step: 255650   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 02:17:37,327-Speed 3315.57 samples/sec   Loss 0.4118   LearningRate 0.0055   Epoch: 15   Global Step: 255660   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 02:17:40,418-Speed 3313.81 samples/sec   Loss 0.4349   LearningRate 0.0055   Epoch: 15   Global Step: 255670   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 02:17:43,498-Speed 3325.01 samples/sec   Loss 0.4431   LearningRate 0.0055   Epoch: 15   Global Step: 255680   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 02:17:46,561-Speed 3343.83 samples/sec   Loss 0.3986   LearningRate 0.0055   Epoch: 15   Global Step: 255690   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 02:17:49,615-Speed 3353.67 samples/sec   Loss 0.4041   LearningRate 0.0055   Epoch: 15   Global Step: 255700   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 02:17:52,694-Speed 3326.07 samples/sec   Loss 0.4492   LearningRate 0.0055   Epoch: 15   Global Step: 255710   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 02:17:55,776-Speed 3323.10 samples/sec   Loss 0.4119   LearningRate 0.0055   Epoch: 15   Global Step: 255720   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 02:17:58,865-Speed 3315.99 samples/sec   Loss 0.4379   LearningRate 0.0055   Epoch: 15   Global Step: 255730   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 02:18:01,959-Speed 3310.32 samples/sec   Loss 0.4183   LearningRate 0.0055   Epoch: 15   Global Step: 255740   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 02:18:05,070-Speed 3292.25 samples/sec   Loss 0.3927   LearningRate 0.0055   Epoch: 15   Global Step: 255750   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 02:18:08,141-Speed 3335.48 samples/sec   Loss 0.4218   LearningRate 0.0055   Epoch: 15   Global Step: 255760   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 02:18:11,226-Speed 3319.87 samples/sec   Loss 0.4252   LearningRate 0.0055   Epoch: 15   Global Step: 255770   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 02:18:14,317-Speed 3313.68 samples/sec   Loss 0.4045   LearningRate 0.0055   Epoch: 15   Global Step: 255780   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 02:18:17,398-Speed 3324.06 samples/sec   Loss 0.4049   LearningRate 0.0055   Epoch: 15   Global Step: 255790   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 02:18:20,480-Speed 3323.22 samples/sec   Loss 0.4047   LearningRate 0.0055   Epoch: 15   Global Step: 255800   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 02:18:23,585-Speed 3299.47 samples/sec   Loss 0.4167   LearningRate 0.0055   Epoch: 15   Global Step: 255810   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 02:18:26,687-Speed 3301.57 samples/sec   Loss 0.4066   LearningRate 0.0055   Epoch: 15   Global Step: 255820   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 02:18:29,754-Speed 3339.72 samples/sec   Loss 0.4202   LearningRate 0.0055   Epoch: 15   Global Step: 255830   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 02:18:32,842-Speed 3316.38 samples/sec   Loss 0.4033   LearningRate 0.0055   Epoch: 15   Global Step: 255840   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 02:18:35,964-Speed 3281.19 samples/sec   Loss 0.4167   LearningRate 0.0055   Epoch: 15   Global Step: 255850   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 02:18:39,051-Speed 3317.89 samples/sec   Loss 0.4149   LearningRate 0.0055   Epoch: 15   Global Step: 255860   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 02:18:42,156-Speed 3298.50 samples/sec   Loss 0.4306   LearningRate 0.0055   Epoch: 15   Global Step: 255870   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 02:18:45,249-Speed 3311.32 samples/sec   Loss 0.4197   LearningRate 0.0055   Epoch: 15   Global Step: 255880   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 02:18:48,344-Speed 3308.39 samples/sec   Loss 0.4343   LearningRate 0.0054   Epoch: 15   Global Step: 255890   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 02:18:51,438-Speed 3311.24 samples/sec   Loss 0.4039   LearningRate 0.0054   Epoch: 15   Global Step: 255900   Fp16 Grad Scale: 262144   Required: 8 hours
Training: 2022-04-12 02:18:54,499-Speed 3346.41 samples/sec   Loss 0.4106   LearningRate 0.0054   Epoch: 15   Global Step: 255910   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 02:18:57,657-Speed 3242.57 samples/sec   Loss 0.4187   LearningRate 0.0054   Epoch: 15   Global Step: 255920   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 02:19:00,802-Speed 3256.73 samples/sec   Loss 0.4252   LearningRate 0.0054   Epoch: 15   Global Step: 255930   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 02:19:03,889-Speed 3318.44 samples/sec   Loss 0.3930   LearningRate 0.0054   Epoch: 15   Global Step: 255940   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 02:19:06,983-Speed 3310.36 samples/sec   Loss 0.4113   LearningRate 0.0054   Epoch: 15   Global Step: 255950   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 02:19:10,069-Speed 3318.62 samples/sec   Loss 0.4059   LearningRate 0.0054   Epoch: 15   Global Step: 255960   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 02:19:13,195-Speed 3276.54 samples/sec   Loss 0.4354   LearningRate 0.0054   Epoch: 15   Global Step: 255970   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 02:19:16,262-Speed 3339.41 samples/sec   Loss 0.4178   LearningRate 0.0054   Epoch: 15   Global Step: 255980   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 02:19:19,387-Speed 3277.51 samples/sec   Loss 0.4276   LearningRate 0.0054   Epoch: 15   Global Step: 255990   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 02:19:22,608-Speed 3180.09 samples/sec   Loss 0.4289   LearningRate 0.0054   Epoch: 15   Global Step: 256000   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 02:20:06,321-[lfw][256000]XNorm: 21.415534
Training: 2022-04-12 02:20:06,324-[lfw][256000]Accuracy-Flip: 0.99767+-0.00249
Training: 2022-04-12 02:20:06,324-[lfw][256000]Accuracy-Highest: 0.99817
Training: 2022-04-12 02:20:57,434-[cfp_fp][256000]XNorm: 22.661393
Training: 2022-04-12 02:20:57,434-[cfp_fp][256000]Accuracy-Flip: 0.99057+-0.00500
Training: 2022-04-12 02:20:57,435-[cfp_fp][256000]Accuracy-Highest: 0.99186
Training: 2022-04-12 02:21:41,420-[agedb_30][256000]XNorm: 22.993931
Training: 2022-04-12 02:21:41,421-[agedb_30][256000]Accuracy-Flip: 0.98650+-0.00589
Training: 2022-04-12 02:21:41,421-[agedb_30][256000]Accuracy-Highest: 0.98650
Training: 2022-04-12 02:21:44,507-Speed 72.16 samples/sec   Loss 0.4547   LearningRate 0.0054   Epoch: 15   Global Step: 256010   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 02:21:47,557-Speed 3358.17 samples/sec   Loss 0.4581   LearningRate 0.0054   Epoch: 15   Global Step: 256020   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 02:21:50,693-Speed 3266.68 samples/sec   Loss 0.4167   LearningRate 0.0054   Epoch: 15   Global Step: 256030   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 02:21:53,814-Speed 3281.49 samples/sec   Loss 0.4005   LearningRate 0.0054   Epoch: 15   Global Step: 256040   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 02:21:56,912-Speed 3306.28 samples/sec   Loss 0.4200   LearningRate 0.0054   Epoch: 15   Global Step: 256050   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 02:21:59,964-Speed 3355.76 samples/sec   Loss 0.4006   LearningRate 0.0054   Epoch: 15   Global Step: 256060   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 02:22:03,038-Speed 3331.96 samples/sec   Loss 0.4253   LearningRate 0.0054   Epoch: 15   Global Step: 256070   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 02:22:06,149-Speed 3291.65 samples/sec   Loss 0.3907   LearningRate 0.0054   Epoch: 15   Global Step: 256080   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 02:22:09,243-Speed 3310.54 samples/sec   Loss 0.4129   LearningRate 0.0054   Epoch: 15   Global Step: 256090   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 02:22:12,309-Speed 3340.89 samples/sec   Loss 0.4095   LearningRate 0.0054   Epoch: 15   Global Step: 256100   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 02:22:15,384-Speed 3330.44 samples/sec   Loss 0.3973   LearningRate 0.0054   Epoch: 15   Global Step: 256110   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 02:22:18,482-Speed 3307.03 samples/sec   Loss 0.4342   LearningRate 0.0054   Epoch: 15   Global Step: 256120   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 02:22:21,571-Speed 3314.84 samples/sec   Loss 0.4227   LearningRate 0.0054   Epoch: 15   Global Step: 256130   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 02:22:24,712-Speed 3260.92 samples/sec   Loss 0.4158   LearningRate 0.0054   Epoch: 15   Global Step: 256140   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 02:22:27,809-Speed 3307.84 samples/sec   Loss 0.4085   LearningRate 0.0054   Epoch: 15   Global Step: 256150   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 02:22:30,893-Speed 3320.21 samples/sec   Loss 0.4366   LearningRate 0.0054   Epoch: 15   Global Step: 256160   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 02:22:34,016-Speed 3279.94 samples/sec   Loss 0.4293   LearningRate 0.0054   Epoch: 15   Global Step: 256170   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 02:22:37,150-Speed 3268.12 samples/sec   Loss 0.4355   LearningRate 0.0054   Epoch: 15   Global Step: 256180   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 02:22:40,288-Speed 3264.24 samples/sec   Loss 0.4192   LearningRate 0.0054   Epoch: 15   Global Step: 256190   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 02:22:43,425-Speed 3264.84 samples/sec   Loss 0.4210   LearningRate 0.0054   Epoch: 15   Global Step: 256200   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 02:22:46,611-Speed 3214.69 samples/sec   Loss 0.4160   LearningRate 0.0054   Epoch: 15   Global Step: 256210   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 02:22:49,725-Speed 3289.00 samples/sec   Loss 0.4199   LearningRate 0.0054   Epoch: 15   Global Step: 256220   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 02:22:52,789-Speed 3343.41 samples/sec   Loss 0.4028   LearningRate 0.0054   Epoch: 15   Global Step: 256230   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 02:22:55,869-Speed 3325.71 samples/sec   Loss 0.4753   LearningRate 0.0054   Epoch: 15   Global Step: 256240   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 02:22:58,931-Speed 3344.92 samples/sec   Loss 0.4431   LearningRate 0.0054   Epoch: 15   Global Step: 256250   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 02:23:02,017-Speed 3318.06 samples/sec   Loss 0.4343   LearningRate 0.0054   Epoch: 15   Global Step: 256260   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 02:23:05,065-Speed 3360.17 samples/sec   Loss 0.4238   LearningRate 0.0054   Epoch: 15   Global Step: 256270   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 02:23:08,157-Speed 3313.14 samples/sec   Loss 0.4244   LearningRate 0.0054   Epoch: 15   Global Step: 256280   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 02:23:11,338-Speed 3219.98 samples/sec   Loss 0.4491   LearningRate 0.0054   Epoch: 15   Global Step: 256290   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 02:23:14,499-Speed 3240.44 samples/sec   Loss 0.4207   LearningRate 0.0054   Epoch: 15   Global Step: 256300   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 02:23:17,568-Speed 3337.10 samples/sec   Loss 0.3968   LearningRate 0.0054   Epoch: 15   Global Step: 256310   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 02:23:20,668-Speed 3304.44 samples/sec   Loss 0.3972   LearningRate 0.0054   Epoch: 15   Global Step: 256320   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 02:23:23,762-Speed 3309.39 samples/sec   Loss 0.4122   LearningRate 0.0054   Epoch: 15   Global Step: 256330   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 02:23:26,818-Speed 3351.60 samples/sec   Loss 0.3998   LearningRate 0.0054   Epoch: 15   Global Step: 256340   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 02:23:29,969-Speed 3250.80 samples/sec   Loss 0.4172   LearningRate 0.0054   Epoch: 15   Global Step: 256350   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 02:23:33,047-Speed 3327.17 samples/sec   Loss 0.4289   LearningRate 0.0054   Epoch: 15   Global Step: 256360   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 02:23:36,194-Speed 3254.89 samples/sec   Loss 0.4173   LearningRate 0.0054   Epoch: 15   Global Step: 256370   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 02:23:39,289-Speed 3309.54 samples/sec   Loss 0.4175   LearningRate 0.0054   Epoch: 15   Global Step: 256380   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 02:23:42,448-Speed 3242.08 samples/sec   Loss 0.4012   LearningRate 0.0054   Epoch: 15   Global Step: 256390   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 02:23:45,683-Speed 3166.18 samples/sec   Loss 0.4332   LearningRate 0.0054   Epoch: 15   Global Step: 256400   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 02:23:48,771-Speed 3316.87 samples/sec   Loss 0.4488   LearningRate 0.0054   Epoch: 15   Global Step: 256410   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 02:23:51,818-Speed 3361.80 samples/sec   Loss 0.4056   LearningRate 0.0054   Epoch: 15   Global Step: 256420   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 02:23:54,885-Speed 3339.34 samples/sec   Loss 0.4259   LearningRate 0.0054   Epoch: 15   Global Step: 256430   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 02:23:57,960-Speed 3330.33 samples/sec   Loss 0.4255   LearningRate 0.0054   Epoch: 15   Global Step: 256440   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 02:24:01,037-Speed 3328.43 samples/sec   Loss 0.4122   LearningRate 0.0054   Epoch: 15   Global Step: 256450   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 02:24:04,104-Speed 3340.20 samples/sec   Loss 0.4314   LearningRate 0.0054   Epoch: 15   Global Step: 256460   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 02:24:07,167-Speed 3344.56 samples/sec   Loss 0.4100   LearningRate 0.0054   Epoch: 15   Global Step: 256470   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 02:24:10,222-Speed 3351.91 samples/sec   Loss 0.4138   LearningRate 0.0054   Epoch: 15   Global Step: 256480   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 02:24:13,276-Speed 3354.18 samples/sec   Loss 0.4173   LearningRate 0.0054   Epoch: 15   Global Step: 256490   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 02:24:16,362-Speed 3318.78 samples/sec   Loss 0.4186   LearningRate 0.0054   Epoch: 15   Global Step: 256500   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 02:24:19,514-Speed 3248.75 samples/sec   Loss 0.4101   LearningRate 0.0054   Epoch: 15   Global Step: 256510   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 02:24:22,611-Speed 3307.50 samples/sec   Loss 0.3927   LearningRate 0.0054   Epoch: 15   Global Step: 256520   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 02:24:25,685-Speed 3332.04 samples/sec   Loss 0.4161   LearningRate 0.0054   Epoch: 15   Global Step: 256530   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 02:24:28,796-Speed 3291.91 samples/sec   Loss 0.4249   LearningRate 0.0054   Epoch: 15   Global Step: 256540   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 02:24:31,859-Speed 3344.33 samples/sec   Loss 0.4297   LearningRate 0.0054   Epoch: 15   Global Step: 256550   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 02:24:34,960-Speed 3302.91 samples/sec   Loss 0.4086   LearningRate 0.0054   Epoch: 15   Global Step: 256560   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 02:24:38,154-Speed 3206.92 samples/sec   Loss 0.4348   LearningRate 0.0054   Epoch: 15   Global Step: 256570   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 02:24:41,271-Speed 3285.35 samples/sec   Loss 0.4120   LearningRate 0.0054   Epoch: 15   Global Step: 256580   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 02:24:44,326-Speed 3352.34 samples/sec   Loss 0.4138   LearningRate 0.0054   Epoch: 15   Global Step: 256590   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 02:24:47,389-Speed 3343.95 samples/sec   Loss 0.4378   LearningRate 0.0054   Epoch: 15   Global Step: 256600   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 02:24:50,483-Speed 3310.70 samples/sec   Loss 0.4271   LearningRate 0.0053   Epoch: 15   Global Step: 256610   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 02:24:53,557-Speed 3331.53 samples/sec   Loss 0.4429   LearningRate 0.0053   Epoch: 15   Global Step: 256620   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 02:24:56,615-Speed 3349.85 samples/sec   Loss 0.4400   LearningRate 0.0053   Epoch: 15   Global Step: 256630   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 02:24:59,704-Speed 3316.14 samples/sec   Loss 0.4203   LearningRate 0.0053   Epoch: 15   Global Step: 256640   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 02:25:02,761-Speed 3350.49 samples/sec   Loss 0.4117   LearningRate 0.0053   Epoch: 15   Global Step: 256650   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 02:25:05,849-Speed 3316.59 samples/sec   Loss 0.4230   LearningRate 0.0053   Epoch: 15   Global Step: 256660   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 02:25:08,942-Speed 3311.22 samples/sec   Loss 0.4387   LearningRate 0.0053   Epoch: 15   Global Step: 256670   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 02:25:12,068-Speed 3276.50 samples/sec   Loss 0.4243   LearningRate 0.0053   Epoch: 15   Global Step: 256680   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 02:25:15,288-Speed 3180.64 samples/sec   Loss 0.4247   LearningRate 0.0053   Epoch: 15   Global Step: 256690   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 02:25:18,356-Speed 3338.14 samples/sec   Loss 0.4210   LearningRate 0.0053   Epoch: 15   Global Step: 256700   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 02:25:21,497-Speed 3260.92 samples/sec   Loss 0.4083   LearningRate 0.0053   Epoch: 15   Global Step: 256710   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 02:25:24,560-Speed 3344.65 samples/sec   Loss 0.4153   LearningRate 0.0053   Epoch: 15   Global Step: 256720   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 02:25:27,638-Speed 3327.44 samples/sec   Loss 0.4196   LearningRate 0.0053   Epoch: 15   Global Step: 256730   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 02:25:30,694-Speed 3351.80 samples/sec   Loss 0.4148   LearningRate 0.0053   Epoch: 15   Global Step: 256740   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 02:25:33,841-Speed 3254.53 samples/sec   Loss 0.4275   LearningRate 0.0053   Epoch: 15   Global Step: 256750   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 02:25:36,975-Speed 3268.00 samples/sec   Loss 0.4122   LearningRate 0.0053   Epoch: 15   Global Step: 256760   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 02:25:40,039-Speed 3342.74 samples/sec   Loss 0.4099   LearningRate 0.0053   Epoch: 15   Global Step: 256770   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 02:25:43,094-Speed 3351.93 samples/sec   Loss 0.4076   LearningRate 0.0053   Epoch: 15   Global Step: 256780   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 02:25:46,159-Speed 3341.94 samples/sec   Loss 0.4288   LearningRate 0.0053   Epoch: 15   Global Step: 256790   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 02:25:49,237-Speed 3328.12 samples/sec   Loss 0.4016   LearningRate 0.0053   Epoch: 15   Global Step: 256800   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 02:25:52,383-Speed 3255.69 samples/sec   Loss 0.4238   LearningRate 0.0053   Epoch: 15   Global Step: 256810   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 02:25:55,446-Speed 3344.24 samples/sec   Loss 0.4118   LearningRate 0.0053   Epoch: 15   Global Step: 256820   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 02:25:58,516-Speed 3336.01 samples/sec   Loss 0.4294   LearningRate 0.0053   Epoch: 15   Global Step: 256830   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 02:26:01,633-Speed 3286.14 samples/sec   Loss 0.4260   LearningRate 0.0053   Epoch: 15   Global Step: 256840   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 02:26:04,713-Speed 3325.58 samples/sec   Loss 0.3981   LearningRate 0.0053   Epoch: 15   Global Step: 256850   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 02:26:07,807-Speed 3309.54 samples/sec   Loss 0.4150   LearningRate 0.0053   Epoch: 15   Global Step: 256860   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 02:26:10,922-Speed 3288.43 samples/sec   Loss 0.3810   LearningRate 0.0053   Epoch: 15   Global Step: 256870   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 02:26:14,005-Speed 3321.63 samples/sec   Loss 0.4042   LearningRate 0.0053   Epoch: 15   Global Step: 256880   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 02:26:17,087-Speed 3323.11 samples/sec   Loss 0.4083   LearningRate 0.0053   Epoch: 15   Global Step: 256890   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 02:26:20,173-Speed 3319.21 samples/sec   Loss 0.4239   LearningRate 0.0053   Epoch: 15   Global Step: 256900   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 02:26:23,249-Speed 3330.36 samples/sec   Loss 0.4258   LearningRate 0.0053   Epoch: 15   Global Step: 256910   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 02:26:26,316-Speed 3339.67 samples/sec   Loss 0.4085   LearningRate 0.0053   Epoch: 15   Global Step: 256920   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 02:26:29,383-Speed 3339.31 samples/sec   Loss 0.4101   LearningRate 0.0053   Epoch: 15   Global Step: 256930   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 02:26:32,441-Speed 3349.33 samples/sec   Loss 0.4220   LearningRate 0.0053   Epoch: 15   Global Step: 256940   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 02:26:35,501-Speed 3347.08 samples/sec   Loss 0.4280   LearningRate 0.0053   Epoch: 15   Global Step: 256950   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 02:26:38,581-Speed 3325.12 samples/sec   Loss 0.4375   LearningRate 0.0053   Epoch: 15   Global Step: 256960   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 02:26:41,658-Speed 3329.09 samples/sec   Loss 0.4250   LearningRate 0.0053   Epoch: 15   Global Step: 256970   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 02:26:44,734-Speed 3329.74 samples/sec   Loss 0.4525   LearningRate 0.0053   Epoch: 15   Global Step: 256980   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 02:26:47,795-Speed 3345.87 samples/sec   Loss 0.4057   LearningRate 0.0053   Epoch: 15   Global Step: 256990   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 02:26:50,856-Speed 3346.04 samples/sec   Loss 0.3999   LearningRate 0.0053   Epoch: 15   Global Step: 257000   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 02:26:53,976-Speed 3283.27 samples/sec   Loss 0.4467   LearningRate 0.0053   Epoch: 15   Global Step: 257010   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 02:26:57,037-Speed 3345.40 samples/sec   Loss 0.4088   LearningRate 0.0053   Epoch: 15   Global Step: 257020   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 02:27:00,094-Speed 3350.40 samples/sec   Loss 0.4245   LearningRate 0.0053   Epoch: 15   Global Step: 257030   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 02:27:03,237-Speed 3259.53 samples/sec   Loss 0.4078   LearningRate 0.0053   Epoch: 15   Global Step: 257040   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 02:27:06,362-Speed 3277.73 samples/sec   Loss 0.4159   LearningRate 0.0053   Epoch: 15   Global Step: 257050   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 02:27:09,417-Speed 3351.67 samples/sec   Loss 0.4301   LearningRate 0.0053   Epoch: 15   Global Step: 257060   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 02:27:12,480-Speed 3344.17 samples/sec   Loss 0.4135   LearningRate 0.0053   Epoch: 15   Global Step: 257070   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 02:27:15,592-Speed 3291.54 samples/sec   Loss 0.4057   LearningRate 0.0053   Epoch: 15   Global Step: 257080   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 02:27:18,665-Speed 3333.68 samples/sec   Loss 0.4244   LearningRate 0.0053   Epoch: 15   Global Step: 257090   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 02:27:21,726-Speed 3345.55 samples/sec   Loss 0.4250   LearningRate 0.0053   Epoch: 15   Global Step: 257100   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 02:27:24,864-Speed 3264.15 samples/sec   Loss 0.4217   LearningRate 0.0053   Epoch: 15   Global Step: 257110   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 02:27:27,953-Speed 3315.25 samples/sec   Loss 0.4125   LearningRate 0.0053   Epoch: 15   Global Step: 257120   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 02:27:31,049-Speed 3308.73 samples/sec   Loss 0.4248   LearningRate 0.0053   Epoch: 15   Global Step: 257130   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 02:27:34,190-Speed 3260.11 samples/sec   Loss 0.4206   LearningRate 0.0053   Epoch: 15   Global Step: 257140   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 02:27:37,277-Speed 3318.21 samples/sec   Loss 0.4143   LearningRate 0.0053   Epoch: 15   Global Step: 257150   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 02:27:40,441-Speed 3237.06 samples/sec   Loss 0.4213   LearningRate 0.0053   Epoch: 15   Global Step: 257160   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 02:27:43,550-Speed 3294.21 samples/sec   Loss 0.4275   LearningRate 0.0053   Epoch: 15   Global Step: 257170   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 02:27:46,626-Speed 3329.94 samples/sec   Loss 0.4298   LearningRate 0.0053   Epoch: 15   Global Step: 257180   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 02:27:49,700-Speed 3332.16 samples/sec   Loss 0.4039   LearningRate 0.0053   Epoch: 15   Global Step: 257190   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 02:27:52,770-Speed 3336.10 samples/sec   Loss 0.4207   LearningRate 0.0053   Epoch: 15   Global Step: 257200   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 02:27:55,869-Speed 3305.40 samples/sec   Loss 0.4398   LearningRate 0.0053   Epoch: 15   Global Step: 257210   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 02:27:58,940-Speed 3335.00 samples/sec   Loss 0.4314   LearningRate 0.0053   Epoch: 15   Global Step: 257220   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 02:28:02,044-Speed 3299.43 samples/sec   Loss 0.4099   LearningRate 0.0053   Epoch: 15   Global Step: 257230   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 02:28:05,105-Speed 3346.01 samples/sec   Loss 0.4222   LearningRate 0.0053   Epoch: 15   Global Step: 257240   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 02:28:08,194-Speed 3316.39 samples/sec   Loss 0.4435   LearningRate 0.0053   Epoch: 15   Global Step: 257250   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 02:28:11,255-Speed 3345.40 samples/sec   Loss 0.4351   LearningRate 0.0053   Epoch: 15   Global Step: 257260   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 02:28:14,315-Speed 3347.75 samples/sec   Loss 0.4074   LearningRate 0.0053   Epoch: 15   Global Step: 257270   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 02:28:17,393-Speed 3327.44 samples/sec   Loss 0.4200   LearningRate 0.0053   Epoch: 15   Global Step: 257280   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 02:28:20,457-Speed 3343.23 samples/sec   Loss 0.4150   LearningRate 0.0053   Epoch: 15   Global Step: 257290   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 02:28:23,527-Speed 3335.38 samples/sec   Loss 0.3974   LearningRate 0.0053   Epoch: 15   Global Step: 257300   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 02:28:26,670-Speed 3259.09 samples/sec   Loss 0.4316   LearningRate 0.0053   Epoch: 15   Global Step: 257310   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 02:28:29,742-Speed 3334.16 samples/sec   Loss 0.4331   LearningRate 0.0053   Epoch: 15   Global Step: 257320   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 02:28:32,859-Speed 3286.07 samples/sec   Loss 0.4257   LearningRate 0.0053   Epoch: 15   Global Step: 257330   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 02:28:35,965-Speed 3297.92 samples/sec   Loss 0.4140   LearningRate 0.0052   Epoch: 15   Global Step: 257340   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 02:28:39,047-Speed 3322.98 samples/sec   Loss 0.4110   LearningRate 0.0052   Epoch: 15   Global Step: 257350   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 02:28:42,111-Speed 3343.32 samples/sec   Loss 0.4371   LearningRate 0.0052   Epoch: 15   Global Step: 257360   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 02:28:45,190-Speed 3326.34 samples/sec   Loss 0.4333   LearningRate 0.0052   Epoch: 15   Global Step: 257370   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 02:28:48,259-Speed 3337.57 samples/sec   Loss 0.3951   LearningRate 0.0052   Epoch: 15   Global Step: 257380   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 02:28:51,327-Speed 3338.23 samples/sec   Loss 0.4019   LearningRate 0.0052   Epoch: 15   Global Step: 257390   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 02:28:54,419-Speed 3312.09 samples/sec   Loss 0.4594   LearningRate 0.0052   Epoch: 15   Global Step: 257400   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 02:28:57,602-Speed 3217.63 samples/sec   Loss 0.4259   LearningRate 0.0052   Epoch: 15   Global Step: 257410   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 02:29:00,709-Speed 3296.75 samples/sec   Loss 0.4050   LearningRate 0.0052   Epoch: 15   Global Step: 257420   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 02:29:03,778-Speed 3337.50 samples/sec   Loss 0.4342   LearningRate 0.0052   Epoch: 15   Global Step: 257430   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 02:29:06,883-Speed 3300.93 samples/sec   Loss 0.4344   LearningRate 0.0052   Epoch: 15   Global Step: 257440   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 02:29:10,002-Speed 3284.69 samples/sec   Loss 0.4152   LearningRate 0.0052   Epoch: 15   Global Step: 257450   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 02:29:13,103-Speed 3302.23 samples/sec   Loss 0.4256   LearningRate 0.0052   Epoch: 15   Global Step: 257460   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 02:29:16,168-Speed 3341.52 samples/sec   Loss 0.4123   LearningRate 0.0052   Epoch: 15   Global Step: 257470   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 02:29:19,354-Speed 3215.38 samples/sec   Loss 0.4372   LearningRate 0.0052   Epoch: 15   Global Step: 257480   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 02:29:22,570-Speed 3184.33 samples/sec   Loss 0.4225   LearningRate 0.0052   Epoch: 15   Global Step: 257490   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 02:29:25,657-Speed 3317.44 samples/sec   Loss 0.4313   LearningRate 0.0052   Epoch: 15   Global Step: 257500   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 02:29:28,725-Speed 3339.25 samples/sec   Loss 0.4405   LearningRate 0.0052   Epoch: 15   Global Step: 257510   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 02:29:31,790-Speed 3341.27 samples/sec   Loss 0.4120   LearningRate 0.0052   Epoch: 15   Global Step: 257520   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 02:29:34,860-Speed 3336.51 samples/sec   Loss 0.4255   LearningRate 0.0052   Epoch: 15   Global Step: 257530   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 02:29:37,927-Speed 3339.69 samples/sec   Loss 0.3957   LearningRate 0.0052   Epoch: 15   Global Step: 257540   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 02:29:41,032-Speed 3299.01 samples/sec   Loss 0.4063   LearningRate 0.0052   Epoch: 15   Global Step: 257550   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 02:29:44,106-Speed 3330.75 samples/sec   Loss 0.4276   LearningRate 0.0052   Epoch: 15   Global Step: 257560   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 02:29:47,185-Speed 3326.79 samples/sec   Loss 0.3947   LearningRate 0.0052   Epoch: 15   Global Step: 257570   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 02:29:50,261-Speed 3329.39 samples/sec   Loss 0.4161   LearningRate 0.0052   Epoch: 15   Global Step: 257580   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 02:29:53,339-Speed 3327.52 samples/sec   Loss 0.4511   LearningRate 0.0052   Epoch: 15   Global Step: 257590   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 02:29:56,409-Speed 3337.12 samples/sec   Loss 0.4261   LearningRate 0.0052   Epoch: 15   Global Step: 257600   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 02:29:59,613-Speed 3196.50 samples/sec   Loss 0.4088   LearningRate 0.0052   Epoch: 15   Global Step: 257610   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 02:30:02,704-Speed 3313.45 samples/sec   Loss 0.4436   LearningRate 0.0052   Epoch: 15   Global Step: 257620   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 02:30:05,860-Speed 3245.83 samples/sec   Loss 0.4185   LearningRate 0.0052   Epoch: 15   Global Step: 257630   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 02:30:08,983-Speed 3278.96 samples/sec   Loss 0.4144   LearningRate 0.0052   Epoch: 15   Global Step: 257640   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 02:30:12,040-Speed 3350.88 samples/sec   Loss 0.4198   LearningRate 0.0052   Epoch: 15   Global Step: 257650   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 02:30:15,192-Speed 3249.39 samples/sec   Loss 0.4319   LearningRate 0.0052   Epoch: 15   Global Step: 257660   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 02:30:18,426-Speed 3167.57 samples/sec   Loss 0.4115   LearningRate 0.0052   Epoch: 15   Global Step: 257670   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 02:30:21,510-Speed 3320.47 samples/sec   Loss 0.4140   LearningRate 0.0052   Epoch: 15   Global Step: 257680   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 02:30:24,583-Speed 3332.57 samples/sec   Loss 0.4370   LearningRate 0.0052   Epoch: 15   Global Step: 257690   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 02:30:27,649-Speed 3341.51 samples/sec   Loss 0.4070   LearningRate 0.0052   Epoch: 15   Global Step: 257700   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 02:30:30,771-Speed 3280.43 samples/sec   Loss 0.4088   LearningRate 0.0052   Epoch: 15   Global Step: 257710   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 02:30:33,904-Speed 3269.35 samples/sec   Loss 0.3966   LearningRate 0.0052   Epoch: 15   Global Step: 257720   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 02:30:36,989-Speed 3319.64 samples/sec   Loss 0.4307   LearningRate 0.0052   Epoch: 15   Global Step: 257730   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 02:30:40,144-Speed 3247.07 samples/sec   Loss 0.4264   LearningRate 0.0052   Epoch: 15   Global Step: 257740   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 02:30:43,279-Speed 3266.53 samples/sec   Loss 0.4396   LearningRate 0.0052   Epoch: 15   Global Step: 257750   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 02:30:46,362-Speed 3322.32 samples/sec   Loss 0.4372   LearningRate 0.0052   Epoch: 15   Global Step: 257760   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 02:30:49,442-Speed 3324.96 samples/sec   Loss 0.4121   LearningRate 0.0052   Epoch: 15   Global Step: 257770   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 02:30:52,557-Speed 3288.83 samples/sec   Loss 0.4544   LearningRate 0.0052   Epoch: 15   Global Step: 257780   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 02:30:55,722-Speed 3236.04 samples/sec   Loss 0.4339   LearningRate 0.0052   Epoch: 15   Global Step: 257790   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 02:30:58,874-Speed 3249.54 samples/sec   Loss 0.4084   LearningRate 0.0052   Epoch: 15   Global Step: 257800   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 02:31:01,999-Speed 3277.01 samples/sec   Loss 0.4398   LearningRate 0.0052   Epoch: 15   Global Step: 257810   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 02:31:05,115-Speed 3287.46 samples/sec   Loss 0.4279   LearningRate 0.0052   Epoch: 15   Global Step: 257820   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 02:31:08,181-Speed 3341.10 samples/sec   Loss 0.4413   LearningRate 0.0052   Epoch: 15   Global Step: 257830   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 02:31:11,421-Speed 3160.58 samples/sec   Loss 0.4073   LearningRate 0.0052   Epoch: 15   Global Step: 257840   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 02:31:14,625-Speed 3196.55 samples/sec   Loss 0.4416   LearningRate 0.0052   Epoch: 15   Global Step: 257850   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 02:31:17,695-Speed 3336.56 samples/sec   Loss 0.4391   LearningRate 0.0052   Epoch: 15   Global Step: 257860   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 02:31:20,772-Speed 3328.75 samples/sec   Loss 0.4146   LearningRate 0.0052   Epoch: 15   Global Step: 257870   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 02:31:23,957-Speed 3215.96 samples/sec   Loss 0.4241   LearningRate 0.0052   Epoch: 15   Global Step: 257880   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 02:31:27,112-Speed 3246.55 samples/sec   Loss 0.4347   LearningRate 0.0052   Epoch: 15   Global Step: 257890   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 02:31:30,241-Speed 3272.69 samples/sec   Loss 0.4333   LearningRate 0.0052   Epoch: 15   Global Step: 257900   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 02:31:33,445-Speed 3197.31 samples/sec   Loss 0.4330   LearningRate 0.0052   Epoch: 15   Global Step: 257910   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 02:31:36,573-Speed 3274.57 samples/sec   Loss 0.4325   LearningRate 0.0052   Epoch: 15   Global Step: 257920   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 02:31:39,692-Speed 3282.96 samples/sec   Loss 0.4230   LearningRate 0.0052   Epoch: 15   Global Step: 257930   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 02:31:42,852-Speed 3241.74 samples/sec   Loss 0.4010   LearningRate 0.0052   Epoch: 15   Global Step: 257940   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 02:31:45,926-Speed 3331.85 samples/sec   Loss 0.4114   LearningRate 0.0052   Epoch: 15   Global Step: 257950   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 02:31:49,004-Speed 3327.29 samples/sec   Loss 0.4273   LearningRate 0.0052   Epoch: 15   Global Step: 257960   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 02:31:52,070-Speed 3341.55 samples/sec   Loss 0.4298   LearningRate 0.0052   Epoch: 15   Global Step: 257970   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 02:31:55,142-Speed 3333.26 samples/sec   Loss 0.4482   LearningRate 0.0052   Epoch: 15   Global Step: 257980   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 02:31:58,278-Speed 3266.47 samples/sec   Loss 0.4459   LearningRate 0.0052   Epoch: 15   Global Step: 257990   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 02:32:01,355-Speed 3328.21 samples/sec   Loss 0.4057   LearningRate 0.0052   Epoch: 15   Global Step: 258000   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 02:32:45,210-[lfw][258000]XNorm: 20.825084
Training: 2022-04-12 02:32:45,210-[lfw][258000]Accuracy-Flip: 0.99750+-0.00239
Training: 2022-04-12 02:32:45,211-[lfw][258000]Accuracy-Highest: 0.99817
Training: 2022-04-12 02:33:36,195-[cfp_fp][258000]XNorm: 21.744758
Training: 2022-04-12 02:33:36,195-[cfp_fp][258000]Accuracy-Flip: 0.99171+-0.00510
Training: 2022-04-12 02:33:36,196-[cfp_fp][258000]Accuracy-Highest: 0.99186
Training: 2022-04-12 02:34:20,288-[agedb_30][258000]XNorm: 22.311938
Training: 2022-04-12 02:34:20,289-[agedb_30][258000]Accuracy-Flip: 0.98533+-0.00567
Training: 2022-04-12 02:34:20,289-[agedb_30][258000]Accuracy-Highest: 0.98650
Training: 2022-04-12 02:34:23,456-Speed 72.06 samples/sec   Loss 0.3996   LearningRate 0.0052   Epoch: 15   Global Step: 258010   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 02:34:26,578-Speed 3281.07 samples/sec   Loss 0.4306   LearningRate 0.0052   Epoch: 15   Global Step: 258020   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 02:34:29,637-Speed 3348.28 samples/sec   Loss 0.4430   LearningRate 0.0052   Epoch: 15   Global Step: 258030   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 02:34:32,722-Speed 3319.84 samples/sec   Loss 0.4033   LearningRate 0.0052   Epoch: 15   Global Step: 258040   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 02:34:35,867-Speed 3256.73 samples/sec   Loss 0.4538   LearningRate 0.0052   Epoch: 15   Global Step: 258050   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 02:34:39,056-Speed 3211.54 samples/sec   Loss 0.4561   LearningRate 0.0052   Epoch: 15   Global Step: 258060   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 02:34:42,155-Speed 3305.05 samples/sec   Loss 0.4314   LearningRate 0.0051   Epoch: 15   Global Step: 258070   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 02:34:45,222-Speed 3339.80 samples/sec   Loss 0.4372   LearningRate 0.0051   Epoch: 15   Global Step: 258080   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 02:34:48,292-Speed 3336.78 samples/sec   Loss 0.4319   LearningRate 0.0051   Epoch: 15   Global Step: 258090   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 02:34:51,354-Speed 3345.04 samples/sec   Loss 0.4005   LearningRate 0.0051   Epoch: 15   Global Step: 258100   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 02:34:54,423-Speed 3336.71 samples/sec   Loss 0.4204   LearningRate 0.0051   Epoch: 15   Global Step: 258110   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 02:34:57,526-Speed 3300.97 samples/sec   Loss 0.4124   LearningRate 0.0051   Epoch: 15   Global Step: 258120   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 02:35:00,607-Speed 3324.10 samples/sec   Loss 0.3979   LearningRate 0.0051   Epoch: 15   Global Step: 258130   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 02:35:03,671-Speed 3342.83 samples/sec   Loss 0.4162   LearningRate 0.0051   Epoch: 15   Global Step: 258140   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 02:35:06,757-Speed 3319.70 samples/sec   Loss 0.4483   LearningRate 0.0051   Epoch: 15   Global Step: 258150   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 02:35:09,841-Speed 3320.27 samples/sec   Loss 0.4329   LearningRate 0.0051   Epoch: 15   Global Step: 258160   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 02:35:12,920-Speed 3326.85 samples/sec   Loss 0.4376   LearningRate 0.0051   Epoch: 15   Global Step: 258170   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 02:35:16,019-Speed 3305.96 samples/sec   Loss 0.4143   LearningRate 0.0051   Epoch: 15   Global Step: 258180   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 02:35:19,098-Speed 3325.51 samples/sec   Loss 0.4107   LearningRate 0.0051   Epoch: 15   Global Step: 258190   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 02:35:22,174-Speed 3329.82 samples/sec   Loss 0.4135   LearningRate 0.0051   Epoch: 15   Global Step: 258200   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 02:35:25,263-Speed 3316.51 samples/sec   Loss 0.4344   LearningRate 0.0051   Epoch: 15   Global Step: 258210   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 02:35:28,425-Speed 3238.61 samples/sec   Loss 0.4087   LearningRate 0.0051   Epoch: 15   Global Step: 258220   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 02:35:31,500-Speed 3331.21 samples/sec   Loss 0.4355   LearningRate 0.0051   Epoch: 15   Global Step: 258230   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 02:35:34,570-Speed 3335.96 samples/sec   Loss 0.4299   LearningRate 0.0051   Epoch: 15   Global Step: 258240   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 02:35:37,656-Speed 3318.92 samples/sec   Loss 0.4545   LearningRate 0.0051   Epoch: 15   Global Step: 258250   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 02:35:40,753-Speed 3308.15 samples/sec   Loss 0.4081   LearningRate 0.0051   Epoch: 15   Global Step: 258260   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 02:35:43,969-Speed 3184.11 samples/sec   Loss 0.4286   LearningRate 0.0051   Epoch: 15   Global Step: 258270   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 02:35:47,076-Speed 3297.61 samples/sec   Loss 0.4246   LearningRate 0.0051   Epoch: 15   Global Step: 258280   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 02:35:50,217-Speed 3259.85 samples/sec   Loss 0.4233   LearningRate 0.0051   Epoch: 15   Global Step: 258290   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 02:35:53,277-Speed 3347.24 samples/sec   Loss 0.4092   LearningRate 0.0051   Epoch: 15   Global Step: 258300   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 02:35:56,351-Speed 3332.04 samples/sec   Loss 0.4275   LearningRate 0.0051   Epoch: 15   Global Step: 258310   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 02:35:59,428-Speed 3329.39 samples/sec   Loss 0.4375   LearningRate 0.0051   Epoch: 15   Global Step: 258320   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 02:36:02,525-Speed 3306.49 samples/sec   Loss 0.4190   LearningRate 0.0051   Epoch: 15   Global Step: 258330   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 02:36:05,594-Speed 3337.95 samples/sec   Loss 0.4337   LearningRate 0.0051   Epoch: 15   Global Step: 258340   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 02:36:08,657-Speed 3343.81 samples/sec   Loss 0.4091   LearningRate 0.0051   Epoch: 15   Global Step: 258350   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 02:36:11,756-Speed 3304.88 samples/sec   Loss 0.4222   LearningRate 0.0051   Epoch: 15   Global Step: 258360   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 02:36:14,832-Speed 3329.45 samples/sec   Loss 0.4278   LearningRate 0.0051   Epoch: 15   Global Step: 258370   Fp16 Grad Scale: 262144   Required: 8 hours
Training: 2022-04-12 02:36:17,907-Speed 3331.04 samples/sec   Loss 0.4654   LearningRate 0.0051   Epoch: 15   Global Step: 258380   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 02:36:21,010-Speed 3301.57 samples/sec   Loss 0.4026   LearningRate 0.0051   Epoch: 15   Global Step: 258390   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 02:36:24,078-Speed 3337.45 samples/sec   Loss 0.4568   LearningRate 0.0051   Epoch: 15   Global Step: 258400   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 02:36:27,151-Speed 3334.09 samples/sec   Loss 0.4240   LearningRate 0.0051   Epoch: 15   Global Step: 258410   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 02:36:30,240-Speed 3315.60 samples/sec   Loss 0.4192   LearningRate 0.0051   Epoch: 15   Global Step: 258420   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 02:36:33,415-Speed 3225.58 samples/sec   Loss 0.4337   LearningRate 0.0051   Epoch: 15   Global Step: 258430   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 02:36:36,503-Speed 3317.59 samples/sec   Loss 0.4245   LearningRate 0.0051   Epoch: 15   Global Step: 258440   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 02:36:39,591-Speed 3316.03 samples/sec   Loss 0.4073   LearningRate 0.0051   Epoch: 15   Global Step: 258450   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 02:36:42,709-Speed 3284.84 samples/sec   Loss 0.4193   LearningRate 0.0051   Epoch: 15   Global Step: 258460   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 02:36:45,859-Speed 3252.39 samples/sec   Loss 0.4301   LearningRate 0.0051   Epoch: 15   Global Step: 258470   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 02:36:48,950-Speed 3313.16 samples/sec   Loss 0.4329   LearningRate 0.0051   Epoch: 15   Global Step: 258480   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 02:36:52,026-Speed 3329.96 samples/sec   Loss 0.4274   LearningRate 0.0051   Epoch: 15   Global Step: 258490   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 02:36:55,094-Speed 3337.79 samples/sec   Loss 0.4351   LearningRate 0.0051   Epoch: 15   Global Step: 258500   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 02:36:58,187-Speed 3312.30 samples/sec   Loss 0.4479   LearningRate 0.0051   Epoch: 15   Global Step: 258510   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 02:37:01,269-Speed 3323.07 samples/sec   Loss 0.4269   LearningRate 0.0051   Epoch: 15   Global Step: 258520   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 02:37:04,400-Speed 3271.82 samples/sec   Loss 0.4155   LearningRate 0.0051   Epoch: 15   Global Step: 258530   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 02:37:07,494-Speed 3309.85 samples/sec   Loss 0.4365   LearningRate 0.0051   Epoch: 15   Global Step: 258540   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 02:37:10,551-Speed 3350.04 samples/sec   Loss 0.4119   LearningRate 0.0051   Epoch: 15   Global Step: 258550   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 02:37:13,613-Speed 3345.46 samples/sec   Loss 0.4151   LearningRate 0.0051   Epoch: 15   Global Step: 258560   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 02:37:16,678-Speed 3341.00 samples/sec   Loss 0.4394   LearningRate 0.0051   Epoch: 15   Global Step: 258570   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 02:37:19,778-Speed 3304.56 samples/sec   Loss 0.4297   LearningRate 0.0051   Epoch: 15   Global Step: 258580   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 02:37:22,876-Speed 3306.44 samples/sec   Loss 0.4421   LearningRate 0.0051   Epoch: 15   Global Step: 258590   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 02:37:25,975-Speed 3304.33 samples/sec   Loss 0.4103   LearningRate 0.0051   Epoch: 15   Global Step: 258600   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 02:37:29,038-Speed 3343.84 samples/sec   Loss 0.4114   LearningRate 0.0051   Epoch: 15   Global Step: 258610   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 02:37:32,100-Speed 3345.06 samples/sec   Loss 0.4368   LearningRate 0.0051   Epoch: 15   Global Step: 258620   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 02:37:35,164-Speed 3342.70 samples/sec   Loss 0.4463   LearningRate 0.0051   Epoch: 15   Global Step: 258630   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 02:37:38,223-Speed 3348.44 samples/sec   Loss 0.4205   LearningRate 0.0051   Epoch: 15   Global Step: 258640   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 02:37:41,292-Speed 3337.63 samples/sec   Loss 0.4454   LearningRate 0.0051   Epoch: 15   Global Step: 258650   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 02:37:44,361-Speed 3336.59 samples/sec   Loss 0.4373   LearningRate 0.0051   Epoch: 15   Global Step: 258660   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 02:37:47,541-Speed 3221.67 samples/sec   Loss 0.4524   LearningRate 0.0051   Epoch: 15   Global Step: 258670   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 02:37:50,616-Speed 3330.47 samples/sec   Loss 0.4022   LearningRate 0.0051   Epoch: 15   Global Step: 258680   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 02:37:53,728-Speed 3291.41 samples/sec   Loss 0.4453   LearningRate 0.0051   Epoch: 15   Global Step: 258690   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 02:37:56,789-Speed 3346.10 samples/sec   Loss 0.4213   LearningRate 0.0051   Epoch: 15   Global Step: 258700   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 02:37:59,855-Speed 3340.67 samples/sec   Loss 0.4213   LearningRate 0.0051   Epoch: 15   Global Step: 258710   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 02:38:02,973-Speed 3284.68 samples/sec   Loss 0.4316   LearningRate 0.0051   Epoch: 15   Global Step: 258720   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 02:38:06,031-Speed 3349.46 samples/sec   Loss 0.4045   LearningRate 0.0051   Epoch: 15   Global Step: 258730   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 02:38:09,095-Speed 3342.67 samples/sec   Loss 0.4284   LearningRate 0.0051   Epoch: 15   Global Step: 258740   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 02:38:12,173-Speed 3327.13 samples/sec   Loss 0.4532   LearningRate 0.0051   Epoch: 15   Global Step: 258750   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 02:38:15,305-Speed 3270.39 samples/sec   Loss 0.4429   LearningRate 0.0051   Epoch: 15   Global Step: 258760   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 02:38:18,381-Speed 3330.28 samples/sec   Loss 0.4206   LearningRate 0.0051   Epoch: 15   Global Step: 258770   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 02:38:21,456-Speed 3330.95 samples/sec   Loss 0.4464   LearningRate 0.0051   Epoch: 15   Global Step: 258780   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 02:38:24,531-Speed 3331.27 samples/sec   Loss 0.4062   LearningRate 0.0051   Epoch: 15   Global Step: 258790   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 02:38:27,585-Speed 3352.96 samples/sec   Loss 0.4474   LearningRate 0.0051   Epoch: 15   Global Step: 258800   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 02:38:30,649-Speed 3342.98 samples/sec   Loss 0.4305   LearningRate 0.0050   Epoch: 15   Global Step: 258810   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 02:38:33,710-Speed 3346.08 samples/sec   Loss 0.4140   LearningRate 0.0050   Epoch: 15   Global Step: 258820   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 02:38:36,791-Speed 3324.85 samples/sec   Loss 0.4279   LearningRate 0.0050   Epoch: 15   Global Step: 258830   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 02:38:39,851-Speed 3346.66 samples/sec   Loss 0.4156   LearningRate 0.0050   Epoch: 15   Global Step: 258840   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 02:38:42,923-Speed 3334.41 samples/sec   Loss 0.4219   LearningRate 0.0050   Epoch: 15   Global Step: 258850   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 02:38:45,993-Speed 3336.47 samples/sec   Loss 0.4633   LearningRate 0.0050   Epoch: 15   Global Step: 258860   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 02:38:49,088-Speed 3309.55 samples/sec   Loss 0.4448   LearningRate 0.0050   Epoch: 15   Global Step: 258870   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 02:38:52,157-Speed 3337.29 samples/sec   Loss 0.4317   LearningRate 0.0050   Epoch: 15   Global Step: 258880   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 02:38:55,255-Speed 3305.36 samples/sec   Loss 0.4299   LearningRate 0.0050   Epoch: 15   Global Step: 258890   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 02:38:58,398-Speed 3259.13 samples/sec   Loss 0.4375   LearningRate 0.0050   Epoch: 15   Global Step: 258900   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 02:39:01,490-Speed 3312.65 samples/sec   Loss 0.4167   LearningRate 0.0050   Epoch: 15   Global Step: 258910   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 02:39:04,703-Speed 3187.38 samples/sec   Loss 0.4184   LearningRate 0.0050   Epoch: 15   Global Step: 258920   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 02:39:07,873-Speed 3231.72 samples/sec   Loss 0.4201   LearningRate 0.0050   Epoch: 15   Global Step: 258930   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 02:39:11,002-Speed 3272.67 samples/sec   Loss 0.4246   LearningRate 0.0050   Epoch: 15   Global Step: 258940   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 02:39:14,115-Speed 3290.96 samples/sec   Loss 0.4388   LearningRate 0.0050   Epoch: 15   Global Step: 258950   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 02:39:17,185-Speed 3335.80 samples/sec   Loss 0.4094   LearningRate 0.0050   Epoch: 15   Global Step: 258960   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 02:39:20,237-Speed 3356.40 samples/sec   Loss 0.4470   LearningRate 0.0050   Epoch: 15   Global Step: 258970   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 02:39:23,380-Speed 3258.60 samples/sec   Loss 0.4169   LearningRate 0.0050   Epoch: 15   Global Step: 258980   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 02:39:26,464-Speed 3321.29 samples/sec   Loss 0.4187   LearningRate 0.0050   Epoch: 15   Global Step: 258990   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 02:39:29,655-Speed 3209.86 samples/sec   Loss 0.4281   LearningRate 0.0050   Epoch: 15   Global Step: 259000   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 02:39:32,764-Speed 3294.38 samples/sec   Loss 0.4368   LearningRate 0.0050   Epoch: 15   Global Step: 259010   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 02:39:35,872-Speed 3295.48 samples/sec   Loss 0.4261   LearningRate 0.0050   Epoch: 15   Global Step: 259020   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 02:39:38,954-Speed 3322.79 samples/sec   Loss 0.4151   LearningRate 0.0050   Epoch: 15   Global Step: 259030   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 02:39:42,020-Speed 3340.83 samples/sec   Loss 0.4329   LearningRate 0.0050   Epoch: 15   Global Step: 259040   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 02:39:45,085-Speed 3342.84 samples/sec   Loss 0.4118   LearningRate 0.0050   Epoch: 15   Global Step: 259050   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 02:39:48,178-Speed 3310.52 samples/sec   Loss 0.4142   LearningRate 0.0050   Epoch: 15   Global Step: 259060   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 02:39:51,254-Speed 3329.88 samples/sec   Loss 0.4440   LearningRate 0.0050   Epoch: 15   Global Step: 259070   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 02:39:54,323-Speed 3337.66 samples/sec   Loss 0.4290   LearningRate 0.0050   Epoch: 15   Global Step: 259080   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 02:39:57,386-Speed 3343.36 samples/sec   Loss 0.4053   LearningRate 0.0050   Epoch: 15   Global Step: 259090   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 02:40:00,448-Speed 3344.73 samples/sec   Loss 0.4396   LearningRate 0.0050   Epoch: 15   Global Step: 259100   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 02:40:03,582-Speed 3269.09 samples/sec   Loss 0.4456   LearningRate 0.0050   Epoch: 15   Global Step: 259110   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 02:40:06,817-Speed 3165.14 samples/sec   Loss 0.4158   LearningRate 0.0050   Epoch: 15   Global Step: 259120   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 02:40:09,928-Speed 3293.27 samples/sec   Loss 0.4353   LearningRate 0.0050   Epoch: 15   Global Step: 259130   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 02:40:13,045-Speed 3286.29 samples/sec   Loss 0.4164   LearningRate 0.0050   Epoch: 15   Global Step: 259140   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 02:40:16,101-Speed 3351.13 samples/sec   Loss 0.4465   LearningRate 0.0050   Epoch: 15   Global Step: 259150   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 02:40:19,163-Speed 3344.72 samples/sec   Loss 0.4407   LearningRate 0.0050   Epoch: 15   Global Step: 259160   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 02:40:22,280-Speed 3286.66 samples/sec   Loss 0.4270   LearningRate 0.0050   Epoch: 15   Global Step: 259170   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 02:40:25,452-Speed 3228.75 samples/sec   Loss 0.4159   LearningRate 0.0050   Epoch: 15   Global Step: 259180   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 02:40:28,540-Speed 3316.88 samples/sec   Loss 0.4521   LearningRate 0.0050   Epoch: 15   Global Step: 259190   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 02:40:31,612-Speed 3333.06 samples/sec   Loss 0.4288   LearningRate 0.0050   Epoch: 15   Global Step: 259200   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 02:40:34,677-Speed 3342.89 samples/sec   Loss 0.4349   LearningRate 0.0050   Epoch: 15   Global Step: 259210   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 02:40:37,739-Speed 3344.53 samples/sec   Loss 0.4259   LearningRate 0.0050   Epoch: 15   Global Step: 259220   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 02:40:40,826-Speed 3318.49 samples/sec   Loss 0.4092   LearningRate 0.0050   Epoch: 15   Global Step: 259230   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 02:40:43,894-Speed 3338.66 samples/sec   Loss 0.4165   LearningRate 0.0050   Epoch: 15   Global Step: 259240   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 02:40:46,995-Speed 3301.90 samples/sec   Loss 0.4352   LearningRate 0.0050   Epoch: 15   Global Step: 259250   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 02:40:50,138-Speed 3258.76 samples/sec   Loss 0.4590   LearningRate 0.0050   Epoch: 15   Global Step: 259260   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 02:40:53,251-Speed 3290.39 samples/sec   Loss 0.4357   LearningRate 0.0050   Epoch: 15   Global Step: 259270   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 02:40:56,338-Speed 3317.67 samples/sec   Loss 0.4411   LearningRate 0.0050   Epoch: 15   Global Step: 259280   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 02:40:59,463-Speed 3278.03 samples/sec   Loss 0.4440   LearningRate 0.0050   Epoch: 15   Global Step: 259290   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 02:41:02,550-Speed 3317.91 samples/sec   Loss 0.4419   LearningRate 0.0050   Epoch: 15   Global Step: 259300   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 02:41:05,652-Speed 3301.92 samples/sec   Loss 0.4232   LearningRate 0.0050   Epoch: 15   Global Step: 259310   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 02:41:08,790-Speed 3264.42 samples/sec   Loss 0.4168   LearningRate 0.0050   Epoch: 15   Global Step: 259320   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 02:41:11,969-Speed 3221.85 samples/sec   Loss 0.4138   LearningRate 0.0050   Epoch: 15   Global Step: 259330   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 02:41:15,175-Speed 3194.19 samples/sec   Loss 0.4137   LearningRate 0.0050   Epoch: 15   Global Step: 259340   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 02:41:18,299-Speed 3278.06 samples/sec   Loss 0.4372   LearningRate 0.0050   Epoch: 15   Global Step: 259350   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 02:41:21,372-Speed 3333.15 samples/sec   Loss 0.4306   LearningRate 0.0050   Epoch: 15   Global Step: 259360   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 02:41:24,592-Speed 3180.78 samples/sec   Loss 0.4284   LearningRate 0.0050   Epoch: 15   Global Step: 259370   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 02:41:27,684-Speed 3313.59 samples/sec   Loss 0.4107   LearningRate 0.0050   Epoch: 15   Global Step: 259380   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 02:41:30,820-Speed 3265.84 samples/sec   Loss 0.4277   LearningRate 0.0050   Epoch: 15   Global Step: 259390   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 02:41:33,908-Speed 3316.27 samples/sec   Loss 0.4303   LearningRate 0.0050   Epoch: 15   Global Step: 259400   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 02:41:36,974-Speed 3340.48 samples/sec   Loss 0.4402   LearningRate 0.0050   Epoch: 15   Global Step: 259410   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 02:41:40,081-Speed 3297.05 samples/sec   Loss 0.4108   LearningRate 0.0050   Epoch: 15   Global Step: 259420   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 02:41:43,282-Speed 3200.14 samples/sec   Loss 0.4213   LearningRate 0.0050   Epoch: 15   Global Step: 259430   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 02:41:46,425-Speed 3257.92 samples/sec   Loss 0.4386   LearningRate 0.0050   Epoch: 15   Global Step: 259440   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 02:41:49,547-Speed 3280.42 samples/sec   Loss 0.4255   LearningRate 0.0050   Epoch: 15   Global Step: 259450   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 02:41:52,635-Speed 3317.62 samples/sec   Loss 0.4085   LearningRate 0.0050   Epoch: 15   Global Step: 259460   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 02:41:55,747-Speed 3291.65 samples/sec   Loss 0.4123   LearningRate 0.0050   Epoch: 15   Global Step: 259470   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 02:41:58,883-Speed 3265.32 samples/sec   Loss 0.4388   LearningRate 0.0050   Epoch: 15   Global Step: 259480   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 02:42:01,955-Speed 3334.37 samples/sec   Loss 0.4205   LearningRate 0.0050   Epoch: 15   Global Step: 259490   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 02:42:05,034-Speed 3327.01 samples/sec   Loss 0.4142   LearningRate 0.0050   Epoch: 15   Global Step: 259500   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 02:42:08,107-Speed 3332.55 samples/sec   Loss 0.4315   LearningRate 0.0050   Epoch: 15   Global Step: 259510   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 02:42:11,179-Speed 3333.76 samples/sec   Loss 0.4099   LearningRate 0.0050   Epoch: 15   Global Step: 259520   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 02:42:14,256-Speed 3329.33 samples/sec   Loss 0.4164   LearningRate 0.0050   Epoch: 15   Global Step: 259530   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 02:42:17,321-Speed 3340.82 samples/sec   Loss 0.4273   LearningRate 0.0050   Epoch: 15   Global Step: 259540   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 02:42:20,403-Speed 3323.46 samples/sec   Loss 0.4292   LearningRate 0.0049   Epoch: 15   Global Step: 259550   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 02:42:23,472-Speed 3337.54 samples/sec   Loss 0.4286   LearningRate 0.0049   Epoch: 15   Global Step: 259560   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 02:42:26,523-Speed 3357.19 samples/sec   Loss 0.4389   LearningRate 0.0049   Epoch: 15   Global Step: 259570   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 02:42:29,583-Speed 3346.92 samples/sec   Loss 0.4083   LearningRate 0.0049   Epoch: 15   Global Step: 259580   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 02:42:32,650-Speed 3339.73 samples/sec   Loss 0.4291   LearningRate 0.0049   Epoch: 15   Global Step: 259590   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 02:42:35,740-Speed 3315.26 samples/sec   Loss 0.4285   LearningRate 0.0049   Epoch: 15   Global Step: 259600   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 02:42:39,599-Speed 2653.95 samples/sec   Loss 0.4244   LearningRate 0.0049   Epoch: 15   Global Step: 259610   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 02:42:42,689-Speed 3314.76 samples/sec   Loss 0.4683   LearningRate 0.0049   Epoch: 15   Global Step: 259620   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 02:42:45,772-Speed 3321.86 samples/sec   Loss 0.4261   LearningRate 0.0049   Epoch: 15   Global Step: 259630   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 02:42:48,911-Speed 3263.07 samples/sec   Loss 0.4079   LearningRate 0.0049   Epoch: 15   Global Step: 259640   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 02:42:51,985-Speed 3331.36 samples/sec   Loss 0.4381   LearningRate 0.0049   Epoch: 15   Global Step: 259650   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 02:42:55,071-Speed 3318.96 samples/sec   Loss 0.4157   LearningRate 0.0049   Epoch: 15   Global Step: 259660   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 02:42:58,206-Speed 3267.93 samples/sec   Loss 0.4107   LearningRate 0.0049   Epoch: 15   Global Step: 259670   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 02:43:01,301-Speed 3308.37 samples/sec   Loss 0.4027   LearningRate 0.0049   Epoch: 15   Global Step: 259680   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 02:43:04,458-Speed 3245.25 samples/sec   Loss 0.4180   LearningRate 0.0049   Epoch: 15   Global Step: 259690   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 02:43:07,590-Speed 3269.31 samples/sec   Loss 0.4154   LearningRate 0.0049   Epoch: 15   Global Step: 259700   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 02:43:10,656-Speed 3341.08 samples/sec   Loss 0.4335   LearningRate 0.0049   Epoch: 15   Global Step: 259710   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 02:43:13,733-Speed 3328.72 samples/sec   Loss 0.4527   LearningRate 0.0049   Epoch: 15   Global Step: 259720   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 02:43:16,811-Speed 3327.78 samples/sec   Loss 0.4370   LearningRate 0.0049   Epoch: 15   Global Step: 259730   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 02:43:19,878-Speed 3339.48 samples/sec   Loss 0.4407   LearningRate 0.0049   Epoch: 15   Global Step: 259740   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 02:43:22,960-Speed 3323.28 samples/sec   Loss 0.4143   LearningRate 0.0049   Epoch: 15   Global Step: 259750   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 02:43:26,041-Speed 3324.82 samples/sec   Loss 0.4362   LearningRate 0.0049   Epoch: 15   Global Step: 259760   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 02:43:29,138-Speed 3306.53 samples/sec   Loss 0.4364   LearningRate 0.0049   Epoch: 15   Global Step: 259770   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 02:43:32,219-Speed 3324.85 samples/sec   Loss 0.4003   LearningRate 0.0049   Epoch: 15   Global Step: 259780   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 02:43:35,408-Speed 3211.70 samples/sec   Loss 0.4051   LearningRate 0.0049   Epoch: 15   Global Step: 259790   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 02:43:38,609-Speed 3199.82 samples/sec   Loss 0.4311   LearningRate 0.0049   Epoch: 15   Global Step: 259800   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 02:43:41,798-Speed 3211.43 samples/sec   Loss 0.4206   LearningRate 0.0049   Epoch: 15   Global Step: 259810   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 02:43:44,865-Speed 3339.95 samples/sec   Loss 0.4497   LearningRate 0.0049   Epoch: 15   Global Step: 259820   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 02:43:47,935-Speed 3336.27 samples/sec   Loss 0.4513   LearningRate 0.0049   Epoch: 15   Global Step: 259830   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 02:43:51,002-Speed 3338.88 samples/sec   Loss 0.4450   LearningRate 0.0049   Epoch: 15   Global Step: 259840   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 02:43:54,085-Speed 3322.67 samples/sec   Loss 0.4233   LearningRate 0.0049   Epoch: 15   Global Step: 259850   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 02:43:57,155-Speed 3335.89 samples/sec   Loss 0.4253   LearningRate 0.0049   Epoch: 15   Global Step: 259860   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 02:44:00,223-Speed 3338.35 samples/sec   Loss 0.4133   LearningRate 0.0049   Epoch: 15   Global Step: 259870   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 02:44:03,295-Speed 3334.06 samples/sec   Loss 0.4261   LearningRate 0.0049   Epoch: 15   Global Step: 259880   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 02:44:06,372-Speed 3329.51 samples/sec   Loss 0.4544   LearningRate 0.0049   Epoch: 15   Global Step: 259890   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 02:44:09,454-Speed 3322.82 samples/sec   Loss 0.4184   LearningRate 0.0049   Epoch: 15   Global Step: 259900   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 02:44:12,520-Speed 3340.59 samples/sec   Loss 0.4473   LearningRate 0.0049   Epoch: 15   Global Step: 259910   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 02:44:15,601-Speed 3323.82 samples/sec   Loss 0.4498   LearningRate 0.0049   Epoch: 15   Global Step: 259920   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 02:44:18,666-Speed 3341.95 samples/sec   Loss 0.4230   LearningRate 0.0049   Epoch: 15   Global Step: 259930   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 02:44:21,769-Speed 3301.21 samples/sec   Loss 0.4246   LearningRate 0.0049   Epoch: 15   Global Step: 259940   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 02:44:24,947-Speed 3223.36 samples/sec   Loss 0.4322   LearningRate 0.0049   Epoch: 15   Global Step: 259950   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 02:44:28,076-Speed 3273.30 samples/sec   Loss 0.4268   LearningRate 0.0049   Epoch: 15   Global Step: 259960   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 02:44:31,299-Speed 3177.68 samples/sec   Loss 0.4096   LearningRate 0.0049   Epoch: 15   Global Step: 259970   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 02:44:34,368-Speed 3336.84 samples/sec   Loss 0.4577   LearningRate 0.0049   Epoch: 15   Global Step: 259980   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 02:44:37,456-Speed 3317.51 samples/sec   Loss 0.4259   LearningRate 0.0049   Epoch: 15   Global Step: 259990   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 02:44:40,514-Speed 3349.31 samples/sec   Loss 0.4334   LearningRate 0.0049   Epoch: 15   Global Step: 260000   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-12 02:45:24,496-[lfw][260000]XNorm: 22.488010
Training: 2022-04-12 02:45:24,497-[lfw][260000]Accuracy-Flip: 0.99783+-0.00259
Training: 2022-04-12 02:45:24,498-[lfw][260000]Accuracy-Highest: 0.99817
Training: 2022-04-12 02:46:15,545-[cfp_fp][260000]XNorm: 23.407848
Training: 2022-04-12 02:46:15,546-[cfp_fp][260000]Accuracy-Flip: 0.99114+-0.00464
Training: 2022-04-12 02:46:15,546-[cfp_fp][260000]Accuracy-Highest: 0.99186
Training: 2022-04-12 02:46:59,411-[agedb_30][260000]XNorm: 23.868578
Training: 2022-04-12 02:46:59,412-[agedb_30][260000]Accuracy-Flip: 0.98533+-0.00515
Training: 2022-04-12 02:46:59,412-[agedb_30][260000]Accuracy-Highest: 0.98650
Training: 2022-04-12 02:47:02,534-Speed 72.10 samples/sec   Loss 0.4264   LearningRate 0.0049   Epoch: 15   Global Step: 260010   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-12 02:47:05,698-Speed 3237.63 samples/sec   Loss 0.4168   LearningRate 0.0049   Epoch: 15   Global Step: 260020   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-12 02:47:08,833-Speed 3266.57 samples/sec   Loss 0.4497   LearningRate 0.0049   Epoch: 15   Global Step: 260030   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-12 02:47:11,909-Speed 3330.26 samples/sec   Loss 0.4610   LearningRate 0.0049   Epoch: 15   Global Step: 260040   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-12 02:47:14,993-Speed 3320.14 samples/sec   Loss 0.4420   LearningRate 0.0049   Epoch: 15   Global Step: 260050   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-12 02:47:18,079-Speed 3319.67 samples/sec   Loss 0.4287   LearningRate 0.0049   Epoch: 15   Global Step: 260060   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-12 02:47:21,155-Speed 3329.53 samples/sec   Loss 0.4701   LearningRate 0.0049   Epoch: 15   Global Step: 260070   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-12 02:47:24,247-Speed 3313.25 samples/sec   Loss 0.4474   LearningRate 0.0049   Epoch: 15   Global Step: 260080   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-12 02:47:27,328-Speed 3324.13 samples/sec   Loss 0.4353   LearningRate 0.0049   Epoch: 15   Global Step: 260090   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-12 02:47:30,392-Speed 3342.57 samples/sec   Loss 0.4320   LearningRate 0.0049   Epoch: 15   Global Step: 260100   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 02:47:33,467-Speed 3331.12 samples/sec   Loss 0.4093   LearningRate 0.0049   Epoch: 15   Global Step: 260110   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-12 02:47:36,559-Speed 3311.99 samples/sec   Loss 0.4472   LearningRate 0.0049   Epoch: 15   Global Step: 260120   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-12 02:47:39,705-Speed 3256.03 samples/sec   Loss 0.4237   LearningRate 0.0049   Epoch: 15   Global Step: 260130   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-12 02:47:42,770-Speed 3341.76 samples/sec   Loss 0.4195   LearningRate 0.0049   Epoch: 15   Global Step: 260140   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-12 02:47:45,862-Speed 3312.67 samples/sec   Loss 0.4358   LearningRate 0.0049   Epoch: 15   Global Step: 260150   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-12 02:47:48,920-Speed 3349.40 samples/sec   Loss 0.4361   LearningRate 0.0049   Epoch: 15   Global Step: 260160   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-12 02:47:52,044-Speed 3278.93 samples/sec   Loss 0.4383   LearningRate 0.0049   Epoch: 15   Global Step: 260170   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-12 02:47:55,121-Speed 3328.16 samples/sec   Loss 0.4244   LearningRate 0.0049   Epoch: 15   Global Step: 260180   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-12 02:47:58,179-Speed 3349.40 samples/sec   Loss 0.4586   LearningRate 0.0049   Epoch: 15   Global Step: 260190   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-12 02:48:01,240-Speed 3345.98 samples/sec   Loss 0.4287   LearningRate 0.0049   Epoch: 15   Global Step: 260200   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-12 02:48:04,317-Speed 3329.17 samples/sec   Loss 0.4203   LearningRate 0.0049   Epoch: 15   Global Step: 260210   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 02:48:07,418-Speed 3302.73 samples/sec   Loss 0.4227   LearningRate 0.0049   Epoch: 15   Global Step: 260220   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 02:48:10,500-Speed 3323.31 samples/sec   Loss 0.4197   LearningRate 0.0049   Epoch: 15   Global Step: 260230   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 02:48:13,575-Speed 3331.33 samples/sec   Loss 0.4287   LearningRate 0.0049   Epoch: 15   Global Step: 260240   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 02:48:16,675-Speed 3303.30 samples/sec   Loss 0.4189   LearningRate 0.0049   Epoch: 15   Global Step: 260250   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 02:48:19,753-Speed 3327.79 samples/sec   Loss 0.4499   LearningRate 0.0049   Epoch: 15   Global Step: 260260   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 02:48:22,877-Speed 3278.87 samples/sec   Loss 0.4321   LearningRate 0.0049   Epoch: 15   Global Step: 260270   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 02:48:26,044-Speed 3233.77 samples/sec   Loss 0.4479   LearningRate 0.0049   Epoch: 15   Global Step: 260280   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 02:48:29,121-Speed 3329.52 samples/sec   Loss 0.4425   LearningRate 0.0049   Epoch: 15   Global Step: 260290   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 02:48:32,191-Speed 3335.46 samples/sec   Loss 0.4454   LearningRate 0.0049   Epoch: 15   Global Step: 260300   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 02:48:35,254-Speed 3344.08 samples/sec   Loss 0.4412   LearningRate 0.0048   Epoch: 15   Global Step: 260310   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 02:48:38,319-Speed 3341.99 samples/sec   Loss 0.4366   LearningRate 0.0048   Epoch: 15   Global Step: 260320   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 02:48:41,393-Speed 3331.68 samples/sec   Loss 0.4127   LearningRate 0.0048   Epoch: 15   Global Step: 260330   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 02:48:44,507-Speed 3289.04 samples/sec   Loss 0.4661   LearningRate 0.0048   Epoch: 15   Global Step: 260340   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 02:48:47,617-Speed 3294.14 samples/sec   Loss 0.4405   LearningRate 0.0048   Epoch: 15   Global Step: 260350   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 02:48:50,677-Speed 3346.78 samples/sec   Loss 0.4361   LearningRate 0.0048   Epoch: 15   Global Step: 260360   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 02:48:53,791-Speed 3288.82 samples/sec   Loss 0.4291   LearningRate 0.0048   Epoch: 15   Global Step: 260370   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 02:48:56,864-Speed 3333.57 samples/sec   Loss 0.4289   LearningRate 0.0048   Epoch: 15   Global Step: 260380   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 02:49:00,057-Speed 3207.12 samples/sec   Loss 0.4482   LearningRate 0.0048   Epoch: 15   Global Step: 260390   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 02:49:03,141-Speed 3321.41 samples/sec   Loss 0.4349   LearningRate 0.0048   Epoch: 15   Global Step: 260400   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 02:49:06,192-Speed 3356.73 samples/sec   Loss 0.4320   LearningRate 0.0048   Epoch: 15   Global Step: 260410   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 02:49:09,339-Speed 3254.67 samples/sec   Loss 0.4388   LearningRate 0.0048   Epoch: 15   Global Step: 260420   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 02:49:12,400-Speed 3346.66 samples/sec   Loss 0.4281   LearningRate 0.0048   Epoch: 15   Global Step: 260430   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 02:49:15,469-Speed 3337.40 samples/sec   Loss 0.4173   LearningRate 0.0048   Epoch: 15   Global Step: 260440   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 02:49:18,525-Speed 3351.28 samples/sec   Loss 0.4188   LearningRate 0.0048   Epoch: 15   Global Step: 260450   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 02:49:21,606-Speed 3324.03 samples/sec   Loss 0.4633   LearningRate 0.0048   Epoch: 15   Global Step: 260460   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 02:49:24,689-Speed 3322.51 samples/sec   Loss 0.4447   LearningRate 0.0048   Epoch: 15   Global Step: 260470   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 02:49:27,801-Speed 3291.15 samples/sec   Loss 0.4311   LearningRate 0.0048   Epoch: 15   Global Step: 260480   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 02:49:30,881-Speed 3325.53 samples/sec   Loss 0.3995   LearningRate 0.0048   Epoch: 15   Global Step: 260490   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 02:49:34,003-Speed 3280.52 samples/sec   Loss 0.4458   LearningRate 0.0048   Epoch: 15   Global Step: 260500   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 02:49:37,083-Speed 3325.69 samples/sec   Loss 0.4022   LearningRate 0.0048   Epoch: 15   Global Step: 260510   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 02:49:40,157-Speed 3331.92 samples/sec   Loss 0.4255   LearningRate 0.0048   Epoch: 15   Global Step: 260520   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 02:49:43,240-Speed 3323.01 samples/sec   Loss 0.4345   LearningRate 0.0048   Epoch: 15   Global Step: 260530   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 02:49:46,415-Speed 3225.45 samples/sec   Loss 0.4382   LearningRate 0.0048   Epoch: 15   Global Step: 260540   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 02:49:49,515-Speed 3304.24 samples/sec   Loss 0.4098   LearningRate 0.0048   Epoch: 15   Global Step: 260550   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 02:49:52,608-Speed 3310.84 samples/sec   Loss 0.4305   LearningRate 0.0048   Epoch: 15   Global Step: 260560   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 02:49:55,661-Speed 3354.69 samples/sec   Loss 0.4413   LearningRate 0.0048   Epoch: 15   Global Step: 260570   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 02:49:58,723-Speed 3345.22 samples/sec   Loss 0.4224   LearningRate 0.0048   Epoch: 15   Global Step: 260580   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 02:50:01,842-Speed 3284.20 samples/sec   Loss 0.4576   LearningRate 0.0048   Epoch: 15   Global Step: 260590   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 02:50:04,902-Speed 3346.98 samples/sec   Loss 0.4380   LearningRate 0.0048   Epoch: 15   Global Step: 260600   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 02:50:07,961-Speed 3348.87 samples/sec   Loss 0.4379   LearningRate 0.0048   Epoch: 15   Global Step: 260610   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 02:50:11,021-Speed 3347.18 samples/sec   Loss 0.4355   LearningRate 0.0048   Epoch: 15   Global Step: 260620   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 02:50:14,084-Speed 3343.75 samples/sec   Loss 0.4360   LearningRate 0.0048   Epoch: 15   Global Step: 260630   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 02:50:17,177-Speed 3310.78 samples/sec   Loss 0.4344   LearningRate 0.0048   Epoch: 15   Global Step: 260640   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 02:50:20,257-Speed 3325.43 samples/sec   Loss 0.4425   LearningRate 0.0048   Epoch: 15   Global Step: 260650   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 02:50:23,330-Speed 3333.52 samples/sec   Loss 0.4469   LearningRate 0.0048   Epoch: 15   Global Step: 260660   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 02:50:26,389-Speed 3348.06 samples/sec   Loss 0.4226   LearningRate 0.0048   Epoch: 15   Global Step: 260670   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 02:50:29,458-Speed 3337.42 samples/sec   Loss 0.4381   LearningRate 0.0048   Epoch: 15   Global Step: 260680   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 02:50:32,539-Speed 3324.51 samples/sec   Loss 0.4141   LearningRate 0.0048   Epoch: 15   Global Step: 260690   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 02:50:35,669-Speed 3272.69 samples/sec   Loss 0.4314   LearningRate 0.0048   Epoch: 15   Global Step: 260700   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 02:50:38,806-Speed 3264.23 samples/sec   Loss 0.4068   LearningRate 0.0048   Epoch: 15   Global Step: 260710   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 02:50:41,888-Speed 3324.01 samples/sec   Loss 0.4391   LearningRate 0.0048   Epoch: 15   Global Step: 260720   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 02:50:44,947-Speed 3347.84 samples/sec   Loss 0.4216   LearningRate 0.0048   Epoch: 15   Global Step: 260730   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 02:50:48,041-Speed 3310.50 samples/sec   Loss 0.4665   LearningRate 0.0048   Epoch: 15   Global Step: 260740   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 02:50:51,144-Speed 3300.20 samples/sec   Loss 0.4551   LearningRate 0.0048   Epoch: 15   Global Step: 260750   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 02:50:54,304-Speed 3241.91 samples/sec   Loss 0.4245   LearningRate 0.0048   Epoch: 15   Global Step: 260760   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 02:50:57,368-Speed 3343.02 samples/sec   Loss 0.4325   LearningRate 0.0048   Epoch: 15   Global Step: 260770   Fp16 Grad Scale: 262144   Required: 8 hours
Training: 2022-04-12 02:51:00,470-Speed 3302.16 samples/sec   Loss 0.4264   LearningRate 0.0048   Epoch: 15   Global Step: 260780   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 02:51:03,597-Speed 3274.65 samples/sec   Loss 0.4152   LearningRate 0.0048   Epoch: 15   Global Step: 260790   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 02:51:06,752-Speed 3246.84 samples/sec   Loss 0.4470   LearningRate 0.0048   Epoch: 15   Global Step: 260800   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 02:51:09,933-Speed 3220.08 samples/sec   Loss 0.4352   LearningRate 0.0048   Epoch: 15   Global Step: 260810   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 02:51:12,997-Speed 3343.10 samples/sec   Loss 0.4580   LearningRate 0.0048   Epoch: 15   Global Step: 260820   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 02:51:16,055-Speed 3349.94 samples/sec   Loss 0.4301   LearningRate 0.0048   Epoch: 15   Global Step: 260830   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 02:51:19,122-Speed 3338.62 samples/sec   Loss 0.4205   LearningRate 0.0048   Epoch: 15   Global Step: 260840   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 02:51:22,180-Speed 3349.96 samples/sec   Loss 0.4577   LearningRate 0.0048   Epoch: 15   Global Step: 260850   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 02:51:25,301-Speed 3282.06 samples/sec   Loss 0.4532   LearningRate 0.0048   Epoch: 15   Global Step: 260860   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 02:51:28,416-Speed 3288.27 samples/sec   Loss 0.4654   LearningRate 0.0048   Epoch: 15   Global Step: 260870   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 02:51:31,536-Speed 3282.84 samples/sec   Loss 0.4628   LearningRate 0.0048   Epoch: 15   Global Step: 260880   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 02:51:34,613-Speed 3328.44 samples/sec   Loss 0.4453   LearningRate 0.0048   Epoch: 15   Global Step: 260890   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 02:51:37,707-Speed 3310.26 samples/sec   Loss 0.4473   LearningRate 0.0048   Epoch: 15   Global Step: 260900   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 02:51:40,759-Speed 3356.15 samples/sec   Loss 0.4646   LearningRate 0.0048   Epoch: 15   Global Step: 260910   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 02:51:43,853-Speed 3310.52 samples/sec   Loss 0.4220   LearningRate 0.0048   Epoch: 15   Global Step: 260920   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 02:51:46,931-Speed 3326.74 samples/sec   Loss 0.4208   LearningRate 0.0048   Epoch: 15   Global Step: 260930   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 02:51:50,125-Speed 3207.58 samples/sec   Loss 0.4306   LearningRate 0.0048   Epoch: 15   Global Step: 260940   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 02:51:53,204-Speed 3327.06 samples/sec   Loss 0.4420   LearningRate 0.0048   Epoch: 15   Global Step: 260950   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 02:51:56,266-Speed 3344.78 samples/sec   Loss 0.4279   LearningRate 0.0048   Epoch: 15   Global Step: 260960   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 02:51:59,330-Speed 3342.58 samples/sec   Loss 0.4247   LearningRate 0.0048   Epoch: 15   Global Step: 260970   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 02:52:02,398-Speed 3338.55 samples/sec   Loss 0.4112   LearningRate 0.0048   Epoch: 15   Global Step: 260980   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 02:52:05,465-Speed 3338.68 samples/sec   Loss 0.4393   LearningRate 0.0048   Epoch: 15   Global Step: 260990   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 02:52:08,542-Speed 3328.55 samples/sec   Loss 0.4203   LearningRate 0.0048   Epoch: 15   Global Step: 261000   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 02:52:11,631-Speed 3316.59 samples/sec   Loss 0.4415   LearningRate 0.0048   Epoch: 15   Global Step: 261010   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 02:52:14,693-Speed 3344.04 samples/sec   Loss 0.4434   LearningRate 0.0048   Epoch: 15   Global Step: 261020   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 02:52:17,756-Speed 3344.67 samples/sec   Loss 0.4612   LearningRate 0.0048   Epoch: 15   Global Step: 261030   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 02:52:20,823-Speed 3339.89 samples/sec   Loss 0.4491   LearningRate 0.0048   Epoch: 15   Global Step: 261040   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 02:52:23,911-Speed 3316.38 samples/sec   Loss 0.4125   LearningRate 0.0048   Epoch: 15   Global Step: 261050   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 02:52:26,974-Speed 3343.56 samples/sec   Loss 0.4344   LearningRate 0.0048   Epoch: 15   Global Step: 261060   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 02:52:30,060-Speed 3319.67 samples/sec   Loss 0.4492   LearningRate 0.0047   Epoch: 15   Global Step: 261070   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 02:52:33,151-Speed 3313.37 samples/sec   Loss 0.4383   LearningRate 0.0047   Epoch: 15   Global Step: 261080   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 02:52:36,213-Speed 3344.47 samples/sec   Loss 0.4447   LearningRate 0.0047   Epoch: 15   Global Step: 261090   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 02:52:39,296-Speed 3322.88 samples/sec   Loss 0.4235   LearningRate 0.0047   Epoch: 15   Global Step: 261100   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 02:52:42,376-Speed 3325.02 samples/sec   Loss 0.4398   LearningRate 0.0047   Epoch: 15   Global Step: 261110   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 02:52:45,444-Speed 3338.82 samples/sec   Loss 0.4157   LearningRate 0.0047   Epoch: 15   Global Step: 261120   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 02:52:48,523-Speed 3326.37 samples/sec   Loss 0.4197   LearningRate 0.0047   Epoch: 15   Global Step: 261130   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 02:52:51,582-Speed 3348.35 samples/sec   Loss 0.4302   LearningRate 0.0047   Epoch: 15   Global Step: 261140   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 02:52:54,689-Speed 3296.25 samples/sec   Loss 0.4267   LearningRate 0.0047   Epoch: 15   Global Step: 261150   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 02:52:57,757-Speed 3338.60 samples/sec   Loss 0.4245   LearningRate 0.0047   Epoch: 15   Global Step: 261160   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 02:53:00,843-Speed 3318.25 samples/sec   Loss 0.4395   LearningRate 0.0047   Epoch: 15   Global Step: 261170   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 02:53:03,930-Speed 3318.69 samples/sec   Loss 0.4441   LearningRate 0.0047   Epoch: 15   Global Step: 261180   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 02:53:06,986-Speed 3350.65 samples/sec   Loss 0.4375   LearningRate 0.0047   Epoch: 15   Global Step: 261190   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 02:53:10,064-Speed 3327.55 samples/sec   Loss 0.4308   LearningRate 0.0047   Epoch: 15   Global Step: 261200   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 02:53:13,136-Speed 3334.67 samples/sec   Loss 0.4453   LearningRate 0.0047   Epoch: 15   Global Step: 261210   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 02:53:16,204-Speed 3338.11 samples/sec   Loss 0.4362   LearningRate 0.0047   Epoch: 15   Global Step: 261220   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 02:53:19,276-Speed 3334.89 samples/sec   Loss 0.4417   LearningRate 0.0047   Epoch: 15   Global Step: 261230   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 02:53:22,374-Speed 3306.22 samples/sec   Loss 0.4276   LearningRate 0.0047   Epoch: 15   Global Step: 261240   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 02:53:25,443-Speed 3337.09 samples/sec   Loss 0.4259   LearningRate 0.0047   Epoch: 15   Global Step: 261250   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 02:53:28,535-Speed 3312.79 samples/sec   Loss 0.4331   LearningRate 0.0047   Epoch: 15   Global Step: 261260   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 02:53:31,703-Speed 3232.16 samples/sec   Loss 0.4344   LearningRate 0.0047   Epoch: 15   Global Step: 261270   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 02:53:34,805-Speed 3303.37 samples/sec   Loss 0.4534   LearningRate 0.0047   Epoch: 15   Global Step: 261280   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 02:53:37,868-Speed 3343.24 samples/sec   Loss 0.4321   LearningRate 0.0047   Epoch: 15   Global Step: 261290   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 02:53:40,949-Speed 3324.90 samples/sec   Loss 0.4548   LearningRate 0.0047   Epoch: 15   Global Step: 261300   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 02:53:44,097-Speed 3253.01 samples/sec   Loss 0.4512   LearningRate 0.0047   Epoch: 15   Global Step: 261310   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 02:53:47,179-Speed 3323.73 samples/sec   Loss 0.4158   LearningRate 0.0047   Epoch: 15   Global Step: 261320   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 02:53:50,256-Speed 3327.79 samples/sec   Loss 0.4249   LearningRate 0.0047   Epoch: 15   Global Step: 261330   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 02:53:53,340-Speed 3321.92 samples/sec   Loss 0.4238   LearningRate 0.0047   Epoch: 15   Global Step: 261340   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 02:53:56,402-Speed 3344.55 samples/sec   Loss 0.4161   LearningRate 0.0047   Epoch: 15   Global Step: 261350   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 02:53:59,474-Speed 3334.14 samples/sec   Loss 0.4407   LearningRate 0.0047   Epoch: 15   Global Step: 261360   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 02:54:02,548-Speed 3332.18 samples/sec   Loss 0.4240   LearningRate 0.0047   Epoch: 15   Global Step: 261370   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 02:54:05,615-Speed 3340.27 samples/sec   Loss 0.4445   LearningRate 0.0047   Epoch: 15   Global Step: 261380   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 02:54:08,676-Speed 3345.36 samples/sec   Loss 0.4563   LearningRate 0.0047   Epoch: 15   Global Step: 261390   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 02:54:11,789-Speed 3290.06 samples/sec   Loss 0.4541   LearningRate 0.0047   Epoch: 15   Global Step: 261400   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 02:54:14,852-Speed 3344.45 samples/sec   Loss 0.4416   LearningRate 0.0047   Epoch: 15   Global Step: 261410   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 02:54:17,959-Speed 3296.18 samples/sec   Loss 0.4466   LearningRate 0.0047   Epoch: 15   Global Step: 261420   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 02:54:21,020-Speed 3346.38 samples/sec   Loss 0.4567   LearningRate 0.0047   Epoch: 15   Global Step: 261430   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 02:54:24,135-Speed 3287.69 samples/sec   Loss 0.4416   LearningRate 0.0047   Epoch: 15   Global Step: 261440   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 02:54:27,198-Speed 3343.74 samples/sec   Loss 0.4293   LearningRate 0.0047   Epoch: 15   Global Step: 261450   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 02:54:30,283-Speed 3319.85 samples/sec   Loss 0.4305   LearningRate 0.0047   Epoch: 15   Global Step: 261460   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 02:54:33,343-Speed 3347.82 samples/sec   Loss 0.4407   LearningRate 0.0047   Epoch: 15   Global Step: 261470   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 02:54:36,441-Speed 3305.24 samples/sec   Loss 0.4318   LearningRate 0.0047   Epoch: 15   Global Step: 261480   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 02:54:39,506-Speed 3341.94 samples/sec   Loss 0.4295   LearningRate 0.0047   Epoch: 15   Global Step: 261490   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 02:54:42,588-Speed 3323.35 samples/sec   Loss 0.4295   LearningRate 0.0047   Epoch: 15   Global Step: 261500   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 02:54:45,662-Speed 3331.49 samples/sec   Loss 0.4384   LearningRate 0.0047   Epoch: 15   Global Step: 261510   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 02:54:48,739-Speed 3328.52 samples/sec   Loss 0.4358   LearningRate 0.0047   Epoch: 15   Global Step: 261520   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 02:54:51,819-Speed 3326.08 samples/sec   Loss 0.4274   LearningRate 0.0047   Epoch: 15   Global Step: 261530   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 02:54:54,890-Speed 3335.38 samples/sec   Loss 0.4367   LearningRate 0.0047   Epoch: 15   Global Step: 261540   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 02:54:57,958-Speed 3337.96 samples/sec   Loss 0.4320   LearningRate 0.0047   Epoch: 15   Global Step: 261550   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 02:55:01,022-Speed 3342.76 samples/sec   Loss 0.4677   LearningRate 0.0047   Epoch: 15   Global Step: 261560   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 02:55:04,206-Speed 3216.97 samples/sec   Loss 0.4552   LearningRate 0.0047   Epoch: 15   Global Step: 261570   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 02:55:07,311-Speed 3298.43 samples/sec   Loss 0.4255   LearningRate 0.0047   Epoch: 15   Global Step: 261580   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 02:55:10,424-Speed 3290.98 samples/sec   Loss 0.4311   LearningRate 0.0047   Epoch: 15   Global Step: 261590   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 02:55:13,575-Speed 3250.36 samples/sec   Loss 0.4406   LearningRate 0.0047   Epoch: 15   Global Step: 261600   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 02:55:16,759-Speed 3216.94 samples/sec   Loss 0.4322   LearningRate 0.0047   Epoch: 15   Global Step: 261610   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 02:55:19,911-Speed 3249.32 samples/sec   Loss 0.4402   LearningRate 0.0047   Epoch: 15   Global Step: 261620   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 02:55:22,984-Speed 3333.07 samples/sec   Loss 0.4263   LearningRate 0.0047   Epoch: 15   Global Step: 261630   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 02:55:26,109-Speed 3277.20 samples/sec   Loss 0.4477   LearningRate 0.0047   Epoch: 15   Global Step: 261640   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 02:55:29,175-Speed 3340.42 samples/sec   Loss 0.4314   LearningRate 0.0047   Epoch: 15   Global Step: 261650   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 02:55:32,254-Speed 3326.77 samples/sec   Loss 0.4861   LearningRate 0.0047   Epoch: 15   Global Step: 261660   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 02:55:35,314-Speed 3347.61 samples/sec   Loss 0.4111   LearningRate 0.0047   Epoch: 15   Global Step: 261670   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 02:55:38,407-Speed 3310.65 samples/sec   Loss 0.4573   LearningRate 0.0047   Epoch: 15   Global Step: 261680   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 02:55:41,475-Speed 3339.27 samples/sec   Loss 0.4475   LearningRate 0.0047   Epoch: 15   Global Step: 261690   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 02:55:44,569-Speed 3309.86 samples/sec   Loss 0.4268   LearningRate 0.0047   Epoch: 15   Global Step: 261700   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 02:55:47,645-Speed 3330.15 samples/sec   Loss 0.4480   LearningRate 0.0047   Epoch: 15   Global Step: 261710   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 02:55:50,747-Speed 3301.63 samples/sec   Loss 0.4524   LearningRate 0.0047   Epoch: 15   Global Step: 261720   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 02:55:53,949-Speed 3198.38 samples/sec   Loss 0.4324   LearningRate 0.0047   Epoch: 15   Global Step: 261730   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 02:55:57,144-Speed 3205.79 samples/sec   Loss 0.4462   LearningRate 0.0047   Epoch: 15   Global Step: 261740   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 02:56:00,210-Speed 3340.73 samples/sec   Loss 0.4547   LearningRate 0.0047   Epoch: 15   Global Step: 261750   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 02:56:03,286-Speed 3329.46 samples/sec   Loss 0.4566   LearningRate 0.0047   Epoch: 15   Global Step: 261760   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 02:56:06,362-Speed 3330.42 samples/sec   Loss 0.3803   LearningRate 0.0047   Epoch: 15   Global Step: 261770   Fp16 Grad Scale: 262144   Required: 8 hours
Training: 2022-04-12 02:56:09,414-Speed 3355.41 samples/sec   Loss 0.4462   LearningRate 0.0047   Epoch: 15   Global Step: 261780   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 02:56:12,507-Speed 3312.27 samples/sec   Loss 0.4314   LearningRate 0.0047   Epoch: 15   Global Step: 261790   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 02:56:15,573-Speed 3341.04 samples/sec   Loss 0.4301   LearningRate 0.0047   Epoch: 15   Global Step: 261800   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 02:56:18,657-Speed 3320.79 samples/sec   Loss 0.4116   LearningRate 0.0047   Epoch: 15   Global Step: 261810   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 02:56:21,732-Speed 3330.24 samples/sec   Loss 0.4470   LearningRate 0.0047   Epoch: 15   Global Step: 261820   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 02:56:24,796-Speed 3343.37 samples/sec   Loss 0.4270   LearningRate 0.0047   Epoch: 15   Global Step: 261830   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 02:56:27,880-Speed 3320.64 samples/sec   Loss 0.4451   LearningRate 0.0046   Epoch: 15   Global Step: 261840   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 02:56:30,951-Speed 3335.37 samples/sec   Loss 0.4286   LearningRate 0.0046   Epoch: 15   Global Step: 261850   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 02:56:34,021-Speed 3336.77 samples/sec   Loss 0.4427   LearningRate 0.0046   Epoch: 15   Global Step: 261860   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 02:56:37,089-Speed 3337.84 samples/sec   Loss 0.4234   LearningRate 0.0046   Epoch: 15   Global Step: 261870   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 02:56:40,161-Speed 3334.54 samples/sec   Loss 0.4340   LearningRate 0.0046   Epoch: 15   Global Step: 261880   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 02:56:43,304-Speed 3258.30 samples/sec   Loss 0.4352   LearningRate 0.0046   Epoch: 15   Global Step: 261890   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 02:56:46,403-Speed 3305.78 samples/sec   Loss 0.4239   LearningRate 0.0046   Epoch: 15   Global Step: 261900   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 02:56:49,552-Speed 3251.64 samples/sec   Loss 0.4184   LearningRate 0.0046   Epoch: 15   Global Step: 261910   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 02:56:52,620-Speed 3339.22 samples/sec   Loss 0.4210   LearningRate 0.0046   Epoch: 15   Global Step: 261920   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 02:56:55,718-Speed 3306.19 samples/sec   Loss 0.4551   LearningRate 0.0046   Epoch: 15   Global Step: 261930   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 02:56:58,803-Speed 3319.15 samples/sec   Loss 0.4363   LearningRate 0.0046   Epoch: 15   Global Step: 261940   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 02:57:01,929-Speed 3276.16 samples/sec   Loss 0.4197   LearningRate 0.0046   Epoch: 15   Global Step: 261950   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 02:57:05,057-Speed 3275.43 samples/sec   Loss 0.4640   LearningRate 0.0046   Epoch: 15   Global Step: 261960   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 02:57:08,121-Speed 3342.23 samples/sec   Loss 0.4409   LearningRate 0.0046   Epoch: 15   Global Step: 261970   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 02:57:11,219-Speed 3307.11 samples/sec   Loss 0.4356   LearningRate 0.0046   Epoch: 15   Global Step: 261980   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 02:57:14,279-Speed 3346.59 samples/sec   Loss 0.4451   LearningRate 0.0046   Epoch: 15   Global Step: 261990   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 02:57:17,340-Speed 3346.09 samples/sec   Loss 0.4255   LearningRate 0.0046   Epoch: 15   Global Step: 262000   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 02:58:01,486-[lfw][262000]XNorm: 19.966944
Training: 2022-04-12 02:58:01,487-[lfw][262000]Accuracy-Flip: 0.99767+-0.00260
Training: 2022-04-12 02:58:01,487-[lfw][262000]Accuracy-Highest: 0.99817
Training: 2022-04-12 02:58:52,600-[cfp_fp][262000]XNorm: 21.088606
Training: 2022-04-12 02:58:52,600-[cfp_fp][262000]Accuracy-Flip: 0.99100+-0.00535
Training: 2022-04-12 02:58:52,601-[cfp_fp][262000]Accuracy-Highest: 0.99186
Training: 2022-04-12 02:59:36,363-[agedb_30][262000]XNorm: 21.548215
Training: 2022-04-12 02:59:36,364-[agedb_30][262000]Accuracy-Flip: 0.98450+-0.00637
Training: 2022-04-12 02:59:36,364-[agedb_30][262000]Accuracy-Highest: 0.98650
Training: 2022-04-12 02:59:39,523-Speed 72.02 samples/sec   Loss 0.4420   LearningRate 0.0046   Epoch: 15   Global Step: 262010   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 02:59:42,623-Speed 3303.41 samples/sec   Loss 0.4648   LearningRate 0.0046   Epoch: 15   Global Step: 262020   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 02:59:45,673-Speed 3358.48 samples/sec   Loss 0.4740   LearningRate 0.0046   Epoch: 15   Global Step: 262030   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 02:59:48,748-Speed 3330.53 samples/sec   Loss 0.4449   LearningRate 0.0046   Epoch: 15   Global Step: 262040   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 02:59:51,798-Speed 3358.00 samples/sec   Loss 0.4437   LearningRate 0.0046   Epoch: 15   Global Step: 262050   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 02:59:54,862-Speed 3343.48 samples/sec   Loss 0.4198   LearningRate 0.0046   Epoch: 15   Global Step: 262060   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 02:59:57,914-Speed 3356.10 samples/sec   Loss 0.4265   LearningRate 0.0046   Epoch: 15   Global Step: 262070   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 03:00:00,987-Speed 3332.61 samples/sec   Loss 0.4510   LearningRate 0.0046   Epoch: 15   Global Step: 262080   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 03:00:04,095-Speed 3295.05 samples/sec   Loss 0.4600   LearningRate 0.0046   Epoch: 15   Global Step: 262090   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-12 03:00:07,248-Speed 3248.86 samples/sec   Loss 0.4285   LearningRate 0.0046   Epoch: 15   Global Step: 262100   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-12 03:00:10,404-Speed 3245.08 samples/sec   Loss 0.4319   LearningRate 0.0046   Epoch: 15   Global Step: 262110   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-12 03:00:13,568-Speed 3237.84 samples/sec   Loss 0.4221   LearningRate 0.0046   Epoch: 15   Global Step: 262120   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-12 03:00:16,632-Speed 3342.52 samples/sec   Loss 0.4532   LearningRate 0.0046   Epoch: 15   Global Step: 262130   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-12 03:00:19,808-Speed 3224.79 samples/sec   Loss 0.4482   LearningRate 0.0046   Epoch: 15   Global Step: 262140   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-12 03:00:23,055-Speed 3153.93 samples/sec   Loss 0.4303   LearningRate 0.0046   Epoch: 15   Global Step: 262150   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-12 03:00:26,192-Speed 3264.68 samples/sec   Loss 0.4499   LearningRate 0.0046   Epoch: 15   Global Step: 262160   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-12 03:00:29,284-Speed 3312.34 samples/sec   Loss 0.4184   LearningRate 0.0046   Epoch: 15   Global Step: 262170   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-12 03:00:32,360-Speed 3329.99 samples/sec   Loss 0.4224   LearningRate 0.0046   Epoch: 15   Global Step: 262180   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-12 03:00:35,441-Speed 3324.46 samples/sec   Loss 0.4486   LearningRate 0.0046   Epoch: 15   Global Step: 262190   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 03:00:38,587-Speed 3256.36 samples/sec   Loss 0.4205   LearningRate 0.0046   Epoch: 15   Global Step: 262200   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 03:00:41,717-Speed 3271.40 samples/sec   Loss 0.4016   LearningRate 0.0046   Epoch: 15   Global Step: 262210   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 03:00:44,781-Speed 3343.18 samples/sec   Loss 0.4213   LearningRate 0.0046   Epoch: 15   Global Step: 262220   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 03:00:47,875-Speed 3310.97 samples/sec   Loss 0.4293   LearningRate 0.0046   Epoch: 15   Global Step: 262230   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 03:00:50,950-Speed 3330.21 samples/sec   Loss 0.4557   LearningRate 0.0046   Epoch: 15   Global Step: 262240   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 03:00:54,025-Speed 3331.33 samples/sec   Loss 0.4196   LearningRate 0.0046   Epoch: 15   Global Step: 262250   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 03:00:57,108-Speed 3321.72 samples/sec   Loss 0.4638   LearningRate 0.0046   Epoch: 15   Global Step: 262260   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 03:01:00,169-Speed 3346.36 samples/sec   Loss 0.4440   LearningRate 0.0046   Epoch: 15   Global Step: 262270   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 03:01:03,302-Speed 3268.78 samples/sec   Loss 0.4239   LearningRate 0.0046   Epoch: 15   Global Step: 262280   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 03:01:06,361-Speed 3349.01 samples/sec   Loss 0.4169   LearningRate 0.0046   Epoch: 15   Global Step: 262290   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 03:01:09,430-Speed 3336.75 samples/sec   Loss 0.4141   LearningRate 0.0046   Epoch: 15   Global Step: 262300   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 03:01:12,507-Speed 3328.60 samples/sec   Loss 0.4211   LearningRate 0.0046   Epoch: 15   Global Step: 262310   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 03:01:15,565-Speed 3349.31 samples/sec   Loss 0.4485   LearningRate 0.0046   Epoch: 15   Global Step: 262320   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 03:01:18,631-Speed 3340.30 samples/sec   Loss 0.4220   LearningRate 0.0046   Epoch: 15   Global Step: 262330   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 03:01:21,717-Speed 3319.38 samples/sec   Loss 0.4530   LearningRate 0.0046   Epoch: 15   Global Step: 262340   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 03:01:24,775-Speed 3348.79 samples/sec   Loss 0.4213   LearningRate 0.0046   Epoch: 15   Global Step: 262350   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 03:01:27,843-Speed 3338.31 samples/sec   Loss 0.4584   LearningRate 0.0046   Epoch: 15   Global Step: 262360   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 03:01:30,910-Speed 3339.64 samples/sec   Loss 0.4486   LearningRate 0.0046   Epoch: 15   Global Step: 262370   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 03:01:33,988-Speed 3328.55 samples/sec   Loss 0.4517   LearningRate 0.0046   Epoch: 15   Global Step: 262380   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 03:01:37,058-Speed 3336.10 samples/sec   Loss 0.4286   LearningRate 0.0046   Epoch: 15   Global Step: 262390   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 03:01:40,134-Speed 3329.31 samples/sec   Loss 0.4531   LearningRate 0.0046   Epoch: 15   Global Step: 262400   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 03:01:43,193-Speed 3348.71 samples/sec   Loss 0.4137   LearningRate 0.0046   Epoch: 15   Global Step: 262410   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 03:01:46,258-Speed 3341.46 samples/sec   Loss 0.4340   LearningRate 0.0046   Epoch: 15   Global Step: 262420   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 03:01:49,357-Speed 3305.32 samples/sec   Loss 0.4563   LearningRate 0.0046   Epoch: 15   Global Step: 262430   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 03:01:52,526-Speed 3231.32 samples/sec   Loss 0.4349   LearningRate 0.0046   Epoch: 15   Global Step: 262440   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 03:01:55,606-Speed 3325.89 samples/sec   Loss 0.4185   LearningRate 0.0046   Epoch: 15   Global Step: 262450   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 03:01:58,668-Speed 3344.55 samples/sec   Loss 0.4192   LearningRate 0.0046   Epoch: 15   Global Step: 262460   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 03:02:01,739-Speed 3336.11 samples/sec   Loss 0.4597   LearningRate 0.0046   Epoch: 15   Global Step: 262470   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 03:02:04,863-Speed 3278.58 samples/sec   Loss 0.4211   LearningRate 0.0046   Epoch: 15   Global Step: 262480   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 03:02:07,965-Speed 3301.12 samples/sec   Loss 0.4434   LearningRate 0.0046   Epoch: 15   Global Step: 262490   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 03:02:11,024-Speed 3348.53 samples/sec   Loss 0.4641   LearningRate 0.0046   Epoch: 15   Global Step: 262500   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 03:02:14,098-Speed 3331.93 samples/sec   Loss 0.4394   LearningRate 0.0046   Epoch: 15   Global Step: 262510   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 03:02:17,175-Speed 3328.52 samples/sec   Loss 0.4585   LearningRate 0.0046   Epoch: 15   Global Step: 262520   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 03:02:20,331-Speed 3245.51 samples/sec   Loss 0.4238   LearningRate 0.0046   Epoch: 15   Global Step: 262530   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 03:02:23,530-Speed 3201.19 samples/sec   Loss 0.4433   LearningRate 0.0046   Epoch: 15   Global Step: 262540   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 03:02:26,607-Speed 3328.93 samples/sec   Loss 0.4519   LearningRate 0.0046   Epoch: 15   Global Step: 262550   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 03:02:29,699-Speed 3313.13 samples/sec   Loss 0.4372   LearningRate 0.0046   Epoch: 15   Global Step: 262560   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 03:02:32,778-Speed 3325.83 samples/sec   Loss 0.4495   LearningRate 0.0046   Epoch: 15   Global Step: 262570   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 03:02:35,840-Speed 3344.75 samples/sec   Loss 0.4421   LearningRate 0.0046   Epoch: 15   Global Step: 262580   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 03:02:38,954-Speed 3289.39 samples/sec   Loss 0.4395   LearningRate 0.0046   Epoch: 15   Global Step: 262590   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 03:02:42,132-Speed 3223.17 samples/sec   Loss 0.4640   LearningRate 0.0046   Epoch: 15   Global Step: 262600   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 03:02:45,223-Speed 3312.95 samples/sec   Loss 0.4354   LearningRate 0.0046   Epoch: 15   Global Step: 262610   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 03:02:48,348-Speed 3277.74 samples/sec   Loss 0.4617   LearningRate 0.0045   Epoch: 15   Global Step: 262620   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 03:02:51,425-Speed 3329.03 samples/sec   Loss 0.4464   LearningRate 0.0045   Epoch: 15   Global Step: 262630   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 03:02:54,486-Speed 3346.14 samples/sec   Loss 0.4397   LearningRate 0.0045   Epoch: 15   Global Step: 262640   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 03:02:57,596-Speed 3293.14 samples/sec   Loss 0.4430   LearningRate 0.0045   Epoch: 15   Global Step: 262650   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 03:03:00,762-Speed 3234.70 samples/sec   Loss 0.4163   LearningRate 0.0045   Epoch: 15   Global Step: 262660   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 03:03:03,845-Speed 3322.89 samples/sec   Loss 0.4380   LearningRate 0.0045   Epoch: 15   Global Step: 262670   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 03:03:06,910-Speed 3341.11 samples/sec   Loss 0.4384   LearningRate 0.0045   Epoch: 15   Global Step: 262680   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 03:03:09,976-Speed 3341.16 samples/sec   Loss 0.4436   LearningRate 0.0045   Epoch: 15   Global Step: 262690   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 03:03:13,054-Speed 3326.66 samples/sec   Loss 0.4409   LearningRate 0.0045   Epoch: 15   Global Step: 262700   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 03:03:16,145-Speed 3314.31 samples/sec   Loss 0.4284   LearningRate 0.0045   Epoch: 15   Global Step: 262710   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 03:03:19,207-Speed 3344.74 samples/sec   Loss 0.4205   LearningRate 0.0045   Epoch: 15   Global Step: 262720   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 03:03:22,280-Speed 3333.42 samples/sec   Loss 0.4546   LearningRate 0.0045   Epoch: 15   Global Step: 262730   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 03:03:25,341-Speed 3345.72 samples/sec   Loss 0.4617   LearningRate 0.0045   Epoch: 15   Global Step: 262740   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 03:03:28,427-Speed 3318.54 samples/sec   Loss 0.4327   LearningRate 0.0045   Epoch: 15   Global Step: 262750   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 03:03:31,495-Speed 3338.86 samples/sec   Loss 0.4673   LearningRate 0.0045   Epoch: 15   Global Step: 262760   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 03:03:34,543-Speed 3360.31 samples/sec   Loss 0.4352   LearningRate 0.0045   Epoch: 15   Global Step: 262770   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 03:03:37,608-Speed 3341.15 samples/sec   Loss 0.4440   LearningRate 0.0045   Epoch: 15   Global Step: 262780   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 03:03:40,674-Speed 3340.69 samples/sec   Loss 0.4426   LearningRate 0.0045   Epoch: 15   Global Step: 262790   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 03:03:43,750-Speed 3330.18 samples/sec   Loss 0.4408   LearningRate 0.0045   Epoch: 15   Global Step: 262800   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 03:03:46,877-Speed 3275.78 samples/sec   Loss 0.4181   LearningRate 0.0045   Epoch: 15   Global Step: 262810   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 03:03:49,944-Speed 3339.65 samples/sec   Loss 0.4570   LearningRate 0.0045   Epoch: 15   Global Step: 262820   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 03:03:53,012-Speed 3338.25 samples/sec   Loss 0.4320   LearningRate 0.0045   Epoch: 15   Global Step: 262830   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 03:03:56,078-Speed 3340.84 samples/sec   Loss 0.4284   LearningRate 0.0045   Epoch: 15   Global Step: 262840   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 03:03:59,138-Speed 3346.29 samples/sec   Loss 0.4185   LearningRate 0.0045   Epoch: 15   Global Step: 262850   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 03:04:02,199-Speed 3346.77 samples/sec   Loss 0.4257   LearningRate 0.0045   Epoch: 15   Global Step: 262860   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-12 03:04:05,262-Speed 3343.09 samples/sec   Loss 0.4482   LearningRate 0.0045   Epoch: 15   Global Step: 262870   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 03:04:08,345-Speed 3322.60 samples/sec   Loss 0.4528   LearningRate 0.0045   Epoch: 15   Global Step: 262880   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 03:04:11,434-Speed 3315.98 samples/sec   Loss 0.4351   LearningRate 0.0045   Epoch: 15   Global Step: 262890   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 03:04:14,496-Speed 3344.79 samples/sec   Loss 0.4464   LearningRate 0.0045   Epoch: 15   Global Step: 262900   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 03:04:17,566-Speed 3335.99 samples/sec   Loss 0.4192   LearningRate 0.0045   Epoch: 15   Global Step: 262910   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 03:04:20,637-Speed 3335.89 samples/sec   Loss 0.4562   LearningRate 0.0045   Epoch: 15   Global Step: 262920   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 03:04:23,701-Speed 3342.24 samples/sec   Loss 0.4426   LearningRate 0.0045   Epoch: 15   Global Step: 262930   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-12 03:04:26,791-Speed 3315.49 samples/sec   Loss 0.4250   LearningRate 0.0045   Epoch: 15   Global Step: 262940   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:04:29,873-Speed 3322.51 samples/sec   Loss 0.4564   LearningRate 0.0045   Epoch: 15   Global Step: 262950   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:04:32,995-Speed 3280.47 samples/sec   Loss 0.4271   LearningRate 0.0045   Epoch: 15   Global Step: 262960   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:04:36,049-Speed 3354.25 samples/sec   Loss 0.4217   LearningRate 0.0045   Epoch: 15   Global Step: 262970   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:04:39,113-Speed 3342.92 samples/sec   Loss 0.4295   LearningRate 0.0045   Epoch: 15   Global Step: 262980   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:04:42,174-Speed 3346.42 samples/sec   Loss 0.4076   LearningRate 0.0045   Epoch: 15   Global Step: 262990   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:04:45,237-Speed 3343.46 samples/sec   Loss 0.4311   LearningRate 0.0045   Epoch: 15   Global Step: 263000   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:04:48,302-Speed 3342.62 samples/sec   Loss 0.4599   LearningRate 0.0045   Epoch: 15   Global Step: 263010   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:04:51,365-Speed 3343.46 samples/sec   Loss 0.4161   LearningRate 0.0045   Epoch: 15   Global Step: 263020   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:04:54,437-Speed 3333.63 samples/sec   Loss 0.4611   LearningRate 0.0045   Epoch: 15   Global Step: 263030   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:04:57,488-Speed 3357.38 samples/sec   Loss 0.4377   LearningRate 0.0045   Epoch: 15   Global Step: 263040   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 03:05:00,601-Speed 3289.89 samples/sec   Loss 0.4464   LearningRate 0.0045   Epoch: 15   Global Step: 263050   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 03:05:03,702-Speed 3303.12 samples/sec   Loss 0.4446   LearningRate 0.0045   Epoch: 15   Global Step: 263060   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 03:05:06,827-Speed 3278.13 samples/sec   Loss 0.4421   LearningRate 0.0045   Epoch: 15   Global Step: 263070   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 03:05:09,979-Speed 3249.38 samples/sec   Loss 0.4328   LearningRate 0.0045   Epoch: 15   Global Step: 263080   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 03:05:13,045-Speed 3339.58 samples/sec   Loss 0.4584   LearningRate 0.0045   Epoch: 15   Global Step: 263090   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 03:05:16,117-Speed 3334.66 samples/sec   Loss 0.4331   LearningRate 0.0045   Epoch: 15   Global Step: 263100   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 03:05:19,195-Speed 3327.14 samples/sec   Loss 0.4232   LearningRate 0.0045   Epoch: 15   Global Step: 263110   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 03:05:22,258-Speed 3344.01 samples/sec   Loss 0.4283   LearningRate 0.0045   Epoch: 15   Global Step: 263120   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 03:05:25,325-Speed 3339.45 samples/sec   Loss 0.4525   LearningRate 0.0045   Epoch: 15   Global Step: 263130   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 03:05:28,388-Speed 3344.53 samples/sec   Loss 0.4314   LearningRate 0.0045   Epoch: 15   Global Step: 263140   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:05:31,488-Speed 3303.22 samples/sec   Loss 0.4436   LearningRate 0.0045   Epoch: 15   Global Step: 263150   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:05:34,570-Speed 3323.95 samples/sec   Loss 0.4442   LearningRate 0.0045   Epoch: 15   Global Step: 263160   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:05:37,666-Speed 3308.46 samples/sec   Loss 0.4490   LearningRate 0.0045   Epoch: 15   Global Step: 263170   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:05:40,738-Speed 3333.10 samples/sec   Loss 0.4338   LearningRate 0.0045   Epoch: 15   Global Step: 263180   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:05:43,891-Speed 3249.11 samples/sec   Loss 0.4246   LearningRate 0.0045   Epoch: 15   Global Step: 263190   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:05:46,989-Speed 3304.97 samples/sec   Loss 0.4482   LearningRate 0.0045   Epoch: 15   Global Step: 263200   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:05:50,116-Speed 3275.82 samples/sec   Loss 0.4354   LearningRate 0.0045   Epoch: 15   Global Step: 263210   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:05:53,196-Speed 3325.36 samples/sec   Loss 0.4192   LearningRate 0.0045   Epoch: 15   Global Step: 263220   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:05:56,257-Speed 3345.89 samples/sec   Loss 0.4433   LearningRate 0.0045   Epoch: 15   Global Step: 263230   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 03:05:59,361-Speed 3300.09 samples/sec   Loss 0.4226   LearningRate 0.0045   Epoch: 15   Global Step: 263240   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 03:06:02,528-Speed 3234.41 samples/sec   Loss 0.4278   LearningRate 0.0045   Epoch: 15   Global Step: 263250   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 03:06:05,593-Speed 3341.88 samples/sec   Loss 0.4472   LearningRate 0.0045   Epoch: 15   Global Step: 263260   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 03:06:08,678-Speed 3319.27 samples/sec   Loss 0.4419   LearningRate 0.0045   Epoch: 15   Global Step: 263270   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 03:06:11,760-Speed 3323.71 samples/sec   Loss 0.4441   LearningRate 0.0045   Epoch: 15   Global Step: 263280   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 03:06:14,845-Speed 3319.47 samples/sec   Loss 0.4578   LearningRate 0.0045   Epoch: 15   Global Step: 263290   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 03:06:17,942-Speed 3307.53 samples/sec   Loss 0.4521   LearningRate 0.0045   Epoch: 15   Global Step: 263300   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 03:06:21,082-Speed 3261.62 samples/sec   Loss 0.4443   LearningRate 0.0045   Epoch: 15   Global Step: 263310   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 03:06:24,147-Speed 3341.47 samples/sec   Loss 0.4071   LearningRate 0.0045   Epoch: 15   Global Step: 263320   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 03:06:27,229-Speed 3323.13 samples/sec   Loss 0.4537   LearningRate 0.0045   Epoch: 15   Global Step: 263330   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:06:30,296-Speed 3339.73 samples/sec   Loss 0.4312   LearningRate 0.0045   Epoch: 15   Global Step: 263340   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:06:33,361-Speed 3342.70 samples/sec   Loss 0.4500   LearningRate 0.0045   Epoch: 15   Global Step: 263350   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 03:06:36,425-Speed 3342.03 samples/sec   Loss 0.4317   LearningRate 0.0045   Epoch: 15   Global Step: 263360   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 03:06:39,487-Speed 3345.16 samples/sec   Loss 0.4456   LearningRate 0.0045   Epoch: 15   Global Step: 263370   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 03:06:42,570-Speed 3322.73 samples/sec   Loss 0.4236   LearningRate 0.0045   Epoch: 15   Global Step: 263380   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 03:06:45,657-Speed 3317.31 samples/sec   Loss 0.4591   LearningRate 0.0045   Epoch: 15   Global Step: 263390   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 03:06:48,755-Speed 3306.47 samples/sec   Loss 0.4370   LearningRate 0.0045   Epoch: 15   Global Step: 263400   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 03:06:51,865-Speed 3292.60 samples/sec   Loss 0.4545   LearningRate 0.0044   Epoch: 15   Global Step: 263410   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 03:06:54,926-Speed 3346.17 samples/sec   Loss 0.4368   LearningRate 0.0044   Epoch: 15   Global Step: 263420   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 03:06:58,038-Speed 3291.20 samples/sec   Loss 0.4447   LearningRate 0.0044   Epoch: 15   Global Step: 263430   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 03:07:01,124-Speed 3319.91 samples/sec   Loss 0.4337   LearningRate 0.0044   Epoch: 15   Global Step: 263440   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 03:07:04,227-Speed 3299.73 samples/sec   Loss 0.4244   LearningRate 0.0044   Epoch: 15   Global Step: 263450   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:07:07,294-Speed 3340.07 samples/sec   Loss 0.4596   LearningRate 0.0044   Epoch: 15   Global Step: 263460   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:07:10,349-Speed 3352.47 samples/sec   Loss 0.4674   LearningRate 0.0044   Epoch: 15   Global Step: 263470   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 03:07:13,413-Speed 3343.16 samples/sec   Loss 0.4242   LearningRate 0.0044   Epoch: 15   Global Step: 263480   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 03:07:16,493-Speed 3325.51 samples/sec   Loss 0.4641   LearningRate 0.0044   Epoch: 15   Global Step: 263490   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 03:07:19,618-Speed 3277.29 samples/sec   Loss 0.4303   LearningRate 0.0044   Epoch: 15   Global Step: 263500   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-12 03:07:22,734-Speed 3286.35 samples/sec   Loss 0.4304   LearningRate 0.0044   Epoch: 15   Global Step: 263510   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-12 03:07:25,810-Speed 3330.76 samples/sec   Loss 0.4135   LearningRate 0.0044   Epoch: 15   Global Step: 263520   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-12 03:07:28,877-Speed 3339.28 samples/sec   Loss 0.4662   LearningRate 0.0044   Epoch: 15   Global Step: 263530   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-12 03:07:31,950-Speed 3333.34 samples/sec   Loss 0.4686   LearningRate 0.0044   Epoch: 15   Global Step: 263540   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-12 03:07:35,050-Speed 3303.78 samples/sec   Loss 0.4508   LearningRate 0.0044   Epoch: 15   Global Step: 263550   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-12 03:07:38,129-Speed 3326.67 samples/sec   Loss 0.4431   LearningRate 0.0044   Epoch: 15   Global Step: 263560   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-12 03:07:41,194-Speed 3340.94 samples/sec   Loss 0.4295   LearningRate 0.0044   Epoch: 15   Global Step: 263570   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-12 03:07:44,264-Speed 3336.84 samples/sec   Loss 0.4432   LearningRate 0.0044   Epoch: 15   Global Step: 263580   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-12 03:07:47,383-Speed 3283.73 samples/sec   Loss 0.4353   LearningRate 0.0044   Epoch: 15   Global Step: 263590   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-12 03:07:50,496-Speed 3289.49 samples/sec   Loss 0.4537   LearningRate 0.0044   Epoch: 15   Global Step: 263600   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 03:07:53,572-Speed 3330.23 samples/sec   Loss 0.4461   LearningRate 0.0044   Epoch: 15   Global Step: 263610   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 03:07:56,642-Speed 3336.71 samples/sec   Loss 0.4440   LearningRate 0.0044   Epoch: 15   Global Step: 263620   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 03:07:59,719-Speed 3328.69 samples/sec   Loss 0.4270   LearningRate 0.0044   Epoch: 15   Global Step: 263630   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 03:08:02,782-Speed 3343.55 samples/sec   Loss 0.4523   LearningRate 0.0044   Epoch: 15   Global Step: 263640   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 03:08:05,845-Speed 3343.77 samples/sec   Loss 0.4283   LearningRate 0.0044   Epoch: 15   Global Step: 263650   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 03:08:08,965-Speed 3282.86 samples/sec   Loss 0.4371   LearningRate 0.0044   Epoch: 15   Global Step: 263660   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 03:08:12,033-Speed 3338.13 samples/sec   Loss 0.4429   LearningRate 0.0044   Epoch: 15   Global Step: 263670   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 03:08:15,129-Speed 3308.90 samples/sec   Loss 0.4473   LearningRate 0.0044   Epoch: 15   Global Step: 263680   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 03:08:18,201-Speed 3333.56 samples/sec   Loss 0.4573   LearningRate 0.0044   Epoch: 15   Global Step: 263690   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 03:08:21,285-Speed 3321.03 samples/sec   Loss 0.4371   LearningRate 0.0044   Epoch: 15   Global Step: 263700   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:08:24,356-Speed 3335.71 samples/sec   Loss 0.4290   LearningRate 0.0044   Epoch: 15   Global Step: 263710   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:08:27,440-Speed 3320.91 samples/sec   Loss 0.4424   LearningRate 0.0044   Epoch: 15   Global Step: 263720   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:08:30,505-Speed 3341.50 samples/sec   Loss 0.4227   LearningRate 0.0044   Epoch: 15   Global Step: 263730   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:08:33,568-Speed 3344.04 samples/sec   Loss 0.4392   LearningRate 0.0044   Epoch: 15   Global Step: 263740   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:08:36,635-Speed 3340.14 samples/sec   Loss 0.4429   LearningRate 0.0044   Epoch: 15   Global Step: 263750   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:08:39,724-Speed 3315.35 samples/sec   Loss 0.4473   LearningRate 0.0044   Epoch: 15   Global Step: 263760   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:08:42,840-Speed 3287.22 samples/sec   Loss 0.4417   LearningRate 0.0044   Epoch: 15   Global Step: 263770   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:08:45,961-Speed 3281.27 samples/sec   Loss 0.4145   LearningRate 0.0044   Epoch: 15   Global Step: 263780   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:08:49,057-Speed 3308.69 samples/sec   Loss 0.4282   LearningRate 0.0044   Epoch: 15   Global Step: 263790   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 03:08:52,158-Speed 3303.25 samples/sec   Loss 0.4517   LearningRate 0.0044   Epoch: 15   Global Step: 263800   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 03:08:55,236-Speed 3327.20 samples/sec   Loss 0.4330   LearningRate 0.0044   Epoch: 15   Global Step: 263810   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 03:08:58,302-Speed 3340.77 samples/sec   Loss 0.3935   LearningRate 0.0044   Epoch: 15   Global Step: 263820   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-12 03:09:01,377-Speed 3330.29 samples/sec   Loss 0.4688   LearningRate 0.0044   Epoch: 15   Global Step: 263830   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-12 03:09:04,447-Speed 3336.66 samples/sec   Loss 0.4305   LearningRate 0.0044   Epoch: 15   Global Step: 263840   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-12 03:09:07,525-Speed 3327.19 samples/sec   Loss 0.4403   LearningRate 0.0044   Epoch: 15   Global Step: 263850   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-12 03:09:10,756-Speed 3169.89 samples/sec   Loss 0.4251   LearningRate 0.0044   Epoch: 15   Global Step: 263860   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-12 03:09:13,843-Speed 3317.99 samples/sec   Loss 0.4780   LearningRate 0.0044   Epoch: 15   Global Step: 263870   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-12 03:09:16,959-Speed 3288.03 samples/sec   Loss 0.4396   LearningRate 0.0044   Epoch: 15   Global Step: 263880   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-12 03:09:20,063-Speed 3298.85 samples/sec   Loss 0.4188   LearningRate 0.0044   Epoch: 15   Global Step: 263890   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-12 03:09:23,137-Speed 3331.86 samples/sec   Loss 0.4085   LearningRate 0.0044   Epoch: 15   Global Step: 263900   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-12 03:09:26,201-Speed 3343.05 samples/sec   Loss 0.4473   LearningRate 0.0044   Epoch: 15   Global Step: 263910   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-12 03:09:29,265-Speed 3343.23 samples/sec   Loss 0.4548   LearningRate 0.0044   Epoch: 15   Global Step: 263920   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 03:09:32,347-Speed 3323.06 samples/sec   Loss 0.4294   LearningRate 0.0044   Epoch: 15   Global Step: 263930   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 03:09:35,435-Speed 3316.89 samples/sec   Loss 0.4395   LearningRate 0.0044   Epoch: 15   Global Step: 263940   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 03:09:38,510-Speed 3331.04 samples/sec   Loss 0.4146   LearningRate 0.0044   Epoch: 15   Global Step: 263950   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 03:09:41,702-Speed 3208.84 samples/sec   Loss 0.4515   LearningRate 0.0044   Epoch: 15   Global Step: 263960   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 03:09:44,770-Speed 3337.72 samples/sec   Loss 0.4286   LearningRate 0.0044   Epoch: 15   Global Step: 263970   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 03:09:47,837-Speed 3339.92 samples/sec   Loss 0.4177   LearningRate 0.0044   Epoch: 15   Global Step: 263980   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 03:09:50,908-Speed 3334.65 samples/sec   Loss 0.4486   LearningRate 0.0044   Epoch: 15   Global Step: 263990   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 03:09:53,983-Speed 3331.50 samples/sec   Loss 0.4452   LearningRate 0.0044   Epoch: 15   Global Step: 264000   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 03:10:37,955-[lfw][264000]XNorm: 21.615600
Training: 2022-04-12 03:10:37,956-[lfw][264000]Accuracy-Flip: 0.99783+-0.00259
Training: 2022-04-12 03:10:37,956-[lfw][264000]Accuracy-Highest: 0.99817
Training: 2022-04-12 03:11:28,859-[cfp_fp][264000]XNorm: 22.653354
Training: 2022-04-12 03:11:28,859-[cfp_fp][264000]Accuracy-Flip: 0.99100+-0.00511
Training: 2022-04-12 03:11:28,860-[cfp_fp][264000]Accuracy-Highest: 0.99186
Training: 2022-04-12 03:12:12,785-[agedb_30][264000]XNorm: 22.831257
Training: 2022-04-12 03:12:12,785-[agedb_30][264000]Accuracy-Flip: 0.98500+-0.00592
Training: 2022-04-12 03:12:12,786-[agedb_30][264000]Accuracy-Highest: 0.98650
Training: 2022-04-12 03:12:15,872-Speed 72.17 samples/sec   Loss 0.4482   LearningRate 0.0044   Epoch: 15   Global Step: 264010   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 03:12:18,925-Speed 3354.40 samples/sec   Loss 0.4140   LearningRate 0.0044   Epoch: 15   Global Step: 264020   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:12:21,981-Speed 3351.43 samples/sec   Loss 0.4564   LearningRate 0.0044   Epoch: 15   Global Step: 264030   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:12:25,037-Speed 3351.21 samples/sec   Loss 0.4614   LearningRate 0.0044   Epoch: 15   Global Step: 264040   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:12:28,096-Speed 3348.81 samples/sec   Loss 0.4413   LearningRate 0.0044   Epoch: 15   Global Step: 264050   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:12:31,202-Speed 3296.93 samples/sec   Loss 0.4435   LearningRate 0.0044   Epoch: 15   Global Step: 264060   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:12:34,340-Speed 3264.75 samples/sec   Loss 0.4344   LearningRate 0.0044   Epoch: 15   Global Step: 264070   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:12:37,494-Speed 3246.98 samples/sec   Loss 0.4415   LearningRate 0.0044   Epoch: 15   Global Step: 264080   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:12:40,571-Speed 3328.32 samples/sec   Loss 0.4352   LearningRate 0.0044   Epoch: 15   Global Step: 264090   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:12:43,666-Speed 3309.80 samples/sec   Loss 0.4313   LearningRate 0.0044   Epoch: 15   Global Step: 264100   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:12:46,740-Speed 3331.27 samples/sec   Loss 0.4510   LearningRate 0.0044   Epoch: 15   Global Step: 264110   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:12:49,816-Speed 3330.55 samples/sec   Loss 0.4260   LearningRate 0.0044   Epoch: 15   Global Step: 264120   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:12:52,872-Speed 3351.30 samples/sec   Loss 0.4325   LearningRate 0.0044   Epoch: 15   Global Step: 264130   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:12:55,988-Speed 3286.76 samples/sec   Loss 0.4262   LearningRate 0.0044   Epoch: 15   Global Step: 264140   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:12:59,105-Speed 3285.94 samples/sec   Loss 0.4441   LearningRate 0.0044   Epoch: 15   Global Step: 264150   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:13:02,240-Speed 3267.30 samples/sec   Loss 0.4342   LearningRate 0.0044   Epoch: 15   Global Step: 264160   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:13:05,364-Speed 3278.87 samples/sec   Loss 0.4311   LearningRate 0.0044   Epoch: 15   Global Step: 264170   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:13:08,415-Speed 3356.94 samples/sec   Loss 0.3984   LearningRate 0.0044   Epoch: 15   Global Step: 264180   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 03:13:11,609-Speed 3206.84 samples/sec   Loss 0.4389   LearningRate 0.0044   Epoch: 15   Global Step: 264190   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 03:13:14,728-Speed 3283.82 samples/sec   Loss 0.4521   LearningRate 0.0043   Epoch: 15   Global Step: 264200   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 03:13:17,787-Speed 3348.15 samples/sec   Loss 0.4476   LearningRate 0.0043   Epoch: 15   Global Step: 264210   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 03:13:20,961-Speed 3227.59 samples/sec   Loss 0.4419   LearningRate 0.0043   Epoch: 15   Global Step: 264220   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 03:13:24,075-Speed 3288.29 samples/sec   Loss 0.4403   LearningRate 0.0043   Epoch: 15   Global Step: 264230   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 03:13:27,243-Speed 3233.17 samples/sec   Loss 0.4462   LearningRate 0.0043   Epoch: 15   Global Step: 264240   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 03:13:30,333-Speed 3314.65 samples/sec   Loss 0.4579   LearningRate 0.0043   Epoch: 15   Global Step: 264250   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 03:13:33,411-Speed 3328.10 samples/sec   Loss 0.4568   LearningRate 0.0043   Epoch: 15   Global Step: 264260   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 03:13:36,469-Speed 3349.50 samples/sec   Loss 0.4300   LearningRate 0.0043   Epoch: 15   Global Step: 264270   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 03:13:39,538-Speed 3337.86 samples/sec   Loss 0.4487   LearningRate 0.0043   Epoch: 15   Global Step: 264280   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:13:42,596-Speed 3350.14 samples/sec   Loss 0.4755   LearningRate 0.0043   Epoch: 15   Global Step: 264290   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:13:45,653-Speed 3349.79 samples/sec   Loss 0.4502   LearningRate 0.0043   Epoch: 15   Global Step: 264300   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:13:48,700-Speed 3360.96 samples/sec   Loss 0.4648   LearningRate 0.0043   Epoch: 15   Global Step: 264310   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 03:13:51,768-Speed 3338.95 samples/sec   Loss 0.4746   LearningRate 0.0043   Epoch: 15   Global Step: 264320   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 03:13:54,835-Speed 3339.22 samples/sec   Loss 0.4583   LearningRate 0.0043   Epoch: 15   Global Step: 264330   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 03:13:57,914-Speed 3327.56 samples/sec   Loss 0.4669   LearningRate 0.0043   Epoch: 15   Global Step: 264340   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 03:14:00,977-Speed 3343.63 samples/sec   Loss 0.4283   LearningRate 0.0043   Epoch: 15   Global Step: 264350   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 03:14:04,065-Speed 3316.95 samples/sec   Loss 0.4315   LearningRate 0.0043   Epoch: 15   Global Step: 264360   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 03:14:07,134-Speed 3336.82 samples/sec   Loss 0.4371   LearningRate 0.0043   Epoch: 15   Global Step: 264370   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 03:14:10,199-Speed 3342.18 samples/sec   Loss 0.4599   LearningRate 0.0043   Epoch: 15   Global Step: 264380   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 03:14:13,338-Speed 3263.34 samples/sec   Loss 0.4067   LearningRate 0.0043   Epoch: 15   Global Step: 264390   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 03:14:16,420-Speed 3322.95 samples/sec   Loss 0.4307   LearningRate 0.0043   Epoch: 15   Global Step: 264400   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 03:14:19,581-Speed 3239.82 samples/sec   Loss 0.4110   LearningRate 0.0043   Epoch: 15   Global Step: 264410   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:14:22,655-Speed 3331.73 samples/sec   Loss 0.4103   LearningRate 0.0043   Epoch: 15   Global Step: 264420   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 03:14:25,750-Speed 3310.76 samples/sec   Loss 0.4424   LearningRate 0.0043   Epoch: 15   Global Step: 264430   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 03:14:28,846-Speed 3308.13 samples/sec   Loss 0.4643   LearningRate 0.0043   Epoch: 15   Global Step: 264440   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 03:14:31,989-Speed 3258.70 samples/sec   Loss 0.4583   LearningRate 0.0043   Epoch: 15   Global Step: 264450   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 03:14:35,047-Speed 3348.80 samples/sec   Loss 0.4863   LearningRate 0.0043   Epoch: 15   Global Step: 264460   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 03:14:38,117-Speed 3336.74 samples/sec   Loss 0.4568   LearningRate 0.0043   Epoch: 15   Global Step: 264470   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 03:14:41,175-Speed 3348.69 samples/sec   Loss 0.4098   LearningRate 0.0043   Epoch: 15   Global Step: 264480   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 03:14:44,290-Speed 3288.40 samples/sec   Loss 0.4423   LearningRate 0.0043   Epoch: 15   Global Step: 264490   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 03:14:47,486-Speed 3205.04 samples/sec   Loss 0.4222   LearningRate 0.0043   Epoch: 15   Global Step: 264500   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 03:14:50,556-Speed 3335.96 samples/sec   Loss 0.4692   LearningRate 0.0043   Epoch: 15   Global Step: 264510   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 03:14:53,629-Speed 3333.59 samples/sec   Loss 0.4233   LearningRate 0.0043   Epoch: 15   Global Step: 264520   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:14:56,893-Speed 3137.97 samples/sec   Loss 0.4370   LearningRate 0.0043   Epoch: 15   Global Step: 264530   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:15:00,023-Speed 3271.80 samples/sec   Loss 0.4264   LearningRate 0.0043   Epoch: 15   Global Step: 264540   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:15:03,087-Speed 3343.00 samples/sec   Loss 0.4304   LearningRate 0.0043   Epoch: 15   Global Step: 264550   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:15:06,188-Speed 3302.69 samples/sec   Loss 0.4427   LearningRate 0.0043   Epoch: 15   Global Step: 264560   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:15:09,252-Speed 3343.15 samples/sec   Loss 0.4536   LearningRate 0.0043   Epoch: 15   Global Step: 264570   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:15:12,334-Speed 3323.09 samples/sec   Loss 0.4427   LearningRate 0.0043   Epoch: 15   Global Step: 264580   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:15:15,412-Speed 3327.71 samples/sec   Loss 0.4569   LearningRate 0.0043   Epoch: 15   Global Step: 264590   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:15:18,471-Speed 3348.50 samples/sec   Loss 0.4215   LearningRate 0.0043   Epoch: 15   Global Step: 264600   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:15:21,534-Speed 3343.73 samples/sec   Loss 0.4434   LearningRate 0.0043   Epoch: 15   Global Step: 264610   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:15:24,594-Speed 3347.20 samples/sec   Loss 0.4457   LearningRate 0.0043   Epoch: 15   Global Step: 264620   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:15:27,662-Speed 3338.55 samples/sec   Loss 0.4398   LearningRate 0.0043   Epoch: 15   Global Step: 264630   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 03:15:30,724-Speed 3344.67 samples/sec   Loss 0.4352   LearningRate 0.0043   Epoch: 15   Global Step: 264640   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 03:15:33,804-Speed 3324.79 samples/sec   Loss 0.4349   LearningRate 0.0043   Epoch: 15   Global Step: 264650   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 03:15:36,869-Speed 3341.80 samples/sec   Loss 0.4308   LearningRate 0.0043   Epoch: 15   Global Step: 264660   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 03:15:39,937-Speed 3339.16 samples/sec   Loss 0.4263   LearningRate 0.0043   Epoch: 15   Global Step: 264670   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 03:15:43,003-Speed 3340.70 samples/sec   Loss 0.4551   LearningRate 0.0043   Epoch: 15   Global Step: 264680   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 03:15:46,058-Speed 3351.78 samples/sec   Loss 0.4125   LearningRate 0.0043   Epoch: 15   Global Step: 264690   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 03:15:49,188-Speed 3272.91 samples/sec   Loss 0.4301   LearningRate 0.0043   Epoch: 15   Global Step: 264700   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 03:15:52,374-Speed 3215.19 samples/sec   Loss 0.4414   LearningRate 0.0043   Epoch: 15   Global Step: 264710   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 03:15:55,528-Speed 3247.30 samples/sec   Loss 0.4465   LearningRate 0.0043   Epoch: 15   Global Step: 264720   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 03:15:58,645-Speed 3285.60 samples/sec   Loss 0.4391   LearningRate 0.0043   Epoch: 15   Global Step: 264730   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:16:01,779-Speed 3268.10 samples/sec   Loss 0.4469   LearningRate 0.0043   Epoch: 15   Global Step: 264740   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:16:04,900-Speed 3282.00 samples/sec   Loss 0.4573   LearningRate 0.0043   Epoch: 15   Global Step: 264750   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:16:07,961-Speed 3345.93 samples/sec   Loss 0.4346   LearningRate 0.0043   Epoch: 15   Global Step: 264760   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:16:11,034-Speed 3333.14 samples/sec   Loss 0.4471   LearningRate 0.0043   Epoch: 15   Global Step: 264770   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:16:14,107-Speed 3333.22 samples/sec   Loss 0.4254   LearningRate 0.0043   Epoch: 15   Global Step: 264780   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:16:17,168-Speed 3346.53 samples/sec   Loss 0.4566   LearningRate 0.0043   Epoch: 15   Global Step: 264790   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:16:20,234-Speed 3340.73 samples/sec   Loss 0.4425   LearningRate 0.0043   Epoch: 15   Global Step: 264800   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:16:23,297-Speed 3343.55 samples/sec   Loss 0.4535   LearningRate 0.0043   Epoch: 15   Global Step: 264810   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:16:26,364-Speed 3338.80 samples/sec   Loss 0.4543   LearningRate 0.0043   Epoch: 15   Global Step: 264820   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:16:29,425-Speed 3346.38 samples/sec   Loss 0.4226   LearningRate 0.0043   Epoch: 15   Global Step: 264830   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:16:32,561-Speed 3266.01 samples/sec   Loss 0.4522   LearningRate 0.0043   Epoch: 15   Global Step: 264840   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:16:35,658-Speed 3307.07 samples/sec   Loss 0.4201   LearningRate 0.0043   Epoch: 15   Global Step: 264850   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:16:38,747-Speed 3316.16 samples/sec   Loss 0.4576   LearningRate 0.0043   Epoch: 15   Global Step: 264860   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:16:41,843-Speed 3309.12 samples/sec   Loss 0.4599   LearningRate 0.0043   Epoch: 15   Global Step: 264870   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:16:44,916-Speed 3332.39 samples/sec   Loss 0.4237   LearningRate 0.0043   Epoch: 15   Global Step: 264880   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:16:48,009-Speed 3311.92 samples/sec   Loss 0.4617   LearningRate 0.0043   Epoch: 15   Global Step: 264890   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:16:51,073-Speed 3342.99 samples/sec   Loss 0.4530   LearningRate 0.0043   Epoch: 15   Global Step: 264900   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:16:54,129-Speed 3351.51 samples/sec   Loss 0.4224   LearningRate 0.0043   Epoch: 15   Global Step: 264910   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 03:16:57,200-Speed 3335.29 samples/sec   Loss 0.4594   LearningRate 0.0043   Epoch: 15   Global Step: 264920   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 03:17:00,314-Speed 3288.85 samples/sec   Loss 0.4376   LearningRate 0.0043   Epoch: 15   Global Step: 264930   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 03:17:03,383-Speed 3336.90 samples/sec   Loss 0.4297   LearningRate 0.0043   Epoch: 15   Global Step: 264940   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 03:17:06,445-Speed 3345.53 samples/sec   Loss 0.4239   LearningRate 0.0043   Epoch: 15   Global Step: 264950   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 03:17:09,522-Speed 3329.22 samples/sec   Loss 0.4456   LearningRate 0.0043   Epoch: 15   Global Step: 264960   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 03:17:12,582-Speed 3346.23 samples/sec   Loss 0.4634   LearningRate 0.0043   Epoch: 15   Global Step: 264970   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 03:17:15,680-Speed 3306.57 samples/sec   Loss 0.4333   LearningRate 0.0043   Epoch: 15   Global Step: 264980   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 03:17:18,785-Speed 3298.87 samples/sec   Loss 0.4500   LearningRate 0.0043   Epoch: 15   Global Step: 264990   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 03:17:21,855-Speed 3335.95 samples/sec   Loss 0.4343   LearningRate 0.0043   Epoch: 15   Global Step: 265000   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 03:17:24,919-Speed 3343.19 samples/sec   Loss 0.4403   LearningRate 0.0042   Epoch: 15   Global Step: 265010   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:17:27,985-Speed 3340.50 samples/sec   Loss 0.4597   LearningRate 0.0042   Epoch: 15   Global Step: 265020   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:17:31,063-Speed 3327.39 samples/sec   Loss 0.4405   LearningRate 0.0042   Epoch: 15   Global Step: 265030   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:17:34,130-Speed 3340.08 samples/sec   Loss 0.4063   LearningRate 0.0042   Epoch: 15   Global Step: 265040   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:17:37,193-Speed 3343.40 samples/sec   Loss 0.4443   LearningRate 0.0042   Epoch: 15   Global Step: 265050   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:17:40,261-Speed 3337.94 samples/sec   Loss 0.4456   LearningRate 0.0042   Epoch: 15   Global Step: 265060   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:17:43,465-Speed 3196.89 samples/sec   Loss 0.4315   LearningRate 0.0042   Epoch: 15   Global Step: 265070   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:17:46,586-Speed 3282.49 samples/sec   Loss 0.4645   LearningRate 0.0042   Epoch: 15   Global Step: 265080   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 03:17:49,663-Speed 3327.95 samples/sec   Loss 0.4484   LearningRate 0.0042   Epoch: 15   Global Step: 265090   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 03:17:52,787-Speed 3278.54 samples/sec   Loss 0.4711   LearningRate 0.0042   Epoch: 15   Global Step: 265100   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 03:17:55,905-Speed 3284.66 samples/sec   Loss 0.4322   LearningRate 0.0042   Epoch: 15   Global Step: 265110   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 03:17:59,034-Speed 3273.33 samples/sec   Loss 0.4458   LearningRate 0.0042   Epoch: 15   Global Step: 265120   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 03:18:02,147-Speed 3291.08 samples/sec   Loss 0.4481   LearningRate 0.0042   Epoch: 15   Global Step: 265130   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 03:18:05,208-Speed 3345.70 samples/sec   Loss 0.4453   LearningRate 0.0042   Epoch: 15   Global Step: 265140   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 03:18:08,332-Speed 3279.02 samples/sec   Loss 0.4244   LearningRate 0.0042   Epoch: 15   Global Step: 265150   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 03:18:11,429-Speed 3306.55 samples/sec   Loss 0.4398   LearningRate 0.0042   Epoch: 15   Global Step: 265160   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 03:18:14,501-Speed 3334.63 samples/sec   Loss 0.4551   LearningRate 0.0042   Epoch: 15   Global Step: 265170   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 03:18:17,576-Speed 3330.24 samples/sec   Loss 0.4331   LearningRate 0.0042   Epoch: 15   Global Step: 265180   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:18:20,636-Speed 3347.82 samples/sec   Loss 0.4455   LearningRate 0.0042   Epoch: 15   Global Step: 265190   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 03:18:23,724-Speed 3316.89 samples/sec   Loss 0.4374   LearningRate 0.0042   Epoch: 15   Global Step: 265200   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 03:18:26,801-Speed 3328.17 samples/sec   Loss 0.4074   LearningRate 0.0042   Epoch: 15   Global Step: 265210   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 03:18:29,916-Speed 3288.46 samples/sec   Loss 0.4173   LearningRate 0.0042   Epoch: 15   Global Step: 265220   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 03:18:33,120-Speed 3196.70 samples/sec   Loss 0.4366   LearningRate 0.0042   Epoch: 15   Global Step: 265230   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 03:18:36,198-Speed 3327.83 samples/sec   Loss 0.4397   LearningRate 0.0042   Epoch: 15   Global Step: 265240   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 03:18:39,287-Speed 3315.92 samples/sec   Loss 0.4243   LearningRate 0.0042   Epoch: 15   Global Step: 265250   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 03:18:42,415-Speed 3274.25 samples/sec   Loss 0.4518   LearningRate 0.0042   Epoch: 15   Global Step: 265260   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 03:18:45,539-Speed 3278.50 samples/sec   Loss 0.4525   LearningRate 0.0042   Epoch: 15   Global Step: 265270   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 03:18:48,727-Speed 3212.43 samples/sec   Loss 0.4507   LearningRate 0.0042   Epoch: 15   Global Step: 265280   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 03:18:51,831-Speed 3299.49 samples/sec   Loss 0.4503   LearningRate 0.0042   Epoch: 15   Global Step: 265290   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:18:54,924-Speed 3311.40 samples/sec   Loss 0.4259   LearningRate 0.0042   Epoch: 15   Global Step: 265300   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:18:58,069-Speed 3257.56 samples/sec   Loss 0.4512   LearningRate 0.0042   Epoch: 15   Global Step: 265310   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:19:01,148-Speed 3327.11 samples/sec   Loss 0.4481   LearningRate 0.0042   Epoch: 15   Global Step: 265320   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:19:04,235-Speed 3317.46 samples/sec   Loss 0.4641   LearningRate 0.0042   Epoch: 15   Global Step: 265330   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:19:07,309-Speed 3331.73 samples/sec   Loss 0.4397   LearningRate 0.0042   Epoch: 15   Global Step: 265340   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:19:10,468-Speed 3241.81 samples/sec   Loss 0.4242   LearningRate 0.0042   Epoch: 15   Global Step: 265350   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:19:13,624-Speed 3245.96 samples/sec   Loss 0.4341   LearningRate 0.0042   Epoch: 15   Global Step: 265360   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 03:19:16,695-Speed 3334.91 samples/sec   Loss 0.4383   LearningRate 0.0042   Epoch: 15   Global Step: 265370   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 03:19:19,768-Speed 3332.87 samples/sec   Loss 0.4376   LearningRate 0.0042   Epoch: 15   Global Step: 265380   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 03:19:22,832-Speed 3341.99 samples/sec   Loss 0.4501   LearningRate 0.0042   Epoch: 15   Global Step: 265390   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 03:19:25,898-Speed 3341.91 samples/sec   Loss 0.4345   LearningRate 0.0042   Epoch: 15   Global Step: 265400   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 03:19:28,996-Speed 3306.02 samples/sec   Loss 0.4296   LearningRate 0.0042   Epoch: 15   Global Step: 265410   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 03:19:32,078-Speed 3322.89 samples/sec   Loss 0.4154   LearningRate 0.0042   Epoch: 15   Global Step: 265420   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 03:19:35,168-Speed 3314.57 samples/sec   Loss 0.4545   LearningRate 0.0042   Epoch: 15   Global Step: 265430   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 03:19:38,292-Speed 3278.58 samples/sec   Loss 0.3930   LearningRate 0.0042   Epoch: 15   Global Step: 265440   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 03:19:41,367-Speed 3330.75 samples/sec   Loss 0.4078   LearningRate 0.0042   Epoch: 15   Global Step: 265450   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 03:19:44,451-Speed 3321.66 samples/sec   Loss 0.4441   LearningRate 0.0042   Epoch: 15   Global Step: 265460   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:19:47,523-Speed 3334.35 samples/sec   Loss 0.4793   LearningRate 0.0042   Epoch: 15   Global Step: 265470   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:19:50,632-Speed 3293.93 samples/sec   Loss 0.4209   LearningRate 0.0042   Epoch: 15   Global Step: 265480   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 03:19:53,701-Speed 3337.87 samples/sec   Loss 0.4310   LearningRate 0.0042   Epoch: 15   Global Step: 265490   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 03:19:56,763-Speed 3344.75 samples/sec   Loss 0.4145   LearningRate 0.0042   Epoch: 15   Global Step: 265500   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 03:19:59,853-Speed 3315.18 samples/sec   Loss 0.4338   LearningRate 0.0042   Epoch: 15   Global Step: 265510   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 03:20:03,002-Speed 3252.29 samples/sec   Loss 0.4278   LearningRate 0.0042   Epoch: 15   Global Step: 265520   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 03:20:06,070-Speed 3338.20 samples/sec   Loss 0.4631   LearningRate 0.0042   Epoch: 15   Global Step: 265530   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 03:20:09,182-Speed 3290.92 samples/sec   Loss 0.4462   LearningRate 0.0042   Epoch: 15   Global Step: 265540   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 03:20:12,266-Speed 3320.77 samples/sec   Loss 0.4324   LearningRate 0.0042   Epoch: 15   Global Step: 265550   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 03:20:15,374-Speed 3296.31 samples/sec   Loss 0.4432   LearningRate 0.0042   Epoch: 15   Global Step: 265560   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 03:20:18,510-Speed 3266.25 samples/sec   Loss 0.4500   LearningRate 0.0042   Epoch: 15   Global Step: 265570   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 03:20:21,588-Speed 3327.58 samples/sec   Loss 0.4508   LearningRate 0.0042   Epoch: 15   Global Step: 265580   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:20:24,649-Speed 3345.56 samples/sec   Loss 0.4388   LearningRate 0.0042   Epoch: 15   Global Step: 265590   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:20:27,715-Speed 3340.95 samples/sec   Loss 0.4115   LearningRate 0.0042   Epoch: 15   Global Step: 265600   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:20:30,792-Speed 3329.09 samples/sec   Loss 0.4445   LearningRate 0.0042   Epoch: 15   Global Step: 265610   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:20:33,938-Speed 3255.48 samples/sec   Loss 0.4364   LearningRate 0.0042   Epoch: 15   Global Step: 265620   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:20:37,061-Speed 3279.05 samples/sec   Loss 0.4429   LearningRate 0.0042   Epoch: 15   Global Step: 265630   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:20:40,215-Speed 3247.50 samples/sec   Loss 0.4094   LearningRate 0.0042   Epoch: 15   Global Step: 265640   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:20:43,352-Speed 3265.79 samples/sec   Loss 0.4301   LearningRate 0.0042   Epoch: 15   Global Step: 265650   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:20:46,416-Speed 3343.11 samples/sec   Loss 0.4667   LearningRate 0.0042   Epoch: 15   Global Step: 265660   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:20:49,541-Speed 3277.06 samples/sec   Loss 0.4358   LearningRate 0.0042   Epoch: 15   Global Step: 265670   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:20:52,592-Speed 3356.57 samples/sec   Loss 0.4281   LearningRate 0.0042   Epoch: 15   Global Step: 265680   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:20:55,713-Speed 3282.58 samples/sec   Loss 0.4303   LearningRate 0.0042   Epoch: 15   Global Step: 265690   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:20:58,930-Speed 3183.68 samples/sec   Loss 0.4446   LearningRate 0.0042   Epoch: 15   Global Step: 265700   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:21:02,061-Speed 3271.43 samples/sec   Loss 0.4383   LearningRate 0.0042   Epoch: 15   Global Step: 265710   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:21:05,134-Speed 3332.13 samples/sec   Loss 0.4380   LearningRate 0.0042   Epoch: 15   Global Step: 265720   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:21:08,202-Speed 3338.73 samples/sec   Loss 0.4608   LearningRate 0.0042   Epoch: 15   Global Step: 265730   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:21:11,269-Speed 3339.70 samples/sec   Loss 0.4274   LearningRate 0.0042   Epoch: 15   Global Step: 265740   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:21:14,333-Speed 3343.46 samples/sec   Loss 0.4161   LearningRate 0.0042   Epoch: 15   Global Step: 265750   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:21:17,400-Speed 3338.79 samples/sec   Loss 0.4217   LearningRate 0.0042   Epoch: 15   Global Step: 265760   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:21:20,491-Speed 3314.27 samples/sec   Loss 0.4283   LearningRate 0.0042   Epoch: 15   Global Step: 265770   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 03:21:23,612-Speed 3281.37 samples/sec   Loss 0.4272   LearningRate 0.0042   Epoch: 15   Global Step: 265780   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 03:21:26,746-Speed 3268.38 samples/sec   Loss 0.4568   LearningRate 0.0042   Epoch: 15   Global Step: 265790   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 03:21:29,815-Speed 3337.32 samples/sec   Loss 0.4398   LearningRate 0.0042   Epoch: 15   Global Step: 265800   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 03:21:32,885-Speed 3336.00 samples/sec   Loss 0.4302   LearningRate 0.0042   Epoch: 15   Global Step: 265810   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 03:21:35,948-Speed 3344.62 samples/sec   Loss 0.4136   LearningRate 0.0041   Epoch: 15   Global Step: 265820   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 03:21:39,081-Speed 3269.18 samples/sec   Loss 0.4425   LearningRate 0.0041   Epoch: 15   Global Step: 265830   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 03:21:42,185-Speed 3300.10 samples/sec   Loss 0.4367   LearningRate 0.0041   Epoch: 15   Global Step: 265840   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 03:21:45,249-Speed 3342.69 samples/sec   Loss 0.4320   LearningRate 0.0041   Epoch: 15   Global Step: 265850   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 03:21:48,310-Speed 3345.39 samples/sec   Loss 0.4522   LearningRate 0.0041   Epoch: 15   Global Step: 265860   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 03:21:51,392-Speed 3323.72 samples/sec   Loss 0.4511   LearningRate 0.0041   Epoch: 15   Global Step: 265870   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:21:54,452-Speed 3347.17 samples/sec   Loss 0.4167   LearningRate 0.0041   Epoch: 15   Global Step: 265880   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 03:21:57,513-Speed 3345.18 samples/sec   Loss 0.4235   LearningRate 0.0041   Epoch: 15   Global Step: 265890   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 03:22:00,608-Speed 3309.96 samples/sec   Loss 0.4688   LearningRate 0.0041   Epoch: 15   Global Step: 265900   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 03:22:03,771-Speed 3237.71 samples/sec   Loss 0.4481   LearningRate 0.0041   Epoch: 15   Global Step: 265910   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 03:22:06,834-Speed 3345.07 samples/sec   Loss 0.4415   LearningRate 0.0041   Epoch: 15   Global Step: 265920   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 03:22:09,934-Speed 3303.64 samples/sec   Loss 0.4509   LearningRate 0.0041   Epoch: 15   Global Step: 265930   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 03:22:13,035-Speed 3302.75 samples/sec   Loss 0.4384   LearningRate 0.0041   Epoch: 15   Global Step: 265940   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 03:22:16,098-Speed 3343.83 samples/sec   Loss 0.4493   LearningRate 0.0041   Epoch: 15   Global Step: 265950   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 03:22:19,180-Speed 3322.87 samples/sec   Loss 0.4420   LearningRate 0.0041   Epoch: 15   Global Step: 265960   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 03:22:22,258-Speed 3327.38 samples/sec   Loss 0.4292   LearningRate 0.0041   Epoch: 15   Global Step: 265970   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 03:22:25,335-Speed 3328.75 samples/sec   Loss 0.4256   LearningRate 0.0041   Epoch: 15   Global Step: 265980   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:22:28,385-Speed 3358.27 samples/sec   Loss 0.4051   LearningRate 0.0041   Epoch: 15   Global Step: 265990   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 03:22:31,457-Speed 3334.88 samples/sec   Loss 0.4278   LearningRate 0.0041   Epoch: 15   Global Step: 266000   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 03:23:15,793-[lfw][266000]XNorm: 21.736719
Training: 2022-04-12 03:23:15,793-[lfw][266000]Accuracy-Flip: 0.99783+-0.00259
Training: 2022-04-12 03:23:15,794-[lfw][266000]Accuracy-Highest: 0.99817
Training: 2022-04-12 03:24:07,268-[cfp_fp][266000]XNorm: 23.034383
Training: 2022-04-12 03:24:07,269-[cfp_fp][266000]Accuracy-Flip: 0.99057+-0.00453
Training: 2022-04-12 03:24:07,269-[cfp_fp][266000]Accuracy-Highest: 0.99186
Training: 2022-04-12 03:24:51,495-[agedb_30][266000]XNorm: 23.315120
Training: 2022-04-12 03:24:51,495-[agedb_30][266000]Accuracy-Flip: 0.98333+-0.00619
Training: 2022-04-12 03:24:51,496-[agedb_30][266000]Accuracy-Highest: 0.98650
Training: 2022-04-12 03:24:54,568-Speed 71.55 samples/sec   Loss 0.4420   LearningRate 0.0041   Epoch: 15   Global Step: 266010   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 03:24:57,633-Speed 3343.04 samples/sec   Loss 0.4368   LearningRate 0.0041   Epoch: 15   Global Step: 266020   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 03:25:00,705-Speed 3333.49 samples/sec   Loss 0.4286   LearningRate 0.0041   Epoch: 15   Global Step: 266030   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 03:25:03,751-Speed 3363.03 samples/sec   Loss 0.4582   LearningRate 0.0041   Epoch: 15   Global Step: 266040   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 03:25:06,837-Speed 3318.27 samples/sec   Loss 0.4264   LearningRate 0.0041   Epoch: 15   Global Step: 266050   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 03:25:09,893-Speed 3351.98 samples/sec   Loss 0.4107   LearningRate 0.0041   Epoch: 15   Global Step: 266060   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 03:25:13,024-Speed 3270.98 samples/sec   Loss 0.4305   LearningRate 0.0041   Epoch: 15   Global Step: 266070   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 03:25:16,169-Speed 3256.99 samples/sec   Loss 0.4009   LearningRate 0.0041   Epoch: 15   Global Step: 266080   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 03:25:19,296-Speed 3275.52 samples/sec   Loss 0.4364   LearningRate 0.0041   Epoch: 15   Global Step: 266090   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:25:22,388-Speed 3312.22 samples/sec   Loss 0.4646   LearningRate 0.0041   Epoch: 15   Global Step: 266100   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:25:25,461-Speed 3332.95 samples/sec   Loss 0.4134   LearningRate 0.0041   Epoch: 15   Global Step: 266110   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:25:28,588-Speed 3275.32 samples/sec   Loss 0.4215   LearningRate 0.0041   Epoch: 15   Global Step: 266120   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:25:31,658-Speed 3336.81 samples/sec   Loss 0.4481   LearningRate 0.0041   Epoch: 15   Global Step: 266130   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:25:34,796-Speed 3264.28 samples/sec   Loss 0.4279   LearningRate 0.0041   Epoch: 15   Global Step: 266140   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:25:37,909-Speed 3289.82 samples/sec   Loss 0.3975   LearningRate 0.0041   Epoch: 15   Global Step: 266150   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:25:40,994-Speed 3319.75 samples/sec   Loss 0.4273   LearningRate 0.0041   Epoch: 15   Global Step: 266160   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 03:25:44,196-Speed 3198.45 samples/sec   Loss 0.4329   LearningRate 0.0041   Epoch: 15   Global Step: 266170   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 03:25:47,268-Speed 3334.65 samples/sec   Loss 0.4231   LearningRate 0.0041   Epoch: 15   Global Step: 266180   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 03:25:50,336-Speed 3340.97 samples/sec   Loss 0.4030   LearningRate 0.0041   Epoch: 15   Global Step: 266190   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 03:25:53,399-Speed 3343.67 samples/sec   Loss 0.4196   LearningRate 0.0041   Epoch: 15   Global Step: 266200   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 03:25:56,458-Speed 3349.12 samples/sec   Loss 0.4410   LearningRate 0.0041   Epoch: 15   Global Step: 266210   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 03:25:59,626-Speed 3232.93 samples/sec   Loss 0.4617   LearningRate 0.0041   Epoch: 15   Global Step: 266220   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 03:26:02,743-Speed 3286.18 samples/sec   Loss 0.4178   LearningRate 0.0041   Epoch: 15   Global Step: 266230   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 03:26:05,896-Speed 3247.81 samples/sec   Loss 0.4187   LearningRate 0.0041   Epoch: 15   Global Step: 266240   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 03:26:08,961-Speed 3341.94 samples/sec   Loss 0.4230   LearningRate 0.0041   Epoch: 15   Global Step: 266250   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 03:26:12,039-Speed 3326.85 samples/sec   Loss 0.4222   LearningRate 0.0041   Epoch: 15   Global Step: 266260   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:26:15,132-Speed 3311.66 samples/sec   Loss 0.4355   LearningRate 0.0041   Epoch: 15   Global Step: 266270   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:26:18,268-Speed 3265.99 samples/sec   Loss 0.4348   LearningRate 0.0041   Epoch: 15   Global Step: 266280   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:26:21,336-Speed 3338.75 samples/sec   Loss 0.4326   LearningRate 0.0041   Epoch: 15   Global Step: 266290   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:26:24,388-Speed 3356.44 samples/sec   Loss 0.4568   LearningRate 0.0041   Epoch: 15   Global Step: 266300   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 03:26:27,449-Speed 3345.86 samples/sec   Loss 0.4288   LearningRate 0.0041   Epoch: 15   Global Step: 266310   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 03:26:30,507-Speed 3348.76 samples/sec   Loss 0.4142   LearningRate 0.0041   Epoch: 15   Global Step: 266320   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 03:26:33,576-Speed 3337.84 samples/sec   Loss 0.4129   LearningRate 0.0041   Epoch: 15   Global Step: 266330   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 03:26:36,679-Speed 3300.23 samples/sec   Loss 0.4322   LearningRate 0.0041   Epoch: 15   Global Step: 266340   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 03:26:39,748-Speed 3337.47 samples/sec   Loss 0.4350   LearningRate 0.0041   Epoch: 15   Global Step: 266350   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 03:26:42,816-Speed 3338.47 samples/sec   Loss 0.4122   LearningRate 0.0041   Epoch: 15   Global Step: 266360   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 03:26:45,884-Speed 3338.23 samples/sec   Loss 0.4192   LearningRate 0.0041   Epoch: 15   Global Step: 266370   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 03:26:49,105-Speed 3180.03 samples/sec   Loss 0.4246   LearningRate 0.0041   Epoch: 15   Global Step: 266380   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 03:26:52,163-Speed 3350.04 samples/sec   Loss 0.4454   LearningRate 0.0041   Epoch: 15   Global Step: 266390   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 03:26:55,221-Speed 3349.33 samples/sec   Loss 0.4429   LearningRate 0.0041   Epoch: 15   Global Step: 266400   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:26:58,306-Speed 3319.59 samples/sec   Loss 0.4484   LearningRate 0.0041   Epoch: 15   Global Step: 266410   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:27:01,376-Speed 3336.80 samples/sec   Loss 0.4466   LearningRate 0.0041   Epoch: 15   Global Step: 266420   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:27:04,438-Speed 3344.57 samples/sec   Loss 0.4492   LearningRate 0.0041   Epoch: 15   Global Step: 266430   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:27:07,494-Speed 3351.51 samples/sec   Loss 0.4322   LearningRate 0.0041   Epoch: 15   Global Step: 266440   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:27:10,628-Speed 3268.42 samples/sec   Loss 0.4371   LearningRate 0.0041   Epoch: 15   Global Step: 266450   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:27:13,696-Speed 3337.70 samples/sec   Loss 0.4157   LearningRate 0.0041   Epoch: 15   Global Step: 266460   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:27:16,772-Speed 3330.64 samples/sec   Loss 0.4350   LearningRate 0.0041   Epoch: 15   Global Step: 266470   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:27:19,853-Speed 3324.77 samples/sec   Loss 0.4520   LearningRate 0.0041   Epoch: 15   Global Step: 266480   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:27:22,917-Speed 3341.81 samples/sec   Loss 0.4498   LearningRate 0.0041   Epoch: 15   Global Step: 266490   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:27:25,990-Speed 3333.38 samples/sec   Loss 0.4370   LearningRate 0.0041   Epoch: 15   Global Step: 266500   Fp16 Grad Scale: 262144   Required: 7 hours
Training: 2022-04-12 03:27:29,051-Speed 3346.01 samples/sec   Loss 0.4291   LearningRate 0.0041   Epoch: 15   Global Step: 266510   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:27:32,148-Speed 3307.57 samples/sec   Loss 0.4563   LearningRate 0.0041   Epoch: 15   Global Step: 266520   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:27:35,210-Speed 3344.73 samples/sec   Loss 0.4424   LearningRate 0.0041   Epoch: 15   Global Step: 266530   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:27:38,276-Speed 3340.06 samples/sec   Loss 0.4388   LearningRate 0.0041   Epoch: 15   Global Step: 266540   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:27:41,335-Speed 3348.62 samples/sec   Loss 0.4514   LearningRate 0.0041   Epoch: 15   Global Step: 266550   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:27:44,418-Speed 3322.42 samples/sec   Loss 0.4396   LearningRate 0.0041   Epoch: 15   Global Step: 266560   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:27:47,475-Speed 3350.37 samples/sec   Loss 0.4422   LearningRate 0.0041   Epoch: 15   Global Step: 266570   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:27:50,524-Speed 3359.67 samples/sec   Loss 0.4470   LearningRate 0.0041   Epoch: 15   Global Step: 266580   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 03:27:53,583-Speed 3348.18 samples/sec   Loss 0.4443   LearningRate 0.0041   Epoch: 15   Global Step: 266590   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-12 03:27:56,641-Speed 3349.73 samples/sec   Loss 0.4256   LearningRate 0.0041   Epoch: 15   Global Step: 266600   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-12 03:27:59,711-Speed 3336.44 samples/sec   Loss 0.4278   LearningRate 0.0041   Epoch: 15   Global Step: 266610   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-12 03:28:02,778-Speed 3339.61 samples/sec   Loss 0.4621   LearningRate 0.0041   Epoch: 15   Global Step: 266620   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-12 03:28:05,837-Speed 3347.19 samples/sec   Loss 0.4249   LearningRate 0.0041   Epoch: 15   Global Step: 266630   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-12 03:28:08,915-Speed 3327.57 samples/sec   Loss 0.4450   LearningRate 0.0041   Epoch: 15   Global Step: 266640   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-12 03:28:11,986-Speed 3336.02 samples/sec   Loss 0.4495   LearningRate 0.0040   Epoch: 15   Global Step: 266650   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-12 03:28:15,142-Speed 3244.88 samples/sec   Loss 0.4247   LearningRate 0.0040   Epoch: 15   Global Step: 266660   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-12 03:28:18,214-Speed 3334.39 samples/sec   Loss 0.4221   LearningRate 0.0040   Epoch: 15   Global Step: 266670   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-12 03:28:21,281-Speed 3339.19 samples/sec   Loss 0.4430   LearningRate 0.0040   Epoch: 15   Global Step: 266680   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-12 03:28:24,385-Speed 3300.26 samples/sec   Loss 0.4545   LearningRate 0.0040   Epoch: 15   Global Step: 266690   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 03:28:27,451-Speed 3340.71 samples/sec   Loss 0.4300   LearningRate 0.0040   Epoch: 15   Global Step: 266700   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 03:28:30,625-Speed 3227.04 samples/sec   Loss 0.3956   LearningRate 0.0040   Epoch: 15   Global Step: 266710   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 03:28:33,694-Speed 3337.05 samples/sec   Loss 0.4759   LearningRate 0.0040   Epoch: 15   Global Step: 266720   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 03:28:36,804-Speed 3293.51 samples/sec   Loss 0.4416   LearningRate 0.0040   Epoch: 15   Global Step: 266730   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 03:28:39,866-Speed 3344.46 samples/sec   Loss 0.4428   LearningRate 0.0040   Epoch: 15   Global Step: 266740   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 03:28:42,946-Speed 3325.55 samples/sec   Loss 0.4260   LearningRate 0.0040   Epoch: 15   Global Step: 266750   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 03:28:46,032-Speed 3319.36 samples/sec   Loss 0.4255   LearningRate 0.0040   Epoch: 15   Global Step: 266760   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 03:28:49,104-Speed 3334.71 samples/sec   Loss 0.4507   LearningRate 0.0040   Epoch: 15   Global Step: 266770   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 03:28:52,182-Speed 3327.00 samples/sec   Loss 0.4415   LearningRate 0.0040   Epoch: 15   Global Step: 266780   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 03:28:55,275-Speed 3311.16 samples/sec   Loss 0.4390   LearningRate 0.0040   Epoch: 15   Global Step: 266790   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:28:58,431-Speed 3245.15 samples/sec   Loss 0.4406   LearningRate 0.0040   Epoch: 15   Global Step: 266800   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:29:01,564-Speed 3269.83 samples/sec   Loss 0.4170   LearningRate 0.0040   Epoch: 15   Global Step: 266810   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:29:04,639-Speed 3330.92 samples/sec   Loss 0.4372   LearningRate 0.0040   Epoch: 15   Global Step: 266820   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 03:29:07,709-Speed 3336.04 samples/sec   Loss 0.4433   LearningRate 0.0040   Epoch: 15   Global Step: 266830   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 03:29:10,790-Speed 3324.56 samples/sec   Loss 0.4307   LearningRate 0.0040   Epoch: 15   Global Step: 266840   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 03:29:13,859-Speed 3337.72 samples/sec   Loss 0.4474   LearningRate 0.0040   Epoch: 15   Global Step: 266850   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 03:29:16,930-Speed 3334.46 samples/sec   Loss 0.4597   LearningRate 0.0040   Epoch: 15   Global Step: 266860   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 03:29:20,078-Speed 3253.87 samples/sec   Loss 0.4445   LearningRate 0.0040   Epoch: 15   Global Step: 266870   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 03:29:23,166-Speed 3316.86 samples/sec   Loss 0.4366   LearningRate 0.0040   Epoch: 15   Global Step: 266880   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 03:29:26,348-Speed 3218.59 samples/sec   Loss 0.4174   LearningRate 0.0040   Epoch: 15   Global Step: 266890   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 03:29:29,467-Speed 3283.51 samples/sec   Loss 0.4307   LearningRate 0.0040   Epoch: 15   Global Step: 266900   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 03:29:32,530-Speed 3344.54 samples/sec   Loss 0.4317   LearningRate 0.0040   Epoch: 15   Global Step: 266910   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 03:29:35,773-Speed 3158.28 samples/sec   Loss 0.4199   LearningRate 0.0040   Epoch: 15   Global Step: 266920   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:29:38,862-Speed 3316.10 samples/sec   Loss 0.4108   LearningRate 0.0040   Epoch: 15   Global Step: 266930   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:29:42,018-Speed 3245.10 samples/sec   Loss 0.4237   LearningRate 0.0040   Epoch: 15   Global Step: 266940   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:29:45,081-Speed 3343.31 samples/sec   Loss 0.4515   LearningRate 0.0040   Epoch: 15   Global Step: 266950   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:29:48,197-Speed 3287.11 samples/sec   Loss 0.4451   LearningRate 0.0040   Epoch: 15   Global Step: 266960   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:29:51,268-Speed 3334.89 samples/sec   Loss 0.4510   LearningRate 0.0040   Epoch: 15   Global Step: 266970   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:29:54,360-Speed 3312.53 samples/sec   Loss 0.4288   LearningRate 0.0040   Epoch: 15   Global Step: 266980   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:29:57,450-Speed 3315.75 samples/sec   Loss 0.4508   LearningRate 0.0040   Epoch: 15   Global Step: 266990   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:30:00,514-Speed 3341.79 samples/sec   Loss 0.4304   LearningRate 0.0040   Epoch: 15   Global Step: 267000   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:30:03,588-Speed 3332.94 samples/sec   Loss 0.4431   LearningRate 0.0040   Epoch: 15   Global Step: 267010   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:30:06,648-Speed 3347.10 samples/sec   Loss 0.4420   LearningRate 0.0040   Epoch: 15   Global Step: 267020   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:30:09,731-Speed 3321.94 samples/sec   Loss 0.4484   LearningRate 0.0040   Epoch: 15   Global Step: 267030   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:30:12,799-Speed 3338.36 samples/sec   Loss 0.4180   LearningRate 0.0040   Epoch: 15   Global Step: 267040   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:30:16,146-Speed 3059.94 samples/sec   Loss 0.4582   LearningRate 0.0040   Epoch: 15   Global Step: 267050   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:30:51,175-Speed 292.35 samples/sec   Loss 0.3554   LearningRate 0.0040   Epoch: 16   Global Step: 267060   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:30:54,571-Speed 3015.72 samples/sec   Loss 0.2452   LearningRate 0.0040   Epoch: 16   Global Step: 267070   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:30:57,769-Speed 3203.51 samples/sec   Loss 0.2460   LearningRate 0.0040   Epoch: 16   Global Step: 267080   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:31:00,840-Speed 3334.56 samples/sec   Loss 0.2574   LearningRate 0.0040   Epoch: 16   Global Step: 267090   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:31:03,897-Speed 3350.49 samples/sec   Loss 0.2450   LearningRate 0.0040   Epoch: 16   Global Step: 267100   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:31:07,024-Speed 3275.61 samples/sec   Loss 0.2424   LearningRate 0.0040   Epoch: 16   Global Step: 267110   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:31:10,246-Speed 3179.07 samples/sec   Loss 0.2507   LearningRate 0.0040   Epoch: 16   Global Step: 267120   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:31:13,296-Speed 3358.04 samples/sec   Loss 0.2732   LearningRate 0.0040   Epoch: 16   Global Step: 267130   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 03:31:16,381-Speed 3319.96 samples/sec   Loss 0.2564   LearningRate 0.0040   Epoch: 16   Global Step: 267140   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 03:31:19,480-Speed 3305.21 samples/sec   Loss 0.2513   LearningRate 0.0040   Epoch: 16   Global Step: 267150   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 03:31:22,556-Speed 3330.31 samples/sec   Loss 0.2478   LearningRate 0.0040   Epoch: 16   Global Step: 267160   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 03:31:25,753-Speed 3203.83 samples/sec   Loss 0.2621   LearningRate 0.0040   Epoch: 16   Global Step: 267170   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 03:31:28,868-Speed 3288.29 samples/sec   Loss 0.2410   LearningRate 0.0040   Epoch: 16   Global Step: 267180   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 03:31:32,026-Speed 3243.05 samples/sec   Loss 0.2362   LearningRate 0.0040   Epoch: 16   Global Step: 267190   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 03:31:35,134-Speed 3295.60 samples/sec   Loss 0.2448   LearningRate 0.0040   Epoch: 16   Global Step: 267200   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 03:31:38,197-Speed 3343.74 samples/sec   Loss 0.2414   LearningRate 0.0040   Epoch: 16   Global Step: 267210   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 03:31:41,291-Speed 3310.93 samples/sec   Loss 0.2300   LearningRate 0.0040   Epoch: 16   Global Step: 267220   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 03:31:44,369-Speed 3327.09 samples/sec   Loss 0.2573   LearningRate 0.0040   Epoch: 16   Global Step: 267230   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:31:47,436-Speed 3340.14 samples/sec   Loss 0.2212   LearningRate 0.0040   Epoch: 16   Global Step: 267240   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:31:50,516-Speed 3324.69 samples/sec   Loss 0.2442   LearningRate 0.0040   Epoch: 16   Global Step: 267250   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:31:53,600-Speed 3321.92 samples/sec   Loss 0.2454   LearningRate 0.0040   Epoch: 16   Global Step: 267260   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:31:56,972-Speed 3036.94 samples/sec   Loss 0.2367   LearningRate 0.0040   Epoch: 16   Global Step: 267270   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:32:00,229-Speed 3145.21 samples/sec   Loss 0.2347   LearningRate 0.0040   Epoch: 16   Global Step: 267280   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:32:03,940-Speed 2759.38 samples/sec   Loss 0.2348   LearningRate 0.0040   Epoch: 16   Global Step: 267290   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:32:07,022-Speed 3324.47 samples/sec   Loss 0.2367   LearningRate 0.0040   Epoch: 16   Global Step: 267300   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:32:10,161-Speed 3262.50 samples/sec   Loss 0.2423   LearningRate 0.0040   Epoch: 16   Global Step: 267310   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:32:13,281-Speed 3283.15 samples/sec   Loss 0.2338   LearningRate 0.0040   Epoch: 16   Global Step: 267320   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:32:16,352-Speed 3335.06 samples/sec   Loss 0.2328   LearningRate 0.0040   Epoch: 16   Global Step: 267330   Fp16 Grad Scale: 262144   Required: 7 hours
Training: 2022-04-12 03:32:19,422-Speed 3336.36 samples/sec   Loss 0.2447   LearningRate 0.0040   Epoch: 16   Global Step: 267340   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:32:22,529-Speed 3296.53 samples/sec   Loss 0.2602   LearningRate 0.0040   Epoch: 16   Global Step: 267350   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:32:25,628-Speed 3305.65 samples/sec   Loss 0.2459   LearningRate 0.0040   Epoch: 16   Global Step: 267360   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:32:28,773-Speed 3256.26 samples/sec   Loss 0.2398   LearningRate 0.0040   Epoch: 16   Global Step: 267370   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:32:31,993-Speed 3181.16 samples/sec   Loss 0.2485   LearningRate 0.0040   Epoch: 16   Global Step: 267380   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:32:35,147-Speed 3246.80 samples/sec   Loss 0.2432   LearningRate 0.0040   Epoch: 16   Global Step: 267390   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:32:38,242-Speed 3309.45 samples/sec   Loss 0.2413   LearningRate 0.0040   Epoch: 16   Global Step: 267400   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 03:32:41,316-Speed 3332.86 samples/sec   Loss 0.2539   LearningRate 0.0040   Epoch: 16   Global Step: 267410   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 03:32:44,391-Speed 3330.55 samples/sec   Loss 0.2661   LearningRate 0.0040   Epoch: 16   Global Step: 267420   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 03:32:47,470-Speed 3325.92 samples/sec   Loss 0.2661   LearningRate 0.0040   Epoch: 16   Global Step: 267430   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 03:32:50,542-Speed 3334.44 samples/sec   Loss 0.2362   LearningRate 0.0040   Epoch: 16   Global Step: 267440   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 03:32:53,613-Speed 3335.37 samples/sec   Loss 0.2224   LearningRate 0.0040   Epoch: 16   Global Step: 267450   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 03:32:56,677-Speed 3343.11 samples/sec   Loss 0.2505   LearningRate 0.0040   Epoch: 16   Global Step: 267460   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 03:32:59,746-Speed 3337.32 samples/sec   Loss 0.2205   LearningRate 0.0040   Epoch: 16   Global Step: 267470   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 03:33:02,890-Speed 3258.06 samples/sec   Loss 0.2447   LearningRate 0.0039   Epoch: 16   Global Step: 267480   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 03:33:06,011-Speed 3281.55 samples/sec   Loss 0.2411   LearningRate 0.0039   Epoch: 16   Global Step: 267490   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 03:33:09,080-Speed 3337.18 samples/sec   Loss 0.2590   LearningRate 0.0039   Epoch: 16   Global Step: 267500   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:33:12,150-Speed 3336.19 samples/sec   Loss 0.2664   LearningRate 0.0039   Epoch: 16   Global Step: 267510   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:33:15,217-Speed 3339.94 samples/sec   Loss 0.2346   LearningRate 0.0039   Epoch: 16   Global Step: 267520   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:33:18,280-Speed 3343.41 samples/sec   Loss 0.2434   LearningRate 0.0039   Epoch: 16   Global Step: 267530   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:33:21,344-Speed 3343.23 samples/sec   Loss 0.2451   LearningRate 0.0039   Epoch: 16   Global Step: 267540   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:33:24,413-Speed 3337.41 samples/sec   Loss 0.2423   LearningRate 0.0039   Epoch: 16   Global Step: 267550   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:33:27,484-Speed 3335.92 samples/sec   Loss 0.2612   LearningRate 0.0039   Epoch: 16   Global Step: 267560   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:33:30,550-Speed 3340.13 samples/sec   Loss 0.2364   LearningRate 0.0039   Epoch: 16   Global Step: 267570   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:33:33,622-Speed 3334.11 samples/sec   Loss 0.2386   LearningRate 0.0039   Epoch: 16   Global Step: 267580   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:33:36,694-Speed 3334.12 samples/sec   Loss 0.2276   LearningRate 0.0039   Epoch: 16   Global Step: 267590   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:33:39,774-Speed 3325.03 samples/sec   Loss 0.2559   LearningRate 0.0039   Epoch: 16   Global Step: 267600   Fp16 Grad Scale: 262144   Required: 7 hours
Training: 2022-04-12 03:33:42,841-Speed 3340.18 samples/sec   Loss 0.2447   LearningRate 0.0039   Epoch: 16   Global Step: 267610   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:33:45,937-Speed 3307.70 samples/sec   Loss 0.2481   LearningRate 0.0039   Epoch: 16   Global Step: 267620   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:33:49,061-Speed 3278.69 samples/sec   Loss 0.2512   LearningRate 0.0039   Epoch: 16   Global Step: 267630   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:33:52,194-Speed 3270.03 samples/sec   Loss 0.2638   LearningRate 0.0039   Epoch: 16   Global Step: 267640   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:33:55,261-Speed 3339.21 samples/sec   Loss 0.2643   LearningRate 0.0039   Epoch: 16   Global Step: 267650   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:33:58,348-Speed 3317.43 samples/sec   Loss 0.2451   LearningRate 0.0039   Epoch: 16   Global Step: 267660   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:34:01,427-Speed 3326.64 samples/sec   Loss 0.2502   LearningRate 0.0039   Epoch: 16   Global Step: 267670   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:34:04,506-Speed 3326.46 samples/sec   Loss 0.2734   LearningRate 0.0039   Epoch: 16   Global Step: 267680   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:34:07,571-Speed 3341.40 samples/sec   Loss 0.2338   LearningRate 0.0039   Epoch: 16   Global Step: 267690   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:34:10,670-Speed 3305.82 samples/sec   Loss 0.2533   LearningRate 0.0039   Epoch: 16   Global Step: 267700   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:34:13,732-Speed 3343.87 samples/sec   Loss 0.2197   LearningRate 0.0039   Epoch: 16   Global Step: 267710   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:34:16,814-Speed 3323.71 samples/sec   Loss 0.2414   LearningRate 0.0039   Epoch: 16   Global Step: 267720   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:34:19,885-Speed 3335.87 samples/sec   Loss 0.2408   LearningRate 0.0039   Epoch: 16   Global Step: 267730   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:34:22,959-Speed 3331.80 samples/sec   Loss 0.2397   LearningRate 0.0039   Epoch: 16   Global Step: 267740   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:34:26,032-Speed 3333.19 samples/sec   Loss 0.2365   LearningRate 0.0039   Epoch: 16   Global Step: 267750   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:34:29,112-Speed 3325.25 samples/sec   Loss 0.2442   LearningRate 0.0039   Epoch: 16   Global Step: 267760   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:34:32,180-Speed 3338.53 samples/sec   Loss 0.2248   LearningRate 0.0039   Epoch: 16   Global Step: 267770   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:34:35,261-Speed 3323.90 samples/sec   Loss 0.2322   LearningRate 0.0039   Epoch: 16   Global Step: 267780   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:34:38,360-Speed 3305.12 samples/sec   Loss 0.2567   LearningRate 0.0039   Epoch: 16   Global Step: 267790   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:34:41,420-Speed 3347.48 samples/sec   Loss 0.2450   LearningRate 0.0039   Epoch: 16   Global Step: 267800   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 03:34:44,499-Speed 3326.13 samples/sec   Loss 0.2378   LearningRate 0.0039   Epoch: 16   Global Step: 267810   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 03:34:47,595-Speed 3308.03 samples/sec   Loss 0.2419   LearningRate 0.0039   Epoch: 16   Global Step: 267820   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 03:34:50,670-Speed 3331.45 samples/sec   Loss 0.2504   LearningRate 0.0039   Epoch: 16   Global Step: 267830   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 03:34:53,736-Speed 3340.83 samples/sec   Loss 0.2379   LearningRate 0.0039   Epoch: 16   Global Step: 267840   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 03:34:56,848-Speed 3290.87 samples/sec   Loss 0.2297   LearningRate 0.0039   Epoch: 16   Global Step: 267850   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 03:34:59,963-Speed 3288.25 samples/sec   Loss 0.2563   LearningRate 0.0039   Epoch: 16   Global Step: 267860   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 03:35:03,057-Speed 3309.58 samples/sec   Loss 0.2628   LearningRate 0.0039   Epoch: 16   Global Step: 267870   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 03:35:06,207-Speed 3252.15 samples/sec   Loss 0.2194   LearningRate 0.0039   Epoch: 16   Global Step: 267880   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 03:35:09,319-Speed 3290.91 samples/sec   Loss 0.2498   LearningRate 0.0039   Epoch: 16   Global Step: 267890   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 03:35:12,391-Speed 3333.99 samples/sec   Loss 0.2298   LearningRate 0.0039   Epoch: 16   Global Step: 267900   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:35:15,463-Speed 3334.26 samples/sec   Loss 0.2478   LearningRate 0.0039   Epoch: 16   Global Step: 267910   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:35:18,598-Speed 3267.55 samples/sec   Loss 0.2463   LearningRate 0.0039   Epoch: 16   Global Step: 267920   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:35:21,778-Speed 3220.33 samples/sec   Loss 0.2498   LearningRate 0.0039   Epoch: 16   Global Step: 267930   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 03:35:24,941-Speed 3238.75 samples/sec   Loss 0.2367   LearningRate 0.0039   Epoch: 16   Global Step: 267940   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 03:35:28,041-Speed 3303.12 samples/sec   Loss 0.2378   LearningRate 0.0039   Epoch: 16   Global Step: 267950   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 03:35:31,145-Speed 3300.34 samples/sec   Loss 0.2552   LearningRate 0.0039   Epoch: 16   Global Step: 267960   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 03:35:34,212-Speed 3339.31 samples/sec   Loss 0.2450   LearningRate 0.0039   Epoch: 16   Global Step: 267970   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 03:35:37,301-Speed 3315.76 samples/sec   Loss 0.2256   LearningRate 0.0039   Epoch: 16   Global Step: 267980   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 03:35:42,008-Speed 2175.87 samples/sec   Loss 0.2433   LearningRate 0.0039   Epoch: 16   Global Step: 267990   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 03:35:45,115-Speed 3296.94 samples/sec   Loss 0.2358   LearningRate 0.0039   Epoch: 16   Global Step: 268000   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 03:36:29,199-[lfw][268000]XNorm: 21.128086
Training: 2022-04-12 03:36:29,200-[lfw][268000]Accuracy-Flip: 0.99767+-0.00200
Training: 2022-04-12 03:36:29,201-[lfw][268000]Accuracy-Highest: 0.99817
Training: 2022-04-12 03:37:20,459-[cfp_fp][268000]XNorm: 22.424664
Training: 2022-04-12 03:37:20,460-[cfp_fp][268000]Accuracy-Flip: 0.99171+-0.00442
Training: 2022-04-12 03:37:20,460-[cfp_fp][268000]Accuracy-Highest: 0.99186
Training: 2022-04-12 03:38:04,539-[agedb_30][268000]XNorm: 22.934307
Training: 2022-04-12 03:38:04,539-[agedb_30][268000]Accuracy-Flip: 0.98433+-0.00564
Training: 2022-04-12 03:38:04,540-[agedb_30][268000]Accuracy-Highest: 0.98650
Training: 2022-04-12 03:38:07,639-Speed 71.85 samples/sec   Loss 0.2757   LearningRate 0.0039   Epoch: 16   Global Step: 268010   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 03:38:14,417-Speed 1510.85 samples/sec   Loss 0.2634   LearningRate 0.0039   Epoch: 16   Global Step: 268020   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 03:38:17,514-Speed 3307.09 samples/sec   Loss 0.2617   LearningRate 0.0039   Epoch: 16   Global Step: 268030   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:38:20,594-Speed 3325.15 samples/sec   Loss 0.2424   LearningRate 0.0039   Epoch: 16   Global Step: 268040   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:38:23,650-Speed 3351.84 samples/sec   Loss 0.2449   LearningRate 0.0039   Epoch: 16   Global Step: 268050   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:38:26,725-Speed 3330.41 samples/sec   Loss 0.2374   LearningRate 0.0039   Epoch: 16   Global Step: 268060   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:38:29,870-Speed 3257.00 samples/sec   Loss 0.2444   LearningRate 0.0039   Epoch: 16   Global Step: 268070   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:38:32,977-Speed 3296.69 samples/sec   Loss 0.2416   LearningRate 0.0039   Epoch: 16   Global Step: 268080   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:38:36,094-Speed 3285.53 samples/sec   Loss 0.2377   LearningRate 0.0039   Epoch: 16   Global Step: 268090   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:38:39,168-Speed 3332.11 samples/sec   Loss 0.2505   LearningRate 0.0039   Epoch: 16   Global Step: 268100   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:38:42,237-Speed 3337.11 samples/sec   Loss 0.2509   LearningRate 0.0039   Epoch: 16   Global Step: 268110   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:38:45,307-Speed 3336.66 samples/sec   Loss 0.2327   LearningRate 0.0039   Epoch: 16   Global Step: 268120   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:38:48,363-Speed 3350.93 samples/sec   Loss 0.2351   LearningRate 0.0039   Epoch: 16   Global Step: 268130   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:38:51,445-Speed 3323.84 samples/sec   Loss 0.2458   LearningRate 0.0039   Epoch: 16   Global Step: 268140   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:38:54,589-Speed 3257.33 samples/sec   Loss 0.2389   LearningRate 0.0039   Epoch: 16   Global Step: 268150   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:38:57,668-Speed 3327.02 samples/sec   Loss 0.2556   LearningRate 0.0039   Epoch: 16   Global Step: 268160   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:39:00,773-Speed 3298.68 samples/sec   Loss 0.2728   LearningRate 0.0039   Epoch: 16   Global Step: 268170   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:39:03,844-Speed 3335.57 samples/sec   Loss 0.2394   LearningRate 0.0039   Epoch: 16   Global Step: 268180   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:39:06,965-Speed 3281.46 samples/sec   Loss 0.2403   LearningRate 0.0039   Epoch: 16   Global Step: 268190   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:39:10,082-Speed 3286.19 samples/sec   Loss 0.2453   LearningRate 0.0039   Epoch: 16   Global Step: 268200   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:39:13,167-Speed 3319.73 samples/sec   Loss 0.2386   LearningRate 0.0039   Epoch: 16   Global Step: 268210   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:39:16,244-Speed 3328.90 samples/sec   Loss 0.2528   LearningRate 0.0039   Epoch: 16   Global Step: 268220   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:39:19,354-Speed 3292.81 samples/sec   Loss 0.2445   LearningRate 0.0039   Epoch: 16   Global Step: 268230   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:39:22,456-Speed 3301.80 samples/sec   Loss 0.2425   LearningRate 0.0039   Epoch: 16   Global Step: 268240   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:39:25,585-Speed 3272.67 samples/sec   Loss 0.2442   LearningRate 0.0039   Epoch: 16   Global Step: 268250   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:39:28,685-Speed 3305.20 samples/sec   Loss 0.2520   LearningRate 0.0039   Epoch: 16   Global Step: 268260   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:39:31,941-Speed 3145.24 samples/sec   Loss 0.2606   LearningRate 0.0039   Epoch: 16   Global Step: 268270   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:39:35,158-Speed 3183.53 samples/sec   Loss 0.2512   LearningRate 0.0039   Epoch: 16   Global Step: 268280   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 03:39:38,239-Speed 3324.07 samples/sec   Loss 0.2550   LearningRate 0.0039   Epoch: 16   Global Step: 268290   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 03:39:41,328-Speed 3315.62 samples/sec   Loss 0.2560   LearningRate 0.0039   Epoch: 16   Global Step: 268300   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 03:39:44,424-Speed 3308.20 samples/sec   Loss 0.2407   LearningRate 0.0039   Epoch: 16   Global Step: 268310   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 03:39:47,492-Speed 3339.38 samples/sec   Loss 0.2544   LearningRate 0.0038   Epoch: 16   Global Step: 268320   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 03:39:50,586-Speed 3310.04 samples/sec   Loss 0.2393   LearningRate 0.0038   Epoch: 16   Global Step: 268330   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 03:39:53,705-Speed 3283.76 samples/sec   Loss 0.2617   LearningRate 0.0038   Epoch: 16   Global Step: 268340   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 03:39:56,800-Speed 3309.12 samples/sec   Loss 0.2537   LearningRate 0.0038   Epoch: 16   Global Step: 268350   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 03:39:59,871-Speed 3335.82 samples/sec   Loss 0.2710   LearningRate 0.0038   Epoch: 16   Global Step: 268360   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 03:40:02,974-Speed 3300.58 samples/sec   Loss 0.2487   LearningRate 0.0038   Epoch: 16   Global Step: 268370   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 03:40:06,106-Speed 3270.33 samples/sec   Loss 0.2378   LearningRate 0.0038   Epoch: 16   Global Step: 268380   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:40:09,169-Speed 3343.39 samples/sec   Loss 0.2489   LearningRate 0.0038   Epoch: 16   Global Step: 268390   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:40:12,250-Speed 3324.05 samples/sec   Loss 0.2465   LearningRate 0.0038   Epoch: 16   Global Step: 268400   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:40:15,326-Speed 3330.16 samples/sec   Loss 0.2691   LearningRate 0.0038   Epoch: 16   Global Step: 268410   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:40:18,429-Speed 3301.32 samples/sec   Loss 0.2487   LearningRate 0.0038   Epoch: 16   Global Step: 268420   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:40:21,545-Speed 3286.97 samples/sec   Loss 0.2655   LearningRate 0.0038   Epoch: 16   Global Step: 268430   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:40:24,606-Speed 3345.80 samples/sec   Loss 0.2314   LearningRate 0.0038   Epoch: 16   Global Step: 268440   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:40:27,772-Speed 3235.82 samples/sec   Loss 0.2559   LearningRate 0.0038   Epoch: 16   Global Step: 268450   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:40:30,844-Speed 3333.95 samples/sec   Loss 0.2584   LearningRate 0.0038   Epoch: 16   Global Step: 268460   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:40:33,897-Speed 3354.13 samples/sec   Loss 0.2441   LearningRate 0.0038   Epoch: 16   Global Step: 268470   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:40:36,970-Speed 3333.52 samples/sec   Loss 0.2334   LearningRate 0.0038   Epoch: 16   Global Step: 268480   Fp16 Grad Scale: 262144   Required: 7 hours
Training: 2022-04-12 03:40:40,031-Speed 3346.23 samples/sec   Loss 0.2528   LearningRate 0.0038   Epoch: 16   Global Step: 268490   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:40:43,104-Speed 3333.07 samples/sec   Loss 0.2319   LearningRate 0.0038   Epoch: 16   Global Step: 268500   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:40:46,155-Speed 3356.44 samples/sec   Loss 0.2615   LearningRate 0.0038   Epoch: 16   Global Step: 268510   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:40:49,284-Speed 3273.09 samples/sec   Loss 0.2491   LearningRate 0.0038   Epoch: 16   Global Step: 268520   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:40:52,343-Speed 3348.22 samples/sec   Loss 0.2569   LearningRate 0.0038   Epoch: 16   Global Step: 268530   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:40:55,402-Speed 3348.37 samples/sec   Loss 0.2626   LearningRate 0.0038   Epoch: 16   Global Step: 268540   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:40:58,463-Speed 3347.08 samples/sec   Loss 0.2501   LearningRate 0.0038   Epoch: 16   Global Step: 268550   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:41:01,514-Speed 3356.97 samples/sec   Loss 0.2518   LearningRate 0.0038   Epoch: 16   Global Step: 268560   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:41:04,673-Speed 3241.72 samples/sec   Loss 0.2432   LearningRate 0.0038   Epoch: 16   Global Step: 268570   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:41:07,745-Speed 3334.04 samples/sec   Loss 0.2445   LearningRate 0.0038   Epoch: 16   Global Step: 268580   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:41:10,906-Speed 3239.73 samples/sec   Loss 0.2495   LearningRate 0.0038   Epoch: 16   Global Step: 268590   Fp16 Grad Scale: 262144   Required: 7 hours
Training: 2022-04-12 03:41:13,962-Speed 3351.49 samples/sec   Loss 0.2714   LearningRate 0.0038   Epoch: 16   Global Step: 268600   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:41:17,046-Speed 3321.18 samples/sec   Loss 0.2402   LearningRate 0.0038   Epoch: 16   Global Step: 268610   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:41:20,187-Speed 3261.63 samples/sec   Loss 0.2390   LearningRate 0.0038   Epoch: 16   Global Step: 268620   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:41:23,278-Speed 3313.82 samples/sec   Loss 0.2703   LearningRate 0.0038   Epoch: 16   Global Step: 268630   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:41:26,342-Speed 3342.69 samples/sec   Loss 0.2499   LearningRate 0.0038   Epoch: 16   Global Step: 268640   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:41:29,395-Speed 3354.93 samples/sec   Loss 0.2488   LearningRate 0.0038   Epoch: 16   Global Step: 268650   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:41:32,449-Speed 3353.20 samples/sec   Loss 0.2704   LearningRate 0.0038   Epoch: 16   Global Step: 268660   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 03:41:35,534-Speed 3320.42 samples/sec   Loss 0.2471   LearningRate 0.0038   Epoch: 16   Global Step: 268670   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 03:41:38,597-Speed 3343.54 samples/sec   Loss 0.2369   LearningRate 0.0038   Epoch: 16   Global Step: 268680   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 03:41:41,691-Speed 3309.92 samples/sec   Loss 0.2578   LearningRate 0.0038   Epoch: 16   Global Step: 268690   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 03:41:44,773-Speed 3324.18 samples/sec   Loss 0.2456   LearningRate 0.0038   Epoch: 16   Global Step: 268700   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 03:41:47,836-Speed 3344.21 samples/sec   Loss 0.2433   LearningRate 0.0038   Epoch: 16   Global Step: 268710   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 03:41:50,921-Speed 3319.44 samples/sec   Loss 0.2475   LearningRate 0.0038   Epoch: 16   Global Step: 268720   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 03:41:54,039-Speed 3284.42 samples/sec   Loss 0.2442   LearningRate 0.0038   Epoch: 16   Global Step: 268730   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 03:41:57,098-Speed 3348.81 samples/sec   Loss 0.2325   LearningRate 0.0038   Epoch: 16   Global Step: 268740   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 03:42:00,168-Speed 3336.56 samples/sec   Loss 0.2415   LearningRate 0.0038   Epoch: 16   Global Step: 268750   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 03:42:03,224-Speed 3350.97 samples/sec   Loss 0.2537   LearningRate 0.0038   Epoch: 16   Global Step: 268760   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:42:06,282-Speed 3349.26 samples/sec   Loss 0.2466   LearningRate 0.0038   Epoch: 16   Global Step: 268770   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:42:09,354-Speed 3333.59 samples/sec   Loss 0.2557   LearningRate 0.0038   Epoch: 16   Global Step: 268780   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:42:12,415-Speed 3347.14 samples/sec   Loss 0.2349   LearningRate 0.0038   Epoch: 16   Global Step: 268790   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:42:15,478-Speed 3343.97 samples/sec   Loss 0.2384   LearningRate 0.0038   Epoch: 16   Global Step: 268800   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:42:18,573-Speed 3309.16 samples/sec   Loss 0.2561   LearningRate 0.0038   Epoch: 16   Global Step: 268810   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:42:21,644-Speed 3334.68 samples/sec   Loss 0.2438   LearningRate 0.0038   Epoch: 16   Global Step: 268820   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:42:24,722-Speed 3327.96 samples/sec   Loss 0.2423   LearningRate 0.0038   Epoch: 16   Global Step: 268830   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:42:27,811-Speed 3315.21 samples/sec   Loss 0.2588   LearningRate 0.0038   Epoch: 16   Global Step: 268840   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:42:30,933-Speed 3280.28 samples/sec   Loss 0.2425   LearningRate 0.0038   Epoch: 16   Global Step: 268850   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:42:34,013-Speed 3326.16 samples/sec   Loss 0.2386   LearningRate 0.0038   Epoch: 16   Global Step: 268860   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:42:37,120-Speed 3296.36 samples/sec   Loss 0.2487   LearningRate 0.0038   Epoch: 16   Global Step: 268870   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:42:40,197-Speed 3328.71 samples/sec   Loss 0.2677   LearningRate 0.0038   Epoch: 16   Global Step: 268880   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:42:43,266-Speed 3337.79 samples/sec   Loss 0.2419   LearningRate 0.0038   Epoch: 16   Global Step: 268890   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:42:46,358-Speed 3312.23 samples/sec   Loss 0.2402   LearningRate 0.0038   Epoch: 16   Global Step: 268900   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:42:49,561-Speed 3198.34 samples/sec   Loss 0.2681   LearningRate 0.0038   Epoch: 16   Global Step: 268910   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:42:52,616-Speed 3352.33 samples/sec   Loss 0.2395   LearningRate 0.0038   Epoch: 16   Global Step: 268920   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 03:42:55,675-Speed 3348.18 samples/sec   Loss 0.2555   LearningRate 0.0038   Epoch: 16   Global Step: 268930   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 03:42:58,742-Speed 3339.35 samples/sec   Loss 0.2559   LearningRate 0.0038   Epoch: 16   Global Step: 268940   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 03:43:01,809-Speed 3339.07 samples/sec   Loss 0.2505   LearningRate 0.0038   Epoch: 16   Global Step: 268950   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 03:43:04,868-Speed 3348.76 samples/sec   Loss 0.2358   LearningRate 0.0038   Epoch: 16   Global Step: 268960   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 03:43:08,002-Speed 3268.17 samples/sec   Loss 0.2438   LearningRate 0.0038   Epoch: 16   Global Step: 268970   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 03:43:11,111-Speed 3294.08 samples/sec   Loss 0.2430   LearningRate 0.0038   Epoch: 16   Global Step: 268980   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 03:43:14,180-Speed 3337.98 samples/sec   Loss 0.2702   LearningRate 0.0038   Epoch: 16   Global Step: 268990   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 03:43:17,248-Speed 3338.51 samples/sec   Loss 0.2421   LearningRate 0.0038   Epoch: 16   Global Step: 269000   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 03:43:20,330-Speed 3322.25 samples/sec   Loss 0.2622   LearningRate 0.0038   Epoch: 16   Global Step: 269010   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 03:43:23,478-Speed 3253.85 samples/sec   Loss 0.2503   LearningRate 0.0038   Epoch: 16   Global Step: 269020   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:43:26,550-Speed 3333.78 samples/sec   Loss 0.2535   LearningRate 0.0038   Epoch: 16   Global Step: 269030   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:43:29,766-Speed 3185.07 samples/sec   Loss 0.2309   LearningRate 0.0038   Epoch: 16   Global Step: 269040   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:43:32,877-Speed 3292.88 samples/sec   Loss 0.2530   LearningRate 0.0038   Epoch: 16   Global Step: 269050   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:43:35,959-Speed 3323.33 samples/sec   Loss 0.2750   LearningRate 0.0038   Epoch: 16   Global Step: 269060   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:43:39,140-Speed 3219.50 samples/sec   Loss 0.2419   LearningRate 0.0038   Epoch: 16   Global Step: 269070   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:43:42,267-Speed 3275.81 samples/sec   Loss 0.2529   LearningRate 0.0038   Epoch: 16   Global Step: 269080   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:43:45,379-Speed 3290.32 samples/sec   Loss 0.2464   LearningRate 0.0038   Epoch: 16   Global Step: 269090   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:43:48,458-Speed 3326.61 samples/sec   Loss 0.2430   LearningRate 0.0038   Epoch: 16   Global Step: 269100   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:43:51,521-Speed 3343.86 samples/sec   Loss 0.2543   LearningRate 0.0038   Epoch: 16   Global Step: 269110   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:43:54,615-Speed 3310.18 samples/sec   Loss 0.2472   LearningRate 0.0038   Epoch: 16   Global Step: 269120   Fp16 Grad Scale: 262144   Required: 7 hours
Training: 2022-04-12 03:43:57,688-Speed 3333.57 samples/sec   Loss 0.2465   LearningRate 0.0038   Epoch: 16   Global Step: 269130   Fp16 Grad Scale: 262144   Required: 7 hours
Training: 2022-04-12 03:44:00,746-Speed 3349.90 samples/sec   Loss 0.2617   LearningRate 0.0038   Epoch: 16   Global Step: 269140   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:44:03,811-Speed 3341.63 samples/sec   Loss 0.2519   LearningRate 0.0038   Epoch: 16   Global Step: 269150   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:44:06,897-Speed 3319.24 samples/sec   Loss 0.2673   LearningRate 0.0038   Epoch: 16   Global Step: 269160   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:44:09,970-Speed 3332.69 samples/sec   Loss 0.2499   LearningRate 0.0038   Epoch: 16   Global Step: 269170   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:44:13,052-Speed 3323.25 samples/sec   Loss 0.2387   LearningRate 0.0037   Epoch: 16   Global Step: 269180   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:44:16,128-Speed 3329.40 samples/sec   Loss 0.2538   LearningRate 0.0037   Epoch: 16   Global Step: 269190   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:44:19,202-Speed 3331.65 samples/sec   Loss 0.2669   LearningRate 0.0037   Epoch: 16   Global Step: 269200   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:44:22,282-Speed 3325.65 samples/sec   Loss 0.2351   LearningRate 0.0037   Epoch: 16   Global Step: 269210   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:44:25,345-Speed 3343.52 samples/sec   Loss 0.2374   LearningRate 0.0037   Epoch: 16   Global Step: 269220   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:44:28,466-Speed 3282.62 samples/sec   Loss 0.2514   LearningRate 0.0037   Epoch: 16   Global Step: 269230   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:44:31,554-Speed 3316.68 samples/sec   Loss 0.2444   LearningRate 0.0037   Epoch: 16   Global Step: 269240   Fp16 Grad Scale: 262144   Required: 7 hours
Training: 2022-04-12 03:44:34,725-Speed 3229.26 samples/sec   Loss 0.2601   LearningRate 0.0037   Epoch: 16   Global Step: 269250   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:44:37,840-Speed 3288.56 samples/sec   Loss 0.2522   LearningRate 0.0037   Epoch: 16   Global Step: 269260   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:44:40,944-Speed 3299.75 samples/sec   Loss 0.2454   LearningRate 0.0037   Epoch: 16   Global Step: 269270   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:44:44,062-Speed 3284.55 samples/sec   Loss 0.2331   LearningRate 0.0037   Epoch: 16   Global Step: 269280   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:44:47,260-Speed 3203.34 samples/sec   Loss 0.2579   LearningRate 0.0037   Epoch: 16   Global Step: 269290   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:44:50,376-Speed 3286.41 samples/sec   Loss 0.2409   LearningRate 0.0037   Epoch: 16   Global Step: 269300   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:44:53,446-Speed 3336.85 samples/sec   Loss 0.2298   LearningRate 0.0037   Epoch: 16   Global Step: 269310   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:44:56,583-Speed 3265.28 samples/sec   Loss 0.2451   LearningRate 0.0037   Epoch: 16   Global Step: 269320   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:44:59,649-Speed 3340.41 samples/sec   Loss 0.2397   LearningRate 0.0037   Epoch: 16   Global Step: 269330   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:45:02,713-Speed 3342.51 samples/sec   Loss 0.2695   LearningRate 0.0037   Epoch: 16   Global Step: 269340   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:45:05,792-Speed 3326.70 samples/sec   Loss 0.2476   LearningRate 0.0037   Epoch: 16   Global Step: 269350   Fp16 Grad Scale: 262144   Required: 7 hours
Training: 2022-04-12 03:45:08,847-Speed 3352.77 samples/sec   Loss 0.2563   LearningRate 0.0037   Epoch: 16   Global Step: 269360   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:45:11,908-Speed 3345.92 samples/sec   Loss 0.2301   LearningRate 0.0037   Epoch: 16   Global Step: 269370   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 03:45:15,009-Speed 3303.13 samples/sec   Loss 0.2465   LearningRate 0.0037   Epoch: 16   Global Step: 269380   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 03:45:18,093-Speed 3320.36 samples/sec   Loss 0.2447   LearningRate 0.0037   Epoch: 16   Global Step: 269390   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 03:45:21,211-Speed 3285.63 samples/sec   Loss 0.2571   LearningRate 0.0037   Epoch: 16   Global Step: 269400   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 03:45:24,329-Speed 3284.13 samples/sec   Loss 0.2632   LearningRate 0.0037   Epoch: 16   Global Step: 269410   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 03:45:27,432-Speed 3301.29 samples/sec   Loss 0.2303   LearningRate 0.0037   Epoch: 16   Global Step: 269420   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 03:45:30,494-Speed 3345.03 samples/sec   Loss 0.2649   LearningRate 0.0037   Epoch: 16   Global Step: 269430   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 03:45:33,562-Speed 3337.93 samples/sec   Loss 0.2501   LearningRate 0.0037   Epoch: 16   Global Step: 269440   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 03:45:36,668-Speed 3297.58 samples/sec   Loss 0.2639   LearningRate 0.0037   Epoch: 16   Global Step: 269450   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 03:45:39,829-Speed 3240.63 samples/sec   Loss 0.2507   LearningRate 0.0037   Epoch: 16   Global Step: 269460   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 03:45:42,988-Speed 3242.19 samples/sec   Loss 0.2700   LearningRate 0.0037   Epoch: 16   Global Step: 269470   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:45:46,106-Speed 3284.40 samples/sec   Loss 0.2432   LearningRate 0.0037   Epoch: 16   Global Step: 269480   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:45:49,272-Speed 3235.41 samples/sec   Loss 0.2640   LearningRate 0.0037   Epoch: 16   Global Step: 269490   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:45:52,438-Speed 3235.43 samples/sec   Loss 0.2538   LearningRate 0.0037   Epoch: 16   Global Step: 269500   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:45:55,611-Speed 3227.17 samples/sec   Loss 0.2534   LearningRate 0.0037   Epoch: 16   Global Step: 269510   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:45:58,684-Speed 3333.79 samples/sec   Loss 0.2638   LearningRate 0.0037   Epoch: 16   Global Step: 269520   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 03:46:01,751-Speed 3338.74 samples/sec   Loss 0.2590   LearningRate 0.0037   Epoch: 16   Global Step: 269530   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 03:46:04,818-Speed 3339.38 samples/sec   Loss 0.2352   LearningRate 0.0037   Epoch: 16   Global Step: 269540   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 03:46:07,888-Speed 3336.51 samples/sec   Loss 0.2564   LearningRate 0.0037   Epoch: 16   Global Step: 269550   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 03:46:10,973-Speed 3320.64 samples/sec   Loss 0.2671   LearningRate 0.0037   Epoch: 16   Global Step: 269560   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 03:46:14,058-Speed 3320.52 samples/sec   Loss 0.2591   LearningRate 0.0037   Epoch: 16   Global Step: 269570   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 03:46:17,168-Speed 3294.21 samples/sec   Loss 0.2568   LearningRate 0.0037   Epoch: 16   Global Step: 269580   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 03:46:20,320-Speed 3248.96 samples/sec   Loss 0.2468   LearningRate 0.0037   Epoch: 16   Global Step: 269590   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 03:46:23,387-Speed 3339.42 samples/sec   Loss 0.2631   LearningRate 0.0037   Epoch: 16   Global Step: 269600   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 03:46:26,450-Speed 3343.73 samples/sec   Loss 0.2486   LearningRate 0.0037   Epoch: 16   Global Step: 269610   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 03:46:29,532-Speed 3323.45 samples/sec   Loss 0.2730   LearningRate 0.0037   Epoch: 16   Global Step: 269620   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:46:32,670-Speed 3263.84 samples/sec   Loss 0.2563   LearningRate 0.0037   Epoch: 16   Global Step: 269630   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:46:35,757-Speed 3317.71 samples/sec   Loss 0.2486   LearningRate 0.0037   Epoch: 16   Global Step: 269640   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:46:38,837-Speed 3325.99 samples/sec   Loss 0.2815   LearningRate 0.0037   Epoch: 16   Global Step: 269650   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:46:41,899-Speed 3345.07 samples/sec   Loss 0.2577   LearningRate 0.0037   Epoch: 16   Global Step: 269660   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 03:46:44,964-Speed 3341.07 samples/sec   Loss 0.2696   LearningRate 0.0037   Epoch: 16   Global Step: 269670   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 03:46:48,046-Speed 3323.47 samples/sec   Loss 0.2543   LearningRate 0.0037   Epoch: 16   Global Step: 269680   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 03:46:51,118-Speed 3334.13 samples/sec   Loss 0.2635   LearningRate 0.0037   Epoch: 16   Global Step: 269690   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 03:46:54,203-Speed 3319.66 samples/sec   Loss 0.2544   LearningRate 0.0037   Epoch: 16   Global Step: 269700   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 03:46:57,275-Speed 3334.89 samples/sec   Loss 0.2670   LearningRate 0.0037   Epoch: 16   Global Step: 269710   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 03:47:00,449-Speed 3225.85 samples/sec   Loss 0.2468   LearningRate 0.0037   Epoch: 16   Global Step: 269720   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 03:47:03,588-Speed 3263.33 samples/sec   Loss 0.2715   LearningRate 0.0037   Epoch: 16   Global Step: 269730   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 03:47:06,747-Speed 3241.77 samples/sec   Loss 0.2527   LearningRate 0.0037   Epoch: 16   Global Step: 269740   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 03:47:09,820-Speed 3333.62 samples/sec   Loss 0.2480   LearningRate 0.0037   Epoch: 16   Global Step: 269750   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 03:47:12,908-Speed 3317.29 samples/sec   Loss 0.2611   LearningRate 0.0037   Epoch: 16   Global Step: 269760   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:47:16,050-Speed 3259.92 samples/sec   Loss 0.2582   LearningRate 0.0037   Epoch: 16   Global Step: 269770   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:47:19,158-Speed 3294.93 samples/sec   Loss 0.2724   LearningRate 0.0037   Epoch: 16   Global Step: 269780   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:47:22,247-Speed 3315.84 samples/sec   Loss 0.2635   LearningRate 0.0037   Epoch: 16   Global Step: 269790   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:47:25,336-Speed 3315.87 samples/sec   Loss 0.2659   LearningRate 0.0037   Epoch: 16   Global Step: 269800   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:47:28,428-Speed 3312.37 samples/sec   Loss 0.2540   LearningRate 0.0037   Epoch: 16   Global Step: 269810   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:47:31,505-Speed 3328.92 samples/sec   Loss 0.2594   LearningRate 0.0037   Epoch: 16   Global Step: 269820   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:47:34,584-Speed 3326.04 samples/sec   Loss 0.2652   LearningRate 0.0037   Epoch: 16   Global Step: 269830   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:47:37,665-Speed 3324.81 samples/sec   Loss 0.2628   LearningRate 0.0037   Epoch: 16   Global Step: 269840   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:47:40,739-Speed 3331.78 samples/sec   Loss 0.2583   LearningRate 0.0037   Epoch: 16   Global Step: 269850   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:47:43,811-Speed 3334.03 samples/sec   Loss 0.2531   LearningRate 0.0037   Epoch: 16   Global Step: 269860   Fp16 Grad Scale: 262144   Required: 7 hours
Training: 2022-04-12 03:47:46,886-Speed 3331.17 samples/sec   Loss 0.2414   LearningRate 0.0037   Epoch: 16   Global Step: 269870   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:47:49,954-Speed 3337.98 samples/sec   Loss 0.2648   LearningRate 0.0037   Epoch: 16   Global Step: 269880   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:47:53,017-Speed 3344.07 samples/sec   Loss 0.2675   LearningRate 0.0037   Epoch: 16   Global Step: 269890   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:47:56,161-Speed 3258.34 samples/sec   Loss 0.2484   LearningRate 0.0037   Epoch: 16   Global Step: 269900   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:47:59,245-Speed 3321.00 samples/sec   Loss 0.2533   LearningRate 0.0037   Epoch: 16   Global Step: 269910   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:48:02,350-Speed 3297.96 samples/sec   Loss 0.2689   LearningRate 0.0037   Epoch: 16   Global Step: 269920   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:48:05,450-Speed 3304.30 samples/sec   Loss 0.2566   LearningRate 0.0037   Epoch: 16   Global Step: 269930   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:48:08,590-Speed 3262.52 samples/sec   Loss 0.2508   LearningRate 0.0037   Epoch: 16   Global Step: 269940   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:48:11,670-Speed 3325.68 samples/sec   Loss 0.2394   LearningRate 0.0037   Epoch: 16   Global Step: 269950   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:48:14,754-Speed 3320.53 samples/sec   Loss 0.2676   LearningRate 0.0037   Epoch: 16   Global Step: 269960   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:48:17,884-Speed 3272.42 samples/sec   Loss 0.2831   LearningRate 0.0037   Epoch: 16   Global Step: 269970   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:48:21,061-Speed 3224.24 samples/sec   Loss 0.2692   LearningRate 0.0037   Epoch: 16   Global Step: 269980   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:48:24,171-Speed 3292.77 samples/sec   Loss 0.2624   LearningRate 0.0037   Epoch: 16   Global Step: 269990   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:48:27,275-Speed 3299.40 samples/sec   Loss 0.2500   LearningRate 0.0037   Epoch: 16   Global Step: 270000   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:49:11,018-[lfw][270000]XNorm: 20.693332
Training: 2022-04-12 03:49:11,018-[lfw][270000]Accuracy-Flip: 0.99800+-0.00180
Training: 2022-04-12 03:49:11,019-[lfw][270000]Accuracy-Highest: 0.99817
Training: 2022-04-12 03:50:01,787-[cfp_fp][270000]XNorm: 22.299416
Training: 2022-04-12 03:50:01,788-[cfp_fp][270000]Accuracy-Flip: 0.99143+-0.00429
Training: 2022-04-12 03:50:01,788-[cfp_fp][270000]Accuracy-Highest: 0.99186
Training: 2022-04-12 03:50:45,441-[agedb_30][270000]XNorm: 22.570015
Training: 2022-04-12 03:50:45,441-[agedb_30][270000]Accuracy-Flip: 0.98567+-0.00588
Training: 2022-04-12 03:50:45,442-[agedb_30][270000]Accuracy-Highest: 0.98650
Training: 2022-04-12 03:50:48,527-Speed 72.50 samples/sec   Loss 0.2646   LearningRate 0.0037   Epoch: 16   Global Step: 270010   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 03:50:51,602-Speed 3330.95 samples/sec   Loss 0.2601   LearningRate 0.0037   Epoch: 16   Global Step: 270020   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 03:50:54,668-Speed 3340.50 samples/sec   Loss 0.2485   LearningRate 0.0037   Epoch: 16   Global Step: 270030   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 03:50:57,787-Speed 3283.67 samples/sec   Loss 0.2667   LearningRate 0.0037   Epoch: 16   Global Step: 270040   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 03:51:00,925-Speed 3263.95 samples/sec   Loss 0.2447   LearningRate 0.0036   Epoch: 16   Global Step: 270050   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 03:51:03,987-Speed 3345.00 samples/sec   Loss 0.2347   LearningRate 0.0036   Epoch: 16   Global Step: 270060   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 03:51:07,056-Speed 3337.85 samples/sec   Loss 0.2598   LearningRate 0.0036   Epoch: 16   Global Step: 270070   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 03:51:10,159-Speed 3300.41 samples/sec   Loss 0.2547   LearningRate 0.0036   Epoch: 16   Global Step: 270080   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 03:51:13,239-Speed 3325.54 samples/sec   Loss 0.2501   LearningRate 0.0036   Epoch: 16   Global Step: 270090   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 03:51:16,323-Speed 3320.69 samples/sec   Loss 0.2480   LearningRate 0.0036   Epoch: 16   Global Step: 270100   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 03:51:19,413-Speed 3314.16 samples/sec   Loss 0.2495   LearningRate 0.0036   Epoch: 16   Global Step: 270110   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:51:22,474-Speed 3346.37 samples/sec   Loss 0.2408   LearningRate 0.0036   Epoch: 16   Global Step: 270120   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:51:25,550-Speed 3330.08 samples/sec   Loss 0.2619   LearningRate 0.0036   Epoch: 16   Global Step: 270130   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:51:28,647-Speed 3307.44 samples/sec   Loss 0.2482   LearningRate 0.0036   Epoch: 16   Global Step: 270140   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:51:31,801-Speed 3246.81 samples/sec   Loss 0.2439   LearningRate 0.0036   Epoch: 16   Global Step: 270150   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:51:34,900-Speed 3304.94 samples/sec   Loss 0.2667   LearningRate 0.0036   Epoch: 16   Global Step: 270160   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:51:37,981-Speed 3324.41 samples/sec   Loss 0.2566   LearningRate 0.0036   Epoch: 16   Global Step: 270170   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:51:41,171-Speed 3211.06 samples/sec   Loss 0.2452   LearningRate 0.0036   Epoch: 16   Global Step: 270180   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:51:44,295-Speed 3278.48 samples/sec   Loss 0.2587   LearningRate 0.0036   Epoch: 16   Global Step: 270190   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:51:47,371-Speed 3329.70 samples/sec   Loss 0.2539   LearningRate 0.0036   Epoch: 16   Global Step: 270200   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:51:50,422-Speed 3356.84 samples/sec   Loss 0.2448   LearningRate 0.0036   Epoch: 16   Global Step: 270210   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:51:53,489-Speed 3340.52 samples/sec   Loss 0.2731   LearningRate 0.0036   Epoch: 16   Global Step: 270220   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:51:56,558-Speed 3336.70 samples/sec   Loss 0.2615   LearningRate 0.0036   Epoch: 16   Global Step: 270230   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:51:59,620-Speed 3345.35 samples/sec   Loss 0.2520   LearningRate 0.0036   Epoch: 16   Global Step: 270240   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 03:52:02,684-Speed 3342.38 samples/sec   Loss 0.2512   LearningRate 0.0036   Epoch: 16   Global Step: 270250   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 03:52:05,750-Speed 3341.05 samples/sec   Loss 0.2310   LearningRate 0.0036   Epoch: 16   Global Step: 270260   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 03:52:08,906-Speed 3244.94 samples/sec   Loss 0.2551   LearningRate 0.0036   Epoch: 16   Global Step: 270270   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 03:52:12,030-Speed 3278.93 samples/sec   Loss 0.2454   LearningRate 0.0036   Epoch: 16   Global Step: 270280   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 03:52:15,130-Speed 3303.77 samples/sec   Loss 0.2486   LearningRate 0.0036   Epoch: 16   Global Step: 270290   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 03:52:18,236-Speed 3297.98 samples/sec   Loss 0.2477   LearningRate 0.0036   Epoch: 16   Global Step: 270300   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 03:52:21,339-Speed 3300.16 samples/sec   Loss 0.2489   LearningRate 0.0036   Epoch: 16   Global Step: 270310   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 03:52:24,422-Speed 3322.39 samples/sec   Loss 0.2756   LearningRate 0.0036   Epoch: 16   Global Step: 270320   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 03:52:27,523-Speed 3302.76 samples/sec   Loss 0.2585   LearningRate 0.0036   Epoch: 16   Global Step: 270330   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 03:52:30,640-Speed 3286.49 samples/sec   Loss 0.2631   LearningRate 0.0036   Epoch: 16   Global Step: 270340   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:52:33,772-Speed 3269.55 samples/sec   Loss 0.2434   LearningRate 0.0036   Epoch: 16   Global Step: 270350   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:52:36,863-Speed 3313.30 samples/sec   Loss 0.2473   LearningRate 0.0036   Epoch: 16   Global Step: 270360   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:52:39,938-Speed 3330.82 samples/sec   Loss 0.2611   LearningRate 0.0036   Epoch: 16   Global Step: 270370   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:52:43,065-Speed 3275.74 samples/sec   Loss 0.2589   LearningRate 0.0036   Epoch: 16   Global Step: 270380   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:52:46,183-Speed 3285.30 samples/sec   Loss 0.2971   LearningRate 0.0036   Epoch: 16   Global Step: 270390   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 03:52:49,277-Speed 3310.25 samples/sec   Loss 0.2641   LearningRate 0.0036   Epoch: 16   Global Step: 270400   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 03:52:52,397-Speed 3283.40 samples/sec   Loss 0.2526   LearningRate 0.0036   Epoch: 16   Global Step: 270410   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 03:52:55,483-Speed 3318.22 samples/sec   Loss 0.2403   LearningRate 0.0036   Epoch: 16   Global Step: 270420   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 03:52:58,578-Speed 3309.83 samples/sec   Loss 0.2557   LearningRate 0.0036   Epoch: 16   Global Step: 270430   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 03:53:01,654-Speed 3329.84 samples/sec   Loss 0.2677   LearningRate 0.0036   Epoch: 16   Global Step: 270440   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 03:53:04,846-Speed 3208.06 samples/sec   Loss 0.2614   LearningRate 0.0036   Epoch: 16   Global Step: 270450   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 03:53:07,923-Speed 3328.56 samples/sec   Loss 0.2472   LearningRate 0.0036   Epoch: 16   Global Step: 270460   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 03:53:10,997-Speed 3332.34 samples/sec   Loss 0.2573   LearningRate 0.0036   Epoch: 16   Global Step: 270470   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 03:53:14,064-Speed 3339.69 samples/sec   Loss 0.2576   LearningRate 0.0036   Epoch: 16   Global Step: 270480   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 03:53:17,159-Speed 3309.61 samples/sec   Loss 0.2685   LearningRate 0.0036   Epoch: 16   Global Step: 270490   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:53:20,223-Speed 3342.46 samples/sec   Loss 0.2600   LearningRate 0.0036   Epoch: 16   Global Step: 270500   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:53:23,319-Speed 3308.25 samples/sec   Loss 0.2598   LearningRate 0.0036   Epoch: 16   Global Step: 270510   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:53:26,449-Speed 3272.58 samples/sec   Loss 0.2608   LearningRate 0.0036   Epoch: 16   Global Step: 270520   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:53:29,541-Speed 3312.44 samples/sec   Loss 0.2587   LearningRate 0.0036   Epoch: 16   Global Step: 270530   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:53:32,671-Speed 3272.13 samples/sec   Loss 0.2620   LearningRate 0.0036   Epoch: 16   Global Step: 270540   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:53:35,821-Speed 3250.84 samples/sec   Loss 0.2405   LearningRate 0.0036   Epoch: 16   Global Step: 270550   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:53:38,952-Speed 3272.15 samples/sec   Loss 0.2661   LearningRate 0.0036   Epoch: 16   Global Step: 270560   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:53:42,054-Speed 3302.00 samples/sec   Loss 0.2541   LearningRate 0.0036   Epoch: 16   Global Step: 270570   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:53:45,141-Speed 3317.03 samples/sec   Loss 0.2630   LearningRate 0.0036   Epoch: 16   Global Step: 270580   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:53:48,243-Speed 3301.90 samples/sec   Loss 0.2654   LearningRate 0.0036   Epoch: 16   Global Step: 270590   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:53:51,315-Speed 3334.05 samples/sec   Loss 0.2467   LearningRate 0.0036   Epoch: 16   Global Step: 270600   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:53:54,377-Speed 3345.49 samples/sec   Loss 0.2494   LearningRate 0.0036   Epoch: 16   Global Step: 270610   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:53:57,487-Speed 3292.97 samples/sec   Loss 0.2545   LearningRate 0.0036   Epoch: 16   Global Step: 270620   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:54:00,678-Speed 3210.15 samples/sec   Loss 0.2613   LearningRate 0.0036   Epoch: 16   Global Step: 270630   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:54:03,769-Speed 3313.35 samples/sec   Loss 0.2500   LearningRate 0.0036   Epoch: 16   Global Step: 270640   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:54:06,929-Speed 3241.20 samples/sec   Loss 0.2613   LearningRate 0.0036   Epoch: 16   Global Step: 270650   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:54:10,069-Speed 3262.01 samples/sec   Loss 0.2516   LearningRate 0.0036   Epoch: 16   Global Step: 270660   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:54:13,250-Speed 3219.60 samples/sec   Loss 0.2545   LearningRate 0.0036   Epoch: 16   Global Step: 270670   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:54:16,324-Speed 3332.02 samples/sec   Loss 0.2489   LearningRate 0.0036   Epoch: 16   Global Step: 270680   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:54:19,388-Speed 3342.76 samples/sec   Loss 0.2703   LearningRate 0.0036   Epoch: 16   Global Step: 270690   Fp16 Grad Scale: 262144   Required: 7 hours
Training: 2022-04-12 03:54:22,457-Speed 3337.86 samples/sec   Loss 0.2503   LearningRate 0.0036   Epoch: 16   Global Step: 270700   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:54:25,535-Speed 3327.02 samples/sec   Loss 0.2639   LearningRate 0.0036   Epoch: 16   Global Step: 270710   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:54:28,603-Speed 3338.26 samples/sec   Loss 0.2521   LearningRate 0.0036   Epoch: 16   Global Step: 270720   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:54:31,675-Speed 3334.08 samples/sec   Loss 0.2330   LearningRate 0.0036   Epoch: 16   Global Step: 270730   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:54:34,752-Speed 3329.11 samples/sec   Loss 0.2670   LearningRate 0.0036   Epoch: 16   Global Step: 270740   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:54:37,837-Speed 3319.49 samples/sec   Loss 0.2777   LearningRate 0.0036   Epoch: 16   Global Step: 270750   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:54:40,904-Speed 3339.98 samples/sec   Loss 0.2568   LearningRate 0.0036   Epoch: 16   Global Step: 270760   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:54:43,968-Speed 3343.36 samples/sec   Loss 0.2591   LearningRate 0.0036   Epoch: 16   Global Step: 270770   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:54:47,044-Speed 3328.83 samples/sec   Loss 0.2707   LearningRate 0.0036   Epoch: 16   Global Step: 270780   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:54:50,112-Speed 3338.40 samples/sec   Loss 0.2498   LearningRate 0.0036   Epoch: 16   Global Step: 270790   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:54:53,173-Speed 3345.91 samples/sec   Loss 0.2698   LearningRate 0.0036   Epoch: 16   Global Step: 270800   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:54:56,296-Speed 3280.70 samples/sec   Loss 0.2716   LearningRate 0.0036   Epoch: 16   Global Step: 270810   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:54:59,382-Speed 3318.69 samples/sec   Loss 0.2588   LearningRate 0.0036   Epoch: 16   Global Step: 270820   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:55:02,448-Speed 3340.54 samples/sec   Loss 0.2634   LearningRate 0.0036   Epoch: 16   Global Step: 270830   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:55:05,535-Speed 3318.30 samples/sec   Loss 0.2479   LearningRate 0.0036   Epoch: 16   Global Step: 270840   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:55:08,590-Speed 3352.18 samples/sec   Loss 0.2531   LearningRate 0.0036   Epoch: 16   Global Step: 270850   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 03:55:11,657-Speed 3339.41 samples/sec   Loss 0.2480   LearningRate 0.0036   Epoch: 16   Global Step: 270860   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 03:55:14,737-Speed 3325.69 samples/sec   Loss 0.2541   LearningRate 0.0036   Epoch: 16   Global Step: 270870   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 03:55:17,804-Speed 3339.35 samples/sec   Loss 0.2331   LearningRate 0.0036   Epoch: 16   Global Step: 270880   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 03:55:20,875-Speed 3334.46 samples/sec   Loss 0.2426   LearningRate 0.0036   Epoch: 16   Global Step: 270890   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 03:55:23,976-Speed 3303.38 samples/sec   Loss 0.2585   LearningRate 0.0036   Epoch: 16   Global Step: 270900   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 03:55:27,047-Speed 3335.67 samples/sec   Loss 0.2700   LearningRate 0.0036   Epoch: 16   Global Step: 270910   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 03:55:30,112-Speed 3341.81 samples/sec   Loss 0.2731   LearningRate 0.0036   Epoch: 16   Global Step: 270920   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 03:55:33,183-Speed 3334.98 samples/sec   Loss 0.2651   LearningRate 0.0035   Epoch: 16   Global Step: 270930   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 03:55:36,287-Speed 3299.67 samples/sec   Loss 0.2616   LearningRate 0.0035   Epoch: 16   Global Step: 270940   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 03:55:39,357-Speed 3335.87 samples/sec   Loss 0.2701   LearningRate 0.0035   Epoch: 16   Global Step: 270950   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:55:42,424-Speed 3339.78 samples/sec   Loss 0.2361   LearningRate 0.0035   Epoch: 16   Global Step: 270960   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:55:45,516-Speed 3312.44 samples/sec   Loss 0.2580   LearningRate 0.0035   Epoch: 16   Global Step: 270970   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:55:48,588-Speed 3333.94 samples/sec   Loss 0.2754   LearningRate 0.0035   Epoch: 16   Global Step: 270980   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:55:51,741-Speed 3248.64 samples/sec   Loss 0.2632   LearningRate 0.0035   Epoch: 16   Global Step: 270990   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:55:54,841-Speed 3303.64 samples/sec   Loss 0.2481   LearningRate 0.0035   Epoch: 16   Global Step: 271000   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:55:57,908-Speed 3339.36 samples/sec   Loss 0.2622   LearningRate 0.0035   Epoch: 16   Global Step: 271010   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:56:00,983-Speed 3330.80 samples/sec   Loss 0.2630   LearningRate 0.0035   Epoch: 16   Global Step: 271020   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:56:04,037-Speed 3353.75 samples/sec   Loss 0.2513   LearningRate 0.0035   Epoch: 16   Global Step: 271030   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 03:56:07,103-Speed 3340.46 samples/sec   Loss 0.2753   LearningRate 0.0035   Epoch: 16   Global Step: 271040   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 03:56:10,167-Speed 3343.25 samples/sec   Loss 0.2732   LearningRate 0.0035   Epoch: 16   Global Step: 271050   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 03:56:13,277-Speed 3292.58 samples/sec   Loss 0.2657   LearningRate 0.0035   Epoch: 16   Global Step: 271060   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 03:56:16,353-Speed 3329.84 samples/sec   Loss 0.2637   LearningRate 0.0035   Epoch: 16   Global Step: 271070   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 03:56:19,429-Speed 3330.53 samples/sec   Loss 0.2717   LearningRate 0.0035   Epoch: 16   Global Step: 271080   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 03:56:22,491-Speed 3345.16 samples/sec   Loss 0.2612   LearningRate 0.0035   Epoch: 16   Global Step: 271090   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 03:56:25,561-Speed 3335.98 samples/sec   Loss 0.2562   LearningRate 0.0035   Epoch: 16   Global Step: 271100   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 03:56:28,628-Speed 3340.11 samples/sec   Loss 0.2629   LearningRate 0.0035   Epoch: 16   Global Step: 271110   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 03:56:31,694-Speed 3339.68 samples/sec   Loss 0.2575   LearningRate 0.0035   Epoch: 16   Global Step: 271120   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 03:56:34,756-Speed 3345.27 samples/sec   Loss 0.2902   LearningRate 0.0035   Epoch: 16   Global Step: 271130   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:56:37,842-Speed 3318.52 samples/sec   Loss 0.2813   LearningRate 0.0035   Epoch: 16   Global Step: 271140   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:56:40,922-Speed 3326.22 samples/sec   Loss 0.2717   LearningRate 0.0035   Epoch: 16   Global Step: 271150   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:56:44,006-Speed 3320.18 samples/sec   Loss 0.2662   LearningRate 0.0035   Epoch: 16   Global Step: 271160   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:56:47,179-Speed 3228.46 samples/sec   Loss 0.2631   LearningRate 0.0035   Epoch: 16   Global Step: 271170   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:56:50,315-Speed 3266.21 samples/sec   Loss 0.2640   LearningRate 0.0035   Epoch: 16   Global Step: 271180   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:56:53,495-Speed 3220.43 samples/sec   Loss 0.2696   LearningRate 0.0035   Epoch: 16   Global Step: 271190   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:56:56,569-Speed 3332.43 samples/sec   Loss 0.2533   LearningRate 0.0035   Epoch: 16   Global Step: 271200   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:56:59,636-Speed 3338.84 samples/sec   Loss 0.2909   LearningRate 0.0035   Epoch: 16   Global Step: 271210   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:57:02,731-Speed 3309.17 samples/sec   Loss 0.2679   LearningRate 0.0035   Epoch: 16   Global Step: 271220   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:57:05,822-Speed 3314.08 samples/sec   Loss 0.2777   LearningRate 0.0035   Epoch: 16   Global Step: 271230   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 03:57:08,996-Speed 3227.05 samples/sec   Loss 0.2716   LearningRate 0.0035   Epoch: 16   Global Step: 271240   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 03:57:12,115-Speed 3283.75 samples/sec   Loss 0.2615   LearningRate 0.0035   Epoch: 16   Global Step: 271250   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 03:57:15,186-Speed 3334.88 samples/sec   Loss 0.2584   LearningRate 0.0035   Epoch: 16   Global Step: 271260   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 03:57:18,326-Speed 3262.37 samples/sec   Loss 0.2602   LearningRate 0.0035   Epoch: 16   Global Step: 271270   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 03:57:21,477-Speed 3249.94 samples/sec   Loss 0.2652   LearningRate 0.0035   Epoch: 16   Global Step: 271280   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 03:57:24,591-Speed 3289.28 samples/sec   Loss 0.2494   LearningRate 0.0035   Epoch: 16   Global Step: 271290   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 03:57:27,668-Speed 3328.83 samples/sec   Loss 0.2627   LearningRate 0.0035   Epoch: 16   Global Step: 271300   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 03:57:30,753-Speed 3319.94 samples/sec   Loss 0.2513   LearningRate 0.0035   Epoch: 16   Global Step: 271310   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 03:57:33,853-Speed 3303.72 samples/sec   Loss 0.2623   LearningRate 0.0035   Epoch: 16   Global Step: 271320   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 03:57:36,924-Speed 3335.64 samples/sec   Loss 0.2552   LearningRate 0.0035   Epoch: 16   Global Step: 271330   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:57:39,998-Speed 3331.13 samples/sec   Loss 0.2670   LearningRate 0.0035   Epoch: 16   Global Step: 271340   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:57:43,066-Speed 3339.37 samples/sec   Loss 0.2591   LearningRate 0.0035   Epoch: 16   Global Step: 271350   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:57:46,136-Speed 3336.53 samples/sec   Loss 0.2410   LearningRate 0.0035   Epoch: 16   Global Step: 271360   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:57:49,201-Speed 3341.69 samples/sec   Loss 0.2436   LearningRate 0.0035   Epoch: 16   Global Step: 271370   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:57:52,309-Speed 3295.36 samples/sec   Loss 0.2765   LearningRate 0.0035   Epoch: 16   Global Step: 271380   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:57:55,400-Speed 3312.61 samples/sec   Loss 0.2533   LearningRate 0.0035   Epoch: 16   Global Step: 271390   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:57:58,515-Speed 3288.07 samples/sec   Loss 0.2674   LearningRate 0.0035   Epoch: 16   Global Step: 271400   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:58:01,710-Speed 3206.47 samples/sec   Loss 0.2456   LearningRate 0.0035   Epoch: 16   Global Step: 271410   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:58:04,789-Speed 3326.67 samples/sec   Loss 0.2605   LearningRate 0.0035   Epoch: 16   Global Step: 271420   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:58:07,890-Speed 3302.19 samples/sec   Loss 0.2728   LearningRate 0.0035   Epoch: 16   Global Step: 271430   Fp16 Grad Scale: 262144   Required: 7 hours
Training: 2022-04-12 03:58:10,961-Speed 3335.41 samples/sec   Loss 0.2722   LearningRate 0.0035   Epoch: 16   Global Step: 271440   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:58:14,041-Speed 3326.21 samples/sec   Loss 0.2436   LearningRate 0.0035   Epoch: 16   Global Step: 271450   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 03:58:17,118-Speed 3327.98 samples/sec   Loss 0.2638   LearningRate 0.0035   Epoch: 16   Global Step: 271460   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 03:58:20,183-Speed 3341.82 samples/sec   Loss 0.2674   LearningRate 0.0035   Epoch: 16   Global Step: 271470   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 03:58:23,262-Speed 3326.35 samples/sec   Loss 0.2585   LearningRate 0.0035   Epoch: 16   Global Step: 271480   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 03:58:26,335-Speed 3332.82 samples/sec   Loss 0.2568   LearningRate 0.0035   Epoch: 16   Global Step: 271490   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 03:58:29,418-Speed 3322.51 samples/sec   Loss 0.2816   LearningRate 0.0035   Epoch: 16   Global Step: 271500   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 03:58:32,506-Speed 3316.89 samples/sec   Loss 0.2681   LearningRate 0.0035   Epoch: 16   Global Step: 271510   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 03:58:35,583-Speed 3328.31 samples/sec   Loss 0.2651   LearningRate 0.0035   Epoch: 16   Global Step: 271520   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 03:58:38,654-Speed 3335.93 samples/sec   Loss 0.2691   LearningRate 0.0035   Epoch: 16   Global Step: 271530   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 03:58:41,730-Speed 3329.00 samples/sec   Loss 0.2490   LearningRate 0.0035   Epoch: 16   Global Step: 271540   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 03:58:44,809-Speed 3326.34 samples/sec   Loss 0.2458   LearningRate 0.0035   Epoch: 16   Global Step: 271550   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:58:47,887-Speed 3327.77 samples/sec   Loss 0.2551   LearningRate 0.0035   Epoch: 16   Global Step: 271560   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:58:50,999-Speed 3291.20 samples/sec   Loss 0.2592   LearningRate 0.0035   Epoch: 16   Global Step: 271570   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:58:54,171-Speed 3228.78 samples/sec   Loss 0.2662   LearningRate 0.0035   Epoch: 16   Global Step: 271580   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:58:57,298-Speed 3275.40 samples/sec   Loss 0.2547   LearningRate 0.0035   Epoch: 16   Global Step: 271590   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:59:00,369-Speed 3335.14 samples/sec   Loss 0.2535   LearningRate 0.0035   Epoch: 16   Global Step: 271600   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:59:03,438-Speed 3338.63 samples/sec   Loss 0.2556   LearningRate 0.0035   Epoch: 16   Global Step: 271610   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:59:06,514-Speed 3329.60 samples/sec   Loss 0.2636   LearningRate 0.0035   Epoch: 16   Global Step: 271620   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:59:09,604-Speed 3314.02 samples/sec   Loss 0.2426   LearningRate 0.0035   Epoch: 16   Global Step: 271630   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:59:12,741-Speed 3265.20 samples/sec   Loss 0.2597   LearningRate 0.0035   Epoch: 16   Global Step: 271640   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:59:15,812-Speed 3335.48 samples/sec   Loss 0.2627   LearningRate 0.0035   Epoch: 16   Global Step: 271650   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:59:18,893-Speed 3324.21 samples/sec   Loss 0.2631   LearningRate 0.0035   Epoch: 16   Global Step: 271660   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:59:21,960-Speed 3339.16 samples/sec   Loss 0.2479   LearningRate 0.0035   Epoch: 16   Global Step: 271670   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:59:25,038-Speed 3327.61 samples/sec   Loss 0.2510   LearningRate 0.0035   Epoch: 16   Global Step: 271680   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 03:59:28,094-Speed 3351.16 samples/sec   Loss 0.2595   LearningRate 0.0035   Epoch: 16   Global Step: 271690   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 03:59:31,164-Speed 3336.56 samples/sec   Loss 0.2419   LearningRate 0.0035   Epoch: 16   Global Step: 271700   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 03:59:34,256-Speed 3313.05 samples/sec   Loss 0.2650   LearningRate 0.0035   Epoch: 16   Global Step: 271710   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 03:59:37,364-Speed 3295.58 samples/sec   Loss 0.2628   LearningRate 0.0035   Epoch: 16   Global Step: 271720   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 03:59:40,472-Speed 3295.35 samples/sec   Loss 0.2751   LearningRate 0.0035   Epoch: 16   Global Step: 271730   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 03:59:43,540-Speed 3338.52 samples/sec   Loss 0.2708   LearningRate 0.0035   Epoch: 16   Global Step: 271740   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 03:59:46,628-Speed 3316.85 samples/sec   Loss 0.2603   LearningRate 0.0035   Epoch: 16   Global Step: 271750   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 03:59:49,700-Speed 3334.16 samples/sec   Loss 0.2622   LearningRate 0.0035   Epoch: 16   Global Step: 271760   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 03:59:52,776-Speed 3329.15 samples/sec   Loss 0.2494   LearningRate 0.0035   Epoch: 16   Global Step: 271770   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 03:59:55,849-Speed 3333.64 samples/sec   Loss 0.2788   LearningRate 0.0035   Epoch: 16   Global Step: 271780   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 03:59:58,992-Speed 3258.98 samples/sec   Loss 0.2834   LearningRate 0.0035   Epoch: 16   Global Step: 271790   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 04:00:02,087-Speed 3308.77 samples/sec   Loss 0.2609   LearningRate 0.0035   Epoch: 16   Global Step: 271800   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 04:00:05,165-Speed 3327.37 samples/sec   Loss 0.2420   LearningRate 0.0035   Epoch: 16   Global Step: 271810   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 04:00:08,232-Speed 3340.54 samples/sec   Loss 0.2531   LearningRate 0.0034   Epoch: 16   Global Step: 271820   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 04:00:11,300-Speed 3338.17 samples/sec   Loss 0.2894   LearningRate 0.0034   Epoch: 16   Global Step: 271830   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 04:00:14,390-Speed 3314.73 samples/sec   Loss 0.2695   LearningRate 0.0034   Epoch: 16   Global Step: 271840   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 04:00:17,461-Speed 3334.86 samples/sec   Loss 0.2749   LearningRate 0.0034   Epoch: 16   Global Step: 271850   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 04:00:20,528-Speed 3339.68 samples/sec   Loss 0.2846   LearningRate 0.0034   Epoch: 16   Global Step: 271860   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 04:00:23,752-Speed 3176.70 samples/sec   Loss 0.2519   LearningRate 0.0034   Epoch: 16   Global Step: 271870   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 04:00:26,877-Speed 3277.36 samples/sec   Loss 0.2660   LearningRate 0.0034   Epoch: 16   Global Step: 271880   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 04:00:29,974-Speed 3307.69 samples/sec   Loss 0.2683   LearningRate 0.0034   Epoch: 16   Global Step: 271890   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 04:00:33,158-Speed 3216.48 samples/sec   Loss 0.2584   LearningRate 0.0034   Epoch: 16   Global Step: 271900   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 04:00:36,239-Speed 3324.60 samples/sec   Loss 0.2398   LearningRate 0.0034   Epoch: 16   Global Step: 271910   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 04:00:39,368-Speed 3273.24 samples/sec   Loss 0.2718   LearningRate 0.0034   Epoch: 16   Global Step: 271920   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 04:00:42,454-Speed 3318.80 samples/sec   Loss 0.2674   LearningRate 0.0034   Epoch: 16   Global Step: 271930   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 04:00:45,545-Speed 3313.28 samples/sec   Loss 0.2705   LearningRate 0.0034   Epoch: 16   Global Step: 271940   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 04:00:48,633-Speed 3317.38 samples/sec   Loss 0.2701   LearningRate 0.0034   Epoch: 16   Global Step: 271950   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 04:00:51,718-Speed 3320.12 samples/sec   Loss 0.2780   LearningRate 0.0034   Epoch: 16   Global Step: 271960   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 04:00:54,818-Speed 3304.23 samples/sec   Loss 0.2427   LearningRate 0.0034   Epoch: 16   Global Step: 271970   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 04:00:57,881-Speed 3343.63 samples/sec   Loss 0.2572   LearningRate 0.0034   Epoch: 16   Global Step: 271980   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 04:01:00,952-Speed 3334.49 samples/sec   Loss 0.2799   LearningRate 0.0034   Epoch: 16   Global Step: 271990   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 04:01:04,056-Speed 3300.33 samples/sec   Loss 0.2686   LearningRate 0.0034   Epoch: 16   Global Step: 272000   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 04:01:47,513-[lfw][272000]XNorm: 20.855730
Training: 2022-04-12 04:01:47,513-[lfw][272000]Accuracy-Flip: 0.99767+-0.00249
Training: 2022-04-12 04:01:47,514-[lfw][272000]Accuracy-Highest: 0.99817
Training: 2022-04-12 04:02:38,135-[cfp_fp][272000]XNorm: 22.385252
Training: 2022-04-12 04:02:38,136-[cfp_fp][272000]Accuracy-Flip: 0.99143+-0.00438
Training: 2022-04-12 04:02:38,136-[cfp_fp][272000]Accuracy-Highest: 0.99186
Training: 2022-04-12 04:03:21,514-[agedb_30][272000]XNorm: 22.864713
Training: 2022-04-12 04:03:21,515-[agedb_30][272000]Accuracy-Flip: 0.98517+-0.00643
Training: 2022-04-12 04:03:21,515-[agedb_30][272000]Accuracy-Highest: 0.98650
Training: 2022-04-12 04:03:24,576-Speed 72.87 samples/sec   Loss 0.2575   LearningRate 0.0034   Epoch: 16   Global Step: 272010   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 04:03:27,659-Speed 3322.01 samples/sec   Loss 0.2547   LearningRate 0.0034   Epoch: 16   Global Step: 272020   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 04:03:30,734-Speed 3330.97 samples/sec   Loss 0.2456   LearningRate 0.0034   Epoch: 16   Global Step: 272030   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 04:03:33,804-Speed 3335.49 samples/sec   Loss 0.2753   LearningRate 0.0034   Epoch: 16   Global Step: 272040   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 04:03:36,887-Speed 3322.56 samples/sec   Loss 0.2629   LearningRate 0.0034   Epoch: 16   Global Step: 272050   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 04:03:39,948-Speed 3346.27 samples/sec   Loss 0.2705   LearningRate 0.0034   Epoch: 16   Global Step: 272060   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 04:03:43,014-Speed 3340.72 samples/sec   Loss 0.2548   LearningRate 0.0034   Epoch: 16   Global Step: 272070   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 04:03:46,070-Speed 3351.46 samples/sec   Loss 0.2583   LearningRate 0.0034   Epoch: 16   Global Step: 272080   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 04:03:49,134-Speed 3343.17 samples/sec   Loss 0.2594   LearningRate 0.0034   Epoch: 16   Global Step: 272090   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 04:03:52,242-Speed 3295.06 samples/sec   Loss 0.2641   LearningRate 0.0034   Epoch: 16   Global Step: 272100   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 04:03:55,347-Speed 3298.34 samples/sec   Loss 0.2629   LearningRate 0.0034   Epoch: 16   Global Step: 272110   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 04:03:58,484-Speed 3264.73 samples/sec   Loss 0.2600   LearningRate 0.0034   Epoch: 16   Global Step: 272120   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 04:04:01,573-Speed 3316.50 samples/sec   Loss 0.2625   LearningRate 0.0034   Epoch: 16   Global Step: 272130   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 04:04:04,651-Speed 3327.40 samples/sec   Loss 0.2748   LearningRate 0.0034   Epoch: 16   Global Step: 272140   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 04:04:07,774-Speed 3279.08 samples/sec   Loss 0.2468   LearningRate 0.0034   Epoch: 16   Global Step: 272150   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 04:04:10,886-Speed 3292.05 samples/sec   Loss 0.2784   LearningRate 0.0034   Epoch: 16   Global Step: 272160   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 04:04:14,091-Speed 3195.29 samples/sec   Loss 0.2647   LearningRate 0.0034   Epoch: 16   Global Step: 272170   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 04:04:17,193-Speed 3301.93 samples/sec   Loss 0.2717   LearningRate 0.0034   Epoch: 16   Global Step: 272180   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 04:04:20,355-Speed 3239.22 samples/sec   Loss 0.2587   LearningRate 0.0034   Epoch: 16   Global Step: 272190   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 04:04:23,482-Speed 3275.45 samples/sec   Loss 0.2713   LearningRate 0.0034   Epoch: 16   Global Step: 272200   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 04:04:26,573-Speed 3313.54 samples/sec   Loss 0.2857   LearningRate 0.0034   Epoch: 16   Global Step: 272210   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 04:04:29,637-Speed 3342.79 samples/sec   Loss 0.2984   LearningRate 0.0034   Epoch: 16   Global Step: 272220   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 04:04:32,726-Speed 3315.22 samples/sec   Loss 0.2760   LearningRate 0.0034   Epoch: 16   Global Step: 272230   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 04:04:35,842-Speed 3287.81 samples/sec   Loss 0.2607   LearningRate 0.0034   Epoch: 16   Global Step: 272240   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 04:04:38,973-Speed 3270.62 samples/sec   Loss 0.2818   LearningRate 0.0034   Epoch: 16   Global Step: 272250   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 04:04:42,052-Speed 3327.18 samples/sec   Loss 0.2666   LearningRate 0.0034   Epoch: 16   Global Step: 272260   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 04:04:45,194-Speed 3259.52 samples/sec   Loss 0.2468   LearningRate 0.0034   Epoch: 16   Global Step: 272270   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 04:04:48,290-Speed 3308.42 samples/sec   Loss 0.2580   LearningRate 0.0034   Epoch: 16   Global Step: 272280   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 04:04:51,370-Speed 3324.84 samples/sec   Loss 0.2705   LearningRate 0.0034   Epoch: 16   Global Step: 272290   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 04:04:54,446-Speed 3330.11 samples/sec   Loss 0.2370   LearningRate 0.0034   Epoch: 16   Global Step: 272300   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 04:04:57,523-Speed 3328.93 samples/sec   Loss 0.2737   LearningRate 0.0034   Epoch: 16   Global Step: 272310   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-12 04:05:00,577-Speed 3354.39 samples/sec   Loss 0.2672   LearningRate 0.0034   Epoch: 16   Global Step: 272320   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 04:05:03,741-Speed 3236.47 samples/sec   Loss 0.2553   LearningRate 0.0034   Epoch: 16   Global Step: 272330   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 04:05:06,873-Speed 3270.59 samples/sec   Loss 0.2618   LearningRate 0.0034   Epoch: 16   Global Step: 272340   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 04:05:10,015-Speed 3259.36 samples/sec   Loss 0.2524   LearningRate 0.0034   Epoch: 16   Global Step: 272350   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 04:05:13,135-Speed 3283.40 samples/sec   Loss 0.2583   LearningRate 0.0034   Epoch: 16   Global Step: 272360   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 04:05:16,311-Speed 3224.86 samples/sec   Loss 0.2836   LearningRate 0.0034   Epoch: 16   Global Step: 272370   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 04:05:19,453-Speed 3259.71 samples/sec   Loss 0.2650   LearningRate 0.0034   Epoch: 16   Global Step: 272380   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 04:05:22,584-Speed 3270.63 samples/sec   Loss 0.2695   LearningRate 0.0034   Epoch: 16   Global Step: 272390   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 04:05:25,695-Speed 3291.97 samples/sec   Loss 0.2769   LearningRate 0.0034   Epoch: 16   Global Step: 272400   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 04:05:28,772-Speed 3329.84 samples/sec   Loss 0.2700   LearningRate 0.0034   Epoch: 16   Global Step: 272410   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-12 04:05:31,967-Speed 3205.64 samples/sec   Loss 0.2669   LearningRate 0.0034   Epoch: 16   Global Step: 272420   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:05:35,031-Speed 3342.30 samples/sec   Loss 0.2704   LearningRate 0.0034   Epoch: 16   Global Step: 272430   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:05:38,095-Speed 3343.49 samples/sec   Loss 0.2621   LearningRate 0.0034   Epoch: 16   Global Step: 272440   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:05:41,191-Speed 3308.23 samples/sec   Loss 0.2619   LearningRate 0.0034   Epoch: 16   Global Step: 272450   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:05:44,316-Speed 3277.56 samples/sec   Loss 0.2550   LearningRate 0.0034   Epoch: 16   Global Step: 272460   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:05:47,422-Speed 3297.02 samples/sec   Loss 0.2566   LearningRate 0.0034   Epoch: 16   Global Step: 272470   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:05:50,528-Speed 3297.96 samples/sec   Loss 0.2470   LearningRate 0.0034   Epoch: 16   Global Step: 272480   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:05:53,622-Speed 3310.11 samples/sec   Loss 0.2596   LearningRate 0.0034   Epoch: 16   Global Step: 272490   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:05:56,741-Speed 3283.97 samples/sec   Loss 0.2827   LearningRate 0.0034   Epoch: 16   Global Step: 272500   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:05:59,884-Speed 3259.11 samples/sec   Loss 0.2561   LearningRate 0.0034   Epoch: 16   Global Step: 272510   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:06:03,014-Speed 3272.60 samples/sec   Loss 0.2655   LearningRate 0.0034   Epoch: 16   Global Step: 272520   Fp16 Grad Scale: 262144   Required: 6 hours
Training: 2022-04-12 04:06:06,062-Speed 3359.81 samples/sec   Loss 0.2691   LearningRate 0.0034   Epoch: 16   Global Step: 272530   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:06:09,205-Speed 3258.91 samples/sec   Loss 0.2744   LearningRate 0.0034   Epoch: 16   Global Step: 272540   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:06:12,351-Speed 3255.27 samples/sec   Loss 0.2838   LearningRate 0.0034   Epoch: 16   Global Step: 272550   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:06:15,416-Speed 3341.94 samples/sec   Loss 0.2712   LearningRate 0.0034   Epoch: 16   Global Step: 272560   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:06:18,465-Speed 3359.73 samples/sec   Loss 0.2708   LearningRate 0.0034   Epoch: 16   Global Step: 272570   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-12 04:06:21,533-Speed 3337.96 samples/sec   Loss 0.2589   LearningRate 0.0034   Epoch: 16   Global Step: 272580   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-12 04:06:24,596-Speed 3343.99 samples/sec   Loss 0.2293   LearningRate 0.0034   Epoch: 16   Global Step: 272590   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-12 04:06:27,659-Speed 3344.42 samples/sec   Loss 0.2726   LearningRate 0.0034   Epoch: 16   Global Step: 272600   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-12 04:06:30,741-Speed 3322.63 samples/sec   Loss 0.2654   LearningRate 0.0034   Epoch: 16   Global Step: 272610   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-12 04:06:33,843-Speed 3302.48 samples/sec   Loss 0.2603   LearningRate 0.0034   Epoch: 16   Global Step: 272620   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-12 04:06:36,919-Speed 3329.27 samples/sec   Loss 0.2750   LearningRate 0.0034   Epoch: 16   Global Step: 272630   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-12 04:06:40,009-Speed 3314.41 samples/sec   Loss 0.2666   LearningRate 0.0034   Epoch: 16   Global Step: 272640   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-12 04:06:43,086-Speed 3328.63 samples/sec   Loss 0.2513   LearningRate 0.0034   Epoch: 16   Global Step: 272650   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-12 04:06:46,203-Speed 3286.49 samples/sec   Loss 0.2751   LearningRate 0.0034   Epoch: 16   Global Step: 272660   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-12 04:06:49,299-Speed 3308.15 samples/sec   Loss 0.2854   LearningRate 0.0034   Epoch: 16   Global Step: 272670   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:06:52,362-Speed 3343.88 samples/sec   Loss 0.2779   LearningRate 0.0034   Epoch: 16   Global Step: 272680   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:06:55,426-Speed 3342.54 samples/sec   Loss 0.2604   LearningRate 0.0034   Epoch: 16   Global Step: 272690   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:06:58,522-Speed 3309.03 samples/sec   Loss 0.2553   LearningRate 0.0034   Epoch: 16   Global Step: 272700   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:07:01,576-Speed 3353.98 samples/sec   Loss 0.2658   LearningRate 0.0034   Epoch: 16   Global Step: 272710   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-12 04:07:04,636-Speed 3347.43 samples/sec   Loss 0.2600   LearningRate 0.0034   Epoch: 16   Global Step: 272720   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-12 04:07:07,701-Speed 3340.84 samples/sec   Loss 0.2709   LearningRate 0.0033   Epoch: 16   Global Step: 272730   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-12 04:07:10,766-Speed 3342.37 samples/sec   Loss 0.2641   LearningRate 0.0033   Epoch: 16   Global Step: 272740   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-12 04:07:13,846-Speed 3325.89 samples/sec   Loss 0.2615   LearningRate 0.0033   Epoch: 16   Global Step: 272750   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-12 04:07:16,905-Speed 3348.10 samples/sec   Loss 0.2807   LearningRate 0.0033   Epoch: 16   Global Step: 272760   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-12 04:07:19,988-Speed 3321.49 samples/sec   Loss 0.2617   LearningRate 0.0033   Epoch: 16   Global Step: 272770   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-12 04:07:23,049-Speed 3345.86 samples/sec   Loss 0.2655   LearningRate 0.0033   Epoch: 16   Global Step: 272780   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-12 04:07:26,127-Speed 3328.24 samples/sec   Loss 0.2737   LearningRate 0.0033   Epoch: 16   Global Step: 272790   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-12 04:07:29,184-Speed 3349.74 samples/sec   Loss 0.2768   LearningRate 0.0033   Epoch: 16   Global Step: 272800   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-12 04:07:32,263-Speed 3327.13 samples/sec   Loss 0.2668   LearningRate 0.0033   Epoch: 16   Global Step: 272810   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:07:35,330-Speed 3339.69 samples/sec   Loss 0.2709   LearningRate 0.0033   Epoch: 16   Global Step: 272820   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:07:38,394-Speed 3342.30 samples/sec   Loss 0.2716   LearningRate 0.0033   Epoch: 16   Global Step: 272830   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:07:41,464-Speed 3336.66 samples/sec   Loss 0.2719   LearningRate 0.0033   Epoch: 16   Global Step: 272840   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:07:44,531-Speed 3339.48 samples/sec   Loss 0.2891   LearningRate 0.0033   Epoch: 16   Global Step: 272850   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:07:47,623-Speed 3313.02 samples/sec   Loss 0.2492   LearningRate 0.0033   Epoch: 16   Global Step: 272860   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:07:50,710-Speed 3317.55 samples/sec   Loss 0.2519   LearningRate 0.0033   Epoch: 16   Global Step: 272870   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:07:53,859-Speed 3252.62 samples/sec   Loss 0.2533   LearningRate 0.0033   Epoch: 16   Global Step: 272880   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:07:57,021-Speed 3238.50 samples/sec   Loss 0.2869   LearningRate 0.0033   Epoch: 16   Global Step: 272890   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:08:00,128-Speed 3296.96 samples/sec   Loss 0.2685   LearningRate 0.0033   Epoch: 16   Global Step: 272900   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:08:03,246-Speed 3285.11 samples/sec   Loss 0.2712   LearningRate 0.0033   Epoch: 16   Global Step: 272910   Fp16 Grad Scale: 262144   Required: 6 hours
Training: 2022-04-12 04:08:06,301-Speed 3353.32 samples/sec   Loss 0.2559   LearningRate 0.0033   Epoch: 16   Global Step: 272920   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:08:09,378-Speed 3328.22 samples/sec   Loss 0.2795   LearningRate 0.0033   Epoch: 16   Global Step: 272930   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:08:12,444-Speed 3340.85 samples/sec   Loss 0.2391   LearningRate 0.0033   Epoch: 16   Global Step: 272940   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:08:15,614-Speed 3230.68 samples/sec   Loss 0.2691   LearningRate 0.0033   Epoch: 16   Global Step: 272950   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:08:18,682-Speed 3338.45 samples/sec   Loss 0.2576   LearningRate 0.0033   Epoch: 16   Global Step: 272960   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:08:21,755-Speed 3333.40 samples/sec   Loss 0.2863   LearningRate 0.0033   Epoch: 16   Global Step: 272970   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:08:24,820-Speed 3341.13 samples/sec   Loss 0.2718   LearningRate 0.0033   Epoch: 16   Global Step: 272980   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:08:27,892-Speed 3334.38 samples/sec   Loss 0.2576   LearningRate 0.0033   Epoch: 16   Global Step: 272990   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:08:30,984-Speed 3312.61 samples/sec   Loss 0.2874   LearningRate 0.0033   Epoch: 16   Global Step: 273000   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:08:34,090-Speed 3298.08 samples/sec   Loss 0.2508   LearningRate 0.0033   Epoch: 16   Global Step: 273010   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:08:37,174-Speed 3320.36 samples/sec   Loss 0.2616   LearningRate 0.0033   Epoch: 16   Global Step: 273020   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:08:40,330-Speed 3246.09 samples/sec   Loss 0.2642   LearningRate 0.0033   Epoch: 16   Global Step: 273030   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:08:43,405-Speed 3330.69 samples/sec   Loss 0.2748   LearningRate 0.0033   Epoch: 16   Global Step: 273040   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:08:46,571-Speed 3235.16 samples/sec   Loss 0.3004   LearningRate 0.0033   Epoch: 16   Global Step: 273050   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:08:49,682-Speed 3291.59 samples/sec   Loss 0.2529   LearningRate 0.0033   Epoch: 16   Global Step: 273060   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:08:52,749-Speed 3339.28 samples/sec   Loss 0.2813   LearningRate 0.0033   Epoch: 16   Global Step: 273070   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:08:55,815-Speed 3341.49 samples/sec   Loss 0.2550   LearningRate 0.0033   Epoch: 16   Global Step: 273080   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:08:58,878-Speed 3344.94 samples/sec   Loss 0.2779   LearningRate 0.0033   Epoch: 16   Global Step: 273090   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:09:01,949-Speed 3335.43 samples/sec   Loss 0.2582   LearningRate 0.0033   Epoch: 16   Global Step: 273100   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:09:05,035-Speed 3318.66 samples/sec   Loss 0.2586   LearningRate 0.0033   Epoch: 16   Global Step: 273110   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:09:08,095-Speed 3347.18 samples/sec   Loss 0.2741   LearningRate 0.0033   Epoch: 16   Global Step: 273120   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:09:11,183-Speed 3316.80 samples/sec   Loss 0.2737   LearningRate 0.0033   Epoch: 16   Global Step: 273130   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:09:14,281-Speed 3305.78 samples/sec   Loss 0.2572   LearningRate 0.0033   Epoch: 16   Global Step: 273140   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:09:17,362-Speed 3324.40 samples/sec   Loss 0.2511   LearningRate 0.0033   Epoch: 16   Global Step: 273150   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:09:20,436-Speed 3332.24 samples/sec   Loss 0.2784   LearningRate 0.0033   Epoch: 16   Global Step: 273160   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:09:23,500-Speed 3342.30 samples/sec   Loss 0.2607   LearningRate 0.0033   Epoch: 16   Global Step: 273170   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:09:26,572-Speed 3334.34 samples/sec   Loss 0.2686   LearningRate 0.0033   Epoch: 16   Global Step: 273180   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:09:29,640-Speed 3338.29 samples/sec   Loss 0.2623   LearningRate 0.0033   Epoch: 16   Global Step: 273190   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-12 04:09:32,725-Speed 3320.23 samples/sec   Loss 0.2574   LearningRate 0.0033   Epoch: 16   Global Step: 273200   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-12 04:09:35,788-Speed 3344.31 samples/sec   Loss 0.2676   LearningRate 0.0033   Epoch: 16   Global Step: 273210   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-12 04:09:38,859-Speed 3334.79 samples/sec   Loss 0.2549   LearningRate 0.0033   Epoch: 16   Global Step: 273220   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-12 04:09:41,929-Speed 3335.88 samples/sec   Loss 0.2713   LearningRate 0.0033   Epoch: 16   Global Step: 273230   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-12 04:09:45,122-Speed 3208.13 samples/sec   Loss 0.2667   LearningRate 0.0033   Epoch: 16   Global Step: 273240   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-12 04:09:48,203-Speed 3324.42 samples/sec   Loss 0.2519   LearningRate 0.0033   Epoch: 16   Global Step: 273250   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-12 04:09:51,308-Speed 3298.72 samples/sec   Loss 0.2598   LearningRate 0.0033   Epoch: 16   Global Step: 273260   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-12 04:09:54,376-Speed 3338.05 samples/sec   Loss 0.2848   LearningRate 0.0033   Epoch: 16   Global Step: 273270   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-12 04:09:57,537-Speed 3240.97 samples/sec   Loss 0.2652   LearningRate 0.0033   Epoch: 16   Global Step: 273280   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-12 04:10:00,604-Speed 3338.64 samples/sec   Loss 0.2563   LearningRate 0.0033   Epoch: 16   Global Step: 273290   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:10:03,687-Speed 3322.72 samples/sec   Loss 0.2844   LearningRate 0.0033   Epoch: 16   Global Step: 273300   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:10:06,766-Speed 3325.99 samples/sec   Loss 0.2579   LearningRate 0.0033   Epoch: 16   Global Step: 273310   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:10:09,854-Speed 3317.17 samples/sec   Loss 0.2611   LearningRate 0.0033   Epoch: 16   Global Step: 273320   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:10:12,991-Speed 3265.00 samples/sec   Loss 0.2674   LearningRate 0.0033   Epoch: 16   Global Step: 273330   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:10:16,169-Speed 3222.83 samples/sec   Loss 0.2782   LearningRate 0.0033   Epoch: 16   Global Step: 273340   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:10:19,244-Speed 3331.80 samples/sec   Loss 0.2513   LearningRate 0.0033   Epoch: 16   Global Step: 273350   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:10:22,311-Speed 3339.35 samples/sec   Loss 0.2775   LearningRate 0.0033   Epoch: 16   Global Step: 273360   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:10:25,385-Speed 3331.33 samples/sec   Loss 0.2616   LearningRate 0.0033   Epoch: 16   Global Step: 273370   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:10:28,459-Speed 3332.53 samples/sec   Loss 0.2790   LearningRate 0.0033   Epoch: 16   Global Step: 273380   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:10:31,525-Speed 3339.78 samples/sec   Loss 0.2595   LearningRate 0.0033   Epoch: 16   Global Step: 273390   Fp16 Grad Scale: 262144   Required: 6 hours
Training: 2022-04-12 04:10:34,703-Speed 3222.94 samples/sec   Loss 0.2683   LearningRate 0.0033   Epoch: 16   Global Step: 273400   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:10:37,771-Speed 3338.69 samples/sec   Loss 0.2771   LearningRate 0.0033   Epoch: 16   Global Step: 273410   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-12 04:10:40,917-Speed 3255.39 samples/sec   Loss 0.2749   LearningRate 0.0033   Epoch: 16   Global Step: 273420   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-12 04:10:44,065-Speed 3253.83 samples/sec   Loss 0.2856   LearningRate 0.0033   Epoch: 16   Global Step: 273430   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-12 04:10:47,206-Speed 3261.37 samples/sec   Loss 0.2518   LearningRate 0.0033   Epoch: 16   Global Step: 273440   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-12 04:10:50,288-Speed 3323.45 samples/sec   Loss 0.2719   LearningRate 0.0033   Epoch: 16   Global Step: 273450   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-12 04:10:53,438-Speed 3250.85 samples/sec   Loss 0.2856   LearningRate 0.0033   Epoch: 16   Global Step: 273460   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-12 04:10:56,516-Speed 3327.57 samples/sec   Loss 0.2563   LearningRate 0.0033   Epoch: 16   Global Step: 273470   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-12 04:10:59,584-Speed 3339.22 samples/sec   Loss 0.2690   LearningRate 0.0033   Epoch: 16   Global Step: 273480   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-12 04:11:02,737-Speed 3248.09 samples/sec   Loss 0.2881   LearningRate 0.0033   Epoch: 16   Global Step: 273490   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-12 04:11:05,911-Speed 3226.64 samples/sec   Loss 0.2677   LearningRate 0.0033   Epoch: 16   Global Step: 273500   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-12 04:11:08,984-Speed 3333.36 samples/sec   Loss 0.2771   LearningRate 0.0033   Epoch: 16   Global Step: 273510   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:11:12,045-Speed 3345.90 samples/sec   Loss 0.2740   LearningRate 0.0033   Epoch: 16   Global Step: 273520   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-12 04:11:15,230-Speed 3216.37 samples/sec   Loss 0.2675   LearningRate 0.0033   Epoch: 16   Global Step: 273530   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-12 04:11:18,300-Speed 3336.20 samples/sec   Loss 0.2720   LearningRate 0.0033   Epoch: 16   Global Step: 273540   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-12 04:11:21,367-Speed 3338.80 samples/sec   Loss 0.2689   LearningRate 0.0033   Epoch: 16   Global Step: 273550   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-12 04:11:24,453-Speed 3318.89 samples/sec   Loss 0.2814   LearningRate 0.0033   Epoch: 16   Global Step: 273560   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-12 04:11:27,545-Speed 3312.79 samples/sec   Loss 0.2755   LearningRate 0.0033   Epoch: 16   Global Step: 273570   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-12 04:11:30,663-Speed 3285.06 samples/sec   Loss 0.2782   LearningRate 0.0033   Epoch: 16   Global Step: 273580   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-12 04:11:33,752-Speed 3315.35 samples/sec   Loss 0.2615   LearningRate 0.0033   Epoch: 16   Global Step: 273590   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-12 04:11:36,836-Speed 3321.12 samples/sec   Loss 0.2613   LearningRate 0.0033   Epoch: 16   Global Step: 273600   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-12 04:11:39,957-Speed 3282.07 samples/sec   Loss 0.2483   LearningRate 0.0033   Epoch: 16   Global Step: 273610   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-12 04:11:43,034-Speed 3328.51 samples/sec   Loss 0.2582   LearningRate 0.0033   Epoch: 16   Global Step: 273620   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:11:46,107-Speed 3333.39 samples/sec   Loss 0.2568   LearningRate 0.0033   Epoch: 16   Global Step: 273630   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:11:49,173-Speed 3341.11 samples/sec   Loss 0.2619   LearningRate 0.0032   Epoch: 16   Global Step: 273640   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:11:52,255-Speed 3323.07 samples/sec   Loss 0.2814   LearningRate 0.0032   Epoch: 16   Global Step: 273650   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:11:55,341-Speed 3318.56 samples/sec   Loss 0.2721   LearningRate 0.0032   Epoch: 16   Global Step: 273660   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:11:58,478-Speed 3264.57 samples/sec   Loss 0.2597   LearningRate 0.0032   Epoch: 16   Global Step: 273670   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:12:01,575-Speed 3307.60 samples/sec   Loss 0.2402   LearningRate 0.0032   Epoch: 16   Global Step: 273680   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:12:04,644-Speed 3337.79 samples/sec   Loss 0.2571   LearningRate 0.0032   Epoch: 16   Global Step: 273690   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:12:07,715-Speed 3335.48 samples/sec   Loss 0.2826   LearningRate 0.0032   Epoch: 16   Global Step: 273700   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:12:10,803-Speed 3315.95 samples/sec   Loss 0.2622   LearningRate 0.0032   Epoch: 16   Global Step: 273710   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:12:13,911-Speed 3295.65 samples/sec   Loss 0.2706   LearningRate 0.0032   Epoch: 16   Global Step: 273720   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:12:16,986-Speed 3330.71 samples/sec   Loss 0.2827   LearningRate 0.0032   Epoch: 16   Global Step: 273730   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-12 04:12:20,062-Speed 3330.05 samples/sec   Loss 0.2793   LearningRate 0.0032   Epoch: 16   Global Step: 273740   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-12 04:12:23,138-Speed 3329.09 samples/sec   Loss 0.2687   LearningRate 0.0032   Epoch: 16   Global Step: 273750   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-12 04:12:26,247-Speed 3294.98 samples/sec   Loss 0.2712   LearningRate 0.0032   Epoch: 16   Global Step: 273760   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-12 04:12:29,387-Speed 3261.33 samples/sec   Loss 0.2786   LearningRate 0.0032   Epoch: 16   Global Step: 273770   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-12 04:12:32,552-Speed 3237.04 samples/sec   Loss 0.2481   LearningRate 0.0032   Epoch: 16   Global Step: 273780   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-12 04:12:35,689-Speed 3264.33 samples/sec   Loss 0.2772   LearningRate 0.0032   Epoch: 16   Global Step: 273790   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-12 04:12:38,792-Speed 3300.83 samples/sec   Loss 0.2724   LearningRate 0.0032   Epoch: 16   Global Step: 273800   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-12 04:12:41,867-Speed 3331.37 samples/sec   Loss 0.2531   LearningRate 0.0032   Epoch: 16   Global Step: 273810   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-12 04:12:44,937-Speed 3335.64 samples/sec   Loss 0.2705   LearningRate 0.0032   Epoch: 16   Global Step: 273820   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-12 04:12:48,007-Speed 3336.96 samples/sec   Loss 0.2674   LearningRate 0.0032   Epoch: 16   Global Step: 273830   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:12:51,086-Speed 3326.01 samples/sec   Loss 0.2517   LearningRate 0.0032   Epoch: 16   Global Step: 273840   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:12:54,168-Speed 3322.81 samples/sec   Loss 0.2720   LearningRate 0.0032   Epoch: 16   Global Step: 273850   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:12:57,274-Speed 3297.72 samples/sec   Loss 0.2644   LearningRate 0.0032   Epoch: 16   Global Step: 273860   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:13:00,343-Speed 3338.32 samples/sec   Loss 0.2673   LearningRate 0.0032   Epoch: 16   Global Step: 273870   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:13:03,406-Speed 3343.84 samples/sec   Loss 0.2736   LearningRate 0.0032   Epoch: 16   Global Step: 273880   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:13:06,476-Speed 3335.86 samples/sec   Loss 0.2799   LearningRate 0.0032   Epoch: 16   Global Step: 273890   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:13:09,539-Speed 3343.73 samples/sec   Loss 0.2806   LearningRate 0.0032   Epoch: 16   Global Step: 273900   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:13:12,616-Speed 3329.09 samples/sec   Loss 0.2735   LearningRate 0.0032   Epoch: 16   Global Step: 273910   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:13:15,684-Speed 3338.42 samples/sec   Loss 0.2755   LearningRate 0.0032   Epoch: 16   Global Step: 273920   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-12 04:13:18,749-Speed 3341.90 samples/sec   Loss 0.2819   LearningRate 0.0032   Epoch: 16   Global Step: 273930   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-12 04:13:21,841-Speed 3311.95 samples/sec   Loss 0.2595   LearningRate 0.0032   Epoch: 16   Global Step: 273940   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-12 04:13:24,904-Speed 3344.78 samples/sec   Loss 0.2648   LearningRate 0.0032   Epoch: 16   Global Step: 273950   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-12 04:13:27,966-Speed 3345.28 samples/sec   Loss 0.2532   LearningRate 0.0032   Epoch: 16   Global Step: 273960   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-12 04:13:31,073-Speed 3296.52 samples/sec   Loss 0.2677   LearningRate 0.0032   Epoch: 16   Global Step: 273970   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-12 04:13:34,138-Speed 3340.72 samples/sec   Loss 0.2753   LearningRate 0.0032   Epoch: 16   Global Step: 273980   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-12 04:13:37,250-Speed 3292.16 samples/sec   Loss 0.2595   LearningRate 0.0032   Epoch: 16   Global Step: 273990   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-12 04:13:40,391-Speed 3260.19 samples/sec   Loss 0.2762   LearningRate 0.0032   Epoch: 16   Global Step: 274000   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-12 04:14:23,898-[lfw][274000]XNorm: 21.338436
Training: 2022-04-12 04:14:23,899-[lfw][274000]Accuracy-Flip: 0.99800+-0.00267
Training: 2022-04-12 04:14:23,899-[lfw][274000]Accuracy-Highest: 0.99817
Training: 2022-04-12 04:15:14,386-[cfp_fp][274000]XNorm: 23.095802
Training: 2022-04-12 04:15:14,387-[cfp_fp][274000]Accuracy-Flip: 0.99100+-0.00447
Training: 2022-04-12 04:15:14,387-[cfp_fp][274000]Accuracy-Highest: 0.99186
Training: 2022-04-12 04:15:57,843-[agedb_30][274000]XNorm: 23.452439
Training: 2022-04-12 04:15:57,844-[agedb_30][274000]Accuracy-Flip: 0.98517+-0.00545
Training: 2022-04-12 04:15:57,844-[agedb_30][274000]Accuracy-Highest: 0.98650
Training: 2022-04-12 04:16:00,934-Speed 72.86 samples/sec   Loss 0.2795   LearningRate 0.0032   Epoch: 16   Global Step: 274010   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-12 04:16:04,000-Speed 3340.50 samples/sec   Loss 0.2798   LearningRate 0.0032   Epoch: 16   Global Step: 274020   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:16:07,075-Speed 3331.38 samples/sec   Loss 0.2681   LearningRate 0.0032   Epoch: 16   Global Step: 274030   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:16:10,128-Speed 3354.37 samples/sec   Loss 0.2594   LearningRate 0.0032   Epoch: 16   Global Step: 274040   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:16:13,196-Speed 3338.78 samples/sec   Loss 0.2709   LearningRate 0.0032   Epoch: 16   Global Step: 274050   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:16:16,265-Speed 3336.93 samples/sec   Loss 0.2802   LearningRate 0.0032   Epoch: 16   Global Step: 274060   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:16:19,324-Speed 3348.10 samples/sec   Loss 0.2561   LearningRate 0.0032   Epoch: 16   Global Step: 274070   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:16:22,383-Speed 3348.55 samples/sec   Loss 0.2711   LearningRate 0.0032   Epoch: 16   Global Step: 274080   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:16:25,446-Speed 3344.25 samples/sec   Loss 0.2640   LearningRate 0.0032   Epoch: 16   Global Step: 274090   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:16:28,510-Speed 3342.56 samples/sec   Loss 0.2848   LearningRate 0.0032   Epoch: 16   Global Step: 274100   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:16:31,580-Speed 3336.73 samples/sec   Loss 0.2738   LearningRate 0.0032   Epoch: 16   Global Step: 274110   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:16:34,625-Speed 3363.24 samples/sec   Loss 0.2819   LearningRate 0.0032   Epoch: 16   Global Step: 274120   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:16:37,731-Speed 3297.77 samples/sec   Loss 0.2628   LearningRate 0.0032   Epoch: 16   Global Step: 274130   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:16:40,790-Speed 3348.42 samples/sec   Loss 0.2617   LearningRate 0.0032   Epoch: 16   Global Step: 274140   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:16:43,878-Speed 3315.98 samples/sec   Loss 0.2655   LearningRate 0.0032   Epoch: 16   Global Step: 274150   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:16:46,938-Speed 3347.98 samples/sec   Loss 0.2639   LearningRate 0.0032   Epoch: 16   Global Step: 274160   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:16:50,021-Speed 3321.47 samples/sec   Loss 0.2776   LearningRate 0.0032   Epoch: 16   Global Step: 274170   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:16:53,090-Speed 3338.20 samples/sec   Loss 0.2758   LearningRate 0.0032   Epoch: 16   Global Step: 274180   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:16:56,163-Speed 3332.71 samples/sec   Loss 0.2866   LearningRate 0.0032   Epoch: 16   Global Step: 274190   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:16:59,221-Speed 3348.88 samples/sec   Loss 0.2572   LearningRate 0.0032   Epoch: 16   Global Step: 274200   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:17:02,296-Speed 3331.10 samples/sec   Loss 0.2575   LearningRate 0.0032   Epoch: 16   Global Step: 274210   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:17:05,363-Speed 3339.17 samples/sec   Loss 0.2654   LearningRate 0.0032   Epoch: 16   Global Step: 274220   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:17:08,434-Speed 3335.72 samples/sec   Loss 0.2631   LearningRate 0.0032   Epoch: 16   Global Step: 274230   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:17:11,512-Speed 3327.01 samples/sec   Loss 0.2619   LearningRate 0.0032   Epoch: 16   Global Step: 274240   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:17:14,570-Speed 3349.32 samples/sec   Loss 0.2534   LearningRate 0.0032   Epoch: 16   Global Step: 274250   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:17:17,646-Speed 3330.16 samples/sec   Loss 0.2860   LearningRate 0.0032   Epoch: 16   Global Step: 274260   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:17:20,717-Speed 3334.77 samples/sec   Loss 0.2754   LearningRate 0.0032   Epoch: 16   Global Step: 274270   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:17:23,790-Speed 3334.03 samples/sec   Loss 0.2716   LearningRate 0.0032   Epoch: 16   Global Step: 274280   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:17:26,842-Speed 3355.60 samples/sec   Loss 0.2674   LearningRate 0.0032   Epoch: 16   Global Step: 274290   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-12 04:17:29,905-Speed 3343.19 samples/sec   Loss 0.2675   LearningRate 0.0032   Epoch: 16   Global Step: 274300   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-12 04:17:32,965-Speed 3347.38 samples/sec   Loss 0.2691   LearningRate 0.0032   Epoch: 16   Global Step: 274310   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-12 04:17:36,025-Speed 3347.21 samples/sec   Loss 0.2791   LearningRate 0.0032   Epoch: 16   Global Step: 274320   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-12 04:17:39,087-Speed 3345.50 samples/sec   Loss 0.2854   LearningRate 0.0032   Epoch: 16   Global Step: 274330   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-12 04:17:42,146-Speed 3347.60 samples/sec   Loss 0.2804   LearningRate 0.0032   Epoch: 16   Global Step: 274340   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-12 04:17:45,228-Speed 3323.26 samples/sec   Loss 0.2630   LearningRate 0.0032   Epoch: 16   Global Step: 274350   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-12 04:17:48,295-Speed 3339.89 samples/sec   Loss 0.2572   LearningRate 0.0032   Epoch: 16   Global Step: 274360   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-12 04:17:51,357-Speed 3345.26 samples/sec   Loss 0.2709   LearningRate 0.0032   Epoch: 16   Global Step: 274370   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-12 04:17:54,422-Speed 3342.29 samples/sec   Loss 0.2817   LearningRate 0.0032   Epoch: 16   Global Step: 274380   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-12 04:17:57,491-Speed 3337.25 samples/sec   Loss 0.2650   LearningRate 0.0032   Epoch: 16   Global Step: 274390   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:18:00,559-Speed 3338.04 samples/sec   Loss 0.2616   LearningRate 0.0032   Epoch: 16   Global Step: 274400   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:18:03,648-Speed 3315.14 samples/sec   Loss 0.2707   LearningRate 0.0032   Epoch: 16   Global Step: 274410   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:18:06,716-Speed 3338.47 samples/sec   Loss 0.2761   LearningRate 0.0032   Epoch: 16   Global Step: 274420   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:18:09,776-Speed 3347.20 samples/sec   Loss 0.2637   LearningRate 0.0032   Epoch: 16   Global Step: 274430   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:18:12,855-Speed 3327.45 samples/sec   Loss 0.2653   LearningRate 0.0032   Epoch: 16   Global Step: 274440   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:18:15,920-Speed 3341.47 samples/sec   Loss 0.2679   LearningRate 0.0032   Epoch: 16   Global Step: 274450   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:18:18,986-Speed 3341.12 samples/sec   Loss 0.2529   LearningRate 0.0032   Epoch: 16   Global Step: 274460   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:18:22,040-Speed 3353.50 samples/sec   Loss 0.2637   LearningRate 0.0032   Epoch: 16   Global Step: 274470   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-12 04:18:25,104-Speed 3343.08 samples/sec   Loss 0.2653   LearningRate 0.0032   Epoch: 16   Global Step: 274480   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-12 04:18:28,187-Speed 3321.08 samples/sec   Loss 0.2683   LearningRate 0.0032   Epoch: 16   Global Step: 274490   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-12 04:18:31,300-Speed 3290.79 samples/sec   Loss 0.2796   LearningRate 0.0032   Epoch: 16   Global Step: 274500   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-12 04:18:34,367-Speed 3339.60 samples/sec   Loss 0.2549   LearningRate 0.0032   Epoch: 16   Global Step: 274510   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-12 04:18:37,435-Speed 3338.05 samples/sec   Loss 0.2763   LearningRate 0.0032   Epoch: 16   Global Step: 274520   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-12 04:18:40,622-Speed 3214.44 samples/sec   Loss 0.2710   LearningRate 0.0032   Epoch: 16   Global Step: 274530   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-12 04:18:43,800-Speed 3222.25 samples/sec   Loss 0.2554   LearningRate 0.0032   Epoch: 16   Global Step: 274540   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-12 04:18:46,873-Speed 3333.04 samples/sec   Loss 0.2519   LearningRate 0.0032   Epoch: 16   Global Step: 274550   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-12 04:18:49,938-Speed 3341.53 samples/sec   Loss 0.2620   LearningRate 0.0032   Epoch: 16   Global Step: 274560   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-12 04:18:53,005-Speed 3340.42 samples/sec   Loss 0.2699   LearningRate 0.0032   Epoch: 16   Global Step: 274570   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:18:56,076-Speed 3334.06 samples/sec   Loss 0.2850   LearningRate 0.0031   Epoch: 16   Global Step: 274580   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:18:59,171-Speed 3309.97 samples/sec   Loss 0.2844   LearningRate 0.0031   Epoch: 16   Global Step: 274590   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:19:02,254-Speed 3322.25 samples/sec   Loss 0.2679   LearningRate 0.0031   Epoch: 16   Global Step: 274600   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:19:05,314-Speed 3349.05 samples/sec   Loss 0.2667   LearningRate 0.0031   Epoch: 16   Global Step: 274610   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:19:08,388-Speed 3332.60 samples/sec   Loss 0.2924   LearningRate 0.0031   Epoch: 16   Global Step: 274620   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:19:11,454-Speed 3340.16 samples/sec   Loss 0.2649   LearningRate 0.0031   Epoch: 16   Global Step: 274630   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:19:14,524-Speed 3336.95 samples/sec   Loss 0.2618   LearningRate 0.0031   Epoch: 16   Global Step: 274640   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:19:17,578-Speed 3353.34 samples/sec   Loss 0.2744   LearningRate 0.0031   Epoch: 16   Global Step: 274650   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-12 04:19:20,657-Speed 3326.49 samples/sec   Loss 0.2789   LearningRate 0.0031   Epoch: 16   Global Step: 274660   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-12 04:19:23,719-Speed 3345.12 samples/sec   Loss 0.2688   LearningRate 0.0031   Epoch: 16   Global Step: 274670   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-12 04:19:26,785-Speed 3340.05 samples/sec   Loss 0.2699   LearningRate 0.0031   Epoch: 16   Global Step: 274680   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-12 04:19:29,991-Speed 3194.26 samples/sec   Loss 0.2618   LearningRate 0.0031   Epoch: 16   Global Step: 274690   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-12 04:19:33,139-Speed 3254.45 samples/sec   Loss 0.2654   LearningRate 0.0031   Epoch: 16   Global Step: 274700   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-12 04:19:36,289-Speed 3251.74 samples/sec   Loss 0.2594   LearningRate 0.0031   Epoch: 16   Global Step: 274710   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-12 04:19:39,362-Speed 3332.87 samples/sec   Loss 0.2624   LearningRate 0.0031   Epoch: 16   Global Step: 274720   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-12 04:19:42,448-Speed 3319.04 samples/sec   Loss 0.2953   LearningRate 0.0031   Epoch: 16   Global Step: 274730   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-12 04:19:45,514-Speed 3340.70 samples/sec   Loss 0.2742   LearningRate 0.0031   Epoch: 16   Global Step: 274740   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-12 04:19:48,576-Speed 3344.36 samples/sec   Loss 0.2814   LearningRate 0.0031   Epoch: 16   Global Step: 274750   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:19:51,687-Speed 3292.41 samples/sec   Loss 0.2731   LearningRate 0.0031   Epoch: 16   Global Step: 274760   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:19:54,784-Speed 3307.97 samples/sec   Loss 0.2699   LearningRate 0.0031   Epoch: 16   Global Step: 274770   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:19:57,931-Speed 3254.10 samples/sec   Loss 0.2552   LearningRate 0.0031   Epoch: 16   Global Step: 274780   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-12 04:20:01,000-Speed 3337.90 samples/sec   Loss 0.2724   LearningRate 0.0031   Epoch: 16   Global Step: 274790   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-12 04:20:04,081-Speed 3323.81 samples/sec   Loss 0.2673   LearningRate 0.0031   Epoch: 16   Global Step: 274800   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-12 04:20:07,144-Speed 3344.30 samples/sec   Loss 0.2695   LearningRate 0.0031   Epoch: 16   Global Step: 274810   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-12 04:20:10,277-Speed 3268.75 samples/sec   Loss 0.2790   LearningRate 0.0031   Epoch: 16   Global Step: 274820   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-12 04:20:13,371-Speed 3310.32 samples/sec   Loss 0.2808   LearningRate 0.0031   Epoch: 16   Global Step: 274830   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-12 04:20:16,543-Speed 3229.47 samples/sec   Loss 0.2641   LearningRate 0.0031   Epoch: 16   Global Step: 274840   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-12 04:20:19,649-Speed 3296.76 samples/sec   Loss 0.2705   LearningRate 0.0031   Epoch: 16   Global Step: 274850   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-12 04:20:22,786-Speed 3266.11 samples/sec   Loss 0.2715   LearningRate 0.0031   Epoch: 16   Global Step: 274860   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-12 04:20:25,854-Speed 3337.77 samples/sec   Loss 0.2775   LearningRate 0.0031   Epoch: 16   Global Step: 274870   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-12 04:20:28,959-Speed 3299.68 samples/sec   Loss 0.2640   LearningRate 0.0031   Epoch: 16   Global Step: 274880   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:20:32,025-Speed 3340.75 samples/sec   Loss 0.2809   LearningRate 0.0031   Epoch: 16   Global Step: 274890   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:20:35,146-Speed 3281.39 samples/sec   Loss 0.2688   LearningRate 0.0031   Epoch: 16   Global Step: 274900   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-12 04:20:38,251-Speed 3298.09 samples/sec   Loss 0.2651   LearningRate 0.0031   Epoch: 16   Global Step: 274910   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-12 04:20:41,320-Speed 3337.66 samples/sec   Loss 0.2750   LearningRate 0.0031   Epoch: 16   Global Step: 274920   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-12 04:20:44,428-Speed 3295.30 samples/sec   Loss 0.2469   LearningRate 0.0031   Epoch: 16   Global Step: 274930   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-12 04:20:47,509-Speed 3324.06 samples/sec   Loss 0.2728   LearningRate 0.0031   Epoch: 16   Global Step: 274940   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-12 04:20:50,642-Speed 3270.05 samples/sec   Loss 0.2614   LearningRate 0.0031   Epoch: 16   Global Step: 274950   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-12 04:20:54,515-Speed 2644.03 samples/sec   Loss 0.2756   LearningRate 0.0031   Epoch: 16   Global Step: 274960   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-12 04:20:57,582-Speed 3340.15 samples/sec   Loss 0.2742   LearningRate 0.0031   Epoch: 16   Global Step: 274970   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-12 04:21:00,648-Speed 3340.87 samples/sec   Loss 0.2724   LearningRate 0.0031   Epoch: 16   Global Step: 274980   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-12 04:21:03,781-Speed 3268.27 samples/sec   Loss 0.2762   LearningRate 0.0031   Epoch: 16   Global Step: 274990   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-12 04:21:06,839-Speed 3349.62 samples/sec   Loss 0.2818   LearningRate 0.0031   Epoch: 16   Global Step: 275000   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-12 04:21:09,898-Speed 3347.97 samples/sec   Loss 0.2849   LearningRate 0.0031   Epoch: 16   Global Step: 275010   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-12 04:21:12,972-Speed 3331.95 samples/sec   Loss 0.2852   LearningRate 0.0031   Epoch: 16   Global Step: 275020   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-12 04:21:16,037-Speed 3341.93 samples/sec   Loss 0.2845   LearningRate 0.0031   Epoch: 16   Global Step: 275030   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-12 04:21:19,172-Speed 3266.85 samples/sec   Loss 0.2658   LearningRate 0.0031   Epoch: 16   Global Step: 275040   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-12 04:21:22,265-Speed 3311.22 samples/sec   Loss 0.2765   LearningRate 0.0031   Epoch: 16   Global Step: 275050   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-12 04:21:25,414-Speed 3253.00 samples/sec   Loss 0.2617   LearningRate 0.0031   Epoch: 16   Global Step: 275060   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-12 04:21:28,491-Speed 3328.77 samples/sec   Loss 0.2620   LearningRate 0.0031   Epoch: 16   Global Step: 275070   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-12 04:21:31,565-Speed 3332.06 samples/sec   Loss 0.2613   LearningRate 0.0031   Epoch: 16   Global Step: 275080   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-12 04:21:34,656-Speed 3314.18 samples/sec   Loss 0.2659   LearningRate 0.0031   Epoch: 16   Global Step: 275090   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-12 04:21:37,733-Speed 3328.34 samples/sec   Loss 0.2652   LearningRate 0.0031   Epoch: 16   Global Step: 275100   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:21:40,853-Speed 3282.00 samples/sec   Loss 0.2722   LearningRate 0.0031   Epoch: 16   Global Step: 275110   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:21:43,975-Speed 3281.44 samples/sec   Loss 0.2650   LearningRate 0.0031   Epoch: 16   Global Step: 275120   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:21:47,062-Speed 3318.02 samples/sec   Loss 0.2787   LearningRate 0.0031   Epoch: 16   Global Step: 275130   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:21:50,185-Speed 3279.13 samples/sec   Loss 0.2811   LearningRate 0.0031   Epoch: 16   Global Step: 275140   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:21:53,263-Speed 3327.63 samples/sec   Loss 0.2650   LearningRate 0.0031   Epoch: 16   Global Step: 275150   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:21:56,339-Speed 3329.62 samples/sec   Loss 0.2665   LearningRate 0.0031   Epoch: 16   Global Step: 275160   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:21:59,416-Speed 3329.12 samples/sec   Loss 0.2598   LearningRate 0.0031   Epoch: 16   Global Step: 275170   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-12 04:22:02,510-Speed 3309.68 samples/sec   Loss 0.2834   LearningRate 0.0031   Epoch: 16   Global Step: 275180   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-12 04:22:05,592-Speed 3324.07 samples/sec   Loss 0.2724   LearningRate 0.0031   Epoch: 16   Global Step: 275190   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-12 04:22:08,665-Speed 3332.17 samples/sec   Loss 0.2591   LearningRate 0.0031   Epoch: 16   Global Step: 275200   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-12 04:22:11,751-Speed 3319.10 samples/sec   Loss 0.2943   LearningRate 0.0031   Epoch: 16   Global Step: 275210   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-12 04:22:14,903-Speed 3249.54 samples/sec   Loss 0.2646   LearningRate 0.0031   Epoch: 16   Global Step: 275220   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-12 04:22:17,964-Speed 3346.71 samples/sec   Loss 0.2551   LearningRate 0.0031   Epoch: 16   Global Step: 275230   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-12 04:22:21,031-Speed 3339.20 samples/sec   Loss 0.2776   LearningRate 0.0031   Epoch: 16   Global Step: 275240   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-12 04:22:24,125-Speed 3310.70 samples/sec   Loss 0.2591   LearningRate 0.0031   Epoch: 16   Global Step: 275250   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-12 04:22:27,238-Speed 3290.52 samples/sec   Loss 0.2588   LearningRate 0.0031   Epoch: 16   Global Step: 275260   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-12 04:22:30,379-Speed 3260.35 samples/sec   Loss 0.2677   LearningRate 0.0031   Epoch: 16   Global Step: 275270   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:22:33,462-Speed 3321.86 samples/sec   Loss 0.2683   LearningRate 0.0031   Epoch: 16   Global Step: 275280   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-12 04:22:36,645-Speed 3217.67 samples/sec   Loss 0.2586   LearningRate 0.0031   Epoch: 16   Global Step: 275290   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-12 04:22:39,719-Speed 3331.45 samples/sec   Loss 0.2588   LearningRate 0.0031   Epoch: 16   Global Step: 275300   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-12 04:22:42,794-Speed 3331.18 samples/sec   Loss 0.2789   LearningRate 0.0031   Epoch: 16   Global Step: 275310   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-12 04:22:45,876-Speed 3323.58 samples/sec   Loss 0.2849   LearningRate 0.0031   Epoch: 16   Global Step: 275320   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-12 04:22:48,951-Speed 3330.89 samples/sec   Loss 0.2748   LearningRate 0.0031   Epoch: 16   Global Step: 275330   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-12 04:22:52,033-Speed 3323.67 samples/sec   Loss 0.2711   LearningRate 0.0031   Epoch: 16   Global Step: 275340   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-12 04:22:55,134-Speed 3302.19 samples/sec   Loss 0.2612   LearningRate 0.0031   Epoch: 16   Global Step: 275350   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-12 04:22:58,217-Speed 3322.88 samples/sec   Loss 0.2498   LearningRate 0.0031   Epoch: 16   Global Step: 275360   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-12 04:23:01,312-Speed 3308.92 samples/sec   Loss 0.2787   LearningRate 0.0031   Epoch: 16   Global Step: 275370   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-12 04:23:04,400-Speed 3316.28 samples/sec   Loss 0.2639   LearningRate 0.0031   Epoch: 16   Global Step: 275380   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:23:07,537-Speed 3265.16 samples/sec   Loss 0.2720   LearningRate 0.0031   Epoch: 16   Global Step: 275390   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:23:10,618-Speed 3324.70 samples/sec   Loss 0.2533   LearningRate 0.0031   Epoch: 16   Global Step: 275400   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:23:13,738-Speed 3282.25 samples/sec   Loss 0.2588   LearningRate 0.0031   Epoch: 16   Global Step: 275410   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:23:16,802-Speed 3343.51 samples/sec   Loss 0.2916   LearningRate 0.0031   Epoch: 16   Global Step: 275420   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:23:19,913-Speed 3291.77 samples/sec   Loss 0.3032   LearningRate 0.0031   Epoch: 16   Global Step: 275430   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:23:23,029-Speed 3287.73 samples/sec   Loss 0.2612   LearningRate 0.0031   Epoch: 16   Global Step: 275440   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:23:26,210-Speed 3219.45 samples/sec   Loss 0.2502   LearningRate 0.0031   Epoch: 16   Global Step: 275450   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:23:29,309-Speed 3304.43 samples/sec   Loss 0.2737   LearningRate 0.0031   Epoch: 16   Global Step: 275460   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:23:32,372-Speed 3344.42 samples/sec   Loss 0.3002   LearningRate 0.0031   Epoch: 16   Global Step: 275470   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:23:35,425-Speed 3354.47 samples/sec   Loss 0.2786   LearningRate 0.0031   Epoch: 16   Global Step: 275480   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:23:38,544-Speed 3283.75 samples/sec   Loss 0.2944   LearningRate 0.0031   Epoch: 16   Global Step: 275490   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:23:41,612-Speed 3339.05 samples/sec   Loss 0.2633   LearningRate 0.0031   Epoch: 16   Global Step: 275500   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-12 04:23:44,727-Speed 3287.70 samples/sec   Loss 0.2923   LearningRate 0.0031   Epoch: 16   Global Step: 275510   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-12 04:23:47,801-Speed 3332.56 samples/sec   Loss 0.2692   LearningRate 0.0031   Epoch: 16   Global Step: 275520   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-12 04:23:50,865-Speed 3342.26 samples/sec   Loss 0.2752   LearningRate 0.0030   Epoch: 16   Global Step: 275530   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-12 04:23:53,938-Speed 3333.28 samples/sec   Loss 0.2665   LearningRate 0.0030   Epoch: 16   Global Step: 275540   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-12 04:23:57,008-Speed 3336.24 samples/sec   Loss 0.2839   LearningRate 0.0030   Epoch: 16   Global Step: 275550   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-12 04:24:00,075-Speed 3339.08 samples/sec   Loss 0.2732   LearningRate 0.0030   Epoch: 16   Global Step: 275560   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-12 04:24:03,150-Speed 3331.48 samples/sec   Loss 0.2939   LearningRate 0.0030   Epoch: 16   Global Step: 275570   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-12 04:24:06,221-Speed 3334.59 samples/sec   Loss 0.2580   LearningRate 0.0030   Epoch: 16   Global Step: 275580   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-12 04:24:09,292-Speed 3335.02 samples/sec   Loss 0.2621   LearningRate 0.0030   Epoch: 16   Global Step: 275590   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-12 04:24:12,495-Speed 3197.69 samples/sec   Loss 0.2803   LearningRate 0.0030   Epoch: 16   Global Step: 275600   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:24:15,592-Speed 3307.32 samples/sec   Loss 0.2623   LearningRate 0.0030   Epoch: 16   Global Step: 275610   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:24:18,695-Speed 3300.51 samples/sec   Loss 0.2880   LearningRate 0.0030   Epoch: 16   Global Step: 275620   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:24:21,831-Speed 3266.22 samples/sec   Loss 0.2690   LearningRate 0.0030   Epoch: 16   Global Step: 275630   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:24:24,919-Speed 3316.84 samples/sec   Loss 0.3139   LearningRate 0.0030   Epoch: 16   Global Step: 275640   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:24:28,023-Speed 3299.57 samples/sec   Loss 0.2921   LearningRate 0.0030   Epoch: 16   Global Step: 275650   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:24:31,105-Speed 3323.46 samples/sec   Loss 0.2675   LearningRate 0.0030   Epoch: 16   Global Step: 275660   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:24:34,177-Speed 3334.10 samples/sec   Loss 0.2582   LearningRate 0.0030   Epoch: 16   Global Step: 275670   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:24:37,246-Speed 3337.69 samples/sec   Loss 0.2907   LearningRate 0.0030   Epoch: 16   Global Step: 275680   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-12 04:24:40,321-Speed 3330.78 samples/sec   Loss 0.2971   LearningRate 0.0030   Epoch: 16   Global Step: 275690   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-12 04:24:43,397-Speed 3329.38 samples/sec   Loss 0.2886   LearningRate 0.0030   Epoch: 16   Global Step: 275700   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-12 04:24:46,470-Speed 3333.74 samples/sec   Loss 0.2731   LearningRate 0.0030   Epoch: 16   Global Step: 275710   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-12 04:24:49,555-Speed 3319.40 samples/sec   Loss 0.2824   LearningRate 0.0030   Epoch: 16   Global Step: 275720   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-12 04:24:52,643-Speed 3316.97 samples/sec   Loss 0.2695   LearningRate 0.0030   Epoch: 16   Global Step: 275730   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-12 04:24:55,741-Speed 3306.54 samples/sec   Loss 0.2750   LearningRate 0.0030   Epoch: 16   Global Step: 275740   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-12 04:24:58,805-Speed 3341.87 samples/sec   Loss 0.2536   LearningRate 0.0030   Epoch: 16   Global Step: 275750   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-12 04:25:01,929-Speed 3278.95 samples/sec   Loss 0.2894   LearningRate 0.0030   Epoch: 16   Global Step: 275760   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-12 04:25:05,092-Speed 3238.51 samples/sec   Loss 0.2746   LearningRate 0.0030   Epoch: 16   Global Step: 275770   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-12 04:25:08,176-Speed 3320.56 samples/sec   Loss 0.2856   LearningRate 0.0030   Epoch: 16   Global Step: 275780   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-12 04:25:11,263-Speed 3318.27 samples/sec   Loss 0.2664   LearningRate 0.0030   Epoch: 16   Global Step: 275790   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-12 04:25:14,366-Speed 3301.00 samples/sec   Loss 0.2891   LearningRate 0.0030   Epoch: 16   Global Step: 275800   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-12 04:25:17,508-Speed 3260.10 samples/sec   Loss 0.2688   LearningRate 0.0030   Epoch: 16   Global Step: 275810   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-12 04:25:20,579-Speed 3334.35 samples/sec   Loss 0.3021   LearningRate 0.0030   Epoch: 16   Global Step: 275820   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-12 04:25:23,698-Speed 3284.13 samples/sec   Loss 0.2779   LearningRate 0.0030   Epoch: 16   Global Step: 275830   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-12 04:25:26,787-Speed 3315.73 samples/sec   Loss 0.2720   LearningRate 0.0030   Epoch: 16   Global Step: 275840   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-12 04:25:29,912-Speed 3277.77 samples/sec   Loss 0.2549   LearningRate 0.0030   Epoch: 16   Global Step: 275850   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-12 04:25:32,999-Speed 3317.77 samples/sec   Loss 0.2806   LearningRate 0.0030   Epoch: 16   Global Step: 275860   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-12 04:25:36,129-Speed 3272.79 samples/sec   Loss 0.2756   LearningRate 0.0030   Epoch: 16   Global Step: 275870   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-12 04:25:39,270-Speed 3260.66 samples/sec   Loss 0.2444   LearningRate 0.0030   Epoch: 16   Global Step: 275880   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:25:42,354-Speed 3320.93 samples/sec   Loss 0.2846   LearningRate 0.0030   Epoch: 16   Global Step: 275890   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-12 04:25:45,587-Speed 3167.79 samples/sec   Loss 0.2838   LearningRate 0.0030   Epoch: 16   Global Step: 275900   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-12 04:25:48,774-Speed 3213.60 samples/sec   Loss 0.2955   LearningRate 0.0030   Epoch: 16   Global Step: 275910   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-12 04:25:51,919-Speed 3257.48 samples/sec   Loss 0.2637   LearningRate 0.0030   Epoch: 16   Global Step: 275920   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-12 04:25:54,997-Speed 3327.58 samples/sec   Loss 0.2457   LearningRate 0.0030   Epoch: 16   Global Step: 275930   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-12 04:25:58,207-Speed 3190.95 samples/sec   Loss 0.2931   LearningRate 0.0030   Epoch: 16   Global Step: 275940   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-12 04:26:01,312-Speed 3298.35 samples/sec   Loss 0.2638   LearningRate 0.0030   Epoch: 16   Global Step: 275950   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-12 04:26:04,382-Speed 3336.50 samples/sec   Loss 0.2868   LearningRate 0.0030   Epoch: 16   Global Step: 275960   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-12 04:26:07,528-Speed 3255.38 samples/sec   Loss 0.2729   LearningRate 0.0030   Epoch: 16   Global Step: 275970   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-12 04:26:10,705-Speed 3223.92 samples/sec   Loss 0.3126   LearningRate 0.0030   Epoch: 16   Global Step: 275980   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-12 04:26:13,829-Speed 3278.63 samples/sec   Loss 0.2742   LearningRate 0.0030   Epoch: 16   Global Step: 275990   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:26:16,916-Speed 3317.92 samples/sec   Loss 0.2788   LearningRate 0.0030   Epoch: 16   Global Step: 276000   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:27:00,773-[lfw][276000]XNorm: 21.466773
Training: 2022-04-12 04:27:00,774-[lfw][276000]Accuracy-Flip: 0.99783+-0.00259
Training: 2022-04-12 04:27:00,774-[lfw][276000]Accuracy-Highest: 0.99817
Training: 2022-04-12 04:27:51,475-[cfp_fp][276000]XNorm: 22.947246
Training: 2022-04-12 04:27:51,476-[cfp_fp][276000]Accuracy-Flip: 0.99129+-0.00329
Training: 2022-04-12 04:27:51,476-[cfp_fp][276000]Accuracy-Highest: 0.99186
Training: 2022-04-12 04:28:35,055-[agedb_30][276000]XNorm: 23.308940
Training: 2022-04-12 04:28:35,055-[agedb_30][276000]Accuracy-Flip: 0.98550+-0.00563
Training: 2022-04-12 04:28:35,056-[agedb_30][276000]Accuracy-Highest: 0.98650
Training: 2022-04-12 04:28:38,121-Speed 72.52 samples/sec   Loss 0.2739   LearningRate 0.0030   Epoch: 16   Global Step: 276010   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:28:41,175-Speed 3353.95 samples/sec   Loss 0.2831   LearningRate 0.0030   Epoch: 16   Global Step: 276020   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:28:44,232-Speed 3350.19 samples/sec   Loss 0.2458   LearningRate 0.0030   Epoch: 16   Global Step: 276030   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:28:47,280-Speed 3360.41 samples/sec   Loss 0.2743   LearningRate 0.0030   Epoch: 16   Global Step: 276040   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:28:50,353-Speed 3333.32 samples/sec   Loss 0.2897   LearningRate 0.0030   Epoch: 16   Global Step: 276050   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:28:53,411-Speed 3349.28 samples/sec   Loss 0.2688   LearningRate 0.0030   Epoch: 16   Global Step: 276060   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:28:56,466-Speed 3351.85 samples/sec   Loss 0.2706   LearningRate 0.0030   Epoch: 16   Global Step: 276070   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:28:59,549-Speed 3322.84 samples/sec   Loss 0.2678   LearningRate 0.0030   Epoch: 16   Global Step: 276080   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:29:02,647-Speed 3305.73 samples/sec   Loss 0.2600   LearningRate 0.0030   Epoch: 16   Global Step: 276090   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:29:05,691-Speed 3364.81 samples/sec   Loss 0.2614   LearningRate 0.0030   Epoch: 16   Global Step: 276100   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-12 04:29:08,747-Speed 3351.73 samples/sec   Loss 0.2890   LearningRate 0.0030   Epoch: 16   Global Step: 276110   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-12 04:29:11,857-Speed 3292.77 samples/sec   Loss 0.2530   LearningRate 0.0030   Epoch: 16   Global Step: 276120   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-12 04:29:14,960-Speed 3301.03 samples/sec   Loss 0.2614   LearningRate 0.0030   Epoch: 16   Global Step: 276130   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-12 04:29:18,159-Speed 3201.40 samples/sec   Loss 0.2508   LearningRate 0.0030   Epoch: 16   Global Step: 276140   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-12 04:29:21,219-Speed 3346.96 samples/sec   Loss 0.2945   LearningRate 0.0030   Epoch: 16   Global Step: 276150   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-12 04:29:24,374-Speed 3246.70 samples/sec   Loss 0.2749   LearningRate 0.0030   Epoch: 16   Global Step: 276160   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-12 04:29:27,476-Speed 3301.85 samples/sec   Loss 0.2603   LearningRate 0.0030   Epoch: 16   Global Step: 276170   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-12 04:29:30,549-Speed 3333.75 samples/sec   Loss 0.2675   LearningRate 0.0030   Epoch: 16   Global Step: 276180   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-12 04:29:33,615-Speed 3339.82 samples/sec   Loss 0.2664   LearningRate 0.0030   Epoch: 16   Global Step: 276190   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-12 04:29:36,725-Speed 3293.71 samples/sec   Loss 0.2666   LearningRate 0.0030   Epoch: 16   Global Step: 276200   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:29:39,808-Speed 3321.85 samples/sec   Loss 0.2692   LearningRate 0.0030   Epoch: 16   Global Step: 276210   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:29:42,865-Speed 3350.01 samples/sec   Loss 0.2649   LearningRate 0.0030   Epoch: 16   Global Step: 276220   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:29:45,932-Speed 3340.01 samples/sec   Loss 0.2680   LearningRate 0.0030   Epoch: 16   Global Step: 276230   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:29:49,004-Speed 3334.39 samples/sec   Loss 0.2647   LearningRate 0.0030   Epoch: 16   Global Step: 276240   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:29:52,065-Speed 3346.74 samples/sec   Loss 0.2744   LearningRate 0.0030   Epoch: 16   Global Step: 276250   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:29:55,127-Speed 3344.81 samples/sec   Loss 0.2821   LearningRate 0.0030   Epoch: 16   Global Step: 276260   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:29:58,225-Speed 3305.98 samples/sec   Loss 0.2746   LearningRate 0.0030   Epoch: 16   Global Step: 276270   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-12 04:30:01,309-Speed 3320.25 samples/sec   Loss 0.2716   LearningRate 0.0030   Epoch: 16   Global Step: 276280   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-12 04:30:04,372-Speed 3344.84 samples/sec   Loss 0.2645   LearningRate 0.0030   Epoch: 16   Global Step: 276290   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-12 04:30:07,471-Speed 3304.56 samples/sec   Loss 0.2974   LearningRate 0.0030   Epoch: 16   Global Step: 276300   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-12 04:30:10,528-Speed 3350.71 samples/sec   Loss 0.2586   LearningRate 0.0030   Epoch: 16   Global Step: 276310   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-12 04:30:13,603-Speed 3330.06 samples/sec   Loss 0.2787   LearningRate 0.0030   Epoch: 16   Global Step: 276320   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-12 04:30:16,681-Speed 3327.77 samples/sec   Loss 0.2669   LearningRate 0.0030   Epoch: 16   Global Step: 276330   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-12 04:30:19,794-Speed 3291.18 samples/sec   Loss 0.2710   LearningRate 0.0030   Epoch: 16   Global Step: 276340   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-12 04:30:22,858-Speed 3342.42 samples/sec   Loss 0.2638   LearningRate 0.0030   Epoch: 16   Global Step: 276350   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-12 04:30:25,956-Speed 3306.17 samples/sec   Loss 0.2631   LearningRate 0.0030   Epoch: 16   Global Step: 276360   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-12 04:30:29,070-Speed 3288.40 samples/sec   Loss 0.2569   LearningRate 0.0030   Epoch: 16   Global Step: 276370   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-12 04:30:32,147-Speed 3329.26 samples/sec   Loss 0.2757   LearningRate 0.0030   Epoch: 16   Global Step: 276380   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-12 04:30:35,213-Speed 3340.72 samples/sec   Loss 0.2641   LearningRate 0.0030   Epoch: 16   Global Step: 276390   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-12 04:30:38,342-Speed 3273.02 samples/sec   Loss 0.2876   LearningRate 0.0030   Epoch: 16   Global Step: 276400   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-12 04:30:41,442-Speed 3303.88 samples/sec   Loss 0.2880   LearningRate 0.0030   Epoch: 16   Global Step: 276410   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-12 04:30:44,501-Speed 3348.29 samples/sec   Loss 0.2732   LearningRate 0.0030   Epoch: 16   Global Step: 276420   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-12 04:30:47,566-Speed 3341.75 samples/sec   Loss 0.2737   LearningRate 0.0030   Epoch: 16   Global Step: 276430   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-12 04:30:50,666-Speed 3304.39 samples/sec   Loss 0.2854   LearningRate 0.0030   Epoch: 16   Global Step: 276440   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-12 04:30:53,780-Speed 3288.34 samples/sec   Loss 0.2613   LearningRate 0.0030   Epoch: 16   Global Step: 276450   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-12 04:30:56,841-Speed 3345.81 samples/sec   Loss 0.2654   LearningRate 0.0030   Epoch: 16   Global Step: 276460   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-12 04:30:59,937-Speed 3309.07 samples/sec   Loss 0.2806   LearningRate 0.0030   Epoch: 16   Global Step: 276470   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:31:03,016-Speed 3326.01 samples/sec   Loss 0.2754   LearningRate 0.0030   Epoch: 16   Global Step: 276480   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:31:06,133-Speed 3286.27 samples/sec   Loss 0.2686   LearningRate 0.0029   Epoch: 16   Global Step: 276490   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:31:09,192-Speed 3348.16 samples/sec   Loss 0.2678   LearningRate 0.0029   Epoch: 16   Global Step: 276500   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-12 04:31:12,270-Speed 3327.56 samples/sec   Loss 0.2590   LearningRate 0.0029   Epoch: 16   Global Step: 276510   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-12 04:31:15,385-Speed 3288.43 samples/sec   Loss 0.2566   LearningRate 0.0029   Epoch: 16   Global Step: 276520   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-12 04:31:18,457-Speed 3334.14 samples/sec   Loss 0.2569   LearningRate 0.0029   Epoch: 16   Global Step: 276530   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-12 04:31:21,573-Speed 3286.91 samples/sec   Loss 0.2787   LearningRate 0.0029   Epoch: 16   Global Step: 276540   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-12 04:31:24,669-Speed 3308.20 samples/sec   Loss 0.2743   LearningRate 0.0029   Epoch: 16   Global Step: 276550   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-12 04:31:27,745-Speed 3328.77 samples/sec   Loss 0.2806   LearningRate 0.0029   Epoch: 16   Global Step: 276560   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-12 04:31:30,804-Speed 3348.55 samples/sec   Loss 0.2664   LearningRate 0.0029   Epoch: 16   Global Step: 276570   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-12 04:31:33,886-Speed 3324.02 samples/sec   Loss 0.2425   LearningRate 0.0029   Epoch: 16   Global Step: 276580   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-12 04:31:36,981-Speed 3309.18 samples/sec   Loss 0.2776   LearningRate 0.0029   Epoch: 16   Global Step: 276590   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-12 04:31:40,098-Speed 3285.94 samples/sec   Loss 0.2840   LearningRate 0.0029   Epoch: 16   Global Step: 276600   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:31:43,229-Speed 3271.69 samples/sec   Loss 0.2793   LearningRate 0.0029   Epoch: 16   Global Step: 276610   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:31:46,290-Speed 3345.48 samples/sec   Loss 0.2853   LearningRate 0.0029   Epoch: 16   Global Step: 276620   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:31:49,354-Speed 3342.66 samples/sec   Loss 0.2965   LearningRate 0.0029   Epoch: 16   Global Step: 276630   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:31:52,420-Speed 3340.33 samples/sec   Loss 0.2818   LearningRate 0.0029   Epoch: 16   Global Step: 276640   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:31:55,539-Speed 3283.84 samples/sec   Loss 0.2768   LearningRate 0.0029   Epoch: 16   Global Step: 276650   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:31:58,621-Speed 3323.47 samples/sec   Loss 0.2907   LearningRate 0.0029   Epoch: 16   Global Step: 276660   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-12 04:32:01,684-Speed 3344.29 samples/sec   Loss 0.2811   LearningRate 0.0029   Epoch: 16   Global Step: 276670   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-12 04:32:04,745-Speed 3346.56 samples/sec   Loss 0.2623   LearningRate 0.0029   Epoch: 16   Global Step: 276680   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-12 04:32:07,817-Speed 3333.81 samples/sec   Loss 0.2834   LearningRate 0.0029   Epoch: 16   Global Step: 276690   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-12 04:32:10,974-Speed 3243.85 samples/sec   Loss 0.2733   LearningRate 0.0029   Epoch: 16   Global Step: 276700   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-12 04:32:14,047-Speed 3333.37 samples/sec   Loss 0.2726   LearningRate 0.0029   Epoch: 16   Global Step: 276710   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-12 04:32:17,118-Speed 3335.30 samples/sec   Loss 0.2691   LearningRate 0.0029   Epoch: 16   Global Step: 276720   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-12 04:32:20,210-Speed 3311.84 samples/sec   Loss 0.2712   LearningRate 0.0029   Epoch: 16   Global Step: 276730   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-12 04:32:23,283-Speed 3332.85 samples/sec   Loss 0.2680   LearningRate 0.0029   Epoch: 16   Global Step: 276740   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-12 04:32:26,341-Speed 3349.71 samples/sec   Loss 0.2919   LearningRate 0.0029   Epoch: 16   Global Step: 276750   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-12 04:32:29,420-Speed 3326.42 samples/sec   Loss 0.2840   LearningRate 0.0029   Epoch: 16   Global Step: 276760   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:32:32,484-Speed 3343.38 samples/sec   Loss 0.2926   LearningRate 0.0029   Epoch: 16   Global Step: 276770   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:32:35,598-Speed 3289.05 samples/sec   Loss 0.2905   LearningRate 0.0029   Epoch: 16   Global Step: 276780   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:32:38,661-Speed 3344.39 samples/sec   Loss 0.2927   LearningRate 0.0029   Epoch: 16   Global Step: 276790   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:32:41,847-Speed 3214.41 samples/sec   Loss 0.2711   LearningRate 0.0029   Epoch: 16   Global Step: 276800   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:32:44,940-Speed 3311.63 samples/sec   Loss 0.2833   LearningRate 0.0029   Epoch: 16   Global Step: 276810   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:32:48,024-Speed 3320.88 samples/sec   Loss 0.2679   LearningRate 0.0029   Epoch: 16   Global Step: 276820   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:32:51,110-Speed 3318.28 samples/sec   Loss 0.2567   LearningRate 0.0029   Epoch: 16   Global Step: 276830   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:32:54,238-Speed 3274.93 samples/sec   Loss 0.2504   LearningRate 0.0029   Epoch: 16   Global Step: 276840   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:32:57,346-Speed 3295.61 samples/sec   Loss 0.2665   LearningRate 0.0029   Epoch: 16   Global Step: 276850   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:33:00,478-Speed 3269.94 samples/sec   Loss 0.2640   LearningRate 0.0029   Epoch: 16   Global Step: 276860   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:33:03,593-Speed 3288.00 samples/sec   Loss 0.2881   LearningRate 0.0029   Epoch: 16   Global Step: 276870   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:33:06,671-Speed 3327.94 samples/sec   Loss 0.2942   LearningRate 0.0029   Epoch: 16   Global Step: 276880   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:33:09,733-Speed 3344.29 samples/sec   Loss 0.2762   LearningRate 0.0029   Epoch: 16   Global Step: 276890   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:33:12,802-Speed 3337.91 samples/sec   Loss 0.2779   LearningRate 0.0029   Epoch: 16   Global Step: 276900   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:33:15,876-Speed 3331.46 samples/sec   Loss 0.2662   LearningRate 0.0029   Epoch: 16   Global Step: 276910   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:33:18,933-Speed 3350.58 samples/sec   Loss 0.2654   LearningRate 0.0029   Epoch: 16   Global Step: 276920   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:33:22,008-Speed 3330.67 samples/sec   Loss 0.2608   LearningRate 0.0029   Epoch: 16   Global Step: 276930   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:33:25,090-Speed 3323.74 samples/sec   Loss 0.2755   LearningRate 0.0029   Epoch: 16   Global Step: 276940   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:33:28,152-Speed 3345.15 samples/sec   Loss 0.2600   LearningRate 0.0029   Epoch: 16   Global Step: 276950   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:33:31,241-Speed 3315.89 samples/sec   Loss 0.2710   LearningRate 0.0029   Epoch: 16   Global Step: 276960   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:33:34,340-Speed 3304.62 samples/sec   Loss 0.2635   LearningRate 0.0029   Epoch: 16   Global Step: 276970   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:33:37,404-Speed 3342.65 samples/sec   Loss 0.2659   LearningRate 0.0029   Epoch: 16   Global Step: 276980   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:33:40,517-Speed 3290.12 samples/sec   Loss 0.2668   LearningRate 0.0029   Epoch: 16   Global Step: 276990   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:33:43,603-Speed 3318.80 samples/sec   Loss 0.2751   LearningRate 0.0029   Epoch: 16   Global Step: 277000   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:33:46,701-Speed 3306.88 samples/sec   Loss 0.2764   LearningRate 0.0029   Epoch: 16   Global Step: 277010   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:33:49,879-Speed 3221.98 samples/sec   Loss 0.2823   LearningRate 0.0029   Epoch: 16   Global Step: 277020   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:33:52,957-Speed 3328.51 samples/sec   Loss 0.2616   LearningRate 0.0029   Epoch: 16   Global Step: 277030   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-12 04:33:56,021-Speed 3343.03 samples/sec   Loss 0.2870   LearningRate 0.0029   Epoch: 16   Global Step: 277040   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-12 04:33:59,121-Speed 3303.74 samples/sec   Loss 0.2818   LearningRate 0.0029   Epoch: 16   Global Step: 277050   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-12 04:34:02,187-Speed 3340.48 samples/sec   Loss 0.2818   LearningRate 0.0029   Epoch: 16   Global Step: 277060   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-12 04:34:05,426-Speed 3162.54 samples/sec   Loss 0.2522   LearningRate 0.0029   Epoch: 16   Global Step: 277070   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-12 04:34:08,543-Speed 3285.58 samples/sec   Loss 0.2599   LearningRate 0.0029   Epoch: 16   Global Step: 277080   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-12 04:34:11,676-Speed 3268.80 samples/sec   Loss 0.2685   LearningRate 0.0029   Epoch: 16   Global Step: 277090   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-12 04:34:14,785-Speed 3294.49 samples/sec   Loss 0.2428   LearningRate 0.0029   Epoch: 16   Global Step: 277100   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-12 04:34:17,878-Speed 3311.20 samples/sec   Loss 0.2732   LearningRate 0.0029   Epoch: 16   Global Step: 277110   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-12 04:34:20,949-Speed 3335.18 samples/sec   Loss 0.2661   LearningRate 0.0029   Epoch: 16   Global Step: 277120   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-12 04:34:24,016-Speed 3339.71 samples/sec   Loss 0.2871   LearningRate 0.0029   Epoch: 16   Global Step: 277130   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:34:27,106-Speed 3314.64 samples/sec   Loss 0.2701   LearningRate 0.0029   Epoch: 16   Global Step: 277140   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:34:30,251-Speed 3256.71 samples/sec   Loss 0.2662   LearningRate 0.0029   Epoch: 16   Global Step: 277150   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:34:33,371-Speed 3283.23 samples/sec   Loss 0.2643   LearningRate 0.0029   Epoch: 16   Global Step: 277160   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:34:36,435-Speed 3342.04 samples/sec   Loss 0.2766   LearningRate 0.0029   Epoch: 16   Global Step: 277170   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:34:39,510-Speed 3331.11 samples/sec   Loss 0.3006   LearningRate 0.0029   Epoch: 16   Global Step: 277180   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:34:42,584-Speed 3332.30 samples/sec   Loss 0.2688   LearningRate 0.0029   Epoch: 16   Global Step: 277190   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:34:45,660-Speed 3329.05 samples/sec   Loss 0.2861   LearningRate 0.0029   Epoch: 16   Global Step: 277200   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:34:48,723-Speed 3343.99 samples/sec   Loss 0.2728   LearningRate 0.0029   Epoch: 16   Global Step: 277210   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:34:51,771-Speed 3360.44 samples/sec   Loss 0.3001   LearningRate 0.0029   Epoch: 16   Global Step: 277220   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-12 04:34:54,851-Speed 3325.69 samples/sec   Loss 0.2715   LearningRate 0.0029   Epoch: 16   Global Step: 277230   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-12 04:34:57,911-Speed 3347.18 samples/sec   Loss 0.2580   LearningRate 0.0029   Epoch: 16   Global Step: 277240   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-12 04:35:00,986-Speed 3330.59 samples/sec   Loss 0.2472   LearningRate 0.0029   Epoch: 16   Global Step: 277250   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-12 04:35:04,102-Speed 3287.65 samples/sec   Loss 0.2741   LearningRate 0.0029   Epoch: 16   Global Step: 277260   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-12 04:35:07,205-Speed 3300.43 samples/sec   Loss 0.2788   LearningRate 0.0029   Epoch: 16   Global Step: 277270   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-12 04:35:10,290-Speed 3319.96 samples/sec   Loss 0.2798   LearningRate 0.0029   Epoch: 16   Global Step: 277280   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-12 04:35:13,361-Speed 3334.93 samples/sec   Loss 0.2492   LearningRate 0.0029   Epoch: 16   Global Step: 277290   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-12 04:35:16,431-Speed 3337.48 samples/sec   Loss 0.2700   LearningRate 0.0029   Epoch: 16   Global Step: 277300   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-12 04:35:19,513-Speed 3322.66 samples/sec   Loss 0.2582   LearningRate 0.0029   Epoch: 16   Global Step: 277310   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-12 04:35:22,563-Speed 3357.94 samples/sec   Loss 0.2729   LearningRate 0.0029   Epoch: 16   Global Step: 277320   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-12 04:35:25,634-Speed 3334.92 samples/sec   Loss 0.2831   LearningRate 0.0029   Epoch: 16   Global Step: 277330   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-12 04:35:28,707-Speed 3333.04 samples/sec   Loss 0.2681   LearningRate 0.0029   Epoch: 16   Global Step: 277340   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-12 04:35:31,770-Speed 3344.08 samples/sec   Loss 0.2725   LearningRate 0.0029   Epoch: 16   Global Step: 277350   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-12 04:35:34,834-Speed 3342.29 samples/sec   Loss 0.2678   LearningRate 0.0029   Epoch: 16   Global Step: 277360   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-12 04:35:37,899-Speed 3342.30 samples/sec   Loss 0.2603   LearningRate 0.0029   Epoch: 16   Global Step: 277370   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-12 04:35:40,959-Speed 3347.61 samples/sec   Loss 0.2654   LearningRate 0.0029   Epoch: 16   Global Step: 277380   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-12 04:35:44,021-Speed 3344.87 samples/sec   Loss 0.3007   LearningRate 0.0029   Epoch: 16   Global Step: 277390   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-12 04:35:47,083-Speed 3344.41 samples/sec   Loss 0.2816   LearningRate 0.0029   Epoch: 16   Global Step: 277400   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-12 04:35:50,162-Speed 3326.39 samples/sec   Loss 0.2609   LearningRate 0.0029   Epoch: 16   Global Step: 277410   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-12 04:35:53,251-Speed 3315.52 samples/sec   Loss 0.2698   LearningRate 0.0029   Epoch: 16   Global Step: 277420   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:35:56,320-Speed 3337.69 samples/sec   Loss 0.2875   LearningRate 0.0029   Epoch: 16   Global Step: 277430   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:35:59,401-Speed 3324.79 samples/sec   Loss 0.2812   LearningRate 0.0029   Epoch: 16   Global Step: 277440   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:36:02,557-Speed 3244.70 samples/sec   Loss 0.2739   LearningRate 0.0029   Epoch: 16   Global Step: 277450   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:36:05,676-Speed 3283.93 samples/sec   Loss 0.2705   LearningRate 0.0029   Epoch: 16   Global Step: 277460   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:36:08,792-Speed 3287.36 samples/sec   Loss 0.2624   LearningRate 0.0028   Epoch: 16   Global Step: 277470   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:36:11,907-Speed 3288.40 samples/sec   Loss 0.2933   LearningRate 0.0028   Epoch: 16   Global Step: 277480   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:36:14,992-Speed 3320.11 samples/sec   Loss 0.2749   LearningRate 0.0028   Epoch: 16   Global Step: 277490   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:36:18,113-Speed 3281.76 samples/sec   Loss 0.2898   LearningRate 0.0028   Epoch: 16   Global Step: 277500   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:36:21,214-Speed 3302.00 samples/sec   Loss 0.2898   LearningRate 0.0028   Epoch: 16   Global Step: 277510   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:36:24,281-Speed 3339.43 samples/sec   Loss 0.2940   LearningRate 0.0028   Epoch: 16   Global Step: 277520   Fp16 Grad Scale: 262144   Required: 6 hours
Training: 2022-04-12 04:36:27,373-Speed 3313.40 samples/sec   Loss 0.2931   LearningRate 0.0028   Epoch: 16   Global Step: 277530   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:36:30,437-Speed 3341.80 samples/sec   Loss 0.2682   LearningRate 0.0028   Epoch: 16   Global Step: 277540   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:36:33,548-Speed 3292.75 samples/sec   Loss 0.2833   LearningRate 0.0028   Epoch: 16   Global Step: 277550   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:36:36,633-Speed 3320.08 samples/sec   Loss 0.2476   LearningRate 0.0028   Epoch: 16   Global Step: 277560   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:36:39,704-Speed 3335.11 samples/sec   Loss 0.2764   LearningRate 0.0028   Epoch: 16   Global Step: 277570   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:36:42,760-Speed 3351.78 samples/sec   Loss 0.2779   LearningRate 0.0028   Epoch: 16   Global Step: 277580   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-12 04:36:45,828-Speed 3339.10 samples/sec   Loss 0.2691   LearningRate 0.0028   Epoch: 16   Global Step: 277590   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-12 04:36:48,938-Speed 3292.47 samples/sec   Loss 0.2637   LearningRate 0.0028   Epoch: 16   Global Step: 277600   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-12 04:36:52,077-Speed 3263.56 samples/sec   Loss 0.2771   LearningRate 0.0028   Epoch: 16   Global Step: 277610   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-12 04:36:55,215-Speed 3263.38 samples/sec   Loss 0.2638   LearningRate 0.0028   Epoch: 16   Global Step: 277620   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-12 04:36:58,370-Speed 3246.25 samples/sec   Loss 0.2868   LearningRate 0.0028   Epoch: 16   Global Step: 277630   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-12 04:37:01,480-Speed 3293.44 samples/sec   Loss 0.2658   LearningRate 0.0028   Epoch: 16   Global Step: 277640   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-12 04:37:04,637-Speed 3244.90 samples/sec   Loss 0.2864   LearningRate 0.0028   Epoch: 16   Global Step: 277650   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-12 04:37:07,707-Speed 3335.92 samples/sec   Loss 0.2620   LearningRate 0.0028   Epoch: 16   Global Step: 277660   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-12 04:37:10,787-Speed 3325.42 samples/sec   Loss 0.2677   LearningRate 0.0028   Epoch: 16   Global Step: 277670   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-12 04:37:13,869-Speed 3323.40 samples/sec   Loss 0.2729   LearningRate 0.0028   Epoch: 16   Global Step: 277680   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:37:16,950-Speed 3324.45 samples/sec   Loss 0.2582   LearningRate 0.0028   Epoch: 16   Global Step: 277690   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:37:20,002-Speed 3355.24 samples/sec   Loss 0.2814   LearningRate 0.0028   Epoch: 16   Global Step: 277700   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-12 04:37:23,090-Speed 3317.00 samples/sec   Loss 0.2763   LearningRate 0.0028   Epoch: 16   Global Step: 277710   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-12 04:37:26,172-Speed 3323.12 samples/sec   Loss 0.2674   LearningRate 0.0028   Epoch: 16   Global Step: 277720   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-12 04:37:29,275-Speed 3301.36 samples/sec   Loss 0.2743   LearningRate 0.0028   Epoch: 16   Global Step: 277730   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-12 04:37:32,428-Speed 3248.76 samples/sec   Loss 0.2904   LearningRate 0.0028   Epoch: 16   Global Step: 277740   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-12 04:37:35,504-Speed 3328.97 samples/sec   Loss 0.3000   LearningRate 0.0028   Epoch: 16   Global Step: 277750   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-12 04:37:38,591-Speed 3318.57 samples/sec   Loss 0.2871   LearningRate 0.0028   Epoch: 16   Global Step: 277760   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-12 04:37:41,653-Speed 3344.05 samples/sec   Loss 0.2493   LearningRate 0.0028   Epoch: 16   Global Step: 277770   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-12 04:37:44,720-Speed 3340.37 samples/sec   Loss 0.2667   LearningRate 0.0028   Epoch: 16   Global Step: 277780   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-12 04:37:47,798-Speed 3326.79 samples/sec   Loss 0.2711   LearningRate 0.0028   Epoch: 16   Global Step: 277790   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-12 04:37:50,872-Speed 3332.06 samples/sec   Loss 0.2814   LearningRate 0.0028   Epoch: 16   Global Step: 277800   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:37:53,965-Speed 3311.83 samples/sec   Loss 0.2565   LearningRate 0.0028   Epoch: 16   Global Step: 277810   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:37:57,059-Speed 3310.49 samples/sec   Loss 0.2726   LearningRate 0.0028   Epoch: 16   Global Step: 277820   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:38:00,126-Speed 3339.58 samples/sec   Loss 0.2877   LearningRate 0.0028   Epoch: 16   Global Step: 277830   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:38:03,200-Speed 3331.11 samples/sec   Loss 0.2820   LearningRate 0.0028   Epoch: 16   Global Step: 277840   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:38:06,265-Speed 3341.94 samples/sec   Loss 0.2866   LearningRate 0.0028   Epoch: 16   Global Step: 277850   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:38:09,334-Speed 3337.12 samples/sec   Loss 0.2747   LearningRate 0.0028   Epoch: 16   Global Step: 277860   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:38:12,412-Speed 3327.73 samples/sec   Loss 0.2699   LearningRate 0.0028   Epoch: 16   Global Step: 277870   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:38:15,500-Speed 3317.35 samples/sec   Loss 0.2684   LearningRate 0.0028   Epoch: 16   Global Step: 277880   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:38:18,610-Speed 3293.89 samples/sec   Loss 0.2964   LearningRate 0.0028   Epoch: 16   Global Step: 277890   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:38:21,685-Speed 3330.70 samples/sec   Loss 0.2855   LearningRate 0.0028   Epoch: 16   Global Step: 277900   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:38:24,788-Speed 3299.90 samples/sec   Loss 0.2738   LearningRate 0.0028   Epoch: 16   Global Step: 277910   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:38:27,849-Speed 3346.99 samples/sec   Loss 0.2736   LearningRate 0.0028   Epoch: 16   Global Step: 277920   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-12 04:38:30,953-Speed 3299.08 samples/sec   Loss 0.2700   LearningRate 0.0028   Epoch: 16   Global Step: 277930   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-12 04:38:34,018-Speed 3341.34 samples/sec   Loss 0.2575   LearningRate 0.0028   Epoch: 16   Global Step: 277940   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-12 04:38:37,097-Speed 3326.54 samples/sec   Loss 0.2740   LearningRate 0.0028   Epoch: 16   Global Step: 277950   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-12 04:38:40,241-Speed 3258.42 samples/sec   Loss 0.2962   LearningRate 0.0028   Epoch: 16   Global Step: 277960   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-12 04:38:43,312-Speed 3334.97 samples/sec   Loss 0.2689   LearningRate 0.0028   Epoch: 16   Global Step: 277970   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-12 04:38:46,413-Speed 3302.78 samples/sec   Loss 0.2684   LearningRate 0.0028   Epoch: 16   Global Step: 277980   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-12 04:38:49,496-Speed 3322.39 samples/sec   Loss 0.2878   LearningRate 0.0028   Epoch: 16   Global Step: 277990   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-12 04:38:52,582-Speed 3318.52 samples/sec   Loss 0.2817   LearningRate 0.0028   Epoch: 16   Global Step: 278000   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-12 04:39:36,181-[lfw][278000]XNorm: 21.068682
Training: 2022-04-12 04:39:36,182-[lfw][278000]Accuracy-Flip: 0.99800+-0.00267
Training: 2022-04-12 04:39:36,182-[lfw][278000]Accuracy-Highest: 0.99817
Training: 2022-04-12 04:40:26,894-[cfp_fp][278000]XNorm: 22.358443
Training: 2022-04-12 04:40:26,895-[cfp_fp][278000]Accuracy-Flip: 0.99186+-0.00373
Training: 2022-04-12 04:40:26,895-[cfp_fp][278000]Accuracy-Highest: 0.99186
Training: 2022-04-12 04:41:10,545-[agedb_30][278000]XNorm: 22.776555
Training: 2022-04-12 04:41:10,546-[agedb_30][278000]Accuracy-Flip: 0.98517+-0.00643
Training: 2022-04-12 04:41:10,546-[agedb_30][278000]Accuracy-Highest: 0.98650
Training: 2022-04-12 04:41:13,633-Speed 72.60 samples/sec   Loss 0.2559   LearningRate 0.0028   Epoch: 16   Global Step: 278010   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-12 04:41:16,703-Speed 3335.94 samples/sec   Loss 0.2861   LearningRate 0.0028   Epoch: 16   Global Step: 278020   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:41:19,791-Speed 3317.08 samples/sec   Loss 0.2768   LearningRate 0.0028   Epoch: 16   Global Step: 278030   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:41:22,872-Speed 3323.43 samples/sec   Loss 0.2685   LearningRate 0.0028   Epoch: 16   Global Step: 278040   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:41:25,928-Speed 3352.08 samples/sec   Loss 0.2839   LearningRate 0.0028   Epoch: 16   Global Step: 278050   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:41:28,990-Speed 3344.46 samples/sec   Loss 0.2840   LearningRate 0.0028   Epoch: 16   Global Step: 278060   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:41:32,069-Speed 3326.82 samples/sec   Loss 0.2945   LearningRate 0.0028   Epoch: 16   Global Step: 278070   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:41:35,137-Speed 3337.74 samples/sec   Loss 0.2840   LearningRate 0.0028   Epoch: 16   Global Step: 278080   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:41:38,242-Speed 3299.83 samples/sec   Loss 0.2790   LearningRate 0.0028   Epoch: 16   Global Step: 278090   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:41:41,325-Speed 3322.07 samples/sec   Loss 0.2857   LearningRate 0.0028   Epoch: 16   Global Step: 278100   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:41:44,404-Speed 3326.26 samples/sec   Loss 0.2815   LearningRate 0.0028   Epoch: 16   Global Step: 278110   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:41:47,483-Speed 3326.57 samples/sec   Loss 0.2779   LearningRate 0.0028   Epoch: 16   Global Step: 278120   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:41:50,548-Speed 3341.39 samples/sec   Loss 0.2933   LearningRate 0.0028   Epoch: 16   Global Step: 278130   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:41:53,701-Speed 3248.90 samples/sec   Loss 0.2778   LearningRate 0.0028   Epoch: 16   Global Step: 278140   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:41:56,856-Speed 3246.50 samples/sec   Loss 0.2547   LearningRate 0.0028   Epoch: 16   Global Step: 278150   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:41:59,981-Speed 3277.49 samples/sec   Loss 0.3050   LearningRate 0.0028   Epoch: 16   Global Step: 278160   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:42:03,080-Speed 3304.23 samples/sec   Loss 0.2744   LearningRate 0.0028   Epoch: 16   Global Step: 278170   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:42:06,140-Speed 3348.26 samples/sec   Loss 0.2738   LearningRate 0.0028   Epoch: 16   Global Step: 278180   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:42:09,265-Speed 3277.28 samples/sec   Loss 0.2694   LearningRate 0.0028   Epoch: 16   Global Step: 278190   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:42:12,334-Speed 3337.14 samples/sec   Loss 0.2530   LearningRate 0.0028   Epoch: 16   Global Step: 278200   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:42:15,400-Speed 3340.35 samples/sec   Loss 0.2716   LearningRate 0.0028   Epoch: 16   Global Step: 278210   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:42:18,450-Speed 3358.53 samples/sec   Loss 0.2706   LearningRate 0.0028   Epoch: 16   Global Step: 278220   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:42:21,535-Speed 3319.97 samples/sec   Loss 0.2692   LearningRate 0.0028   Epoch: 16   Global Step: 278230   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:42:24,599-Speed 3344.24 samples/sec   Loss 0.2893   LearningRate 0.0028   Epoch: 16   Global Step: 278240   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:42:27,696-Speed 3306.54 samples/sec   Loss 0.2711   LearningRate 0.0028   Epoch: 16   Global Step: 278250   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:42:30,760-Speed 3343.33 samples/sec   Loss 0.2636   LearningRate 0.0028   Epoch: 16   Global Step: 278260   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:42:33,821-Speed 3346.11 samples/sec   Loss 0.2938   LearningRate 0.0028   Epoch: 16   Global Step: 278270   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:42:36,877-Speed 3351.71 samples/sec   Loss 0.2651   LearningRate 0.0028   Epoch: 16   Global Step: 278280   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:42:39,938-Speed 3345.32 samples/sec   Loss 0.2652   LearningRate 0.0028   Epoch: 16   Global Step: 278290   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:42:43,012-Speed 3332.56 samples/sec   Loss 0.2785   LearningRate 0.0028   Epoch: 16   Global Step: 278300   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:42:46,075-Speed 3343.08 samples/sec   Loss 0.2475   LearningRate 0.0028   Epoch: 16   Global Step: 278310   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:42:49,150-Speed 3331.51 samples/sec   Loss 0.2713   LearningRate 0.0028   Epoch: 16   Global Step: 278320   Fp16 Grad Scale: 262144   Required: 6 hours
Training: 2022-04-12 04:42:52,196-Speed 3361.64 samples/sec   Loss 0.2698   LearningRate 0.0028   Epoch: 16   Global Step: 278330   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:42:55,255-Speed 3348.75 samples/sec   Loss 0.2652   LearningRate 0.0028   Epoch: 16   Global Step: 278340   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-12 04:42:58,333-Speed 3328.06 samples/sec   Loss 0.2515   LearningRate 0.0028   Epoch: 16   Global Step: 278350   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-12 04:43:01,392-Speed 3348.54 samples/sec   Loss 0.2870   LearningRate 0.0028   Epoch: 16   Global Step: 278360   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-12 04:43:04,528-Speed 3265.97 samples/sec   Loss 0.2671   LearningRate 0.0028   Epoch: 16   Global Step: 278370   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-12 04:43:07,672-Speed 3257.40 samples/sec   Loss 0.2722   LearningRate 0.0028   Epoch: 16   Global Step: 278380   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-12 04:43:10,734-Speed 3345.08 samples/sec   Loss 0.2712   LearningRate 0.0028   Epoch: 16   Global Step: 278390   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-12 04:43:13,794-Speed 3346.33 samples/sec   Loss 0.2681   LearningRate 0.0028   Epoch: 16   Global Step: 278400   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-12 04:43:16,878-Speed 3321.48 samples/sec   Loss 0.2629   LearningRate 0.0028   Epoch: 16   Global Step: 278410   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-12 04:43:19,949-Speed 3335.10 samples/sec   Loss 0.2872   LearningRate 0.0028   Epoch: 16   Global Step: 278420   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-12 04:43:23,122-Speed 3228.23 samples/sec   Loss 0.2739   LearningRate 0.0028   Epoch: 16   Global Step: 278430   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-12 04:43:26,248-Speed 3276.87 samples/sec   Loss 0.2838   LearningRate 0.0028   Epoch: 16   Global Step: 278440   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:43:29,307-Speed 3347.89 samples/sec   Loss 0.2820   LearningRate 0.0028   Epoch: 16   Global Step: 278450   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:43:32,374-Speed 3339.51 samples/sec   Loss 0.2738   LearningRate 0.0028   Epoch: 16   Global Step: 278460   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:43:35,445-Speed 3334.96 samples/sec   Loss 0.2450   LearningRate 0.0027   Epoch: 16   Global Step: 278470   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:43:38,503-Speed 3350.07 samples/sec   Loss 0.2694   LearningRate 0.0027   Epoch: 16   Global Step: 278480   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:43:41,592-Speed 3315.37 samples/sec   Loss 0.2707   LearningRate 0.0027   Epoch: 16   Global Step: 278490   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:43:44,715-Speed 3279.37 samples/sec   Loss 0.2652   LearningRate 0.0027   Epoch: 16   Global Step: 278500   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:43:47,906-Speed 3209.58 samples/sec   Loss 0.2845   LearningRate 0.0027   Epoch: 16   Global Step: 278510   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:43:51,030-Speed 3278.49 samples/sec   Loss 0.2796   LearningRate 0.0027   Epoch: 16   Global Step: 278520   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-12 04:43:54,136-Speed 3298.34 samples/sec   Loss 0.2636   LearningRate 0.0027   Epoch: 16   Global Step: 278530   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-12 04:43:57,210-Speed 3331.92 samples/sec   Loss 0.2911   LearningRate 0.0027   Epoch: 16   Global Step: 278540   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-12 04:44:00,281-Speed 3335.01 samples/sec   Loss 0.2751   LearningRate 0.0027   Epoch: 16   Global Step: 278550   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-12 04:44:03,437-Speed 3245.80 samples/sec   Loss 0.2721   LearningRate 0.0027   Epoch: 16   Global Step: 278560   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-12 04:44:06,568-Speed 3270.80 samples/sec   Loss 0.2719   LearningRate 0.0027   Epoch: 16   Global Step: 278570   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-12 04:44:09,696-Speed 3274.21 samples/sec   Loss 0.2852   LearningRate 0.0027   Epoch: 16   Global Step: 278580   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-12 04:44:12,823-Speed 3275.51 samples/sec   Loss 0.2927   LearningRate 0.0027   Epoch: 16   Global Step: 278590   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-12 04:44:15,893-Speed 3336.00 samples/sec   Loss 0.2682   LearningRate 0.0027   Epoch: 16   Global Step: 278600   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-12 04:44:18,953-Speed 3347.10 samples/sec   Loss 0.2925   LearningRate 0.0027   Epoch: 16   Global Step: 278610   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-12 04:44:22,015-Speed 3345.22 samples/sec   Loss 0.2860   LearningRate 0.0027   Epoch: 16   Global Step: 278620   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-12 04:44:25,079-Speed 3343.73 samples/sec   Loss 0.2738   LearningRate 0.0027   Epoch: 16   Global Step: 278630   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-12 04:44:28,137-Speed 3349.34 samples/sec   Loss 0.2647   LearningRate 0.0027   Epoch: 16   Global Step: 278640   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-12 04:44:31,218-Speed 3323.45 samples/sec   Loss 0.2839   LearningRate 0.0027   Epoch: 16   Global Step: 278650   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-12 04:44:34,285-Speed 3339.72 samples/sec   Loss 0.2773   LearningRate 0.0027   Epoch: 16   Global Step: 278660   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-12 04:44:37,343-Speed 3349.54 samples/sec   Loss 0.2709   LearningRate 0.0027   Epoch: 16   Global Step: 278670   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-12 04:44:40,398-Speed 3352.51 samples/sec   Loss 0.2821   LearningRate 0.0027   Epoch: 16   Global Step: 278680   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-12 04:44:43,476-Speed 3327.52 samples/sec   Loss 0.2606   LearningRate 0.0027   Epoch: 16   Global Step: 278690   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-12 04:44:46,533-Speed 3350.65 samples/sec   Loss 0.2634   LearningRate 0.0027   Epoch: 16   Global Step: 278700   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-12 04:44:49,636-Speed 3301.16 samples/sec   Loss 0.2689   LearningRate 0.0027   Epoch: 16   Global Step: 278710   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-12 04:44:52,725-Speed 3315.68 samples/sec   Loss 0.2563   LearningRate 0.0027   Epoch: 16   Global Step: 278720   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-12 04:44:55,783-Speed 3348.99 samples/sec   Loss 0.2896   LearningRate 0.0027   Epoch: 16   Global Step: 278730   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-12 04:44:58,847-Speed 3342.65 samples/sec   Loss 0.2818   LearningRate 0.0027   Epoch: 16   Global Step: 278740   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-12 04:45:01,906-Speed 3348.82 samples/sec   Loss 0.2667   LearningRate 0.0027   Epoch: 16   Global Step: 278750   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-12 04:45:04,988-Speed 3322.65 samples/sec   Loss 0.2701   LearningRate 0.0027   Epoch: 16   Global Step: 278760   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-12 04:45:08,067-Speed 3326.84 samples/sec   Loss 0.2808   LearningRate 0.0027   Epoch: 16   Global Step: 278770   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-12 04:45:11,201-Speed 3268.27 samples/sec   Loss 0.2878   LearningRate 0.0027   Epoch: 16   Global Step: 278780   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-12 04:45:14,295-Speed 3310.38 samples/sec   Loss 0.2667   LearningRate 0.0027   Epoch: 16   Global Step: 278790   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-12 04:45:17,355-Speed 3347.28 samples/sec   Loss 0.2841   LearningRate 0.0027   Epoch: 16   Global Step: 278800   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-12 04:45:20,448-Speed 3311.30 samples/sec   Loss 0.2672   LearningRate 0.0027   Epoch: 16   Global Step: 278810   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:45:23,528-Speed 3325.67 samples/sec   Loss 0.2714   LearningRate 0.0027   Epoch: 16   Global Step: 278820   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:45:26,654-Speed 3276.16 samples/sec   Loss 0.2894   LearningRate 0.0027   Epoch: 16   Global Step: 278830   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:45:29,820-Speed 3235.30 samples/sec   Loss 0.2811   LearningRate 0.0027   Epoch: 16   Global Step: 278840   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:45:32,963-Speed 3258.82 samples/sec   Loss 0.2762   LearningRate 0.0027   Epoch: 16   Global Step: 278850   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:45:36,077-Speed 3289.36 samples/sec   Loss 0.2578   LearningRate 0.0027   Epoch: 16   Global Step: 278860   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:45:39,144-Speed 3339.04 samples/sec   Loss 0.2633   LearningRate 0.0027   Epoch: 16   Global Step: 278870   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:45:42,212-Speed 3338.49 samples/sec   Loss 0.2758   LearningRate 0.0027   Epoch: 16   Global Step: 278880   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:45:45,273-Speed 3346.36 samples/sec   Loss 0.2709   LearningRate 0.0027   Epoch: 16   Global Step: 278890   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:45:48,330-Speed 3350.01 samples/sec   Loss 0.2838   LearningRate 0.0027   Epoch: 16   Global Step: 278900   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:45:51,407-Speed 3328.34 samples/sec   Loss 0.2785   LearningRate 0.0027   Epoch: 16   Global Step: 278910   Fp16 Grad Scale: 262144   Required: 6 hours
Training: 2022-04-12 04:45:54,478-Speed 3335.83 samples/sec   Loss 0.2769   LearningRate 0.0027   Epoch: 16   Global Step: 278920   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:45:57,573-Speed 3308.85 samples/sec   Loss 0.2862   LearningRate 0.0027   Epoch: 16   Global Step: 278930   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:46:00,682-Speed 3294.77 samples/sec   Loss 0.2564   LearningRate 0.0027   Epoch: 16   Global Step: 278940   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:46:03,777-Speed 3309.37 samples/sec   Loss 0.2753   LearningRate 0.0027   Epoch: 16   Global Step: 278950   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-12 04:46:06,874-Speed 3307.10 samples/sec   Loss 0.2490   LearningRate 0.0027   Epoch: 16   Global Step: 278960   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-12 04:46:09,971-Speed 3307.81 samples/sec   Loss 0.2701   LearningRate 0.0027   Epoch: 16   Global Step: 278970   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-12 04:46:13,032-Speed 3345.50 samples/sec   Loss 0.2891   LearningRate 0.0027   Epoch: 16   Global Step: 278980   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-12 04:46:16,177-Speed 3256.75 samples/sec   Loss 0.2820   LearningRate 0.0027   Epoch: 16   Global Step: 278990   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-12 04:46:19,234-Speed 3350.12 samples/sec   Loss 0.2761   LearningRate 0.0027   Epoch: 16   Global Step: 279000   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-12 04:46:22,298-Speed 3343.64 samples/sec   Loss 0.2697   LearningRate 0.0027   Epoch: 16   Global Step: 279010   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-12 04:46:25,369-Speed 3334.39 samples/sec   Loss 0.2830   LearningRate 0.0027   Epoch: 16   Global Step: 279020   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-12 04:46:28,430-Speed 3346.79 samples/sec   Loss 0.2885   LearningRate 0.0027   Epoch: 16   Global Step: 279030   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-12 04:46:31,508-Speed 3326.60 samples/sec   Loss 0.2727   LearningRate 0.0027   Epoch: 16   Global Step: 279040   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-12 04:46:34,623-Speed 3288.78 samples/sec   Loss 0.2813   LearningRate 0.0027   Epoch: 16   Global Step: 279050   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:46:37,753-Speed 3271.97 samples/sec   Loss 0.2857   LearningRate 0.0027   Epoch: 16   Global Step: 279060   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:46:40,826-Speed 3333.77 samples/sec   Loss 0.2765   LearningRate 0.0027   Epoch: 16   Global Step: 279070   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:46:43,921-Speed 3308.98 samples/sec   Loss 0.2847   LearningRate 0.0027   Epoch: 16   Global Step: 279080   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:46:47,012-Speed 3313.64 samples/sec   Loss 0.2693   LearningRate 0.0027   Epoch: 16   Global Step: 279090   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:46:50,070-Speed 3349.52 samples/sec   Loss 0.2791   LearningRate 0.0027   Epoch: 16   Global Step: 279100   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:46:53,129-Speed 3348.14 samples/sec   Loss 0.2555   LearningRate 0.0027   Epoch: 16   Global Step: 279110   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:46:56,205-Speed 3329.97 samples/sec   Loss 0.2739   LearningRate 0.0027   Epoch: 16   Global Step: 279120   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:46:59,312-Speed 3295.88 samples/sec   Loss 0.2780   LearningRate 0.0027   Epoch: 16   Global Step: 279130   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:47:02,429-Speed 3285.99 samples/sec   Loss 0.2691   LearningRate 0.0027   Epoch: 16   Global Step: 279140   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:47:05,515-Speed 3319.76 samples/sec   Loss 0.2737   LearningRate 0.0027   Epoch: 16   Global Step: 279150   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:47:08,596-Speed 3324.05 samples/sec   Loss 0.2771   LearningRate 0.0027   Epoch: 16   Global Step: 279160   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:47:11,692-Speed 3308.03 samples/sec   Loss 0.2606   LearningRate 0.0027   Epoch: 16   Global Step: 279170   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:47:14,757-Speed 3341.72 samples/sec   Loss 0.2505   LearningRate 0.0027   Epoch: 16   Global Step: 279180   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:47:17,842-Speed 3320.81 samples/sec   Loss 0.2761   LearningRate 0.0027   Epoch: 16   Global Step: 279190   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:47:20,939-Speed 3306.82 samples/sec   Loss 0.2857   LearningRate 0.0027   Epoch: 16   Global Step: 279200   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:47:24,034-Speed 3308.44 samples/sec   Loss 0.2662   LearningRate 0.0027   Epoch: 16   Global Step: 279210   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:47:27,137-Speed 3300.79 samples/sec   Loss 0.2901   LearningRate 0.0027   Epoch: 16   Global Step: 279220   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:47:30,210-Speed 3334.09 samples/sec   Loss 0.2637   LearningRate 0.0027   Epoch: 16   Global Step: 279230   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:47:33,274-Speed 3342.98 samples/sec   Loss 0.2626   LearningRate 0.0027   Epoch: 16   Global Step: 279240   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:47:36,333-Speed 3348.10 samples/sec   Loss 0.2570   LearningRate 0.0027   Epoch: 16   Global Step: 279250   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:47:39,409-Speed 3330.25 samples/sec   Loss 0.2664   LearningRate 0.0027   Epoch: 16   Global Step: 279260   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:47:42,488-Speed 3326.09 samples/sec   Loss 0.2689   LearningRate 0.0027   Epoch: 16   Global Step: 279270   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:47:45,549-Speed 3345.41 samples/sec   Loss 0.2865   LearningRate 0.0027   Epoch: 16   Global Step: 279280   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:47:48,645-Speed 3308.83 samples/sec   Loss 0.2891   LearningRate 0.0027   Epoch: 16   Global Step: 279290   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:47:51,785-Speed 3261.39 samples/sec   Loss 0.2696   LearningRate 0.0027   Epoch: 16   Global Step: 279300   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:47:54,883-Speed 3306.58 samples/sec   Loss 0.2658   LearningRate 0.0027   Epoch: 16   Global Step: 279310   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:47:58,011-Speed 3274.06 samples/sec   Loss 0.2601   LearningRate 0.0027   Epoch: 16   Global Step: 279320   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:48:01,139-Speed 3274.63 samples/sec   Loss 0.2738   LearningRate 0.0027   Epoch: 16   Global Step: 279330   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:48:04,308-Speed 3232.69 samples/sec   Loss 0.2704   LearningRate 0.0027   Epoch: 16   Global Step: 279340   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:48:07,390-Speed 3323.22 samples/sec   Loss 0.2749   LearningRate 0.0027   Epoch: 16   Global Step: 279350   Fp16 Grad Scale: 262144   Required: 6 hours
Training: 2022-04-12 04:48:10,445-Speed 3352.33 samples/sec   Loss 0.2693   LearningRate 0.0027   Epoch: 16   Global Step: 279360   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:48:13,635-Speed 3210.17 samples/sec   Loss 0.2702   LearningRate 0.0027   Epoch: 16   Global Step: 279370   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:48:16,799-Speed 3237.52 samples/sec   Loss 0.2778   LearningRate 0.0027   Epoch: 16   Global Step: 279380   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:48:19,864-Speed 3342.10 samples/sec   Loss 0.2655   LearningRate 0.0027   Epoch: 16   Global Step: 279390   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:48:22,948-Speed 3320.62 samples/sec   Loss 0.2790   LearningRate 0.0027   Epoch: 16   Global Step: 279400   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:48:26,033-Speed 3319.90 samples/sec   Loss 0.2752   LearningRate 0.0027   Epoch: 16   Global Step: 279410   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:48:29,095-Speed 3345.93 samples/sec   Loss 0.2858   LearningRate 0.0027   Epoch: 16   Global Step: 279420   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:48:32,185-Speed 3314.73 samples/sec   Loss 0.2635   LearningRate 0.0027   Epoch: 16   Global Step: 279430   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:48:35,246-Speed 3345.27 samples/sec   Loss 0.2708   LearningRate 0.0027   Epoch: 16   Global Step: 279440   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:48:38,314-Speed 3338.39 samples/sec   Loss 0.2620   LearningRate 0.0027   Epoch: 16   Global Step: 279450   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-12 04:48:41,392-Speed 3328.43 samples/sec   Loss 0.2631   LearningRate 0.0027   Epoch: 16   Global Step: 279460   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-12 04:48:44,510-Speed 3284.40 samples/sec   Loss 0.2811   LearningRate 0.0027   Epoch: 16   Global Step: 279470   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-12 04:48:47,708-Speed 3202.91 samples/sec   Loss 0.2667   LearningRate 0.0026   Epoch: 16   Global Step: 279480   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-12 04:48:50,806-Speed 3306.03 samples/sec   Loss 0.2569   LearningRate 0.0026   Epoch: 16   Global Step: 279490   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-12 04:48:53,895-Speed 3315.50 samples/sec   Loss 0.2956   LearningRate 0.0026   Epoch: 16   Global Step: 279500   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-12 04:48:57,042-Speed 3255.15 samples/sec   Loss 0.2948   LearningRate 0.0026   Epoch: 16   Global Step: 279510   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-12 04:49:00,138-Speed 3308.55 samples/sec   Loss 0.2850   LearningRate 0.0026   Epoch: 16   Global Step: 279520   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-12 04:49:03,287-Speed 3252.52 samples/sec   Loss 0.2747   LearningRate 0.0026   Epoch: 16   Global Step: 279530   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-12 04:49:06,348-Speed 3345.49 samples/sec   Loss 0.2733   LearningRate 0.0026   Epoch: 16   Global Step: 279540   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-12 04:49:09,419-Speed 3335.74 samples/sec   Loss 0.2732   LearningRate 0.0026   Epoch: 16   Global Step: 279550   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:49:12,490-Speed 3334.94 samples/sec   Loss 0.2649   LearningRate 0.0026   Epoch: 16   Global Step: 279560   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:49:15,592-Speed 3301.62 samples/sec   Loss 0.2822   LearningRate 0.0026   Epoch: 16   Global Step: 279570   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:49:18,665-Speed 3333.45 samples/sec   Loss 0.2812   LearningRate 0.0026   Epoch: 16   Global Step: 279580   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:49:21,827-Speed 3239.80 samples/sec   Loss 0.2942   LearningRate 0.0026   Epoch: 16   Global Step: 279590   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:49:24,890-Speed 3343.83 samples/sec   Loss 0.2684   LearningRate 0.0026   Epoch: 16   Global Step: 279600   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:49:27,963-Speed 3332.87 samples/sec   Loss 0.2717   LearningRate 0.0026   Epoch: 16   Global Step: 279610   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:49:31,079-Speed 3286.99 samples/sec   Loss 0.2820   LearningRate 0.0026   Epoch: 16   Global Step: 279620   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:49:34,159-Speed 3325.47 samples/sec   Loss 0.2802   LearningRate 0.0026   Epoch: 16   Global Step: 279630   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:49:37,281-Speed 3280.54 samples/sec   Loss 0.2821   LearningRate 0.0026   Epoch: 16   Global Step: 279640   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:49:40,381-Speed 3303.98 samples/sec   Loss 0.2736   LearningRate 0.0026   Epoch: 16   Global Step: 279650   Fp16 Grad Scale: 262144   Required: 6 hours
Training: 2022-04-12 04:49:43,442-Speed 3345.54 samples/sec   Loss 0.2668   LearningRate 0.0026   Epoch: 16   Global Step: 279660   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:49:46,530-Speed 3317.66 samples/sec   Loss 0.2644   LearningRate 0.0026   Epoch: 16   Global Step: 279670   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:49:49,597-Speed 3338.96 samples/sec   Loss 0.2747   LearningRate 0.0026   Epoch: 16   Global Step: 279680   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:49:52,667-Speed 3336.37 samples/sec   Loss 0.2687   LearningRate 0.0026   Epoch: 16   Global Step: 279690   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:49:55,740-Speed 3332.64 samples/sec   Loss 0.2740   LearningRate 0.0026   Epoch: 16   Global Step: 279700   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:49:58,818-Speed 3327.88 samples/sec   Loss 0.2958   LearningRate 0.0026   Epoch: 16   Global Step: 279710   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:50:01,881-Speed 3343.62 samples/sec   Loss 0.2864   LearningRate 0.0026   Epoch: 16   Global Step: 279720   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:50:04,993-Speed 3291.00 samples/sec   Loss 0.2655   LearningRate 0.0026   Epoch: 16   Global Step: 279730   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:50:08,056-Speed 3344.10 samples/sec   Loss 0.2826   LearningRate 0.0026   Epoch: 16   Global Step: 279740   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:50:11,143-Speed 3318.19 samples/sec   Loss 0.2772   LearningRate 0.0026   Epoch: 16   Global Step: 279750   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:50:14,273-Speed 3272.77 samples/sec   Loss 0.2747   LearningRate 0.0026   Epoch: 16   Global Step: 279760   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:50:17,354-Speed 3324.11 samples/sec   Loss 0.2661   LearningRate 0.0026   Epoch: 16   Global Step: 279770   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:50:20,443-Speed 3315.11 samples/sec   Loss 0.2629   LearningRate 0.0026   Epoch: 16   Global Step: 279780   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:50:23,519-Speed 3330.65 samples/sec   Loss 0.2778   LearningRate 0.0026   Epoch: 16   Global Step: 279790   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:50:26,595-Speed 3329.50 samples/sec   Loss 0.2778   LearningRate 0.0026   Epoch: 16   Global Step: 279800   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:50:29,683-Speed 3316.43 samples/sec   Loss 0.2702   LearningRate 0.0026   Epoch: 16   Global Step: 279810   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:50:32,755-Speed 3333.75 samples/sec   Loss 0.2969   LearningRate 0.0026   Epoch: 16   Global Step: 279820   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:50:35,829-Speed 3332.57 samples/sec   Loss 0.2911   LearningRate 0.0026   Epoch: 16   Global Step: 279830   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:50:38,939-Speed 3293.05 samples/sec   Loss 0.2911   LearningRate 0.0026   Epoch: 16   Global Step: 279840   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:50:42,031-Speed 3312.89 samples/sec   Loss 0.2807   LearningRate 0.0026   Epoch: 16   Global Step: 279850   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:50:45,085-Speed 3354.32 samples/sec   Loss 0.2995   LearningRate 0.0026   Epoch: 16   Global Step: 279860   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:50:48,161-Speed 3329.09 samples/sec   Loss 0.2750   LearningRate 0.0026   Epoch: 16   Global Step: 279870   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:50:51,235-Speed 3332.27 samples/sec   Loss 0.2730   LearningRate 0.0026   Epoch: 16   Global Step: 279880   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:50:54,322-Speed 3317.86 samples/sec   Loss 0.2661   LearningRate 0.0026   Epoch: 16   Global Step: 279890   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:50:57,392-Speed 3335.78 samples/sec   Loss 0.2772   LearningRate 0.0026   Epoch: 16   Global Step: 279900   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:51:00,482-Speed 3314.44 samples/sec   Loss 0.2743   LearningRate 0.0026   Epoch: 16   Global Step: 279910   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:51:03,577-Speed 3309.45 samples/sec   Loss 0.2710   LearningRate 0.0026   Epoch: 16   Global Step: 279920   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:51:06,654-Speed 3329.03 samples/sec   Loss 0.3000   LearningRate 0.0026   Epoch: 16   Global Step: 279930   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:51:09,725-Speed 3335.31 samples/sec   Loss 0.2810   LearningRate 0.0026   Epoch: 16   Global Step: 279940   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:51:12,896-Speed 3229.77 samples/sec   Loss 0.2764   LearningRate 0.0026   Epoch: 16   Global Step: 279950   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:51:16,011-Speed 3288.90 samples/sec   Loss 0.2747   LearningRate 0.0026   Epoch: 16   Global Step: 279960   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:51:19,228-Speed 3183.80 samples/sec   Loss 0.2825   LearningRate 0.0026   Epoch: 16   Global Step: 279970   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:51:22,315-Speed 3316.96 samples/sec   Loss 0.2761   LearningRate 0.0026   Epoch: 16   Global Step: 279980   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:51:25,428-Speed 3290.36 samples/sec   Loss 0.2697   LearningRate 0.0026   Epoch: 16   Global Step: 279990   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:51:28,496-Speed 3338.66 samples/sec   Loss 0.2718   LearningRate 0.0026   Epoch: 16   Global Step: 280000   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:52:12,353-[lfw][280000]XNorm: 21.511452
Training: 2022-04-12 04:52:12,354-[lfw][280000]Accuracy-Flip: 0.99767+-0.00249
Training: 2022-04-12 04:52:12,354-[lfw][280000]Accuracy-Highest: 0.99817
Training: 2022-04-12 04:53:03,256-[cfp_fp][280000]XNorm: 23.007266
Training: 2022-04-12 04:53:03,257-[cfp_fp][280000]Accuracy-Flip: 0.99129+-0.00401
Training: 2022-04-12 04:53:03,257-[cfp_fp][280000]Accuracy-Highest: 0.99186
Training: 2022-04-12 04:53:47,151-[agedb_30][280000]XNorm: 23.190409
Training: 2022-04-12 04:53:47,152-[agedb_30][280000]Accuracy-Flip: 0.98467+-0.00653
Training: 2022-04-12 04:53:47,152-[agedb_30][280000]Accuracy-Highest: 0.98650
Training: 2022-04-12 04:53:50,225-Speed 72.25 samples/sec   Loss 0.2703   LearningRate 0.0026   Epoch: 16   Global Step: 280010   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:53:53,285-Speed 3347.07 samples/sec   Loss 0.2866   LearningRate 0.0026   Epoch: 16   Global Step: 280020   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:53:56,342-Speed 3350.39 samples/sec   Loss 0.2656   LearningRate 0.0026   Epoch: 16   Global Step: 280030   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:53:59,389-Speed 3361.46 samples/sec   Loss 0.2745   LearningRate 0.0026   Epoch: 16   Global Step: 280040   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:54:02,540-Speed 3250.04 samples/sec   Loss 0.2918   LearningRate 0.0026   Epoch: 16   Global Step: 280050   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:54:05,586-Speed 3363.10 samples/sec   Loss 0.2656   LearningRate 0.0026   Epoch: 16   Global Step: 280060   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:54:08,664-Speed 3327.80 samples/sec   Loss 0.2895   LearningRate 0.0026   Epoch: 16   Global Step: 280070   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:54:11,796-Speed 3270.12 samples/sec   Loss 0.2692   LearningRate 0.0026   Epoch: 16   Global Step: 280080   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:54:14,850-Speed 3354.16 samples/sec   Loss 0.2854   LearningRate 0.0026   Epoch: 16   Global Step: 280090   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:54:17,905-Speed 3351.92 samples/sec   Loss 0.2650   LearningRate 0.0026   Epoch: 16   Global Step: 280100   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:54:20,990-Speed 3319.73 samples/sec   Loss 0.2804   LearningRate 0.0026   Epoch: 16   Global Step: 280110   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:54:24,084-Speed 3310.61 samples/sec   Loss 0.2842   LearningRate 0.0026   Epoch: 16   Global Step: 280120   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:54:27,149-Speed 3341.34 samples/sec   Loss 0.2736   LearningRate 0.0026   Epoch: 16   Global Step: 280130   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:54:30,258-Speed 3294.48 samples/sec   Loss 0.2743   LearningRate 0.0026   Epoch: 16   Global Step: 280140   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:54:33,333-Speed 3330.90 samples/sec   Loss 0.2633   LearningRate 0.0026   Epoch: 16   Global Step: 280150   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:54:36,396-Speed 3343.96 samples/sec   Loss 0.2835   LearningRate 0.0026   Epoch: 16   Global Step: 280160   Fp16 Grad Scale: 262144   Required: 6 hours
Training: 2022-04-12 04:54:39,440-Speed 3364.81 samples/sec   Loss 0.2802   LearningRate 0.0026   Epoch: 16   Global Step: 280170   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:54:42,500-Speed 3348.03 samples/sec   Loss 0.2816   LearningRate 0.0026   Epoch: 16   Global Step: 280180   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:54:45,563-Speed 3343.36 samples/sec   Loss 0.2858   LearningRate 0.0026   Epoch: 16   Global Step: 280190   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-12 04:54:48,626-Speed 3344.25 samples/sec   Loss 0.2699   LearningRate 0.0026   Epoch: 16   Global Step: 280200   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-12 04:54:51,690-Speed 3342.57 samples/sec   Loss 0.2437   LearningRate 0.0026   Epoch: 16   Global Step: 280210   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-12 04:54:54,744-Speed 3353.53 samples/sec   Loss 0.2766   LearningRate 0.0026   Epoch: 16   Global Step: 280220   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-12 04:54:57,809-Speed 3341.91 samples/sec   Loss 0.2731   LearningRate 0.0026   Epoch: 16   Global Step: 280230   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-12 04:55:00,906-Speed 3306.88 samples/sec   Loss 0.2709   LearningRate 0.0026   Epoch: 16   Global Step: 280240   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-12 04:55:03,963-Speed 3351.08 samples/sec   Loss 0.2723   LearningRate 0.0026   Epoch: 16   Global Step: 280250   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-12 04:55:07,110-Speed 3254.48 samples/sec   Loss 0.2822   LearningRate 0.0026   Epoch: 16   Global Step: 280260   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-12 04:55:10,166-Speed 3352.22 samples/sec   Loss 0.2656   LearningRate 0.0026   Epoch: 16   Global Step: 280270   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-12 04:55:13,299-Speed 3269.04 samples/sec   Loss 0.2916   LearningRate 0.0026   Epoch: 16   Global Step: 280280   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-12 04:55:16,400-Speed 3302.82 samples/sec   Loss 0.2791   LearningRate 0.0026   Epoch: 16   Global Step: 280290   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:55:19,468-Speed 3337.85 samples/sec   Loss 0.2701   LearningRate 0.0026   Epoch: 16   Global Step: 280300   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:55:22,590-Speed 3281.24 samples/sec   Loss 0.2838   LearningRate 0.0026   Epoch: 16   Global Step: 280310   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:55:25,735-Speed 3255.98 samples/sec   Loss 0.2710   LearningRate 0.0026   Epoch: 16   Global Step: 280320   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:55:28,810-Speed 3331.16 samples/sec   Loss 0.2598   LearningRate 0.0026   Epoch: 16   Global Step: 280330   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:55:31,872-Speed 3345.56 samples/sec   Loss 0.2896   LearningRate 0.0026   Epoch: 16   Global Step: 280340   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:55:34,934-Speed 3345.20 samples/sec   Loss 0.2861   LearningRate 0.0026   Epoch: 16   Global Step: 280350   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:55:38,009-Speed 3330.10 samples/sec   Loss 0.2848   LearningRate 0.0026   Epoch: 16   Global Step: 280360   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:55:41,070-Speed 3345.98 samples/sec   Loss 0.2908   LearningRate 0.0026   Epoch: 16   Global Step: 280370   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:55:44,133-Speed 3344.38 samples/sec   Loss 0.2689   LearningRate 0.0026   Epoch: 16   Global Step: 280380   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:55:47,201-Speed 3338.26 samples/sec   Loss 0.2975   LearningRate 0.0026   Epoch: 16   Global Step: 280390   Fp16 Grad Scale: 262144   Required: 6 hours
Training: 2022-04-12 04:55:50,265-Speed 3342.72 samples/sec   Loss 0.2695   LearningRate 0.0026   Epoch: 16   Global Step: 280400   Fp16 Grad Scale: 262144   Required: 6 hours
Training: 2022-04-12 04:55:53,339-Speed 3332.22 samples/sec   Loss 0.3043   LearningRate 0.0026   Epoch: 16   Global Step: 280410   Fp16 Grad Scale: 262144   Required: 6 hours
Training: 2022-04-12 04:55:56,414-Speed 3330.64 samples/sec   Loss 0.2715   LearningRate 0.0026   Epoch: 16   Global Step: 280420   Fp16 Grad Scale: 262144   Required: 6 hours
Training: 2022-04-12 04:55:59,488-Speed 3332.00 samples/sec   Loss 0.2762   LearningRate 0.0026   Epoch: 16   Global Step: 280430   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:56:02,544-Speed 3351.93 samples/sec   Loss 0.2588   LearningRate 0.0026   Epoch: 16   Global Step: 280440   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:56:05,602-Speed 3348.88 samples/sec   Loss 0.2635   LearningRate 0.0026   Epoch: 16   Global Step: 280450   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:56:08,677-Speed 3331.16 samples/sec   Loss 0.2754   LearningRate 0.0026   Epoch: 16   Global Step: 280460   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:56:11,821-Speed 3257.23 samples/sec   Loss 0.2820   LearningRate 0.0026   Epoch: 16   Global Step: 280470   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:56:14,928-Speed 3296.62 samples/sec   Loss 0.2735   LearningRate 0.0026   Epoch: 16   Global Step: 280480   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:56:18,015-Speed 3317.84 samples/sec   Loss 0.2869   LearningRate 0.0026   Epoch: 16   Global Step: 280490   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:56:21,087-Speed 3333.82 samples/sec   Loss 0.2761   LearningRate 0.0026   Epoch: 16   Global Step: 280500   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:56:24,145-Speed 3350.69 samples/sec   Loss 0.2818   LearningRate 0.0026   Epoch: 16   Global Step: 280510   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:56:27,205-Speed 3346.38 samples/sec   Loss 0.2648   LearningRate 0.0025   Epoch: 16   Global Step: 280520   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:56:30,281-Speed 3330.55 samples/sec   Loss 0.3024   LearningRate 0.0025   Epoch: 16   Global Step: 280530   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:56:33,362-Speed 3324.25 samples/sec   Loss 0.2742   LearningRate 0.0025   Epoch: 16   Global Step: 280540   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:56:36,424-Speed 3344.38 samples/sec   Loss 0.2939   LearningRate 0.0025   Epoch: 16   Global Step: 280550   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:56:39,498-Speed 3332.46 samples/sec   Loss 0.2727   LearningRate 0.0025   Epoch: 16   Global Step: 280560   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:56:42,557-Speed 3347.93 samples/sec   Loss 0.2788   LearningRate 0.0025   Epoch: 16   Global Step: 280570   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:56:45,665-Speed 3295.69 samples/sec   Loss 0.2753   LearningRate 0.0025   Epoch: 16   Global Step: 280580   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:56:48,717-Speed 3355.03 samples/sec   Loss 0.2917   LearningRate 0.0025   Epoch: 16   Global Step: 280590   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-12 04:56:51,834-Speed 3286.97 samples/sec   Loss 0.2788   LearningRate 0.0025   Epoch: 16   Global Step: 280600   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-12 04:56:54,957-Speed 3279.53 samples/sec   Loss 0.2728   LearningRate 0.0025   Epoch: 16   Global Step: 280610   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-12 04:56:58,017-Speed 3347.09 samples/sec   Loss 0.2742   LearningRate 0.0025   Epoch: 16   Global Step: 280620   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-12 04:57:01,079-Speed 3344.61 samples/sec   Loss 0.2577   LearningRate 0.0025   Epoch: 16   Global Step: 280630   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-12 04:57:04,142-Speed 3343.88 samples/sec   Loss 0.2710   LearningRate 0.0025   Epoch: 16   Global Step: 280640   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-12 04:57:07,282-Speed 3262.16 samples/sec   Loss 0.2693   LearningRate 0.0025   Epoch: 16   Global Step: 280650   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-12 04:57:10,402-Speed 3282.24 samples/sec   Loss 0.2626   LearningRate 0.0025   Epoch: 16   Global Step: 280660   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-12 04:57:13,502-Speed 3304.84 samples/sec   Loss 0.2882   LearningRate 0.0025   Epoch: 16   Global Step: 280670   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-12 04:57:16,590-Speed 3316.88 samples/sec   Loss 0.2667   LearningRate 0.0025   Epoch: 16   Global Step: 280680   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-12 04:57:19,663-Speed 3332.99 samples/sec   Loss 0.2762   LearningRate 0.0025   Epoch: 16   Global Step: 280690   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:57:22,728-Speed 3341.15 samples/sec   Loss 0.2680   LearningRate 0.0025   Epoch: 16   Global Step: 280700   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:57:25,787-Speed 3349.27 samples/sec   Loss 0.2745   LearningRate 0.0025   Epoch: 16   Global Step: 280710   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:57:28,832-Speed 3362.99 samples/sec   Loss 0.2788   LearningRate 0.0025   Epoch: 16   Global Step: 280720   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-12 04:57:31,892-Speed 3347.01 samples/sec   Loss 0.2770   LearningRate 0.0025   Epoch: 16   Global Step: 280730   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-12 04:57:34,954-Speed 3345.51 samples/sec   Loss 0.2792   LearningRate 0.0025   Epoch: 16   Global Step: 280740   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-12 04:57:38,033-Speed 3325.92 samples/sec   Loss 0.2748   LearningRate 0.0025   Epoch: 16   Global Step: 280750   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-12 04:57:41,093-Speed 3347.18 samples/sec   Loss 0.2718   LearningRate 0.0025   Epoch: 16   Global Step: 280760   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-12 04:57:44,156-Speed 3344.15 samples/sec   Loss 0.2770   LearningRate 0.0025   Epoch: 16   Global Step: 280770   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-12 04:57:47,230-Speed 3332.56 samples/sec   Loss 0.2883   LearningRate 0.0025   Epoch: 16   Global Step: 280780   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-12 04:57:50,301-Speed 3334.14 samples/sec   Loss 0.2770   LearningRate 0.0025   Epoch: 16   Global Step: 280790   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-12 04:57:53,368-Speed 3339.99 samples/sec   Loss 0.2641   LearningRate 0.0025   Epoch: 16   Global Step: 280800   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-12 04:57:56,500-Speed 3269.75 samples/sec   Loss 0.2870   LearningRate 0.0025   Epoch: 16   Global Step: 280810   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-12 04:57:59,559-Speed 3348.40 samples/sec   Loss 0.2766   LearningRate 0.0025   Epoch: 16   Global Step: 280820   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:58:02,675-Speed 3287.19 samples/sec   Loss 0.2786   LearningRate 0.0025   Epoch: 16   Global Step: 280830   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:58:05,754-Speed 3326.59 samples/sec   Loss 0.2660   LearningRate 0.0025   Epoch: 16   Global Step: 280840   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:58:08,826-Speed 3333.59 samples/sec   Loss 0.2727   LearningRate 0.0025   Epoch: 16   Global Step: 280850   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:58:11,898-Speed 3334.56 samples/sec   Loss 0.2736   LearningRate 0.0025   Epoch: 16   Global Step: 280860   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:58:14,974-Speed 3329.43 samples/sec   Loss 0.2832   LearningRate 0.0025   Epoch: 16   Global Step: 280870   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:58:18,037-Speed 3344.77 samples/sec   Loss 0.2591   LearningRate 0.0025   Epoch: 16   Global Step: 280880   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:58:21,096-Speed 3348.07 samples/sec   Loss 0.2693   LearningRate 0.0025   Epoch: 16   Global Step: 280890   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:58:24,180-Speed 3320.94 samples/sec   Loss 0.2865   LearningRate 0.0025   Epoch: 16   Global Step: 280900   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:58:27,240-Speed 3346.38 samples/sec   Loss 0.2855   LearningRate 0.0025   Epoch: 16   Global Step: 280910   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:58:30,325-Speed 3320.22 samples/sec   Loss 0.2800   LearningRate 0.0025   Epoch: 16   Global Step: 280920   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:58:33,392-Speed 3340.16 samples/sec   Loss 0.2881   LearningRate 0.0025   Epoch: 16   Global Step: 280930   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:58:36,472-Speed 3324.98 samples/sec   Loss 0.2788   LearningRate 0.0025   Epoch: 16   Global Step: 280940   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:58:39,613-Speed 3260.54 samples/sec   Loss 0.2610   LearningRate 0.0025   Epoch: 16   Global Step: 280950   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-12 04:58:42,748-Speed 3268.46 samples/sec   Loss 0.2865   LearningRate 0.0025   Epoch: 16   Global Step: 280960   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-12 04:58:45,857-Speed 3293.96 samples/sec   Loss 0.2813   LearningRate 0.0025   Epoch: 16   Global Step: 280970   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-12 04:58:48,965-Speed 3294.93 samples/sec   Loss 0.2707   LearningRate 0.0025   Epoch: 16   Global Step: 280980   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-12 04:58:52,039-Speed 3332.33 samples/sec   Loss 0.2609   LearningRate 0.0025   Epoch: 16   Global Step: 280990   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-12 04:58:55,136-Speed 3307.06 samples/sec   Loss 0.2994   LearningRate 0.0025   Epoch: 16   Global Step: 281000   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-12 04:58:58,217-Speed 3323.98 samples/sec   Loss 0.2771   LearningRate 0.0025   Epoch: 16   Global Step: 281010   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-12 04:59:01,284-Speed 3339.49 samples/sec   Loss 0.2671   LearningRate 0.0025   Epoch: 16   Global Step: 281020   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-12 04:59:04,398-Speed 3289.65 samples/sec   Loss 0.2835   LearningRate 0.0025   Epoch: 16   Global Step: 281030   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-12 04:59:07,499-Speed 3303.41 samples/sec   Loss 0.2638   LearningRate 0.0025   Epoch: 16   Global Step: 281040   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-12 04:59:10,694-Speed 3205.61 samples/sec   Loss 0.2590   LearningRate 0.0025   Epoch: 16   Global Step: 281050   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:59:13,828-Speed 3267.90 samples/sec   Loss 0.2781   LearningRate 0.0025   Epoch: 16   Global Step: 281060   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:59:16,909-Speed 3324.18 samples/sec   Loss 0.2856   LearningRate 0.0025   Epoch: 16   Global Step: 281070   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:59:19,989-Speed 3325.96 samples/sec   Loss 0.2808   LearningRate 0.0025   Epoch: 16   Global Step: 281080   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:59:23,065-Speed 3329.92 samples/sec   Loss 0.2691   LearningRate 0.0025   Epoch: 16   Global Step: 281090   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:59:26,207-Speed 3259.26 samples/sec   Loss 0.2769   LearningRate 0.0025   Epoch: 16   Global Step: 281100   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:59:29,346-Speed 3262.81 samples/sec   Loss 0.2877   LearningRate 0.0025   Epoch: 16   Global Step: 281110   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:59:32,488-Speed 3260.36 samples/sec   Loss 0.2785   LearningRate 0.0025   Epoch: 16   Global Step: 281120   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:59:35,558-Speed 3336.00 samples/sec   Loss 0.2844   LearningRate 0.0025   Epoch: 16   Global Step: 281130   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:59:38,619-Speed 3346.44 samples/sec   Loss 0.2891   LearningRate 0.0025   Epoch: 16   Global Step: 281140   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:59:41,681-Speed 3344.49 samples/sec   Loss 0.2771   LearningRate 0.0025   Epoch: 16   Global Step: 281150   Fp16 Grad Scale: 262144   Required: 6 hours
Training: 2022-04-12 04:59:44,730-Speed 3359.76 samples/sec   Loss 0.2795   LearningRate 0.0025   Epoch: 16   Global Step: 281160   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:59:47,793-Speed 3343.85 samples/sec   Loss 0.2627   LearningRate 0.0025   Epoch: 16   Global Step: 281170   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:59:50,857-Speed 3342.19 samples/sec   Loss 0.2794   LearningRate 0.0025   Epoch: 16   Global Step: 281180   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:59:53,925-Speed 3339.02 samples/sec   Loss 0.2721   LearningRate 0.0025   Epoch: 16   Global Step: 281190   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 04:59:56,985-Speed 3346.52 samples/sec   Loss 0.2670   LearningRate 0.0025   Epoch: 16   Global Step: 281200   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 05:00:00,059-Speed 3332.33 samples/sec   Loss 0.2505   LearningRate 0.0025   Epoch: 16   Global Step: 281210   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 05:00:03,140-Speed 3324.97 samples/sec   Loss 0.2777   LearningRate 0.0025   Epoch: 16   Global Step: 281220   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 05:00:06,208-Speed 3338.50 samples/sec   Loss 0.2806   LearningRate 0.0025   Epoch: 16   Global Step: 281230   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 05:00:09,266-Speed 3348.84 samples/sec   Loss 0.2776   LearningRate 0.0025   Epoch: 16   Global Step: 281240   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 05:00:12,330-Speed 3342.94 samples/sec   Loss 0.2882   LearningRate 0.0025   Epoch: 16   Global Step: 281250   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 05:00:15,382-Speed 3355.73 samples/sec   Loss 0.2644   LearningRate 0.0025   Epoch: 16   Global Step: 281260   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 05:00:18,462-Speed 3325.41 samples/sec   Loss 0.2800   LearningRate 0.0025   Epoch: 16   Global Step: 281270   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 05:00:21,524-Speed 3345.04 samples/sec   Loss 0.2573   LearningRate 0.0025   Epoch: 16   Global Step: 281280   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 05:00:24,593-Speed 3337.37 samples/sec   Loss 0.2878   LearningRate 0.0025   Epoch: 16   Global Step: 281290   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 05:00:27,667-Speed 3332.65 samples/sec   Loss 0.2786   LearningRate 0.0025   Epoch: 16   Global Step: 281300   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 05:00:30,731-Speed 3342.71 samples/sec   Loss 0.2733   LearningRate 0.0025   Epoch: 16   Global Step: 281310   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 05:00:33,795-Speed 3342.66 samples/sec   Loss 0.2830   LearningRate 0.0025   Epoch: 16   Global Step: 281320   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 05:00:36,858-Speed 3343.93 samples/sec   Loss 0.2749   LearningRate 0.0025   Epoch: 16   Global Step: 281330   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 05:00:39,977-Speed 3283.14 samples/sec   Loss 0.2640   LearningRate 0.0025   Epoch: 16   Global Step: 281340   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 05:00:43,055-Speed 3328.33 samples/sec   Loss 0.2721   LearningRate 0.0025   Epoch: 16   Global Step: 281350   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 05:00:46,135-Speed 3325.28 samples/sec   Loss 0.2758   LearningRate 0.0025   Epoch: 16   Global Step: 281360   Fp16 Grad Scale: 262144   Required: 6 hours
Training: 2022-04-12 05:00:49,305-Speed 3231.10 samples/sec   Loss 0.2850   LearningRate 0.0025   Epoch: 16   Global Step: 281370   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 05:00:52,376-Speed 3334.90 samples/sec   Loss 0.2785   LearningRate 0.0025   Epoch: 16   Global Step: 281380   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 05:00:55,449-Speed 3333.68 samples/sec   Loss 0.2641   LearningRate 0.0025   Epoch: 16   Global Step: 281390   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 05:00:58,525-Speed 3330.00 samples/sec   Loss 0.2992   LearningRate 0.0025   Epoch: 16   Global Step: 281400   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 05:01:01,592-Speed 3338.39 samples/sec   Loss 0.2833   LearningRate 0.0025   Epoch: 16   Global Step: 281410   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-12 05:01:04,655-Speed 3345.00 samples/sec   Loss 0.2836   LearningRate 0.0025   Epoch: 16   Global Step: 281420   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-12 05:01:07,744-Speed 3315.41 samples/sec   Loss 0.3037   LearningRate 0.0025   Epoch: 16   Global Step: 281430   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-12 05:01:10,856-Speed 3291.11 samples/sec   Loss 0.2635   LearningRate 0.0025   Epoch: 16   Global Step: 281440   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-12 05:01:13,932-Speed 3329.92 samples/sec   Loss 0.2714   LearningRate 0.0025   Epoch: 16   Global Step: 281450   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-12 05:01:17,044-Speed 3290.84 samples/sec   Loss 0.2630   LearningRate 0.0025   Epoch: 16   Global Step: 281460   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-12 05:01:20,110-Speed 3340.27 samples/sec   Loss 0.2653   LearningRate 0.0025   Epoch: 16   Global Step: 281470   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-12 05:01:23,183-Speed 3333.99 samples/sec   Loss 0.2646   LearningRate 0.0025   Epoch: 16   Global Step: 281480   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-12 05:01:26,301-Speed 3284.62 samples/sec   Loss 0.2771   LearningRate 0.0025   Epoch: 16   Global Step: 281490   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-12 05:01:29,362-Speed 3345.57 samples/sec   Loss 0.2723   LearningRate 0.0025   Epoch: 16   Global Step: 281500   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-12 05:01:32,430-Speed 3338.96 samples/sec   Loss 0.2848   LearningRate 0.0025   Epoch: 16   Global Step: 281510   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 05:01:35,550-Speed 3282.67 samples/sec   Loss 0.2701   LearningRate 0.0025   Epoch: 16   Global Step: 281520   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 05:01:38,623-Speed 3332.62 samples/sec   Loss 0.2627   LearningRate 0.0025   Epoch: 16   Global Step: 281530   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 05:01:41,694-Speed 3335.88 samples/sec   Loss 0.2692   LearningRate 0.0025   Epoch: 16   Global Step: 281540   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 05:01:44,843-Speed 3252.69 samples/sec   Loss 0.2556   LearningRate 0.0025   Epoch: 16   Global Step: 281550   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 05:01:47,933-Speed 3314.85 samples/sec   Loss 0.2800   LearningRate 0.0025   Epoch: 16   Global Step: 281560   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 05:01:51,038-Speed 3298.76 samples/sec   Loss 0.2625   LearningRate 0.0024   Epoch: 16   Global Step: 281570   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 05:01:54,249-Speed 3189.71 samples/sec   Loss 0.2862   LearningRate 0.0024   Epoch: 16   Global Step: 281580   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 05:01:57,377-Speed 3274.41 samples/sec   Loss 0.2592   LearningRate 0.0024   Epoch: 16   Global Step: 281590   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 05:02:00,439-Speed 3344.65 samples/sec   Loss 0.2598   LearningRate 0.0024   Epoch: 16   Global Step: 281600   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 05:02:03,524-Speed 3319.49 samples/sec   Loss 0.2567   LearningRate 0.0024   Epoch: 16   Global Step: 281610   Fp16 Grad Scale: 262144   Required: 6 hours
Training: 2022-04-12 05:02:06,574-Speed 3358.56 samples/sec   Loss 0.2841   LearningRate 0.0024   Epoch: 16   Global Step: 281620   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 05:02:09,645-Speed 3335.74 samples/sec   Loss 0.2648   LearningRate 0.0024   Epoch: 16   Global Step: 281630   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 05:02:12,714-Speed 3337.22 samples/sec   Loss 0.2664   LearningRate 0.0024   Epoch: 16   Global Step: 281640   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 05:02:15,781-Speed 3339.22 samples/sec   Loss 0.2600   LearningRate 0.0024   Epoch: 16   Global Step: 281650   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 05:02:18,857-Speed 3329.55 samples/sec   Loss 0.2842   LearningRate 0.0024   Epoch: 16   Global Step: 281660   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 05:02:21,954-Speed 3307.62 samples/sec   Loss 0.2747   LearningRate 0.0024   Epoch: 16   Global Step: 281670   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 05:02:25,034-Speed 3325.13 samples/sec   Loss 0.2878   LearningRate 0.0024   Epoch: 16   Global Step: 281680   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 05:02:28,152-Speed 3285.59 samples/sec   Loss 0.2714   LearningRate 0.0024   Epoch: 16   Global Step: 281690   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 05:02:31,258-Speed 3296.97 samples/sec   Loss 0.2738   LearningRate 0.0024   Epoch: 16   Global Step: 281700   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 05:02:34,370-Speed 3291.77 samples/sec   Loss 0.2792   LearningRate 0.0024   Epoch: 16   Global Step: 281710   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 05:02:37,428-Speed 3348.38 samples/sec   Loss 0.2719   LearningRate 0.0024   Epoch: 16   Global Step: 281720   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 05:02:40,499-Speed 3335.59 samples/sec   Loss 0.2737   LearningRate 0.0024   Epoch: 16   Global Step: 281730   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 05:02:43,567-Speed 3338.86 samples/sec   Loss 0.2464   LearningRate 0.0024   Epoch: 16   Global Step: 281740   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 05:02:46,634-Speed 3339.26 samples/sec   Loss 0.2782   LearningRate 0.0024   Epoch: 16   Global Step: 281750   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-12 05:02:49,698-Speed 3343.46 samples/sec   Loss 0.2715   LearningRate 0.0024   Epoch: 16   Global Step: 281760   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-12 05:02:52,796-Speed 3305.93 samples/sec   Loss 0.2764   LearningRate 0.0024   Epoch: 16   Global Step: 281770   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-12 05:02:55,875-Speed 3325.65 samples/sec   Loss 0.2846   LearningRate 0.0024   Epoch: 16   Global Step: 281780   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-12 05:02:58,949-Speed 3331.90 samples/sec   Loss 0.2527   LearningRate 0.0024   Epoch: 16   Global Step: 281790   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-12 05:03:02,016-Speed 3340.07 samples/sec   Loss 0.2827   LearningRate 0.0024   Epoch: 16   Global Step: 281800   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-12 05:03:05,090-Speed 3331.14 samples/sec   Loss 0.2842   LearningRate 0.0024   Epoch: 16   Global Step: 281810   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-12 05:03:08,246-Speed 3246.41 samples/sec   Loss 0.2567   LearningRate 0.0024   Epoch: 16   Global Step: 281820   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-12 05:03:11,310-Speed 3342.08 samples/sec   Loss 0.2939   LearningRate 0.0024   Epoch: 16   Global Step: 281830   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-12 05:03:14,380-Speed 3336.71 samples/sec   Loss 0.2526   LearningRate 0.0024   Epoch: 16   Global Step: 281840   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-12 05:03:17,473-Speed 3311.00 samples/sec   Loss 0.2699   LearningRate 0.0024   Epoch: 16   Global Step: 281850   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-12 05:03:20,538-Speed 3342.04 samples/sec   Loss 0.2763   LearningRate 0.0024   Epoch: 16   Global Step: 281860   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:03:23,602-Speed 3342.48 samples/sec   Loss 0.2862   LearningRate 0.0024   Epoch: 16   Global Step: 281870   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:03:26,739-Speed 3264.74 samples/sec   Loss 0.2789   LearningRate 0.0024   Epoch: 16   Global Step: 281880   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:03:29,813-Speed 3332.07 samples/sec   Loss 0.2734   LearningRate 0.0024   Epoch: 16   Global Step: 281890   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:03:32,905-Speed 3312.23 samples/sec   Loss 0.2703   LearningRate 0.0024   Epoch: 16   Global Step: 281900   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:03:35,965-Speed 3347.29 samples/sec   Loss 0.2809   LearningRate 0.0024   Epoch: 16   Global Step: 281910   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:03:39,046-Speed 3324.63 samples/sec   Loss 0.2767   LearningRate 0.0024   Epoch: 16   Global Step: 281920   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:03:42,109-Speed 3343.80 samples/sec   Loss 0.2868   LearningRate 0.0024   Epoch: 16   Global Step: 281930   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:03:45,196-Speed 3317.80 samples/sec   Loss 0.2787   LearningRate 0.0024   Epoch: 16   Global Step: 281940   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:03:48,316-Speed 3282.54 samples/sec   Loss 0.2857   LearningRate 0.0024   Epoch: 16   Global Step: 281950   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:03:51,372-Speed 3352.27 samples/sec   Loss 0.2405   LearningRate 0.0024   Epoch: 16   Global Step: 281960   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:03:54,436-Speed 3342.42 samples/sec   Loss 0.2770   LearningRate 0.0024   Epoch: 16   Global Step: 281970   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:03:57,520-Speed 3321.36 samples/sec   Loss 0.2766   LearningRate 0.0024   Epoch: 16   Global Step: 281980   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:04:00,581-Speed 3345.19 samples/sec   Loss 0.2607   LearningRate 0.0024   Epoch: 16   Global Step: 281990   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-12 05:04:03,704-Speed 3280.33 samples/sec   Loss 0.2820   LearningRate 0.0024   Epoch: 16   Global Step: 282000   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-12 05:04:47,587-[lfw][282000]XNorm: 19.686614
Training: 2022-04-12 05:04:47,588-[lfw][282000]Accuracy-Flip: 0.99800+-0.00221
Training: 2022-04-12 05:04:47,588-[lfw][282000]Accuracy-Highest: 0.99817
Training: 2022-04-12 05:05:38,592-[cfp_fp][282000]XNorm: 21.385790
Training: 2022-04-12 05:05:38,593-[cfp_fp][282000]Accuracy-Flip: 0.99143+-0.00378
Training: 2022-04-12 05:05:38,593-[cfp_fp][282000]Accuracy-Highest: 0.99186
Training: 2022-04-12 05:06:22,486-[agedb_30][282000]XNorm: 21.903421
Training: 2022-04-12 05:06:22,486-[agedb_30][282000]Accuracy-Flip: 0.98600+-0.00611
Training: 2022-04-12 05:06:22,487-[agedb_30][282000]Accuracy-Highest: 0.98650
Training: 2022-04-12 05:06:25,580-Speed 72.18 samples/sec   Loss 0.2606   LearningRate 0.0024   Epoch: 16   Global Step: 282010   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-12 05:06:28,703-Speed 3280.16 samples/sec   Loss 0.2690   LearningRate 0.0024   Epoch: 16   Global Step: 282020   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-12 05:06:31,817-Speed 3289.59 samples/sec   Loss 0.2563   LearningRate 0.0024   Epoch: 16   Global Step: 282030   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-12 05:06:34,902-Speed 3319.45 samples/sec   Loss 0.2853   LearningRate 0.0024   Epoch: 16   Global Step: 282040   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-12 05:06:37,975-Speed 3332.36 samples/sec   Loss 0.2675   LearningRate 0.0024   Epoch: 16   Global Step: 282050   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-12 05:06:41,032-Speed 3350.65 samples/sec   Loss 0.2822   LearningRate 0.0024   Epoch: 16   Global Step: 282060   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-12 05:06:44,113-Speed 3324.24 samples/sec   Loss 0.2807   LearningRate 0.0024   Epoch: 16   Global Step: 282070   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-12 05:06:47,169-Speed 3352.10 samples/sec   Loss 0.2758   LearningRate 0.0024   Epoch: 16   Global Step: 282080   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-12 05:06:50,254-Speed 3319.99 samples/sec   Loss 0.2808   LearningRate 0.0024   Epoch: 16   Global Step: 282090   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-12 05:06:53,408-Speed 3247.12 samples/sec   Loss 0.2749   LearningRate 0.0024   Epoch: 16   Global Step: 282100   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-12 05:06:56,463-Speed 3352.63 samples/sec   Loss 0.2619   LearningRate 0.0024   Epoch: 16   Global Step: 282110   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-12 05:06:59,531-Speed 3338.80 samples/sec   Loss 0.2522   LearningRate 0.0024   Epoch: 16   Global Step: 282120   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-12 05:07:02,602-Speed 3335.22 samples/sec   Loss 0.2781   LearningRate 0.0024   Epoch: 16   Global Step: 282130   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-12 05:07:05,683-Speed 3324.51 samples/sec   Loss 0.2755   LearningRate 0.0024   Epoch: 16   Global Step: 282140   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-12 05:07:08,826-Speed 3258.40 samples/sec   Loss 0.2665   LearningRate 0.0024   Epoch: 16   Global Step: 282150   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-12 05:07:11,923-Speed 3306.58 samples/sec   Loss 0.2796   LearningRate 0.0024   Epoch: 16   Global Step: 282160   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-12 05:07:15,100-Speed 3223.52 samples/sec   Loss 0.2858   LearningRate 0.0024   Epoch: 16   Global Step: 282170   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-12 05:07:18,174-Speed 3332.64 samples/sec   Loss 0.2783   LearningRate 0.0024   Epoch: 16   Global Step: 282180   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-12 05:07:21,257-Speed 3321.84 samples/sec   Loss 0.2547   LearningRate 0.0024   Epoch: 16   Global Step: 282190   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-12 05:07:24,346-Speed 3315.86 samples/sec   Loss 0.2933   LearningRate 0.0024   Epoch: 16   Global Step: 282200   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:07:27,413-Speed 3339.99 samples/sec   Loss 0.2974   LearningRate 0.0024   Epoch: 16   Global Step: 282210   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:07:30,479-Speed 3340.53 samples/sec   Loss 0.2775   LearningRate 0.0024   Epoch: 16   Global Step: 282220   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:07:33,542-Speed 3343.52 samples/sec   Loss 0.2764   LearningRate 0.0024   Epoch: 16   Global Step: 282230   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:07:36,602-Speed 3347.46 samples/sec   Loss 0.2928   LearningRate 0.0024   Epoch: 16   Global Step: 282240   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:07:39,713-Speed 3291.81 samples/sec   Loss 0.2768   LearningRate 0.0024   Epoch: 16   Global Step: 282250   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:07:42,908-Speed 3206.25 samples/sec   Loss 0.2815   LearningRate 0.0024   Epoch: 16   Global Step: 282260   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:07:46,071-Speed 3238.43 samples/sec   Loss 0.2624   LearningRate 0.0024   Epoch: 16   Global Step: 282270   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:07:49,149-Speed 3327.79 samples/sec   Loss 0.2844   LearningRate 0.0024   Epoch: 16   Global Step: 282280   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:07:52,207-Speed 3348.54 samples/sec   Loss 0.2731   LearningRate 0.0024   Epoch: 16   Global Step: 282290   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-12 05:07:55,335-Speed 3274.38 samples/sec   Loss 0.2741   LearningRate 0.0024   Epoch: 16   Global Step: 282300   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-12 05:07:58,437-Speed 3302.00 samples/sec   Loss 0.2908   LearningRate 0.0024   Epoch: 16   Global Step: 282310   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-12 05:08:01,518-Speed 3323.69 samples/sec   Loss 0.2474   LearningRate 0.0024   Epoch: 16   Global Step: 282320   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-12 05:08:04,585-Speed 3339.91 samples/sec   Loss 0.2844   LearningRate 0.0024   Epoch: 16   Global Step: 282330   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-12 05:08:07,677-Speed 3312.44 samples/sec   Loss 0.2851   LearningRate 0.0024   Epoch: 16   Global Step: 282340   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-12 05:08:10,778-Speed 3303.30 samples/sec   Loss 0.2751   LearningRate 0.0024   Epoch: 16   Global Step: 282350   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-12 05:08:13,838-Speed 3347.29 samples/sec   Loss 0.2704   LearningRate 0.0024   Epoch: 16   Global Step: 282360   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-12 05:08:16,899-Speed 3345.84 samples/sec   Loss 0.2774   LearningRate 0.0024   Epoch: 16   Global Step: 282370   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-12 05:08:19,960-Speed 3346.52 samples/sec   Loss 0.2557   LearningRate 0.0024   Epoch: 16   Global Step: 282380   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-12 05:08:23,017-Speed 3349.51 samples/sec   Loss 0.2863   LearningRate 0.0024   Epoch: 16   Global Step: 282390   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:08:26,165-Speed 3253.43 samples/sec   Loss 0.2735   LearningRate 0.0024   Epoch: 16   Global Step: 282400   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:08:29,225-Speed 3348.12 samples/sec   Loss 0.2763   LearningRate 0.0024   Epoch: 16   Global Step: 282410   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:08:32,281-Speed 3351.05 samples/sec   Loss 0.2744   LearningRate 0.0024   Epoch: 16   Global Step: 282420   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:08:35,363-Speed 3323.85 samples/sec   Loss 0.2556   LearningRate 0.0024   Epoch: 16   Global Step: 282430   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:08:38,420-Speed 3349.78 samples/sec   Loss 0.2812   LearningRate 0.0024   Epoch: 16   Global Step: 282440   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:08:41,478-Speed 3349.30 samples/sec   Loss 0.2647   LearningRate 0.0024   Epoch: 16   Global Step: 282450   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:08:44,563-Speed 3320.76 samples/sec   Loss 0.2687   LearningRate 0.0024   Epoch: 16   Global Step: 282460   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:08:47,626-Speed 3343.75 samples/sec   Loss 0.2669   LearningRate 0.0024   Epoch: 16   Global Step: 282470   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:08:50,740-Speed 3288.54 samples/sec   Loss 0.2696   LearningRate 0.0024   Epoch: 16   Global Step: 282480   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:08:53,839-Speed 3304.72 samples/sec   Loss 0.2537   LearningRate 0.0024   Epoch: 16   Global Step: 282490   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:08:56,917-Speed 3328.35 samples/sec   Loss 0.2794   LearningRate 0.0024   Epoch: 16   Global Step: 282500   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:08:59,984-Speed 3339.18 samples/sec   Loss 0.2721   LearningRate 0.0024   Epoch: 16   Global Step: 282510   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:09:03,049-Speed 3341.38 samples/sec   Loss 0.2734   LearningRate 0.0024   Epoch: 16   Global Step: 282520   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:09:06,134-Speed 3320.55 samples/sec   Loss 0.2662   LearningRate 0.0024   Epoch: 16   Global Step: 282530   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-12 05:09:09,264-Speed 3272.75 samples/sec   Loss 0.2519   LearningRate 0.0024   Epoch: 16   Global Step: 282540   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-12 05:09:12,341-Speed 3328.44 samples/sec   Loss 0.2802   LearningRate 0.0024   Epoch: 16   Global Step: 282550   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-12 05:09:15,400-Speed 3348.31 samples/sec   Loss 0.2841   LearningRate 0.0024   Epoch: 16   Global Step: 282560   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-12 05:09:18,465-Speed 3341.19 samples/sec   Loss 0.2778   LearningRate 0.0024   Epoch: 16   Global Step: 282570   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-12 05:09:21,526-Speed 3345.65 samples/sec   Loss 0.2710   LearningRate 0.0024   Epoch: 16   Global Step: 282580   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-12 05:09:24,595-Speed 3337.90 samples/sec   Loss 0.2786   LearningRate 0.0024   Epoch: 16   Global Step: 282590   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-12 05:09:27,700-Speed 3297.88 samples/sec   Loss 0.2796   LearningRate 0.0024   Epoch: 16   Global Step: 282600   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-12 05:09:30,778-Speed 3328.14 samples/sec   Loss 0.2692   LearningRate 0.0024   Epoch: 16   Global Step: 282610   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-12 05:09:33,882-Speed 3299.41 samples/sec   Loss 0.2696   LearningRate 0.0024   Epoch: 16   Global Step: 282620   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-12 05:09:36,973-Speed 3314.05 samples/sec   Loss 0.2611   LearningRate 0.0024   Epoch: 16   Global Step: 282630   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:09:40,076-Speed 3300.92 samples/sec   Loss 0.2793   LearningRate 0.0024   Epoch: 16   Global Step: 282640   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-12 05:09:43,158-Speed 3322.75 samples/sec   Loss 0.2760   LearningRate 0.0023   Epoch: 16   Global Step: 282650   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-12 05:09:46,233-Speed 3331.01 samples/sec   Loss 0.2683   LearningRate 0.0023   Epoch: 16   Global Step: 282660   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-12 05:09:49,297-Speed 3342.96 samples/sec   Loss 0.2908   LearningRate 0.0023   Epoch: 16   Global Step: 282670   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-12 05:09:52,364-Speed 3339.56 samples/sec   Loss 0.2769   LearningRate 0.0023   Epoch: 16   Global Step: 282680   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-12 05:09:55,429-Speed 3341.50 samples/sec   Loss 0.2793   LearningRate 0.0023   Epoch: 16   Global Step: 282690   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-12 05:09:58,499-Speed 3335.87 samples/sec   Loss 0.2611   LearningRate 0.0023   Epoch: 16   Global Step: 282700   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-12 05:10:01,560-Speed 3347.08 samples/sec   Loss 0.2398   LearningRate 0.0023   Epoch: 16   Global Step: 282710   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-12 05:10:04,667-Speed 3296.48 samples/sec   Loss 0.2837   LearningRate 0.0023   Epoch: 16   Global Step: 282720   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-12 05:10:07,771-Speed 3299.85 samples/sec   Loss 0.2746   LearningRate 0.0023   Epoch: 16   Global Step: 282730   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-12 05:10:10,850-Speed 3326.10 samples/sec   Loss 0.2718   LearningRate 0.0023   Epoch: 16   Global Step: 282740   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:10:13,910-Speed 3347.07 samples/sec   Loss 0.2658   LearningRate 0.0023   Epoch: 16   Global Step: 282750   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:10:16,995-Speed 3320.11 samples/sec   Loss 0.2811   LearningRate 0.0023   Epoch: 16   Global Step: 282760   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:10:20,073-Speed 3327.79 samples/sec   Loss 0.2580   LearningRate 0.0023   Epoch: 16   Global Step: 282770   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:10:23,136-Speed 3342.91 samples/sec   Loss 0.2683   LearningRate 0.0023   Epoch: 16   Global Step: 282780   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:10:26,271-Speed 3266.84 samples/sec   Loss 0.2831   LearningRate 0.0023   Epoch: 16   Global Step: 282790   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:10:29,403-Speed 3270.77 samples/sec   Loss 0.2839   LearningRate 0.0023   Epoch: 16   Global Step: 282800   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:10:32,530-Speed 3276.02 samples/sec   Loss 0.2762   LearningRate 0.0023   Epoch: 16   Global Step: 282810   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:10:35,588-Speed 3348.93 samples/sec   Loss 0.2611   LearningRate 0.0023   Epoch: 16   Global Step: 282820   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:10:38,646-Speed 3349.67 samples/sec   Loss 0.2848   LearningRate 0.0023   Epoch: 16   Global Step: 282830   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:10:41,719-Speed 3333.14 samples/sec   Loss 0.2714   LearningRate 0.0023   Epoch: 16   Global Step: 282840   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:10:44,779-Speed 3346.31 samples/sec   Loss 0.2811   LearningRate 0.0023   Epoch: 16   Global Step: 282850   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:10:47,882-Speed 3301.35 samples/sec   Loss 0.2939   LearningRate 0.0023   Epoch: 16   Global Step: 282860   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:10:50,945-Speed 3343.46 samples/sec   Loss 0.2871   LearningRate 0.0023   Epoch: 16   Global Step: 282870   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:10:54,017-Speed 3333.94 samples/sec   Loss 0.3004   LearningRate 0.0023   Epoch: 16   Global Step: 282880   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:10:57,088-Speed 3335.66 samples/sec   Loss 0.2652   LearningRate 0.0023   Epoch: 16   Global Step: 282890   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:11:00,156-Speed 3338.86 samples/sec   Loss 0.2775   LearningRate 0.0023   Epoch: 16   Global Step: 282900   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:11:03,227-Speed 3334.88 samples/sec   Loss 0.2765   LearningRate 0.0023   Epoch: 16   Global Step: 282910   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:11:06,291-Speed 3342.54 samples/sec   Loss 0.2690   LearningRate 0.0023   Epoch: 16   Global Step: 282920   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:11:09,354-Speed 3344.32 samples/sec   Loss 0.2819   LearningRate 0.0023   Epoch: 16   Global Step: 282930   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:11:12,409-Speed 3352.48 samples/sec   Loss 0.2689   LearningRate 0.0023   Epoch: 16   Global Step: 282940   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:11:15,469-Speed 3347.13 samples/sec   Loss 0.2743   LearningRate 0.0023   Epoch: 16   Global Step: 282950   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:11:18,551-Speed 3323.44 samples/sec   Loss 0.2643   LearningRate 0.0023   Epoch: 16   Global Step: 282960   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:11:21,628-Speed 3328.42 samples/sec   Loss 0.2796   LearningRate 0.0023   Epoch: 16   Global Step: 282970   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:11:24,688-Speed 3347.00 samples/sec   Loss 0.3014   LearningRate 0.0023   Epoch: 16   Global Step: 282980   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:11:27,746-Speed 3349.04 samples/sec   Loss 0.2642   LearningRate 0.0023   Epoch: 16   Global Step: 282990   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:11:30,807-Speed 3346.69 samples/sec   Loss 0.2467   LearningRate 0.0023   Epoch: 16   Global Step: 283000   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:11:33,937-Speed 3272.10 samples/sec   Loss 0.2662   LearningRate 0.0023   Epoch: 16   Global Step: 283010   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:11:37,049-Speed 3291.72 samples/sec   Loss 0.2751   LearningRate 0.0023   Epoch: 16   Global Step: 283020   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:11:40,163-Speed 3288.08 samples/sec   Loss 0.2821   LearningRate 0.0023   Epoch: 16   Global Step: 283030   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:11:43,255-Speed 3312.65 samples/sec   Loss 0.2848   LearningRate 0.0023   Epoch: 16   Global Step: 283040   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:11:46,333-Speed 3327.53 samples/sec   Loss 0.2874   LearningRate 0.0023   Epoch: 16   Global Step: 283050   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:11:49,463-Speed 3271.98 samples/sec   Loss 0.2717   LearningRate 0.0023   Epoch: 16   Global Step: 283060   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:11:52,633-Speed 3232.05 samples/sec   Loss 0.2721   LearningRate 0.0023   Epoch: 16   Global Step: 283070   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:11:55,744-Speed 3292.32 samples/sec   Loss 0.2702   LearningRate 0.0023   Epoch: 16   Global Step: 283080   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:11:58,814-Speed 3335.47 samples/sec   Loss 0.2871   LearningRate 0.0023   Epoch: 16   Global Step: 283090   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:12:01,902-Speed 3317.32 samples/sec   Loss 0.2730   LearningRate 0.0023   Epoch: 16   Global Step: 283100   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:12:05,076-Speed 3226.33 samples/sec   Loss 0.2863   LearningRate 0.0023   Epoch: 16   Global Step: 283110   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:12:08,189-Speed 3290.95 samples/sec   Loss 0.2695   LearningRate 0.0023   Epoch: 16   Global Step: 283120   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:12:11,251-Speed 3344.58 samples/sec   Loss 0.2688   LearningRate 0.0023   Epoch: 16   Global Step: 283130   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:12:14,329-Speed 3326.92 samples/sec   Loss 0.2776   LearningRate 0.0023   Epoch: 16   Global Step: 283140   Fp16 Grad Scale: 262144   Required: 5 hours
Training: 2022-04-12 05:12:17,459-Speed 3272.48 samples/sec   Loss 0.2933   LearningRate 0.0023   Epoch: 16   Global Step: 283150   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:12:20,520-Speed 3346.66 samples/sec   Loss 0.2934   LearningRate 0.0023   Epoch: 16   Global Step: 283160   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:12:23,707-Speed 3213.58 samples/sec   Loss 0.2816   LearningRate 0.0023   Epoch: 16   Global Step: 283170   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:12:26,784-Speed 3328.26 samples/sec   Loss 0.3008   LearningRate 0.0023   Epoch: 16   Global Step: 283180   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:12:29,862-Speed 3328.49 samples/sec   Loss 0.2910   LearningRate 0.0023   Epoch: 16   Global Step: 283190   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:12:32,950-Speed 3316.82 samples/sec   Loss 0.2633   LearningRate 0.0023   Epoch: 16   Global Step: 283200   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:12:36,146-Speed 3203.88 samples/sec   Loss 0.2724   LearningRate 0.0023   Epoch: 16   Global Step: 283210   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-12 05:12:39,256-Speed 3293.94 samples/sec   Loss 0.2574   LearningRate 0.0023   Epoch: 16   Global Step: 283220   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-12 05:12:42,320-Speed 3342.81 samples/sec   Loss 0.2832   LearningRate 0.0023   Epoch: 16   Global Step: 283230   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-12 05:12:45,395-Speed 3330.08 samples/sec   Loss 0.2637   LearningRate 0.0023   Epoch: 16   Global Step: 283240   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-12 05:12:48,465-Speed 3336.97 samples/sec   Loss 0.2963   LearningRate 0.0023   Epoch: 16   Global Step: 283250   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-12 05:12:51,529-Speed 3342.20 samples/sec   Loss 0.2726   LearningRate 0.0023   Epoch: 16   Global Step: 283260   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-12 05:12:54,593-Speed 3343.23 samples/sec   Loss 0.2775   LearningRate 0.0023   Epoch: 16   Global Step: 283270   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-12 05:12:57,665-Speed 3333.31 samples/sec   Loss 0.2715   LearningRate 0.0023   Epoch: 16   Global Step: 283280   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-12 05:13:00,728-Speed 3344.30 samples/sec   Loss 0.2621   LearningRate 0.0023   Epoch: 16   Global Step: 283290   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-12 05:13:03,806-Speed 3327.25 samples/sec   Loss 0.3042   LearningRate 0.0023   Epoch: 16   Global Step: 283300   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-12 05:13:06,901-Speed 3309.92 samples/sec   Loss 0.2615   LearningRate 0.0023   Epoch: 16   Global Step: 283310   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:13:09,978-Speed 3328.69 samples/sec   Loss 0.2832   LearningRate 0.0023   Epoch: 16   Global Step: 283320   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:13:13,046-Speed 3338.41 samples/sec   Loss 0.2675   LearningRate 0.0023   Epoch: 16   Global Step: 283330   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-12 05:13:16,152-Speed 3297.49 samples/sec   Loss 0.2626   LearningRate 0.0023   Epoch: 16   Global Step: 283340   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-12 05:13:19,227-Speed 3330.47 samples/sec   Loss 0.2652   LearningRate 0.0023   Epoch: 16   Global Step: 283350   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-12 05:13:22,320-Speed 3311.34 samples/sec   Loss 0.2570   LearningRate 0.0023   Epoch: 16   Global Step: 283360   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-12 05:13:25,382-Speed 3345.49 samples/sec   Loss 0.2760   LearningRate 0.0023   Epoch: 16   Global Step: 283370   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-12 05:13:28,479-Speed 3306.37 samples/sec   Loss 0.2821   LearningRate 0.0023   Epoch: 16   Global Step: 283380   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-12 05:13:31,540-Speed 3347.14 samples/sec   Loss 0.2775   LearningRate 0.0023   Epoch: 16   Global Step: 283390   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-12 05:13:34,608-Speed 3337.78 samples/sec   Loss 0.2709   LearningRate 0.0023   Epoch: 16   Global Step: 283400   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-12 05:13:37,678-Speed 3336.39 samples/sec   Loss 0.2765   LearningRate 0.0023   Epoch: 16   Global Step: 283410   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-12 05:13:40,748-Speed 3335.72 samples/sec   Loss 0.2751   LearningRate 0.0023   Epoch: 16   Global Step: 283420   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-12 05:13:43,835-Speed 3318.56 samples/sec   Loss 0.2710   LearningRate 0.0023   Epoch: 16   Global Step: 283430   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:13:46,889-Speed 3354.01 samples/sec   Loss 0.2736   LearningRate 0.0023   Epoch: 16   Global Step: 283440   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:13:49,961-Speed 3333.72 samples/sec   Loss 0.2688   LearningRate 0.0023   Epoch: 16   Global Step: 283450   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:13:53,089-Speed 3274.84 samples/sec   Loss 0.2796   LearningRate 0.0023   Epoch: 16   Global Step: 283460   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:13:56,175-Speed 3318.64 samples/sec   Loss 0.2603   LearningRate 0.0023   Epoch: 16   Global Step: 283470   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-12 05:13:59,246-Speed 3334.57 samples/sec   Loss 0.2693   LearningRate 0.0023   Epoch: 16   Global Step: 283480   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-12 05:14:02,366-Speed 3283.05 samples/sec   Loss 0.2852   LearningRate 0.0023   Epoch: 16   Global Step: 283490   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-12 05:14:05,539-Speed 3227.87 samples/sec   Loss 0.2985   LearningRate 0.0023   Epoch: 16   Global Step: 283500   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-12 05:14:08,660-Speed 3282.37 samples/sec   Loss 0.2750   LearningRate 0.0023   Epoch: 16   Global Step: 283510   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-12 05:14:11,790-Speed 3272.58 samples/sec   Loss 0.2864   LearningRate 0.0023   Epoch: 16   Global Step: 283520   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-12 05:14:14,907-Speed 3285.74 samples/sec   Loss 0.2799   LearningRate 0.0023   Epoch: 16   Global Step: 283530   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-12 05:14:18,005-Speed 3306.16 samples/sec   Loss 0.2777   LearningRate 0.0023   Epoch: 16   Global Step: 283540   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-12 05:14:21,073-Speed 3337.60 samples/sec   Loss 0.2971   LearningRate 0.0023   Epoch: 16   Global Step: 283550   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-12 05:14:24,137-Speed 3343.03 samples/sec   Loss 0.2709   LearningRate 0.0023   Epoch: 16   Global Step: 283560   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-12 05:14:27,276-Speed 3262.82 samples/sec   Loss 0.2860   LearningRate 0.0023   Epoch: 16   Global Step: 283570   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:14:30,337-Speed 3346.41 samples/sec   Loss 0.2841   LearningRate 0.0023   Epoch: 16   Global Step: 283580   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:14:33,411-Speed 3331.39 samples/sec   Loss 0.2685   LearningRate 0.0023   Epoch: 16   Global Step: 283590   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:14:36,511-Speed 3304.44 samples/sec   Loss 0.2664   LearningRate 0.0023   Epoch: 16   Global Step: 283600   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:14:39,572-Speed 3346.14 samples/sec   Loss 0.2720   LearningRate 0.0023   Epoch: 16   Global Step: 283610   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:14:42,637-Speed 3342.25 samples/sec   Loss 0.2719   LearningRate 0.0023   Epoch: 16   Global Step: 283620   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:14:45,721-Speed 3320.75 samples/sec   Loss 0.2673   LearningRate 0.0023   Epoch: 16   Global Step: 283630   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:14:48,849-Speed 3273.61 samples/sec   Loss 0.2646   LearningRate 0.0023   Epoch: 16   Global Step: 283640   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:14:51,924-Speed 3331.46 samples/sec   Loss 0.2686   LearningRate 0.0023   Epoch: 16   Global Step: 283650   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:14:55,017-Speed 3311.17 samples/sec   Loss 0.2655   LearningRate 0.0023   Epoch: 16   Global Step: 283660   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:14:58,102-Speed 3320.61 samples/sec   Loss 0.2848   LearningRate 0.0023   Epoch: 16   Global Step: 283670   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:15:01,186-Speed 3320.89 samples/sec   Loss 0.2574   LearningRate 0.0023   Epoch: 16   Global Step: 283680   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:15:04,258-Speed 3333.44 samples/sec   Loss 0.2730   LearningRate 0.0023   Epoch: 16   Global Step: 283690   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:15:07,341-Speed 3323.00 samples/sec   Loss 0.2808   LearningRate 0.0023   Epoch: 16   Global Step: 283700   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:15:10,407-Speed 3340.73 samples/sec   Loss 0.2729   LearningRate 0.0023   Epoch: 16   Global Step: 283710   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:15:13,482-Speed 3330.66 samples/sec   Loss 0.2722   LearningRate 0.0023   Epoch: 16   Global Step: 283720   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:15:16,568-Speed 3318.38 samples/sec   Loss 0.2792   LearningRate 0.0023   Epoch: 16   Global Step: 283730   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:15:19,882-Speed 3090.30 samples/sec   Loss 0.2821   LearningRate 0.0023   Epoch: 16   Global Step: 283740   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-12 05:15:58,170-Speed 267.46 samples/sec   Loss 0.2574   LearningRate 0.0022   Epoch: 17   Global Step: 283750   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-12 05:16:01,867-Speed 2770.26 samples/sec   Loss 0.1665   LearningRate 0.0022   Epoch: 17   Global Step: 283760   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-12 05:16:05,135-Speed 3133.98 samples/sec   Loss 0.1706   LearningRate 0.0022   Epoch: 17   Global Step: 283770   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-12 05:16:08,238-Speed 3302.24 samples/sec   Loss 0.1615   LearningRate 0.0022   Epoch: 17   Global Step: 283780   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-12 05:16:11,343-Speed 3298.01 samples/sec   Loss 0.1544   LearningRate 0.0022   Epoch: 17   Global Step: 283790   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-12 05:16:14,428-Speed 3320.17 samples/sec   Loss 0.1702   LearningRate 0.0022   Epoch: 17   Global Step: 283800   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-12 05:16:17,565-Speed 3264.36 samples/sec   Loss 0.1761   LearningRate 0.0022   Epoch: 17   Global Step: 283810   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-12 05:16:20,688-Speed 3280.39 samples/sec   Loss 0.1558   LearningRate 0.0022   Epoch: 17   Global Step: 283820   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-12 05:16:23,792-Speed 3300.17 samples/sec   Loss 0.1709   LearningRate 0.0022   Epoch: 17   Global Step: 283830   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-12 05:16:26,914-Speed 3280.64 samples/sec   Loss 0.1551   LearningRate 0.0022   Epoch: 17   Global Step: 283840   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:16:30,061-Speed 3254.24 samples/sec   Loss 0.1599   LearningRate 0.0022   Epoch: 17   Global Step: 283850   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:16:33,124-Speed 3344.19 samples/sec   Loss 0.1592   LearningRate 0.0022   Epoch: 17   Global Step: 283860   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:16:36,195-Speed 3334.67 samples/sec   Loss 0.1536   LearningRate 0.0022   Epoch: 17   Global Step: 283870   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:16:39,257-Speed 3345.15 samples/sec   Loss 0.1693   LearningRate 0.0022   Epoch: 17   Global Step: 283880   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:16:42,337-Speed 3325.66 samples/sec   Loss 0.1581   LearningRate 0.0022   Epoch: 17   Global Step: 283890   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:16:45,411-Speed 3331.51 samples/sec   Loss 0.1621   LearningRate 0.0022   Epoch: 17   Global Step: 283900   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:16:48,470-Speed 3348.76 samples/sec   Loss 0.1778   LearningRate 0.0022   Epoch: 17   Global Step: 283910   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-12 05:16:51,556-Speed 3318.18 samples/sec   Loss 0.1605   LearningRate 0.0022   Epoch: 17   Global Step: 283920   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-12 05:16:54,647-Speed 3314.65 samples/sec   Loss 0.1686   LearningRate 0.0022   Epoch: 17   Global Step: 283930   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-12 05:16:57,741-Speed 3309.52 samples/sec   Loss 0.1690   LearningRate 0.0022   Epoch: 17   Global Step: 283940   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-12 05:17:00,861-Speed 3283.11 samples/sec   Loss 0.1599   LearningRate 0.0022   Epoch: 17   Global Step: 283950   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-12 05:17:04,274-Speed 3000.83 samples/sec   Loss 0.1668   LearningRate 0.0022   Epoch: 17   Global Step: 283960   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-12 05:17:07,402-Speed 3274.86 samples/sec   Loss 0.1689   LearningRate 0.0022   Epoch: 17   Global Step: 283970   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-12 05:17:10,546-Speed 3257.66 samples/sec   Loss 0.1631   LearningRate 0.0022   Epoch: 17   Global Step: 283980   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-12 05:17:13,841-Speed 3108.29 samples/sec   Loss 0.1581   LearningRate 0.0022   Epoch: 17   Global Step: 283990   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-12 05:17:17,050-Speed 3192.33 samples/sec   Loss 0.1649   LearningRate 0.0022   Epoch: 17   Global Step: 284000   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-12 05:18:01,252-[lfw][284000]XNorm: 20.549868
Training: 2022-04-12 05:18:01,252-[lfw][284000]Accuracy-Flip: 0.99783+-0.00259
Training: 2022-04-12 05:18:01,253-[lfw][284000]Accuracy-Highest: 0.99817
Training: 2022-04-12 05:18:52,467-[cfp_fp][284000]XNorm: 22.197119
Training: 2022-04-12 05:18:52,468-[cfp_fp][284000]Accuracy-Flip: 0.99100+-0.00409
Training: 2022-04-12 05:18:52,468-[cfp_fp][284000]Accuracy-Highest: 0.99186
Training: 2022-04-12 05:19:35,935-[agedb_30][284000]XNorm: 22.360230
Training: 2022-04-12 05:19:35,936-[agedb_30][284000]Accuracy-Flip: 0.98517+-0.00580
Training: 2022-04-12 05:19:35,936-[agedb_30][284000]Accuracy-Highest: 0.98650
Training: 2022-04-12 05:19:39,007-Speed 72.13 samples/sec   Loss 0.1674   LearningRate 0.0022   Epoch: 17   Global Step: 284010   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:19:42,082-Speed 3330.96 samples/sec   Loss 0.1574   LearningRate 0.0022   Epoch: 17   Global Step: 284020   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:19:45,260-Speed 3222.80 samples/sec   Loss 0.1574   LearningRate 0.0022   Epoch: 17   Global Step: 284030   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:19:48,356-Speed 3308.37 samples/sec   Loss 0.1603   LearningRate 0.0022   Epoch: 17   Global Step: 284040   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:19:51,422-Speed 3340.57 samples/sec   Loss 0.1518   LearningRate 0.0022   Epoch: 17   Global Step: 284050   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:19:54,691-Speed 3132.37 samples/sec   Loss 0.1678   LearningRate 0.0022   Epoch: 17   Global Step: 284060   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:19:57,783-Speed 3312.88 samples/sec   Loss 0.1655   LearningRate 0.0022   Epoch: 17   Global Step: 284070   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:20:00,887-Speed 3299.24 samples/sec   Loss 0.1664   LearningRate 0.0022   Epoch: 17   Global Step: 284080   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-12 05:20:04,012-Speed 3278.23 samples/sec   Loss 0.1604   LearningRate 0.0022   Epoch: 17   Global Step: 284090   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-12 05:20:07,147-Speed 3267.16 samples/sec   Loss 0.1722   LearningRate 0.0022   Epoch: 17   Global Step: 284100   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-12 05:20:10,344-Speed 3203.65 samples/sec   Loss 0.1607   LearningRate 0.0022   Epoch: 17   Global Step: 284110   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-12 05:20:13,403-Speed 3348.24 samples/sec   Loss 0.1621   LearningRate 0.0022   Epoch: 17   Global Step: 284120   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-12 05:20:16,501-Speed 3306.28 samples/sec   Loss 0.1587   LearningRate 0.0022   Epoch: 17   Global Step: 284130   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-12 05:20:19,589-Speed 3316.51 samples/sec   Loss 0.1500   LearningRate 0.0022   Epoch: 17   Global Step: 284140   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-12 05:20:22,711-Speed 3281.19 samples/sec   Loss 0.1581   LearningRate 0.0022   Epoch: 17   Global Step: 284150   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-12 05:20:25,803-Speed 3312.28 samples/sec   Loss 0.1623   LearningRate 0.0022   Epoch: 17   Global Step: 284160   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-12 05:20:28,880-Speed 3328.72 samples/sec   Loss 0.1537   LearningRate 0.0022   Epoch: 17   Global Step: 284170   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-12 05:20:31,939-Speed 3348.81 samples/sec   Loss 0.1573   LearningRate 0.0022   Epoch: 17   Global Step: 284180   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:20:35,061-Speed 3280.20 samples/sec   Loss 0.1749   LearningRate 0.0022   Epoch: 17   Global Step: 284190   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-12 05:20:38,142-Speed 3324.46 samples/sec   Loss 0.1625   LearningRate 0.0022   Epoch: 17   Global Step: 284200   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-12 05:20:41,212-Speed 3335.72 samples/sec   Loss 0.1671   LearningRate 0.0022   Epoch: 17   Global Step: 284210   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-12 05:20:44,275-Speed 3343.92 samples/sec   Loss 0.1766   LearningRate 0.0022   Epoch: 17   Global Step: 284220   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-12 05:20:47,350-Speed 3331.53 samples/sec   Loss 0.1783   LearningRate 0.0022   Epoch: 17   Global Step: 284230   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-12 05:20:50,466-Speed 3287.16 samples/sec   Loss 0.1727   LearningRate 0.0022   Epoch: 17   Global Step: 284240   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-12 05:20:53,568-Speed 3301.25 samples/sec   Loss 0.1701   LearningRate 0.0022   Epoch: 17   Global Step: 284250   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-12 05:20:56,648-Speed 3325.93 samples/sec   Loss 0.1622   LearningRate 0.0022   Epoch: 17   Global Step: 284260   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-12 05:20:59,794-Speed 3254.93 samples/sec   Loss 0.1823   LearningRate 0.0022   Epoch: 17   Global Step: 284270   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-12 05:21:02,880-Speed 3320.05 samples/sec   Loss 0.1662   LearningRate 0.0022   Epoch: 17   Global Step: 284280   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-12 05:21:05,954-Speed 3331.08 samples/sec   Loss 0.1586   LearningRate 0.0022   Epoch: 17   Global Step: 284290   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:21:09,065-Speed 3292.90 samples/sec   Loss 0.1791   LearningRate 0.0022   Epoch: 17   Global Step: 284300   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:21:12,129-Speed 3342.52 samples/sec   Loss 0.1669   LearningRate 0.0022   Epoch: 17   Global Step: 284310   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:21:15,237-Speed 3295.45 samples/sec   Loss 0.1655   LearningRate 0.0022   Epoch: 17   Global Step: 284320   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:21:18,316-Speed 3326.52 samples/sec   Loss 0.1626   LearningRate 0.0022   Epoch: 17   Global Step: 284330   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:21:22,983-Speed 2194.53 samples/sec   Loss 0.1556   LearningRate 0.0022   Epoch: 17   Global Step: 284340   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:21:26,102-Speed 3284.18 samples/sec   Loss 0.1749   LearningRate 0.0022   Epoch: 17   Global Step: 284350   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:21:29,321-Speed 3181.65 samples/sec   Loss 0.1728   LearningRate 0.0022   Epoch: 17   Global Step: 284360   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:21:33,804-Speed 2284.45 samples/sec   Loss 0.1638   LearningRate 0.0022   Epoch: 17   Global Step: 284370   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:21:36,931-Speed 3275.71 samples/sec   Loss 0.1628   LearningRate 0.0022   Epoch: 17   Global Step: 284380   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:21:40,002-Speed 3335.23 samples/sec   Loss 0.1509   LearningRate 0.0022   Epoch: 17   Global Step: 284390   Fp16 Grad Scale: 262144   Required: 5 hours
Training: 2022-04-12 05:21:43,059-Speed 3349.85 samples/sec   Loss 0.1597   LearningRate 0.0022   Epoch: 17   Global Step: 284400   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:21:46,138-Speed 3327.55 samples/sec   Loss 0.1599   LearningRate 0.0022   Epoch: 17   Global Step: 284410   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:21:49,218-Speed 3325.67 samples/sec   Loss 0.1595   LearningRate 0.0022   Epoch: 17   Global Step: 284420   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:21:52,313-Speed 3309.26 samples/sec   Loss 0.1524   LearningRate 0.0022   Epoch: 17   Global Step: 284430   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:21:55,477-Speed 3236.37 samples/sec   Loss 0.1656   LearningRate 0.0022   Epoch: 17   Global Step: 284440   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:21:58,678-Speed 3199.74 samples/sec   Loss 0.1848   LearningRate 0.0022   Epoch: 17   Global Step: 284450   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-12 05:22:01,765-Speed 3318.34 samples/sec   Loss 0.1683   LearningRate 0.0022   Epoch: 17   Global Step: 284460   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-12 05:22:04,924-Speed 3241.83 samples/sec   Loss 0.1644   LearningRate 0.0022   Epoch: 17   Global Step: 284470   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-12 05:22:08,068-Speed 3257.34 samples/sec   Loss 0.1567   LearningRate 0.0022   Epoch: 17   Global Step: 284480   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-12 05:22:11,242-Speed 3226.74 samples/sec   Loss 0.1531   LearningRate 0.0022   Epoch: 17   Global Step: 284490   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-12 05:22:14,357-Speed 3288.91 samples/sec   Loss 0.1821   LearningRate 0.0022   Epoch: 17   Global Step: 284500   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-12 05:22:17,420-Speed 3343.86 samples/sec   Loss 0.1655   LearningRate 0.0022   Epoch: 17   Global Step: 284510   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-12 05:22:20,488-Speed 3338.52 samples/sec   Loss 0.1741   LearningRate 0.0022   Epoch: 17   Global Step: 284520   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-12 05:22:23,634-Speed 3255.34 samples/sec   Loss 0.1731   LearningRate 0.0022   Epoch: 17   Global Step: 284530   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-12 05:22:26,725-Speed 3314.33 samples/sec   Loss 0.1668   LearningRate 0.0022   Epoch: 17   Global Step: 284540   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-12 05:22:29,784-Speed 3348.11 samples/sec   Loss 0.1655   LearningRate 0.0022   Epoch: 17   Global Step: 284550   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:22:32,871-Speed 3317.15 samples/sec   Loss 0.1628   LearningRate 0.0022   Epoch: 17   Global Step: 284560   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-12 05:22:35,967-Speed 3307.97 samples/sec   Loss 0.1607   LearningRate 0.0022   Epoch: 17   Global Step: 284570   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-12 05:22:39,044-Speed 3329.39 samples/sec   Loss 0.1622   LearningRate 0.0022   Epoch: 17   Global Step: 284580   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-12 05:22:42,133-Speed 3316.42 samples/sec   Loss 0.1685   LearningRate 0.0022   Epoch: 17   Global Step: 284590   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-12 05:22:45,205-Speed 3333.28 samples/sec   Loss 0.1550   LearningRate 0.0022   Epoch: 17   Global Step: 284600   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-12 05:22:48,270-Speed 3341.53 samples/sec   Loss 0.1680   LearningRate 0.0022   Epoch: 17   Global Step: 284610   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-12 05:22:51,346-Speed 3330.77 samples/sec   Loss 0.1623   LearningRate 0.0022   Epoch: 17   Global Step: 284620   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-12 05:22:54,427-Speed 3323.97 samples/sec   Loss 0.1673   LearningRate 0.0022   Epoch: 17   Global Step: 284630   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-12 05:22:57,613-Speed 3214.16 samples/sec   Loss 0.1527   LearningRate 0.0022   Epoch: 17   Global Step: 284640   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-12 05:23:00,704-Speed 3313.77 samples/sec   Loss 0.1591   LearningRate 0.0022   Epoch: 17   Global Step: 284650   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-12 05:23:03,768-Speed 3342.75 samples/sec   Loss 0.1666   LearningRate 0.0022   Epoch: 17   Global Step: 284660   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:23:06,905-Speed 3265.31 samples/sec   Loss 0.1603   LearningRate 0.0022   Epoch: 17   Global Step: 284670   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:23:10,008-Speed 3301.29 samples/sec   Loss 0.1630   LearningRate 0.0022   Epoch: 17   Global Step: 284680   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:23:13,081-Speed 3332.09 samples/sec   Loss 0.1631   LearningRate 0.0022   Epoch: 17   Global Step: 284690   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:23:16,293-Speed 3189.46 samples/sec   Loss 0.1627   LearningRate 0.0022   Epoch: 17   Global Step: 284700   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:23:19,398-Speed 3298.25 samples/sec   Loss 0.1585   LearningRate 0.0022   Epoch: 17   Global Step: 284710   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:23:22,460-Speed 3344.58 samples/sec   Loss 0.1597   LearningRate 0.0022   Epoch: 17   Global Step: 284720   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:23:25,522-Speed 3344.84 samples/sec   Loss 0.1702   LearningRate 0.0022   Epoch: 17   Global Step: 284730   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:23:28,659-Speed 3265.71 samples/sec   Loss 0.1523   LearningRate 0.0022   Epoch: 17   Global Step: 284740   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:23:31,785-Speed 3276.31 samples/sec   Loss 0.1761   LearningRate 0.0022   Epoch: 17   Global Step: 284750   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-12 05:23:34,853-Speed 3338.15 samples/sec   Loss 0.1561   LearningRate 0.0022   Epoch: 17   Global Step: 284760   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-12 05:23:37,925-Speed 3334.77 samples/sec   Loss 0.1653   LearningRate 0.0022   Epoch: 17   Global Step: 284770   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-12 05:23:41,071-Speed 3255.20 samples/sec   Loss 0.1702   LearningRate 0.0022   Epoch: 17   Global Step: 284780   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-12 05:23:44,231-Speed 3240.97 samples/sec   Loss 0.1590   LearningRate 0.0022   Epoch: 17   Global Step: 284790   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-12 05:23:47,427-Speed 3204.89 samples/sec   Loss 0.1540   LearningRate 0.0022   Epoch: 17   Global Step: 284800   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-12 05:23:50,501-Speed 3332.53 samples/sec   Loss 0.1480   LearningRate 0.0022   Epoch: 17   Global Step: 284810   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-12 05:23:53,585-Speed 3321.04 samples/sec   Loss 0.1635   LearningRate 0.0022   Epoch: 17   Global Step: 284820   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-12 05:23:56,726-Speed 3260.44 samples/sec   Loss 0.1503   LearningRate 0.0022   Epoch: 17   Global Step: 284830   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-12 05:23:59,917-Speed 3210.30 samples/sec   Loss 0.1727   LearningRate 0.0022   Epoch: 17   Global Step: 284840   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-12 05:24:03,069-Speed 3248.81 samples/sec   Loss 0.1533   LearningRate 0.0022   Epoch: 17   Global Step: 284850   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:24:06,180-Speed 3292.09 samples/sec   Loss 0.1577   LearningRate 0.0022   Epoch: 17   Global Step: 284860   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:24:09,316-Speed 3266.97 samples/sec   Loss 0.1625   LearningRate 0.0022   Epoch: 17   Global Step: 284870   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:24:12,408-Speed 3311.82 samples/sec   Loss 0.1572   LearningRate 0.0021   Epoch: 17   Global Step: 284880   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:24:15,523-Speed 3288.50 samples/sec   Loss 0.1635   LearningRate 0.0021   Epoch: 17   Global Step: 284890   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:24:18,616-Speed 3311.67 samples/sec   Loss 0.1441   LearningRate 0.0021   Epoch: 17   Global Step: 284900   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:24:21,759-Speed 3258.26 samples/sec   Loss 0.1526   LearningRate 0.0021   Epoch: 17   Global Step: 284910   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:24:24,893-Speed 3267.81 samples/sec   Loss 0.1651   LearningRate 0.0021   Epoch: 17   Global Step: 284920   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:24:27,965-Speed 3334.57 samples/sec   Loss 0.1652   LearningRate 0.0021   Epoch: 17   Global Step: 284930   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:24:31,037-Speed 3333.75 samples/sec   Loss 0.1579   LearningRate 0.0021   Epoch: 17   Global Step: 284940   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:24:34,108-Speed 3335.43 samples/sec   Loss 0.1528   LearningRate 0.0021   Epoch: 17   Global Step: 284950   Fp16 Grad Scale: 262144   Required: 5 hours
Training: 2022-04-12 05:24:37,169-Speed 3346.04 samples/sec   Loss 0.1757   LearningRate 0.0021   Epoch: 17   Global Step: 284960   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:24:40,329-Speed 3240.94 samples/sec   Loss 0.1683   LearningRate 0.0021   Epoch: 17   Global Step: 284970   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:24:43,392-Speed 3343.69 samples/sec   Loss 0.1557   LearningRate 0.0021   Epoch: 17   Global Step: 284980   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:24:46,496-Speed 3299.94 samples/sec   Loss 0.1683   LearningRate 0.0021   Epoch: 17   Global Step: 284990   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:24:49,595-Speed 3305.27 samples/sec   Loss 0.1839   LearningRate 0.0021   Epoch: 17   Global Step: 285000   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:24:52,672-Speed 3328.50 samples/sec   Loss 0.1739   LearningRate 0.0021   Epoch: 17   Global Step: 285010   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:24:55,737-Speed 3342.35 samples/sec   Loss 0.1568   LearningRate 0.0021   Epoch: 17   Global Step: 285020   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:24:58,802-Speed 3342.05 samples/sec   Loss 0.1715   LearningRate 0.0021   Epoch: 17   Global Step: 285030   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:25:01,854-Speed 3354.96 samples/sec   Loss 0.1618   LearningRate 0.0021   Epoch: 17   Global Step: 285040   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-12 05:25:05,001-Speed 3255.48 samples/sec   Loss 0.1679   LearningRate 0.0021   Epoch: 17   Global Step: 285050   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-12 05:25:08,150-Speed 3251.96 samples/sec   Loss 0.1516   LearningRate 0.0021   Epoch: 17   Global Step: 285060   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-12 05:25:11,228-Speed 3327.85 samples/sec   Loss 0.1597   LearningRate 0.0021   Epoch: 17   Global Step: 285070   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-12 05:25:14,346-Speed 3284.85 samples/sec   Loss 0.1595   LearningRate 0.0021   Epoch: 17   Global Step: 285080   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-12 05:25:17,478-Speed 3269.67 samples/sec   Loss 0.1686   LearningRate 0.0021   Epoch: 17   Global Step: 285090   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-12 05:25:20,587-Speed 3294.98 samples/sec   Loss 0.1661   LearningRate 0.0021   Epoch: 17   Global Step: 285100   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-12 05:25:23,696-Speed 3294.89 samples/sec   Loss 0.1612   LearningRate 0.0021   Epoch: 17   Global Step: 285110   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-12 05:25:26,868-Speed 3228.61 samples/sec   Loss 0.1683   LearningRate 0.0021   Epoch: 17   Global Step: 285120   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-12 05:25:30,086-Speed 3183.27 samples/sec   Loss 0.1759   LearningRate 0.0021   Epoch: 17   Global Step: 285130   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-12 05:25:33,339-Speed 3147.90 samples/sec   Loss 0.1666   LearningRate 0.0021   Epoch: 17   Global Step: 285140   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:25:36,500-Speed 3240.34 samples/sec   Loss 0.1586   LearningRate 0.0021   Epoch: 17   Global Step: 285150   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:25:39,560-Speed 3347.81 samples/sec   Loss 0.1722   LearningRate 0.0021   Epoch: 17   Global Step: 285160   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:25:42,627-Speed 3339.19 samples/sec   Loss 0.1708   LearningRate 0.0021   Epoch: 17   Global Step: 285170   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:25:45,765-Speed 3263.96 samples/sec   Loss 0.1739   LearningRate 0.0021   Epoch: 17   Global Step: 285180   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:25:48,846-Speed 3324.58 samples/sec   Loss 0.1710   LearningRate 0.0021   Epoch: 17   Global Step: 285190   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:25:52,018-Speed 3228.60 samples/sec   Loss 0.1641   LearningRate 0.0021   Epoch: 17   Global Step: 285200   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:25:55,180-Speed 3239.97 samples/sec   Loss 0.1660   LearningRate 0.0021   Epoch: 17   Global Step: 285210   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:25:58,239-Speed 3347.58 samples/sec   Loss 0.1554   LearningRate 0.0021   Epoch: 17   Global Step: 285220   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:26:01,363-Speed 3278.78 samples/sec   Loss 0.1554   LearningRate 0.0021   Epoch: 17   Global Step: 285230   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-12 05:26:04,464-Speed 3303.20 samples/sec   Loss 0.1704   LearningRate 0.0021   Epoch: 17   Global Step: 285240   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-12 05:26:07,595-Speed 3270.97 samples/sec   Loss 0.1629   LearningRate 0.0021   Epoch: 17   Global Step: 285250   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-12 05:26:10,684-Speed 3316.37 samples/sec   Loss 0.1648   LearningRate 0.0021   Epoch: 17   Global Step: 285260   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-12 05:26:13,755-Speed 3335.03 samples/sec   Loss 0.1726   LearningRate 0.0021   Epoch: 17   Global Step: 285270   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-12 05:26:16,891-Speed 3266.20 samples/sec   Loss 0.1695   LearningRate 0.0021   Epoch: 17   Global Step: 285280   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-12 05:26:19,963-Speed 3333.78 samples/sec   Loss 0.1805   LearningRate 0.0021   Epoch: 17   Global Step: 285290   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-12 05:26:23,074-Speed 3292.24 samples/sec   Loss 0.1634   LearningRate 0.0021   Epoch: 17   Global Step: 285300   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-12 05:26:26,198-Speed 3278.66 samples/sec   Loss 0.1613   LearningRate 0.0021   Epoch: 17   Global Step: 285310   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-12 05:26:29,318-Speed 3282.50 samples/sec   Loss 0.1529   LearningRate 0.0021   Epoch: 17   Global Step: 285320   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-12 05:26:32,452-Speed 3267.73 samples/sec   Loss 0.1703   LearningRate 0.0021   Epoch: 17   Global Step: 285330   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:26:35,556-Speed 3300.18 samples/sec   Loss 0.1756   LearningRate 0.0021   Epoch: 17   Global Step: 285340   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:26:38,629-Speed 3334.15 samples/sec   Loss 0.1642   LearningRate 0.0021   Epoch: 17   Global Step: 285350   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:26:41,717-Speed 3315.93 samples/sec   Loss 0.1741   LearningRate 0.0021   Epoch: 17   Global Step: 285360   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:26:44,841-Speed 3278.73 samples/sec   Loss 0.1613   LearningRate 0.0021   Epoch: 17   Global Step: 285370   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:26:47,942-Speed 3303.51 samples/sec   Loss 0.1645   LearningRate 0.0021   Epoch: 17   Global Step: 285380   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:26:51,059-Speed 3285.59 samples/sec   Loss 0.1688   LearningRate 0.0021   Epoch: 17   Global Step: 285390   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:26:54,297-Speed 3163.52 samples/sec   Loss 0.1563   LearningRate 0.0021   Epoch: 17   Global Step: 285400   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:26:57,511-Speed 3186.10 samples/sec   Loss 0.1660   LearningRate 0.0021   Epoch: 17   Global Step: 285410   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:27:00,629-Speed 3285.05 samples/sec   Loss 0.1651   LearningRate 0.0021   Epoch: 17   Global Step: 285420   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:27:03,696-Speed 3339.37 samples/sec   Loss 0.1622   LearningRate 0.0021   Epoch: 17   Global Step: 285430   Fp16 Grad Scale: 262144   Required: 5 hours
Training: 2022-04-12 05:27:06,804-Speed 3296.00 samples/sec   Loss 0.1568   LearningRate 0.0021   Epoch: 17   Global Step: 285440   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:27:09,978-Speed 3227.28 samples/sec   Loss 0.1610   LearningRate 0.0021   Epoch: 17   Global Step: 285450   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:27:13,093-Speed 3287.80 samples/sec   Loss 0.1582   LearningRate 0.0021   Epoch: 17   Global Step: 285460   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:27:16,176-Speed 3322.45 samples/sec   Loss 0.1559   LearningRate 0.0021   Epoch: 17   Global Step: 285470   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:27:19,252-Speed 3329.71 samples/sec   Loss 0.1716   LearningRate 0.0021   Epoch: 17   Global Step: 285480   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:27:22,321-Speed 3337.33 samples/sec   Loss 0.1785   LearningRate 0.0021   Epoch: 17   Global Step: 285490   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:27:25,395-Speed 3331.25 samples/sec   Loss 0.1648   LearningRate 0.0021   Epoch: 17   Global Step: 285500   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:27:28,525-Speed 3272.18 samples/sec   Loss 0.1680   LearningRate 0.0021   Epoch: 17   Global Step: 285510   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:27:31,662-Speed 3266.00 samples/sec   Loss 0.1633   LearningRate 0.0021   Epoch: 17   Global Step: 285520   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:27:34,740-Speed 3327.67 samples/sec   Loss 0.1692   LearningRate 0.0021   Epoch: 17   Global Step: 285530   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:27:37,811-Speed 3334.88 samples/sec   Loss 0.1664   LearningRate 0.0021   Epoch: 17   Global Step: 285540   Fp16 Grad Scale: 262144   Required: 5 hours
Training: 2022-04-12 05:27:40,879-Speed 3338.91 samples/sec   Loss 0.1698   LearningRate 0.0021   Epoch: 17   Global Step: 285550   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:27:43,951-Speed 3333.69 samples/sec   Loss 0.1669   LearningRate 0.0021   Epoch: 17   Global Step: 285560   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:27:47,040-Speed 3315.51 samples/sec   Loss 0.1715   LearningRate 0.0021   Epoch: 17   Global Step: 285570   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:27:50,302-Speed 3139.43 samples/sec   Loss 0.1681   LearningRate 0.0021   Epoch: 17   Global Step: 285580   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:27:53,393-Speed 3313.50 samples/sec   Loss 0.1690   LearningRate 0.0021   Epoch: 17   Global Step: 285590   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:27:56,585-Speed 3209.46 samples/sec   Loss 0.1584   LearningRate 0.0021   Epoch: 17   Global Step: 285600   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:27:59,698-Speed 3289.98 samples/sec   Loss 0.1562   LearningRate 0.0021   Epoch: 17   Global Step: 285610   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:28:02,838-Speed 3262.06 samples/sec   Loss 0.1582   LearningRate 0.0021   Epoch: 17   Global Step: 285620   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:28:05,921-Speed 3322.04 samples/sec   Loss 0.1523   LearningRate 0.0021   Epoch: 17   Global Step: 285630   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:28:09,006-Speed 3320.62 samples/sec   Loss 0.1654   LearningRate 0.0021   Epoch: 17   Global Step: 285640   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:28:12,086-Speed 3324.80 samples/sec   Loss 0.1641   LearningRate 0.0021   Epoch: 17   Global Step: 285650   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:28:15,310-Speed 3176.73 samples/sec   Loss 0.1623   LearningRate 0.0021   Epoch: 17   Global Step: 285660   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:28:18,514-Speed 3196.53 samples/sec   Loss 0.1553   LearningRate 0.0021   Epoch: 17   Global Step: 285670   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:28:21,707-Speed 3208.13 samples/sec   Loss 0.1595   LearningRate 0.0021   Epoch: 17   Global Step: 285680   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:28:24,852-Speed 3256.49 samples/sec   Loss 0.1602   LearningRate 0.0021   Epoch: 17   Global Step: 285690   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:28:27,921-Speed 3338.71 samples/sec   Loss 0.1573   LearningRate 0.0021   Epoch: 17   Global Step: 285700   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:28:30,996-Speed 3330.46 samples/sec   Loss 0.1590   LearningRate 0.0021   Epoch: 17   Global Step: 285710   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-12 05:28:34,156-Speed 3241.29 samples/sec   Loss 0.1679   LearningRate 0.0021   Epoch: 17   Global Step: 285720   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-12 05:28:37,286-Speed 3272.25 samples/sec   Loss 0.1716   LearningRate 0.0021   Epoch: 17   Global Step: 285730   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-12 05:28:40,362-Speed 3329.07 samples/sec   Loss 0.1649   LearningRate 0.0021   Epoch: 17   Global Step: 285740   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-12 05:28:43,468-Speed 3297.60 samples/sec   Loss 0.1542   LearningRate 0.0021   Epoch: 17   Global Step: 285750   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-12 05:28:46,587-Speed 3284.20 samples/sec   Loss 0.1673   LearningRate 0.0021   Epoch: 17   Global Step: 285760   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-12 05:28:49,747-Speed 3240.78 samples/sec   Loss 0.1668   LearningRate 0.0021   Epoch: 17   Global Step: 285770   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-12 05:28:52,834-Speed 3318.39 samples/sec   Loss 0.1654   LearningRate 0.0021   Epoch: 17   Global Step: 285780   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-12 05:28:55,938-Speed 3300.31 samples/sec   Loss 0.1520   LearningRate 0.0021   Epoch: 17   Global Step: 285790   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-12 05:28:59,039-Speed 3302.97 samples/sec   Loss 0.1709   LearningRate 0.0021   Epoch: 17   Global Step: 285800   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-12 05:29:02,188-Speed 3251.81 samples/sec   Loss 0.1680   LearningRate 0.0021   Epoch: 17   Global Step: 285810   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:29:05,376-Speed 3213.61 samples/sec   Loss 0.1578   LearningRate 0.0021   Epoch: 17   Global Step: 285820   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:29:08,446-Speed 3335.50 samples/sec   Loss 0.1476   LearningRate 0.0021   Epoch: 17   Global Step: 285830   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:29:11,537-Speed 3313.40 samples/sec   Loss 0.1610   LearningRate 0.0021   Epoch: 17   Global Step: 285840   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:29:14,631-Speed 3310.81 samples/sec   Loss 0.1764   LearningRate 0.0021   Epoch: 17   Global Step: 285850   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:29:17,711-Speed 3325.85 samples/sec   Loss 0.1792   LearningRate 0.0021   Epoch: 17   Global Step: 285860   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:29:20,881-Speed 3230.48 samples/sec   Loss 0.1803   LearningRate 0.0021   Epoch: 17   Global Step: 285870   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-12 05:29:23,972-Speed 3314.35 samples/sec   Loss 0.1856   LearningRate 0.0021   Epoch: 17   Global Step: 285880   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-12 05:29:27,113-Speed 3260.07 samples/sec   Loss 0.1560   LearningRate 0.0021   Epoch: 17   Global Step: 285890   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-12 05:29:30,337-Speed 3177.35 samples/sec   Loss 0.1621   LearningRate 0.0021   Epoch: 17   Global Step: 285900   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-12 05:29:33,409-Speed 3333.69 samples/sec   Loss 0.1681   LearningRate 0.0021   Epoch: 17   Global Step: 285910   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-12 05:29:36,489-Speed 3325.82 samples/sec   Loss 0.1465   LearningRate 0.0021   Epoch: 17   Global Step: 285920   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-12 05:29:39,576-Speed 3317.27 samples/sec   Loss 0.1680   LearningRate 0.0021   Epoch: 17   Global Step: 285930   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-12 05:29:42,655-Speed 3327.30 samples/sec   Loss 0.1685   LearningRate 0.0021   Epoch: 17   Global Step: 285940   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-12 05:29:45,735-Speed 3324.28 samples/sec   Loss 0.1639   LearningRate 0.0021   Epoch: 17   Global Step: 285950   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-12 05:29:48,813-Speed 3328.70 samples/sec   Loss 0.1584   LearningRate 0.0021   Epoch: 17   Global Step: 285960   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-12 05:29:51,896-Speed 3321.85 samples/sec   Loss 0.1595   LearningRate 0.0021   Epoch: 17   Global Step: 285970   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:29:54,964-Speed 3338.60 samples/sec   Loss 0.1510   LearningRate 0.0021   Epoch: 17   Global Step: 285980   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:29:58,058-Speed 3311.02 samples/sec   Loss 0.1679   LearningRate 0.0021   Epoch: 17   Global Step: 285990   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:30:01,137-Speed 3326.57 samples/sec   Loss 0.1633   LearningRate 0.0021   Epoch: 17   Global Step: 286000   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:30:45,065-[lfw][286000]XNorm: 20.160438
Training: 2022-04-12 05:30:45,066-[lfw][286000]Accuracy-Flip: 0.99783+-0.00259
Training: 2022-04-12 05:30:45,066-[lfw][286000]Accuracy-Highest: 0.99817
Training: 2022-04-12 05:31:36,207-[cfp_fp][286000]XNorm: 22.307507
Training: 2022-04-12 05:31:36,207-[cfp_fp][286000]Accuracy-Flip: 0.99171+-0.00403
Training: 2022-04-12 05:31:36,208-[cfp_fp][286000]Accuracy-Highest: 0.99186
Training: 2022-04-12 05:32:19,958-[agedb_30][286000]XNorm: 22.450874
Training: 2022-04-12 05:32:19,959-[agedb_30][286000]Accuracy-Flip: 0.98450+-0.00707
Training: 2022-04-12 05:32:19,959-[agedb_30][286000]Accuracy-Highest: 0.98650
Training: 2022-04-12 05:32:23,083-Speed 72.14 samples/sec   Loss 0.1821   LearningRate 0.0021   Epoch: 17   Global Step: 286010   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:32:26,136-Speed 3354.86 samples/sec   Loss 0.1677   LearningRate 0.0021   Epoch: 17   Global Step: 286020   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:32:29,206-Speed 3336.12 samples/sec   Loss 0.1700   LearningRate 0.0020   Epoch: 17   Global Step: 286030   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:32:32,364-Speed 3243.74 samples/sec   Loss 0.1645   LearningRate 0.0020   Epoch: 17   Global Step: 286040   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:32:35,431-Speed 3338.99 samples/sec   Loss 0.1707   LearningRate 0.0020   Epoch: 17   Global Step: 286050   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:32:38,491-Speed 3347.25 samples/sec   Loss 0.1649   LearningRate 0.0020   Epoch: 17   Global Step: 286060   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:32:41,546-Speed 3352.78 samples/sec   Loss 0.1681   LearningRate 0.0020   Epoch: 17   Global Step: 286070   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:32:44,635-Speed 3315.38 samples/sec   Loss 0.1632   LearningRate 0.0020   Epoch: 17   Global Step: 286080   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:32:47,703-Speed 3338.76 samples/sec   Loss 0.1646   LearningRate 0.0020   Epoch: 17   Global Step: 286090   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:32:50,857-Speed 3247.41 samples/sec   Loss 0.1707   LearningRate 0.0020   Epoch: 17   Global Step: 286100   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:32:53,935-Speed 3327.51 samples/sec   Loss 0.1784   LearningRate 0.0020   Epoch: 17   Global Step: 286110   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:32:57,023-Speed 3316.83 samples/sec   Loss 0.1612   LearningRate 0.0020   Epoch: 17   Global Step: 286120   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:33:00,110-Speed 3318.18 samples/sec   Loss 0.1598   LearningRate 0.0020   Epoch: 17   Global Step: 286130   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:33:03,211-Speed 3302.91 samples/sec   Loss 0.1563   LearningRate 0.0020   Epoch: 17   Global Step: 286140   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:33:06,326-Speed 3287.63 samples/sec   Loss 0.1634   LearningRate 0.0020   Epoch: 17   Global Step: 286150   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:33:09,412-Speed 3318.98 samples/sec   Loss 0.1552   LearningRate 0.0020   Epoch: 17   Global Step: 286160   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:33:12,505-Speed 3311.51 samples/sec   Loss 0.1775   LearningRate 0.0020   Epoch: 17   Global Step: 286170   Fp16 Grad Scale: 262144   Required: 5 hours
Training: 2022-04-12 05:33:15,579-Speed 3332.50 samples/sec   Loss 0.1631   LearningRate 0.0020   Epoch: 17   Global Step: 286180   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:33:18,776-Speed 3203.32 samples/sec   Loss 0.1607   LearningRate 0.0020   Epoch: 17   Global Step: 286190   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-12 05:33:21,876-Speed 3303.72 samples/sec   Loss 0.1400   LearningRate 0.0020   Epoch: 17   Global Step: 286200   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-12 05:33:25,157-Speed 3121.82 samples/sec   Loss 0.1607   LearningRate 0.0020   Epoch: 17   Global Step: 286210   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-12 05:33:28,248-Speed 3313.24 samples/sec   Loss 0.1770   LearningRate 0.0020   Epoch: 17   Global Step: 286220   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-12 05:33:31,328-Speed 3325.90 samples/sec   Loss 0.1640   LearningRate 0.0020   Epoch: 17   Global Step: 286230   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-12 05:33:34,445-Speed 3286.20 samples/sec   Loss 0.1814   LearningRate 0.0020   Epoch: 17   Global Step: 286240   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-12 05:33:37,600-Speed 3246.20 samples/sec   Loss 0.1563   LearningRate 0.0020   Epoch: 17   Global Step: 286250   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-12 05:33:40,667-Speed 3340.07 samples/sec   Loss 0.1707   LearningRate 0.0020   Epoch: 17   Global Step: 286260   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-12 05:33:43,743-Speed 3329.00 samples/sec   Loss 0.1728   LearningRate 0.0020   Epoch: 17   Global Step: 286270   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-12 05:33:46,814-Speed 3335.40 samples/sec   Loss 0.1663   LearningRate 0.0020   Epoch: 17   Global Step: 286280   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-12 05:33:49,892-Speed 3328.11 samples/sec   Loss 0.1547   LearningRate 0.0020   Epoch: 17   Global Step: 286290   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:33:53,010-Speed 3284.64 samples/sec   Loss 0.1561   LearningRate 0.0020   Epoch: 17   Global Step: 286300   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:33:56,082-Speed 3333.70 samples/sec   Loss 0.1549   LearningRate 0.0020   Epoch: 17   Global Step: 286310   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:33:59,178-Speed 3308.36 samples/sec   Loss 0.1588   LearningRate 0.0020   Epoch: 17   Global Step: 286320   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:34:02,389-Speed 3190.02 samples/sec   Loss 0.1732   LearningRate 0.0020   Epoch: 17   Global Step: 286330   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:34:05,531-Speed 3259.95 samples/sec   Loss 0.1672   LearningRate 0.0020   Epoch: 17   Global Step: 286340   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:34:08,641-Speed 3293.03 samples/sec   Loss 0.1765   LearningRate 0.0020   Epoch: 17   Global Step: 286350   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:34:11,756-Speed 3288.45 samples/sec   Loss 0.1706   LearningRate 0.0020   Epoch: 17   Global Step: 286360   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:34:14,869-Speed 3290.31 samples/sec   Loss 0.1699   LearningRate 0.0020   Epoch: 17   Global Step: 286370   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:34:17,942-Speed 3333.09 samples/sec   Loss 0.1571   LearningRate 0.0020   Epoch: 17   Global Step: 286380   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:34:21,008-Speed 3340.14 samples/sec   Loss 0.1755   LearningRate 0.0020   Epoch: 17   Global Step: 286390   Fp16 Grad Scale: 262144   Required: 5 hours
Training: 2022-04-12 05:34:24,063-Speed 3352.61 samples/sec   Loss 0.1649   LearningRate 0.0020   Epoch: 17   Global Step: 286400   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:34:27,126-Speed 3344.61 samples/sec   Loss 0.1711   LearningRate 0.0020   Epoch: 17   Global Step: 286410   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:34:30,219-Speed 3311.24 samples/sec   Loss 0.1604   LearningRate 0.0020   Epoch: 17   Global Step: 286420   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:34:33,306-Speed 3318.12 samples/sec   Loss 0.1835   LearningRate 0.0020   Epoch: 17   Global Step: 286430   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:34:36,439-Speed 3269.44 samples/sec   Loss 0.1559   LearningRate 0.0020   Epoch: 17   Global Step: 286440   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:34:39,574-Speed 3266.39 samples/sec   Loss 0.1686   LearningRate 0.0020   Epoch: 17   Global Step: 286450   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:34:42,645-Speed 3335.28 samples/sec   Loss 0.1697   LearningRate 0.0020   Epoch: 17   Global Step: 286460   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:34:45,724-Speed 3327.27 samples/sec   Loss 0.1680   LearningRate 0.0020   Epoch: 17   Global Step: 286470   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:34:48,811-Speed 3317.74 samples/sec   Loss 0.1663   LearningRate 0.0020   Epoch: 17   Global Step: 286480   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:34:51,876-Speed 3341.39 samples/sec   Loss 0.1645   LearningRate 0.0020   Epoch: 17   Global Step: 286490   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:34:54,946-Speed 3336.95 samples/sec   Loss 0.1697   LearningRate 0.0020   Epoch: 17   Global Step: 286500   Fp16 Grad Scale: 262144   Required: 5 hours
Training: 2022-04-12 05:34:58,050-Speed 3298.89 samples/sec   Loss 0.1600   LearningRate 0.0020   Epoch: 17   Global Step: 286510   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:35:01,120-Speed 3336.23 samples/sec   Loss 0.1720   LearningRate 0.0020   Epoch: 17   Global Step: 286520   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:35:04,196-Speed 3330.83 samples/sec   Loss 0.1637   LearningRate 0.0020   Epoch: 17   Global Step: 286530   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:35:07,316-Speed 3282.38 samples/sec   Loss 0.1706   LearningRate 0.0020   Epoch: 17   Global Step: 286540   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:35:10,385-Speed 3336.80 samples/sec   Loss 0.1691   LearningRate 0.0020   Epoch: 17   Global Step: 286550   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:35:13,468-Speed 3321.76 samples/sec   Loss 0.1668   LearningRate 0.0020   Epoch: 17   Global Step: 286560   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:35:16,624-Speed 3246.30 samples/sec   Loss 0.1614   LearningRate 0.0020   Epoch: 17   Global Step: 286570   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:35:19,717-Speed 3311.44 samples/sec   Loss 0.1520   LearningRate 0.0020   Epoch: 17   Global Step: 286580   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:35:22,782-Speed 3341.94 samples/sec   Loss 0.1718   LearningRate 0.0020   Epoch: 17   Global Step: 286590   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:35:25,975-Speed 3207.52 samples/sec   Loss 0.1749   LearningRate 0.0020   Epoch: 17   Global Step: 286600   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:35:29,071-Speed 3307.85 samples/sec   Loss 0.1626   LearningRate 0.0020   Epoch: 17   Global Step: 286610   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:35:32,146-Speed 3331.77 samples/sec   Loss 0.1605   LearningRate 0.0020   Epoch: 17   Global Step: 286620   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:35:35,269-Speed 3278.89 samples/sec   Loss 0.1677   LearningRate 0.0020   Epoch: 17   Global Step: 286630   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:35:38,351-Speed 3323.69 samples/sec   Loss 0.1603   LearningRate 0.0020   Epoch: 17   Global Step: 286640   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:35:41,479-Speed 3274.09 samples/sec   Loss 0.1902   LearningRate 0.0020   Epoch: 17   Global Step: 286650   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:35:44,560-Speed 3324.82 samples/sec   Loss 0.1710   LearningRate 0.0020   Epoch: 17   Global Step: 286660   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:35:47,652-Speed 3312.58 samples/sec   Loss 0.1607   LearningRate 0.0020   Epoch: 17   Global Step: 286670   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:35:50,725-Speed 3332.84 samples/sec   Loss 0.1586   LearningRate 0.0020   Epoch: 17   Global Step: 286680   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:35:53,812-Speed 3318.05 samples/sec   Loss 0.1510   LearningRate 0.0020   Epoch: 17   Global Step: 286690   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:35:56,966-Speed 3247.17 samples/sec   Loss 0.1752   LearningRate 0.0020   Epoch: 17   Global Step: 286700   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:36:00,191-Speed 3176.20 samples/sec   Loss 0.1738   LearningRate 0.0020   Epoch: 17   Global Step: 286710   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:36:03,337-Speed 3256.12 samples/sec   Loss 0.1500   LearningRate 0.0020   Epoch: 17   Global Step: 286720   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:36:06,397-Speed 3346.30 samples/sec   Loss 0.1661   LearningRate 0.0020   Epoch: 17   Global Step: 286730   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-12 05:36:09,488-Speed 3314.32 samples/sec   Loss 0.1730   LearningRate 0.0020   Epoch: 17   Global Step: 286740   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-12 05:36:12,577-Speed 3315.23 samples/sec   Loss 0.1649   LearningRate 0.0020   Epoch: 17   Global Step: 286750   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-12 05:36:15,654-Speed 3328.90 samples/sec   Loss 0.1832   LearningRate 0.0020   Epoch: 17   Global Step: 286760   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-12 05:36:18,740-Speed 3319.30 samples/sec   Loss 0.1633   LearningRate 0.0020   Epoch: 17   Global Step: 286770   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-12 05:36:21,845-Speed 3298.56 samples/sec   Loss 0.1568   LearningRate 0.0020   Epoch: 17   Global Step: 286780   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-12 05:36:24,950-Speed 3298.75 samples/sec   Loss 0.1807   LearningRate 0.0020   Epoch: 17   Global Step: 286790   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-12 05:36:28,016-Speed 3339.94 samples/sec   Loss 0.1776   LearningRate 0.0020   Epoch: 17   Global Step: 286800   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-12 05:36:31,089-Speed 3332.79 samples/sec   Loss 0.1661   LearningRate 0.0020   Epoch: 17   Global Step: 286810   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-12 05:36:34,169-Speed 3325.72 samples/sec   Loss 0.1725   LearningRate 0.0020   Epoch: 17   Global Step: 286820   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-12 05:36:37,256-Speed 3318.53 samples/sec   Loss 0.1676   LearningRate 0.0020   Epoch: 17   Global Step: 286830   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:36:40,338-Speed 3325.20 samples/sec   Loss 0.1694   LearningRate 0.0020   Epoch: 17   Global Step: 286840   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:36:43,415-Speed 3328.87 samples/sec   Loss 0.1650   LearningRate 0.0020   Epoch: 17   Global Step: 286850   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:36:46,504-Speed 3315.53 samples/sec   Loss 0.1593   LearningRate 0.0020   Epoch: 17   Global Step: 286860   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:36:49,589-Speed 3320.09 samples/sec   Loss 0.1644   LearningRate 0.0020   Epoch: 17   Global Step: 286870   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:36:52,659-Speed 3335.49 samples/sec   Loss 0.1694   LearningRate 0.0020   Epoch: 17   Global Step: 286880   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-12 05:36:55,747-Speed 3317.50 samples/sec   Loss 0.1615   LearningRate 0.0020   Epoch: 17   Global Step: 286890   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-12 05:36:58,834-Speed 3317.59 samples/sec   Loss 0.1508   LearningRate 0.0020   Epoch: 17   Global Step: 286900   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-12 05:37:01,920-Speed 3318.45 samples/sec   Loss 0.1684   LearningRate 0.0020   Epoch: 17   Global Step: 286910   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-12 05:37:05,073-Speed 3248.61 samples/sec   Loss 0.1641   LearningRate 0.0020   Epoch: 17   Global Step: 286920   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-12 05:37:08,220-Speed 3254.81 samples/sec   Loss 0.1661   LearningRate 0.0020   Epoch: 17   Global Step: 286930   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-12 05:37:11,331-Speed 3292.55 samples/sec   Loss 0.1526   LearningRate 0.0020   Epoch: 17   Global Step: 286940   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-12 05:37:14,408-Speed 3329.49 samples/sec   Loss 0.1578   LearningRate 0.0020   Epoch: 17   Global Step: 286950   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-12 05:37:17,489-Speed 3324.12 samples/sec   Loss 0.1758   LearningRate 0.0020   Epoch: 17   Global Step: 286960   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-12 05:37:20,569-Speed 3325.41 samples/sec   Loss 0.1728   LearningRate 0.0020   Epoch: 17   Global Step: 286970   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-12 05:37:23,662-Speed 3310.50 samples/sec   Loss 0.1650   LearningRate 0.0020   Epoch: 17   Global Step: 286980   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:37:26,742-Speed 3326.28 samples/sec   Loss 0.1667   LearningRate 0.0020   Epoch: 17   Global Step: 286990   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:37:29,866-Speed 3278.81 samples/sec   Loss 0.1643   LearningRate 0.0020   Epoch: 17   Global Step: 287000   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:37:32,973-Speed 3296.01 samples/sec   Loss 0.1684   LearningRate 0.0020   Epoch: 17   Global Step: 287010   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:37:36,064-Speed 3313.31 samples/sec   Loss 0.1658   LearningRate 0.0020   Epoch: 17   Global Step: 287020   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:37:39,140-Speed 3330.44 samples/sec   Loss 0.1506   LearningRate 0.0020   Epoch: 17   Global Step: 287030   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:37:42,216-Speed 3329.17 samples/sec   Loss 0.1677   LearningRate 0.0020   Epoch: 17   Global Step: 287040   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:37:45,286-Speed 3336.54 samples/sec   Loss 0.1538   LearningRate 0.0020   Epoch: 17   Global Step: 287050   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:37:48,474-Speed 3212.61 samples/sec   Loss 0.1679   LearningRate 0.0020   Epoch: 17   Global Step: 287060   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:37:51,645-Speed 3229.83 samples/sec   Loss 0.1814   LearningRate 0.0020   Epoch: 17   Global Step: 287070   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:37:54,713-Speed 3337.98 samples/sec   Loss 0.1675   LearningRate 0.0020   Epoch: 17   Global Step: 287080   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:37:57,879-Speed 3235.63 samples/sec   Loss 0.1595   LearningRate 0.0020   Epoch: 17   Global Step: 287090   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:38:00,951-Speed 3334.69 samples/sec   Loss 0.1807   LearningRate 0.0020   Epoch: 17   Global Step: 287100   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:38:04,052-Speed 3302.74 samples/sec   Loss 0.1486   LearningRate 0.0020   Epoch: 17   Global Step: 287110   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:38:07,129-Speed 3329.14 samples/sec   Loss 0.1669   LearningRate 0.0020   Epoch: 17   Global Step: 287120   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:38:10,223-Speed 3309.59 samples/sec   Loss 0.1675   LearningRate 0.0020   Epoch: 17   Global Step: 287130   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-12 05:38:13,299-Speed 3330.45 samples/sec   Loss 0.1704   LearningRate 0.0020   Epoch: 17   Global Step: 287140   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-12 05:38:16,376-Speed 3328.76 samples/sec   Loss 0.1642   LearningRate 0.0020   Epoch: 17   Global Step: 287150   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-12 05:38:19,487-Speed 3291.46 samples/sec   Loss 0.1572   LearningRate 0.0020   Epoch: 17   Global Step: 287160   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-12 05:38:22,587-Speed 3303.63 samples/sec   Loss 0.1762   LearningRate 0.0020   Epoch: 17   Global Step: 287170   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-12 05:38:25,681-Speed 3311.11 samples/sec   Loss 0.1681   LearningRate 0.0020   Epoch: 17   Global Step: 287180   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-12 05:38:28,751-Speed 3336.46 samples/sec   Loss 0.1690   LearningRate 0.0020   Epoch: 17   Global Step: 287190   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-12 05:38:31,828-Speed 3328.44 samples/sec   Loss 0.1854   LearningRate 0.0020   Epoch: 17   Global Step: 287200   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-12 05:38:34,899-Speed 3335.57 samples/sec   Loss 0.1930   LearningRate 0.0019   Epoch: 17   Global Step: 287210   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-12 05:38:37,992-Speed 3310.92 samples/sec   Loss 0.1532   LearningRate 0.0019   Epoch: 17   Global Step: 287220   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-12 05:38:41,060-Speed 3338.33 samples/sec   Loss 0.1729   LearningRate 0.0019   Epoch: 17   Global Step: 287230   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:38:44,116-Speed 3351.50 samples/sec   Loss 0.1607   LearningRate 0.0019   Epoch: 17   Global Step: 287240   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-12 05:38:47,190-Speed 3332.55 samples/sec   Loss 0.1682   LearningRate 0.0019   Epoch: 17   Global Step: 287250   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-12 05:38:50,272-Speed 3322.94 samples/sec   Loss 0.1791   LearningRate 0.0019   Epoch: 17   Global Step: 287260   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-12 05:38:53,355-Speed 3322.58 samples/sec   Loss 0.1734   LearningRate 0.0019   Epoch: 17   Global Step: 287270   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-12 05:38:56,449-Speed 3310.25 samples/sec   Loss 0.1760   LearningRate 0.0019   Epoch: 17   Global Step: 287280   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-12 05:38:59,590-Speed 3260.92 samples/sec   Loss 0.1684   LearningRate 0.0019   Epoch: 17   Global Step: 287290   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-12 05:39:02,752-Speed 3238.78 samples/sec   Loss 0.1747   LearningRate 0.0019   Epoch: 17   Global Step: 287300   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-12 05:39:05,829-Speed 3328.77 samples/sec   Loss 0.1601   LearningRate 0.0019   Epoch: 17   Global Step: 287310   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-12 05:39:08,921-Speed 3312.79 samples/sec   Loss 0.1751   LearningRate 0.0019   Epoch: 17   Global Step: 287320   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-12 05:39:12,107-Speed 3215.14 samples/sec   Loss 0.1521   LearningRate 0.0019   Epoch: 17   Global Step: 287330   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-12 05:39:15,196-Speed 3315.58 samples/sec   Loss 0.1674   LearningRate 0.0019   Epoch: 17   Global Step: 287340   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:39:18,275-Speed 3326.79 samples/sec   Loss 0.1601   LearningRate 0.0019   Epoch: 17   Global Step: 287350   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:39:21,356-Speed 3324.06 samples/sec   Loss 0.1600   LearningRate 0.0019   Epoch: 17   Global Step: 287360   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:39:24,423-Speed 3340.02 samples/sec   Loss 0.1660   LearningRate 0.0019   Epoch: 17   Global Step: 287370   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:39:27,494-Speed 3334.93 samples/sec   Loss 0.1683   LearningRate 0.0019   Epoch: 17   Global Step: 287380   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:39:30,566-Speed 3333.91 samples/sec   Loss 0.1591   LearningRate 0.0019   Epoch: 17   Global Step: 287390   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:39:33,649-Speed 3322.62 samples/sec   Loss 0.1800   LearningRate 0.0019   Epoch: 17   Global Step: 287400   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:39:36,719-Speed 3336.03 samples/sec   Loss 0.1636   LearningRate 0.0019   Epoch: 17   Global Step: 287410   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:39:39,806-Speed 3317.63 samples/sec   Loss 0.1793   LearningRate 0.0019   Epoch: 17   Global Step: 287420   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:39:42,944-Speed 3264.21 samples/sec   Loss 0.1810   LearningRate 0.0019   Epoch: 17   Global Step: 287430   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:39:46,026-Speed 3323.38 samples/sec   Loss 0.1615   LearningRate 0.0019   Epoch: 17   Global Step: 287440   Fp16 Grad Scale: 262144   Required: 5 hours
Training: 2022-04-12 05:39:49,089-Speed 3343.63 samples/sec   Loss 0.1556   LearningRate 0.0019   Epoch: 17   Global Step: 287450   Fp16 Grad Scale: 262144   Required: 5 hours
Training: 2022-04-12 05:39:52,207-Speed 3285.48 samples/sec   Loss 0.1521   LearningRate 0.0019   Epoch: 17   Global Step: 287460   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:39:55,324-Speed 3285.52 samples/sec   Loss 0.1495   LearningRate 0.0019   Epoch: 17   Global Step: 287470   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:39:58,445-Speed 3282.21 samples/sec   Loss 0.1752   LearningRate 0.0019   Epoch: 17   Global Step: 287480   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:40:01,629-Speed 3216.28 samples/sec   Loss 0.1715   LearningRate 0.0019   Epoch: 17   Global Step: 287490   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:40:04,714-Speed 3319.45 samples/sec   Loss 0.1778   LearningRate 0.0019   Epoch: 17   Global Step: 287500   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:40:07,791-Speed 3328.87 samples/sec   Loss 0.1700   LearningRate 0.0019   Epoch: 17   Global Step: 287510   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:40:10,888-Speed 3307.53 samples/sec   Loss 0.1669   LearningRate 0.0019   Epoch: 17   Global Step: 287520   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:40:13,969-Speed 3325.14 samples/sec   Loss 0.1594   LearningRate 0.0019   Epoch: 17   Global Step: 287530   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:40:17,043-Speed 3331.92 samples/sec   Loss 0.1741   LearningRate 0.0019   Epoch: 17   Global Step: 287540   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:40:20,110-Speed 3339.54 samples/sec   Loss 0.1582   LearningRate 0.0019   Epoch: 17   Global Step: 287550   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:40:23,175-Speed 3341.18 samples/sec   Loss 0.1561   LearningRate 0.0019   Epoch: 17   Global Step: 287560   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-12 05:40:26,355-Speed 3221.34 samples/sec   Loss 0.1717   LearningRate 0.0019   Epoch: 17   Global Step: 287570   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-12 05:40:29,555-Speed 3200.30 samples/sec   Loss 0.1511   LearningRate 0.0019   Epoch: 17   Global Step: 287580   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-12 05:40:32,644-Speed 3315.35 samples/sec   Loss 0.1563   LearningRate 0.0019   Epoch: 17   Global Step: 287590   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-12 05:40:35,735-Speed 3313.99 samples/sec   Loss 0.1607   LearningRate 0.0019   Epoch: 17   Global Step: 287600   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-12 05:40:38,824-Speed 3315.32 samples/sec   Loss 0.1583   LearningRate 0.0019   Epoch: 17   Global Step: 287610   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-12 05:40:41,921-Speed 3307.40 samples/sec   Loss 0.1640   LearningRate 0.0019   Epoch: 17   Global Step: 287620   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-12 05:40:45,061-Speed 3262.26 samples/sec   Loss 0.1608   LearningRate 0.0019   Epoch: 17   Global Step: 287630   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-12 05:40:48,198-Speed 3265.04 samples/sec   Loss 0.1671   LearningRate 0.0019   Epoch: 17   Global Step: 287640   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-12 05:40:51,291-Speed 3310.86 samples/sec   Loss 0.1675   LearningRate 0.0019   Epoch: 17   Global Step: 287650   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-12 05:40:54,479-Speed 3213.00 samples/sec   Loss 0.1743   LearningRate 0.0019   Epoch: 17   Global Step: 287660   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:40:57,586-Speed 3297.43 samples/sec   Loss 0.1748   LearningRate 0.0019   Epoch: 17   Global Step: 287670   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:41:00,666-Speed 3324.32 samples/sec   Loss 0.1684   LearningRate 0.0019   Epoch: 17   Global Step: 287680   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:41:03,784-Speed 3285.13 samples/sec   Loss 0.1536   LearningRate 0.0019   Epoch: 17   Global Step: 287690   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:41:06,859-Speed 3330.80 samples/sec   Loss 0.1696   LearningRate 0.0019   Epoch: 17   Global Step: 287700   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:41:09,959-Speed 3304.19 samples/sec   Loss 0.1761   LearningRate 0.0019   Epoch: 17   Global Step: 287710   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:41:13,037-Speed 3328.17 samples/sec   Loss 0.1671   LearningRate 0.0019   Epoch: 17   Global Step: 287720   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:41:16,111-Speed 3331.30 samples/sec   Loss 0.1734   LearningRate 0.0019   Epoch: 17   Global Step: 287730   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:41:19,190-Speed 3326.80 samples/sec   Loss 0.1533   LearningRate 0.0019   Epoch: 17   Global Step: 287740   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:41:22,285-Speed 3309.34 samples/sec   Loss 0.1559   LearningRate 0.0019   Epoch: 17   Global Step: 287750   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:41:25,370-Speed 3319.77 samples/sec   Loss 0.1689   LearningRate 0.0019   Epoch: 17   Global Step: 287760   Fp16 Grad Scale: 262144   Required: 5 hours
Training: 2022-04-12 05:41:28,440-Speed 3336.91 samples/sec   Loss 0.1773   LearningRate 0.0019   Epoch: 17   Global Step: 287770   Fp16 Grad Scale: 262144   Required: 5 hours
Training: 2022-04-12 05:41:31,503-Speed 3343.73 samples/sec   Loss 0.1634   LearningRate 0.0019   Epoch: 17   Global Step: 287780   Fp16 Grad Scale: 262144   Required: 5 hours
Training: 2022-04-12 05:41:34,566-Speed 3343.65 samples/sec   Loss 0.1719   LearningRate 0.0019   Epoch: 17   Global Step: 287790   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:41:37,752-Speed 3214.64 samples/sec   Loss 0.1692   LearningRate 0.0019   Epoch: 17   Global Step: 287800   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:41:40,830-Speed 3328.18 samples/sec   Loss 0.1697   LearningRate 0.0019   Epoch: 17   Global Step: 287810   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:41:43,913-Speed 3321.54 samples/sec   Loss 0.1772   LearningRate 0.0019   Epoch: 17   Global Step: 287820   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:41:47,049-Speed 3266.19 samples/sec   Loss 0.1610   LearningRate 0.0019   Epoch: 17   Global Step: 287830   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:41:50,249-Speed 3200.90 samples/sec   Loss 0.1825   LearningRate 0.0019   Epoch: 17   Global Step: 287840   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-12 05:41:53,400-Speed 3251.12 samples/sec   Loss 0.1734   LearningRate 0.0019   Epoch: 17   Global Step: 287850   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-12 05:41:56,466-Speed 3340.23 samples/sec   Loss 0.1734   LearningRate 0.0019   Epoch: 17   Global Step: 287860   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-12 05:41:59,553-Speed 3317.32 samples/sec   Loss 0.1760   LearningRate 0.0019   Epoch: 17   Global Step: 287870   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-12 05:42:02,637-Speed 3321.14 samples/sec   Loss 0.1596   LearningRate 0.0019   Epoch: 17   Global Step: 287880   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-12 05:42:05,731-Speed 3310.45 samples/sec   Loss 0.1577   LearningRate 0.0019   Epoch: 17   Global Step: 287890   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-12 05:42:08,928-Speed 3204.11 samples/sec   Loss 0.1664   LearningRate 0.0019   Epoch: 17   Global Step: 287900   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-12 05:42:12,000-Speed 3333.66 samples/sec   Loss 0.1680   LearningRate 0.0019   Epoch: 17   Global Step: 287910   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-12 05:42:15,100-Speed 3303.67 samples/sec   Loss 0.1626   LearningRate 0.0019   Epoch: 17   Global Step: 287920   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-12 05:42:18,174-Speed 3332.38 samples/sec   Loss 0.1776   LearningRate 0.0019   Epoch: 17   Global Step: 287930   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-12 05:42:21,246-Speed 3334.39 samples/sec   Loss 0.1594   LearningRate 0.0019   Epoch: 17   Global Step: 287940   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:42:24,331-Speed 3320.12 samples/sec   Loss 0.1626   LearningRate 0.0019   Epoch: 17   Global Step: 287950   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:42:27,404-Speed 3333.07 samples/sec   Loss 0.1682   LearningRate 0.0019   Epoch: 17   Global Step: 287960   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:42:30,665-Speed 3140.45 samples/sec   Loss 0.1623   LearningRate 0.0019   Epoch: 17   Global Step: 287970   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:42:33,807-Speed 3260.25 samples/sec   Loss 0.1685   LearningRate 0.0019   Epoch: 17   Global Step: 287980   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:42:36,913-Speed 3297.34 samples/sec   Loss 0.1786   LearningRate 0.0019   Epoch: 17   Global Step: 287990   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:42:39,988-Speed 3331.21 samples/sec   Loss 0.1768   LearningRate 0.0019   Epoch: 17   Global Step: 288000   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:43:24,221-[lfw][288000]XNorm: 20.884042
Training: 2022-04-12 05:43:24,222-[lfw][288000]Accuracy-Flip: 0.99817+-0.00229
Training: 2022-04-12 05:43:24,222-[lfw][288000]Accuracy-Highest: 0.99817
Training: 2022-04-12 05:44:15,439-[cfp_fp][288000]XNorm: 22.413486
Training: 2022-04-12 05:44:15,439-[cfp_fp][288000]Accuracy-Flip: 0.99200+-0.00415
Training: 2022-04-12 05:44:15,440-[cfp_fp][288000]Accuracy-Highest: 0.99200
Training: 2022-04-12 05:44:59,247-[agedb_30][288000]XNorm: 22.716747
Training: 2022-04-12 05:44:59,247-[agedb_30][288000]Accuracy-Flip: 0.98467+-0.00666
Training: 2022-04-12 05:44:59,248-[agedb_30][288000]Accuracy-Highest: 0.98650
Training: 2022-04-12 05:45:02,335-Speed 71.94 samples/sec   Loss 0.1721   LearningRate 0.0019   Epoch: 17   Global Step: 288010   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:45:05,401-Speed 3340.69 samples/sec   Loss 0.1700   LearningRate 0.0019   Epoch: 17   Global Step: 288020   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:45:08,459-Speed 3349.50 samples/sec   Loss 0.1562   LearningRate 0.0019   Epoch: 17   Global Step: 288030   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:45:11,506-Speed 3360.77 samples/sec   Loss 0.1628   LearningRate 0.0019   Epoch: 17   Global Step: 288040   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:45:14,567-Speed 3345.58 samples/sec   Loss 0.1716   LearningRate 0.0019   Epoch: 17   Global Step: 288050   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:45:17,662-Speed 3309.85 samples/sec   Loss 0.1546   LearningRate 0.0019   Epoch: 17   Global Step: 288060   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:45:20,725-Speed 3344.58 samples/sec   Loss 0.1697   LearningRate 0.0019   Epoch: 17   Global Step: 288070   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:45:23,786-Speed 3345.83 samples/sec   Loss 0.1584   LearningRate 0.0019   Epoch: 17   Global Step: 288080   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:45:26,879-Speed 3311.47 samples/sec   Loss 0.1786   LearningRate 0.0019   Epoch: 17   Global Step: 288090   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:45:29,937-Speed 3348.69 samples/sec   Loss 0.1683   LearningRate 0.0019   Epoch: 17   Global Step: 288100   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:45:33,002-Speed 3341.78 samples/sec   Loss 0.1664   LearningRate 0.0019   Epoch: 17   Global Step: 288110   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:45:36,148-Speed 3255.65 samples/sec   Loss 0.1557   LearningRate 0.0019   Epoch: 17   Global Step: 288120   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:45:39,335-Speed 3214.30 samples/sec   Loss 0.1623   LearningRate 0.0019   Epoch: 17   Global Step: 288130   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:45:42,400-Speed 3341.28 samples/sec   Loss 0.1646   LearningRate 0.0019   Epoch: 17   Global Step: 288140   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:45:45,485-Speed 3320.05 samples/sec   Loss 0.1535   LearningRate 0.0019   Epoch: 17   Global Step: 288150   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:45:48,571-Speed 3318.79 samples/sec   Loss 0.1566   LearningRate 0.0019   Epoch: 17   Global Step: 288160   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-12 05:45:51,634-Speed 3344.81 samples/sec   Loss 0.1727   LearningRate 0.0019   Epoch: 17   Global Step: 288170   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-12 05:45:54,697-Speed 3343.19 samples/sec   Loss 0.1647   LearningRate 0.0019   Epoch: 17   Global Step: 288180   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-12 05:45:57,801-Speed 3299.53 samples/sec   Loss 0.1548   LearningRate 0.0019   Epoch: 17   Global Step: 288190   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-12 05:46:00,880-Speed 3327.25 samples/sec   Loss 0.1720   LearningRate 0.0019   Epoch: 17   Global Step: 288200   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-12 05:46:03,970-Speed 3313.99 samples/sec   Loss 0.1663   LearningRate 0.0019   Epoch: 17   Global Step: 288210   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-12 05:46:07,037-Speed 3339.92 samples/sec   Loss 0.1754   LearningRate 0.0019   Epoch: 17   Global Step: 288220   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-12 05:46:10,142-Speed 3298.57 samples/sec   Loss 0.1587   LearningRate 0.0019   Epoch: 17   Global Step: 288230   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-12 05:46:13,264-Speed 3281.08 samples/sec   Loss 0.1787   LearningRate 0.0019   Epoch: 17   Global Step: 288240   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-12 05:46:16,381-Speed 3286.18 samples/sec   Loss 0.1623   LearningRate 0.0019   Epoch: 17   Global Step: 288250   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-12 05:46:19,453-Speed 3333.98 samples/sec   Loss 0.1772   LearningRate 0.0019   Epoch: 17   Global Step: 288260   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:46:22,571-Speed 3284.67 samples/sec   Loss 0.1720   LearningRate 0.0019   Epoch: 17   Global Step: 288270   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:46:25,725-Speed 3247.34 samples/sec   Loss 0.1636   LearningRate 0.0019   Epoch: 17   Global Step: 288280   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:46:28,807-Speed 3323.13 samples/sec   Loss 0.1626   LearningRate 0.0019   Epoch: 17   Global Step: 288290   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:46:31,957-Speed 3251.14 samples/sec   Loss 0.1751   LearningRate 0.0019   Epoch: 17   Global Step: 288300   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:46:35,044-Speed 3318.07 samples/sec   Loss 0.1467   LearningRate 0.0019   Epoch: 17   Global Step: 288310   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:46:38,145-Speed 3302.90 samples/sec   Loss 0.1540   LearningRate 0.0019   Epoch: 17   Global Step: 288320   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:46:41,229-Speed 3321.48 samples/sec   Loss 0.1551   LearningRate 0.0019   Epoch: 17   Global Step: 288330   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:46:44,332-Speed 3301.43 samples/sec   Loss 0.1552   LearningRate 0.0019   Epoch: 17   Global Step: 288340   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:46:47,563-Speed 3169.21 samples/sec   Loss 0.1666   LearningRate 0.0019   Epoch: 17   Global Step: 288350   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:46:50,622-Speed 3348.64 samples/sec   Loss 0.1653   LearningRate 0.0019   Epoch: 17   Global Step: 288360   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:46:53,693-Speed 3334.97 samples/sec   Loss 0.1725   LearningRate 0.0019   Epoch: 17   Global Step: 288370   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:46:56,758-Speed 3341.58 samples/sec   Loss 0.1636   LearningRate 0.0019   Epoch: 17   Global Step: 288380   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:46:59,860-Speed 3302.02 samples/sec   Loss 0.1581   LearningRate 0.0019   Epoch: 17   Global Step: 288390   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:47:02,921-Speed 3346.20 samples/sec   Loss 0.1626   LearningRate 0.0019   Epoch: 17   Global Step: 288400   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:47:06,015-Speed 3310.82 samples/sec   Loss 0.1593   LearningRate 0.0019   Epoch: 17   Global Step: 288410   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:47:09,147-Speed 3269.59 samples/sec   Loss 0.1850   LearningRate 0.0018   Epoch: 17   Global Step: 288420   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:47:12,208-Speed 3346.63 samples/sec   Loss 0.1651   LearningRate 0.0018   Epoch: 17   Global Step: 288430   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:47:15,274-Speed 3341.21 samples/sec   Loss 0.1739   LearningRate 0.0018   Epoch: 17   Global Step: 288440   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:47:18,333-Speed 3347.98 samples/sec   Loss 0.1591   LearningRate 0.0018   Epoch: 17   Global Step: 288450   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:47:21,395-Speed 3344.19 samples/sec   Loss 0.1734   LearningRate 0.0018   Epoch: 17   Global Step: 288460   Fp16 Grad Scale: 262144   Required: 5 hours
Training: 2022-04-12 05:47:24,445-Speed 3358.35 samples/sec   Loss 0.1715   LearningRate 0.0018   Epoch: 17   Global Step: 288470   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:47:27,572-Speed 3275.80 samples/sec   Loss 0.1693   LearningRate 0.0018   Epoch: 17   Global Step: 288480   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:47:30,651-Speed 3325.87 samples/sec   Loss 0.1651   LearningRate 0.0018   Epoch: 17   Global Step: 288490   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:47:33,721-Speed 3336.69 samples/sec   Loss 0.1577   LearningRate 0.0018   Epoch: 17   Global Step: 288500   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:47:36,855-Speed 3268.70 samples/sec   Loss 0.1834   LearningRate 0.0018   Epoch: 17   Global Step: 288510   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:47:39,932-Speed 3328.46 samples/sec   Loss 0.1717   LearningRate 0.0018   Epoch: 17   Global Step: 288520   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:47:43,103-Speed 3230.39 samples/sec   Loss 0.1646   LearningRate 0.0018   Epoch: 17   Global Step: 288530   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:47:46,174-Speed 3334.20 samples/sec   Loss 0.1659   LearningRate 0.0018   Epoch: 17   Global Step: 288540   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:47:49,377-Speed 3198.50 samples/sec   Loss 0.1756   LearningRate 0.0018   Epoch: 17   Global Step: 288550   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:47:52,443-Speed 3340.53 samples/sec   Loss 0.1769   LearningRate 0.0018   Epoch: 17   Global Step: 288560   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:47:55,532-Speed 3315.37 samples/sec   Loss 0.1715   LearningRate 0.0018   Epoch: 17   Global Step: 288570   Fp16 Grad Scale: 262144   Required: 5 hours
Training: 2022-04-12 05:47:58,611-Speed 3326.02 samples/sec   Loss 0.1965   LearningRate 0.0018   Epoch: 17   Global Step: 288580   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:48:01,676-Speed 3342.45 samples/sec   Loss 0.1639   LearningRate 0.0018   Epoch: 17   Global Step: 288590   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:48:04,739-Speed 3343.10 samples/sec   Loss 0.1630   LearningRate 0.0018   Epoch: 17   Global Step: 288600   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:48:07,823-Speed 3321.60 samples/sec   Loss 0.1894   LearningRate 0.0018   Epoch: 17   Global Step: 288610   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:48:10,885-Speed 3345.47 samples/sec   Loss 0.1628   LearningRate 0.0018   Epoch: 17   Global Step: 288620   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:48:13,963-Speed 3327.53 samples/sec   Loss 0.1819   LearningRate 0.0018   Epoch: 17   Global Step: 288630   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:48:17,034-Speed 3335.25 samples/sec   Loss 0.1849   LearningRate 0.0018   Epoch: 17   Global Step: 288640   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:48:20,118-Speed 3320.66 samples/sec   Loss 0.1618   LearningRate 0.0018   Epoch: 17   Global Step: 288650   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:48:23,184-Speed 3340.61 samples/sec   Loss 0.1689   LearningRate 0.0018   Epoch: 17   Global Step: 288660   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:48:26,276-Speed 3313.11 samples/sec   Loss 0.1694   LearningRate 0.0018   Epoch: 17   Global Step: 288670   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:48:29,334-Speed 3348.95 samples/sec   Loss 0.1659   LearningRate 0.0018   Epoch: 17   Global Step: 288680   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:48:32,399-Speed 3342.10 samples/sec   Loss 0.1692   LearningRate 0.0018   Epoch: 17   Global Step: 288690   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:48:35,483-Speed 3320.46 samples/sec   Loss 0.1738   LearningRate 0.0018   Epoch: 17   Global Step: 288700   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:48:38,553-Speed 3337.15 samples/sec   Loss 0.1622   LearningRate 0.0018   Epoch: 17   Global Step: 288710   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:48:41,645-Speed 3312.56 samples/sec   Loss 0.1619   LearningRate 0.0018   Epoch: 17   Global Step: 288720   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:48:44,707-Speed 3344.11 samples/sec   Loss 0.1784   LearningRate 0.0018   Epoch: 17   Global Step: 288730   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:48:47,791-Speed 3321.18 samples/sec   Loss 0.1765   LearningRate 0.0018   Epoch: 17   Global Step: 288740   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:48:50,866-Speed 3330.59 samples/sec   Loss 0.1810   LearningRate 0.0018   Epoch: 17   Global Step: 288750   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:48:53,928-Speed 3345.59 samples/sec   Loss 0.1728   LearningRate 0.0018   Epoch: 17   Global Step: 288760   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:48:56,995-Speed 3340.36 samples/sec   Loss 0.1610   LearningRate 0.0018   Epoch: 17   Global Step: 288770   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:49:00,077-Speed 3322.86 samples/sec   Loss 0.1735   LearningRate 0.0018   Epoch: 17   Global Step: 288780   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:49:03,289-Speed 3188.75 samples/sec   Loss 0.1644   LearningRate 0.0018   Epoch: 17   Global Step: 288790   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:49:06,395-Speed 3297.73 samples/sec   Loss 0.1848   LearningRate 0.0018   Epoch: 17   Global Step: 288800   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:49:09,490-Speed 3308.77 samples/sec   Loss 0.1620   LearningRate 0.0018   Epoch: 17   Global Step: 288810   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:49:12,559-Speed 3337.62 samples/sec   Loss 0.1571   LearningRate 0.0018   Epoch: 17   Global Step: 288820   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:49:15,658-Speed 3304.29 samples/sec   Loss 0.1670   LearningRate 0.0018   Epoch: 17   Global Step: 288830   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:49:18,761-Speed 3301.11 samples/sec   Loss 0.1709   LearningRate 0.0018   Epoch: 17   Global Step: 288840   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:49:21,865-Speed 3300.25 samples/sec   Loss 0.1680   LearningRate 0.0018   Epoch: 17   Global Step: 288850   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:49:24,968-Speed 3300.78 samples/sec   Loss 0.1711   LearningRate 0.0018   Epoch: 17   Global Step: 288860   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:49:28,038-Speed 3335.96 samples/sec   Loss 0.1556   LearningRate 0.0018   Epoch: 17   Global Step: 288870   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:49:31,100-Speed 3345.80 samples/sec   Loss 0.1563   LearningRate 0.0018   Epoch: 17   Global Step: 288880   Fp16 Grad Scale: 262144   Required: 5 hours
Training: 2022-04-12 05:49:34,151-Speed 3356.24 samples/sec   Loss 0.1747   LearningRate 0.0018   Epoch: 17   Global Step: 288890   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:49:37,237-Speed 3319.28 samples/sec   Loss 0.1613   LearningRate 0.0018   Epoch: 17   Global Step: 288900   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:49:40,314-Speed 3328.41 samples/sec   Loss 0.1678   LearningRate 0.0018   Epoch: 17   Global Step: 288910   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:49:43,401-Speed 3317.34 samples/sec   Loss 0.1575   LearningRate 0.0018   Epoch: 17   Global Step: 288920   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:49:46,483-Speed 3323.44 samples/sec   Loss 0.1643   LearningRate 0.0018   Epoch: 17   Global Step: 288930   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:49:49,547-Speed 3342.66 samples/sec   Loss 0.1763   LearningRate 0.0018   Epoch: 17   Global Step: 288940   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:49:52,625-Speed 3327.78 samples/sec   Loss 0.1615   LearningRate 0.0018   Epoch: 17   Global Step: 288950   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:49:55,692-Speed 3340.02 samples/sec   Loss 0.1839   LearningRate 0.0018   Epoch: 17   Global Step: 288960   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:49:58,775-Speed 3322.33 samples/sec   Loss 0.1750   LearningRate 0.0018   Epoch: 17   Global Step: 288970   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:50:01,844-Speed 3336.76 samples/sec   Loss 0.1831   LearningRate 0.0018   Epoch: 17   Global Step: 288980   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:50:04,899-Speed 3352.57 samples/sec   Loss 0.1673   LearningRate 0.0018   Epoch: 17   Global Step: 288990   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:50:07,958-Speed 3348.26 samples/sec   Loss 0.1478   LearningRate 0.0018   Epoch: 17   Global Step: 289000   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:50:11,027-Speed 3337.10 samples/sec   Loss 0.1897   LearningRate 0.0018   Epoch: 17   Global Step: 289010   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:50:14,108-Speed 3324.80 samples/sec   Loss 0.1729   LearningRate 0.0018   Epoch: 17   Global Step: 289020   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:50:17,193-Speed 3320.03 samples/sec   Loss 0.1589   LearningRate 0.0018   Epoch: 17   Global Step: 289030   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:50:20,278-Speed 3320.23 samples/sec   Loss 0.1588   LearningRate 0.0018   Epoch: 17   Global Step: 289040   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:50:23,344-Speed 3341.09 samples/sec   Loss 0.1703   LearningRate 0.0018   Epoch: 17   Global Step: 289050   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:50:26,433-Speed 3315.65 samples/sec   Loss 0.1757   LearningRate 0.0018   Epoch: 17   Global Step: 289060   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:50:29,499-Speed 3339.91 samples/sec   Loss 0.1740   LearningRate 0.0018   Epoch: 17   Global Step: 289070   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:50:32,567-Speed 3339.04 samples/sec   Loss 0.1628   LearningRate 0.0018   Epoch: 17   Global Step: 289080   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:50:35,674-Speed 3296.02 samples/sec   Loss 0.1773   LearningRate 0.0018   Epoch: 17   Global Step: 289090   Fp16 Grad Scale: 262144   Required: 5 hours
Training: 2022-04-12 05:50:38,829-Speed 3246.35 samples/sec   Loss 0.1735   LearningRate 0.0018   Epoch: 17   Global Step: 289100   Fp16 Grad Scale: 262144   Required: 5 hours
Training: 2022-04-12 05:50:42,053-Speed 3177.26 samples/sec   Loss 0.1635   LearningRate 0.0018   Epoch: 17   Global Step: 289110   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:50:45,138-Speed 3320.32 samples/sec   Loss 0.1741   LearningRate 0.0018   Epoch: 17   Global Step: 289120   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:50:48,249-Speed 3292.23 samples/sec   Loss 0.1627   LearningRate 0.0018   Epoch: 17   Global Step: 289130   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:50:51,324-Speed 3330.46 samples/sec   Loss 0.1650   LearningRate 0.0018   Epoch: 17   Global Step: 289140   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:50:54,389-Speed 3341.76 samples/sec   Loss 0.1756   LearningRate 0.0018   Epoch: 17   Global Step: 289150   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:50:57,454-Speed 3341.69 samples/sec   Loss 0.1693   LearningRate 0.0018   Epoch: 17   Global Step: 289160   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:51:00,536-Speed 3323.33 samples/sec   Loss 0.1647   LearningRate 0.0018   Epoch: 17   Global Step: 289170   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:51:03,605-Speed 3337.34 samples/sec   Loss 0.1591   LearningRate 0.0018   Epoch: 17   Global Step: 289180   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:51:06,671-Speed 3340.36 samples/sec   Loss 0.1613   LearningRate 0.0018   Epoch: 17   Global Step: 289190   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:51:09,749-Speed 3327.81 samples/sec   Loss 0.1538   LearningRate 0.0018   Epoch: 17   Global Step: 289200   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:51:12,801-Speed 3355.50 samples/sec   Loss 0.1654   LearningRate 0.0018   Epoch: 17   Global Step: 289210   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:51:15,885-Speed 3322.11 samples/sec   Loss 0.1671   LearningRate 0.0018   Epoch: 17   Global Step: 289220   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:51:18,955-Speed 3336.41 samples/sec   Loss 0.1631   LearningRate 0.0018   Epoch: 17   Global Step: 289230   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:51:22,029-Speed 3331.02 samples/sec   Loss 0.1721   LearningRate 0.0018   Epoch: 17   Global Step: 289240   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:51:25,123-Speed 3310.68 samples/sec   Loss 0.1688   LearningRate 0.0018   Epoch: 17   Global Step: 289250   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:51:28,210-Speed 3318.43 samples/sec   Loss 0.1629   LearningRate 0.0018   Epoch: 17   Global Step: 289260   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:51:31,322-Speed 3290.10 samples/sec   Loss 0.1763   LearningRate 0.0018   Epoch: 17   Global Step: 289270   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:51:34,402-Speed 3325.53 samples/sec   Loss 0.1651   LearningRate 0.0018   Epoch: 17   Global Step: 289280   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:51:37,596-Speed 3207.84 samples/sec   Loss 0.1656   LearningRate 0.0018   Epoch: 17   Global Step: 289290   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:51:40,670-Speed 3331.22 samples/sec   Loss 0.1600   LearningRate 0.0018   Epoch: 17   Global Step: 289300   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:51:44,521-Speed 2659.39 samples/sec   Loss 0.1665   LearningRate 0.0018   Epoch: 17   Global Step: 289310   Fp16 Grad Scale: 262144   Required: 5 hours
Training: 2022-04-12 05:51:47,615-Speed 3310.52 samples/sec   Loss 0.1468   LearningRate 0.0018   Epoch: 17   Global Step: 289320   Fp16 Grad Scale: 262144   Required: 5 hours
Training: 2022-04-12 05:51:50,683-Speed 3338.38 samples/sec   Loss 0.1704   LearningRate 0.0018   Epoch: 17   Global Step: 289330   Fp16 Grad Scale: 262144   Required: 5 hours
Training: 2022-04-12 05:51:53,848-Speed 3235.95 samples/sec   Loss 0.1715   LearningRate 0.0018   Epoch: 17   Global Step: 289340   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:51:56,948-Speed 3304.33 samples/sec   Loss 0.1779   LearningRate 0.0018   Epoch: 17   Global Step: 289350   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:52:00,034-Speed 3319.29 samples/sec   Loss 0.1590   LearningRate 0.0018   Epoch: 17   Global Step: 289360   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:52:03,182-Speed 3252.78 samples/sec   Loss 0.1712   LearningRate 0.0018   Epoch: 17   Global Step: 289370   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:52:06,346-Speed 3237.98 samples/sec   Loss 0.1710   LearningRate 0.0018   Epoch: 17   Global Step: 289380   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:52:09,428-Speed 3322.97 samples/sec   Loss 0.1650   LearningRate 0.0018   Epoch: 17   Global Step: 289390   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:52:12,511-Speed 3322.50 samples/sec   Loss 0.1825   LearningRate 0.0018   Epoch: 17   Global Step: 289400   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:52:15,578-Speed 3339.16 samples/sec   Loss 0.1752   LearningRate 0.0018   Epoch: 17   Global Step: 289410   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:52:18,653-Speed 3330.85 samples/sec   Loss 0.1639   LearningRate 0.0018   Epoch: 17   Global Step: 289420   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:52:21,739-Speed 3318.68 samples/sec   Loss 0.1776   LearningRate 0.0018   Epoch: 17   Global Step: 289430   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:52:24,820-Speed 3326.01 samples/sec   Loss 0.1870   LearningRate 0.0018   Epoch: 17   Global Step: 289440   Fp16 Grad Scale: 262144   Required: 5 hours
Training: 2022-04-12 05:52:27,890-Speed 3336.21 samples/sec   Loss 0.1824   LearningRate 0.0018   Epoch: 17   Global Step: 289450   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:52:30,977-Speed 3317.53 samples/sec   Loss 0.1572   LearningRate 0.0018   Epoch: 17   Global Step: 289460   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:52:34,039-Speed 3344.87 samples/sec   Loss 0.1493   LearningRate 0.0018   Epoch: 17   Global Step: 289470   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:52:37,127-Speed 3317.90 samples/sec   Loss 0.1746   LearningRate 0.0018   Epoch: 17   Global Step: 289480   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:52:40,252-Speed 3277.19 samples/sec   Loss 0.1779   LearningRate 0.0018   Epoch: 17   Global Step: 289490   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:52:43,316-Speed 3342.28 samples/sec   Loss 0.1619   LearningRate 0.0018   Epoch: 17   Global Step: 289500   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:52:46,387-Speed 3335.81 samples/sec   Loss 0.1671   LearningRate 0.0018   Epoch: 17   Global Step: 289510   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:52:49,454-Speed 3338.71 samples/sec   Loss 0.1575   LearningRate 0.0018   Epoch: 17   Global Step: 289520   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:52:52,524-Speed 3336.49 samples/sec   Loss 0.1699   LearningRate 0.0018   Epoch: 17   Global Step: 289530   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:52:55,593-Speed 3337.52 samples/sec   Loss 0.1825   LearningRate 0.0018   Epoch: 17   Global Step: 289540   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:52:58,674-Speed 3323.86 samples/sec   Loss 0.1821   LearningRate 0.0018   Epoch: 17   Global Step: 289550   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:53:01,743-Speed 3337.53 samples/sec   Loss 0.1666   LearningRate 0.0018   Epoch: 17   Global Step: 289560   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:53:04,816-Speed 3333.97 samples/sec   Loss 0.1662   LearningRate 0.0018   Epoch: 17   Global Step: 289570   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:53:07,965-Speed 3252.08 samples/sec   Loss 0.1748   LearningRate 0.0018   Epoch: 17   Global Step: 289580   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:53:11,118-Speed 3248.14 samples/sec   Loss 0.1815   LearningRate 0.0018   Epoch: 17   Global Step: 289590   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:53:14,263-Speed 3256.77 samples/sec   Loss 0.1537   LearningRate 0.0018   Epoch: 17   Global Step: 289600   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:53:17,328-Speed 3341.62 samples/sec   Loss 0.1774   LearningRate 0.0018   Epoch: 17   Global Step: 289610   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:53:20,397-Speed 3337.22 samples/sec   Loss 0.1767   LearningRate 0.0018   Epoch: 17   Global Step: 289620   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:53:23,467-Speed 3336.39 samples/sec   Loss 0.1581   LearningRate 0.0018   Epoch: 17   Global Step: 289630   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:53:26,537-Speed 3336.88 samples/sec   Loss 0.1719   LearningRate 0.0018   Epoch: 17   Global Step: 289640   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:53:29,617-Speed 3324.64 samples/sec   Loss 0.1636   LearningRate 0.0018   Epoch: 17   Global Step: 289650   Fp16 Grad Scale: 262144   Required: 5 hours
Training: 2022-04-12 05:53:32,671-Speed 3354.55 samples/sec   Loss 0.1754   LearningRate 0.0017   Epoch: 17   Global Step: 289660   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:53:35,793-Speed 3280.18 samples/sec   Loss 0.1609   LearningRate 0.0017   Epoch: 17   Global Step: 289670   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:53:38,859-Speed 3341.12 samples/sec   Loss 0.1732   LearningRate 0.0017   Epoch: 17   Global Step: 289680   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:53:41,931-Speed 3333.39 samples/sec   Loss 0.1700   LearningRate 0.0017   Epoch: 17   Global Step: 289690   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:53:45,023-Speed 3312.52 samples/sec   Loss 0.1493   LearningRate 0.0017   Epoch: 17   Global Step: 289700   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:53:48,124-Speed 3302.93 samples/sec   Loss 0.1675   LearningRate 0.0017   Epoch: 17   Global Step: 289710   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:53:51,227-Speed 3301.53 samples/sec   Loss 0.1721   LearningRate 0.0017   Epoch: 17   Global Step: 289720   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:53:54,420-Speed 3207.37 samples/sec   Loss 0.1692   LearningRate 0.0017   Epoch: 17   Global Step: 289730   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:53:57,504-Speed 3320.74 samples/sec   Loss 0.1581   LearningRate 0.0017   Epoch: 17   Global Step: 289740   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:54:00,585-Speed 3325.53 samples/sec   Loss 0.1721   LearningRate 0.0017   Epoch: 17   Global Step: 289750   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:54:03,737-Speed 3249.29 samples/sec   Loss 0.1824   LearningRate 0.0017   Epoch: 17   Global Step: 289760   Fp16 Grad Scale: 262144   Required: 5 hours
Training: 2022-04-12 05:54:06,806-Speed 3336.86 samples/sec   Loss 0.1671   LearningRate 0.0017   Epoch: 17   Global Step: 289770   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:54:09,869-Speed 3343.52 samples/sec   Loss 0.1659   LearningRate 0.0017   Epoch: 17   Global Step: 289780   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:54:12,955-Speed 3319.70 samples/sec   Loss 0.1695   LearningRate 0.0017   Epoch: 17   Global Step: 289790   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:54:16,032-Speed 3327.83 samples/sec   Loss 0.1634   LearningRate 0.0017   Epoch: 17   Global Step: 289800   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:54:19,100-Speed 3338.94 samples/sec   Loss 0.1873   LearningRate 0.0017   Epoch: 17   Global Step: 289810   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:54:22,181-Speed 3324.22 samples/sec   Loss 0.1659   LearningRate 0.0017   Epoch: 17   Global Step: 289820   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:54:25,257-Speed 3329.53 samples/sec   Loss 0.1684   LearningRate 0.0017   Epoch: 17   Global Step: 289830   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:54:28,325-Speed 3338.60 samples/sec   Loss 0.1784   LearningRate 0.0017   Epoch: 17   Global Step: 289840   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:54:31,389-Speed 3343.32 samples/sec   Loss 0.1626   LearningRate 0.0017   Epoch: 17   Global Step: 289850   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:54:34,453-Speed 3342.82 samples/sec   Loss 0.1633   LearningRate 0.0017   Epoch: 17   Global Step: 289860   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:54:37,552-Speed 3305.53 samples/sec   Loss 0.1693   LearningRate 0.0017   Epoch: 17   Global Step: 289870   Fp16 Grad Scale: 262144   Required: 5 hours
Training: 2022-04-12 05:54:40,702-Speed 3251.22 samples/sec   Loss 0.1628   LearningRate 0.0017   Epoch: 17   Global Step: 289880   Fp16 Grad Scale: 262144   Required: 5 hours
Training: 2022-04-12 05:54:43,849-Speed 3254.35 samples/sec   Loss 0.1682   LearningRate 0.0017   Epoch: 17   Global Step: 289890   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:54:46,923-Speed 3331.74 samples/sec   Loss 0.1795   LearningRate 0.0017   Epoch: 17   Global Step: 289900   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:54:50,059-Speed 3265.74 samples/sec   Loss 0.1709   LearningRate 0.0017   Epoch: 17   Global Step: 289910   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:54:53,147-Speed 3316.98 samples/sec   Loss 0.1637   LearningRate 0.0017   Epoch: 17   Global Step: 289920   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:54:56,228-Speed 3324.99 samples/sec   Loss 0.1851   LearningRate 0.0017   Epoch: 17   Global Step: 289930   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:54:59,318-Speed 3314.90 samples/sec   Loss 0.1560   LearningRate 0.0017   Epoch: 17   Global Step: 289940   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:55:02,561-Speed 3158.30 samples/sec   Loss 0.1745   LearningRate 0.0017   Epoch: 17   Global Step: 289950   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:55:05,785-Speed 3177.09 samples/sec   Loss 0.1873   LearningRate 0.0017   Epoch: 17   Global Step: 289960   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:55:08,903-Speed 3284.05 samples/sec   Loss 0.1826   LearningRate 0.0017   Epoch: 17   Global Step: 289970   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:55:11,990-Speed 3318.22 samples/sec   Loss 0.1648   LearningRate 0.0017   Epoch: 17   Global Step: 289980   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:55:15,111-Speed 3281.76 samples/sec   Loss 0.1829   LearningRate 0.0017   Epoch: 17   Global Step: 289990   Fp16 Grad Scale: 262144   Required: 5 hours
Training: 2022-04-12 05:55:18,330-Speed 3181.79 samples/sec   Loss 0.1872   LearningRate 0.0017   Epoch: 17   Global Step: 290000   Fp16 Grad Scale: 262144   Required: 5 hours
Training: 2022-04-12 05:56:02,099-[lfw][290000]XNorm: 20.870605
Training: 2022-04-12 05:56:02,099-[lfw][290000]Accuracy-Flip: 0.99767+-0.00249
Training: 2022-04-12 05:56:02,100-[lfw][290000]Accuracy-Highest: 0.99817
Training: 2022-04-12 05:56:53,077-[cfp_fp][290000]XNorm: 22.715098
Training: 2022-04-12 05:56:53,078-[cfp_fp][290000]Accuracy-Flip: 0.99071+-0.00385
Training: 2022-04-12 05:56:53,078-[cfp_fp][290000]Accuracy-Highest: 0.99200
Training: 2022-04-12 05:57:36,669-[agedb_30][290000]XNorm: 22.869896
Training: 2022-04-12 05:57:36,670-[agedb_30][290000]Accuracy-Flip: 0.98583+-0.00593
Training: 2022-04-12 05:57:36,670-[agedb_30][290000]Accuracy-Highest: 0.98650
Training: 2022-04-12 05:57:39,719-Speed 72.42 samples/sec   Loss 0.1600   LearningRate 0.0017   Epoch: 17   Global Step: 290010   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:57:42,780-Speed 3345.96 samples/sec   Loss 0.1647   LearningRate 0.0017   Epoch: 17   Global Step: 290020   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:57:45,865-Speed 3320.25 samples/sec   Loss 0.1729   LearningRate 0.0017   Epoch: 17   Global Step: 290030   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:57:48,945-Speed 3325.78 samples/sec   Loss 0.1668   LearningRate 0.0017   Epoch: 17   Global Step: 290040   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:57:52,028-Speed 3322.27 samples/sec   Loss 0.1736   LearningRate 0.0017   Epoch: 17   Global Step: 290050   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:57:55,105-Speed 3327.66 samples/sec   Loss 0.1782   LearningRate 0.0017   Epoch: 17   Global Step: 290060   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:57:58,285-Speed 3221.25 samples/sec   Loss 0.1714   LearningRate 0.0017   Epoch: 17   Global Step: 290070   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:58:01,437-Speed 3249.06 samples/sec   Loss 0.1609   LearningRate 0.0017   Epoch: 17   Global Step: 290080   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:58:04,539-Speed 3302.31 samples/sec   Loss 0.1646   LearningRate 0.0017   Epoch: 17   Global Step: 290090   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:58:07,605-Speed 3340.90 samples/sec   Loss 0.1806   LearningRate 0.0017   Epoch: 17   Global Step: 290100   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:58:10,673-Speed 3337.72 samples/sec   Loss 0.1722   LearningRate 0.0017   Epoch: 17   Global Step: 290110   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:58:13,856-Speed 3217.98 samples/sec   Loss 0.1786   LearningRate 0.0017   Epoch: 17   Global Step: 290120   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:58:16,970-Speed 3289.00 samples/sec   Loss 0.1705   LearningRate 0.0017   Epoch: 17   Global Step: 290130   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:58:20,031-Speed 3346.84 samples/sec   Loss 0.1775   LearningRate 0.0017   Epoch: 17   Global Step: 290140   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:58:23,129-Speed 3305.22 samples/sec   Loss 0.1642   LearningRate 0.0017   Epoch: 17   Global Step: 290150   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:58:26,258-Speed 3273.11 samples/sec   Loss 0.1805   LearningRate 0.0017   Epoch: 17   Global Step: 290160   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:58:29,406-Speed 3253.53 samples/sec   Loss 0.1794   LearningRate 0.0017   Epoch: 17   Global Step: 290170   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:58:32,577-Speed 3230.49 samples/sec   Loss 0.1738   LearningRate 0.0017   Epoch: 17   Global Step: 290180   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:58:35,637-Speed 3347.07 samples/sec   Loss 0.1716   LearningRate 0.0017   Epoch: 17   Global Step: 290190   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:58:38,701-Speed 3343.27 samples/sec   Loss 0.1678   LearningRate 0.0017   Epoch: 17   Global Step: 290200   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:58:41,765-Speed 3342.59 samples/sec   Loss 0.1752   LearningRate 0.0017   Epoch: 17   Global Step: 290210   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:58:44,833-Speed 3338.64 samples/sec   Loss 0.1912   LearningRate 0.0017   Epoch: 17   Global Step: 290220   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:58:47,955-Speed 3280.34 samples/sec   Loss 0.1671   LearningRate 0.0017   Epoch: 17   Global Step: 290230   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:58:51,018-Speed 3343.73 samples/sec   Loss 0.1900   LearningRate 0.0017   Epoch: 17   Global Step: 290240   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:58:54,089-Speed 3334.67 samples/sec   Loss 0.1750   LearningRate 0.0017   Epoch: 17   Global Step: 290250   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:58:57,255-Speed 3235.16 samples/sec   Loss 0.1792   LearningRate 0.0017   Epoch: 17   Global Step: 290260   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:59:00,326-Speed 3335.39 samples/sec   Loss 0.1621   LearningRate 0.0017   Epoch: 17   Global Step: 290270   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:59:03,494-Speed 3233.58 samples/sec   Loss 0.1674   LearningRate 0.0017   Epoch: 17   Global Step: 290280   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:59:06,565-Speed 3334.58 samples/sec   Loss 0.1621   LearningRate 0.0017   Epoch: 17   Global Step: 290290   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:59:09,647-Speed 3323.94 samples/sec   Loss 0.1839   LearningRate 0.0017   Epoch: 17   Global Step: 290300   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:59:12,709-Speed 3344.95 samples/sec   Loss 0.1692   LearningRate 0.0017   Epoch: 17   Global Step: 290310   Fp16 Grad Scale: 262144   Required: 5 hours
Training: 2022-04-12 05:59:15,774-Speed 3341.77 samples/sec   Loss 0.1653   LearningRate 0.0017   Epoch: 17   Global Step: 290320   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:59:18,846-Speed 3333.84 samples/sec   Loss 0.1658   LearningRate 0.0017   Epoch: 17   Global Step: 290330   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:59:21,921-Speed 3331.09 samples/sec   Loss 0.1667   LearningRate 0.0017   Epoch: 17   Global Step: 290340   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:59:24,983-Speed 3344.36 samples/sec   Loss 0.1658   LearningRate 0.0017   Epoch: 17   Global Step: 290350   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:59:28,074-Speed 3313.96 samples/sec   Loss 0.1744   LearningRate 0.0017   Epoch: 17   Global Step: 290360   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:59:31,202-Speed 3274.37 samples/sec   Loss 0.1890   LearningRate 0.0017   Epoch: 17   Global Step: 290370   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:59:34,431-Speed 3171.64 samples/sec   Loss 0.1833   LearningRate 0.0017   Epoch: 17   Global Step: 290380   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:59:37,643-Speed 3188.92 samples/sec   Loss 0.1672   LearningRate 0.0017   Epoch: 17   Global Step: 290390   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:59:40,799-Speed 3245.45 samples/sec   Loss 0.1736   LearningRate 0.0017   Epoch: 17   Global Step: 290400   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:59:43,926-Speed 3274.96 samples/sec   Loss 0.1719   LearningRate 0.0017   Epoch: 17   Global Step: 290410   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:59:46,979-Speed 3355.25 samples/sec   Loss 0.1637   LearningRate 0.0017   Epoch: 17   Global Step: 290420   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:59:50,053-Speed 3334.52 samples/sec   Loss 0.1767   LearningRate 0.0017   Epoch: 17   Global Step: 290430   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:59:53,115-Speed 3344.48 samples/sec   Loss 0.1734   LearningRate 0.0017   Epoch: 17   Global Step: 290440   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:59:56,177-Speed 3345.29 samples/sec   Loss 0.1596   LearningRate 0.0017   Epoch: 17   Global Step: 290450   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 05:59:59,252-Speed 3330.75 samples/sec   Loss 0.1524   LearningRate 0.0017   Epoch: 17   Global Step: 290460   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 06:00:02,323-Speed 3335.45 samples/sec   Loss 0.1637   LearningRate 0.0017   Epoch: 17   Global Step: 290470   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 06:00:05,395-Speed 3334.38 samples/sec   Loss 0.1671   LearningRate 0.0017   Epoch: 17   Global Step: 290480   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 06:00:08,477-Speed 3323.32 samples/sec   Loss 0.1907   LearningRate 0.0017   Epoch: 17   Global Step: 290490   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 06:00:11,558-Speed 3324.12 samples/sec   Loss 0.1823   LearningRate 0.0017   Epoch: 17   Global Step: 290500   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 06:00:14,648-Speed 3314.24 samples/sec   Loss 0.1718   LearningRate 0.0017   Epoch: 17   Global Step: 290510   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 06:00:17,738-Speed 3314.78 samples/sec   Loss 0.1785   LearningRate 0.0017   Epoch: 17   Global Step: 290520   Fp16 Grad Scale: 262144   Required: 5 hours
Training: 2022-04-12 06:00:20,806-Speed 3338.66 samples/sec   Loss 0.1832   LearningRate 0.0017   Epoch: 17   Global Step: 290530   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 06:00:23,883-Speed 3328.58 samples/sec   Loss 0.1747   LearningRate 0.0017   Epoch: 17   Global Step: 290540   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 06:00:26,946-Speed 3344.45 samples/sec   Loss 0.1824   LearningRate 0.0017   Epoch: 17   Global Step: 290550   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 06:00:30,016-Speed 3335.88 samples/sec   Loss 0.1713   LearningRate 0.0017   Epoch: 17   Global Step: 290560   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 06:00:33,082-Speed 3340.88 samples/sec   Loss 0.1664   LearningRate 0.0017   Epoch: 17   Global Step: 290570   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 06:00:36,146-Speed 3342.19 samples/sec   Loss 0.1770   LearningRate 0.0017   Epoch: 17   Global Step: 290580   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 06:00:39,223-Speed 3329.43 samples/sec   Loss 0.1756   LearningRate 0.0017   Epoch: 17   Global Step: 290590   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 06:00:42,322-Speed 3303.87 samples/sec   Loss 0.1863   LearningRate 0.0017   Epoch: 17   Global Step: 290600   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 06:00:45,486-Speed 3237.58 samples/sec   Loss 0.1652   LearningRate 0.0017   Epoch: 17   Global Step: 290610   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 06:00:48,569-Speed 3321.79 samples/sec   Loss 0.1647   LearningRate 0.0017   Epoch: 17   Global Step: 290620   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 06:00:51,716-Speed 3255.61 samples/sec   Loss 0.1784   LearningRate 0.0017   Epoch: 17   Global Step: 290630   Fp16 Grad Scale: 262144   Required: 5 hours
Training: 2022-04-12 06:00:54,787-Speed 3334.78 samples/sec   Loss 0.1624   LearningRate 0.0017   Epoch: 17   Global Step: 290640   Fp16 Grad Scale: 262144   Required: 5 hours
Training: 2022-04-12 06:00:57,840-Speed 3355.26 samples/sec   Loss 0.1673   LearningRate 0.0017   Epoch: 17   Global Step: 290650   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 06:01:00,923-Speed 3321.81 samples/sec   Loss 0.1634   LearningRate 0.0017   Epoch: 17   Global Step: 290660   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 06:01:04,009-Speed 3318.77 samples/sec   Loss 0.1476   LearningRate 0.0017   Epoch: 17   Global Step: 290670   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 06:01:07,070-Speed 3346.00 samples/sec   Loss 0.1707   LearningRate 0.0017   Epoch: 17   Global Step: 290680   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 06:01:10,160-Speed 3314.80 samples/sec   Loss 0.1632   LearningRate 0.0017   Epoch: 17   Global Step: 290690   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 06:01:13,235-Speed 3331.31 samples/sec   Loss 0.1724   LearningRate 0.0017   Epoch: 17   Global Step: 290700   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 06:01:16,311-Speed 3330.19 samples/sec   Loss 0.1608   LearningRate 0.0017   Epoch: 17   Global Step: 290710   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 06:01:19,372-Speed 3346.40 samples/sec   Loss 0.1589   LearningRate 0.0017   Epoch: 17   Global Step: 290720   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-12 06:01:22,437-Speed 3341.00 samples/sec   Loss 0.1837   LearningRate 0.0017   Epoch: 17   Global Step: 290730   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-12 06:01:25,535-Speed 3306.42 samples/sec   Loss 0.1731   LearningRate 0.0017   Epoch: 17   Global Step: 290740   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-12 06:01:28,608-Speed 3333.11 samples/sec   Loss 0.1631   LearningRate 0.0017   Epoch: 17   Global Step: 290750   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-12 06:01:31,709-Speed 3302.71 samples/sec   Loss 0.1692   LearningRate 0.0017   Epoch: 17   Global Step: 290760   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-12 06:01:34,807-Speed 3305.70 samples/sec   Loss 0.1689   LearningRate 0.0017   Epoch: 17   Global Step: 290770   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-12 06:01:37,871-Speed 3342.63 samples/sec   Loss 0.1659   LearningRate 0.0017   Epoch: 17   Global Step: 290780   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-12 06:01:40,947-Speed 3330.27 samples/sec   Loss 0.1809   LearningRate 0.0017   Epoch: 17   Global Step: 290790   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-12 06:01:44,013-Speed 3340.22 samples/sec   Loss 0.1712   LearningRate 0.0017   Epoch: 17   Global Step: 290800   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-12 06:01:47,075-Speed 3345.05 samples/sec   Loss 0.1612   LearningRate 0.0017   Epoch: 17   Global Step: 290810   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-12 06:01:50,178-Speed 3301.64 samples/sec   Loss 0.1677   LearningRate 0.0017   Epoch: 17   Global Step: 290820   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 06:01:53,265-Speed 3317.04 samples/sec   Loss 0.1656   LearningRate 0.0017   Epoch: 17   Global Step: 290830   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 06:01:56,341-Speed 3329.68 samples/sec   Loss 0.1719   LearningRate 0.0017   Epoch: 17   Global Step: 290840   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 06:01:59,410-Speed 3337.29 samples/sec   Loss 0.1779   LearningRate 0.0017   Epoch: 17   Global Step: 290850   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 06:02:02,488-Speed 3328.34 samples/sec   Loss 0.1522   LearningRate 0.0017   Epoch: 17   Global Step: 290860   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 06:02:05,558-Speed 3335.84 samples/sec   Loss 0.1660   LearningRate 0.0017   Epoch: 17   Global Step: 290870   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 06:02:08,708-Speed 3251.93 samples/sec   Loss 0.1722   LearningRate 0.0017   Epoch: 17   Global Step: 290880   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 06:02:11,951-Speed 3158.11 samples/sec   Loss 0.1775   LearningRate 0.0017   Epoch: 17   Global Step: 290890   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 06:02:15,088-Speed 3264.46 samples/sec   Loss 0.1760   LearningRate 0.0017   Epoch: 17   Global Step: 290900   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 06:02:18,159-Speed 3335.04 samples/sec   Loss 0.1910   LearningRate 0.0017   Epoch: 17   Global Step: 290910   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 06:02:21,223-Speed 3343.53 samples/sec   Loss 0.1670   LearningRate 0.0017   Epoch: 17   Global Step: 290920   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 06:02:24,294-Speed 3334.63 samples/sec   Loss 0.1764   LearningRate 0.0017   Epoch: 17   Global Step: 290930   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 06:02:27,370-Speed 3330.03 samples/sec   Loss 0.1725   LearningRate 0.0017   Epoch: 17   Global Step: 290940   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 06:02:30,434-Speed 3342.45 samples/sec   Loss 0.1696   LearningRate 0.0016   Epoch: 17   Global Step: 290950   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 06:02:33,506-Speed 3333.88 samples/sec   Loss 0.1690   LearningRate 0.0016   Epoch: 17   Global Step: 290960   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 06:02:36,571-Speed 3341.40 samples/sec   Loss 0.1630   LearningRate 0.0016   Epoch: 17   Global Step: 290970   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 06:02:39,753-Speed 3219.52 samples/sec   Loss 0.1818   LearningRate 0.0016   Epoch: 17   Global Step: 290980   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 06:02:42,850-Speed 3306.75 samples/sec   Loss 0.1695   LearningRate 0.0016   Epoch: 17   Global Step: 290990   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 06:02:45,917-Speed 3340.37 samples/sec   Loss 0.1665   LearningRate 0.0016   Epoch: 17   Global Step: 291000   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 06:02:48,998-Speed 3324.19 samples/sec   Loss 0.1804   LearningRate 0.0016   Epoch: 17   Global Step: 291010   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 06:02:52,054-Speed 3351.48 samples/sec   Loss 0.1718   LearningRate 0.0016   Epoch: 17   Global Step: 291020   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 06:02:55,185-Speed 3271.26 samples/sec   Loss 0.1590   LearningRate 0.0016   Epoch: 17   Global Step: 291030   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 06:02:58,249-Speed 3342.03 samples/sec   Loss 0.1788   LearningRate 0.0016   Epoch: 17   Global Step: 291040   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 06:03:01,310-Speed 3345.76 samples/sec   Loss 0.1728   LearningRate 0.0016   Epoch: 17   Global Step: 291050   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 06:03:04,405-Speed 3309.98 samples/sec   Loss 0.1695   LearningRate 0.0016   Epoch: 17   Global Step: 291060   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 06:03:07,474-Speed 3337.52 samples/sec   Loss 0.1700   LearningRate 0.0016   Epoch: 17   Global Step: 291070   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 06:03:10,551-Speed 3328.81 samples/sec   Loss 0.1786   LearningRate 0.0016   Epoch: 17   Global Step: 291080   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 06:03:13,623-Speed 3334.16 samples/sec   Loss 0.1706   LearningRate 0.0016   Epoch: 17   Global Step: 291090   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 06:03:16,702-Speed 3326.18 samples/sec   Loss 0.1655   LearningRate 0.0016   Epoch: 17   Global Step: 291100   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 06:03:19,767-Speed 3341.69 samples/sec   Loss 0.1685   LearningRate 0.0016   Epoch: 17   Global Step: 291110   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 06:03:22,841-Speed 3332.62 samples/sec   Loss 0.1607   LearningRate 0.0016   Epoch: 17   Global Step: 291120   Fp16 Grad Scale: 262144   Required: 5 hours
Training: 2022-04-12 06:03:25,908-Speed 3338.66 samples/sec   Loss 0.1753   LearningRate 0.0016   Epoch: 17   Global Step: 291130   Fp16 Grad Scale: 262144   Required: 5 hours
Training: 2022-04-12 06:03:28,964-Speed 3351.96 samples/sec   Loss 0.1793   LearningRate 0.0016   Epoch: 17   Global Step: 291140   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 06:03:32,070-Speed 3297.56 samples/sec   Loss 0.1708   LearningRate 0.0016   Epoch: 17   Global Step: 291150   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 06:03:35,182-Speed 3291.15 samples/sec   Loss 0.1580   LearningRate 0.0016   Epoch: 17   Global Step: 291160   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 06:03:38,255-Speed 3333.30 samples/sec   Loss 0.1757   LearningRate 0.0016   Epoch: 17   Global Step: 291170   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 06:03:41,322-Speed 3339.15 samples/sec   Loss 0.1764   LearningRate 0.0016   Epoch: 17   Global Step: 291180   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 06:03:44,388-Speed 3340.63 samples/sec   Loss 0.1805   LearningRate 0.0016   Epoch: 17   Global Step: 291190   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 06:03:47,472-Speed 3320.97 samples/sec   Loss 0.1690   LearningRate 0.0016   Epoch: 17   Global Step: 291200   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 06:03:50,604-Speed 3271.04 samples/sec   Loss 0.1831   LearningRate 0.0016   Epoch: 17   Global Step: 291210   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 06:03:53,693-Speed 3315.42 samples/sec   Loss 0.1697   LearningRate 0.0016   Epoch: 17   Global Step: 291220   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 06:03:56,773-Speed 3325.84 samples/sec   Loss 0.1710   LearningRate 0.0016   Epoch: 17   Global Step: 291230   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 06:03:59,923-Speed 3250.52 samples/sec   Loss 0.1607   LearningRate 0.0016   Epoch: 17   Global Step: 291240   Fp16 Grad Scale: 262144   Required: 5 hours
Training: 2022-04-12 06:04:02,999-Speed 3330.74 samples/sec   Loss 0.1647   LearningRate 0.0016   Epoch: 17   Global Step: 291250   Fp16 Grad Scale: 262144   Required: 5 hours
Training: 2022-04-12 06:04:06,065-Speed 3340.06 samples/sec   Loss 0.1740   LearningRate 0.0016   Epoch: 17   Global Step: 291260   Fp16 Grad Scale: 262144   Required: 5 hours
Training: 2022-04-12 06:04:09,122-Speed 3350.94 samples/sec   Loss 0.1580   LearningRate 0.0016   Epoch: 17   Global Step: 291270   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 06:04:12,194-Speed 3333.78 samples/sec   Loss 0.1736   LearningRate 0.0016   Epoch: 17   Global Step: 291280   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-12 06:04:15,258-Speed 3342.76 samples/sec   Loss 0.1848   LearningRate 0.0016   Epoch: 17   Global Step: 291290   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:04:18,352-Speed 3310.34 samples/sec   Loss 0.1875   LearningRate 0.0016   Epoch: 17   Global Step: 291300   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:04:21,433-Speed 3324.57 samples/sec   Loss 0.1660   LearningRate 0.0016   Epoch: 17   Global Step: 291310   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:04:24,602-Speed 3232.54 samples/sec   Loss 0.1701   LearningRate 0.0016   Epoch: 17   Global Step: 291320   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:04:27,731-Speed 3272.31 samples/sec   Loss 0.1721   LearningRate 0.0016   Epoch: 17   Global Step: 291330   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:04:30,819-Speed 3317.78 samples/sec   Loss 0.1621   LearningRate 0.0016   Epoch: 17   Global Step: 291340   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:04:33,915-Speed 3308.52 samples/sec   Loss 0.1759   LearningRate 0.0016   Epoch: 17   Global Step: 291350   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:04:37,040-Speed 3277.55 samples/sec   Loss 0.1703   LearningRate 0.0016   Epoch: 17   Global Step: 291360   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:04:40,205-Speed 3236.01 samples/sec   Loss 0.1782   LearningRate 0.0016   Epoch: 17   Global Step: 291370   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:04:43,282-Speed 3327.51 samples/sec   Loss 0.1725   LearningRate 0.0016   Epoch: 17   Global Step: 291380   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:04:46,337-Speed 3353.73 samples/sec   Loss 0.1742   LearningRate 0.0016   Epoch: 17   Global Step: 291390   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-12 06:04:49,406-Speed 3337.24 samples/sec   Loss 0.1668   LearningRate 0.0016   Epoch: 17   Global Step: 291400   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-12 06:04:52,493-Speed 3317.71 samples/sec   Loss 0.1720   LearningRate 0.0016   Epoch: 17   Global Step: 291410   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-12 06:04:55,556-Speed 3343.14 samples/sec   Loss 0.1577   LearningRate 0.0016   Epoch: 17   Global Step: 291420   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-12 06:04:58,671-Speed 3289.23 samples/sec   Loss 0.1619   LearningRate 0.0016   Epoch: 17   Global Step: 291430   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-12 06:05:01,739-Speed 3337.81 samples/sec   Loss 0.1659   LearningRate 0.0016   Epoch: 17   Global Step: 291440   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-12 06:05:04,805-Speed 3341.49 samples/sec   Loss 0.1782   LearningRate 0.0016   Epoch: 17   Global Step: 291450   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-12 06:05:07,934-Speed 3273.09 samples/sec   Loss 0.1697   LearningRate 0.0016   Epoch: 17   Global Step: 291460   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-12 06:05:11,048-Speed 3288.63 samples/sec   Loss 0.1633   LearningRate 0.0016   Epoch: 17   Global Step: 291470   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-12 06:05:14,163-Speed 3287.88 samples/sec   Loss 0.1698   LearningRate 0.0016   Epoch: 17   Global Step: 291480   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-12 06:05:17,241-Speed 3327.31 samples/sec   Loss 0.1859   LearningRate 0.0016   Epoch: 17   Global Step: 291490   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:05:20,358-Speed 3286.22 samples/sec   Loss 0.1667   LearningRate 0.0016   Epoch: 17   Global Step: 291500   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:05:23,433-Speed 3331.37 samples/sec   Loss 0.1772   LearningRate 0.0016   Epoch: 17   Global Step: 291510   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:05:26,546-Speed 3289.87 samples/sec   Loss 0.1602   LearningRate 0.0016   Epoch: 17   Global Step: 291520   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:05:29,609-Speed 3343.64 samples/sec   Loss 0.1781   LearningRate 0.0016   Epoch: 17   Global Step: 291530   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:05:32,695-Speed 3319.13 samples/sec   Loss 0.1886   LearningRate 0.0016   Epoch: 17   Global Step: 291540   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:05:35,814-Speed 3284.67 samples/sec   Loss 0.1733   LearningRate 0.0016   Epoch: 17   Global Step: 291550   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:05:38,907-Speed 3311.44 samples/sec   Loss 0.1947   LearningRate 0.0016   Epoch: 17   Global Step: 291560   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:05:42,054-Speed 3254.16 samples/sec   Loss 0.1637   LearningRate 0.0016   Epoch: 17   Global Step: 291570   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:05:45,128-Speed 3331.66 samples/sec   Loss 0.1796   LearningRate 0.0016   Epoch: 17   Global Step: 291580   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:05:48,215-Speed 3317.64 samples/sec   Loss 0.1687   LearningRate 0.0016   Epoch: 17   Global Step: 291590   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:05:51,321-Speed 3298.54 samples/sec   Loss 0.1550   LearningRate 0.0016   Epoch: 17   Global Step: 291600   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:05:54,384-Speed 3344.01 samples/sec   Loss 0.1751   LearningRate 0.0016   Epoch: 17   Global Step: 291610   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:05:57,544-Speed 3241.69 samples/sec   Loss 0.1679   LearningRate 0.0016   Epoch: 17   Global Step: 291620   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:06:00,706-Speed 3238.65 samples/sec   Loss 0.1698   LearningRate 0.0016   Epoch: 17   Global Step: 291630   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:06:03,790-Speed 3320.81 samples/sec   Loss 0.1652   LearningRate 0.0016   Epoch: 17   Global Step: 291640   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:06:06,927-Speed 3265.11 samples/sec   Loss 0.1699   LearningRate 0.0016   Epoch: 17   Global Step: 291650   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:06:09,998-Speed 3336.01 samples/sec   Loss 0.1593   LearningRate 0.0016   Epoch: 17   Global Step: 291660   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:06:13,084-Speed 3318.23 samples/sec   Loss 0.1571   LearningRate 0.0016   Epoch: 17   Global Step: 291670   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:06:16,151-Speed 3340.06 samples/sec   Loss 0.1769   LearningRate 0.0016   Epoch: 17   Global Step: 291680   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:06:19,220-Speed 3337.92 samples/sec   Loss 0.1716   LearningRate 0.0016   Epoch: 17   Global Step: 291690   Fp16 Grad Scale: 262144   Required: 4 hours
Training: 2022-04-12 06:06:22,328-Speed 3295.37 samples/sec   Loss 0.1725   LearningRate 0.0016   Epoch: 17   Global Step: 291700   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:06:25,396-Speed 3338.17 samples/sec   Loss 0.1897   LearningRate 0.0016   Epoch: 17   Global Step: 291710   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:06:28,460-Speed 3342.43 samples/sec   Loss 0.1740   LearningRate 0.0016   Epoch: 17   Global Step: 291720   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:06:31,550-Speed 3315.33 samples/sec   Loss 0.1769   LearningRate 0.0016   Epoch: 17   Global Step: 291730   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:06:34,661-Speed 3291.98 samples/sec   Loss 0.1531   LearningRate 0.0016   Epoch: 17   Global Step: 291740   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:06:37,777-Speed 3286.93 samples/sec   Loss 0.1784   LearningRate 0.0016   Epoch: 17   Global Step: 291750   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:06:40,881-Speed 3299.34 samples/sec   Loss 0.1592   LearningRate 0.0016   Epoch: 17   Global Step: 291760   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:06:43,950-Speed 3337.25 samples/sec   Loss 0.1554   LearningRate 0.0016   Epoch: 17   Global Step: 291770   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:06:47,032-Speed 3323.96 samples/sec   Loss 0.1646   LearningRate 0.0016   Epoch: 17   Global Step: 291780   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:06:50,099-Speed 3339.53 samples/sec   Loss 0.1574   LearningRate 0.0016   Epoch: 17   Global Step: 291790   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:06:53,199-Speed 3304.05 samples/sec   Loss 0.1698   LearningRate 0.0016   Epoch: 17   Global Step: 291800   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:06:56,270-Speed 3334.70 samples/sec   Loss 0.1661   LearningRate 0.0016   Epoch: 17   Global Step: 291810   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:06:59,337-Speed 3340.12 samples/sec   Loss 0.1720   LearningRate 0.0016   Epoch: 17   Global Step: 291820   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:07:02,406-Speed 3336.82 samples/sec   Loss 0.1743   LearningRate 0.0016   Epoch: 17   Global Step: 291830   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:07:05,475-Speed 3337.14 samples/sec   Loss 0.1775   LearningRate 0.0016   Epoch: 17   Global Step: 291840   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:07:08,557-Speed 3323.14 samples/sec   Loss 0.1692   LearningRate 0.0016   Epoch: 17   Global Step: 291850   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:07:11,625-Speed 3339.07 samples/sec   Loss 0.1849   LearningRate 0.0016   Epoch: 17   Global Step: 291860   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:07:14,692-Speed 3339.83 samples/sec   Loss 0.1714   LearningRate 0.0016   Epoch: 17   Global Step: 291870   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:07:17,795-Speed 3300.05 samples/sec   Loss 0.1835   LearningRate 0.0016   Epoch: 17   Global Step: 291880   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:07:20,947-Speed 3250.07 samples/sec   Loss 0.1775   LearningRate 0.0016   Epoch: 17   Global Step: 291890   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:07:24,039-Speed 3312.44 samples/sec   Loss 0.1816   LearningRate 0.0016   Epoch: 17   Global Step: 291900   Fp16 Grad Scale: 262144   Required: 4 hours
Training: 2022-04-12 06:07:27,097-Speed 3349.67 samples/sec   Loss 0.1580   LearningRate 0.0016   Epoch: 17   Global Step: 291910   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:07:30,172-Speed 3329.96 samples/sec   Loss 0.1674   LearningRate 0.0016   Epoch: 17   Global Step: 291920   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:07:33,307-Speed 3266.97 samples/sec   Loss 0.1676   LearningRate 0.0016   Epoch: 17   Global Step: 291930   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:07:36,397-Speed 3315.66 samples/sec   Loss 0.1720   LearningRate 0.0016   Epoch: 17   Global Step: 291940   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:07:39,530-Speed 3269.00 samples/sec   Loss 0.1709   LearningRate 0.0016   Epoch: 17   Global Step: 291950   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:07:42,632-Speed 3301.45 samples/sec   Loss 0.1684   LearningRate 0.0016   Epoch: 17   Global Step: 291960   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:07:45,758-Speed 3276.42 samples/sec   Loss 0.1536   LearningRate 0.0016   Epoch: 17   Global Step: 291970   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:07:48,871-Speed 3290.92 samples/sec   Loss 0.1703   LearningRate 0.0016   Epoch: 17   Global Step: 291980   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:07:51,980-Speed 3293.71 samples/sec   Loss 0.1752   LearningRate 0.0016   Epoch: 17   Global Step: 291990   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:07:55,147-Speed 3234.46 samples/sec   Loss 0.1609   LearningRate 0.0016   Epoch: 17   Global Step: 292000   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:08:39,127-[lfw][292000]XNorm: 21.051618
Training: 2022-04-12 06:08:39,128-[lfw][292000]Accuracy-Flip: 0.99800+-0.00267
Training: 2022-04-12 06:08:39,128-[lfw][292000]Accuracy-Highest: 0.99817
Training: 2022-04-12 06:09:30,213-[cfp_fp][292000]XNorm: 22.689512
Training: 2022-04-12 06:09:30,213-[cfp_fp][292000]Accuracy-Flip: 0.99157+-0.00421
Training: 2022-04-12 06:09:30,214-[cfp_fp][292000]Accuracy-Highest: 0.99200
Training: 2022-04-12 06:10:14,158-[agedb_30][292000]XNorm: 23.059562
Training: 2022-04-12 06:10:14,158-[agedb_30][292000]Accuracy-Flip: 0.98533+-0.00614
Training: 2022-04-12 06:10:14,159-[agedb_30][292000]Accuracy-Highest: 0.98650
Training: 2022-04-12 06:10:17,329-Speed 72.02 samples/sec   Loss 0.1672   LearningRate 0.0016   Epoch: 17   Global Step: 292010   Fp16 Grad Scale: 262144   Required: 4 hours
Training: 2022-04-12 06:10:20,477-Speed 3253.78 samples/sec   Loss 0.1603   LearningRate 0.0016   Epoch: 17   Global Step: 292020   Fp16 Grad Scale: 262144   Required: 4 hours
Training: 2022-04-12 06:10:23,538-Speed 3346.18 samples/sec   Loss 0.1815   LearningRate 0.0016   Epoch: 17   Global Step: 292030   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:10:26,665-Speed 3276.03 samples/sec   Loss 0.1748   LearningRate 0.0016   Epoch: 17   Global Step: 292040   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:10:29,737-Speed 3333.23 samples/sec   Loss 0.1687   LearningRate 0.0016   Epoch: 17   Global Step: 292050   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:10:32,803-Speed 3340.73 samples/sec   Loss 0.1824   LearningRate 0.0016   Epoch: 17   Global Step: 292060   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:10:35,870-Speed 3340.12 samples/sec   Loss 0.1738   LearningRate 0.0016   Epoch: 17   Global Step: 292070   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:10:38,943-Speed 3333.09 samples/sec   Loss 0.1842   LearningRate 0.0016   Epoch: 17   Global Step: 292080   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:10:42,034-Speed 3313.31 samples/sec   Loss 0.1714   LearningRate 0.0016   Epoch: 17   Global Step: 292090   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:10:45,117-Speed 3322.53 samples/sec   Loss 0.1667   LearningRate 0.0016   Epoch: 17   Global Step: 292100   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-12 06:10:48,191-Speed 3331.56 samples/sec   Loss 0.1574   LearningRate 0.0016   Epoch: 17   Global Step: 292110   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-12 06:10:51,254-Speed 3344.04 samples/sec   Loss 0.1659   LearningRate 0.0016   Epoch: 17   Global Step: 292120   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-12 06:10:54,321-Speed 3339.73 samples/sec   Loss 0.1799   LearningRate 0.0016   Epoch: 17   Global Step: 292130   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-12 06:10:57,382-Speed 3346.06 samples/sec   Loss 0.1920   LearningRate 0.0016   Epoch: 17   Global Step: 292140   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-12 06:11:00,580-Speed 3203.10 samples/sec   Loss 0.1650   LearningRate 0.0016   Epoch: 17   Global Step: 292150   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-12 06:11:03,676-Speed 3307.52 samples/sec   Loss 0.1810   LearningRate 0.0016   Epoch: 17   Global Step: 292160   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-12 06:11:06,762-Speed 3319.35 samples/sec   Loss 0.1773   LearningRate 0.0016   Epoch: 17   Global Step: 292170   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-12 06:11:09,828-Speed 3340.50 samples/sec   Loss 0.1663   LearningRate 0.0016   Epoch: 17   Global Step: 292180   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-12 06:11:12,957-Speed 3273.53 samples/sec   Loss 0.1716   LearningRate 0.0016   Epoch: 17   Global Step: 292190   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-12 06:11:16,131-Speed 3226.48 samples/sec   Loss 0.1706   LearningRate 0.0016   Epoch: 17   Global Step: 292200   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:11:19,205-Speed 3332.56 samples/sec   Loss 0.1687   LearningRate 0.0016   Epoch: 17   Global Step: 292210   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:11:22,269-Speed 3342.69 samples/sec   Loss 0.1861   LearningRate 0.0016   Epoch: 17   Global Step: 292220   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:11:25,356-Speed 3318.13 samples/sec   Loss 0.1677   LearningRate 0.0016   Epoch: 17   Global Step: 292230   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:11:28,498-Speed 3259.53 samples/sec   Loss 0.1740   LearningRate 0.0016   Epoch: 17   Global Step: 292240   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:11:31,565-Speed 3339.19 samples/sec   Loss 0.1711   LearningRate 0.0016   Epoch: 17   Global Step: 292250   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:11:34,729-Speed 3237.39 samples/sec   Loss 0.1716   LearningRate 0.0015   Epoch: 17   Global Step: 292260   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:11:37,914-Speed 3216.06 samples/sec   Loss 0.1696   LearningRate 0.0015   Epoch: 17   Global Step: 292270   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:11:41,066-Speed 3248.80 samples/sec   Loss 0.1748   LearningRate 0.0015   Epoch: 17   Global Step: 292280   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:11:44,297-Speed 3169.89 samples/sec   Loss 0.1785   LearningRate 0.0015   Epoch: 17   Global Step: 292290   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:11:47,450-Speed 3249.53 samples/sec   Loss 0.1824   LearningRate 0.0015   Epoch: 17   Global Step: 292300   Fp16 Grad Scale: 262144   Required: 4 hours
Training: 2022-04-12 06:11:50,534-Speed 3320.67 samples/sec   Loss 0.1669   LearningRate 0.0015   Epoch: 17   Global Step: 292310   Fp16 Grad Scale: 262144   Required: 4 hours
Training: 2022-04-12 06:11:53,588-Speed 3353.59 samples/sec   Loss 0.1738   LearningRate 0.0015   Epoch: 17   Global Step: 292320   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-12 06:11:56,669-Speed 3325.09 samples/sec   Loss 0.1804   LearningRate 0.0015   Epoch: 17   Global Step: 292330   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-12 06:11:59,834-Speed 3235.83 samples/sec   Loss 0.1741   LearningRate 0.0015   Epoch: 17   Global Step: 292340   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-12 06:12:02,928-Speed 3310.36 samples/sec   Loss 0.1700   LearningRate 0.0015   Epoch: 17   Global Step: 292350   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-12 06:12:06,122-Speed 3206.09 samples/sec   Loss 0.1783   LearningRate 0.0015   Epoch: 17   Global Step: 292360   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-12 06:12:09,198-Speed 3330.34 samples/sec   Loss 0.1828   LearningRate 0.0015   Epoch: 17   Global Step: 292370   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-12 06:12:12,317-Speed 3283.73 samples/sec   Loss 0.1652   LearningRate 0.0015   Epoch: 17   Global Step: 292380   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-12 06:12:15,421-Speed 3300.08 samples/sec   Loss 0.1700   LearningRate 0.0015   Epoch: 17   Global Step: 292390   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-12 06:12:18,497-Speed 3330.00 samples/sec   Loss 0.1711   LearningRate 0.0015   Epoch: 17   Global Step: 292400   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-12 06:12:21,581-Speed 3321.34 samples/sec   Loss 0.1624   LearningRate 0.0015   Epoch: 17   Global Step: 292410   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-12 06:12:24,645-Speed 3342.48 samples/sec   Loss 0.1682   LearningRate 0.0015   Epoch: 17   Global Step: 292420   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:12:27,709-Speed 3342.42 samples/sec   Loss 0.1704   LearningRate 0.0015   Epoch: 17   Global Step: 292430   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:12:30,869-Speed 3240.86 samples/sec   Loss 0.1708   LearningRate 0.0015   Epoch: 17   Global Step: 292440   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:12:34,031-Speed 3239.28 samples/sec   Loss 0.1562   LearningRate 0.0015   Epoch: 17   Global Step: 292450   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:12:37,167-Speed 3266.52 samples/sec   Loss 0.1744   LearningRate 0.0015   Epoch: 17   Global Step: 292460   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:12:40,231-Speed 3342.90 samples/sec   Loss 0.1795   LearningRate 0.0015   Epoch: 17   Global Step: 292470   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:12:43,368-Speed 3265.69 samples/sec   Loss 0.1617   LearningRate 0.0015   Epoch: 17   Global Step: 292480   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:12:46,442-Speed 3331.73 samples/sec   Loss 0.1705   LearningRate 0.0015   Epoch: 17   Global Step: 292490   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:12:49,521-Speed 3326.67 samples/sec   Loss 0.1739   LearningRate 0.0015   Epoch: 17   Global Step: 292500   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:12:52,630-Speed 3294.23 samples/sec   Loss 0.1674   LearningRate 0.0015   Epoch: 17   Global Step: 292510   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:12:55,749-Speed 3283.41 samples/sec   Loss 0.1787   LearningRate 0.0015   Epoch: 17   Global Step: 292520   Fp16 Grad Scale: 262144   Required: 4 hours
Training: 2022-04-12 06:12:58,835-Speed 3319.75 samples/sec   Loss 0.1798   LearningRate 0.0015   Epoch: 17   Global Step: 292530   Fp16 Grad Scale: 262144   Required: 4 hours
Training: 2022-04-12 06:13:01,952-Speed 3285.20 samples/sec   Loss 0.1711   LearningRate 0.0015   Epoch: 17   Global Step: 292540   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:13:05,057-Speed 3299.12 samples/sec   Loss 0.1709   LearningRate 0.0015   Epoch: 17   Global Step: 292550   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:13:08,243-Speed 3215.23 samples/sec   Loss 0.1732   LearningRate 0.0015   Epoch: 17   Global Step: 292560   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:13:11,387-Speed 3257.17 samples/sec   Loss 0.1811   LearningRate 0.0015   Epoch: 17   Global Step: 292570   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-12 06:13:14,487-Speed 3304.21 samples/sec   Loss 0.1812   LearningRate 0.0015   Epoch: 17   Global Step: 292580   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-12 06:13:17,548-Speed 3345.40 samples/sec   Loss 0.1732   LearningRate 0.0015   Epoch: 17   Global Step: 292590   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-12 06:13:20,611-Speed 3344.24 samples/sec   Loss 0.1848   LearningRate 0.0015   Epoch: 17   Global Step: 292600   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-12 06:13:23,694-Speed 3322.52 samples/sec   Loss 0.1756   LearningRate 0.0015   Epoch: 17   Global Step: 292610   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-12 06:13:26,759-Speed 3341.51 samples/sec   Loss 0.1666   LearningRate 0.0015   Epoch: 17   Global Step: 292620   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-12 06:13:29,851-Speed 3312.56 samples/sec   Loss 0.1883   LearningRate 0.0015   Epoch: 17   Global Step: 292630   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-12 06:13:33,005-Speed 3248.12 samples/sec   Loss 0.1559   LearningRate 0.0015   Epoch: 17   Global Step: 292640   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-12 06:13:36,154-Speed 3251.86 samples/sec   Loss 0.1782   LearningRate 0.0015   Epoch: 17   Global Step: 292650   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-12 06:13:39,262-Speed 3295.81 samples/sec   Loss 0.1753   LearningRate 0.0015   Epoch: 17   Global Step: 292660   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-12 06:13:42,339-Speed 3329.21 samples/sec   Loss 0.1800   LearningRate 0.0015   Epoch: 17   Global Step: 292670   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:13:45,506-Speed 3233.22 samples/sec   Loss 0.1775   LearningRate 0.0015   Epoch: 17   Global Step: 292680   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:13:48,630-Speed 3278.91 samples/sec   Loss 0.1723   LearningRate 0.0015   Epoch: 17   Global Step: 292690   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:13:51,703-Speed 3333.17 samples/sec   Loss 0.1702   LearningRate 0.0015   Epoch: 17   Global Step: 292700   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:13:54,771-Speed 3337.78 samples/sec   Loss 0.1777   LearningRate 0.0015   Epoch: 17   Global Step: 292710   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:13:57,835-Speed 3343.08 samples/sec   Loss 0.1638   LearningRate 0.0015   Epoch: 17   Global Step: 292720   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:14:00,925-Speed 3315.11 samples/sec   Loss 0.1981   LearningRate 0.0015   Epoch: 17   Global Step: 292730   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:14:04,005-Speed 3325.74 samples/sec   Loss 0.1712   LearningRate 0.0015   Epoch: 17   Global Step: 292740   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:14:07,098-Speed 3311.32 samples/sec   Loss 0.1942   LearningRate 0.0015   Epoch: 17   Global Step: 292750   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:14:10,163-Speed 3342.03 samples/sec   Loss 0.1746   LearningRate 0.0015   Epoch: 17   Global Step: 292760   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:14:13,237-Speed 3331.24 samples/sec   Loss 0.1663   LearningRate 0.0015   Epoch: 17   Global Step: 292770   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:14:16,307-Speed 3336.33 samples/sec   Loss 0.1753   LearningRate 0.0015   Epoch: 17   Global Step: 292780   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:14:19,407-Speed 3303.70 samples/sec   Loss 0.1626   LearningRate 0.0015   Epoch: 17   Global Step: 292790   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:14:22,515-Speed 3295.47 samples/sec   Loss 0.1766   LearningRate 0.0015   Epoch: 17   Global Step: 292800   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:14:25,616-Speed 3303.66 samples/sec   Loss 0.1749   LearningRate 0.0015   Epoch: 17   Global Step: 292810   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:14:28,688-Speed 3334.41 samples/sec   Loss 0.1746   LearningRate 0.0015   Epoch: 17   Global Step: 292820   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:14:31,752-Speed 3342.71 samples/sec   Loss 0.1574   LearningRate 0.0015   Epoch: 17   Global Step: 292830   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:14:34,830-Speed 3328.05 samples/sec   Loss 0.1730   LearningRate 0.0015   Epoch: 17   Global Step: 292840   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:14:37,911-Speed 3323.75 samples/sec   Loss 0.1822   LearningRate 0.0015   Epoch: 17   Global Step: 292850   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:14:40,980-Speed 3337.37 samples/sec   Loss 0.1596   LearningRate 0.0015   Epoch: 17   Global Step: 292860   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:14:44,034-Speed 3353.40 samples/sec   Loss 0.1641   LearningRate 0.0015   Epoch: 17   Global Step: 292870   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:14:47,110-Speed 3329.89 samples/sec   Loss 0.1714   LearningRate 0.0015   Epoch: 17   Global Step: 292880   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:14:50,175-Speed 3341.88 samples/sec   Loss 0.1718   LearningRate 0.0015   Epoch: 17   Global Step: 292890   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:14:53,255-Speed 3325.42 samples/sec   Loss 0.1814   LearningRate 0.0015   Epoch: 17   Global Step: 292900   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:14:56,320-Speed 3341.38 samples/sec   Loss 0.1711   LearningRate 0.0015   Epoch: 17   Global Step: 292910   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:14:59,429-Speed 3294.79 samples/sec   Loss 0.1753   LearningRate 0.0015   Epoch: 17   Global Step: 292920   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:15:02,585-Speed 3245.77 samples/sec   Loss 0.1574   LearningRate 0.0015   Epoch: 17   Global Step: 292930   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:15:05,709-Speed 3278.13 samples/sec   Loss 0.1698   LearningRate 0.0015   Epoch: 17   Global Step: 292940   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:15:08,764-Speed 3352.94 samples/sec   Loss 0.1665   LearningRate 0.0015   Epoch: 17   Global Step: 292950   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-12 06:15:11,831-Speed 3338.81 samples/sec   Loss 0.1784   LearningRate 0.0015   Epoch: 17   Global Step: 292960   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-12 06:15:14,914-Speed 3322.57 samples/sec   Loss 0.1839   LearningRate 0.0015   Epoch: 17   Global Step: 292970   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-12 06:15:17,981-Speed 3338.99 samples/sec   Loss 0.1557   LearningRate 0.0015   Epoch: 17   Global Step: 292980   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-12 06:15:21,069-Speed 3316.89 samples/sec   Loss 0.1761   LearningRate 0.0015   Epoch: 17   Global Step: 292990   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-12 06:15:24,158-Speed 3315.90 samples/sec   Loss 0.1718   LearningRate 0.0015   Epoch: 17   Global Step: 293000   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-12 06:15:27,228-Speed 3337.20 samples/sec   Loss 0.1714   LearningRate 0.0015   Epoch: 17   Global Step: 293010   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-12 06:15:30,310-Speed 3322.57 samples/sec   Loss 0.1577   LearningRate 0.0015   Epoch: 17   Global Step: 293020   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-12 06:15:33,408-Speed 3306.88 samples/sec   Loss 0.1790   LearningRate 0.0015   Epoch: 17   Global Step: 293030   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-12 06:15:36,487-Speed 3326.39 samples/sec   Loss 0.1610   LearningRate 0.0015   Epoch: 17   Global Step: 293040   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-12 06:15:39,584-Speed 3306.38 samples/sec   Loss 0.1593   LearningRate 0.0015   Epoch: 17   Global Step: 293050   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:15:42,736-Speed 3250.38 samples/sec   Loss 0.1740   LearningRate 0.0015   Epoch: 17   Global Step: 293060   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:15:45,975-Speed 3161.93 samples/sec   Loss 0.1731   LearningRate 0.0015   Epoch: 17   Global Step: 293070   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:15:49,106-Speed 3271.26 samples/sec   Loss 0.1693   LearningRate 0.0015   Epoch: 17   Global Step: 293080   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:15:52,190-Speed 3321.75 samples/sec   Loss 0.1712   LearningRate 0.0015   Epoch: 17   Global Step: 293090   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:15:55,264-Speed 3331.98 samples/sec   Loss 0.1709   LearningRate 0.0015   Epoch: 17   Global Step: 293100   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:15:58,329-Speed 3342.02 samples/sec   Loss 0.1696   LearningRate 0.0015   Epoch: 17   Global Step: 293110   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:16:01,461-Speed 3269.38 samples/sec   Loss 0.1767   LearningRate 0.0015   Epoch: 17   Global Step: 293120   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:16:04,652-Speed 3209.72 samples/sec   Loss 0.1799   LearningRate 0.0015   Epoch: 17   Global Step: 293130   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:16:07,765-Speed 3290.96 samples/sec   Loss 0.1632   LearningRate 0.0015   Epoch: 17   Global Step: 293140   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-12 06:16:10,833-Speed 3338.11 samples/sec   Loss 0.1844   LearningRate 0.0015   Epoch: 17   Global Step: 293150   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-12 06:16:13,902-Speed 3337.44 samples/sec   Loss 0.1695   LearningRate 0.0015   Epoch: 17   Global Step: 293160   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-12 06:16:16,967-Speed 3341.67 samples/sec   Loss 0.1741   LearningRate 0.0015   Epoch: 17   Global Step: 293170   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-12 06:16:20,091-Speed 3278.79 samples/sec   Loss 0.1663   LearningRate 0.0015   Epoch: 17   Global Step: 293180   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-12 06:16:23,171-Speed 3324.99 samples/sec   Loss 0.1731   LearningRate 0.0015   Epoch: 17   Global Step: 293190   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-12 06:16:26,236-Speed 3341.69 samples/sec   Loss 0.1808   LearningRate 0.0015   Epoch: 17   Global Step: 293200   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-12 06:16:29,327-Speed 3314.45 samples/sec   Loss 0.1948   LearningRate 0.0015   Epoch: 17   Global Step: 293210   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-12 06:16:32,427-Speed 3303.29 samples/sec   Loss 0.1873   LearningRate 0.0015   Epoch: 17   Global Step: 293220   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-12 06:16:35,551-Speed 3279.15 samples/sec   Loss 0.1593   LearningRate 0.0015   Epoch: 17   Global Step: 293230   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-12 06:16:38,628-Speed 3328.20 samples/sec   Loss 0.1723   LearningRate 0.0015   Epoch: 17   Global Step: 293240   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:16:41,707-Speed 3326.82 samples/sec   Loss 0.1668   LearningRate 0.0015   Epoch: 17   Global Step: 293250   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:16:44,771-Speed 3342.35 samples/sec   Loss 0.1743   LearningRate 0.0015   Epoch: 17   Global Step: 293260   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:16:47,834-Speed 3344.75 samples/sec   Loss 0.1747   LearningRate 0.0015   Epoch: 17   Global Step: 293270   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-12 06:16:50,948-Speed 3288.89 samples/sec   Loss 0.1664   LearningRate 0.0015   Epoch: 17   Global Step: 293280   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-12 06:16:54,014-Speed 3340.38 samples/sec   Loss 0.1533   LearningRate 0.0015   Epoch: 17   Global Step: 293290   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-12 06:16:57,130-Speed 3287.23 samples/sec   Loss 0.1727   LearningRate 0.0015   Epoch: 17   Global Step: 293300   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-12 06:17:00,206-Speed 3328.95 samples/sec   Loss 0.1693   LearningRate 0.0015   Epoch: 17   Global Step: 293310   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-12 06:17:03,302-Speed 3309.05 samples/sec   Loss 0.1726   LearningRate 0.0015   Epoch: 17   Global Step: 293320   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-12 06:17:06,401-Speed 3304.16 samples/sec   Loss 0.1692   LearningRate 0.0015   Epoch: 17   Global Step: 293330   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-12 06:17:09,482-Speed 3324.77 samples/sec   Loss 0.1856   LearningRate 0.0015   Epoch: 17   Global Step: 293340   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-12 06:17:12,548-Speed 3341.04 samples/sec   Loss 0.1639   LearningRate 0.0015   Epoch: 17   Global Step: 293350   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-12 06:17:15,622-Speed 3331.41 samples/sec   Loss 0.1784   LearningRate 0.0015   Epoch: 17   Global Step: 293360   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-12 06:17:18,690-Speed 3338.61 samples/sec   Loss 0.1701   LearningRate 0.0015   Epoch: 17   Global Step: 293370   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:17:21,744-Speed 3353.77 samples/sec   Loss 0.1756   LearningRate 0.0015   Epoch: 17   Global Step: 293380   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-12 06:17:24,834-Speed 3314.49 samples/sec   Loss 0.1708   LearningRate 0.0015   Epoch: 17   Global Step: 293390   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-12 06:17:27,924-Speed 3315.27 samples/sec   Loss 0.1728   LearningRate 0.0015   Epoch: 17   Global Step: 293400   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-12 06:17:31,003-Speed 3326.05 samples/sec   Loss 0.1771   LearningRate 0.0015   Epoch: 17   Global Step: 293410   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-12 06:17:34,068-Speed 3341.70 samples/sec   Loss 0.1657   LearningRate 0.0015   Epoch: 17   Global Step: 293420   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-12 06:17:37,133-Speed 3341.91 samples/sec   Loss 0.1795   LearningRate 0.0015   Epoch: 17   Global Step: 293430   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-12 06:17:40,219-Speed 3318.89 samples/sec   Loss 0.1684   LearningRate 0.0015   Epoch: 17   Global Step: 293440   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-12 06:17:43,370-Speed 3250.33 samples/sec   Loss 0.1793   LearningRate 0.0015   Epoch: 17   Global Step: 293450   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-12 06:17:46,460-Speed 3315.12 samples/sec   Loss 0.1808   LearningRate 0.0015   Epoch: 17   Global Step: 293460   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-12 06:17:49,560-Speed 3304.33 samples/sec   Loss 0.1849   LearningRate 0.0015   Epoch: 17   Global Step: 293470   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-12 06:17:52,713-Speed 3248.20 samples/sec   Loss 0.1691   LearningRate 0.0015   Epoch: 17   Global Step: 293480   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:17:55,790-Speed 3328.55 samples/sec   Loss 0.1686   LearningRate 0.0015   Epoch: 17   Global Step: 293490   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:17:58,875-Speed 3319.99 samples/sec   Loss 0.1554   LearningRate 0.0015   Epoch: 17   Global Step: 293500   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:18:02,087-Speed 3188.33 samples/sec   Loss 0.1688   LearningRate 0.0015   Epoch: 17   Global Step: 293510   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:18:05,169-Speed 3323.70 samples/sec   Loss 0.1866   LearningRate 0.0015   Epoch: 17   Global Step: 293520   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-12 06:18:08,265-Speed 3309.18 samples/sec   Loss 0.1604   LearningRate 0.0015   Epoch: 17   Global Step: 293530   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-12 06:18:11,399-Speed 3267.23 samples/sec   Loss 0.1822   LearningRate 0.0015   Epoch: 17   Global Step: 293540   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-12 06:18:14,528-Speed 3273.25 samples/sec   Loss 0.1811   LearningRate 0.0015   Epoch: 17   Global Step: 293550   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-12 06:18:17,615-Speed 3318.57 samples/sec   Loss 0.1859   LearningRate 0.0015   Epoch: 17   Global Step: 293560   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-12 06:18:20,718-Speed 3300.68 samples/sec   Loss 0.1805   LearningRate 0.0015   Epoch: 17   Global Step: 293570   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-12 06:18:23,828-Speed 3292.64 samples/sec   Loss 0.1626   LearningRate 0.0015   Epoch: 17   Global Step: 293580   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-12 06:18:26,924-Speed 3309.09 samples/sec   Loss 0.1726   LearningRate 0.0015   Epoch: 17   Global Step: 293590   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-12 06:18:30,005-Speed 3324.55 samples/sec   Loss 0.1737   LearningRate 0.0015   Epoch: 17   Global Step: 293600   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-12 06:18:33,078-Speed 3332.68 samples/sec   Loss 0.1649   LearningRate 0.0015   Epoch: 17   Global Step: 293610   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-12 06:18:36,158-Speed 3326.12 samples/sec   Loss 0.1989   LearningRate 0.0015   Epoch: 17   Global Step: 293620   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:18:39,241-Speed 3321.40 samples/sec   Loss 0.1762   LearningRate 0.0014   Epoch: 17   Global Step: 293630   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:18:42,334-Speed 3311.87 samples/sec   Loss 0.1850   LearningRate 0.0014   Epoch: 17   Global Step: 293640   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:18:45,418-Speed 3321.14 samples/sec   Loss 0.1762   LearningRate 0.0014   Epoch: 17   Global Step: 293650   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:18:48,509-Speed 3313.15 samples/sec   Loss 0.1702   LearningRate 0.0014   Epoch: 17   Global Step: 293660   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:18:51,660-Speed 3250.18 samples/sec   Loss 0.1642   LearningRate 0.0014   Epoch: 17   Global Step: 293670   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:18:54,779-Speed 3284.45 samples/sec   Loss 0.1666   LearningRate 0.0014   Epoch: 17   Global Step: 293680   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:18:57,959-Speed 3220.18 samples/sec   Loss 0.1736   LearningRate 0.0014   Epoch: 17   Global Step: 293690   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:19:01,083-Speed 3279.19 samples/sec   Loss 0.1721   LearningRate 0.0014   Epoch: 17   Global Step: 293700   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:19:04,206-Speed 3279.47 samples/sec   Loss 0.1806   LearningRate 0.0014   Epoch: 17   Global Step: 293710   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-12 06:19:07,322-Speed 3286.93 samples/sec   Loss 0.1700   LearningRate 0.0014   Epoch: 17   Global Step: 293720   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-12 06:19:10,432-Speed 3293.06 samples/sec   Loss 0.1824   LearningRate 0.0014   Epoch: 17   Global Step: 293730   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-12 06:19:13,535-Speed 3301.63 samples/sec   Loss 0.1693   LearningRate 0.0014   Epoch: 17   Global Step: 293740   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-12 06:19:16,645-Speed 3292.55 samples/sec   Loss 0.1704   LearningRate 0.0014   Epoch: 17   Global Step: 293750   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-12 06:19:19,734-Speed 3316.22 samples/sec   Loss 0.1738   LearningRate 0.0014   Epoch: 17   Global Step: 293760   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-12 06:19:22,905-Speed 3229.42 samples/sec   Loss 0.1760   LearningRate 0.0014   Epoch: 17   Global Step: 293770   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-12 06:19:25,978-Speed 3334.02 samples/sec   Loss 0.1721   LearningRate 0.0014   Epoch: 17   Global Step: 293780   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-12 06:19:29,042-Speed 3342.60 samples/sec   Loss 0.1840   LearningRate 0.0014   Epoch: 17   Global Step: 293790   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-12 06:19:32,127-Speed 3320.28 samples/sec   Loss 0.1618   LearningRate 0.0014   Epoch: 17   Global Step: 293800   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-12 06:19:35,393-Speed 3135.19 samples/sec   Loss 0.1691   LearningRate 0.0014   Epoch: 17   Global Step: 293810   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-12 06:19:38,562-Speed 3232.56 samples/sec   Loss 0.1685   LearningRate 0.0014   Epoch: 17   Global Step: 293820   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-12 06:19:41,623-Speed 3345.78 samples/sec   Loss 0.1778   LearningRate 0.0014   Epoch: 17   Global Step: 293830   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-12 06:19:44,694-Speed 3335.12 samples/sec   Loss 0.1609   LearningRate 0.0014   Epoch: 17   Global Step: 293840   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-12 06:19:47,777-Speed 3322.27 samples/sec   Loss 0.1714   LearningRate 0.0014   Epoch: 17   Global Step: 293850   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-12 06:19:50,843-Speed 3341.21 samples/sec   Loss 0.1752   LearningRate 0.0014   Epoch: 17   Global Step: 293860   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-12 06:19:53,910-Speed 3339.57 samples/sec   Loss 0.1845   LearningRate 0.0014   Epoch: 17   Global Step: 293870   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-12 06:19:56,987-Speed 3328.84 samples/sec   Loss 0.1742   LearningRate 0.0014   Epoch: 17   Global Step: 293880   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-12 06:20:00,111-Speed 3278.48 samples/sec   Loss 0.1891   LearningRate 0.0014   Epoch: 17   Global Step: 293890   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-12 06:20:03,175-Speed 3341.76 samples/sec   Loss 0.1743   LearningRate 0.0014   Epoch: 17   Global Step: 293900   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-12 06:20:06,255-Speed 3326.39 samples/sec   Loss 0.1750   LearningRate 0.0014   Epoch: 17   Global Step: 293910   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-12 06:20:09,350-Speed 3308.41 samples/sec   Loss 0.1675   LearningRate 0.0014   Epoch: 17   Global Step: 293920   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-12 06:20:12,419-Speed 3337.91 samples/sec   Loss 0.1762   LearningRate 0.0014   Epoch: 17   Global Step: 293930   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-12 06:20:15,492-Speed 3333.07 samples/sec   Loss 0.1686   LearningRate 0.0014   Epoch: 17   Global Step: 293940   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-12 06:20:18,554-Speed 3345.46 samples/sec   Loss 0.1721   LearningRate 0.0014   Epoch: 17   Global Step: 293950   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-12 06:20:21,636-Speed 3323.00 samples/sec   Loss 0.1721   LearningRate 0.0014   Epoch: 17   Global Step: 293960   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-12 06:20:24,702-Speed 3340.02 samples/sec   Loss 0.1721   LearningRate 0.0014   Epoch: 17   Global Step: 293970   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-12 06:20:27,772-Speed 3336.61 samples/sec   Loss 0.1647   LearningRate 0.0014   Epoch: 17   Global Step: 293980   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:20:30,906-Speed 3268.54 samples/sec   Loss 0.1666   LearningRate 0.0014   Epoch: 17   Global Step: 293990   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:20:33,978-Speed 3333.87 samples/sec   Loss 0.1748   LearningRate 0.0014   Epoch: 17   Global Step: 294000   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:21:17,977-[lfw][294000]XNorm: 20.687133
Training: 2022-04-12 06:21:17,978-[lfw][294000]Accuracy-Flip: 0.99783+-0.00259
Training: 2022-04-12 06:21:17,978-[lfw][294000]Accuracy-Highest: 0.99817
Training: 2022-04-12 06:22:09,010-[cfp_fp][294000]XNorm: 22.257177
Training: 2022-04-12 06:22:09,010-[cfp_fp][294000]Accuracy-Flip: 0.99157+-0.00431
Training: 2022-04-12 06:22:09,011-[cfp_fp][294000]Accuracy-Highest: 0.99200
Training: 2022-04-12 06:22:52,921-[agedb_30][294000]XNorm: 22.570799
Training: 2022-04-12 06:22:52,922-[agedb_30][294000]Accuracy-Flip: 0.98600+-0.00549
Training: 2022-04-12 06:22:52,922-[agedb_30][294000]Accuracy-Highest: 0.98650
Training: 2022-04-12 06:22:56,058-Speed 72.07 samples/sec   Loss 0.1553   LearningRate 0.0014   Epoch: 17   Global Step: 294010   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:22:59,137-Speed 3326.94 samples/sec   Loss 0.1777   LearningRate 0.0014   Epoch: 17   Global Step: 294020   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:23:02,226-Speed 3315.79 samples/sec   Loss 0.1745   LearningRate 0.0014   Epoch: 17   Global Step: 294030   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:23:05,335-Speed 3293.70 samples/sec   Loss 0.1693   LearningRate 0.0014   Epoch: 17   Global Step: 294040   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:23:08,415-Speed 3325.69 samples/sec   Loss 0.1792   LearningRate 0.0014   Epoch: 17   Global Step: 294050   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:23:11,478-Speed 3344.36 samples/sec   Loss 0.1599   LearningRate 0.0014   Epoch: 17   Global Step: 294060   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:23:14,585-Speed 3296.32 samples/sec   Loss 0.1753   LearningRate 0.0014   Epoch: 17   Global Step: 294070   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:23:17,670-Speed 3319.85 samples/sec   Loss 0.1758   LearningRate 0.0014   Epoch: 17   Global Step: 294080   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:23:20,797-Speed 3276.24 samples/sec   Loss 0.1636   LearningRate 0.0014   Epoch: 17   Global Step: 294090   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:23:23,883-Speed 3318.11 samples/sec   Loss 0.1656   LearningRate 0.0014   Epoch: 17   Global Step: 294100   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:23:26,939-Speed 3351.85 samples/sec   Loss 0.1629   LearningRate 0.0014   Epoch: 17   Global Step: 294110   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:23:30,003-Speed 3342.26 samples/sec   Loss 0.1703   LearningRate 0.0014   Epoch: 17   Global Step: 294120   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:23:33,071-Speed 3338.64 samples/sec   Loss 0.1809   LearningRate 0.0014   Epoch: 17   Global Step: 294130   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:23:36,180-Speed 3295.41 samples/sec   Loss 0.1673   LearningRate 0.0014   Epoch: 17   Global Step: 294140   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:23:39,328-Speed 3252.94 samples/sec   Loss 0.1638   LearningRate 0.0014   Epoch: 17   Global Step: 294150   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:23:42,405-Speed 3328.96 samples/sec   Loss 0.1830   LearningRate 0.0014   Epoch: 17   Global Step: 294160   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:23:45,466-Speed 3346.01 samples/sec   Loss 0.1720   LearningRate 0.0014   Epoch: 17   Global Step: 294170   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:23:48,534-Speed 3338.67 samples/sec   Loss 0.1585   LearningRate 0.0014   Epoch: 17   Global Step: 294180   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:23:51,632-Speed 3305.92 samples/sec   Loss 0.1801   LearningRate 0.0014   Epoch: 17   Global Step: 294190   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:23:54,708-Speed 3329.34 samples/sec   Loss 0.1663   LearningRate 0.0014   Epoch: 17   Global Step: 294200   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:23:57,817-Speed 3294.23 samples/sec   Loss 0.1936   LearningRate 0.0014   Epoch: 17   Global Step: 294210   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:24:00,905-Speed 3317.52 samples/sec   Loss 0.1965   LearningRate 0.0014   Epoch: 17   Global Step: 294220   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:24:03,962-Speed 3350.85 samples/sec   Loss 0.1902   LearningRate 0.0014   Epoch: 17   Global Step: 294230   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-12 06:24:07,037-Speed 3330.78 samples/sec   Loss 0.1936   LearningRate 0.0014   Epoch: 17   Global Step: 294240   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-12 06:24:10,107-Speed 3335.44 samples/sec   Loss 0.1812   LearningRate 0.0014   Epoch: 17   Global Step: 294250   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-12 06:24:13,174-Speed 3340.38 samples/sec   Loss 0.1726   LearningRate 0.0014   Epoch: 17   Global Step: 294260   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-12 06:24:16,259-Speed 3319.70 samples/sec   Loss 0.1799   LearningRate 0.0014   Epoch: 17   Global Step: 294270   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-12 06:24:19,339-Speed 3325.15 samples/sec   Loss 0.1624   LearningRate 0.0014   Epoch: 17   Global Step: 294280   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-12 06:24:22,416-Speed 3328.52 samples/sec   Loss 0.1716   LearningRate 0.0014   Epoch: 17   Global Step: 294290   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-12 06:24:25,500-Speed 3320.87 samples/sec   Loss 0.1576   LearningRate 0.0014   Epoch: 17   Global Step: 294300   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-12 06:24:28,711-Speed 3190.10 samples/sec   Loss 0.1732   LearningRate 0.0014   Epoch: 17   Global Step: 294310   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-12 06:24:31,822-Speed 3292.63 samples/sec   Loss 0.1622   LearningRate 0.0014   Epoch: 17   Global Step: 294320   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-12 06:24:34,909-Speed 3317.46 samples/sec   Loss 0.1628   LearningRate 0.0014   Epoch: 17   Global Step: 294330   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:24:37,980-Speed 3335.89 samples/sec   Loss 0.1727   LearningRate 0.0014   Epoch: 17   Global Step: 294340   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:24:41,100-Speed 3282.01 samples/sec   Loss 0.1634   LearningRate 0.0014   Epoch: 17   Global Step: 294350   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:24:44,215-Speed 3287.76 samples/sec   Loss 0.1795   LearningRate 0.0014   Epoch: 17   Global Step: 294360   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:24:47,330-Speed 3288.43 samples/sec   Loss 0.1781   LearningRate 0.0014   Epoch: 17   Global Step: 294370   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:24:50,416-Speed 3318.78 samples/sec   Loss 0.1739   LearningRate 0.0014   Epoch: 17   Global Step: 294380   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:24:53,545-Speed 3274.46 samples/sec   Loss 0.1879   LearningRate 0.0014   Epoch: 17   Global Step: 294390   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:24:56,630-Speed 3319.57 samples/sec   Loss 0.1690   LearningRate 0.0014   Epoch: 17   Global Step: 294400   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:24:59,763-Speed 3269.22 samples/sec   Loss 0.1676   LearningRate 0.0014   Epoch: 17   Global Step: 294410   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:25:02,831-Speed 3339.00 samples/sec   Loss 0.1711   LearningRate 0.0014   Epoch: 17   Global Step: 294420   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:25:05,892-Speed 3345.88 samples/sec   Loss 0.1638   LearningRate 0.0014   Epoch: 17   Global Step: 294430   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:25:08,976-Speed 3320.98 samples/sec   Loss 0.1614   LearningRate 0.0014   Epoch: 17   Global Step: 294440   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:25:12,038-Speed 3344.70 samples/sec   Loss 0.1761   LearningRate 0.0014   Epoch: 17   Global Step: 294450   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:25:15,220-Speed 3219.07 samples/sec   Loss 0.1816   LearningRate 0.0014   Epoch: 17   Global Step: 294460   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:25:18,288-Speed 3338.47 samples/sec   Loss 0.1668   LearningRate 0.0014   Epoch: 17   Global Step: 294470   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:25:21,373-Speed 3320.26 samples/sec   Loss 0.1729   LearningRate 0.0014   Epoch: 17   Global Step: 294480   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:25:24,451-Speed 3327.91 samples/sec   Loss 0.1700   LearningRate 0.0014   Epoch: 17   Global Step: 294490   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-12 06:25:27,529-Speed 3327.20 samples/sec   Loss 0.1680   LearningRate 0.0014   Epoch: 17   Global Step: 294500   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-12 06:25:30,758-Speed 3172.04 samples/sec   Loss 0.1666   LearningRate 0.0014   Epoch: 17   Global Step: 294510   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-12 06:25:33,879-Speed 3281.97 samples/sec   Loss 0.1755   LearningRate 0.0014   Epoch: 17   Global Step: 294520   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-12 06:25:36,958-Speed 3326.37 samples/sec   Loss 0.1863   LearningRate 0.0014   Epoch: 17   Global Step: 294530   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-12 06:25:40,031-Speed 3332.60 samples/sec   Loss 0.1653   LearningRate 0.0014   Epoch: 17   Global Step: 294540   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-12 06:25:43,111-Speed 3325.47 samples/sec   Loss 0.1696   LearningRate 0.0014   Epoch: 17   Global Step: 294550   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-12 06:25:46,303-Speed 3208.53 samples/sec   Loss 0.1542   LearningRate 0.0014   Epoch: 17   Global Step: 294560   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-12 06:25:49,389-Speed 3320.40 samples/sec   Loss 0.1807   LearningRate 0.0014   Epoch: 17   Global Step: 294570   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-12 06:25:52,474-Speed 3319.06 samples/sec   Loss 0.1695   LearningRate 0.0014   Epoch: 17   Global Step: 294580   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-12 06:25:55,554-Speed 3325.31 samples/sec   Loss 0.1730   LearningRate 0.0014   Epoch: 17   Global Step: 294590   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:25:58,669-Speed 3288.37 samples/sec   Loss 0.1899   LearningRate 0.0014   Epoch: 17   Global Step: 294600   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:26:01,762-Speed 3311.91 samples/sec   Loss 0.1647   LearningRate 0.0014   Epoch: 17   Global Step: 294610   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:26:04,892-Speed 3271.30 samples/sec   Loss 0.1757   LearningRate 0.0014   Epoch: 17   Global Step: 294620   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:26:07,965-Speed 3333.47 samples/sec   Loss 0.1731   LearningRate 0.0014   Epoch: 17   Global Step: 294630   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:26:11,079-Speed 3289.14 samples/sec   Loss 0.1654   LearningRate 0.0014   Epoch: 17   Global Step: 294640   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:26:14,167-Speed 3317.23 samples/sec   Loss 0.1700   LearningRate 0.0014   Epoch: 17   Global Step: 294650   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:26:17,337-Speed 3230.84 samples/sec   Loss 0.1753   LearningRate 0.0014   Epoch: 17   Global Step: 294660   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:26:20,403-Speed 3341.39 samples/sec   Loss 0.1761   LearningRate 0.0014   Epoch: 17   Global Step: 294670   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:26:23,467-Speed 3342.07 samples/sec   Loss 0.1591   LearningRate 0.0014   Epoch: 17   Global Step: 294680   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:26:26,520-Speed 3355.30 samples/sec   Loss 0.1745   LearningRate 0.0014   Epoch: 17   Global Step: 294690   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:26:29,591-Speed 3335.39 samples/sec   Loss 0.1653   LearningRate 0.0014   Epoch: 17   Global Step: 294700   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-12 06:26:32,659-Speed 3338.48 samples/sec   Loss 0.1810   LearningRate 0.0014   Epoch: 17   Global Step: 294710   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-12 06:26:35,742-Speed 3321.60 samples/sec   Loss 0.1702   LearningRate 0.0014   Epoch: 17   Global Step: 294720   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-12 06:26:38,828-Speed 3319.60 samples/sec   Loss 0.1637   LearningRate 0.0014   Epoch: 17   Global Step: 294730   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-12 06:26:41,905-Speed 3328.75 samples/sec   Loss 0.1771   LearningRate 0.0014   Epoch: 17   Global Step: 294740   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-12 06:26:44,997-Speed 3311.96 samples/sec   Loss 0.1811   LearningRate 0.0014   Epoch: 17   Global Step: 294750   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-12 06:26:48,171-Speed 3227.21 samples/sec   Loss 0.1659   LearningRate 0.0014   Epoch: 17   Global Step: 294760   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-12 06:26:51,334-Speed 3238.33 samples/sec   Loss 0.1777   LearningRate 0.0014   Epoch: 17   Global Step: 294770   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-12 06:26:54,531-Speed 3203.63 samples/sec   Loss 0.1706   LearningRate 0.0014   Epoch: 17   Global Step: 294780   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-12 06:26:57,773-Speed 3158.62 samples/sec   Loss 0.1749   LearningRate 0.0014   Epoch: 17   Global Step: 294790   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-12 06:27:00,923-Speed 3252.00 samples/sec   Loss 0.1671   LearningRate 0.0014   Epoch: 17   Global Step: 294800   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:27:04,080-Speed 3244.69 samples/sec   Loss 0.1670   LearningRate 0.0014   Epoch: 17   Global Step: 294810   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:27:07,234-Speed 3247.13 samples/sec   Loss 0.1744   LearningRate 0.0014   Epoch: 17   Global Step: 294820   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:27:10,310-Speed 3329.67 samples/sec   Loss 0.1703   LearningRate 0.0014   Epoch: 17   Global Step: 294830   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:27:13,378-Speed 3338.89 samples/sec   Loss 0.1627   LearningRate 0.0014   Epoch: 17   Global Step: 294840   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:27:16,555-Speed 3224.21 samples/sec   Loss 0.1728   LearningRate 0.0014   Epoch: 17   Global Step: 294850   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:27:19,677-Speed 3279.93 samples/sec   Loss 0.1861   LearningRate 0.0014   Epoch: 17   Global Step: 294860   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:27:22,831-Speed 3247.60 samples/sec   Loss 0.1819   LearningRate 0.0014   Epoch: 17   Global Step: 294870   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:27:25,922-Speed 3313.99 samples/sec   Loss 0.1718   LearningRate 0.0014   Epoch: 17   Global Step: 294880   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:27:28,997-Speed 3330.63 samples/sec   Loss 0.1806   LearningRate 0.0014   Epoch: 17   Global Step: 294890   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:27:32,060-Speed 3343.53 samples/sec   Loss 0.1567   LearningRate 0.0014   Epoch: 17   Global Step: 294900   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:27:35,137-Speed 3329.19 samples/sec   Loss 0.1714   LearningRate 0.0014   Epoch: 17   Global Step: 294910   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:27:38,269-Speed 3270.25 samples/sec   Loss 0.1700   LearningRate 0.0014   Epoch: 17   Global Step: 294920   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:27:41,352-Speed 3322.35 samples/sec   Loss 0.1650   LearningRate 0.0014   Epoch: 17   Global Step: 294930   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:27:44,432-Speed 3325.66 samples/sec   Loss 0.1703   LearningRate 0.0014   Epoch: 17   Global Step: 294940   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:27:47,500-Speed 3338.34 samples/sec   Loss 0.1812   LearningRate 0.0014   Epoch: 17   Global Step: 294950   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:27:50,577-Speed 3328.04 samples/sec   Loss 0.1811   LearningRate 0.0014   Epoch: 17   Global Step: 294960   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:27:53,650-Speed 3333.72 samples/sec   Loss 0.1617   LearningRate 0.0014   Epoch: 17   Global Step: 294970   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:27:56,725-Speed 3330.91 samples/sec   Loss 0.1708   LearningRate 0.0014   Epoch: 17   Global Step: 294980   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:27:59,797-Speed 3334.01 samples/sec   Loss 0.1675   LearningRate 0.0014   Epoch: 17   Global Step: 294990   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:28:02,865-Speed 3338.51 samples/sec   Loss 0.1609   LearningRate 0.0014   Epoch: 17   Global Step: 295000   Fp16 Grad Scale: 262144   Required: 4 hours
Training: 2022-04-12 06:28:05,950-Speed 3319.68 samples/sec   Loss 0.1690   LearningRate 0.0014   Epoch: 17   Global Step: 295010   Fp16 Grad Scale: 262144   Required: 4 hours
Training: 2022-04-12 06:28:09,032-Speed 3322.84 samples/sec   Loss 0.1799   LearningRate 0.0014   Epoch: 17   Global Step: 295020   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:28:12,198-Speed 3235.22 samples/sec   Loss 0.1747   LearningRate 0.0014   Epoch: 17   Global Step: 295030   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:28:15,290-Speed 3312.99 samples/sec   Loss 0.1777   LearningRate 0.0013   Epoch: 17   Global Step: 295040   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:28:18,362-Speed 3334.26 samples/sec   Loss 0.1809   LearningRate 0.0013   Epoch: 17   Global Step: 295050   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:28:21,449-Speed 3317.89 samples/sec   Loss 0.1668   LearningRate 0.0013   Epoch: 17   Global Step: 295060   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:28:24,545-Speed 3308.06 samples/sec   Loss 0.1760   LearningRate 0.0013   Epoch: 17   Global Step: 295070   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:28:27,620-Speed 3331.04 samples/sec   Loss 0.1685   LearningRate 0.0013   Epoch: 17   Global Step: 295080   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:28:30,725-Speed 3298.22 samples/sec   Loss 0.1743   LearningRate 0.0013   Epoch: 17   Global Step: 295090   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:28:33,806-Speed 3325.13 samples/sec   Loss 0.1889   LearningRate 0.0013   Epoch: 17   Global Step: 295100   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:28:36,892-Speed 3318.24 samples/sec   Loss 0.1764   LearningRate 0.0013   Epoch: 17   Global Step: 295110   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:28:39,985-Speed 3312.53 samples/sec   Loss 0.1507   LearningRate 0.0013   Epoch: 17   Global Step: 295120   Fp16 Grad Scale: 262144   Required: 4 hours
Training: 2022-04-12 06:28:43,140-Speed 3245.42 samples/sec   Loss 0.1654   LearningRate 0.0013   Epoch: 17   Global Step: 295130   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:28:46,213-Speed 3333.49 samples/sec   Loss 0.1654   LearningRate 0.0013   Epoch: 17   Global Step: 295140   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:28:49,298-Speed 3320.51 samples/sec   Loss 0.1746   LearningRate 0.0013   Epoch: 17   Global Step: 295150   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:28:52,435-Speed 3264.76 samples/sec   Loss 0.1706   LearningRate 0.0013   Epoch: 17   Global Step: 295160   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:28:55,505-Speed 3335.56 samples/sec   Loss 0.1793   LearningRate 0.0013   Epoch: 17   Global Step: 295170   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:28:58,658-Speed 3249.19 samples/sec   Loss 0.1674   LearningRate 0.0013   Epoch: 17   Global Step: 295180   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:29:01,739-Speed 3324.79 samples/sec   Loss 0.1696   LearningRate 0.0013   Epoch: 17   Global Step: 295190   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:29:04,834-Speed 3309.44 samples/sec   Loss 0.1620   LearningRate 0.0013   Epoch: 17   Global Step: 295200   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:29:07,954-Speed 3282.02 samples/sec   Loss 0.1766   LearningRate 0.0013   Epoch: 17   Global Step: 295210   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:29:11,042-Speed 3317.64 samples/sec   Loss 0.1786   LearningRate 0.0013   Epoch: 17   Global Step: 295220   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:29:14,262-Speed 3180.09 samples/sec   Loss 0.1562   LearningRate 0.0013   Epoch: 17   Global Step: 295230   Fp16 Grad Scale: 262144   Required: 4 hours
Training: 2022-04-12 06:29:17,335-Speed 3333.80 samples/sec   Loss 0.1621   LearningRate 0.0013   Epoch: 17   Global Step: 295240   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-12 06:29:20,496-Speed 3239.82 samples/sec   Loss 0.1719   LearningRate 0.0013   Epoch: 17   Global Step: 295250   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-12 06:29:23,725-Speed 3172.42 samples/sec   Loss 0.1766   LearningRate 0.0013   Epoch: 17   Global Step: 295260   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-12 06:29:26,996-Speed 3130.88 samples/sec   Loss 0.1788   LearningRate 0.0013   Epoch: 17   Global Step: 295270   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-12 06:29:30,140-Speed 3258.17 samples/sec   Loss 0.1693   LearningRate 0.0013   Epoch: 17   Global Step: 295280   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-12 06:29:33,341-Speed 3199.27 samples/sec   Loss 0.1710   LearningRate 0.0013   Epoch: 17   Global Step: 295290   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-12 06:29:36,521-Speed 3220.86 samples/sec   Loss 0.1650   LearningRate 0.0013   Epoch: 17   Global Step: 295300   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-12 06:29:39,612-Speed 3313.72 samples/sec   Loss 0.1881   LearningRate 0.0013   Epoch: 17   Global Step: 295310   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-12 06:29:42,706-Speed 3310.64 samples/sec   Loss 0.1760   LearningRate 0.0013   Epoch: 17   Global Step: 295320   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-12 06:29:45,788-Speed 3323.01 samples/sec   Loss 0.1757   LearningRate 0.0013   Epoch: 17   Global Step: 295330   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-12 06:29:48,908-Speed 3282.91 samples/sec   Loss 0.1805   LearningRate 0.0013   Epoch: 17   Global Step: 295340   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:29:52,123-Speed 3186.62 samples/sec   Loss 0.1642   LearningRate 0.0013   Epoch: 17   Global Step: 295350   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:29:55,271-Speed 3253.23 samples/sec   Loss 0.1623   LearningRate 0.0013   Epoch: 17   Global Step: 295360   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:29:58,380-Speed 3294.38 samples/sec   Loss 0.1913   LearningRate 0.0013   Epoch: 17   Global Step: 295370   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:30:01,500-Speed 3282.81 samples/sec   Loss 0.1764   LearningRate 0.0013   Epoch: 17   Global Step: 295380   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:30:04,601-Speed 3302.36 samples/sec   Loss 0.1766   LearningRate 0.0013   Epoch: 17   Global Step: 295390   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:30:07,678-Speed 3328.57 samples/sec   Loss 0.1669   LearningRate 0.0013   Epoch: 17   Global Step: 295400   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:30:10,857-Speed 3222.36 samples/sec   Loss 0.1851   LearningRate 0.0013   Epoch: 17   Global Step: 295410   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:30:13,966-Speed 3294.92 samples/sec   Loss 0.1831   LearningRate 0.0013   Epoch: 17   Global Step: 295420   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:30:17,051-Speed 3320.24 samples/sec   Loss 0.1660   LearningRate 0.0013   Epoch: 17   Global Step: 295430   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:30:20,140-Speed 3315.57 samples/sec   Loss 0.1695   LearningRate 0.0013   Epoch: 17   Global Step: 295440   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:30:23,314-Speed 3226.92 samples/sec   Loss 0.1622   LearningRate 0.0013   Epoch: 17   Global Step: 295450   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:30:26,504-Speed 3210.09 samples/sec   Loss 0.1710   LearningRate 0.0013   Epoch: 17   Global Step: 295460   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:30:29,603-Speed 3305.22 samples/sec   Loss 0.1781   LearningRate 0.0013   Epoch: 17   Global Step: 295470   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:30:32,672-Speed 3337.79 samples/sec   Loss 0.1696   LearningRate 0.0013   Epoch: 17   Global Step: 295480   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:30:35,752-Speed 3325.49 samples/sec   Loss 0.1723   LearningRate 0.0013   Epoch: 17   Global Step: 295490   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:30:38,844-Speed 3312.02 samples/sec   Loss 0.1675   LearningRate 0.0013   Epoch: 17   Global Step: 295500   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:30:41,918-Speed 3331.91 samples/sec   Loss 0.1739   LearningRate 0.0013   Epoch: 17   Global Step: 295510   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:30:45,067-Speed 3253.11 samples/sec   Loss 0.1510   LearningRate 0.0013   Epoch: 17   Global Step: 295520   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:30:48,231-Speed 3237.45 samples/sec   Loss 0.1759   LearningRate 0.0013   Epoch: 17   Global Step: 295530   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:30:51,304-Speed 3332.13 samples/sec   Loss 0.1665   LearningRate 0.0013   Epoch: 17   Global Step: 295540   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:30:54,375-Speed 3334.94 samples/sec   Loss 0.1585   LearningRate 0.0013   Epoch: 17   Global Step: 295550   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:30:57,452-Speed 3328.53 samples/sec   Loss 0.1635   LearningRate 0.0013   Epoch: 17   Global Step: 295560   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:31:00,521-Speed 3338.27 samples/sec   Loss 0.1715   LearningRate 0.0013   Epoch: 17   Global Step: 295570   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:31:03,594-Speed 3332.43 samples/sec   Loss 0.1731   LearningRate 0.0013   Epoch: 17   Global Step: 295580   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:31:06,662-Speed 3337.93 samples/sec   Loss 0.1608   LearningRate 0.0013   Epoch: 17   Global Step: 295590   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:31:09,752-Speed 3315.06 samples/sec   Loss 0.1746   LearningRate 0.0013   Epoch: 17   Global Step: 295600   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:31:12,826-Speed 3332.52 samples/sec   Loss 0.1848   LearningRate 0.0013   Epoch: 17   Global Step: 295610   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:31:15,904-Speed 3327.30 samples/sec   Loss 0.1786   LearningRate 0.0013   Epoch: 17   Global Step: 295620   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:31:19,012-Speed 3295.87 samples/sec   Loss 0.1838   LearningRate 0.0013   Epoch: 17   Global Step: 295630   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:31:22,082-Speed 3335.41 samples/sec   Loss 0.1787   LearningRate 0.0013   Epoch: 17   Global Step: 295640   Fp16 Grad Scale: 262144   Required: 4 hours
Training: 2022-04-12 06:31:25,176-Speed 3310.42 samples/sec   Loss 0.1630   LearningRate 0.0013   Epoch: 17   Global Step: 295650   Fp16 Grad Scale: 262144   Required: 4 hours
Training: 2022-04-12 06:31:28,236-Speed 3347.79 samples/sec   Loss 0.1651   LearningRate 0.0013   Epoch: 17   Global Step: 295660   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:31:31,473-Speed 3164.45 samples/sec   Loss 0.1699   LearningRate 0.0013   Epoch: 17   Global Step: 295670   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:31:34,566-Speed 3311.24 samples/sec   Loss 0.1753   LearningRate 0.0013   Epoch: 17   Global Step: 295680   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:31:37,732-Speed 3235.40 samples/sec   Loss 0.1857   LearningRate 0.0013   Epoch: 17   Global Step: 295690   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:31:40,969-Speed 3163.61 samples/sec   Loss 0.1729   LearningRate 0.0013   Epoch: 17   Global Step: 295700   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:31:44,224-Speed 3146.80 samples/sec   Loss 0.1639   LearningRate 0.0013   Epoch: 17   Global Step: 295710   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:31:47,301-Speed 3329.43 samples/sec   Loss 0.1552   LearningRate 0.0013   Epoch: 17   Global Step: 295720   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:31:50,378-Speed 3327.84 samples/sec   Loss 0.1700   LearningRate 0.0013   Epoch: 17   Global Step: 295730   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:31:53,471-Speed 3312.05 samples/sec   Loss 0.1728   LearningRate 0.0013   Epoch: 17   Global Step: 295740   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:31:56,537-Speed 3340.65 samples/sec   Loss 0.1747   LearningRate 0.0013   Epoch: 17   Global Step: 295750   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:31:59,606-Speed 3336.28 samples/sec   Loss 0.1766   LearningRate 0.0013   Epoch: 17   Global Step: 295760   Fp16 Grad Scale: 262144   Required: 4 hours
Training: 2022-04-12 06:32:02,690-Speed 3321.92 samples/sec   Loss 0.1637   LearningRate 0.0013   Epoch: 17   Global Step: 295770   Fp16 Grad Scale: 262144   Required: 4 hours
Training: 2022-04-12 06:32:05,755-Speed 3342.16 samples/sec   Loss 0.1768   LearningRate 0.0013   Epoch: 17   Global Step: 295780   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:32:08,838-Speed 3321.99 samples/sec   Loss 0.1668   LearningRate 0.0013   Epoch: 17   Global Step: 295790   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:32:11,955-Speed 3285.85 samples/sec   Loss 0.1795   LearningRate 0.0013   Epoch: 17   Global Step: 295800   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:32:15,087-Speed 3269.65 samples/sec   Loss 0.1590   LearningRate 0.0013   Epoch: 17   Global Step: 295810   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:32:18,207-Speed 3283.14 samples/sec   Loss 0.1700   LearningRate 0.0013   Epoch: 17   Global Step: 295820   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:32:21,289-Speed 3323.16 samples/sec   Loss 0.1696   LearningRate 0.0013   Epoch: 17   Global Step: 295830   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:32:24,456-Speed 3233.90 samples/sec   Loss 0.1703   LearningRate 0.0013   Epoch: 17   Global Step: 295840   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:32:27,538-Speed 3323.26 samples/sec   Loss 0.1695   LearningRate 0.0013   Epoch: 17   Global Step: 295850   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:32:30,618-Speed 3325.67 samples/sec   Loss 0.1691   LearningRate 0.0013   Epoch: 17   Global Step: 295860   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:32:33,721-Speed 3301.52 samples/sec   Loss 0.1675   LearningRate 0.0013   Epoch: 17   Global Step: 295870   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:32:36,796-Speed 3329.83 samples/sec   Loss 0.1761   LearningRate 0.0013   Epoch: 17   Global Step: 295880   Fp16 Grad Scale: 262144   Required: 4 hours
Training: 2022-04-12 06:32:39,893-Speed 3307.90 samples/sec   Loss 0.1730   LearningRate 0.0013   Epoch: 17   Global Step: 295890   Fp16 Grad Scale: 262144   Required: 4 hours
Training: 2022-04-12 06:32:42,973-Speed 3324.89 samples/sec   Loss 0.1921   LearningRate 0.0013   Epoch: 17   Global Step: 295900   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:32:46,065-Speed 3312.40 samples/sec   Loss 0.1619   LearningRate 0.0013   Epoch: 17   Global Step: 295910   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:32:49,162-Speed 3307.17 samples/sec   Loss 0.1740   LearningRate 0.0013   Epoch: 17   Global Step: 295920   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:32:52,278-Speed 3286.91 samples/sec   Loss 0.1620   LearningRate 0.0013   Epoch: 17   Global Step: 295930   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:32:55,529-Speed 3151.70 samples/sec   Loss 0.1729   LearningRate 0.0013   Epoch: 17   Global Step: 295940   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:32:58,656-Speed 3275.12 samples/sec   Loss 0.1771   LearningRate 0.0013   Epoch: 17   Global Step: 295950   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:33:01,776-Speed 3282.70 samples/sec   Loss 0.1796   LearningRate 0.0013   Epoch: 17   Global Step: 295960   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:33:04,843-Speed 3339.58 samples/sec   Loss 0.1638   LearningRate 0.0013   Epoch: 17   Global Step: 295970   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:33:07,991-Speed 3253.16 samples/sec   Loss 0.1719   LearningRate 0.0013   Epoch: 17   Global Step: 295980   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:33:11,077-Speed 3319.00 samples/sec   Loss 0.1745   LearningRate 0.0013   Epoch: 17   Global Step: 295990   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:33:14,141-Speed 3342.63 samples/sec   Loss 0.1863   LearningRate 0.0013   Epoch: 17   Global Step: 296000   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:33:57,915-[lfw][296000]XNorm: 20.652361
Training: 2022-04-12 06:33:57,916-[lfw][296000]Accuracy-Flip: 0.99767+-0.00249
Training: 2022-04-12 06:33:57,916-[lfw][296000]Accuracy-Highest: 0.99817
Training: 2022-04-12 06:34:48,661-[cfp_fp][296000]XNorm: 22.501334
Training: 2022-04-12 06:34:48,662-[cfp_fp][296000]Accuracy-Flip: 0.99071+-0.00439
Training: 2022-04-12 06:34:48,662-[cfp_fp][296000]Accuracy-Highest: 0.99200
Training: 2022-04-12 06:35:32,316-[agedb_30][296000]XNorm: 22.638173
Training: 2022-04-12 06:35:32,316-[agedb_30][296000]Accuracy-Flip: 0.98633+-0.00586
Training: 2022-04-12 06:35:32,317-[agedb_30][296000]Accuracy-Highest: 0.98650
Training: 2022-04-12 06:35:35,402-Speed 72.49 samples/sec   Loss 0.1711   LearningRate 0.0013   Epoch: 17   Global Step: 296010   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:35:38,462-Speed 3347.08 samples/sec   Loss 0.1881   LearningRate 0.0013   Epoch: 17   Global Step: 296020   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:35:41,526-Speed 3343.15 samples/sec   Loss 0.1669   LearningRate 0.0013   Epoch: 17   Global Step: 296030   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:35:44,606-Speed 3325.46 samples/sec   Loss 0.1805   LearningRate 0.0013   Epoch: 17   Global Step: 296040   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:35:47,682-Speed 3330.54 samples/sec   Loss 0.1690   LearningRate 0.0013   Epoch: 17   Global Step: 296050   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:35:50,753-Speed 3334.75 samples/sec   Loss 0.1631   LearningRate 0.0013   Epoch: 17   Global Step: 296060   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:35:53,831-Speed 3327.90 samples/sec   Loss 0.1513   LearningRate 0.0013   Epoch: 17   Global Step: 296070   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:35:56,952-Speed 3281.63 samples/sec   Loss 0.1707   LearningRate 0.0013   Epoch: 17   Global Step: 296080   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:36:00,165-Speed 3187.24 samples/sec   Loss 0.1678   LearningRate 0.0013   Epoch: 17   Global Step: 296090   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:36:03,233-Speed 3338.59 samples/sec   Loss 0.1880   LearningRate 0.0013   Epoch: 17   Global Step: 296100   Fp16 Grad Scale: 262144   Required: 4 hours
Training: 2022-04-12 06:36:06,349-Speed 3286.93 samples/sec   Loss 0.1634   LearningRate 0.0013   Epoch: 17   Global Step: 296110   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:36:09,509-Speed 3241.54 samples/sec   Loss 0.1908   LearningRate 0.0013   Epoch: 17   Global Step: 296120   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-12 06:36:12,614-Speed 3298.50 samples/sec   Loss 0.1666   LearningRate 0.0013   Epoch: 17   Global Step: 296130   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-12 06:36:15,755-Speed 3261.20 samples/sec   Loss 0.1679   LearningRate 0.0013   Epoch: 17   Global Step: 296140   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-12 06:36:18,860-Speed 3298.83 samples/sec   Loss 0.1799   LearningRate 0.0013   Epoch: 17   Global Step: 296150   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-12 06:36:21,918-Speed 3349.86 samples/sec   Loss 0.1671   LearningRate 0.0013   Epoch: 17   Global Step: 296160   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-12 06:36:25,051-Speed 3268.90 samples/sec   Loss 0.1722   LearningRate 0.0013   Epoch: 17   Global Step: 296170   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-12 06:36:28,133-Speed 3323.57 samples/sec   Loss 0.1542   LearningRate 0.0013   Epoch: 17   Global Step: 296180   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-12 06:36:31,238-Speed 3298.19 samples/sec   Loss 0.1653   LearningRate 0.0013   Epoch: 17   Global Step: 296190   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-12 06:36:34,328-Speed 3315.20 samples/sec   Loss 0.1675   LearningRate 0.0013   Epoch: 17   Global Step: 296200   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-12 06:36:37,391-Speed 3343.94 samples/sec   Loss 0.1668   LearningRate 0.0013   Epoch: 17   Global Step: 296210   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-12 06:36:40,456-Speed 3341.05 samples/sec   Loss 0.1629   LearningRate 0.0013   Epoch: 17   Global Step: 296220   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:36:43,517-Speed 3346.16 samples/sec   Loss 0.1683   LearningRate 0.0013   Epoch: 17   Global Step: 296230   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:36:46,627-Speed 3293.54 samples/sec   Loss 0.1756   LearningRate 0.0013   Epoch: 17   Global Step: 296240   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:36:49,714-Speed 3318.09 samples/sec   Loss 0.1748   LearningRate 0.0013   Epoch: 17   Global Step: 296250   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-12 06:36:52,776-Speed 3344.77 samples/sec   Loss 0.1798   LearningRate 0.0013   Epoch: 17   Global Step: 296260   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-12 06:36:55,861-Speed 3320.09 samples/sec   Loss 0.1775   LearningRate 0.0013   Epoch: 17   Global Step: 296270   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-12 06:36:58,927-Speed 3341.01 samples/sec   Loss 0.1682   LearningRate 0.0013   Epoch: 17   Global Step: 296280   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-12 06:37:02,063-Speed 3266.53 samples/sec   Loss 0.1970   LearningRate 0.0013   Epoch: 17   Global Step: 296290   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-12 06:37:05,131-Speed 3338.20 samples/sec   Loss 0.1844   LearningRate 0.0013   Epoch: 17   Global Step: 296300   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-12 06:37:08,208-Speed 3327.93 samples/sec   Loss 0.1713   LearningRate 0.0013   Epoch: 17   Global Step: 296310   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-12 06:37:11,296-Speed 3316.64 samples/sec   Loss 0.1787   LearningRate 0.0013   Epoch: 17   Global Step: 296320   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-12 06:37:14,435-Speed 3263.63 samples/sec   Loss 0.1715   LearningRate 0.0013   Epoch: 17   Global Step: 296330   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-12 06:37:17,660-Speed 3175.94 samples/sec   Loss 0.1629   LearningRate 0.0013   Epoch: 17   Global Step: 296340   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-12 06:37:20,748-Speed 3315.80 samples/sec   Loss 0.1751   LearningRate 0.0013   Epoch: 17   Global Step: 296350   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:37:23,817-Speed 3337.61 samples/sec   Loss 0.1739   LearningRate 0.0013   Epoch: 17   Global Step: 296360   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:37:26,883-Speed 3341.12 samples/sec   Loss 0.1604   LearningRate 0.0013   Epoch: 17   Global Step: 296370   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:37:29,945-Speed 3344.43 samples/sec   Loss 0.1636   LearningRate 0.0013   Epoch: 17   Global Step: 296380   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:37:33,070-Speed 3278.31 samples/sec   Loss 0.1862   LearningRate 0.0013   Epoch: 17   Global Step: 296390   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:37:36,147-Speed 3328.23 samples/sec   Loss 0.1734   LearningRate 0.0013   Epoch: 17   Global Step: 296400   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:37:39,238-Speed 3313.87 samples/sec   Loss 0.1669   LearningRate 0.0013   Epoch: 17   Global Step: 296410   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:37:42,316-Speed 3326.66 samples/sec   Loss 0.1691   LearningRate 0.0013   Epoch: 17   Global Step: 296420   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:37:45,391-Speed 3331.41 samples/sec   Loss 0.1674   LearningRate 0.0013   Epoch: 17   Global Step: 296430   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:37:48,458-Speed 3339.22 samples/sec   Loss 0.1685   LearningRate 0.0013   Epoch: 17   Global Step: 296440   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:37:51,515-Speed 3351.36 samples/sec   Loss 0.1720   LearningRate 0.0013   Epoch: 17   Global Step: 296450   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:37:54,581-Speed 3340.58 samples/sec   Loss 0.1659   LearningRate 0.0013   Epoch: 17   Global Step: 296460   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:37:57,748-Speed 3234.10 samples/sec   Loss 0.1778   LearningRate 0.0013   Epoch: 17   Global Step: 296470   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:38:00,846-Speed 3305.47 samples/sec   Loss 0.1721   LearningRate 0.0013   Epoch: 17   Global Step: 296480   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:38:04,078-Speed 3169.09 samples/sec   Loss 0.1762   LearningRate 0.0013   Epoch: 17   Global Step: 296490   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:38:07,304-Speed 3175.44 samples/sec   Loss 0.1591   LearningRate 0.0012   Epoch: 17   Global Step: 296500   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:38:10,449-Speed 3256.00 samples/sec   Loss 0.1734   LearningRate 0.0012   Epoch: 17   Global Step: 296510   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:38:13,622-Speed 3227.77 samples/sec   Loss 0.1741   LearningRate 0.0012   Epoch: 17   Global Step: 296520   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:38:16,777-Speed 3246.61 samples/sec   Loss 0.1834   LearningRate 0.0012   Epoch: 17   Global Step: 296530   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:38:19,862-Speed 3320.95 samples/sec   Loss 0.1598   LearningRate 0.0012   Epoch: 17   Global Step: 296540   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:38:22,977-Speed 3288.09 samples/sec   Loss 0.1753   LearningRate 0.0012   Epoch: 17   Global Step: 296550   Fp16 Grad Scale: 262144   Required: 4 hours
Training: 2022-04-12 06:38:26,048-Speed 3334.50 samples/sec   Loss 0.1674   LearningRate 0.0012   Epoch: 17   Global Step: 296560   Fp16 Grad Scale: 262144   Required: 4 hours
Training: 2022-04-12 06:38:29,105-Speed 3351.06 samples/sec   Loss 0.1643   LearningRate 0.0012   Epoch: 17   Global Step: 296570   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:38:32,209-Speed 3299.65 samples/sec   Loss 0.1715   LearningRate 0.0012   Epoch: 17   Global Step: 296580   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:38:35,281-Speed 3333.87 samples/sec   Loss 0.1731   LearningRate 0.0012   Epoch: 17   Global Step: 296590   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:38:38,415-Speed 3267.57 samples/sec   Loss 0.1689   LearningRate 0.0012   Epoch: 17   Global Step: 296600   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:38:41,586-Speed 3230.62 samples/sec   Loss 0.1634   LearningRate 0.0012   Epoch: 17   Global Step: 296610   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:38:44,706-Speed 3282.71 samples/sec   Loss 0.1784   LearningRate 0.0012   Epoch: 17   Global Step: 296620   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:38:47,864-Speed 3243.51 samples/sec   Loss 0.1836   LearningRate 0.0012   Epoch: 17   Global Step: 296630   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:38:51,051-Speed 3214.36 samples/sec   Loss 0.1738   LearningRate 0.0012   Epoch: 17   Global Step: 296640   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:38:54,222-Speed 3229.52 samples/sec   Loss 0.1746   LearningRate 0.0012   Epoch: 17   Global Step: 296650   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:38:57,422-Speed 3201.26 samples/sec   Loss 0.1840   LearningRate 0.0012   Epoch: 17   Global Step: 296660   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:39:00,552-Speed 3271.81 samples/sec   Loss 0.1633   LearningRate 0.0012   Epoch: 17   Global Step: 296670   Fp16 Grad Scale: 262144   Required: 4 hours
Training: 2022-04-12 06:39:03,614-Speed 3345.36 samples/sec   Loss 0.1654   LearningRate 0.0012   Epoch: 17   Global Step: 296680   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:39:06,679-Speed 3342.09 samples/sec   Loss 0.1669   LearningRate 0.0012   Epoch: 17   Global Step: 296690   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:39:09,844-Speed 3235.75 samples/sec   Loss 0.1833   LearningRate 0.0012   Epoch: 17   Global Step: 296700   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:39:12,931-Speed 3318.13 samples/sec   Loss 0.1890   LearningRate 0.0012   Epoch: 17   Global Step: 296710   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:39:16,001-Speed 3335.90 samples/sec   Loss 0.1680   LearningRate 0.0012   Epoch: 17   Global Step: 296720   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:39:19,131-Speed 3272.50 samples/sec   Loss 0.1708   LearningRate 0.0012   Epoch: 17   Global Step: 296730   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:39:22,261-Speed 3272.26 samples/sec   Loss 0.1781   LearningRate 0.0012   Epoch: 17   Global Step: 296740   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:39:25,337-Speed 3329.82 samples/sec   Loss 0.1754   LearningRate 0.0012   Epoch: 17   Global Step: 296750   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:39:28,484-Speed 3254.14 samples/sec   Loss 0.1659   LearningRate 0.0012   Epoch: 17   Global Step: 296760   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:39:31,654-Speed 3231.24 samples/sec   Loss 0.1776   LearningRate 0.0012   Epoch: 17   Global Step: 296770   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:39:34,828-Speed 3227.24 samples/sec   Loss 0.1791   LearningRate 0.0012   Epoch: 17   Global Step: 296780   Fp16 Grad Scale: 262144   Required: 4 hours
Training: 2022-04-12 06:39:38,013-Speed 3215.64 samples/sec   Loss 0.1622   LearningRate 0.0012   Epoch: 17   Global Step: 296790   Fp16 Grad Scale: 262144   Required: 4 hours
Training: 2022-04-12 06:39:41,171-Speed 3243.19 samples/sec   Loss 0.1805   LearningRate 0.0012   Epoch: 17   Global Step: 296800   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:39:44,269-Speed 3306.80 samples/sec   Loss 0.1748   LearningRate 0.0012   Epoch: 17   Global Step: 296810   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:39:47,355-Speed 3318.64 samples/sec   Loss 0.1709   LearningRate 0.0012   Epoch: 17   Global Step: 296820   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:39:50,463-Speed 3295.48 samples/sec   Loss 0.1703   LearningRate 0.0012   Epoch: 17   Global Step: 296830   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:39:53,548-Speed 3319.47 samples/sec   Loss 0.1733   LearningRate 0.0012   Epoch: 17   Global Step: 296840   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:39:56,649-Speed 3302.66 samples/sec   Loss 0.1538   LearningRate 0.0012   Epoch: 17   Global Step: 296850   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:39:59,792-Speed 3258.92 samples/sec   Loss 0.1765   LearningRate 0.0012   Epoch: 17   Global Step: 296860   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:40:02,920-Speed 3274.33 samples/sec   Loss 0.1648   LearningRate 0.0012   Epoch: 17   Global Step: 296870   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:40:06,101-Speed 3220.50 samples/sec   Loss 0.1707   LearningRate 0.0012   Epoch: 17   Global Step: 296880   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:40:09,167-Speed 3341.07 samples/sec   Loss 0.1820   LearningRate 0.0012   Epoch: 17   Global Step: 296890   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:40:12,221-Speed 3353.12 samples/sec   Loss 0.1839   LearningRate 0.0012   Epoch: 17   Global Step: 296900   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:40:15,349-Speed 3274.58 samples/sec   Loss 0.1836   LearningRate 0.0012   Epoch: 17   Global Step: 296910   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:40:18,496-Speed 3254.70 samples/sec   Loss 0.1791   LearningRate 0.0012   Epoch: 17   Global Step: 296920   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:40:21,560-Speed 3342.44 samples/sec   Loss 0.1699   LearningRate 0.0012   Epoch: 17   Global Step: 296930   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:40:24,622-Speed 3344.47 samples/sec   Loss 0.1634   LearningRate 0.0012   Epoch: 17   Global Step: 296940   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:40:27,690-Speed 3338.34 samples/sec   Loss 0.1807   LearningRate 0.0012   Epoch: 17   Global Step: 296950   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:40:30,766-Speed 3330.52 samples/sec   Loss 0.1685   LearningRate 0.0012   Epoch: 17   Global Step: 296960   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:40:33,844-Speed 3327.14 samples/sec   Loss 0.1831   LearningRate 0.0012   Epoch: 17   Global Step: 296970   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:40:36,934-Speed 3314.94 samples/sec   Loss 0.1659   LearningRate 0.0012   Epoch: 17   Global Step: 296980   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:40:40,026-Speed 3312.74 samples/sec   Loss 0.1740   LearningRate 0.0012   Epoch: 17   Global Step: 296990   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:40:43,091-Speed 3341.37 samples/sec   Loss 0.1695   LearningRate 0.0012   Epoch: 17   Global Step: 297000   Fp16 Grad Scale: 262144   Required: 4 hours
Training: 2022-04-12 06:40:46,181-Speed 3314.77 samples/sec   Loss 0.1728   LearningRate 0.0012   Epoch: 17   Global Step: 297010   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:40:49,240-Speed 3347.95 samples/sec   Loss 0.1733   LearningRate 0.0012   Epoch: 17   Global Step: 297020   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:40:52,324-Speed 3321.28 samples/sec   Loss 0.1798   LearningRate 0.0012   Epoch: 17   Global Step: 297030   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:40:55,387-Speed 3343.53 samples/sec   Loss 0.1855   LearningRate 0.0012   Epoch: 17   Global Step: 297040   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:40:58,460-Speed 3333.31 samples/sec   Loss 0.1694   LearningRate 0.0012   Epoch: 17   Global Step: 297050   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:41:01,520-Speed 3347.57 samples/sec   Loss 0.1746   LearningRate 0.0012   Epoch: 17   Global Step: 297060   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:41:04,690-Speed 3230.36 samples/sec   Loss 0.1652   LearningRate 0.0012   Epoch: 17   Global Step: 297070   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:41:07,819-Speed 3274.38 samples/sec   Loss 0.1734   LearningRate 0.0012   Epoch: 17   Global Step: 297080   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:41:10,906-Speed 3317.46 samples/sec   Loss 0.1716   LearningRate 0.0012   Epoch: 17   Global Step: 297090   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:41:13,970-Speed 3342.50 samples/sec   Loss 0.1688   LearningRate 0.0012   Epoch: 17   Global Step: 297100   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:41:17,039-Speed 3336.77 samples/sec   Loss 0.1676   LearningRate 0.0012   Epoch: 17   Global Step: 297110   Fp16 Grad Scale: 262144   Required: 4 hours
Training: 2022-04-12 06:41:20,104-Speed 3341.82 samples/sec   Loss 0.1680   LearningRate 0.0012   Epoch: 17   Global Step: 297120   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:41:23,170-Speed 3340.53 samples/sec   Loss 0.1697   LearningRate 0.0012   Epoch: 17   Global Step: 297130   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:41:26,264-Speed 3311.19 samples/sec   Loss 0.1803   LearningRate 0.0012   Epoch: 17   Global Step: 297140   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:41:29,383-Speed 3283.65 samples/sec   Loss 0.1697   LearningRate 0.0012   Epoch: 17   Global Step: 297150   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:41:32,488-Speed 3299.01 samples/sec   Loss 0.1761   LearningRate 0.0012   Epoch: 17   Global Step: 297160   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-12 06:41:35,557-Speed 3336.57 samples/sec   Loss 0.1757   LearningRate 0.0012   Epoch: 17   Global Step: 297170   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-12 06:41:38,636-Speed 3326.39 samples/sec   Loss 0.1710   LearningRate 0.0012   Epoch: 17   Global Step: 297180   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-12 06:41:41,715-Speed 3326.37 samples/sec   Loss 0.1822   LearningRate 0.0012   Epoch: 17   Global Step: 297190   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-12 06:41:44,809-Speed 3310.59 samples/sec   Loss 0.1777   LearningRate 0.0012   Epoch: 17   Global Step: 297200   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-12 06:41:47,891-Speed 3322.89 samples/sec   Loss 0.1704   LearningRate 0.0012   Epoch: 17   Global Step: 297210   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-12 06:41:50,969-Speed 3327.83 samples/sec   Loss 0.1822   LearningRate 0.0012   Epoch: 17   Global Step: 297220   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-12 06:41:54,062-Speed 3311.73 samples/sec   Loss 0.1775   LearningRate 0.0012   Epoch: 17   Global Step: 297230   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-12 06:41:57,151-Speed 3315.44 samples/sec   Loss 0.1662   LearningRate 0.0012   Epoch: 17   Global Step: 297240   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-12 06:42:00,234-Speed 3322.66 samples/sec   Loss 0.1552   LearningRate 0.0012   Epoch: 17   Global Step: 297250   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-12 06:42:03,418-Speed 3216.23 samples/sec   Loss 0.1856   LearningRate 0.0012   Epoch: 17   Global Step: 297260   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:42:06,493-Speed 3330.78 samples/sec   Loss 0.1951   LearningRate 0.0012   Epoch: 17   Global Step: 297270   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:42:09,565-Speed 3334.01 samples/sec   Loss 0.1743   LearningRate 0.0012   Epoch: 17   Global Step: 297280   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:42:12,656-Speed 3314.16 samples/sec   Loss 0.1572   LearningRate 0.0012   Epoch: 17   Global Step: 297290   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-12 06:42:15,724-Speed 3338.33 samples/sec   Loss 0.1854   LearningRate 0.0012   Epoch: 17   Global Step: 297300   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-12 06:42:18,806-Speed 3322.75 samples/sec   Loss 0.1624   LearningRate 0.0012   Epoch: 17   Global Step: 297310   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-12 06:42:21,876-Speed 3336.93 samples/sec   Loss 0.1693   LearningRate 0.0012   Epoch: 17   Global Step: 297320   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-12 06:42:24,984-Speed 3295.81 samples/sec   Loss 0.1784   LearningRate 0.0012   Epoch: 17   Global Step: 297330   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-12 06:42:28,052-Speed 3338.86 samples/sec   Loss 0.1877   LearningRate 0.0012   Epoch: 17   Global Step: 297340   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-12 06:42:31,134-Speed 3322.65 samples/sec   Loss 0.1740   LearningRate 0.0012   Epoch: 17   Global Step: 297350   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-12 06:42:34,200-Speed 3341.03 samples/sec   Loss 0.1816   LearningRate 0.0012   Epoch: 17   Global Step: 297360   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-12 06:42:37,303-Speed 3300.88 samples/sec   Loss 0.1720   LearningRate 0.0012   Epoch: 17   Global Step: 297370   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-12 06:42:40,373-Speed 3335.82 samples/sec   Loss 0.1655   LearningRate 0.0012   Epoch: 17   Global Step: 297380   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-12 06:42:43,457-Speed 3320.98 samples/sec   Loss 0.1764   LearningRate 0.0012   Epoch: 17   Global Step: 297390   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:42:46,555-Speed 3305.72 samples/sec   Loss 0.1822   LearningRate 0.0012   Epoch: 17   Global Step: 297400   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:42:49,648-Speed 3312.85 samples/sec   Loss 0.1580   LearningRate 0.0012   Epoch: 17   Global Step: 297410   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:42:52,729-Speed 3323.57 samples/sec   Loss 0.1698   LearningRate 0.0012   Epoch: 17   Global Step: 297420   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:42:55,795-Speed 3341.31 samples/sec   Loss 0.1756   LearningRate 0.0012   Epoch: 17   Global Step: 297430   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:42:58,861-Speed 3340.03 samples/sec   Loss 0.1717   LearningRate 0.0012   Epoch: 17   Global Step: 297440   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:43:01,985-Speed 3278.76 samples/sec   Loss 0.1753   LearningRate 0.0012   Epoch: 17   Global Step: 297450   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:43:05,101-Speed 3286.42 samples/sec   Loss 0.1863   LearningRate 0.0012   Epoch: 17   Global Step: 297460   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:43:08,170-Speed 3337.58 samples/sec   Loss 0.1691   LearningRate 0.0012   Epoch: 17   Global Step: 297470   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:43:11,234-Speed 3342.74 samples/sec   Loss 0.1829   LearningRate 0.0012   Epoch: 17   Global Step: 297480   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:43:14,312-Speed 3327.24 samples/sec   Loss 0.1841   LearningRate 0.0012   Epoch: 17   Global Step: 297490   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:43:17,413-Speed 3303.80 samples/sec   Loss 0.1729   LearningRate 0.0012   Epoch: 17   Global Step: 297500   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:43:20,583-Speed 3230.44 samples/sec   Loss 0.1820   LearningRate 0.0012   Epoch: 17   Global Step: 297510   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:43:23,653-Speed 3336.86 samples/sec   Loss 0.1636   LearningRate 0.0012   Epoch: 17   Global Step: 297520   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:43:26,708-Speed 3351.81 samples/sec   Loss 0.1676   LearningRate 0.0012   Epoch: 17   Global Step: 297530   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-12 06:43:29,775-Speed 3339.90 samples/sec   Loss 0.1811   LearningRate 0.0012   Epoch: 17   Global Step: 297540   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-12 06:43:32,934-Speed 3242.22 samples/sec   Loss 0.1763   LearningRate 0.0012   Epoch: 17   Global Step: 297550   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-12 06:43:36,088-Speed 3247.13 samples/sec   Loss 0.1708   LearningRate 0.0012   Epoch: 17   Global Step: 297560   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-12 06:43:39,173-Speed 3320.61 samples/sec   Loss 0.1610   LearningRate 0.0012   Epoch: 17   Global Step: 297570   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-12 06:43:42,362-Speed 3211.68 samples/sec   Loss 0.1646   LearningRate 0.0012   Epoch: 17   Global Step: 297580   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-12 06:43:45,559-Speed 3203.90 samples/sec   Loss 0.1930   LearningRate 0.0012   Epoch: 17   Global Step: 297590   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-12 06:43:48,665-Speed 3297.95 samples/sec   Loss 0.1875   LearningRate 0.0012   Epoch: 17   Global Step: 297600   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-12 06:43:51,754-Speed 3315.57 samples/sec   Loss 0.1797   LearningRate 0.0012   Epoch: 17   Global Step: 297610   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-12 06:43:54,825-Speed 3334.36 samples/sec   Loss 0.1856   LearningRate 0.0012   Epoch: 17   Global Step: 297620   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-12 06:43:57,892-Speed 3339.74 samples/sec   Loss 0.1796   LearningRate 0.0012   Epoch: 17   Global Step: 297630   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:44:00,980-Speed 3316.79 samples/sec   Loss 0.1757   LearningRate 0.0012   Epoch: 17   Global Step: 297640   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:44:04,047-Speed 3339.44 samples/sec   Loss 0.1713   LearningRate 0.0012   Epoch: 17   Global Step: 297650   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:44:07,134-Speed 3318.18 samples/sec   Loss 0.1880   LearningRate 0.0012   Epoch: 17   Global Step: 297660   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:44:10,219-Speed 3320.38 samples/sec   Loss 0.1680   LearningRate 0.0012   Epoch: 17   Global Step: 297670   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:44:13,350-Speed 3270.62 samples/sec   Loss 0.1750   LearningRate 0.0012   Epoch: 17   Global Step: 297680   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:44:16,429-Speed 3327.31 samples/sec   Loss 0.1700   LearningRate 0.0012   Epoch: 17   Global Step: 297690   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:44:19,529-Speed 3303.13 samples/sec   Loss 0.1760   LearningRate 0.0012   Epoch: 17   Global Step: 297700   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:44:22,745-Speed 3184.69 samples/sec   Loss 0.1752   LearningRate 0.0012   Epoch: 17   Global Step: 297710   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:44:25,811-Speed 3341.36 samples/sec   Loss 0.1517   LearningRate 0.0012   Epoch: 17   Global Step: 297720   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:44:28,871-Speed 3346.33 samples/sec   Loss 0.1575   LearningRate 0.0012   Epoch: 17   Global Step: 297730   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:44:32,119-Speed 3153.54 samples/sec   Loss 0.1743   LearningRate 0.0012   Epoch: 17   Global Step: 297740   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:44:35,329-Speed 3190.69 samples/sec   Loss 0.1707   LearningRate 0.0012   Epoch: 17   Global Step: 297750   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:44:38,520-Speed 3210.43 samples/sec   Loss 0.1726   LearningRate 0.0012   Epoch: 17   Global Step: 297760   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:44:41,624-Speed 3299.73 samples/sec   Loss 0.1803   LearningRate 0.0012   Epoch: 17   Global Step: 297770   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:44:44,711-Speed 3317.34 samples/sec   Loss 0.1847   LearningRate 0.0012   Epoch: 17   Global Step: 297780   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:44:47,804-Speed 3311.82 samples/sec   Loss 0.1666   LearningRate 0.0012   Epoch: 17   Global Step: 297790   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:44:50,910-Speed 3297.52 samples/sec   Loss 0.1792   LearningRate 0.0012   Epoch: 17   Global Step: 297800   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:44:53,980-Speed 3335.66 samples/sec   Loss 0.1661   LearningRate 0.0012   Epoch: 17   Global Step: 297810   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:44:57,051-Speed 3335.34 samples/sec   Loss 0.1820   LearningRate 0.0012   Epoch: 17   Global Step: 297820   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:45:00,149-Speed 3306.80 samples/sec   Loss 0.1892   LearningRate 0.0012   Epoch: 17   Global Step: 297830   Fp16 Grad Scale: 262144   Required: 4 hours
Training: 2022-04-12 06:45:03,329-Speed 3221.36 samples/sec   Loss 0.1598   LearningRate 0.0012   Epoch: 17   Global Step: 297840   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:45:06,542-Speed 3187.37 samples/sec   Loss 0.1697   LearningRate 0.0012   Epoch: 17   Global Step: 297850   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:45:09,612-Speed 3335.43 samples/sec   Loss 0.1765   LearningRate 0.0012   Epoch: 17   Global Step: 297860   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:45:12,676-Speed 3343.54 samples/sec   Loss 0.1725   LearningRate 0.0012   Epoch: 17   Global Step: 297870   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:45:15,748-Speed 3333.67 samples/sec   Loss 0.1826   LearningRate 0.0012   Epoch: 17   Global Step: 297880   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:45:18,822-Speed 3331.83 samples/sec   Loss 0.1748   LearningRate 0.0012   Epoch: 17   Global Step: 297890   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:45:21,894-Speed 3334.18 samples/sec   Loss 0.1787   LearningRate 0.0012   Epoch: 17   Global Step: 297900   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:45:24,978-Speed 3321.21 samples/sec   Loss 0.1607   LearningRate 0.0012   Epoch: 17   Global Step: 297910   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:45:28,088-Speed 3293.67 samples/sec   Loss 0.1759   LearningRate 0.0012   Epoch: 17   Global Step: 297920   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:45:31,181-Speed 3311.78 samples/sec   Loss 0.1686   LearningRate 0.0012   Epoch: 17   Global Step: 297930   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:45:34,299-Speed 3284.48 samples/sec   Loss 0.1740   LearningRate 0.0012   Epoch: 17   Global Step: 297940   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:45:37,422-Speed 3279.60 samples/sec   Loss 0.1748   LearningRate 0.0012   Epoch: 17   Global Step: 297950   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:45:40,491-Speed 3336.95 samples/sec   Loss 0.1882   LearningRate 0.0012   Epoch: 17   Global Step: 297960   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:45:43,592-Speed 3302.93 samples/sec   Loss 0.1843   LearningRate 0.0012   Epoch: 17   Global Step: 297970   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:45:46,664-Speed 3334.70 samples/sec   Loss 0.1869   LearningRate 0.0012   Epoch: 17   Global Step: 297980   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:45:49,760-Speed 3308.28 samples/sec   Loss 0.1855   LearningRate 0.0012   Epoch: 17   Global Step: 297990   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:45:52,861-Speed 3302.67 samples/sec   Loss 0.1871   LearningRate 0.0012   Epoch: 17   Global Step: 298000   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:46:36,399-[lfw][298000]XNorm: 20.692737
Training: 2022-04-12 06:46:36,399-[lfw][298000]Accuracy-Flip: 0.99800+-0.00221
Training: 2022-04-12 06:46:36,400-[lfw][298000]Accuracy-Highest: 0.99817
Training: 2022-04-12 06:47:27,262-[cfp_fp][298000]XNorm: 22.461339
Training: 2022-04-12 06:47:27,263-[cfp_fp][298000]Accuracy-Flip: 0.99157+-0.00386
Training: 2022-04-12 06:47:27,263-[cfp_fp][298000]Accuracy-Highest: 0.99200
Training: 2022-04-12 06:48:11,073-[agedb_30][298000]XNorm: 22.597262
Training: 2022-04-12 06:48:11,074-[agedb_30][298000]Accuracy-Flip: 0.98567+-0.00606
Training: 2022-04-12 06:48:11,074-[agedb_30][298000]Accuracy-Highest: 0.98650
Training: 2022-04-12 06:48:14,152-Speed 72.47 samples/sec   Loss 0.1756   LearningRate 0.0012   Epoch: 17   Global Step: 298010   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:48:17,263-Speed 3291.38 samples/sec   Loss 0.1803   LearningRate 0.0012   Epoch: 17   Global Step: 298020   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:48:20,342-Speed 3326.85 samples/sec   Loss 0.1762   LearningRate 0.0011   Epoch: 17   Global Step: 298030   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:48:23,397-Speed 3353.01 samples/sec   Loss 0.1660   LearningRate 0.0011   Epoch: 17   Global Step: 298040   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:48:26,468-Speed 3335.61 samples/sec   Loss 0.1799   LearningRate 0.0011   Epoch: 17   Global Step: 298050   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:48:29,545-Speed 3328.58 samples/sec   Loss 0.1934   LearningRate 0.0011   Epoch: 17   Global Step: 298060   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:48:32,681-Speed 3265.14 samples/sec   Loss 0.1487   LearningRate 0.0011   Epoch: 17   Global Step: 298070   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:48:35,755-Speed 3331.81 samples/sec   Loss 0.1765   LearningRate 0.0011   Epoch: 17   Global Step: 298080   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:48:38,899-Speed 3258.23 samples/sec   Loss 0.1865   LearningRate 0.0011   Epoch: 17   Global Step: 298090   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:48:41,992-Speed 3310.89 samples/sec   Loss 0.1889   LearningRate 0.0011   Epoch: 17   Global Step: 298100   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:48:45,072-Speed 3325.09 samples/sec   Loss 0.1581   LearningRate 0.0011   Epoch: 17   Global Step: 298110   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:48:48,168-Speed 3308.97 samples/sec   Loss 0.1681   LearningRate 0.0011   Epoch: 17   Global Step: 298120   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:48:51,236-Speed 3338.42 samples/sec   Loss 0.1801   LearningRate 0.0011   Epoch: 17   Global Step: 298130   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:48:54,307-Speed 3334.87 samples/sec   Loss 0.1708   LearningRate 0.0011   Epoch: 17   Global Step: 298140   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-12 06:48:57,371-Speed 3342.75 samples/sec   Loss 0.1700   LearningRate 0.0011   Epoch: 17   Global Step: 298150   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-12 06:49:00,451-Speed 3325.24 samples/sec   Loss 0.1807   LearningRate 0.0011   Epoch: 17   Global Step: 298160   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-12 06:49:03,527-Speed 3330.35 samples/sec   Loss 0.1611   LearningRate 0.0011   Epoch: 17   Global Step: 298170   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-12 06:49:06,594-Speed 3339.90 samples/sec   Loss 0.1750   LearningRate 0.0011   Epoch: 17   Global Step: 298180   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-12 06:49:09,677-Speed 3322.04 samples/sec   Loss 0.1723   LearningRate 0.0011   Epoch: 17   Global Step: 298190   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-12 06:49:12,759-Speed 3322.29 samples/sec   Loss 0.1649   LearningRate 0.0011   Epoch: 17   Global Step: 298200   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-12 06:49:15,829-Speed 3336.32 samples/sec   Loss 0.1656   LearningRate 0.0011   Epoch: 17   Global Step: 298210   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-12 06:49:18,902-Speed 3333.69 samples/sec   Loss 0.1649   LearningRate 0.0011   Epoch: 17   Global Step: 298220   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-12 06:49:21,968-Speed 3340.58 samples/sec   Loss 0.1708   LearningRate 0.0011   Epoch: 17   Global Step: 298230   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-12 06:49:25,078-Speed 3293.49 samples/sec   Loss 0.1760   LearningRate 0.0011   Epoch: 17   Global Step: 298240   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:49:28,259-Speed 3219.91 samples/sec   Loss 0.1775   LearningRate 0.0011   Epoch: 17   Global Step: 298250   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:49:31,322-Speed 3343.92 samples/sec   Loss 0.1629   LearningRate 0.0011   Epoch: 17   Global Step: 298260   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:49:34,402-Speed 3325.00 samples/sec   Loss 0.1634   LearningRate 0.0011   Epoch: 17   Global Step: 298270   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:49:37,503-Speed 3302.82 samples/sec   Loss 0.1849   LearningRate 0.0011   Epoch: 17   Global Step: 298280   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:49:40,598-Speed 3309.79 samples/sec   Loss 0.1665   LearningRate 0.0011   Epoch: 17   Global Step: 298290   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:49:43,708-Speed 3293.26 samples/sec   Loss 0.1705   LearningRate 0.0011   Epoch: 17   Global Step: 298300   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:49:46,769-Speed 3346.84 samples/sec   Loss 0.1619   LearningRate 0.0011   Epoch: 17   Global Step: 298310   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:49:49,836-Speed 3339.22 samples/sec   Loss 0.1730   LearningRate 0.0011   Epoch: 17   Global Step: 298320   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:49:52,899-Speed 3343.98 samples/sec   Loss 0.1822   LearningRate 0.0011   Epoch: 17   Global Step: 298330   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:49:56,009-Speed 3293.54 samples/sec   Loss 0.1880   LearningRate 0.0011   Epoch: 17   Global Step: 298340   Fp16 Grad Scale: 262144   Required: 4 hours
Training: 2022-04-12 06:49:59,075-Speed 3340.08 samples/sec   Loss 0.1768   LearningRate 0.0011   Epoch: 17   Global Step: 298350   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:50:02,146-Speed 3334.99 samples/sec   Loss 0.1849   LearningRate 0.0011   Epoch: 17   Global Step: 298360   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:50:05,280-Speed 3268.41 samples/sec   Loss 0.1820   LearningRate 0.0011   Epoch: 17   Global Step: 298370   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:50:08,419-Speed 3262.83 samples/sec   Loss 0.1870   LearningRate 0.0011   Epoch: 17   Global Step: 298380   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:50:11,477-Speed 3349.97 samples/sec   Loss 0.1814   LearningRate 0.0011   Epoch: 17   Global Step: 298390   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-12 06:50:14,542-Speed 3341.03 samples/sec   Loss 0.1791   LearningRate 0.0011   Epoch: 17   Global Step: 298400   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-12 06:50:17,693-Speed 3250.91 samples/sec   Loss 0.1666   LearningRate 0.0011   Epoch: 17   Global Step: 298410   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-12 06:50:20,757-Speed 3342.21 samples/sec   Loss 0.1733   LearningRate 0.0011   Epoch: 17   Global Step: 298420   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-12 06:50:23,821-Speed 3343.16 samples/sec   Loss 0.1701   LearningRate 0.0011   Epoch: 17   Global Step: 298430   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-12 06:50:26,931-Speed 3293.18 samples/sec   Loss 0.1765   LearningRate 0.0011   Epoch: 17   Global Step: 298440   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-12 06:50:30,018-Speed 3318.06 samples/sec   Loss 0.1730   LearningRate 0.0011   Epoch: 17   Global Step: 298450   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-12 06:50:33,156-Speed 3263.97 samples/sec   Loss 0.1687   LearningRate 0.0011   Epoch: 17   Global Step: 298460   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-12 06:50:36,270-Speed 3289.64 samples/sec   Loss 0.1569   LearningRate 0.0011   Epoch: 17   Global Step: 298470   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-12 06:50:39,343-Speed 3332.62 samples/sec   Loss 0.1725   LearningRate 0.0011   Epoch: 17   Global Step: 298480   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-12 06:50:42,420-Speed 3329.47 samples/sec   Loss 0.1749   LearningRate 0.0011   Epoch: 17   Global Step: 298490   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:50:45,582-Speed 3238.69 samples/sec   Loss 0.1840   LearningRate 0.0011   Epoch: 17   Global Step: 298500   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:50:48,674-Speed 3313.07 samples/sec   Loss 0.1645   LearningRate 0.0011   Epoch: 17   Global Step: 298510   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:50:51,765-Speed 3312.90 samples/sec   Loss 0.1772   LearningRate 0.0011   Epoch: 17   Global Step: 298520   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:50:54,838-Speed 3333.95 samples/sec   Loss 0.1765   LearningRate 0.0011   Epoch: 17   Global Step: 298530   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:50:57,926-Speed 3315.74 samples/sec   Loss 0.1636   LearningRate 0.0011   Epoch: 17   Global Step: 298540   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:51:01,007-Speed 3324.67 samples/sec   Loss 0.1692   LearningRate 0.0011   Epoch: 17   Global Step: 298550   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:51:04,078-Speed 3335.75 samples/sec   Loss 0.1748   LearningRate 0.0011   Epoch: 17   Global Step: 298560   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:51:07,137-Speed 3348.60 samples/sec   Loss 0.1648   LearningRate 0.0011   Epoch: 17   Global Step: 298570   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:51:10,274-Speed 3264.82 samples/sec   Loss 0.1649   LearningRate 0.0011   Epoch: 17   Global Step: 298580   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:51:13,342-Speed 3337.83 samples/sec   Loss 0.1732   LearningRate 0.0011   Epoch: 17   Global Step: 298590   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:51:16,407-Speed 3342.19 samples/sec   Loss 0.1668   LearningRate 0.0011   Epoch: 17   Global Step: 298600   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:51:19,494-Speed 3317.45 samples/sec   Loss 0.1806   LearningRate 0.0011   Epoch: 17   Global Step: 298610   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:51:22,567-Speed 3333.36 samples/sec   Loss 0.1884   LearningRate 0.0011   Epoch: 17   Global Step: 298620   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:51:25,655-Speed 3317.00 samples/sec   Loss 0.1777   LearningRate 0.0011   Epoch: 17   Global Step: 298630   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:51:28,729-Speed 3331.56 samples/sec   Loss 0.1865   LearningRate 0.0011   Epoch: 17   Global Step: 298640   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:51:31,784-Speed 3353.11 samples/sec   Loss 0.1667   LearningRate 0.0011   Epoch: 17   Global Step: 298650   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:51:34,853-Speed 3336.63 samples/sec   Loss 0.1592   LearningRate 0.0011   Epoch: 17   Global Step: 298660   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:51:37,907-Speed 3354.01 samples/sec   Loss 0.1709   LearningRate 0.0011   Epoch: 17   Global Step: 298670   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-12 06:51:40,981-Speed 3331.91 samples/sec   Loss 0.1717   LearningRate 0.0011   Epoch: 17   Global Step: 298680   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-12 06:51:44,115-Speed 3268.40 samples/sec   Loss 0.1790   LearningRate 0.0011   Epoch: 17   Global Step: 298690   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-12 06:51:47,186-Speed 3335.26 samples/sec   Loss 0.1813   LearningRate 0.0011   Epoch: 17   Global Step: 298700   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-12 06:51:50,249-Speed 3343.07 samples/sec   Loss 0.1568   LearningRate 0.0011   Epoch: 17   Global Step: 298710   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-12 06:51:53,326-Speed 3329.17 samples/sec   Loss 0.1688   LearningRate 0.0011   Epoch: 17   Global Step: 298720   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-12 06:51:56,382-Speed 3351.96 samples/sec   Loss 0.2007   LearningRate 0.0011   Epoch: 17   Global Step: 298730   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-12 06:51:59,439-Speed 3350.54 samples/sec   Loss 0.1637   LearningRate 0.0011   Epoch: 17   Global Step: 298740   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-12 06:52:02,495-Speed 3350.89 samples/sec   Loss 0.1707   LearningRate 0.0011   Epoch: 17   Global Step: 298750   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-12 06:52:05,642-Speed 3254.90 samples/sec   Loss 0.1821   LearningRate 0.0011   Epoch: 17   Global Step: 298760   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-12 06:52:08,724-Speed 3323.12 samples/sec   Loss 0.1813   LearningRate 0.0011   Epoch: 17   Global Step: 298770   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:52:11,803-Speed 3326.21 samples/sec   Loss 0.1653   LearningRate 0.0011   Epoch: 17   Global Step: 298780   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:52:14,881-Speed 3328.15 samples/sec   Loss 0.1772   LearningRate 0.0011   Epoch: 17   Global Step: 298790   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:52:17,971-Speed 3315.07 samples/sec   Loss 0.1805   LearningRate 0.0011   Epoch: 17   Global Step: 298800   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:52:21,068-Speed 3306.29 samples/sec   Loss 0.1855   LearningRate 0.0011   Epoch: 17   Global Step: 298810   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:52:24,190-Speed 3281.23 samples/sec   Loss 0.1745   LearningRate 0.0011   Epoch: 17   Global Step: 298820   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:52:27,331-Speed 3260.69 samples/sec   Loss 0.1695   LearningRate 0.0011   Epoch: 17   Global Step: 298830   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:52:30,435-Speed 3300.01 samples/sec   Loss 0.1592   LearningRate 0.0011   Epoch: 17   Global Step: 298840   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:52:33,568-Speed 3269.40 samples/sec   Loss 0.1884   LearningRate 0.0011   Epoch: 17   Global Step: 298850   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:52:36,674-Speed 3297.84 samples/sec   Loss 0.1637   LearningRate 0.0011   Epoch: 17   Global Step: 298860   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:52:39,730-Speed 3350.60 samples/sec   Loss 0.1706   LearningRate 0.0011   Epoch: 17   Global Step: 298870   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:52:42,802-Speed 3334.69 samples/sec   Loss 0.1753   LearningRate 0.0011   Epoch: 17   Global Step: 298880   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:52:45,983-Speed 3220.02 samples/sec   Loss 0.1658   LearningRate 0.0011   Epoch: 17   Global Step: 298890   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:52:49,057-Speed 3331.39 samples/sec   Loss 0.1756   LearningRate 0.0011   Epoch: 17   Global Step: 298900   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:52:52,151-Speed 3310.47 samples/sec   Loss 0.1684   LearningRate 0.0011   Epoch: 17   Global Step: 298910   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:52:55,218-Speed 3340.08 samples/sec   Loss 0.1765   LearningRate 0.0011   Epoch: 17   Global Step: 298920   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:52:58,277-Speed 3347.73 samples/sec   Loss 0.1903   LearningRate 0.0011   Epoch: 17   Global Step: 298930   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:53:01,367-Speed 3315.47 samples/sec   Loss 0.1801   LearningRate 0.0011   Epoch: 17   Global Step: 298940   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:53:04,505-Speed 3263.87 samples/sec   Loss 0.1681   LearningRate 0.0011   Epoch: 17   Global Step: 298950   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:53:07,651-Speed 3255.11 samples/sec   Loss 0.1788   LearningRate 0.0011   Epoch: 17   Global Step: 298960   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:53:10,770-Speed 3283.84 samples/sec   Loss 0.1714   LearningRate 0.0011   Epoch: 17   Global Step: 298970   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:53:13,864-Speed 3310.97 samples/sec   Loss 0.1765   LearningRate 0.0011   Epoch: 17   Global Step: 298980   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-12 06:53:16,973-Speed 3294.39 samples/sec   Loss 0.1870   LearningRate 0.0011   Epoch: 17   Global Step: 298990   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-12 06:53:20,063-Speed 3314.92 samples/sec   Loss 0.1727   LearningRate 0.0011   Epoch: 17   Global Step: 299000   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-12 06:53:23,242-Speed 3221.37 samples/sec   Loss 0.1852   LearningRate 0.0011   Epoch: 17   Global Step: 299010   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-12 06:53:26,321-Speed 3327.34 samples/sec   Loss 0.1766   LearningRate 0.0011   Epoch: 17   Global Step: 299020   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-12 06:53:29,394-Speed 3332.89 samples/sec   Loss 0.1833   LearningRate 0.0011   Epoch: 17   Global Step: 299030   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-12 06:53:32,460-Speed 3339.87 samples/sec   Loss 0.1754   LearningRate 0.0011   Epoch: 17   Global Step: 299040   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-12 06:53:35,519-Speed 3348.74 samples/sec   Loss 0.1809   LearningRate 0.0011   Epoch: 17   Global Step: 299050   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-12 06:53:38,593-Speed 3331.09 samples/sec   Loss 0.1648   LearningRate 0.0011   Epoch: 17   Global Step: 299060   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-12 06:53:41,679-Speed 3319.07 samples/sec   Loss 0.1735   LearningRate 0.0011   Epoch: 17   Global Step: 299070   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-12 06:53:44,765-Speed 3319.18 samples/sec   Loss 0.1696   LearningRate 0.0011   Epoch: 17   Global Step: 299080   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:53:47,828-Speed 3344.49 samples/sec   Loss 0.1591   LearningRate 0.0011   Epoch: 17   Global Step: 299090   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:53:50,899-Speed 3335.41 samples/sec   Loss 0.1818   LearningRate 0.0011   Epoch: 17   Global Step: 299100   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:53:53,968-Speed 3336.73 samples/sec   Loss 0.1684   LearningRate 0.0011   Epoch: 17   Global Step: 299110   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:53:57,110-Speed 3259.66 samples/sec   Loss 0.1626   LearningRate 0.0011   Epoch: 17   Global Step: 299120   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:54:00,254-Speed 3257.67 samples/sec   Loss 0.1626   LearningRate 0.0011   Epoch: 17   Global Step: 299130   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-12 06:54:03,333-Speed 3326.77 samples/sec   Loss 0.1770   LearningRate 0.0011   Epoch: 17   Global Step: 299140   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-12 06:54:06,448-Speed 3287.74 samples/sec   Loss 0.1669   LearningRate 0.0011   Epoch: 17   Global Step: 299150   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-12 06:54:09,525-Speed 3328.60 samples/sec   Loss 0.1581   LearningRate 0.0011   Epoch: 17   Global Step: 299160   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-12 06:54:12,597-Speed 3335.34 samples/sec   Loss 0.1647   LearningRate 0.0011   Epoch: 17   Global Step: 299170   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-12 06:54:15,675-Speed 3327.34 samples/sec   Loss 0.1719   LearningRate 0.0011   Epoch: 17   Global Step: 299180   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-12 06:54:18,751-Speed 3330.18 samples/sec   Loss 0.1698   LearningRate 0.0011   Epoch: 17   Global Step: 299190   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-12 06:54:21,829-Speed 3327.35 samples/sec   Loss 0.1680   LearningRate 0.0011   Epoch: 17   Global Step: 299200   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-12 06:54:24,896-Speed 3339.74 samples/sec   Loss 0.1636   LearningRate 0.0011   Epoch: 17   Global Step: 299210   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-12 06:54:27,984-Speed 3315.91 samples/sec   Loss 0.1688   LearningRate 0.0011   Epoch: 17   Global Step: 299220   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-12 06:54:31,070-Speed 3319.19 samples/sec   Loss 0.1608   LearningRate 0.0011   Epoch: 17   Global Step: 299230   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:54:34,151-Speed 3324.70 samples/sec   Loss 0.1792   LearningRate 0.0011   Epoch: 17   Global Step: 299240   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:54:37,247-Speed 3307.63 samples/sec   Loss 0.1747   LearningRate 0.0011   Epoch: 17   Global Step: 299250   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:54:40,344-Speed 3307.53 samples/sec   Loss 0.1640   LearningRate 0.0011   Epoch: 17   Global Step: 299260   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:54:43,426-Speed 3323.19 samples/sec   Loss 0.1613   LearningRate 0.0011   Epoch: 17   Global Step: 299270   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:54:46,547-Speed 3282.12 samples/sec   Loss 0.1625   LearningRate 0.0011   Epoch: 17   Global Step: 299280   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:54:49,661-Speed 3288.43 samples/sec   Loss 0.1740   LearningRate 0.0011   Epoch: 17   Global Step: 299290   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:54:52,750-Speed 3316.30 samples/sec   Loss 0.1677   LearningRate 0.0011   Epoch: 17   Global Step: 299300   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:54:55,894-Speed 3257.66 samples/sec   Loss 0.1797   LearningRate 0.0011   Epoch: 17   Global Step: 299310   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:54:59,032-Speed 3263.51 samples/sec   Loss 0.1643   LearningRate 0.0011   Epoch: 17   Global Step: 299320   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:55:02,090-Speed 3350.01 samples/sec   Loss 0.1739   LearningRate 0.0011   Epoch: 17   Global Step: 299330   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:55:05,159-Speed 3337.03 samples/sec   Loss 0.1740   LearningRate 0.0011   Epoch: 17   Global Step: 299340   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:55:08,262-Speed 3300.78 samples/sec   Loss 0.1654   LearningRate 0.0011   Epoch: 17   Global Step: 299350   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:55:11,350-Speed 3317.56 samples/sec   Loss 0.1624   LearningRate 0.0011   Epoch: 17   Global Step: 299360   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:55:14,422-Speed 3333.39 samples/sec   Loss 0.1861   LearningRate 0.0011   Epoch: 17   Global Step: 299370   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:55:17,539-Speed 3285.97 samples/sec   Loss 0.1608   LearningRate 0.0011   Epoch: 17   Global Step: 299380   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:55:20,630-Speed 3313.84 samples/sec   Loss 0.1887   LearningRate 0.0011   Epoch: 17   Global Step: 299390   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:55:23,726-Speed 3308.49 samples/sec   Loss 0.1642   LearningRate 0.0011   Epoch: 17   Global Step: 299400   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-12 06:55:26,829-Speed 3300.13 samples/sec   Loss 0.1866   LearningRate 0.0011   Epoch: 17   Global Step: 299410   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-12 06:55:29,896-Speed 3339.62 samples/sec   Loss 0.1625   LearningRate 0.0011   Epoch: 17   Global Step: 299420   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-12 06:55:33,001-Speed 3299.29 samples/sec   Loss 0.1665   LearningRate 0.0011   Epoch: 17   Global Step: 299430   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-12 06:55:36,106-Speed 3298.21 samples/sec   Loss 0.1856   LearningRate 0.0011   Epoch: 17   Global Step: 299440   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-12 06:55:39,216-Speed 3293.81 samples/sec   Loss 0.1742   LearningRate 0.0011   Epoch: 17   Global Step: 299450   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-12 06:55:42,305-Speed 3316.07 samples/sec   Loss 0.1680   LearningRate 0.0011   Epoch: 17   Global Step: 299460   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-12 06:55:45,390-Speed 3319.90 samples/sec   Loss 0.1912   LearningRate 0.0011   Epoch: 17   Global Step: 299470   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-12 06:55:48,527-Speed 3265.02 samples/sec   Loss 0.1940   LearningRate 0.0011   Epoch: 17   Global Step: 299480   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-12 06:55:51,673-Speed 3255.51 samples/sec   Loss 0.1745   LearningRate 0.0011   Epoch: 17   Global Step: 299490   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-12 06:55:54,845-Speed 3228.58 samples/sec   Loss 0.1857   LearningRate 0.0011   Epoch: 17   Global Step: 299500   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:55:57,947-Speed 3301.69 samples/sec   Loss 0.1782   LearningRate 0.0011   Epoch: 17   Global Step: 299510   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:56:01,019-Speed 3334.80 samples/sec   Loss 0.1852   LearningRate 0.0011   Epoch: 17   Global Step: 299520   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:56:04,090-Speed 3335.28 samples/sec   Loss 0.1683   LearningRate 0.0011   Epoch: 17   Global Step: 299530   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:56:07,181-Speed 3312.94 samples/sec   Loss 0.1657   LearningRate 0.0011   Epoch: 17   Global Step: 299540   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:56:10,246-Speed 3342.32 samples/sec   Loss 0.1798   LearningRate 0.0011   Epoch: 17   Global Step: 299550   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:56:13,337-Speed 3313.71 samples/sec   Loss 0.1673   LearningRate 0.0011   Epoch: 17   Global Step: 299560   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:56:16,409-Speed 3333.40 samples/sec   Loss 0.1658   LearningRate 0.0011   Epoch: 17   Global Step: 299570   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:56:19,526-Speed 3285.48 samples/sec   Loss 0.1748   LearningRate 0.0011   Epoch: 17   Global Step: 299580   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:56:22,628-Speed 3302.13 samples/sec   Loss 0.1613   LearningRate 0.0011   Epoch: 17   Global Step: 299590   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:56:25,709-Speed 3324.63 samples/sec   Loss 0.1668   LearningRate 0.0011   Epoch: 17   Global Step: 299600   Fp16 Grad Scale: 262144   Required: 4 hours
Training: 2022-04-12 06:56:28,778-Speed 3338.12 samples/sec   Loss 0.1756   LearningRate 0.0011   Epoch: 17   Global Step: 299610   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:56:31,867-Speed 3315.29 samples/sec   Loss 0.1774   LearningRate 0.0010   Epoch: 17   Global Step: 299620   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:56:34,945-Speed 3327.97 samples/sec   Loss 0.1759   LearningRate 0.0010   Epoch: 17   Global Step: 299630   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:56:38,022-Speed 3328.09 samples/sec   Loss 0.1674   LearningRate 0.0010   Epoch: 17   Global Step: 299640   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:56:41,097-Speed 3331.20 samples/sec   Loss 0.1706   LearningRate 0.0010   Epoch: 17   Global Step: 299650   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:56:44,218-Speed 3281.94 samples/sec   Loss 0.1713   LearningRate 0.0010   Epoch: 17   Global Step: 299660   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:56:47,310-Speed 3312.75 samples/sec   Loss 0.1793   LearningRate 0.0010   Epoch: 17   Global Step: 299670   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:56:50,371-Speed 3345.10 samples/sec   Loss 0.1823   LearningRate 0.0010   Epoch: 17   Global Step: 299680   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-12 06:56:53,459-Speed 3316.73 samples/sec   Loss 0.1687   LearningRate 0.0010   Epoch: 17   Global Step: 299690   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-12 06:56:56,540-Speed 3325.39 samples/sec   Loss 0.1695   LearningRate 0.0010   Epoch: 17   Global Step: 299700   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-12 06:56:59,714-Speed 3226.46 samples/sec   Loss 0.1757   LearningRate 0.0010   Epoch: 17   Global Step: 299710   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-12 06:57:02,811-Speed 3307.49 samples/sec   Loss 0.1741   LearningRate 0.0010   Epoch: 17   Global Step: 299720   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-12 06:57:05,985-Speed 3227.03 samples/sec   Loss 0.1706   LearningRate 0.0010   Epoch: 17   Global Step: 299730   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-12 06:57:09,080-Speed 3308.73 samples/sec   Loss 0.1624   LearningRate 0.0010   Epoch: 17   Global Step: 299740   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-12 06:57:12,161-Speed 3325.26 samples/sec   Loss 0.1667   LearningRate 0.0010   Epoch: 17   Global Step: 299750   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-12 06:57:15,233-Speed 3333.82 samples/sec   Loss 0.1703   LearningRate 0.0010   Epoch: 17   Global Step: 299760   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-12 06:57:18,406-Speed 3228.01 samples/sec   Loss 0.1685   LearningRate 0.0010   Epoch: 17   Global Step: 299770   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-12 06:57:21,572-Speed 3234.73 samples/sec   Loss 0.1654   LearningRate 0.0010   Epoch: 17   Global Step: 299780   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:57:24,659-Speed 3318.44 samples/sec   Loss 0.1753   LearningRate 0.0010   Epoch: 17   Global Step: 299790   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:57:27,833-Speed 3226.40 samples/sec   Loss 0.1520   LearningRate 0.0010   Epoch: 17   Global Step: 299800   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:57:31,037-Speed 3196.37 samples/sec   Loss 0.1713   LearningRate 0.0010   Epoch: 17   Global Step: 299810   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:57:34,126-Speed 3315.87 samples/sec   Loss 0.1712   LearningRate 0.0010   Epoch: 17   Global Step: 299820   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:57:37,304-Speed 3223.49 samples/sec   Loss 0.1714   LearningRate 0.0010   Epoch: 17   Global Step: 299830   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:57:40,401-Speed 3307.19 samples/sec   Loss 0.1722   LearningRate 0.0010   Epoch: 17   Global Step: 299840   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:57:43,590-Speed 3211.28 samples/sec   Loss 0.1763   LearningRate 0.0010   Epoch: 17   Global Step: 299850   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:57:46,803-Speed 3187.61 samples/sec   Loss 0.1735   LearningRate 0.0010   Epoch: 17   Global Step: 299860   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:57:49,946-Speed 3259.54 samples/sec   Loss 0.1699   LearningRate 0.0010   Epoch: 17   Global Step: 299870   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:57:53,121-Speed 3225.58 samples/sec   Loss 0.1788   LearningRate 0.0010   Epoch: 17   Global Step: 299880   Fp16 Grad Scale: 262144   Required: 4 hours
Training: 2022-04-12 06:57:56,201-Speed 3325.78 samples/sec   Loss 0.1681   LearningRate 0.0010   Epoch: 17   Global Step: 299890   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:57:59,296-Speed 3308.27 samples/sec   Loss 0.1823   LearningRate 0.0010   Epoch: 17   Global Step: 299900   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-12 06:58:02,373-Speed 3329.61 samples/sec   Loss 0.1834   LearningRate 0.0010   Epoch: 17   Global Step: 299910   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-12 06:58:05,585-Speed 3187.90 samples/sec   Loss 0.1775   LearningRate 0.0010   Epoch: 17   Global Step: 299920   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-12 06:58:08,754-Speed 3232.00 samples/sec   Loss 0.1692   LearningRate 0.0010   Epoch: 17   Global Step: 299930   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-12 06:58:11,854-Speed 3305.09 samples/sec   Loss 0.1705   LearningRate 0.0010   Epoch: 17   Global Step: 299940   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-12 06:58:15,009-Speed 3246.16 samples/sec   Loss 0.1695   LearningRate 0.0010   Epoch: 17   Global Step: 299950   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-12 06:58:18,139-Speed 3271.96 samples/sec   Loss 0.1701   LearningRate 0.0010   Epoch: 17   Global Step: 299960   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-12 06:58:21,238-Speed 3305.01 samples/sec   Loss 0.1683   LearningRate 0.0010   Epoch: 17   Global Step: 299970   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-12 06:58:24,342-Speed 3300.35 samples/sec   Loss 0.1701   LearningRate 0.0010   Epoch: 17   Global Step: 299980   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-12 06:58:27,429-Speed 3317.84 samples/sec   Loss 0.1636   LearningRate 0.0010   Epoch: 17   Global Step: 299990   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-12 06:58:30,537-Speed 3295.39 samples/sec   Loss 0.1676   LearningRate 0.0010   Epoch: 17   Global Step: 300000   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 06:59:14,661-[lfw][300000]XNorm: 20.486338
Training: 2022-04-12 06:59:14,661-[lfw][300000]Accuracy-Flip: 0.99783+-0.00259
Training: 2022-04-12 06:59:14,662-[lfw][300000]Accuracy-Highest: 0.99817
Training: 2022-04-12 07:00:05,802-[cfp_fp][300000]XNorm: 22.440386
Training: 2022-04-12 07:00:05,803-[cfp_fp][300000]Accuracy-Flip: 0.99143+-0.00404
Training: 2022-04-12 07:00:05,803-[cfp_fp][300000]Accuracy-Highest: 0.99200
Training: 2022-04-12 07:00:50,092-[agedb_30][300000]XNorm: 22.711142
Training: 2022-04-12 07:00:50,092-[agedb_30][300000]Accuracy-Flip: 0.98533+-0.00632
Training: 2022-04-12 07:00:50,093-[agedb_30][300000]Accuracy-Highest: 0.98650
Training: 2022-04-12 07:00:53,160-Speed 71.80 samples/sec   Loss 0.1834   LearningRate 0.0010   Epoch: 17   Global Step: 300010   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 07:00:56,223-Speed 3344.27 samples/sec   Loss 0.1618   LearningRate 0.0010   Epoch: 17   Global Step: 300020   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-12 07:00:59,293-Speed 3336.30 samples/sec   Loss 0.1767   LearningRate 0.0010   Epoch: 17   Global Step: 300030   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-12 07:01:02,414-Speed 3281.33 samples/sec   Loss 0.1694   LearningRate 0.0010   Epoch: 17   Global Step: 300040   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-12 07:01:05,511-Speed 3307.95 samples/sec   Loss 0.1701   LearningRate 0.0010   Epoch: 17   Global Step: 300050   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-12 07:01:08,591-Speed 3325.38 samples/sec   Loss 0.1595   LearningRate 0.0010   Epoch: 17   Global Step: 300060   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-12 07:01:11,742-Speed 3250.23 samples/sec   Loss 0.1897   LearningRate 0.0010   Epoch: 17   Global Step: 300070   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-12 07:01:14,864-Speed 3281.23 samples/sec   Loss 0.1696   LearningRate 0.0010   Epoch: 17   Global Step: 300080   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-12 07:01:17,935-Speed 3334.42 samples/sec   Loss 0.1850   LearningRate 0.0010   Epoch: 17   Global Step: 300090   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-12 07:01:21,011-Speed 3329.95 samples/sec   Loss 0.1780   LearningRate 0.0010   Epoch: 17   Global Step: 300100   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-12 07:01:24,087-Speed 3329.17 samples/sec   Loss 0.1724   LearningRate 0.0010   Epoch: 17   Global Step: 300110   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-12 07:01:27,166-Speed 3326.69 samples/sec   Loss 0.1744   LearningRate 0.0010   Epoch: 17   Global Step: 300120   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 07:01:30,268-Speed 3301.84 samples/sec   Loss 0.1674   LearningRate 0.0010   Epoch: 17   Global Step: 300130   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 07:01:33,348-Speed 3325.92 samples/sec   Loss 0.1643   LearningRate 0.0010   Epoch: 17   Global Step: 300140   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 07:01:36,426-Speed 3327.52 samples/sec   Loss 0.1839   LearningRate 0.0010   Epoch: 17   Global Step: 300150   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 07:01:39,541-Speed 3288.43 samples/sec   Loss 0.1509   LearningRate 0.0010   Epoch: 17   Global Step: 300160   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 07:01:42,612-Speed 3335.03 samples/sec   Loss 0.1717   LearningRate 0.0010   Epoch: 17   Global Step: 300170   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 07:01:45,723-Speed 3292.77 samples/sec   Loss 0.1739   LearningRate 0.0010   Epoch: 17   Global Step: 300180   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 07:01:48,815-Speed 3311.90 samples/sec   Loss 0.2029   LearningRate 0.0010   Epoch: 17   Global Step: 300190   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 07:01:51,883-Speed 3338.16 samples/sec   Loss 0.1649   LearningRate 0.0010   Epoch: 17   Global Step: 300200   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 07:01:54,944-Speed 3345.80 samples/sec   Loss 0.1760   LearningRate 0.0010   Epoch: 17   Global Step: 300210   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 07:01:58,008-Speed 3343.15 samples/sec   Loss 0.1669   LearningRate 0.0010   Epoch: 17   Global Step: 300220   Fp16 Grad Scale: 262144   Required: 4 hours
Training: 2022-04-12 07:02:01,078-Speed 3336.38 samples/sec   Loss 0.1803   LearningRate 0.0010   Epoch: 17   Global Step: 300230   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 07:02:04,150-Speed 3335.03 samples/sec   Loss 0.1697   LearningRate 0.0010   Epoch: 17   Global Step: 300240   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 07:02:07,232-Speed 3323.29 samples/sec   Loss 0.1669   LearningRate 0.0010   Epoch: 17   Global Step: 300250   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 07:02:10,354-Speed 3280.74 samples/sec   Loss 0.1669   LearningRate 0.0010   Epoch: 17   Global Step: 300260   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 07:02:13,471-Speed 3285.64 samples/sec   Loss 0.1920   LearningRate 0.0010   Epoch: 17   Global Step: 300270   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 07:02:16,547-Speed 3340.96 samples/sec   Loss 0.1615   LearningRate 0.0010   Epoch: 17   Global Step: 300280   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 07:02:19,605-Speed 3349.60 samples/sec   Loss 0.1793   LearningRate 0.0010   Epoch: 17   Global Step: 300290   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 07:02:22,674-Speed 3337.02 samples/sec   Loss 0.1653   LearningRate 0.0010   Epoch: 17   Global Step: 300300   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 07:02:25,742-Speed 3338.49 samples/sec   Loss 0.1853   LearningRate 0.0010   Epoch: 17   Global Step: 300310   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 07:02:28,822-Speed 3326.12 samples/sec   Loss 0.1803   LearningRate 0.0010   Epoch: 17   Global Step: 300320   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 07:02:31,884-Speed 3343.87 samples/sec   Loss 0.1673   LearningRate 0.0010   Epoch: 17   Global Step: 300330   Fp16 Grad Scale: 262144   Required: 4 hours
Training: 2022-04-12 07:02:34,951-Speed 3340.55 samples/sec   Loss 0.1624   LearningRate 0.0010   Epoch: 17   Global Step: 300340   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 07:02:38,013-Speed 3344.86 samples/sec   Loss 0.1683   LearningRate 0.0010   Epoch: 17   Global Step: 300350   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 07:02:41,113-Speed 3303.73 samples/sec   Loss 0.1697   LearningRate 0.0010   Epoch: 17   Global Step: 300360   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 07:02:44,198-Speed 3319.56 samples/sec   Loss 0.1699   LearningRate 0.0010   Epoch: 17   Global Step: 300370   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 07:02:47,350-Speed 3251.25 samples/sec   Loss 0.1680   LearningRate 0.0010   Epoch: 17   Global Step: 300380   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 07:02:50,460-Speed 3293.53 samples/sec   Loss 0.1718   LearningRate 0.0010   Epoch: 17   Global Step: 300390   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-12 07:02:53,547-Speed 3317.52 samples/sec   Loss 0.1660   LearningRate 0.0010   Epoch: 17   Global Step: 300400   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-12 07:02:56,618-Speed 3335.46 samples/sec   Loss 0.1742   LearningRate 0.0010   Epoch: 17   Global Step: 300410   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-12 07:02:59,695-Speed 3328.73 samples/sec   Loss 0.1802   LearningRate 0.0010   Epoch: 17   Global Step: 300420   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-12 07:03:03,349-Speed 2802.48 samples/sec   Loss 0.1654   LearningRate 0.0010   Epoch: 17   Global Step: 300430   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-12 07:03:34,707-Speed 326.57 samples/sec   Loss 0.1543   LearningRate 0.0010   Epoch: 18   Global Step: 300440   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-12 07:03:38,513-Speed 2691.68 samples/sec   Loss 0.1188   LearningRate 0.0010   Epoch: 18   Global Step: 300450   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-12 07:03:41,688-Speed 3225.49 samples/sec   Loss 0.1140   LearningRate 0.0010   Epoch: 18   Global Step: 300460   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-12 07:03:44,749-Speed 3346.57 samples/sec   Loss 0.1332   LearningRate 0.0010   Epoch: 18   Global Step: 300470   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-12 07:03:47,851-Speed 3302.38 samples/sec   Loss 0.1172   LearningRate 0.0010   Epoch: 18   Global Step: 300480   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-12 07:03:50,964-Speed 3289.98 samples/sec   Loss 0.1257   LearningRate 0.0010   Epoch: 18   Global Step: 300490   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 07:03:54,109-Speed 3255.98 samples/sec   Loss 0.1232   LearningRate 0.0010   Epoch: 18   Global Step: 300500   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 07:03:57,179-Speed 3336.57 samples/sec   Loss 0.1299   LearningRate 0.0010   Epoch: 18   Global Step: 300510   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 07:04:00,241-Speed 3344.92 samples/sec   Loss 0.1129   LearningRate 0.0010   Epoch: 18   Global Step: 300520   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 07:04:03,324-Speed 3322.75 samples/sec   Loss 0.1268   LearningRate 0.0010   Epoch: 18   Global Step: 300530   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 07:04:06,521-Speed 3203.63 samples/sec   Loss 0.1321   LearningRate 0.0010   Epoch: 18   Global Step: 300540   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 07:04:09,661-Speed 3262.45 samples/sec   Loss 0.1186   LearningRate 0.0010   Epoch: 18   Global Step: 300550   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 07:04:12,730-Speed 3336.75 samples/sec   Loss 0.1115   LearningRate 0.0010   Epoch: 18   Global Step: 300560   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 07:04:15,852-Speed 3280.97 samples/sec   Loss 0.1295   LearningRate 0.0010   Epoch: 18   Global Step: 300570   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 07:04:18,982-Speed 3272.24 samples/sec   Loss 0.1372   LearningRate 0.0010   Epoch: 18   Global Step: 300580   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 07:04:22,073-Speed 3314.00 samples/sec   Loss 0.1142   LearningRate 0.0010   Epoch: 18   Global Step: 300590   Fp16 Grad Scale: 262144   Required: 4 hours
Training: 2022-04-12 07:04:25,136-Speed 3343.90 samples/sec   Loss 0.1274   LearningRate 0.0010   Epoch: 18   Global Step: 300600   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 07:04:28,217-Speed 3323.59 samples/sec   Loss 0.1136   LearningRate 0.0010   Epoch: 18   Global Step: 300610   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 07:04:31,398-Speed 3220.03 samples/sec   Loss 0.1216   LearningRate 0.0010   Epoch: 18   Global Step: 300620   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 07:04:34,478-Speed 3326.18 samples/sec   Loss 0.1116   LearningRate 0.0010   Epoch: 18   Global Step: 300630   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-12 07:04:37,633-Speed 3246.42 samples/sec   Loss 0.1218   LearningRate 0.0010   Epoch: 18   Global Step: 300640   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-12 07:04:40,783-Speed 3251.30 samples/sec   Loss 0.1223   LearningRate 0.0010   Epoch: 18   Global Step: 300650   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-12 07:04:43,903-Speed 3283.01 samples/sec   Loss 0.1277   LearningRate 0.0010   Epoch: 18   Global Step: 300660   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-12 07:04:47,366-Speed 2957.18 samples/sec   Loss 0.1185   LearningRate 0.0010   Epoch: 18   Global Step: 300670   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-12 07:04:50,619-Speed 3148.45 samples/sec   Loss 0.1226   LearningRate 0.0010   Epoch: 18   Global Step: 300680   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-12 07:04:53,857-Speed 3162.99 samples/sec   Loss 0.1134   LearningRate 0.0010   Epoch: 18   Global Step: 300690   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-12 07:04:58,307-Speed 2301.58 samples/sec   Loss 0.1170   LearningRate 0.0010   Epoch: 18   Global Step: 300700   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-12 07:05:02,616-Speed 2376.92 samples/sec   Loss 0.1247   LearningRate 0.0010   Epoch: 18   Global Step: 300710   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-12 07:05:06,821-Speed 2435.98 samples/sec   Loss 0.1203   LearningRate 0.0010   Epoch: 18   Global Step: 300720   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-12 07:05:09,900-Speed 3325.74 samples/sec   Loss 0.1184   LearningRate 0.0010   Epoch: 18   Global Step: 300730   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 07:05:12,975-Speed 3331.29 samples/sec   Loss 0.1181   LearningRate 0.0010   Epoch: 18   Global Step: 300740   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 07:05:16,049-Speed 3331.04 samples/sec   Loss 0.1213   LearningRate 0.0010   Epoch: 18   Global Step: 300750   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-12 07:05:19,129-Speed 3326.41 samples/sec   Loss 0.1084   LearningRate 0.0010   Epoch: 18   Global Step: 300760   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:05:22,235-Speed 3297.10 samples/sec   Loss 0.1180   LearningRate 0.0010   Epoch: 18   Global Step: 300770   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:05:25,349-Speed 3289.47 samples/sec   Loss 0.1167   LearningRate 0.0010   Epoch: 18   Global Step: 300780   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:05:28,421-Speed 3334.17 samples/sec   Loss 0.1274   LearningRate 0.0010   Epoch: 18   Global Step: 300790   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:05:31,491-Speed 3335.95 samples/sec   Loss 0.1297   LearningRate 0.0010   Epoch: 18   Global Step: 300800   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:05:34,558-Speed 3339.45 samples/sec   Loss 0.1184   LearningRate 0.0010   Epoch: 18   Global Step: 300810   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:05:37,648-Speed 3314.68 samples/sec   Loss 0.1227   LearningRate 0.0010   Epoch: 18   Global Step: 300820   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:05:40,744-Speed 3308.07 samples/sec   Loss 0.1254   LearningRate 0.0010   Epoch: 18   Global Step: 300830   Fp16 Grad Scale: 262144   Required: 3 hours
Training: 2022-04-12 07:05:43,844-Speed 3304.03 samples/sec   Loss 0.1322   LearningRate 0.0010   Epoch: 18   Global Step: 300840   Fp16 Grad Scale: 262144   Required: 3 hours
Training: 2022-04-12 07:05:46,941-Speed 3307.17 samples/sec   Loss 0.1264   LearningRate 0.0010   Epoch: 18   Global Step: 300850   Fp16 Grad Scale: 262144   Required: 3 hours
Training: 2022-04-12 07:05:50,042-Speed 3304.16 samples/sec   Loss 0.1204   LearningRate 0.0010   Epoch: 18   Global Step: 300860   Fp16 Grad Scale: 262144   Required: 3 hours
Training: 2022-04-12 07:05:53,100-Speed 3349.48 samples/sec   Loss 0.1154   LearningRate 0.0010   Epoch: 18   Global Step: 300870   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:05:56,214-Speed 3288.83 samples/sec   Loss 0.1109   LearningRate 0.0010   Epoch: 18   Global Step: 300880   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:05:59,277-Speed 3343.95 samples/sec   Loss 0.1210   LearningRate 0.0010   Epoch: 18   Global Step: 300890   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:06:02,339-Speed 3344.63 samples/sec   Loss 0.1192   LearningRate 0.0010   Epoch: 18   Global Step: 300900   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:06:05,417-Speed 3327.13 samples/sec   Loss 0.1303   LearningRate 0.0010   Epoch: 18   Global Step: 300910   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:06:08,523-Speed 3297.76 samples/sec   Loss 0.1217   LearningRate 0.0010   Epoch: 18   Global Step: 300920   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:06:11,590-Speed 3339.10 samples/sec   Loss 0.1155   LearningRate 0.0010   Epoch: 18   Global Step: 300930   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:06:14,670-Speed 3325.57 samples/sec   Loss 0.1252   LearningRate 0.0010   Epoch: 18   Global Step: 300940   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:06:17,732-Speed 3345.94 samples/sec   Loss 0.1173   LearningRate 0.0010   Epoch: 18   Global Step: 300950   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:06:20,792-Speed 3346.56 samples/sec   Loss 0.1154   LearningRate 0.0010   Epoch: 18   Global Step: 300960   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:06:23,873-Speed 3324.69 samples/sec   Loss 0.1182   LearningRate 0.0010   Epoch: 18   Global Step: 300970   Fp16 Grad Scale: 262144   Required: 3 hours
Training: 2022-04-12 07:06:26,951-Speed 3326.94 samples/sec   Loss 0.1149   LearningRate 0.0010   Epoch: 18   Global Step: 300980   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:06:30,014-Speed 3344.66 samples/sec   Loss 0.1203   LearningRate 0.0010   Epoch: 18   Global Step: 300990   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:06:33,075-Speed 3345.74 samples/sec   Loss 0.1149   LearningRate 0.0010   Epoch: 18   Global Step: 301000   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:06:36,141-Speed 3340.43 samples/sec   Loss 0.1143   LearningRate 0.0010   Epoch: 18   Global Step: 301010   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:06:39,242-Speed 3303.09 samples/sec   Loss 0.1156   LearningRate 0.0010   Epoch: 18   Global Step: 301020   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:06:42,341-Speed 3305.89 samples/sec   Loss 0.1165   LearningRate 0.0010   Epoch: 18   Global Step: 301030   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:06:45,398-Speed 3350.12 samples/sec   Loss 0.1210   LearningRate 0.0010   Epoch: 18   Global Step: 301040   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:06:48,467-Speed 3337.31 samples/sec   Loss 0.1085   LearningRate 0.0010   Epoch: 18   Global Step: 301050   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:06:51,556-Speed 3316.49 samples/sec   Loss 0.1188   LearningRate 0.0010   Epoch: 18   Global Step: 301060   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:06:54,671-Speed 3287.23 samples/sec   Loss 0.1060   LearningRate 0.0010   Epoch: 18   Global Step: 301070   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:06:57,758-Speed 3318.43 samples/sec   Loss 0.1201   LearningRate 0.0010   Epoch: 18   Global Step: 301080   Fp16 Grad Scale: 262144   Required: 3 hours
Training: 2022-04-12 07:07:00,845-Speed 3317.66 samples/sec   Loss 0.1232   LearningRate 0.0010   Epoch: 18   Global Step: 301090   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:07:03,985-Speed 3261.61 samples/sec   Loss 0.1142   LearningRate 0.0010   Epoch: 18   Global Step: 301100   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:07:07,119-Speed 3268.59 samples/sec   Loss 0.1222   LearningRate 0.0010   Epoch: 18   Global Step: 301110   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:07:10,213-Speed 3310.34 samples/sec   Loss 0.1307   LearningRate 0.0010   Epoch: 18   Global Step: 301120   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:07:13,292-Speed 3326.17 samples/sec   Loss 0.1142   LearningRate 0.0010   Epoch: 18   Global Step: 301130   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:07:16,385-Speed 3312.10 samples/sec   Loss 0.1025   LearningRate 0.0010   Epoch: 18   Global Step: 301140   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:07:19,481-Speed 3307.75 samples/sec   Loss 0.1123   LearningRate 0.0010   Epoch: 18   Global Step: 301150   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:07:22,667-Speed 3215.34 samples/sec   Loss 0.1289   LearningRate 0.0010   Epoch: 18   Global Step: 301160   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-12 07:07:25,765-Speed 3305.96 samples/sec   Loss 0.1190   LearningRate 0.0010   Epoch: 18   Global Step: 301170   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-12 07:07:28,834-Speed 3336.68 samples/sec   Loss 0.1162   LearningRate 0.0010   Epoch: 18   Global Step: 301180   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-12 07:07:31,919-Speed 3320.16 samples/sec   Loss 0.1161   LearningRate 0.0010   Epoch: 18   Global Step: 301190   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-12 07:07:34,985-Speed 3341.56 samples/sec   Loss 0.1135   LearningRate 0.0010   Epoch: 18   Global Step: 301200   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-12 07:07:38,150-Speed 3235.96 samples/sec   Loss 0.1252   LearningRate 0.0010   Epoch: 18   Global Step: 301210   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-12 07:07:41,330-Speed 3220.09 samples/sec   Loss 0.1244   LearningRate 0.0010   Epoch: 18   Global Step: 301220   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-12 07:07:44,413-Speed 3322.72 samples/sec   Loss 0.1234   LearningRate 0.0010   Epoch: 18   Global Step: 301230   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-12 07:07:47,487-Speed 3332.32 samples/sec   Loss 0.1062   LearningRate 0.0010   Epoch: 18   Global Step: 301240   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-12 07:07:50,552-Speed 3341.62 samples/sec   Loss 0.1321   LearningRate 0.0010   Epoch: 18   Global Step: 301250   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-12 07:07:53,629-Speed 3328.84 samples/sec   Loss 0.1172   LearningRate 0.0010   Epoch: 18   Global Step: 301260   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:07:56,744-Speed 3287.68 samples/sec   Loss 0.1202   LearningRate 0.0010   Epoch: 18   Global Step: 301270   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:07:59,808-Speed 3342.63 samples/sec   Loss 0.1115   LearningRate 0.0010   Epoch: 18   Global Step: 301280   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:08:02,876-Speed 3338.90 samples/sec   Loss 0.1147   LearningRate 0.0009   Epoch: 18   Global Step: 301290   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:08:05,970-Speed 3310.65 samples/sec   Loss 0.1184   LearningRate 0.0009   Epoch: 18   Global Step: 301300   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:08:09,095-Speed 3276.77 samples/sec   Loss 0.1226   LearningRate 0.0009   Epoch: 18   Global Step: 301310   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-12 07:08:12,218-Speed 3279.22 samples/sec   Loss 0.1304   LearningRate 0.0009   Epoch: 18   Global Step: 301320   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-12 07:08:15,434-Speed 3184.85 samples/sec   Loss 0.1142   LearningRate 0.0009   Epoch: 18   Global Step: 301330   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-12 07:08:18,517-Speed 3322.11 samples/sec   Loss 0.1246   LearningRate 0.0009   Epoch: 18   Global Step: 301340   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-12 07:08:21,627-Speed 3293.40 samples/sec   Loss 0.1197   LearningRate 0.0009   Epoch: 18   Global Step: 301350   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-12 07:08:24,718-Speed 3313.82 samples/sec   Loss 0.1105   LearningRate 0.0009   Epoch: 18   Global Step: 301360   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-12 07:08:27,792-Speed 3332.31 samples/sec   Loss 0.1175   LearningRate 0.0009   Epoch: 18   Global Step: 301370   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-12 07:08:30,967-Speed 3225.80 samples/sec   Loss 0.1170   LearningRate 0.0009   Epoch: 18   Global Step: 301380   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-12 07:08:34,053-Speed 3319.37 samples/sec   Loss 0.1386   LearningRate 0.0009   Epoch: 18   Global Step: 301390   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-12 07:08:37,136-Speed 3322.39 samples/sec   Loss 0.1292   LearningRate 0.0009   Epoch: 18   Global Step: 301400   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-12 07:08:40,237-Speed 3302.34 samples/sec   Loss 0.1182   LearningRate 0.0009   Epoch: 18   Global Step: 301410   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:08:43,424-Speed 3213.76 samples/sec   Loss 0.1139   LearningRate 0.0009   Epoch: 18   Global Step: 301420   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:08:46,494-Speed 3336.17 samples/sec   Loss 0.1217   LearningRate 0.0009   Epoch: 18   Global Step: 301430   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:08:49,596-Speed 3302.17 samples/sec   Loss 0.1246   LearningRate 0.0009   Epoch: 18   Global Step: 301440   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:08:52,715-Speed 3283.91 samples/sec   Loss 0.1199   LearningRate 0.0009   Epoch: 18   Global Step: 301450   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:08:55,808-Speed 3311.67 samples/sec   Loss 0.1167   LearningRate 0.0009   Epoch: 18   Global Step: 301460   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:08:58,880-Speed 3333.75 samples/sec   Loss 0.1240   LearningRate 0.0009   Epoch: 18   Global Step: 301470   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:09:01,960-Speed 3325.56 samples/sec   Loss 0.1301   LearningRate 0.0009   Epoch: 18   Global Step: 301480   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:09:05,077-Speed 3285.72 samples/sec   Loss 0.1137   LearningRate 0.0009   Epoch: 18   Global Step: 301490   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:09:08,257-Speed 3221.34 samples/sec   Loss 0.1178   LearningRate 0.0009   Epoch: 18   Global Step: 301500   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:09:11,356-Speed 3304.58 samples/sec   Loss 0.1273   LearningRate 0.0009   Epoch: 18   Global Step: 301510   Fp16 Grad Scale: 262144   Required: 3 hours
Training: 2022-04-12 07:09:14,453-Speed 3306.97 samples/sec   Loss 0.1268   LearningRate 0.0009   Epoch: 18   Global Step: 301520   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:09:17,522-Speed 3337.73 samples/sec   Loss 0.1081   LearningRate 0.0009   Epoch: 18   Global Step: 301530   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:09:20,606-Speed 3321.41 samples/sec   Loss 0.1208   LearningRate 0.0009   Epoch: 18   Global Step: 301540   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:09:23,705-Speed 3305.65 samples/sec   Loss 0.1185   LearningRate 0.0009   Epoch: 18   Global Step: 301550   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:09:26,852-Speed 3253.94 samples/sec   Loss 0.1088   LearningRate 0.0009   Epoch: 18   Global Step: 301560   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:09:30,018-Speed 3235.89 samples/sec   Loss 0.1260   LearningRate 0.0009   Epoch: 18   Global Step: 301570   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:09:33,083-Speed 3340.68 samples/sec   Loss 0.1281   LearningRate 0.0009   Epoch: 18   Global Step: 301580   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:09:36,163-Speed 3326.05 samples/sec   Loss 0.1107   LearningRate 0.0009   Epoch: 18   Global Step: 301590   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:09:39,267-Speed 3299.01 samples/sec   Loss 0.1206   LearningRate 0.0009   Epoch: 18   Global Step: 301600   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:09:42,380-Speed 3290.84 samples/sec   Loss 0.1235   LearningRate 0.0009   Epoch: 18   Global Step: 301610   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:09:45,451-Speed 3334.81 samples/sec   Loss 0.1344   LearningRate 0.0009   Epoch: 18   Global Step: 301620   Fp16 Grad Scale: 262144   Required: 3 hours
Training: 2022-04-12 07:09:48,537-Speed 3318.78 samples/sec   Loss 0.1188   LearningRate 0.0009   Epoch: 18   Global Step: 301630   Fp16 Grad Scale: 262144   Required: 3 hours
Training: 2022-04-12 07:09:51,653-Speed 3287.24 samples/sec   Loss 0.1280   LearningRate 0.0009   Epoch: 18   Global Step: 301640   Fp16 Grad Scale: 262144   Required: 3 hours
Training: 2022-04-12 07:09:54,741-Speed 3316.70 samples/sec   Loss 0.1132   LearningRate 0.0009   Epoch: 18   Global Step: 301650   Fp16 Grad Scale: 262144   Required: 3 hours
Training: 2022-04-12 07:09:57,923-Speed 3219.13 samples/sec   Loss 0.1183   LearningRate 0.0009   Epoch: 18   Global Step: 301660   Fp16 Grad Scale: 262144   Required: 3 hours
Training: 2022-04-12 07:10:00,981-Speed 3349.29 samples/sec   Loss 0.1126   LearningRate 0.0009   Epoch: 18   Global Step: 301670   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:10:04,103-Speed 3280.80 samples/sec   Loss 0.1201   LearningRate 0.0009   Epoch: 18   Global Step: 301680   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:10:07,194-Speed 3313.07 samples/sec   Loss 0.1219   LearningRate 0.0009   Epoch: 18   Global Step: 301690   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:10:10,273-Speed 3326.77 samples/sec   Loss 0.1206   LearningRate 0.0009   Epoch: 18   Global Step: 301700   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:10:13,370-Speed 3307.71 samples/sec   Loss 0.1162   LearningRate 0.0009   Epoch: 18   Global Step: 301710   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:10:16,442-Speed 3334.32 samples/sec   Loss 0.1189   LearningRate 0.0009   Epoch: 18   Global Step: 301720   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:10:19,553-Speed 3291.74 samples/sec   Loss 0.1246   LearningRate 0.0009   Epoch: 18   Global Step: 301730   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:10:22,619-Speed 3341.01 samples/sec   Loss 0.1155   LearningRate 0.0009   Epoch: 18   Global Step: 301740   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:10:25,688-Speed 3336.84 samples/sec   Loss 0.1227   LearningRate 0.0009   Epoch: 18   Global Step: 301750   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:10:28,773-Speed 3320.67 samples/sec   Loss 0.1255   LearningRate 0.0009   Epoch: 18   Global Step: 301760   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:10:31,829-Speed 3351.65 samples/sec   Loss 0.1238   LearningRate 0.0009   Epoch: 18   Global Step: 301770   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:10:34,893-Speed 3341.88 samples/sec   Loss 0.1177   LearningRate 0.0009   Epoch: 18   Global Step: 301780   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:10:37,983-Speed 3315.43 samples/sec   Loss 0.1216   LearningRate 0.0009   Epoch: 18   Global Step: 301790   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:10:41,063-Speed 3325.97 samples/sec   Loss 0.1074   LearningRate 0.0009   Epoch: 18   Global Step: 301800   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:10:44,171-Speed 3295.32 samples/sec   Loss 0.1191   LearningRate 0.0009   Epoch: 18   Global Step: 301810   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:10:47,274-Speed 3301.29 samples/sec   Loss 0.1225   LearningRate 0.0009   Epoch: 18   Global Step: 301820   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:10:50,395-Speed 3280.75 samples/sec   Loss 0.1188   LearningRate 0.0009   Epoch: 18   Global Step: 301830   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:10:53,462-Speed 3340.15 samples/sec   Loss 0.1172   LearningRate 0.0009   Epoch: 18   Global Step: 301840   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:10:56,596-Speed 3268.27 samples/sec   Loss 0.1256   LearningRate 0.0009   Epoch: 18   Global Step: 301850   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:10:59,666-Speed 3336.32 samples/sec   Loss 0.1195   LearningRate 0.0009   Epoch: 18   Global Step: 301860   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:11:02,730-Speed 3342.22 samples/sec   Loss 0.1206   LearningRate 0.0009   Epoch: 18   Global Step: 301870   Fp16 Grad Scale: 262144   Required: 3 hours
Training: 2022-04-12 07:11:05,826-Speed 3308.11 samples/sec   Loss 0.1167   LearningRate 0.0009   Epoch: 18   Global Step: 301880   Fp16 Grad Scale: 262144   Required: 3 hours
Training: 2022-04-12 07:11:08,956-Speed 3272.46 samples/sec   Loss 0.1248   LearningRate 0.0009   Epoch: 18   Global Step: 301890   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:11:12,103-Speed 3254.91 samples/sec   Loss 0.1129   LearningRate 0.0009   Epoch: 18   Global Step: 301900   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:11:15,167-Speed 3343.08 samples/sec   Loss 0.1317   LearningRate 0.0009   Epoch: 18   Global Step: 301910   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:11:18,365-Speed 3201.97 samples/sec   Loss 0.1199   LearningRate 0.0009   Epoch: 18   Global Step: 301920   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:11:21,448-Speed 3322.66 samples/sec   Loss 0.1117   LearningRate 0.0009   Epoch: 18   Global Step: 301930   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:11:24,595-Speed 3254.49 samples/sec   Loss 0.1095   LearningRate 0.0009   Epoch: 18   Global Step: 301940   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:11:27,772-Speed 3223.61 samples/sec   Loss 0.1262   LearningRate 0.0009   Epoch: 18   Global Step: 301950   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:11:30,911-Speed 3263.75 samples/sec   Loss 0.1199   LearningRate 0.0009   Epoch: 18   Global Step: 301960   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:11:33,993-Speed 3323.14 samples/sec   Loss 0.1169   LearningRate 0.0009   Epoch: 18   Global Step: 301970   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:11:37,059-Speed 3340.87 samples/sec   Loss 0.1237   LearningRate 0.0009   Epoch: 18   Global Step: 301980   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:11:40,140-Speed 3324.51 samples/sec   Loss 0.1112   LearningRate 0.0009   Epoch: 18   Global Step: 301990   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:11:43,267-Speed 3275.46 samples/sec   Loss 0.1294   LearningRate 0.0009   Epoch: 18   Global Step: 302000   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:12:27,231-[lfw][302000]XNorm: 20.738596
Training: 2022-04-12 07:12:27,232-[lfw][302000]Accuracy-Flip: 0.99783+-0.00259
Training: 2022-04-12 07:12:27,232-[lfw][302000]Accuracy-Highest: 0.99817
Training: 2022-04-12 07:13:17,859-[cfp_fp][302000]XNorm: 22.564360
Training: 2022-04-12 07:13:17,859-[cfp_fp][302000]Accuracy-Flip: 0.99143+-0.00373
Training: 2022-04-12 07:13:17,860-[cfp_fp][302000]Accuracy-Highest: 0.99200
Training: 2022-04-12 07:14:01,355-[agedb_30][302000]XNorm: 22.712201
Training: 2022-04-12 07:14:01,355-[agedb_30][302000]Accuracy-Flip: 0.98600+-0.00544
Training: 2022-04-12 07:14:01,356-[agedb_30][302000]Accuracy-Highest: 0.98650
Training: 2022-04-12 07:14:04,417-Speed 72.55 samples/sec   Loss 0.1245   LearningRate 0.0009   Epoch: 18   Global Step: 302010   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:14:07,467-Speed 3357.20 samples/sec   Loss 0.1222   LearningRate 0.0009   Epoch: 18   Global Step: 302020   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:14:10,543-Speed 3330.65 samples/sec   Loss 0.1021   LearningRate 0.0009   Epoch: 18   Global Step: 302030   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:14:13,607-Speed 3342.47 samples/sec   Loss 0.1308   LearningRate 0.0009   Epoch: 18   Global Step: 302040   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:14:16,657-Speed 3357.62 samples/sec   Loss 0.1287   LearningRate 0.0009   Epoch: 18   Global Step: 302050   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:14:19,733-Speed 3330.39 samples/sec   Loss 0.1218   LearningRate 0.0009   Epoch: 18   Global Step: 302060   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:14:22,855-Speed 3280.04 samples/sec   Loss 0.1200   LearningRate 0.0009   Epoch: 18   Global Step: 302070   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:14:25,933-Speed 3327.58 samples/sec   Loss 0.1204   LearningRate 0.0009   Epoch: 18   Global Step: 302080   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:14:29,141-Speed 3193.41 samples/sec   Loss 0.1136   LearningRate 0.0009   Epoch: 18   Global Step: 302090   Fp16 Grad Scale: 262144   Required: 3 hours
Training: 2022-04-12 07:14:32,217-Speed 3328.97 samples/sec   Loss 0.1192   LearningRate 0.0009   Epoch: 18   Global Step: 302100   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:14:35,443-Speed 3175.03 samples/sec   Loss 0.1244   LearningRate 0.0009   Epoch: 18   Global Step: 302110   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:14:38,539-Speed 3309.07 samples/sec   Loss 0.1225   LearningRate 0.0009   Epoch: 18   Global Step: 302120   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:14:41,605-Speed 3339.87 samples/sec   Loss 0.1277   LearningRate 0.0009   Epoch: 18   Global Step: 302130   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:14:44,809-Speed 3197.30 samples/sec   Loss 0.1108   LearningRate 0.0009   Epoch: 18   Global Step: 302140   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:14:47,919-Speed 3293.36 samples/sec   Loss 0.1225   LearningRate 0.0009   Epoch: 18   Global Step: 302150   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:14:50,986-Speed 3339.46 samples/sec   Loss 0.1147   LearningRate 0.0009   Epoch: 18   Global Step: 302160   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:14:54,042-Speed 3351.03 samples/sec   Loss 0.1372   LearningRate 0.0009   Epoch: 18   Global Step: 302170   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:14:57,102-Speed 3347.44 samples/sec   Loss 0.1151   LearningRate 0.0009   Epoch: 18   Global Step: 302180   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:15:00,206-Speed 3300.23 samples/sec   Loss 0.1213   LearningRate 0.0009   Epoch: 18   Global Step: 302190   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:15:03,291-Speed 3319.51 samples/sec   Loss 0.1303   LearningRate 0.0009   Epoch: 18   Global Step: 302200   Fp16 Grad Scale: 262144   Required: 3 hours
Training: 2022-04-12 07:15:06,430-Speed 3262.38 samples/sec   Loss 0.1189   LearningRate 0.0009   Epoch: 18   Global Step: 302210   Fp16 Grad Scale: 262144   Required: 3 hours
Training: 2022-04-12 07:15:09,498-Speed 3338.65 samples/sec   Loss 0.1270   LearningRate 0.0009   Epoch: 18   Global Step: 302220   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:15:12,562-Speed 3343.36 samples/sec   Loss 0.1192   LearningRate 0.0009   Epoch: 18   Global Step: 302230   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:15:15,630-Speed 3338.41 samples/sec   Loss 0.1350   LearningRate 0.0009   Epoch: 18   Global Step: 302240   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:15:18,683-Speed 3355.11 samples/sec   Loss 0.1063   LearningRate 0.0009   Epoch: 18   Global Step: 302250   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-12 07:15:21,752-Speed 3337.37 samples/sec   Loss 0.1101   LearningRate 0.0009   Epoch: 18   Global Step: 302260   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-12 07:15:24,812-Speed 3347.12 samples/sec   Loss 0.1325   LearningRate 0.0009   Epoch: 18   Global Step: 302270   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-12 07:15:27,888-Speed 3329.29 samples/sec   Loss 0.1064   LearningRate 0.0009   Epoch: 18   Global Step: 302280   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-12 07:15:30,958-Speed 3336.77 samples/sec   Loss 0.1219   LearningRate 0.0009   Epoch: 18   Global Step: 302290   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-12 07:15:34,027-Speed 3336.92 samples/sec   Loss 0.1209   LearningRate 0.0009   Epoch: 18   Global Step: 302300   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-12 07:15:37,104-Speed 3329.08 samples/sec   Loss 0.1272   LearningRate 0.0009   Epoch: 18   Global Step: 302310   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-12 07:15:40,195-Speed 3312.94 samples/sec   Loss 0.1312   LearningRate 0.0009   Epoch: 18   Global Step: 302320   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-12 07:15:43,270-Speed 3330.44 samples/sec   Loss 0.1265   LearningRate 0.0009   Epoch: 18   Global Step: 302330   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-12 07:15:46,333-Speed 3344.84 samples/sec   Loss 0.1137   LearningRate 0.0009   Epoch: 18   Global Step: 302340   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-12 07:15:49,426-Speed 3311.14 samples/sec   Loss 0.1195   LearningRate 0.0009   Epoch: 18   Global Step: 302350   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:15:52,577-Speed 3251.06 samples/sec   Loss 0.1178   LearningRate 0.0009   Epoch: 18   Global Step: 302360   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:15:55,635-Speed 3349.41 samples/sec   Loss 0.1103   LearningRate 0.0009   Epoch: 18   Global Step: 302370   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-12 07:15:58,725-Speed 3314.73 samples/sec   Loss 0.1023   LearningRate 0.0009   Epoch: 18   Global Step: 302380   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-12 07:16:01,823-Speed 3305.38 samples/sec   Loss 0.1097   LearningRate 0.0009   Epoch: 18   Global Step: 302390   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-12 07:16:04,901-Speed 3327.30 samples/sec   Loss 0.1319   LearningRate 0.0009   Epoch: 18   Global Step: 302400   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-12 07:16:08,006-Speed 3298.89 samples/sec   Loss 0.1193   LearningRate 0.0009   Epoch: 18   Global Step: 302410   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-12 07:16:11,065-Speed 3348.51 samples/sec   Loss 0.1366   LearningRate 0.0009   Epoch: 18   Global Step: 302420   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-12 07:16:14,173-Speed 3295.97 samples/sec   Loss 0.1276   LearningRate 0.0009   Epoch: 18   Global Step: 302430   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-12 07:16:17,236-Speed 3344.47 samples/sec   Loss 0.1141   LearningRate 0.0009   Epoch: 18   Global Step: 302440   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-12 07:16:20,322-Speed 3317.81 samples/sec   Loss 0.1173   LearningRate 0.0009   Epoch: 18   Global Step: 302450   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-12 07:16:23,420-Speed 3306.33 samples/sec   Loss 0.1192   LearningRate 0.0009   Epoch: 18   Global Step: 302460   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-12 07:16:26,540-Speed 3282.69 samples/sec   Loss 0.1192   LearningRate 0.0009   Epoch: 18   Global Step: 302470   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:16:29,602-Speed 3345.69 samples/sec   Loss 0.1177   LearningRate 0.0009   Epoch: 18   Global Step: 302480   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:16:32,666-Speed 3342.04 samples/sec   Loss 0.1226   LearningRate 0.0009   Epoch: 18   Global Step: 302490   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:16:35,740-Speed 3331.77 samples/sec   Loss 0.1251   LearningRate 0.0009   Epoch: 18   Global Step: 302500   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:16:38,810-Speed 3336.05 samples/sec   Loss 0.1152   LearningRate 0.0009   Epoch: 18   Global Step: 302510   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:16:41,869-Speed 3348.86 samples/sec   Loss 0.1321   LearningRate 0.0009   Epoch: 18   Global Step: 302520   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:16:44,937-Speed 3338.77 samples/sec   Loss 0.1177   LearningRate 0.0009   Epoch: 18   Global Step: 302530   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:16:48,000-Speed 3344.00 samples/sec   Loss 0.1151   LearningRate 0.0009   Epoch: 18   Global Step: 302540   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:16:51,083-Speed 3321.96 samples/sec   Loss 0.1227   LearningRate 0.0009   Epoch: 18   Global Step: 302550   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:16:54,176-Speed 3310.61 samples/sec   Loss 0.1251   LearningRate 0.0009   Epoch: 18   Global Step: 302560   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:16:57,287-Speed 3292.77 samples/sec   Loss 0.1147   LearningRate 0.0009   Epoch: 18   Global Step: 302570   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-12 07:17:00,437-Speed 3251.70 samples/sec   Loss 0.1232   LearningRate 0.0009   Epoch: 18   Global Step: 302580   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-12 07:17:03,500-Speed 3343.26 samples/sec   Loss 0.1222   LearningRate 0.0009   Epoch: 18   Global Step: 302590   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-12 07:17:06,611-Speed 3292.80 samples/sec   Loss 0.1121   LearningRate 0.0009   Epoch: 18   Global Step: 302600   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-12 07:17:09,739-Speed 3274.38 samples/sec   Loss 0.1179   LearningRate 0.0009   Epoch: 18   Global Step: 302610   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-12 07:17:12,838-Speed 3305.22 samples/sec   Loss 0.1043   LearningRate 0.0009   Epoch: 18   Global Step: 302620   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-12 07:17:15,923-Speed 3319.96 samples/sec   Loss 0.1208   LearningRate 0.0009   Epoch: 18   Global Step: 302630   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-12 07:17:19,041-Speed 3284.64 samples/sec   Loss 0.1201   LearningRate 0.0009   Epoch: 18   Global Step: 302640   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-12 07:17:22,102-Speed 3345.94 samples/sec   Loss 0.1335   LearningRate 0.0009   Epoch: 18   Global Step: 302650   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-12 07:17:25,166-Speed 3342.40 samples/sec   Loss 0.1280   LearningRate 0.0009   Epoch: 18   Global Step: 302660   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-12 07:17:28,234-Speed 3339.21 samples/sec   Loss 0.1292   LearningRate 0.0009   Epoch: 18   Global Step: 302670   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:17:31,345-Speed 3291.58 samples/sec   Loss 0.1220   LearningRate 0.0009   Epoch: 18   Global Step: 302680   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:17:34,454-Speed 3295.19 samples/sec   Loss 0.1189   LearningRate 0.0009   Epoch: 18   Global Step: 302690   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:17:37,551-Speed 3306.75 samples/sec   Loss 0.1181   LearningRate 0.0009   Epoch: 18   Global Step: 302700   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:17:40,651-Speed 3303.84 samples/sec   Loss 0.1207   LearningRate 0.0009   Epoch: 18   Global Step: 302710   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:17:43,724-Speed 3332.98 samples/sec   Loss 0.1258   LearningRate 0.0009   Epoch: 18   Global Step: 302720   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:17:46,786-Speed 3345.30 samples/sec   Loss 0.1072   LearningRate 0.0009   Epoch: 18   Global Step: 302730   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:17:49,900-Speed 3289.26 samples/sec   Loss 0.1184   LearningRate 0.0009   Epoch: 18   Global Step: 302740   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:17:52,979-Speed 3326.15 samples/sec   Loss 0.1300   LearningRate 0.0009   Epoch: 18   Global Step: 302750   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:17:56,035-Speed 3350.77 samples/sec   Loss 0.1135   LearningRate 0.0009   Epoch: 18   Global Step: 302760   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:17:59,117-Speed 3323.79 samples/sec   Loss 0.1177   LearningRate 0.0009   Epoch: 18   Global Step: 302770   Fp16 Grad Scale: 262144   Required: 3 hours
Training: 2022-04-12 07:18:02,178-Speed 3346.29 samples/sec   Loss 0.1192   LearningRate 0.0009   Epoch: 18   Global Step: 302780   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:18:05,243-Speed 3342.12 samples/sec   Loss 0.1261   LearningRate 0.0009   Epoch: 18   Global Step: 302790   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:18:08,305-Speed 3345.23 samples/sec   Loss 0.1239   LearningRate 0.0009   Epoch: 18   Global Step: 302800   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:18:11,403-Speed 3305.01 samples/sec   Loss 0.1155   LearningRate 0.0009   Epoch: 18   Global Step: 302810   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:18:14,468-Speed 3342.63 samples/sec   Loss 0.1164   LearningRate 0.0009   Epoch: 18   Global Step: 302820   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:18:17,618-Speed 3251.07 samples/sec   Loss 0.1242   LearningRate 0.0009   Epoch: 18   Global Step: 302830   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:18:20,751-Speed 3269.20 samples/sec   Loss 0.1171   LearningRate 0.0009   Epoch: 18   Global Step: 302840   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:18:23,824-Speed 3332.93 samples/sec   Loss 0.1203   LearningRate 0.0009   Epoch: 18   Global Step: 302850   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:18:26,913-Speed 3315.04 samples/sec   Loss 0.1139   LearningRate 0.0009   Epoch: 18   Global Step: 302860   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:18:30,083-Speed 3231.97 samples/sec   Loss 0.1163   LearningRate 0.0009   Epoch: 18   Global Step: 302870   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:18:33,196-Speed 3290.06 samples/sec   Loss 0.1223   LearningRate 0.0009   Epoch: 18   Global Step: 302880   Fp16 Grad Scale: 262144   Required: 3 hours
Training: 2022-04-12 07:18:36,266-Speed 3336.94 samples/sec   Loss 0.1401   LearningRate 0.0009   Epoch: 18   Global Step: 302890   Fp16 Grad Scale: 262144   Required: 3 hours
Training: 2022-04-12 07:18:39,320-Speed 3352.75 samples/sec   Loss 0.1280   LearningRate 0.0009   Epoch: 18   Global Step: 302900   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:18:42,419-Speed 3305.39 samples/sec   Loss 0.1257   LearningRate 0.0009   Epoch: 18   Global Step: 302910   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:18:45,498-Speed 3326.70 samples/sec   Loss 0.1208   LearningRate 0.0009   Epoch: 18   Global Step: 302920   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:18:48,575-Speed 3327.81 samples/sec   Loss 0.1250   LearningRate 0.0009   Epoch: 18   Global Step: 302930   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:18:51,637-Speed 3345.15 samples/sec   Loss 0.1149   LearningRate 0.0009   Epoch: 18   Global Step: 302940   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:18:54,780-Speed 3259.45 samples/sec   Loss 0.1156   LearningRate 0.0009   Epoch: 18   Global Step: 302950   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:18:57,856-Speed 3329.85 samples/sec   Loss 0.1085   LearningRate 0.0009   Epoch: 18   Global Step: 302960   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:19:00,936-Speed 3325.00 samples/sec   Loss 0.1233   LearningRate 0.0009   Epoch: 18   Global Step: 302970   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:19:04,000-Speed 3343.06 samples/sec   Loss 0.1168   LearningRate 0.0009   Epoch: 18   Global Step: 302980   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:19:07,097-Speed 3307.69 samples/sec   Loss 0.1225   LearningRate 0.0009   Epoch: 18   Global Step: 302990   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:19:10,168-Speed 3334.69 samples/sec   Loss 0.1156   LearningRate 0.0009   Epoch: 18   Global Step: 303000   Fp16 Grad Scale: 262144   Required: 3 hours
Training: 2022-04-12 07:19:13,320-Speed 3248.89 samples/sec   Loss 0.1128   LearningRate 0.0009   Epoch: 18   Global Step: 303010   Fp16 Grad Scale: 262144   Required: 3 hours
Training: 2022-04-12 07:19:16,457-Speed 3265.28 samples/sec   Loss 0.1192   LearningRate 0.0009   Epoch: 18   Global Step: 303020   Fp16 Grad Scale: 262144   Required: 3 hours
Training: 2022-04-12 07:19:19,550-Speed 3312.18 samples/sec   Loss 0.1165   LearningRate 0.0009   Epoch: 18   Global Step: 303030   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:19:22,677-Speed 3275.07 samples/sec   Loss 0.1168   LearningRate 0.0009   Epoch: 18   Global Step: 303040   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:19:25,850-Speed 3227.84 samples/sec   Loss 0.1190   LearningRate 0.0008   Epoch: 18   Global Step: 303050   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:19:28,941-Speed 3314.08 samples/sec   Loss 0.1251   LearningRate 0.0008   Epoch: 18   Global Step: 303060   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:19:32,136-Speed 3205.19 samples/sec   Loss 0.1161   LearningRate 0.0008   Epoch: 18   Global Step: 303070   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:19:35,284-Speed 3252.94 samples/sec   Loss 0.1180   LearningRate 0.0008   Epoch: 18   Global Step: 303080   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:19:38,501-Speed 3184.35 samples/sec   Loss 0.1290   LearningRate 0.0008   Epoch: 18   Global Step: 303090   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:19:41,612-Speed 3292.44 samples/sec   Loss 0.1227   LearningRate 0.0008   Epoch: 18   Global Step: 303100   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:19:44,764-Speed 3248.85 samples/sec   Loss 0.1164   LearningRate 0.0008   Epoch: 18   Global Step: 303110   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:19:47,882-Speed 3285.33 samples/sec   Loss 0.1259   LearningRate 0.0008   Epoch: 18   Global Step: 303120   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:19:51,002-Speed 3283.49 samples/sec   Loss 0.1078   LearningRate 0.0008   Epoch: 18   Global Step: 303130   Fp16 Grad Scale: 262144   Required: 3 hours
Training: 2022-04-12 07:19:54,078-Speed 3329.68 samples/sec   Loss 0.1173   LearningRate 0.0008   Epoch: 18   Global Step: 303140   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:19:57,171-Speed 3311.41 samples/sec   Loss 0.1198   LearningRate 0.0008   Epoch: 18   Global Step: 303150   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:20:00,334-Speed 3239.47 samples/sec   Loss 0.1200   LearningRate 0.0008   Epoch: 18   Global Step: 303160   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:20:03,442-Speed 3296.06 samples/sec   Loss 0.1220   LearningRate 0.0008   Epoch: 18   Global Step: 303170   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:20:06,539-Speed 3306.99 samples/sec   Loss 0.1075   LearningRate 0.0008   Epoch: 18   Global Step: 303180   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:20:09,611-Speed 3333.60 samples/sec   Loss 0.1160   LearningRate 0.0008   Epoch: 18   Global Step: 303190   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:20:12,744-Speed 3270.34 samples/sec   Loss 0.1114   LearningRate 0.0008   Epoch: 18   Global Step: 303200   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:20:15,861-Speed 3286.28 samples/sec   Loss 0.1147   LearningRate 0.0008   Epoch: 18   Global Step: 303210   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:20:19,026-Speed 3235.36 samples/sec   Loss 0.1215   LearningRate 0.0008   Epoch: 18   Global Step: 303220   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:20:22,105-Speed 3326.72 samples/sec   Loss 0.1221   LearningRate 0.0008   Epoch: 18   Global Step: 303230   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:20:25,257-Speed 3249.32 samples/sec   Loss 0.1080   LearningRate 0.0008   Epoch: 18   Global Step: 303240   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:20:28,369-Speed 3291.10 samples/sec   Loss 0.1335   LearningRate 0.0008   Epoch: 18   Global Step: 303250   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:20:31,468-Speed 3305.80 samples/sec   Loss 0.1205   LearningRate 0.0008   Epoch: 18   Global Step: 303260   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:20:34,546-Speed 3326.63 samples/sec   Loss 0.1239   LearningRate 0.0008   Epoch: 18   Global Step: 303270   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:20:37,664-Speed 3285.53 samples/sec   Loss 0.1253   LearningRate 0.0008   Epoch: 18   Global Step: 303280   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:20:40,759-Speed 3309.52 samples/sec   Loss 0.1172   LearningRate 0.0008   Epoch: 18   Global Step: 303290   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:20:43,892-Speed 3269.32 samples/sec   Loss 0.1274   LearningRate 0.0008   Epoch: 18   Global Step: 303300   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:20:47,024-Speed 3270.12 samples/sec   Loss 0.1167   LearningRate 0.0008   Epoch: 18   Global Step: 303310   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:20:50,087-Speed 3343.76 samples/sec   Loss 0.1133   LearningRate 0.0008   Epoch: 18   Global Step: 303320   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:20:53,160-Speed 3333.34 samples/sec   Loss 0.1249   LearningRate 0.0008   Epoch: 18   Global Step: 303330   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:20:56,234-Speed 3331.93 samples/sec   Loss 0.1178   LearningRate 0.0008   Epoch: 18   Global Step: 303340   Fp16 Grad Scale: 262144   Required: 3 hours
Training: 2022-04-12 07:20:59,302-Speed 3338.24 samples/sec   Loss 0.1228   LearningRate 0.0008   Epoch: 18   Global Step: 303350   Fp16 Grad Scale: 262144   Required: 3 hours
Training: 2022-04-12 07:21:02,376-Speed 3332.34 samples/sec   Loss 0.1193   LearningRate 0.0008   Epoch: 18   Global Step: 303360   Fp16 Grad Scale: 262144   Required: 3 hours
Training: 2022-04-12 07:21:05,434-Speed 3349.25 samples/sec   Loss 0.1139   LearningRate 0.0008   Epoch: 18   Global Step: 303370   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:21:08,535-Speed 3302.52 samples/sec   Loss 0.1141   LearningRate 0.0008   Epoch: 18   Global Step: 303380   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:21:11,619-Speed 3321.14 samples/sec   Loss 0.1174   LearningRate 0.0008   Epoch: 18   Global Step: 303390   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-12 07:21:14,696-Speed 3328.83 samples/sec   Loss 0.1158   LearningRate 0.0008   Epoch: 18   Global Step: 303400   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-12 07:21:17,768-Speed 3333.96 samples/sec   Loss 0.1127   LearningRate 0.0008   Epoch: 18   Global Step: 303410   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-12 07:21:20,907-Speed 3262.68 samples/sec   Loss 0.1274   LearningRate 0.0008   Epoch: 18   Global Step: 303420   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-12 07:21:23,996-Speed 3315.48 samples/sec   Loss 0.1204   LearningRate 0.0008   Epoch: 18   Global Step: 303430   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-12 07:21:27,071-Speed 3331.30 samples/sec   Loss 0.1197   LearningRate 0.0008   Epoch: 18   Global Step: 303440   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-12 07:21:30,275-Speed 3196.82 samples/sec   Loss 0.1255   LearningRate 0.0008   Epoch: 18   Global Step: 303450   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-12 07:21:33,376-Speed 3303.25 samples/sec   Loss 0.1209   LearningRate 0.0008   Epoch: 18   Global Step: 303460   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-12 07:21:36,445-Speed 3337.37 samples/sec   Loss 0.1196   LearningRate 0.0008   Epoch: 18   Global Step: 303470   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-12 07:21:39,505-Speed 3347.05 samples/sec   Loss 0.1231   LearningRate 0.0008   Epoch: 18   Global Step: 303480   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-12 07:21:42,656-Speed 3249.87 samples/sec   Loss 0.1136   LearningRate 0.0008   Epoch: 18   Global Step: 303490   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:21:45,766-Speed 3294.01 samples/sec   Loss 0.1192   LearningRate 0.0008   Epoch: 18   Global Step: 303500   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:21:48,834-Speed 3338.55 samples/sec   Loss 0.1206   LearningRate 0.0008   Epoch: 18   Global Step: 303510   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:21:51,915-Speed 3323.84 samples/sec   Loss 0.1245   LearningRate 0.0008   Epoch: 18   Global Step: 303520   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:21:55,007-Speed 3312.18 samples/sec   Loss 0.1288   LearningRate 0.0008   Epoch: 18   Global Step: 303530   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:21:58,091-Speed 3321.66 samples/sec   Loss 0.1147   LearningRate 0.0008   Epoch: 18   Global Step: 303540   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:22:01,165-Speed 3332.27 samples/sec   Loss 0.1134   LearningRate 0.0008   Epoch: 18   Global Step: 303550   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:22:04,284-Speed 3283.78 samples/sec   Loss 0.1261   LearningRate 0.0008   Epoch: 18   Global Step: 303560   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-12 07:22:07,349-Speed 3341.55 samples/sec   Loss 0.1232   LearningRate 0.0008   Epoch: 18   Global Step: 303570   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-12 07:22:10,530-Speed 3219.41 samples/sec   Loss 0.1298   LearningRate 0.0008   Epoch: 18   Global Step: 303580   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-12 07:22:13,717-Speed 3213.49 samples/sec   Loss 0.1119   LearningRate 0.0008   Epoch: 18   Global Step: 303590   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-12 07:22:16,783-Speed 3340.99 samples/sec   Loss 0.1022   LearningRate 0.0008   Epoch: 18   Global Step: 303600   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-12 07:22:19,849-Speed 3340.85 samples/sec   Loss 0.1107   LearningRate 0.0008   Epoch: 18   Global Step: 303610   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-12 07:22:22,922-Speed 3333.11 samples/sec   Loss 0.1225   LearningRate 0.0008   Epoch: 18   Global Step: 303620   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-12 07:22:25,993-Speed 3334.67 samples/sec   Loss 0.1216   LearningRate 0.0008   Epoch: 18   Global Step: 303630   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-12 07:22:29,093-Speed 3304.11 samples/sec   Loss 0.1059   LearningRate 0.0008   Epoch: 18   Global Step: 303640   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-12 07:22:32,231-Speed 3264.17 samples/sec   Loss 0.1219   LearningRate 0.0008   Epoch: 18   Global Step: 303650   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-12 07:22:35,295-Speed 3343.03 samples/sec   Loss 0.1167   LearningRate 0.0008   Epoch: 18   Global Step: 303660   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:22:38,375-Speed 3325.10 samples/sec   Loss 0.1271   LearningRate 0.0008   Epoch: 18   Global Step: 303670   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:22:41,590-Speed 3186.12 samples/sec   Loss 0.1196   LearningRate 0.0008   Epoch: 18   Global Step: 303680   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:22:44,679-Speed 3315.16 samples/sec   Loss 0.1310   LearningRate 0.0008   Epoch: 18   Global Step: 303690   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:22:47,740-Speed 3345.93 samples/sec   Loss 0.1256   LearningRate 0.0008   Epoch: 18   Global Step: 303700   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:22:50,810-Speed 3336.06 samples/sec   Loss 0.1166   LearningRate 0.0008   Epoch: 18   Global Step: 303710   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:22:53,884-Speed 3332.44 samples/sec   Loss 0.1230   LearningRate 0.0008   Epoch: 18   Global Step: 303720   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:22:56,948-Speed 3343.22 samples/sec   Loss 0.1175   LearningRate 0.0008   Epoch: 18   Global Step: 303730   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:23:00,022-Speed 3332.05 samples/sec   Loss 0.1327   LearningRate 0.0008   Epoch: 18   Global Step: 303740   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:23:03,086-Speed 3342.78 samples/sec   Loss 0.1194   LearningRate 0.0008   Epoch: 18   Global Step: 303750   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:23:06,150-Speed 3342.54 samples/sec   Loss 0.1146   LearningRate 0.0008   Epoch: 18   Global Step: 303760   Fp16 Grad Scale: 262144   Required: 3 hours
Training: 2022-04-12 07:23:09,216-Speed 3340.79 samples/sec   Loss 0.1334   LearningRate 0.0008   Epoch: 18   Global Step: 303770   Fp16 Grad Scale: 262144   Required: 3 hours
Training: 2022-04-12 07:23:12,282-Speed 3340.63 samples/sec   Loss 0.1363   LearningRate 0.0008   Epoch: 18   Global Step: 303780   Fp16 Grad Scale: 262144   Required: 3 hours
Training: 2022-04-12 07:23:15,343-Speed 3345.31 samples/sec   Loss 0.1255   LearningRate 0.0008   Epoch: 18   Global Step: 303790   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:23:18,412-Speed 3337.52 samples/sec   Loss 0.1238   LearningRate 0.0008   Epoch: 18   Global Step: 303800   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:23:21,480-Speed 3338.99 samples/sec   Loss 0.1155   LearningRate 0.0008   Epoch: 18   Global Step: 303810   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:23:24,567-Speed 3317.67 samples/sec   Loss 0.1184   LearningRate 0.0008   Epoch: 18   Global Step: 303820   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:23:27,638-Speed 3334.85 samples/sec   Loss 0.1110   LearningRate 0.0008   Epoch: 18   Global Step: 303830   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:23:30,733-Speed 3310.31 samples/sec   Loss 0.1256   LearningRate 0.0008   Epoch: 18   Global Step: 303840   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:23:33,869-Speed 3265.24 samples/sec   Loss 0.1208   LearningRate 0.0008   Epoch: 18   Global Step: 303850   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:23:36,945-Speed 3330.63 samples/sec   Loss 0.1355   LearningRate 0.0008   Epoch: 18   Global Step: 303860   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:23:40,010-Speed 3341.55 samples/sec   Loss 0.1176   LearningRate 0.0008   Epoch: 18   Global Step: 303870   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:23:43,080-Speed 3335.84 samples/sec   Loss 0.1190   LearningRate 0.0008   Epoch: 18   Global Step: 303880   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:23:46,176-Speed 3307.88 samples/sec   Loss 0.1202   LearningRate 0.0008   Epoch: 18   Global Step: 303890   Fp16 Grad Scale: 262144   Required: 3 hours
Training: 2022-04-12 07:23:49,233-Speed 3350.52 samples/sec   Loss 0.1260   LearningRate 0.0008   Epoch: 18   Global Step: 303900   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:23:52,319-Speed 3319.47 samples/sec   Loss 0.1164   LearningRate 0.0008   Epoch: 18   Global Step: 303910   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:23:55,393-Speed 3331.77 samples/sec   Loss 0.1174   LearningRate 0.0008   Epoch: 18   Global Step: 303920   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:23:58,579-Speed 3215.18 samples/sec   Loss 0.1189   LearningRate 0.0008   Epoch: 18   Global Step: 303930   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:24:01,649-Speed 3335.60 samples/sec   Loss 0.1194   LearningRate 0.0008   Epoch: 18   Global Step: 303940   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:24:04,715-Speed 3340.51 samples/sec   Loss 0.1192   LearningRate 0.0008   Epoch: 18   Global Step: 303950   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:24:07,805-Speed 3315.06 samples/sec   Loss 0.1258   LearningRate 0.0008   Epoch: 18   Global Step: 303960   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:24:10,882-Speed 3327.89 samples/sec   Loss 0.1259   LearningRate 0.0008   Epoch: 18   Global Step: 303970   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:24:14,038-Speed 3246.33 samples/sec   Loss 0.1281   LearningRate 0.0008   Epoch: 18   Global Step: 303980   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:24:17,181-Speed 3258.84 samples/sec   Loss 0.1251   LearningRate 0.0008   Epoch: 18   Global Step: 303990   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:24:20,262-Speed 3323.88 samples/sec   Loss 0.1233   LearningRate 0.0008   Epoch: 18   Global Step: 304000   Fp16 Grad Scale: 262144   Required: 3 hours
Training: 2022-04-12 07:25:03,967-[lfw][304000]XNorm: 20.753942
Training: 2022-04-12 07:25:03,968-[lfw][304000]Accuracy-Flip: 0.99800+-0.00221
Training: 2022-04-12 07:25:03,969-[lfw][304000]Accuracy-Highest: 0.99817
Training: 2022-04-12 07:25:54,402-[cfp_fp][304000]XNorm: 22.611503
Training: 2022-04-12 07:25:54,403-[cfp_fp][304000]Accuracy-Flip: 0.99200+-0.00405
Training: 2022-04-12 07:25:54,403-[cfp_fp][304000]Accuracy-Highest: 0.99200
Training: 2022-04-12 07:26:37,832-[agedb_30][304000]XNorm: 22.824218
Training: 2022-04-12 07:26:37,833-[agedb_30][304000]Accuracy-Flip: 0.98517+-0.00589
Training: 2022-04-12 07:26:37,833-[agedb_30][304000]Accuracy-Highest: 0.98650
Training: 2022-04-12 07:26:40,929-Speed 72.80 samples/sec   Loss 0.1277   LearningRate 0.0008   Epoch: 18   Global Step: 304010   Fp16 Grad Scale: 262144   Required: 3 hours
Training: 2022-04-12 07:26:44,004-Speed 3330.97 samples/sec   Loss 0.1190   LearningRate 0.0008   Epoch: 18   Global Step: 304020   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:26:47,067-Speed 3343.91 samples/sec   Loss 0.1139   LearningRate 0.0008   Epoch: 18   Global Step: 304030   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:26:50,149-Speed 3323.82 samples/sec   Loss 0.1182   LearningRate 0.0008   Epoch: 18   Global Step: 304040   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:26:53,248-Speed 3304.76 samples/sec   Loss 0.1202   LearningRate 0.0008   Epoch: 18   Global Step: 304050   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-12 07:26:56,372-Speed 3277.81 samples/sec   Loss 0.1211   LearningRate 0.0008   Epoch: 18   Global Step: 304060   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-12 07:26:59,511-Speed 3262.98 samples/sec   Loss 0.1232   LearningRate 0.0008   Epoch: 18   Global Step: 304070   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-12 07:27:02,649-Speed 3264.45 samples/sec   Loss 0.1120   LearningRate 0.0008   Epoch: 18   Global Step: 304080   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-12 07:27:05,706-Speed 3350.25 samples/sec   Loss 0.1376   LearningRate 0.0008   Epoch: 18   Global Step: 304090   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-12 07:27:08,766-Speed 3347.51 samples/sec   Loss 0.1136   LearningRate 0.0008   Epoch: 18   Global Step: 304100   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-12 07:27:11,839-Speed 3332.80 samples/sec   Loss 0.1215   LearningRate 0.0008   Epoch: 18   Global Step: 304110   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-12 07:27:14,898-Speed 3348.14 samples/sec   Loss 0.1158   LearningRate 0.0008   Epoch: 18   Global Step: 304120   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-12 07:27:17,973-Speed 3330.72 samples/sec   Loss 0.1249   LearningRate 0.0008   Epoch: 18   Global Step: 304130   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-12 07:27:21,061-Speed 3317.63 samples/sec   Loss 0.1123   LearningRate 0.0008   Epoch: 18   Global Step: 304140   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-12 07:27:24,121-Speed 3346.09 samples/sec   Loss 0.1267   LearningRate 0.0008   Epoch: 18   Global Step: 304150   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:27:27,237-Speed 3287.20 samples/sec   Loss 0.1163   LearningRate 0.0008   Epoch: 18   Global Step: 304160   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:27:30,313-Speed 3329.79 samples/sec   Loss 0.1196   LearningRate 0.0008   Epoch: 18   Global Step: 304170   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:27:33,382-Speed 3337.35 samples/sec   Loss 0.1217   LearningRate 0.0008   Epoch: 18   Global Step: 304180   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:27:36,472-Speed 3315.22 samples/sec   Loss 0.1145   LearningRate 0.0008   Epoch: 18   Global Step: 304190   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:27:39,544-Speed 3334.15 samples/sec   Loss 0.1245   LearningRate 0.0008   Epoch: 18   Global Step: 304200   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:27:42,615-Speed 3334.48 samples/sec   Loss 0.1312   LearningRate 0.0008   Epoch: 18   Global Step: 304210   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:27:45,695-Speed 3325.41 samples/sec   Loss 0.1171   LearningRate 0.0008   Epoch: 18   Global Step: 304220   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:27:48,779-Speed 3321.19 samples/sec   Loss 0.1153   LearningRate 0.0008   Epoch: 18   Global Step: 304230   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:27:51,863-Speed 3321.36 samples/sec   Loss 0.1089   LearningRate 0.0008   Epoch: 18   Global Step: 304240   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:27:54,940-Speed 3328.95 samples/sec   Loss 0.1228   LearningRate 0.0008   Epoch: 18   Global Step: 304250   Fp16 Grad Scale: 262144   Required: 3 hours
Training: 2022-04-12 07:27:58,032-Speed 3312.77 samples/sec   Loss 0.1259   LearningRate 0.0008   Epoch: 18   Global Step: 304260   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:28:01,150-Speed 3285.22 samples/sec   Loss 0.1159   LearningRate 0.0008   Epoch: 18   Global Step: 304270   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:28:04,280-Speed 3271.88 samples/sec   Loss 0.1103   LearningRate 0.0008   Epoch: 18   Global Step: 304280   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:28:07,398-Speed 3285.09 samples/sec   Loss 0.1230   LearningRate 0.0008   Epoch: 18   Global Step: 304290   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:28:10,486-Speed 3317.31 samples/sec   Loss 0.1234   LearningRate 0.0008   Epoch: 18   Global Step: 304300   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:28:13,579-Speed 3310.39 samples/sec   Loss 0.1181   LearningRate 0.0008   Epoch: 18   Global Step: 304310   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:28:16,655-Speed 3330.42 samples/sec   Loss 0.1311   LearningRate 0.0008   Epoch: 18   Global Step: 304320   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:28:19,737-Speed 3322.80 samples/sec   Loss 0.1411   LearningRate 0.0008   Epoch: 18   Global Step: 304330   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:28:22,835-Speed 3306.31 samples/sec   Loss 0.1149   LearningRate 0.0008   Epoch: 18   Global Step: 304340   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:28:25,904-Speed 3337.44 samples/sec   Loss 0.1137   LearningRate 0.0008   Epoch: 18   Global Step: 304350   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:28:28,970-Speed 3340.48 samples/sec   Loss 0.1214   LearningRate 0.0008   Epoch: 18   Global Step: 304360   Fp16 Grad Scale: 262144   Required: 3 hours
Training: 2022-04-12 07:28:32,068-Speed 3306.55 samples/sec   Loss 0.1095   LearningRate 0.0008   Epoch: 18   Global Step: 304370   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:28:35,186-Speed 3285.22 samples/sec   Loss 0.1190   LearningRate 0.0008   Epoch: 18   Global Step: 304380   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:28:38,246-Speed 3346.75 samples/sec   Loss 0.1225   LearningRate 0.0008   Epoch: 18   Global Step: 304390   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:28:41,314-Speed 3338.73 samples/sec   Loss 0.1312   LearningRate 0.0008   Epoch: 18   Global Step: 304400   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:28:44,374-Speed 3347.32 samples/sec   Loss 0.1215   LearningRate 0.0008   Epoch: 18   Global Step: 304410   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:28:47,431-Speed 3349.74 samples/sec   Loss 0.1207   LearningRate 0.0008   Epoch: 18   Global Step: 304420   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:28:50,503-Speed 3334.58 samples/sec   Loss 0.1242   LearningRate 0.0008   Epoch: 18   Global Step: 304430   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:28:53,580-Speed 3329.16 samples/sec   Loss 0.1194   LearningRate 0.0008   Epoch: 18   Global Step: 304440   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:28:56,679-Speed 3304.23 samples/sec   Loss 0.1229   LearningRate 0.0008   Epoch: 18   Global Step: 304450   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:28:59,738-Speed 3348.08 samples/sec   Loss 0.1191   LearningRate 0.0008   Epoch: 18   Global Step: 304460   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:29:02,792-Speed 3354.09 samples/sec   Loss 0.1259   LearningRate 0.0008   Epoch: 18   Global Step: 304470   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:29:05,853-Speed 3346.72 samples/sec   Loss 0.1164   LearningRate 0.0008   Epoch: 18   Global Step: 304480   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:29:08,912-Speed 3347.55 samples/sec   Loss 0.1277   LearningRate 0.0008   Epoch: 18   Global Step: 304490   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:29:12,008-Speed 3308.09 samples/sec   Loss 0.1238   LearningRate 0.0008   Epoch: 18   Global Step: 304500   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:29:15,083-Speed 3331.42 samples/sec   Loss 0.1319   LearningRate 0.0008   Epoch: 18   Global Step: 304510   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:29:18,142-Speed 3348.10 samples/sec   Loss 0.1377   LearningRate 0.0008   Epoch: 18   Global Step: 304520   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:29:21,238-Speed 3308.53 samples/sec   Loss 0.1104   LearningRate 0.0008   Epoch: 18   Global Step: 304530   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:29:24,309-Speed 3334.48 samples/sec   Loss 0.1194   LearningRate 0.0008   Epoch: 18   Global Step: 304540   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:29:27,372-Speed 3344.12 samples/sec   Loss 0.1311   LearningRate 0.0008   Epoch: 18   Global Step: 304550   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:29:30,434-Speed 3345.25 samples/sec   Loss 0.1324   LearningRate 0.0008   Epoch: 18   Global Step: 304560   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:29:33,504-Speed 3336.09 samples/sec   Loss 0.1296   LearningRate 0.0008   Epoch: 18   Global Step: 304570   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:29:36,607-Speed 3300.40 samples/sec   Loss 0.1234   LearningRate 0.0008   Epoch: 18   Global Step: 304580   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:29:39,792-Speed 3215.43 samples/sec   Loss 0.1230   LearningRate 0.0008   Epoch: 18   Global Step: 304590   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:29:42,929-Speed 3265.89 samples/sec   Loss 0.1355   LearningRate 0.0008   Epoch: 18   Global Step: 304600   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:29:45,991-Speed 3344.71 samples/sec   Loss 0.1160   LearningRate 0.0008   Epoch: 18   Global Step: 304610   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:29:49,055-Speed 3342.63 samples/sec   Loss 0.1191   LearningRate 0.0008   Epoch: 18   Global Step: 304620   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:29:52,129-Speed 3332.33 samples/sec   Loss 0.1190   LearningRate 0.0008   Epoch: 18   Global Step: 304630   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:29:55,206-Speed 3328.32 samples/sec   Loss 0.1315   LearningRate 0.0008   Epoch: 18   Global Step: 304640   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:29:58,377-Speed 3230.28 samples/sec   Loss 0.1173   LearningRate 0.0008   Epoch: 18   Global Step: 304650   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:30:01,448-Speed 3334.86 samples/sec   Loss 0.1268   LearningRate 0.0008   Epoch: 18   Global Step: 304660   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:30:04,552-Speed 3300.06 samples/sec   Loss 0.1303   LearningRate 0.0008   Epoch: 18   Global Step: 304670   Fp16 Grad Scale: 262144   Required: 3 hours
Training: 2022-04-12 07:30:07,690-Speed 3263.87 samples/sec   Loss 0.1222   LearningRate 0.0008   Epoch: 18   Global Step: 304680   Fp16 Grad Scale: 262144   Required: 3 hours
Training: 2022-04-12 07:30:10,873-Speed 3217.90 samples/sec   Loss 0.1092   LearningRate 0.0008   Epoch: 18   Global Step: 304690   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:30:13,965-Speed 3312.25 samples/sec   Loss 0.1246   LearningRate 0.0008   Epoch: 18   Global Step: 304700   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:30:17,057-Speed 3313.17 samples/sec   Loss 0.1202   LearningRate 0.0008   Epoch: 18   Global Step: 304710   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:30:20,140-Speed 3322.36 samples/sec   Loss 0.1186   LearningRate 0.0008   Epoch: 18   Global Step: 304720   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:30:23,295-Speed 3246.23 samples/sec   Loss 0.1145   LearningRate 0.0008   Epoch: 18   Global Step: 304730   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:30:26,365-Speed 3335.45 samples/sec   Loss 0.1386   LearningRate 0.0008   Epoch: 18   Global Step: 304740   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:30:29,432-Speed 3340.27 samples/sec   Loss 0.1244   LearningRate 0.0008   Epoch: 18   Global Step: 304750   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:30:32,515-Speed 3321.98 samples/sec   Loss 0.1206   LearningRate 0.0008   Epoch: 18   Global Step: 304760   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:30:35,640-Speed 3277.93 samples/sec   Loss 0.1213   LearningRate 0.0008   Epoch: 18   Global Step: 304770   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-12 07:30:38,766-Speed 3276.52 samples/sec   Loss 0.1314   LearningRate 0.0008   Epoch: 18   Global Step: 304780   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-12 07:30:41,839-Speed 3333.22 samples/sec   Loss 0.1242   LearningRate 0.0008   Epoch: 18   Global Step: 304790   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-12 07:30:44,921-Speed 3322.70 samples/sec   Loss 0.1266   LearningRate 0.0008   Epoch: 18   Global Step: 304800   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-12 07:30:48,044-Speed 3279.37 samples/sec   Loss 0.1268   LearningRate 0.0008   Epoch: 18   Global Step: 304810   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-12 07:30:51,119-Speed 3331.84 samples/sec   Loss 0.1206   LearningRate 0.0008   Epoch: 18   Global Step: 304820   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-12 07:30:54,336-Speed 3182.82 samples/sec   Loss 0.1220   LearningRate 0.0008   Epoch: 18   Global Step: 304830   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-12 07:30:57,563-Speed 3173.99 samples/sec   Loss 0.1346   LearningRate 0.0008   Epoch: 18   Global Step: 304840   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-12 07:31:00,652-Speed 3316.88 samples/sec   Loss 0.1235   LearningRate 0.0008   Epoch: 18   Global Step: 304850   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-12 07:31:03,800-Speed 3253.21 samples/sec   Loss 0.1243   LearningRate 0.0008   Epoch: 18   Global Step: 304860   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-12 07:31:06,910-Speed 3293.80 samples/sec   Loss 0.1123   LearningRate 0.0008   Epoch: 18   Global Step: 304870   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:31:10,017-Speed 3296.81 samples/sec   Loss 0.1315   LearningRate 0.0008   Epoch: 18   Global Step: 304880   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:31:13,084-Speed 3339.32 samples/sec   Loss 0.1174   LearningRate 0.0008   Epoch: 18   Global Step: 304890   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:31:16,146-Speed 3344.46 samples/sec   Loss 0.1291   LearningRate 0.0008   Epoch: 18   Global Step: 304900   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:31:19,260-Speed 3289.54 samples/sec   Loss 0.1246   LearningRate 0.0008   Epoch: 18   Global Step: 304910   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:31:22,346-Speed 3318.52 samples/sec   Loss 0.1194   LearningRate 0.0007   Epoch: 18   Global Step: 304920   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:31:25,477-Speed 3271.63 samples/sec   Loss 0.1199   LearningRate 0.0007   Epoch: 18   Global Step: 304930   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:31:28,619-Speed 3259.25 samples/sec   Loss 0.1230   LearningRate 0.0007   Epoch: 18   Global Step: 304940   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:31:31,762-Speed 3259.36 samples/sec   Loss 0.1251   LearningRate 0.0007   Epoch: 18   Global Step: 304950   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:31:34,862-Speed 3303.77 samples/sec   Loss 0.1216   LearningRate 0.0007   Epoch: 18   Global Step: 304960   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:31:37,938-Speed 3329.39 samples/sec   Loss 0.1238   LearningRate 0.0007   Epoch: 18   Global Step: 304970   Fp16 Grad Scale: 262144   Required: 3 hours
Training: 2022-04-12 07:31:41,007-Speed 3337.20 samples/sec   Loss 0.1157   LearningRate 0.0007   Epoch: 18   Global Step: 304980   Fp16 Grad Scale: 262144   Required: 3 hours
Training: 2022-04-12 07:31:44,093-Speed 3319.20 samples/sec   Loss 0.1225   LearningRate 0.0007   Epoch: 18   Global Step: 304990   Fp16 Grad Scale: 262144   Required: 3 hours
Training: 2022-04-12 07:31:47,154-Speed 3345.88 samples/sec   Loss 0.1256   LearningRate 0.0007   Epoch: 18   Global Step: 305000   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:31:50,257-Speed 3300.91 samples/sec   Loss 0.1273   LearningRate 0.0007   Epoch: 18   Global Step: 305010   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:31:53,391-Speed 3268.33 samples/sec   Loss 0.1195   LearningRate 0.0007   Epoch: 18   Global Step: 305020   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:31:56,512-Speed 3282.51 samples/sec   Loss 0.1212   LearningRate 0.0007   Epoch: 18   Global Step: 305030   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:31:59,607-Speed 3309.21 samples/sec   Loss 0.1144   LearningRate 0.0007   Epoch: 18   Global Step: 305040   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:32:02,677-Speed 3335.67 samples/sec   Loss 0.1135   LearningRate 0.0007   Epoch: 18   Global Step: 305050   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:32:05,786-Speed 3294.27 samples/sec   Loss 0.1257   LearningRate 0.0007   Epoch: 18   Global Step: 305060   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:32:08,863-Speed 3329.23 samples/sec   Loss 0.1267   LearningRate 0.0007   Epoch: 18   Global Step: 305070   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:32:12,014-Speed 3249.99 samples/sec   Loss 0.1192   LearningRate 0.0007   Epoch: 18   Global Step: 305080   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:32:15,126-Speed 3291.47 samples/sec   Loss 0.1154   LearningRate 0.0007   Epoch: 18   Global Step: 305090   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:32:18,217-Speed 3313.63 samples/sec   Loss 0.1259   LearningRate 0.0007   Epoch: 18   Global Step: 305100   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:32:21,295-Speed 3327.37 samples/sec   Loss 0.1365   LearningRate 0.0007   Epoch: 18   Global Step: 305110   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:32:24,415-Speed 3283.02 samples/sec   Loss 0.1193   LearningRate 0.0007   Epoch: 18   Global Step: 305120   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:32:27,618-Speed 3198.43 samples/sec   Loss 0.1281   LearningRate 0.0007   Epoch: 18   Global Step: 305130   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:32:30,747-Speed 3272.35 samples/sec   Loss 0.1172   LearningRate 0.0007   Epoch: 18   Global Step: 305140   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:32:33,815-Speed 3339.13 samples/sec   Loss 0.1197   LearningRate 0.0007   Epoch: 18   Global Step: 305150   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:32:36,883-Speed 3338.31 samples/sec   Loss 0.1263   LearningRate 0.0007   Epoch: 18   Global Step: 305160   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:32:39,981-Speed 3305.65 samples/sec   Loss 0.1269   LearningRate 0.0007   Epoch: 18   Global Step: 305170   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:32:43,050-Speed 3337.71 samples/sec   Loss 0.1215   LearningRate 0.0007   Epoch: 18   Global Step: 305180   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:32:46,128-Speed 3327.13 samples/sec   Loss 0.1100   LearningRate 0.0007   Epoch: 18   Global Step: 305190   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:32:49,296-Speed 3232.75 samples/sec   Loss 0.1211   LearningRate 0.0007   Epoch: 18   Global Step: 305200   Fp16 Grad Scale: 262144   Required: 3 hours
Training: 2022-04-12 07:32:52,351-Speed 3353.34 samples/sec   Loss 0.1279   LearningRate 0.0007   Epoch: 18   Global Step: 305210   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:32:55,491-Speed 3261.96 samples/sec   Loss 0.1292   LearningRate 0.0007   Epoch: 18   Global Step: 305220   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:32:58,606-Speed 3288.10 samples/sec   Loss 0.1215   LearningRate 0.0007   Epoch: 18   Global Step: 305230   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:33:01,710-Speed 3299.57 samples/sec   Loss 0.1109   LearningRate 0.0007   Epoch: 18   Global Step: 305240   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:33:04,787-Speed 3328.67 samples/sec   Loss 0.1092   LearningRate 0.0007   Epoch: 18   Global Step: 305250   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:33:07,853-Speed 3341.00 samples/sec   Loss 0.1154   LearningRate 0.0007   Epoch: 18   Global Step: 305260   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:33:10,935-Speed 3322.72 samples/sec   Loss 0.1167   LearningRate 0.0007   Epoch: 18   Global Step: 305270   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:33:14,095-Speed 3241.64 samples/sec   Loss 0.1285   LearningRate 0.0007   Epoch: 18   Global Step: 305280   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:33:17,172-Speed 3328.41 samples/sec   Loss 0.1380   LearningRate 0.0007   Epoch: 18   Global Step: 305290   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:33:20,244-Speed 3334.45 samples/sec   Loss 0.1170   LearningRate 0.0007   Epoch: 18   Global Step: 305300   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:33:23,310-Speed 3340.57 samples/sec   Loss 0.1244   LearningRate 0.0007   Epoch: 18   Global Step: 305310   Fp16 Grad Scale: 262144   Required: 3 hours
Training: 2022-04-12 07:33:26,373-Speed 3344.25 samples/sec   Loss 0.1088   LearningRate 0.0007   Epoch: 18   Global Step: 305320   Fp16 Grad Scale: 262144   Required: 3 hours
Training: 2022-04-12 07:33:29,457-Speed 3321.09 samples/sec   Loss 0.1112   LearningRate 0.0007   Epoch: 18   Global Step: 305330   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:33:32,519-Speed 3344.27 samples/sec   Loss 0.1194   LearningRate 0.0007   Epoch: 18   Global Step: 305340   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:33:35,594-Speed 3331.66 samples/sec   Loss 0.1061   LearningRate 0.0007   Epoch: 18   Global Step: 305350   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:33:38,676-Speed 3323.10 samples/sec   Loss 0.1265   LearningRate 0.0007   Epoch: 18   Global Step: 305360   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:33:41,751-Speed 3330.08 samples/sec   Loss 0.1188   LearningRate 0.0007   Epoch: 18   Global Step: 305370   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:33:44,847-Speed 3308.95 samples/sec   Loss 0.1273   LearningRate 0.0007   Epoch: 18   Global Step: 305380   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:33:48,696-Speed 2660.65 samples/sec   Loss 0.1240   LearningRate 0.0007   Epoch: 18   Global Step: 305390   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:33:51,777-Speed 3324.36 samples/sec   Loss 0.1177   LearningRate 0.0007   Epoch: 18   Global Step: 305400   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:33:54,853-Speed 3329.03 samples/sec   Loss 0.1199   LearningRate 0.0007   Epoch: 18   Global Step: 305410   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:33:57,928-Speed 3330.68 samples/sec   Loss 0.1285   LearningRate 0.0007   Epoch: 18   Global Step: 305420   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:34:00,995-Speed 3339.80 samples/sec   Loss 0.1194   LearningRate 0.0007   Epoch: 18   Global Step: 305430   Fp16 Grad Scale: 262144   Required: 3 hours
Training: 2022-04-12 07:34:04,079-Speed 3321.64 samples/sec   Loss 0.1288   LearningRate 0.0007   Epoch: 18   Global Step: 305440   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-12 07:34:07,165-Speed 3318.52 samples/sec   Loss 0.1307   LearningRate 0.0007   Epoch: 18   Global Step: 305450   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-12 07:34:10,347-Speed 3218.15 samples/sec   Loss 0.1189   LearningRate 0.0007   Epoch: 18   Global Step: 305460   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-12 07:34:13,431-Speed 3321.47 samples/sec   Loss 0.1222   LearningRate 0.0007   Epoch: 18   Global Step: 305470   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-12 07:34:16,539-Speed 3295.99 samples/sec   Loss 0.1221   LearningRate 0.0007   Epoch: 18   Global Step: 305480   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-12 07:34:19,613-Speed 3331.66 samples/sec   Loss 0.1144   LearningRate 0.0007   Epoch: 18   Global Step: 305490   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-12 07:34:22,678-Speed 3341.74 samples/sec   Loss 0.1182   LearningRate 0.0007   Epoch: 18   Global Step: 305500   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-12 07:34:25,801-Speed 3279.61 samples/sec   Loss 0.1300   LearningRate 0.0007   Epoch: 18   Global Step: 305510   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-12 07:34:28,918-Speed 3286.09 samples/sec   Loss 0.1268   LearningRate 0.0007   Epoch: 18   Global Step: 305520   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-12 07:34:32,032-Speed 3289.14 samples/sec   Loss 0.1119   LearningRate 0.0007   Epoch: 18   Global Step: 305530   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-12 07:34:35,202-Speed 3230.46 samples/sec   Loss 0.1216   LearningRate 0.0007   Epoch: 18   Global Step: 305540   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:34:38,355-Speed 3248.93 samples/sec   Loss 0.1158   LearningRate 0.0007   Epoch: 18   Global Step: 305550   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:34:41,475-Speed 3282.04 samples/sec   Loss 0.1305   LearningRate 0.0007   Epoch: 18   Global Step: 305560   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:34:44,554-Speed 3326.49 samples/sec   Loss 0.1153   LearningRate 0.0007   Epoch: 18   Global Step: 305570   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:34:47,635-Speed 3324.30 samples/sec   Loss 0.1193   LearningRate 0.0007   Epoch: 18   Global Step: 305580   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:34:50,734-Speed 3305.88 samples/sec   Loss 0.1275   LearningRate 0.0007   Epoch: 18   Global Step: 305590   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:34:53,800-Speed 3340.83 samples/sec   Loss 0.1292   LearningRate 0.0007   Epoch: 18   Global Step: 305600   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:34:56,874-Speed 3331.58 samples/sec   Loss 0.1303   LearningRate 0.0007   Epoch: 18   Global Step: 305610   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:34:59,969-Speed 3309.40 samples/sec   Loss 0.1299   LearningRate 0.0007   Epoch: 18   Global Step: 305620   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-12 07:35:03,126-Speed 3243.90 samples/sec   Loss 0.1282   LearningRate 0.0007   Epoch: 18   Global Step: 305630   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-12 07:35:06,202-Speed 3330.33 samples/sec   Loss 0.1195   LearningRate 0.0007   Epoch: 18   Global Step: 305640   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-12 07:35:09,430-Speed 3172.02 samples/sec   Loss 0.1307   LearningRate 0.0007   Epoch: 18   Global Step: 305650   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-12 07:35:12,552-Speed 3280.66 samples/sec   Loss 0.1277   LearningRate 0.0007   Epoch: 18   Global Step: 305660   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-12 07:35:15,702-Speed 3252.07 samples/sec   Loss 0.1166   LearningRate 0.0007   Epoch: 18   Global Step: 305670   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-12 07:35:18,767-Speed 3341.75 samples/sec   Loss 0.1191   LearningRate 0.0007   Epoch: 18   Global Step: 305680   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-12 07:35:21,865-Speed 3305.76 samples/sec   Loss 0.1190   LearningRate 0.0007   Epoch: 18   Global Step: 305690   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-12 07:35:24,950-Speed 3320.10 samples/sec   Loss 0.1156   LearningRate 0.0007   Epoch: 18   Global Step: 305700   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-12 07:35:28,070-Speed 3282.79 samples/sec   Loss 0.1213   LearningRate 0.0007   Epoch: 18   Global Step: 305710   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-12 07:35:31,137-Speed 3339.71 samples/sec   Loss 0.1213   LearningRate 0.0007   Epoch: 18   Global Step: 305720   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:35:34,204-Speed 3340.14 samples/sec   Loss 0.1134   LearningRate 0.0007   Epoch: 18   Global Step: 305730   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:35:37,289-Speed 3319.59 samples/sec   Loss 0.1186   LearningRate 0.0007   Epoch: 18   Global Step: 305740   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:35:40,416-Speed 3275.17 samples/sec   Loss 0.1175   LearningRate 0.0007   Epoch: 18   Global Step: 305750   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:35:43,496-Speed 3326.42 samples/sec   Loss 0.1372   LearningRate 0.0007   Epoch: 18   Global Step: 305760   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:35:46,560-Speed 3342.40 samples/sec   Loss 0.1135   LearningRate 0.0007   Epoch: 18   Global Step: 305770   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:35:49,628-Speed 3338.47 samples/sec   Loss 0.1266   LearningRate 0.0007   Epoch: 18   Global Step: 305780   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:35:52,713-Speed 3319.54 samples/sec   Loss 0.1307   LearningRate 0.0007   Epoch: 18   Global Step: 305790   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:35:55,813-Speed 3304.15 samples/sec   Loss 0.1254   LearningRate 0.0007   Epoch: 18   Global Step: 305800   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:35:58,908-Speed 3309.83 samples/sec   Loss 0.1259   LearningRate 0.0007   Epoch: 18   Global Step: 305810   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:36:02,035-Speed 3275.30 samples/sec   Loss 0.1258   LearningRate 0.0007   Epoch: 18   Global Step: 305820   Fp16 Grad Scale: 262144   Required: 3 hours
Training: 2022-04-12 07:36:05,095-Speed 3347.28 samples/sec   Loss 0.1245   LearningRate 0.0007   Epoch: 18   Global Step: 305830   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:36:08,180-Speed 3319.53 samples/sec   Loss 0.1162   LearningRate 0.0007   Epoch: 18   Global Step: 305840   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:36:11,302-Speed 3281.08 samples/sec   Loss 0.1142   LearningRate 0.0007   Epoch: 18   Global Step: 305850   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:36:14,388-Speed 3319.13 samples/sec   Loss 0.1146   LearningRate 0.0007   Epoch: 18   Global Step: 305860   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:36:17,543-Speed 3246.46 samples/sec   Loss 0.1281   LearningRate 0.0007   Epoch: 18   Global Step: 305870   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:36:20,624-Speed 3324.51 samples/sec   Loss 0.1251   LearningRate 0.0007   Epoch: 18   Global Step: 305880   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-12 07:36:23,689-Speed 3341.15 samples/sec   Loss 0.1314   LearningRate 0.0007   Epoch: 18   Global Step: 305890   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-12 07:36:26,753-Speed 3342.49 samples/sec   Loss 0.1121   LearningRate 0.0007   Epoch: 18   Global Step: 305900   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-12 07:36:29,827-Speed 3332.78 samples/sec   Loss 0.1084   LearningRate 0.0007   Epoch: 18   Global Step: 305910   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-12 07:36:32,903-Speed 3328.94 samples/sec   Loss 0.1207   LearningRate 0.0007   Epoch: 18   Global Step: 305920   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-12 07:36:35,975-Speed 3334.69 samples/sec   Loss 0.1292   LearningRate 0.0007   Epoch: 18   Global Step: 305930   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-12 07:36:39,160-Speed 3215.75 samples/sec   Loss 0.1209   LearningRate 0.0007   Epoch: 18   Global Step: 305940   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-12 07:36:42,404-Speed 3157.71 samples/sec   Loss 0.1286   LearningRate 0.0007   Epoch: 18   Global Step: 305950   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-12 07:36:45,541-Speed 3264.43 samples/sec   Loss 0.1312   LearningRate 0.0007   Epoch: 18   Global Step: 305960   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-12 07:36:48,609-Speed 3338.75 samples/sec   Loss 0.1165   LearningRate 0.0007   Epoch: 18   Global Step: 305970   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-12 07:36:51,691-Speed 3323.37 samples/sec   Loss 0.1155   LearningRate 0.0007   Epoch: 18   Global Step: 305980   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:36:54,786-Speed 3308.57 samples/sec   Loss 0.1287   LearningRate 0.0007   Epoch: 18   Global Step: 305990   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:36:57,869-Speed 3322.22 samples/sec   Loss 0.1232   LearningRate 0.0007   Epoch: 18   Global Step: 306000   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:37:41,565-[lfw][306000]XNorm: 20.833476
Training: 2022-04-12 07:37:41,565-[lfw][306000]Accuracy-Flip: 0.99783+-0.00259
Training: 2022-04-12 07:37:41,566-[lfw][306000]Accuracy-Highest: 0.99817
Training: 2022-04-12 07:38:32,302-[cfp_fp][306000]XNorm: 22.673006
Training: 2022-04-12 07:38:32,303-[cfp_fp][306000]Accuracy-Flip: 0.99100+-0.00384
Training: 2022-04-12 07:38:32,303-[cfp_fp][306000]Accuracy-Highest: 0.99200
Training: 2022-04-12 07:39:16,303-[agedb_30][306000]XNorm: 22.767974
Training: 2022-04-12 07:39:16,304-[agedb_30][306000]Accuracy-Flip: 0.98517+-0.00664
Training: 2022-04-12 07:39:16,304-[agedb_30][306000]Accuracy-Highest: 0.98650
Training: 2022-04-12 07:39:19,362-Speed 72.37 samples/sec   Loss 0.1182   LearningRate 0.0007   Epoch: 18   Global Step: 306010   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:39:22,414-Speed 3356.16 samples/sec   Loss 0.1235   LearningRate 0.0007   Epoch: 18   Global Step: 306020   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:39:25,476-Speed 3345.09 samples/sec   Loss 0.1177   LearningRate 0.0007   Epoch: 18   Global Step: 306030   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:39:28,527-Speed 3356.18 samples/sec   Loss 0.1323   LearningRate 0.0007   Epoch: 18   Global Step: 306040   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:39:31,581-Speed 3354.83 samples/sec   Loss 0.1248   LearningRate 0.0007   Epoch: 18   Global Step: 306050   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:39:34,634-Speed 3354.22 samples/sec   Loss 0.1229   LearningRate 0.0007   Epoch: 18   Global Step: 306060   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:39:37,692-Speed 3349.51 samples/sec   Loss 0.1207   LearningRate 0.0007   Epoch: 18   Global Step: 306070   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:39:40,818-Speed 3276.70 samples/sec   Loss 0.1187   LearningRate 0.0007   Epoch: 18   Global Step: 306080   Fp16 Grad Scale: 262144   Required: 3 hours
Training: 2022-04-12 07:39:43,886-Speed 3338.84 samples/sec   Loss 0.1193   LearningRate 0.0007   Epoch: 18   Global Step: 306090   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:39:46,942-Speed 3351.40 samples/sec   Loss 0.1207   LearningRate 0.0007   Epoch: 18   Global Step: 306100   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:39:50,011-Speed 3336.60 samples/sec   Loss 0.1241   LearningRate 0.0007   Epoch: 18   Global Step: 306110   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:39:53,074-Speed 3344.43 samples/sec   Loss 0.1215   LearningRate 0.0007   Epoch: 18   Global Step: 306120   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:39:56,128-Speed 3353.31 samples/sec   Loss 0.1207   LearningRate 0.0007   Epoch: 18   Global Step: 306130   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-12 07:39:59,239-Speed 3293.32 samples/sec   Loss 0.1116   LearningRate 0.0007   Epoch: 18   Global Step: 306140   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-12 07:40:02,337-Speed 3305.68 samples/sec   Loss 0.1242   LearningRate 0.0007   Epoch: 18   Global Step: 306150   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-12 07:40:05,409-Speed 3334.13 samples/sec   Loss 0.1176   LearningRate 0.0007   Epoch: 18   Global Step: 306160   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-12 07:40:08,480-Speed 3334.72 samples/sec   Loss 0.1287   LearningRate 0.0007   Epoch: 18   Global Step: 306170   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-12 07:40:11,542-Speed 3344.72 samples/sec   Loss 0.1245   LearningRate 0.0007   Epoch: 18   Global Step: 306180   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-12 07:40:14,624-Speed 3323.53 samples/sec   Loss 0.1316   LearningRate 0.0007   Epoch: 18   Global Step: 306190   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-12 07:40:17,694-Speed 3335.90 samples/sec   Loss 0.1197   LearningRate 0.0007   Epoch: 18   Global Step: 306200   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-12 07:40:20,794-Speed 3303.99 samples/sec   Loss 0.1307   LearningRate 0.0007   Epoch: 18   Global Step: 306210   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-12 07:40:23,865-Speed 3336.43 samples/sec   Loss 0.1147   LearningRate 0.0007   Epoch: 18   Global Step: 306220   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-12 07:40:26,943-Speed 3327.51 samples/sec   Loss 0.1281   LearningRate 0.0007   Epoch: 18   Global Step: 306230   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:40:30,069-Speed 3275.91 samples/sec   Loss 0.1196   LearningRate 0.0007   Epoch: 18   Global Step: 306240   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:40:33,144-Speed 3331.53 samples/sec   Loss 0.1191   LearningRate 0.0007   Epoch: 18   Global Step: 306250   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:40:36,256-Speed 3290.27 samples/sec   Loss 0.1329   LearningRate 0.0007   Epoch: 18   Global Step: 306260   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:40:39,346-Speed 3314.73 samples/sec   Loss 0.1267   LearningRate 0.0007   Epoch: 18   Global Step: 306270   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:40:42,416-Speed 3336.40 samples/sec   Loss 0.1154   LearningRate 0.0007   Epoch: 18   Global Step: 306280   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:40:45,508-Speed 3312.40 samples/sec   Loss 0.1271   LearningRate 0.0007   Epoch: 18   Global Step: 306290   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:40:48,588-Speed 3326.20 samples/sec   Loss 0.1214   LearningRate 0.0007   Epoch: 18   Global Step: 306300   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:40:51,743-Speed 3245.83 samples/sec   Loss 0.1098   LearningRate 0.0007   Epoch: 18   Global Step: 306310   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:40:54,909-Speed 3235.26 samples/sec   Loss 0.1207   LearningRate 0.0007   Epoch: 18   Global Step: 306320   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:40:57,980-Speed 3334.79 samples/sec   Loss 0.1239   LearningRate 0.0007   Epoch: 18   Global Step: 306330   Fp16 Grad Scale: 262144   Required: 3 hours
Training: 2022-04-12 07:41:01,042-Speed 3345.35 samples/sec   Loss 0.1273   LearningRate 0.0007   Epoch: 18   Global Step: 306340   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:41:04,110-Speed 3338.41 samples/sec   Loss 0.1278   LearningRate 0.0007   Epoch: 18   Global Step: 306350   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:41:07,279-Speed 3232.24 samples/sec   Loss 0.1161   LearningRate 0.0007   Epoch: 18   Global Step: 306360   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:41:10,347-Speed 3337.88 samples/sec   Loss 0.1334   LearningRate 0.0007   Epoch: 18   Global Step: 306370   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:41:13,491-Speed 3257.63 samples/sec   Loss 0.1043   LearningRate 0.0007   Epoch: 18   Global Step: 306380   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:41:16,552-Speed 3347.03 samples/sec   Loss 0.1317   LearningRate 0.0007   Epoch: 18   Global Step: 306390   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:41:19,739-Speed 3213.45 samples/sec   Loss 0.1271   LearningRate 0.0007   Epoch: 18   Global Step: 306400   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:41:22,813-Speed 3331.93 samples/sec   Loss 0.1270   LearningRate 0.0007   Epoch: 18   Global Step: 306410   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:41:25,874-Speed 3346.51 samples/sec   Loss 0.1294   LearningRate 0.0007   Epoch: 18   Global Step: 306420   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:41:28,970-Speed 3307.62 samples/sec   Loss 0.1136   LearningRate 0.0007   Epoch: 18   Global Step: 306430   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:41:32,053-Speed 3322.61 samples/sec   Loss 0.1354   LearningRate 0.0007   Epoch: 18   Global Step: 306440   Fp16 Grad Scale: 262144   Required: 3 hours
Training: 2022-04-12 07:41:35,136-Speed 3322.47 samples/sec   Loss 0.1213   LearningRate 0.0007   Epoch: 18   Global Step: 306450   Fp16 Grad Scale: 262144   Required: 3 hours
Training: 2022-04-12 07:41:38,359-Speed 3177.61 samples/sec   Loss 0.1125   LearningRate 0.0007   Epoch: 18   Global Step: 306460   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:41:41,467-Speed 3295.49 samples/sec   Loss 0.1039   LearningRate 0.0007   Epoch: 18   Global Step: 306470   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:41:44,541-Speed 3331.67 samples/sec   Loss 0.1238   LearningRate 0.0007   Epoch: 18   Global Step: 306480   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:41:47,599-Speed 3350.24 samples/sec   Loss 0.1158   LearningRate 0.0007   Epoch: 18   Global Step: 306490   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:41:50,659-Speed 3346.19 samples/sec   Loss 0.1254   LearningRate 0.0007   Epoch: 18   Global Step: 306500   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:41:53,734-Speed 3331.82 samples/sec   Loss 0.1147   LearningRate 0.0007   Epoch: 18   Global Step: 306510   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:41:56,793-Speed 3347.76 samples/sec   Loss 0.1144   LearningRate 0.0007   Epoch: 18   Global Step: 306520   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:41:59,849-Speed 3351.68 samples/sec   Loss 0.1205   LearningRate 0.0007   Epoch: 18   Global Step: 306530   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:42:02,916-Speed 3339.48 samples/sec   Loss 0.1303   LearningRate 0.0007   Epoch: 18   Global Step: 306540   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:42:06,000-Speed 3320.64 samples/sec   Loss 0.1093   LearningRate 0.0007   Epoch: 18   Global Step: 306550   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:42:09,061-Speed 3360.70 samples/sec   Loss 0.1180   LearningRate 0.0007   Epoch: 18   Global Step: 306560   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:42:12,123-Speed 3345.74 samples/sec   Loss 0.1296   LearningRate 0.0007   Epoch: 18   Global Step: 306570   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:42:15,182-Speed 3348.37 samples/sec   Loss 0.1212   LearningRate 0.0007   Epoch: 18   Global Step: 306580   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:42:18,240-Speed 3348.61 samples/sec   Loss 0.1125   LearningRate 0.0007   Epoch: 18   Global Step: 306590   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:42:21,299-Speed 3348.42 samples/sec   Loss 0.1229   LearningRate 0.0007   Epoch: 18   Global Step: 306600   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:42:24,358-Speed 3347.92 samples/sec   Loss 0.1214   LearningRate 0.0007   Epoch: 18   Global Step: 306610   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:42:27,431-Speed 3333.45 samples/sec   Loss 0.1310   LearningRate 0.0007   Epoch: 18   Global Step: 306620   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:42:30,546-Speed 3287.36 samples/sec   Loss 0.1293   LearningRate 0.0007   Epoch: 18   Global Step: 306630   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:42:33,609-Speed 3343.90 samples/sec   Loss 0.1173   LearningRate 0.0007   Epoch: 18   Global Step: 306640   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:42:36,670-Speed 3347.18 samples/sec   Loss 0.1337   LearningRate 0.0007   Epoch: 18   Global Step: 306650   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:42:39,760-Speed 3314.23 samples/sec   Loss 0.1229   LearningRate 0.0007   Epoch: 18   Global Step: 306660   Fp16 Grad Scale: 262144   Required: 3 hours
Training: 2022-04-12 07:42:42,834-Speed 3332.52 samples/sec   Loss 0.1200   LearningRate 0.0007   Epoch: 18   Global Step: 306670   Fp16 Grad Scale: 262144   Required: 3 hours
Training: 2022-04-12 07:42:45,919-Speed 3319.16 samples/sec   Loss 0.1150   LearningRate 0.0007   Epoch: 18   Global Step: 306680   Fp16 Grad Scale: 262144   Required: 3 hours
Training: 2022-04-12 07:42:48,987-Speed 3338.98 samples/sec   Loss 0.1219   LearningRate 0.0007   Epoch: 18   Global Step: 306690   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:42:52,050-Speed 3343.56 samples/sec   Loss 0.1131   LearningRate 0.0007   Epoch: 18   Global Step: 306700   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:42:55,121-Speed 3335.14 samples/sec   Loss 0.1243   LearningRate 0.0007   Epoch: 18   Global Step: 306710   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:42:58,200-Speed 3326.08 samples/sec   Loss 0.1231   LearningRate 0.0007   Epoch: 18   Global Step: 306720   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:43:01,281-Speed 3325.43 samples/sec   Loss 0.1229   LearningRate 0.0007   Epoch: 18   Global Step: 306730   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:43:04,388-Speed 3296.42 samples/sec   Loss 0.1238   LearningRate 0.0007   Epoch: 18   Global Step: 306740   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:43:07,466-Speed 3327.80 samples/sec   Loss 0.1312   LearningRate 0.0007   Epoch: 18   Global Step: 306750   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:43:10,524-Speed 3349.19 samples/sec   Loss 0.1177   LearningRate 0.0007   Epoch: 18   Global Step: 306760   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:43:13,607-Speed 3321.81 samples/sec   Loss 0.1163   LearningRate 0.0007   Epoch: 18   Global Step: 306770   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:43:16,731-Speed 3279.07 samples/sec   Loss 0.1233   LearningRate 0.0007   Epoch: 18   Global Step: 306780   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:43:19,892-Speed 3239.98 samples/sec   Loss 0.1275   LearningRate 0.0007   Epoch: 18   Global Step: 306790   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:43:23,033-Speed 3260.69 samples/sec   Loss 0.1410   LearningRate 0.0007   Epoch: 18   Global Step: 306800   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:43:26,123-Speed 3314.21 samples/sec   Loss 0.1210   LearningRate 0.0007   Epoch: 18   Global Step: 306810   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:43:29,285-Speed 3239.47 samples/sec   Loss 0.1245   LearningRate 0.0007   Epoch: 18   Global Step: 306820   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:43:32,374-Speed 3316.22 samples/sec   Loss 0.1209   LearningRate 0.0007   Epoch: 18   Global Step: 306830   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:43:35,457-Speed 3322.48 samples/sec   Loss 0.1153   LearningRate 0.0007   Epoch: 18   Global Step: 306840   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:43:38,537-Speed 3324.92 samples/sec   Loss 0.1199   LearningRate 0.0007   Epoch: 18   Global Step: 306850   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:43:41,624-Speed 3318.35 samples/sec   Loss 0.1285   LearningRate 0.0007   Epoch: 18   Global Step: 306860   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:43:44,722-Speed 3305.69 samples/sec   Loss 0.1214   LearningRate 0.0007   Epoch: 18   Global Step: 306870   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:43:47,787-Speed 3341.73 samples/sec   Loss 0.1259   LearningRate 0.0007   Epoch: 18   Global Step: 306880   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:43:50,871-Speed 3320.83 samples/sec   Loss 0.1251   LearningRate 0.0007   Epoch: 18   Global Step: 306890   Fp16 Grad Scale: 262144   Required: 3 hours
Training: 2022-04-12 07:43:53,978-Speed 3296.55 samples/sec   Loss 0.1212   LearningRate 0.0007   Epoch: 18   Global Step: 306900   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:43:57,084-Speed 3298.05 samples/sec   Loss 0.1226   LearningRate 0.0006   Epoch: 18   Global Step: 306910   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:44:00,186-Speed 3302.00 samples/sec   Loss 0.1214   LearningRate 0.0006   Epoch: 18   Global Step: 306920   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:44:03,345-Speed 3242.44 samples/sec   Loss 0.1197   LearningRate 0.0006   Epoch: 18   Global Step: 306930   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:44:06,440-Speed 3310.19 samples/sec   Loss 0.1243   LearningRate 0.0006   Epoch: 18   Global Step: 306940   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:44:09,526-Speed 3318.95 samples/sec   Loss 0.1330   LearningRate 0.0006   Epoch: 18   Global Step: 306950   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:44:12,704-Speed 3222.26 samples/sec   Loss 0.1268   LearningRate 0.0006   Epoch: 18   Global Step: 306960   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:44:15,765-Speed 3346.39 samples/sec   Loss 0.1290   LearningRate 0.0006   Epoch: 18   Global Step: 306970   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:44:18,838-Speed 3332.41 samples/sec   Loss 0.1215   LearningRate 0.0006   Epoch: 18   Global Step: 306980   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:44:21,916-Speed 3328.01 samples/sec   Loss 0.1190   LearningRate 0.0006   Epoch: 18   Global Step: 306990   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:44:25,043-Speed 3275.28 samples/sec   Loss 0.1149   LearningRate 0.0006   Epoch: 18   Global Step: 307000   Fp16 Grad Scale: 262144   Required: 3 hours
Training: 2022-04-12 07:44:28,153-Speed 3294.04 samples/sec   Loss 0.1454   LearningRate 0.0006   Epoch: 18   Global Step: 307010   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:44:31,266-Speed 3289.82 samples/sec   Loss 0.1270   LearningRate 0.0006   Epoch: 18   Global Step: 307020   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:44:34,335-Speed 3336.95 samples/sec   Loss 0.1185   LearningRate 0.0006   Epoch: 18   Global Step: 307030   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:44:37,394-Speed 3348.96 samples/sec   Loss 0.1326   LearningRate 0.0006   Epoch: 18   Global Step: 307040   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:44:40,474-Speed 3324.99 samples/sec   Loss 0.1254   LearningRate 0.0006   Epoch: 18   Global Step: 307050   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:44:43,559-Speed 3319.65 samples/sec   Loss 0.1282   LearningRate 0.0006   Epoch: 18   Global Step: 307060   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:44:46,617-Speed 3349.51 samples/sec   Loss 0.1126   LearningRate 0.0006   Epoch: 18   Global Step: 307070   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:44:49,747-Speed 3272.47 samples/sec   Loss 0.1144   LearningRate 0.0006   Epoch: 18   Global Step: 307080   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:44:52,863-Speed 3287.58 samples/sec   Loss 0.1165   LearningRate 0.0006   Epoch: 18   Global Step: 307090   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:44:55,924-Speed 3346.38 samples/sec   Loss 0.1207   LearningRate 0.0006   Epoch: 18   Global Step: 307100   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:44:58,975-Speed 3357.39 samples/sec   Loss 0.1158   LearningRate 0.0006   Epoch: 18   Global Step: 307110   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:45:02,039-Speed 3341.99 samples/sec   Loss 0.1150   LearningRate 0.0006   Epoch: 18   Global Step: 307120   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:45:05,115-Speed 3330.21 samples/sec   Loss 0.1195   LearningRate 0.0006   Epoch: 18   Global Step: 307130   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:45:08,194-Speed 3325.65 samples/sec   Loss 0.1307   LearningRate 0.0006   Epoch: 18   Global Step: 307140   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:45:11,254-Speed 3347.36 samples/sec   Loss 0.1300   LearningRate 0.0006   Epoch: 18   Global Step: 307150   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:45:14,324-Speed 3336.71 samples/sec   Loss 0.1230   LearningRate 0.0006   Epoch: 18   Global Step: 307160   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:45:17,389-Speed 3341.17 samples/sec   Loss 0.1186   LearningRate 0.0006   Epoch: 18   Global Step: 307170   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:45:20,481-Speed 3313.36 samples/sec   Loss 0.1148   LearningRate 0.0006   Epoch: 18   Global Step: 307180   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:45:23,608-Speed 3275.33 samples/sec   Loss 0.1128   LearningRate 0.0006   Epoch: 18   Global Step: 307190   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:45:26,678-Speed 3335.73 samples/sec   Loss 0.1265   LearningRate 0.0006   Epoch: 18   Global Step: 307200   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:45:29,767-Speed 3315.93 samples/sec   Loss 0.1156   LearningRate 0.0006   Epoch: 18   Global Step: 307210   Fp16 Grad Scale: 262144   Required: 3 hours
Training: 2022-04-12 07:45:32,910-Speed 3258.57 samples/sec   Loss 0.1197   LearningRate 0.0006   Epoch: 18   Global Step: 307220   Fp16 Grad Scale: 262144   Required: 3 hours
Training: 2022-04-12 07:45:36,019-Speed 3294.38 samples/sec   Loss 0.1347   LearningRate 0.0006   Epoch: 18   Global Step: 307230   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:45:39,144-Speed 3277.64 samples/sec   Loss 0.1267   LearningRate 0.0006   Epoch: 18   Global Step: 307240   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:45:42,199-Speed 3352.86 samples/sec   Loss 0.1195   LearningRate 0.0006   Epoch: 18   Global Step: 307250   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-12 07:45:45,292-Speed 3311.42 samples/sec   Loss 0.1269   LearningRate 0.0006   Epoch: 18   Global Step: 307260   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-12 07:45:48,359-Speed 3340.13 samples/sec   Loss 0.1213   LearningRate 0.0006   Epoch: 18   Global Step: 307270   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-12 07:45:51,423-Speed 3342.39 samples/sec   Loss 0.1242   LearningRate 0.0006   Epoch: 18   Global Step: 307280   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-12 07:45:54,499-Speed 3329.86 samples/sec   Loss 0.1233   LearningRate 0.0006   Epoch: 18   Global Step: 307290   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-12 07:45:57,561-Speed 3344.19 samples/sec   Loss 0.1219   LearningRate 0.0006   Epoch: 18   Global Step: 307300   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-12 07:46:00,623-Speed 3345.90 samples/sec   Loss 0.1250   LearningRate 0.0006   Epoch: 18   Global Step: 307310   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-12 07:46:03,691-Speed 3337.44 samples/sec   Loss 0.1398   LearningRate 0.0006   Epoch: 18   Global Step: 307320   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-12 07:46:06,761-Speed 3336.56 samples/sec   Loss 0.1197   LearningRate 0.0006   Epoch: 18   Global Step: 307330   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-12 07:46:09,830-Speed 3337.21 samples/sec   Loss 0.1166   LearningRate 0.0006   Epoch: 18   Global Step: 307340   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-12 07:46:12,906-Speed 3329.86 samples/sec   Loss 0.1208   LearningRate 0.0006   Epoch: 18   Global Step: 307350   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:46:16,028-Speed 3281.09 samples/sec   Loss 0.1288   LearningRate 0.0006   Epoch: 18   Global Step: 307360   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:46:19,117-Speed 3315.90 samples/sec   Loss 0.1245   LearningRate 0.0006   Epoch: 18   Global Step: 307370   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:46:22,180-Speed 3344.28 samples/sec   Loss 0.1169   LearningRate 0.0006   Epoch: 18   Global Step: 307380   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:46:25,248-Speed 3338.38 samples/sec   Loss 0.1202   LearningRate 0.0006   Epoch: 18   Global Step: 307390   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:46:28,339-Speed 3313.13 samples/sec   Loss 0.1264   LearningRate 0.0006   Epoch: 18   Global Step: 307400   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-12 07:46:31,424-Speed 3319.89 samples/sec   Loss 0.1094   LearningRate 0.0006   Epoch: 18   Global Step: 307410   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-12 07:46:34,490-Speed 3340.46 samples/sec   Loss 0.1211   LearningRate 0.0006   Epoch: 18   Global Step: 307420   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-12 07:46:37,560-Speed 3336.57 samples/sec   Loss 0.1128   LearningRate 0.0006   Epoch: 18   Global Step: 307430   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-12 07:46:40,647-Speed 3318.28 samples/sec   Loss 0.1069   LearningRate 0.0006   Epoch: 18   Global Step: 307440   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-12 07:46:43,823-Speed 3225.17 samples/sec   Loss 0.1110   LearningRate 0.0006   Epoch: 18   Global Step: 307450   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-12 07:46:46,931-Speed 3295.44 samples/sec   Loss 0.1205   LearningRate 0.0006   Epoch: 18   Global Step: 307460   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-12 07:46:50,047-Speed 3287.33 samples/sec   Loss 0.1125   LearningRate 0.0006   Epoch: 18   Global Step: 307470   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-12 07:46:53,160-Speed 3289.42 samples/sec   Loss 0.1260   LearningRate 0.0006   Epoch: 18   Global Step: 307480   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-12 07:46:56,280-Speed 3282.37 samples/sec   Loss 0.1217   LearningRate 0.0006   Epoch: 18   Global Step: 307490   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-12 07:46:59,462-Speed 3219.78 samples/sec   Loss 0.1266   LearningRate 0.0006   Epoch: 18   Global Step: 307500   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:47:02,542-Speed 3324.39 samples/sec   Loss 0.1180   LearningRate 0.0006   Epoch: 18   Global Step: 307510   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:47:05,627-Speed 3320.76 samples/sec   Loss 0.1271   LearningRate 0.0006   Epoch: 18   Global Step: 307520   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:47:08,696-Speed 3337.51 samples/sec   Loss 0.1273   LearningRate 0.0006   Epoch: 18   Global Step: 307530   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:47:11,777-Speed 3324.46 samples/sec   Loss 0.1271   LearningRate 0.0006   Epoch: 18   Global Step: 307540   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:47:14,939-Speed 3239.20 samples/sec   Loss 0.1211   LearningRate 0.0006   Epoch: 18   Global Step: 307550   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:47:18,007-Speed 3338.91 samples/sec   Loss 0.1170   LearningRate 0.0006   Epoch: 18   Global Step: 307560   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:47:21,139-Speed 3269.48 samples/sec   Loss 0.1338   LearningRate 0.0006   Epoch: 18   Global Step: 307570   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:47:24,199-Speed 3347.14 samples/sec   Loss 0.1180   LearningRate 0.0006   Epoch: 18   Global Step: 307580   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:47:27,260-Speed 3345.99 samples/sec   Loss 0.1186   LearningRate 0.0006   Epoch: 18   Global Step: 307590   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:47:30,332-Speed 3333.99 samples/sec   Loss 0.1152   LearningRate 0.0006   Epoch: 18   Global Step: 307600   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:47:33,433-Speed 3303.70 samples/sec   Loss 0.1189   LearningRate 0.0006   Epoch: 18   Global Step: 307610   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:47:36,596-Speed 3238.09 samples/sec   Loss 0.1226   LearningRate 0.0006   Epoch: 18   Global Step: 307620   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-12 07:47:39,731-Speed 3266.60 samples/sec   Loss 0.1324   LearningRate 0.0006   Epoch: 18   Global Step: 307630   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-12 07:47:42,867-Speed 3265.91 samples/sec   Loss 0.1163   LearningRate 0.0006   Epoch: 18   Global Step: 307640   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-12 07:47:46,113-Speed 3155.96 samples/sec   Loss 0.1236   LearningRate 0.0006   Epoch: 18   Global Step: 307650   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-12 07:47:49,209-Speed 3307.94 samples/sec   Loss 0.1249   LearningRate 0.0006   Epoch: 18   Global Step: 307660   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-12 07:47:52,311-Speed 3302.16 samples/sec   Loss 0.1240   LearningRate 0.0006   Epoch: 18   Global Step: 307670   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-12 07:47:55,458-Speed 3253.91 samples/sec   Loss 0.1206   LearningRate 0.0006   Epoch: 18   Global Step: 307680   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-12 07:47:58,558-Speed 3304.39 samples/sec   Loss 0.1215   LearningRate 0.0006   Epoch: 18   Global Step: 307690   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-12 07:48:01,668-Speed 3294.40 samples/sec   Loss 0.1293   LearningRate 0.0006   Epoch: 18   Global Step: 307700   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-12 07:48:04,741-Speed 3341.41 samples/sec   Loss 0.1248   LearningRate 0.0006   Epoch: 18   Global Step: 307710   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-12 07:48:07,806-Speed 3341.72 samples/sec   Loss 0.1111   LearningRate 0.0006   Epoch: 18   Global Step: 307720   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:48:10,878-Speed 3333.86 samples/sec   Loss 0.1187   LearningRate 0.0006   Epoch: 18   Global Step: 307730   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:48:14,016-Speed 3264.14 samples/sec   Loss 0.1180   LearningRate 0.0006   Epoch: 18   Global Step: 307740   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:48:17,110-Speed 3309.69 samples/sec   Loss 0.1173   LearningRate 0.0006   Epoch: 18   Global Step: 307750   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:48:20,265-Speed 3246.35 samples/sec   Loss 0.1252   LearningRate 0.0006   Epoch: 18   Global Step: 307760   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:48:23,367-Speed 3302.29 samples/sec   Loss 0.1092   LearningRate 0.0006   Epoch: 18   Global Step: 307770   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:48:26,460-Speed 3311.34 samples/sec   Loss 0.1222   LearningRate 0.0006   Epoch: 18   Global Step: 307780   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:48:29,569-Speed 3294.75 samples/sec   Loss 0.1193   LearningRate 0.0006   Epoch: 18   Global Step: 307790   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:48:32,684-Speed 3288.02 samples/sec   Loss 0.1228   LearningRate 0.0006   Epoch: 18   Global Step: 307800   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:48:35,758-Speed 3331.71 samples/sec   Loss 0.1163   LearningRate 0.0006   Epoch: 18   Global Step: 307810   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:48:38,838-Speed 3325.04 samples/sec   Loss 0.1130   LearningRate 0.0006   Epoch: 18   Global Step: 307820   Fp16 Grad Scale: 262144   Required: 3 hours
Training: 2022-04-12 07:48:41,904-Speed 3340.82 samples/sec   Loss 0.1294   LearningRate 0.0006   Epoch: 18   Global Step: 307830   Fp16 Grad Scale: 262144   Required: 3 hours
Training: 2022-04-12 07:48:45,051-Speed 3254.92 samples/sec   Loss 0.1152   LearningRate 0.0006   Epoch: 18   Global Step: 307840   Fp16 Grad Scale: 262144   Required: 3 hours
Training: 2022-04-12 07:48:48,274-Speed 3177.21 samples/sec   Loss 0.1274   LearningRate 0.0006   Epoch: 18   Global Step: 307850   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:48:51,366-Speed 3313.16 samples/sec   Loss 0.1330   LearningRate 0.0006   Epoch: 18   Global Step: 307860   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:48:54,438-Speed 3334.73 samples/sec   Loss 0.1222   LearningRate 0.0006   Epoch: 18   Global Step: 307870   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:48:57,529-Speed 3312.74 samples/sec   Loss 0.1142   LearningRate 0.0006   Epoch: 18   Global Step: 307880   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:49:00,596-Speed 3339.86 samples/sec   Loss 0.1214   LearningRate 0.0006   Epoch: 18   Global Step: 307890   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:49:03,675-Speed 3326.50 samples/sec   Loss 0.1233   LearningRate 0.0006   Epoch: 18   Global Step: 307900   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:49:06,744-Speed 3337.16 samples/sec   Loss 0.1226   LearningRate 0.0006   Epoch: 18   Global Step: 307910   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:49:09,831-Speed 3318.69 samples/sec   Loss 0.1164   LearningRate 0.0006   Epoch: 18   Global Step: 307920   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:49:12,917-Speed 3318.34 samples/sec   Loss 0.1253   LearningRate 0.0006   Epoch: 18   Global Step: 307930   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:49:16,023-Speed 3297.32 samples/sec   Loss 0.1128   LearningRate 0.0006   Epoch: 18   Global Step: 307940   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:49:19,135-Speed 3291.91 samples/sec   Loss 0.1396   LearningRate 0.0006   Epoch: 18   Global Step: 307950   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:49:22,252-Speed 3286.07 samples/sec   Loss 0.1212   LearningRate 0.0006   Epoch: 18   Global Step: 307960   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-12 07:49:25,433-Speed 3219.72 samples/sec   Loss 0.1238   LearningRate 0.0006   Epoch: 18   Global Step: 307970   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-12 07:49:28,535-Speed 3301.54 samples/sec   Loss 0.1171   LearningRate 0.0006   Epoch: 18   Global Step: 307980   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-12 07:49:31,614-Speed 3326.43 samples/sec   Loss 0.1238   LearningRate 0.0006   Epoch: 18   Global Step: 307990   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-12 07:49:34,682-Speed 3338.32 samples/sec   Loss 0.1328   LearningRate 0.0006   Epoch: 18   Global Step: 308000   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-12 07:50:18,678-[lfw][308000]XNorm: 20.810476
Training: 2022-04-12 07:50:18,679-[lfw][308000]Accuracy-Flip: 0.99800+-0.00221
Training: 2022-04-12 07:50:18,680-[lfw][308000]Accuracy-Highest: 0.99817
Training: 2022-04-12 07:51:09,918-[cfp_fp][308000]XNorm: 22.635153
Training: 2022-04-12 07:51:09,918-[cfp_fp][308000]Accuracy-Flip: 0.99129+-0.00396
Training: 2022-04-12 07:51:09,919-[cfp_fp][308000]Accuracy-Highest: 0.99200
Training: 2022-04-12 07:51:53,851-[agedb_30][308000]XNorm: 22.915190
Training: 2022-04-12 07:51:53,852-[agedb_30][308000]Accuracy-Flip: 0.98550+-0.00587
Training: 2022-04-12 07:51:53,852-[agedb_30][308000]Accuracy-Highest: 0.98650
Training: 2022-04-12 07:51:56,913-Speed 72.00 samples/sec   Loss 0.1210   LearningRate 0.0006   Epoch: 18   Global Step: 308010   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-12 07:52:00,000-Speed 3317.85 samples/sec   Loss 0.1235   LearningRate 0.0006   Epoch: 18   Global Step: 308020   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-12 07:52:03,102-Speed 3302.50 samples/sec   Loss 0.1266   LearningRate 0.0006   Epoch: 18   Global Step: 308030   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-12 07:52:06,278-Speed 3225.14 samples/sec   Loss 0.1197   LearningRate 0.0006   Epoch: 18   Global Step: 308040   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-12 07:52:09,396-Speed 3284.05 samples/sec   Loss 0.1250   LearningRate 0.0006   Epoch: 18   Global Step: 308050   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-12 07:52:12,470-Speed 3332.45 samples/sec   Loss 0.1191   LearningRate 0.0006   Epoch: 18   Global Step: 308060   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:52:15,563-Speed 3311.25 samples/sec   Loss 0.1323   LearningRate 0.0006   Epoch: 18   Global Step: 308070   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:52:18,674-Speed 3293.20 samples/sec   Loss 0.1145   LearningRate 0.0006   Epoch: 18   Global Step: 308080   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:52:21,740-Speed 3339.99 samples/sec   Loss 0.1093   LearningRate 0.0006   Epoch: 18   Global Step: 308090   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:52:24,807-Speed 3339.25 samples/sec   Loss 0.1226   LearningRate 0.0006   Epoch: 18   Global Step: 308100   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-12 07:52:27,872-Speed 3342.12 samples/sec   Loss 0.1294   LearningRate 0.0006   Epoch: 18   Global Step: 308110   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-12 07:52:30,961-Speed 3315.55 samples/sec   Loss 0.1168   LearningRate 0.0006   Epoch: 18   Global Step: 308120   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-12 07:52:34,031-Speed 3335.90 samples/sec   Loss 0.1258   LearningRate 0.0006   Epoch: 18   Global Step: 308130   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-12 07:52:37,106-Speed 3331.13 samples/sec   Loss 0.1233   LearningRate 0.0006   Epoch: 18   Global Step: 308140   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-12 07:52:40,190-Speed 3320.99 samples/sec   Loss 0.1226   LearningRate 0.0006   Epoch: 18   Global Step: 308150   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-12 07:52:43,266-Speed 3330.39 samples/sec   Loss 0.1249   LearningRate 0.0006   Epoch: 18   Global Step: 308160   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-12 07:52:46,354-Speed 3316.64 samples/sec   Loss 0.1285   LearningRate 0.0006   Epoch: 18   Global Step: 308170   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-12 07:52:49,430-Speed 3330.02 samples/sec   Loss 0.1262   LearningRate 0.0006   Epoch: 18   Global Step: 308180   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-12 07:52:52,497-Speed 3339.52 samples/sec   Loss 0.1057   LearningRate 0.0006   Epoch: 18   Global Step: 308190   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-12 07:52:55,567-Speed 3335.68 samples/sec   Loss 0.1198   LearningRate 0.0006   Epoch: 18   Global Step: 308200   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:52:58,662-Speed 3309.83 samples/sec   Loss 0.1282   LearningRate 0.0006   Epoch: 18   Global Step: 308210   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-12 07:53:01,821-Speed 3242.45 samples/sec   Loss 0.1214   LearningRate 0.0006   Epoch: 18   Global Step: 308220   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-12 07:53:04,954-Speed 3268.51 samples/sec   Loss 0.1174   LearningRate 0.0006   Epoch: 18   Global Step: 308230   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-12 07:53:08,108-Speed 3247.89 samples/sec   Loss 0.1213   LearningRate 0.0006   Epoch: 18   Global Step: 308240   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-12 07:53:11,272-Speed 3237.56 samples/sec   Loss 0.1239   LearningRate 0.0006   Epoch: 18   Global Step: 308250   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-12 07:53:14,430-Speed 3242.70 samples/sec   Loss 0.1207   LearningRate 0.0006   Epoch: 18   Global Step: 308260   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-12 07:53:17,642-Speed 3189.22 samples/sec   Loss 0.1295   LearningRate 0.0006   Epoch: 18   Global Step: 308270   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-12 07:53:20,721-Speed 3326.14 samples/sec   Loss 0.1239   LearningRate 0.0006   Epoch: 18   Global Step: 308280   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-12 07:53:23,799-Speed 3327.58 samples/sec   Loss 0.1207   LearningRate 0.0006   Epoch: 18   Global Step: 308290   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-12 07:53:26,879-Speed 3326.06 samples/sec   Loss 0.1243   LearningRate 0.0006   Epoch: 18   Global Step: 308300   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-12 07:53:29,986-Speed 3295.69 samples/sec   Loss 0.1144   LearningRate 0.0006   Epoch: 18   Global Step: 308310   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:53:33,159-Speed 3228.59 samples/sec   Loss 0.1188   LearningRate 0.0006   Epoch: 18   Global Step: 308320   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:53:36,234-Speed 3331.77 samples/sec   Loss 0.1212   LearningRate 0.0006   Epoch: 18   Global Step: 308330   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:53:39,320-Speed 3318.57 samples/sec   Loss 0.1214   LearningRate 0.0006   Epoch: 18   Global Step: 308340   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:53:42,413-Speed 3311.12 samples/sec   Loss 0.1282   LearningRate 0.0006   Epoch: 18   Global Step: 308350   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:53:45,495-Speed 3323.27 samples/sec   Loss 0.1214   LearningRate 0.0006   Epoch: 18   Global Step: 308360   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:53:48,609-Speed 3289.05 samples/sec   Loss 0.1278   LearningRate 0.0006   Epoch: 18   Global Step: 308370   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:53:51,676-Speed 3340.07 samples/sec   Loss 0.1109   LearningRate 0.0006   Epoch: 18   Global Step: 308380   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:53:54,740-Speed 3342.40 samples/sec   Loss 0.1148   LearningRate 0.0006   Epoch: 18   Global Step: 308390   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:53:57,800-Speed 3346.86 samples/sec   Loss 0.1180   LearningRate 0.0006   Epoch: 18   Global Step: 308400   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:54:00,856-Speed 3352.07 samples/sec   Loss 0.1400   LearningRate 0.0006   Epoch: 18   Global Step: 308410   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:54:03,943-Speed 3318.72 samples/sec   Loss 0.1267   LearningRate 0.0006   Epoch: 18   Global Step: 308420   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:54:07,007-Speed 3343.03 samples/sec   Loss 0.1127   LearningRate 0.0006   Epoch: 18   Global Step: 308430   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:54:10,065-Speed 3349.38 samples/sec   Loss 0.1275   LearningRate 0.0006   Epoch: 18   Global Step: 308440   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:54:13,165-Speed 3303.40 samples/sec   Loss 0.1233   LearningRate 0.0006   Epoch: 18   Global Step: 308450   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:54:16,226-Speed 3346.18 samples/sec   Loss 0.1164   LearningRate 0.0006   Epoch: 18   Global Step: 308460   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:54:19,288-Speed 3345.22 samples/sec   Loss 0.1237   LearningRate 0.0006   Epoch: 18   Global Step: 308470   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:54:22,369-Speed 3323.88 samples/sec   Loss 0.1247   LearningRate 0.0006   Epoch: 18   Global Step: 308480   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:54:25,451-Speed 3323.55 samples/sec   Loss 0.1196   LearningRate 0.0006   Epoch: 18   Global Step: 308490   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:54:28,571-Speed 3281.83 samples/sec   Loss 0.1280   LearningRate 0.0006   Epoch: 18   Global Step: 308500   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:54:31,690-Speed 3284.95 samples/sec   Loss 0.1159   LearningRate 0.0006   Epoch: 18   Global Step: 308510   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:54:34,764-Speed 3331.40 samples/sec   Loss 0.1201   LearningRate 0.0006   Epoch: 18   Global Step: 308520   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:54:37,876-Speed 3291.78 samples/sec   Loss 0.1292   LearningRate 0.0006   Epoch: 18   Global Step: 308530   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:54:40,940-Speed 3342.36 samples/sec   Loss 0.1195   LearningRate 0.0006   Epoch: 18   Global Step: 308540   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:54:44,002-Speed 3344.93 samples/sec   Loss 0.1160   LearningRate 0.0006   Epoch: 18   Global Step: 308550   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:54:47,068-Speed 3340.25 samples/sec   Loss 0.1348   LearningRate 0.0006   Epoch: 18   Global Step: 308560   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:54:50,161-Speed 3311.31 samples/sec   Loss 0.1264   LearningRate 0.0006   Epoch: 18   Global Step: 308570   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:54:53,297-Speed 3266.70 samples/sec   Loss 0.1179   LearningRate 0.0006   Epoch: 18   Global Step: 308580   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:54:56,399-Speed 3301.40 samples/sec   Loss 0.1297   LearningRate 0.0006   Epoch: 18   Global Step: 308590   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:54:59,474-Speed 3331.16 samples/sec   Loss 0.1192   LearningRate 0.0006   Epoch: 18   Global Step: 308600   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:55:02,545-Speed 3335.28 samples/sec   Loss 0.1274   LearningRate 0.0006   Epoch: 18   Global Step: 308610   Fp16 Grad Scale: 262144   Required: 3 hours
Training: 2022-04-12 07:55:05,616-Speed 3335.07 samples/sec   Loss 0.1228   LearningRate 0.0006   Epoch: 18   Global Step: 308620   Fp16 Grad Scale: 262144   Required: 3 hours
Training: 2022-04-12 07:55:08,712-Speed 3308.79 samples/sec   Loss 0.1301   LearningRate 0.0006   Epoch: 18   Global Step: 308630   Fp16 Grad Scale: 262144   Required: 3 hours
Training: 2022-04-12 07:55:11,774-Speed 3344.27 samples/sec   Loss 0.1278   LearningRate 0.0006   Epoch: 18   Global Step: 308640   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:55:14,914-Speed 3262.37 samples/sec   Loss 0.1129   LearningRate 0.0006   Epoch: 18   Global Step: 308650   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:55:18,027-Speed 3289.49 samples/sec   Loss 0.1264   LearningRate 0.0006   Epoch: 18   Global Step: 308660   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:55:21,120-Speed 3311.93 samples/sec   Loss 0.1079   LearningRate 0.0006   Epoch: 18   Global Step: 308670   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:55:24,175-Speed 3352.74 samples/sec   Loss 0.1257   LearningRate 0.0006   Epoch: 18   Global Step: 308680   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:55:27,251-Speed 3330.09 samples/sec   Loss 0.1180   LearningRate 0.0006   Epoch: 18   Global Step: 308690   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:55:30,317-Speed 3339.85 samples/sec   Loss 0.1357   LearningRate 0.0006   Epoch: 18   Global Step: 308700   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:55:33,377-Speed 3348.07 samples/sec   Loss 0.1163   LearningRate 0.0006   Epoch: 18   Global Step: 308710   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:55:36,433-Speed 3351.15 samples/sec   Loss 0.1170   LearningRate 0.0006   Epoch: 18   Global Step: 308720   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:55:39,495-Speed 3344.77 samples/sec   Loss 0.1112   LearningRate 0.0006   Epoch: 18   Global Step: 308730   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:55:42,576-Speed 3324.21 samples/sec   Loss 0.1207   LearningRate 0.0006   Epoch: 18   Global Step: 308740   Fp16 Grad Scale: 262144   Required: 3 hours
Training: 2022-04-12 07:55:45,643-Speed 3340.20 samples/sec   Loss 0.1136   LearningRate 0.0006   Epoch: 18   Global Step: 308750   Fp16 Grad Scale: 262144   Required: 3 hours
Training: 2022-04-12 07:55:48,719-Speed 3330.12 samples/sec   Loss 0.1314   LearningRate 0.0006   Epoch: 18   Global Step: 308760   Fp16 Grad Scale: 262144   Required: 3 hours
Training: 2022-04-12 07:55:51,777-Speed 3349.51 samples/sec   Loss 0.1167   LearningRate 0.0006   Epoch: 18   Global Step: 308770   Fp16 Grad Scale: 262144   Required: 3 hours
Training: 2022-04-12 07:55:54,836-Speed 3347.45 samples/sec   Loss 0.1055   LearningRate 0.0006   Epoch: 18   Global Step: 308780   Fp16 Grad Scale: 262144   Required: 3 hours
Training: 2022-04-12 07:55:57,885-Speed 3359.16 samples/sec   Loss 0.1157   LearningRate 0.0006   Epoch: 18   Global Step: 308790   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:56:00,960-Speed 3330.96 samples/sec   Loss 0.1309   LearningRate 0.0006   Epoch: 18   Global Step: 308800   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:56:04,090-Speed 3272.00 samples/sec   Loss 0.1165   LearningRate 0.0006   Epoch: 18   Global Step: 308810   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:56:07,280-Speed 3211.65 samples/sec   Loss 0.1204   LearningRate 0.0006   Epoch: 18   Global Step: 308820   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:56:10,344-Speed 3342.13 samples/sec   Loss 0.1231   LearningRate 0.0006   Epoch: 18   Global Step: 308830   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:56:13,419-Speed 3330.99 samples/sec   Loss 0.1188   LearningRate 0.0006   Epoch: 18   Global Step: 308840   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:56:16,562-Speed 3258.94 samples/sec   Loss 0.1164   LearningRate 0.0006   Epoch: 18   Global Step: 308850   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:56:19,655-Speed 3311.83 samples/sec   Loss 0.1188   LearningRate 0.0006   Epoch: 18   Global Step: 308860   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:56:22,747-Speed 3312.61 samples/sec   Loss 0.1335   LearningRate 0.0006   Epoch: 18   Global Step: 308870   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:56:25,892-Speed 3256.60 samples/sec   Loss 0.1255   LearningRate 0.0006   Epoch: 18   Global Step: 308880   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:56:28,961-Speed 3337.05 samples/sec   Loss 0.1238   LearningRate 0.0006   Epoch: 18   Global Step: 308890   Fp16 Grad Scale: 262144   Required: 3 hours
Training: 2022-04-12 07:56:32,020-Speed 3348.72 samples/sec   Loss 0.1302   LearningRate 0.0006   Epoch: 18   Global Step: 308900   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:56:35,092-Speed 3333.54 samples/sec   Loss 0.1238   LearningRate 0.0006   Epoch: 18   Global Step: 308910   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:56:38,166-Speed 3331.60 samples/sec   Loss 0.1161   LearningRate 0.0006   Epoch: 18   Global Step: 308920   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:56:41,316-Speed 3252.21 samples/sec   Loss 0.1081   LearningRate 0.0006   Epoch: 18   Global Step: 308930   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:56:44,389-Speed 3332.67 samples/sec   Loss 0.1284   LearningRate 0.0006   Epoch: 18   Global Step: 308940   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:56:47,451-Speed 3345.17 samples/sec   Loss 0.1233   LearningRate 0.0006   Epoch: 18   Global Step: 308950   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:56:50,563-Speed 3291.26 samples/sec   Loss 0.1226   LearningRate 0.0006   Epoch: 18   Global Step: 308960   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:56:53,645-Speed 3323.54 samples/sec   Loss 0.1050   LearningRate 0.0006   Epoch: 18   Global Step: 308970   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:56:56,705-Speed 3347.12 samples/sec   Loss 0.1252   LearningRate 0.0006   Epoch: 18   Global Step: 308980   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:56:59,796-Speed 3313.11 samples/sec   Loss 0.1186   LearningRate 0.0006   Epoch: 18   Global Step: 308990   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:57:02,940-Speed 3258.58 samples/sec   Loss 0.1210   LearningRate 0.0006   Epoch: 18   Global Step: 309000   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:57:06,068-Speed 3274.26 samples/sec   Loss 0.1170   LearningRate 0.0006   Epoch: 18   Global Step: 309010   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:57:09,157-Speed 3316.11 samples/sec   Loss 0.1292   LearningRate 0.0006   Epoch: 18   Global Step: 309020   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:57:12,234-Speed 3328.92 samples/sec   Loss 0.1313   LearningRate 0.0006   Epoch: 18   Global Step: 309030   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:57:15,347-Speed 3289.67 samples/sec   Loss 0.1230   LearningRate 0.0006   Epoch: 18   Global Step: 309040   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:57:18,532-Speed 3216.04 samples/sec   Loss 0.1316   LearningRate 0.0006   Epoch: 18   Global Step: 309050   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:57:21,621-Speed 3315.63 samples/sec   Loss 0.1219   LearningRate 0.0006   Epoch: 18   Global Step: 309060   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:57:24,750-Speed 3273.14 samples/sec   Loss 0.1199   LearningRate 0.0005   Epoch: 18   Global Step: 309070   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:57:27,913-Speed 3238.33 samples/sec   Loss 0.1157   LearningRate 0.0005   Epoch: 18   Global Step: 309080   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:57:31,023-Speed 3293.43 samples/sec   Loss 0.1194   LearningRate 0.0005   Epoch: 18   Global Step: 309090   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:57:34,104-Speed 3323.83 samples/sec   Loss 0.1226   LearningRate 0.0005   Epoch: 18   Global Step: 309100   Fp16 Grad Scale: 262144   Required: 3 hours
Training: 2022-04-12 07:57:37,163-Speed 3348.65 samples/sec   Loss 0.1272   LearningRate 0.0005   Epoch: 18   Global Step: 309110   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:57:40,224-Speed 3346.25 samples/sec   Loss 0.1189   LearningRate 0.0005   Epoch: 18   Global Step: 309120   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:57:43,289-Speed 3341.59 samples/sec   Loss 0.1305   LearningRate 0.0005   Epoch: 18   Global Step: 309130   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:57:46,356-Speed 3339.83 samples/sec   Loss 0.1396   LearningRate 0.0005   Epoch: 18   Global Step: 309140   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:57:49,436-Speed 3325.32 samples/sec   Loss 0.1219   LearningRate 0.0005   Epoch: 18   Global Step: 309150   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:57:52,506-Speed 3335.30 samples/sec   Loss 0.1237   LearningRate 0.0005   Epoch: 18   Global Step: 309160   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:57:55,664-Speed 3243.52 samples/sec   Loss 0.1106   LearningRate 0.0005   Epoch: 18   Global Step: 309170   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:57:58,756-Speed 3312.83 samples/sec   Loss 0.1198   LearningRate 0.0005   Epoch: 18   Global Step: 309180   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:58:01,826-Speed 3336.36 samples/sec   Loss 0.1168   LearningRate 0.0005   Epoch: 18   Global Step: 309190   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:58:04,888-Speed 3344.83 samples/sec   Loss 0.1243   LearningRate 0.0005   Epoch: 18   Global Step: 309200   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:58:07,980-Speed 3312.43 samples/sec   Loss 0.1234   LearningRate 0.0005   Epoch: 18   Global Step: 309210   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:58:11,072-Speed 3312.32 samples/sec   Loss 0.1318   LearningRate 0.0005   Epoch: 18   Global Step: 309220   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:58:14,162-Speed 3315.07 samples/sec   Loss 0.1163   LearningRate 0.0005   Epoch: 18   Global Step: 309230   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:58:17,244-Speed 3323.86 samples/sec   Loss 0.1212   LearningRate 0.0005   Epoch: 18   Global Step: 309240   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:58:20,316-Speed 3333.15 samples/sec   Loss 0.1212   LearningRate 0.0005   Epoch: 18   Global Step: 309250   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:58:23,454-Speed 3263.98 samples/sec   Loss 0.1225   LearningRate 0.0005   Epoch: 18   Global Step: 309260   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:58:26,526-Speed 3334.50 samples/sec   Loss 0.1192   LearningRate 0.0005   Epoch: 18   Global Step: 309270   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:58:29,602-Speed 3329.96 samples/sec   Loss 0.1308   LearningRate 0.0005   Epoch: 18   Global Step: 309280   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:58:32,671-Speed 3337.03 samples/sec   Loss 0.1209   LearningRate 0.0005   Epoch: 18   Global Step: 309290   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:58:35,771-Speed 3304.23 samples/sec   Loss 0.1096   LearningRate 0.0005   Epoch: 18   Global Step: 309300   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:58:38,822-Speed 3357.55 samples/sec   Loss 0.1186   LearningRate 0.0005   Epoch: 18   Global Step: 309310   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:58:41,898-Speed 3329.77 samples/sec   Loss 0.1266   LearningRate 0.0005   Epoch: 18   Global Step: 309320   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:58:44,960-Speed 3345.22 samples/sec   Loss 0.1331   LearningRate 0.0005   Epoch: 18   Global Step: 309330   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:58:48,042-Speed 3323.05 samples/sec   Loss 0.1239   LearningRate 0.0005   Epoch: 18   Global Step: 309340   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:58:51,113-Speed 3335.12 samples/sec   Loss 0.1236   LearningRate 0.0005   Epoch: 18   Global Step: 309350   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:58:54,185-Speed 3334.24 samples/sec   Loss 0.1301   LearningRate 0.0005   Epoch: 18   Global Step: 309360   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:58:57,280-Speed 3310.13 samples/sec   Loss 0.1319   LearningRate 0.0005   Epoch: 18   Global Step: 309370   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:59:00,353-Speed 3332.60 samples/sec   Loss 0.1291   LearningRate 0.0005   Epoch: 18   Global Step: 309380   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:59:03,431-Speed 3326.91 samples/sec   Loss 0.1215   LearningRate 0.0005   Epoch: 18   Global Step: 309390   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:59:06,497-Speed 3341.29 samples/sec   Loss 0.1180   LearningRate 0.0005   Epoch: 18   Global Step: 309400   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:59:09,583-Speed 3319.20 samples/sec   Loss 0.1289   LearningRate 0.0005   Epoch: 18   Global Step: 309410   Fp16 Grad Scale: 262144   Required: 3 hours
Training: 2022-04-12 07:59:12,689-Speed 3297.63 samples/sec   Loss 0.1140   LearningRate 0.0005   Epoch: 18   Global Step: 309420   Fp16 Grad Scale: 262144   Required: 3 hours
Training: 2022-04-12 07:59:15,764-Speed 3330.56 samples/sec   Loss 0.1144   LearningRate 0.0005   Epoch: 18   Global Step: 309430   Fp16 Grad Scale: 262144   Required: 3 hours
Training: 2022-04-12 07:59:18,833-Speed 3337.21 samples/sec   Loss 0.1296   LearningRate 0.0005   Epoch: 18   Global Step: 309440   Fp16 Grad Scale: 262144   Required: 3 hours
Training: 2022-04-12 07:59:21,906-Speed 3332.54 samples/sec   Loss 0.1237   LearningRate 0.0005   Epoch: 18   Global Step: 309450   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:59:25,009-Speed 3301.48 samples/sec   Loss 0.1166   LearningRate 0.0005   Epoch: 18   Global Step: 309460   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:59:28,090-Speed 3323.76 samples/sec   Loss 0.1243   LearningRate 0.0005   Epoch: 18   Global Step: 309470   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:59:31,160-Speed 3336.15 samples/sec   Loss 0.1179   LearningRate 0.0005   Epoch: 18   Global Step: 309480   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:59:34,233-Speed 3333.01 samples/sec   Loss 0.1203   LearningRate 0.0005   Epoch: 18   Global Step: 309490   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:59:37,301-Speed 3339.13 samples/sec   Loss 0.1279   LearningRate 0.0005   Epoch: 18   Global Step: 309500   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 07:59:40,361-Speed 3346.37 samples/sec   Loss 0.1253   LearningRate 0.0005   Epoch: 18   Global Step: 309510   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-12 07:59:43,511-Speed 3252.62 samples/sec   Loss 0.1204   LearningRate 0.0005   Epoch: 18   Global Step: 309520   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-12 07:59:46,645-Speed 3268.34 samples/sec   Loss 0.1223   LearningRate 0.0005   Epoch: 18   Global Step: 309530   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-12 07:59:49,772-Speed 3275.47 samples/sec   Loss 0.1143   LearningRate 0.0005   Epoch: 18   Global Step: 309540   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-12 07:59:52,844-Speed 3333.75 samples/sec   Loss 0.1285   LearningRate 0.0005   Epoch: 18   Global Step: 309550   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-12 07:59:55,914-Speed 3336.65 samples/sec   Loss 0.1235   LearningRate 0.0005   Epoch: 18   Global Step: 309560   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-12 07:59:59,050-Speed 3265.85 samples/sec   Loss 0.1181   LearningRate 0.0005   Epoch: 18   Global Step: 309570   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-12 08:00:02,131-Speed 3323.43 samples/sec   Loss 0.1253   LearningRate 0.0005   Epoch: 18   Global Step: 309580   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-12 08:00:05,194-Speed 3344.15 samples/sec   Loss 0.1187   LearningRate 0.0005   Epoch: 18   Global Step: 309590   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-12 08:00:08,263-Speed 3337.40 samples/sec   Loss 0.1195   LearningRate 0.0005   Epoch: 18   Global Step: 309600   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-12 08:00:11,340-Speed 3328.57 samples/sec   Loss 0.1259   LearningRate 0.0005   Epoch: 18   Global Step: 309610   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 08:00:14,414-Speed 3332.53 samples/sec   Loss 0.1242   LearningRate 0.0005   Epoch: 18   Global Step: 309620   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 08:00:17,484-Speed 3336.01 samples/sec   Loss 0.1162   LearningRate 0.0005   Epoch: 18   Global Step: 309630   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 08:00:20,569-Speed 3320.56 samples/sec   Loss 0.1242   LearningRate 0.0005   Epoch: 18   Global Step: 309640   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 08:00:23,635-Speed 3340.48 samples/sec   Loss 0.1342   LearningRate 0.0005   Epoch: 18   Global Step: 309650   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 08:00:26,706-Speed 3334.19 samples/sec   Loss 0.1115   LearningRate 0.0005   Epoch: 18   Global Step: 309660   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 08:00:29,769-Speed 3344.89 samples/sec   Loss 0.1384   LearningRate 0.0005   Epoch: 18   Global Step: 309670   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 08:00:32,845-Speed 3329.51 samples/sec   Loss 0.1221   LearningRate 0.0005   Epoch: 18   Global Step: 309680   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 08:00:35,909-Speed 3342.01 samples/sec   Loss 0.1196   LearningRate 0.0005   Epoch: 18   Global Step: 309690   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 08:00:38,985-Speed 3330.33 samples/sec   Loss 0.1169   LearningRate 0.0005   Epoch: 18   Global Step: 309700   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 08:00:42,056-Speed 3334.81 samples/sec   Loss 0.1259   LearningRate 0.0005   Epoch: 18   Global Step: 309710   Fp16 Grad Scale: 262144   Required: 3 hours
Training: 2022-04-12 08:00:45,135-Speed 3327.20 samples/sec   Loss 0.1231   LearningRate 0.0005   Epoch: 18   Global Step: 309720   Fp16 Grad Scale: 262144   Required: 3 hours
Training: 2022-04-12 08:00:48,199-Speed 3342.95 samples/sec   Loss 0.1170   LearningRate 0.0005   Epoch: 18   Global Step: 309730   Fp16 Grad Scale: 262144   Required: 3 hours
Training: 2022-04-12 08:00:51,274-Speed 3330.79 samples/sec   Loss 0.1202   LearningRate 0.0005   Epoch: 18   Global Step: 309740   Fp16 Grad Scale: 262144   Required: 3 hours
Training: 2022-04-12 08:00:54,341-Speed 3339.30 samples/sec   Loss 0.1287   LearningRate 0.0005   Epoch: 18   Global Step: 309750   Fp16 Grad Scale: 262144   Required: 3 hours
Training: 2022-04-12 08:00:57,420-Speed 3326.35 samples/sec   Loss 0.1287   LearningRate 0.0005   Epoch: 18   Global Step: 309760   Fp16 Grad Scale: 262144   Required: 3 hours
Training: 2022-04-12 08:01:00,485-Speed 3341.89 samples/sec   Loss 0.1209   LearningRate 0.0005   Epoch: 18   Global Step: 309770   Fp16 Grad Scale: 262144   Required: 3 hours
Training: 2022-04-12 08:01:03,564-Speed 3326.41 samples/sec   Loss 0.1172   LearningRate 0.0005   Epoch: 18   Global Step: 309780   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 08:01:06,629-Speed 3341.51 samples/sec   Loss 0.1156   LearningRate 0.0005   Epoch: 18   Global Step: 309790   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 08:01:09,736-Speed 3297.74 samples/sec   Loss 0.1195   LearningRate 0.0005   Epoch: 18   Global Step: 309800   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 08:01:12,819-Speed 3321.87 samples/sec   Loss 0.1229   LearningRate 0.0005   Epoch: 18   Global Step: 309810   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-12 08:01:15,959-Speed 3262.20 samples/sec   Loss 0.1155   LearningRate 0.0005   Epoch: 18   Global Step: 309820   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-12 08:01:19,079-Speed 3282.09 samples/sec   Loss 0.1213   LearningRate 0.0005   Epoch: 18   Global Step: 309830   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-12 08:01:22,146-Speed 3339.65 samples/sec   Loss 0.1301   LearningRate 0.0005   Epoch: 18   Global Step: 309840   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-12 08:01:25,340-Speed 3206.73 samples/sec   Loss 0.1251   LearningRate 0.0005   Epoch: 18   Global Step: 309850   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-12 08:01:28,467-Speed 3275.71 samples/sec   Loss 0.1106   LearningRate 0.0005   Epoch: 18   Global Step: 309860   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-12 08:01:31,609-Speed 3259.30 samples/sec   Loss 0.1336   LearningRate 0.0005   Epoch: 18   Global Step: 309870   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-12 08:01:34,688-Speed 3327.01 samples/sec   Loss 0.1212   LearningRate 0.0005   Epoch: 18   Global Step: 309880   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-12 08:01:37,757-Speed 3337.82 samples/sec   Loss 0.1305   LearningRate 0.0005   Epoch: 18   Global Step: 309890   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-12 08:01:40,844-Speed 3317.79 samples/sec   Loss 0.1110   LearningRate 0.0005   Epoch: 18   Global Step: 309900   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-12 08:01:43,936-Speed 3312.26 samples/sec   Loss 0.1165   LearningRate 0.0005   Epoch: 18   Global Step: 309910   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 08:01:47,090-Speed 3250.75 samples/sec   Loss 0.1220   LearningRate 0.0005   Epoch: 18   Global Step: 309920   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 08:01:50,190-Speed 3303.78 samples/sec   Loss 0.1208   LearningRate 0.0005   Epoch: 18   Global Step: 309930   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 08:01:53,268-Speed 3327.56 samples/sec   Loss 0.1299   LearningRate 0.0005   Epoch: 18   Global Step: 309940   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 08:01:56,343-Speed 3331.13 samples/sec   Loss 0.1288   LearningRate 0.0005   Epoch: 18   Global Step: 309950   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 08:01:59,463-Speed 3282.03 samples/sec   Loss 0.1121   LearningRate 0.0005   Epoch: 18   Global Step: 309960   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 08:02:02,557-Speed 3310.57 samples/sec   Loss 0.1204   LearningRate 0.0005   Epoch: 18   Global Step: 309970   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 08:02:05,659-Speed 3301.95 samples/sec   Loss 0.1413   LearningRate 0.0005   Epoch: 18   Global Step: 309980   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 08:02:08,728-Speed 3338.07 samples/sec   Loss 0.1160   LearningRate 0.0005   Epoch: 18   Global Step: 309990   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 08:02:11,831-Speed 3300.05 samples/sec   Loss 0.1157   LearningRate 0.0005   Epoch: 18   Global Step: 310000   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 08:02:55,271-[lfw][310000]XNorm: 20.704607
Training: 2022-04-12 08:02:55,272-[lfw][310000]Accuracy-Flip: 0.99783+-0.00259
Training: 2022-04-12 08:02:55,272-[lfw][310000]Accuracy-Highest: 0.99817
Training: 2022-04-12 08:03:45,662-[cfp_fp][310000]XNorm: 22.410110
Training: 2022-04-12 08:03:45,662-[cfp_fp][310000]Accuracy-Flip: 0.99186+-0.00389
Training: 2022-04-12 08:03:45,663-[cfp_fp][310000]Accuracy-Highest: 0.99200
Training: 2022-04-12 08:04:29,053-[agedb_30][310000]XNorm: 22.644414
Training: 2022-04-12 08:04:29,054-[agedb_30][310000]Accuracy-Flip: 0.98617+-0.00592
Training: 2022-04-12 08:04:29,054-[agedb_30][310000]Accuracy-Highest: 0.98650
Training: 2022-04-12 08:04:32,119-Speed 72.99 samples/sec   Loss 0.1146   LearningRate 0.0005   Epoch: 18   Global Step: 310010   Fp16 Grad Scale: 262144   Required: 3 hours
Training: 2022-04-12 08:04:35,174-Speed 3352.53 samples/sec   Loss 0.1122   LearningRate 0.0005   Epoch: 18   Global Step: 310020   Fp16 Grad Scale: 262144   Required: 3 hours
Training: 2022-04-12 08:04:38,268-Speed 3311.19 samples/sec   Loss 0.1271   LearningRate 0.0005   Epoch: 18   Global Step: 310030   Fp16 Grad Scale: 262144   Required: 3 hours
Training: 2022-04-12 08:04:41,310-Speed 3366.82 samples/sec   Loss 0.1201   LearningRate 0.0005   Epoch: 18   Global Step: 310040   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 08:04:44,367-Speed 3349.74 samples/sec   Loss 0.1312   LearningRate 0.0005   Epoch: 18   Global Step: 310050   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 08:04:47,423-Speed 3351.78 samples/sec   Loss 0.1195   LearningRate 0.0005   Epoch: 18   Global Step: 310060   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 08:04:50,495-Speed 3334.03 samples/sec   Loss 0.1317   LearningRate 0.0005   Epoch: 18   Global Step: 310070   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 08:04:53,556-Speed 3346.37 samples/sec   Loss 0.1325   LearningRate 0.0005   Epoch: 18   Global Step: 310080   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 08:04:56,640-Speed 3321.38 samples/sec   Loss 0.1381   LearningRate 0.0005   Epoch: 18   Global Step: 310090   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 08:04:59,704-Speed 3342.77 samples/sec   Loss 0.1353   LearningRate 0.0005   Epoch: 18   Global Step: 310100   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 08:05:02,772-Speed 3337.67 samples/sec   Loss 0.1482   LearningRate 0.0005   Epoch: 18   Global Step: 310110   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 08:05:05,833-Speed 3346.16 samples/sec   Loss 0.1361   LearningRate 0.0005   Epoch: 18   Global Step: 310120   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 08:05:08,923-Speed 3315.01 samples/sec   Loss 0.1220   LearningRate 0.0005   Epoch: 18   Global Step: 310130   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 08:05:11,985-Speed 3345.49 samples/sec   Loss 0.1177   LearningRate 0.0005   Epoch: 18   Global Step: 310140   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 08:05:15,049-Speed 3342.04 samples/sec   Loss 0.1217   LearningRate 0.0005   Epoch: 18   Global Step: 310150   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 08:05:18,127-Speed 3328.08 samples/sec   Loss 0.1219   LearningRate 0.0005   Epoch: 18   Global Step: 310160   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-12 08:05:21,188-Speed 3346.06 samples/sec   Loss 0.1211   LearningRate 0.0005   Epoch: 18   Global Step: 310170   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-12 08:05:24,268-Speed 3325.61 samples/sec   Loss 0.1055   LearningRate 0.0005   Epoch: 18   Global Step: 310180   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-12 08:05:27,361-Speed 3311.28 samples/sec   Loss 0.1262   LearningRate 0.0005   Epoch: 18   Global Step: 310190   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-12 08:05:30,436-Speed 3330.79 samples/sec   Loss 0.1227   LearningRate 0.0005   Epoch: 18   Global Step: 310200   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-12 08:05:33,508-Speed 3334.17 samples/sec   Loss 0.1245   LearningRate 0.0005   Epoch: 18   Global Step: 310210   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-12 08:05:36,576-Speed 3338.46 samples/sec   Loss 0.1236   LearningRate 0.0005   Epoch: 18   Global Step: 310220   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-12 08:05:39,704-Speed 3273.60 samples/sec   Loss 0.1264   LearningRate 0.0005   Epoch: 18   Global Step: 310230   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-12 08:05:42,802-Speed 3306.80 samples/sec   Loss 0.1250   LearningRate 0.0005   Epoch: 18   Global Step: 310240   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-12 08:05:45,885-Speed 3322.35 samples/sec   Loss 0.1291   LearningRate 0.0005   Epoch: 18   Global Step: 310250   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-12 08:05:49,023-Speed 3263.89 samples/sec   Loss 0.1142   LearningRate 0.0005   Epoch: 18   Global Step: 310260   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-12 08:05:52,097-Speed 3331.56 samples/sec   Loss 0.1233   LearningRate 0.0005   Epoch: 18   Global Step: 310270   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:05:55,231-Speed 3268.18 samples/sec   Loss 0.1416   LearningRate 0.0005   Epoch: 18   Global Step: 310280   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:05:58,340-Speed 3294.74 samples/sec   Loss 0.1165   LearningRate 0.0005   Epoch: 18   Global Step: 310290   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:06:01,442-Speed 3302.12 samples/sec   Loss 0.1295   LearningRate 0.0005   Epoch: 18   Global Step: 310300   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:06:04,674-Speed 3168.88 samples/sec   Loss 0.1169   LearningRate 0.0005   Epoch: 18   Global Step: 310310   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:06:07,880-Speed 3196.15 samples/sec   Loss 0.1076   LearningRate 0.0005   Epoch: 18   Global Step: 310320   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:06:10,956-Speed 3328.87 samples/sec   Loss 0.1161   LearningRate 0.0005   Epoch: 18   Global Step: 310330   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:06:14,057-Speed 3303.28 samples/sec   Loss 0.1164   LearningRate 0.0005   Epoch: 18   Global Step: 310340   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:06:17,234-Speed 3223.82 samples/sec   Loss 0.1202   LearningRate 0.0005   Epoch: 18   Global Step: 310350   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:06:20,382-Speed 3254.27 samples/sec   Loss 0.1164   LearningRate 0.0005   Epoch: 18   Global Step: 310360   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:06:23,485-Speed 3300.06 samples/sec   Loss 0.1142   LearningRate 0.0005   Epoch: 18   Global Step: 310370   Fp16 Grad Scale: 262144   Required: 2 hours
Training: 2022-04-12 08:06:26,557-Speed 3334.50 samples/sec   Loss 0.1198   LearningRate 0.0005   Epoch: 18   Global Step: 310380   Fp16 Grad Scale: 262144   Required: 2 hours
Training: 2022-04-12 08:06:29,702-Speed 3256.66 samples/sec   Loss 0.1344   LearningRate 0.0005   Epoch: 18   Global Step: 310390   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:06:32,816-Speed 3288.79 samples/sec   Loss 0.1307   LearningRate 0.0005   Epoch: 18   Global Step: 310400   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:06:35,905-Speed 3316.30 samples/sec   Loss 0.1204   LearningRate 0.0005   Epoch: 18   Global Step: 310410   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:06:39,129-Speed 3176.41 samples/sec   Loss 0.1293   LearningRate 0.0005   Epoch: 18   Global Step: 310420   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:06:42,231-Speed 3301.90 samples/sec   Loss 0.1232   LearningRate 0.0005   Epoch: 18   Global Step: 310430   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:06:45,312-Speed 3324.79 samples/sec   Loss 0.1110   LearningRate 0.0005   Epoch: 18   Global Step: 310440   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:06:48,537-Speed 3176.11 samples/sec   Loss 0.1274   LearningRate 0.0005   Epoch: 18   Global Step: 310450   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:06:51,598-Speed 3345.87 samples/sec   Loss 0.1227   LearningRate 0.0005   Epoch: 18   Global Step: 310460   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:06:54,658-Speed 3347.32 samples/sec   Loss 0.1237   LearningRate 0.0005   Epoch: 18   Global Step: 310470   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:06:57,724-Speed 3340.60 samples/sec   Loss 0.1339   LearningRate 0.0005   Epoch: 18   Global Step: 310480   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:07:00,823-Speed 3304.44 samples/sec   Loss 0.1215   LearningRate 0.0005   Epoch: 18   Global Step: 310490   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:07:03,933-Speed 3293.48 samples/sec   Loss 0.1227   LearningRate 0.0005   Epoch: 18   Global Step: 310500   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:07:06,990-Speed 3350.96 samples/sec   Loss 0.1215   LearningRate 0.0005   Epoch: 18   Global Step: 310510   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-12 08:07:10,078-Speed 3316.13 samples/sec   Loss 0.1281   LearningRate 0.0005   Epoch: 18   Global Step: 310520   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-12 08:07:13,153-Speed 3331.55 samples/sec   Loss 0.1229   LearningRate 0.0005   Epoch: 18   Global Step: 310530   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-12 08:07:16,320-Speed 3233.78 samples/sec   Loss 0.1316   LearningRate 0.0005   Epoch: 18   Global Step: 310540   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-12 08:07:19,480-Speed 3241.98 samples/sec   Loss 0.1299   LearningRate 0.0005   Epoch: 18   Global Step: 310550   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-12 08:07:22,574-Speed 3309.36 samples/sec   Loss 0.1374   LearningRate 0.0005   Epoch: 18   Global Step: 310560   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-12 08:07:25,657-Speed 3323.39 samples/sec   Loss 0.1261   LearningRate 0.0005   Epoch: 18   Global Step: 310570   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-12 08:07:28,720-Speed 3343.69 samples/sec   Loss 0.1300   LearningRate 0.0005   Epoch: 18   Global Step: 310580   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-12 08:07:31,786-Speed 3340.41 samples/sec   Loss 0.1267   LearningRate 0.0005   Epoch: 18   Global Step: 310590   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-12 08:07:34,945-Speed 3242.42 samples/sec   Loss 0.1267   LearningRate 0.0005   Epoch: 18   Global Step: 310600   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-12 08:07:38,012-Speed 3339.70 samples/sec   Loss 0.1201   LearningRate 0.0005   Epoch: 18   Global Step: 310610   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:07:41,199-Speed 3213.97 samples/sec   Loss 0.1203   LearningRate 0.0005   Epoch: 18   Global Step: 310620   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:07:44,314-Speed 3288.41 samples/sec   Loss 0.1288   LearningRate 0.0005   Epoch: 18   Global Step: 310630   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:07:47,392-Speed 3327.37 samples/sec   Loss 0.1204   LearningRate 0.0005   Epoch: 18   Global Step: 310640   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:07:50,462-Speed 3336.88 samples/sec   Loss 0.1177   LearningRate 0.0005   Epoch: 18   Global Step: 310650   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:07:53,633-Speed 3229.50 samples/sec   Loss 0.1238   LearningRate 0.0005   Epoch: 18   Global Step: 310660   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:07:56,761-Speed 3274.83 samples/sec   Loss 0.1261   LearningRate 0.0005   Epoch: 18   Global Step: 310670   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:07:59,827-Speed 3339.99 samples/sec   Loss 0.1244   LearningRate 0.0005   Epoch: 18   Global Step: 310680   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:08:02,901-Speed 3332.34 samples/sec   Loss 0.1370   LearningRate 0.0005   Epoch: 18   Global Step: 310690   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:08:06,096-Speed 3205.23 samples/sec   Loss 0.1227   LearningRate 0.0005   Epoch: 18   Global Step: 310700   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:08:09,162-Speed 3341.60 samples/sec   Loss 0.1218   LearningRate 0.0005   Epoch: 18   Global Step: 310710   Fp16 Grad Scale: 262144   Required: 2 hours
Training: 2022-04-12 08:08:12,217-Speed 3352.18 samples/sec   Loss 0.1171   LearningRate 0.0005   Epoch: 18   Global Step: 310720   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:08:15,283-Speed 3341.10 samples/sec   Loss 0.1275   LearningRate 0.0005   Epoch: 18   Global Step: 310730   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:08:18,345-Speed 3344.52 samples/sec   Loss 0.1262   LearningRate 0.0005   Epoch: 18   Global Step: 310740   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:08:21,416-Speed 3334.61 samples/sec   Loss 0.1319   LearningRate 0.0005   Epoch: 18   Global Step: 310750   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:08:24,493-Speed 3328.74 samples/sec   Loss 0.1162   LearningRate 0.0005   Epoch: 18   Global Step: 310760   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:08:27,595-Speed 3301.77 samples/sec   Loss 0.1176   LearningRate 0.0005   Epoch: 18   Global Step: 310770   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:08:30,687-Speed 3312.64 samples/sec   Loss 0.1287   LearningRate 0.0005   Epoch: 18   Global Step: 310780   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:08:33,791-Speed 3300.11 samples/sec   Loss 0.1304   LearningRate 0.0005   Epoch: 18   Global Step: 310790   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:08:36,877-Speed 3318.69 samples/sec   Loss 0.1242   LearningRate 0.0005   Epoch: 18   Global Step: 310800   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:08:39,943-Speed 3341.01 samples/sec   Loss 0.1306   LearningRate 0.0005   Epoch: 18   Global Step: 310810   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-12 08:08:43,011-Speed 3338.35 samples/sec   Loss 0.1326   LearningRate 0.0005   Epoch: 18   Global Step: 310820   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-12 08:08:46,112-Speed 3303.32 samples/sec   Loss 0.1311   LearningRate 0.0005   Epoch: 18   Global Step: 310830   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-12 08:08:49,217-Speed 3297.79 samples/sec   Loss 0.1254   LearningRate 0.0005   Epoch: 18   Global Step: 310840   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-12 08:08:52,330-Speed 3290.94 samples/sec   Loss 0.1302   LearningRate 0.0005   Epoch: 18   Global Step: 310850   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-12 08:08:55,435-Speed 3298.48 samples/sec   Loss 0.1255   LearningRate 0.0005   Epoch: 18   Global Step: 310860   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-12 08:08:58,511-Speed 3329.62 samples/sec   Loss 0.1135   LearningRate 0.0005   Epoch: 18   Global Step: 310870   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-12 08:09:01,583-Speed 3333.37 samples/sec   Loss 0.1218   LearningRate 0.0005   Epoch: 18   Global Step: 310880   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-12 08:09:04,754-Speed 3230.52 samples/sec   Loss 0.1221   LearningRate 0.0005   Epoch: 18   Global Step: 310890   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-12 08:09:07,833-Speed 3326.54 samples/sec   Loss 0.1140   LearningRate 0.0005   Epoch: 18   Global Step: 310900   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-12 08:09:10,907-Speed 3332.65 samples/sec   Loss 0.1188   LearningRate 0.0005   Epoch: 18   Global Step: 310910   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:09:13,989-Speed 3322.82 samples/sec   Loss 0.1276   LearningRate 0.0005   Epoch: 18   Global Step: 310920   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:09:17,056-Speed 3338.96 samples/sec   Loss 0.1300   LearningRate 0.0005   Epoch: 18   Global Step: 310930   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:09:20,133-Speed 3328.77 samples/sec   Loss 0.1279   LearningRate 0.0005   Epoch: 18   Global Step: 310940   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:09:23,211-Speed 3328.44 samples/sec   Loss 0.1222   LearningRate 0.0005   Epoch: 18   Global Step: 310950   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:09:26,305-Speed 3309.49 samples/sec   Loss 0.1297   LearningRate 0.0005   Epoch: 18   Global Step: 310960   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:09:29,369-Speed 3342.84 samples/sec   Loss 0.1162   LearningRate 0.0005   Epoch: 18   Global Step: 310970   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:09:32,439-Speed 3336.39 samples/sec   Loss 0.1232   LearningRate 0.0005   Epoch: 18   Global Step: 310980   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:09:35,622-Speed 3218.32 samples/sec   Loss 0.1203   LearningRate 0.0005   Epoch: 18   Global Step: 310990   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:09:38,743-Speed 3281.41 samples/sec   Loss 0.1183   LearningRate 0.0005   Epoch: 18   Global Step: 311000   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:09:41,825-Speed 3323.45 samples/sec   Loss 0.1285   LearningRate 0.0005   Epoch: 18   Global Step: 311010   Fp16 Grad Scale: 262144   Required: 2 hours
Training: 2022-04-12 08:09:44,902-Speed 3328.80 samples/sec   Loss 0.1226   LearningRate 0.0005   Epoch: 18   Global Step: 311020   Fp16 Grad Scale: 262144   Required: 2 hours
Training: 2022-04-12 08:09:48,029-Speed 3275.19 samples/sec   Loss 0.1215   LearningRate 0.0005   Epoch: 18   Global Step: 311030   Fp16 Grad Scale: 262144   Required: 2 hours
Training: 2022-04-12 08:09:51,140-Speed 3291.75 samples/sec   Loss 0.1273   LearningRate 0.0005   Epoch: 18   Global Step: 311040   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:09:54,260-Speed 3283.43 samples/sec   Loss 0.1315   LearningRate 0.0005   Epoch: 18   Global Step: 311050   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:09:57,368-Speed 3294.80 samples/sec   Loss 0.1350   LearningRate 0.0005   Epoch: 18   Global Step: 311060   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:10:00,468-Speed 3304.85 samples/sec   Loss 0.1275   LearningRate 0.0005   Epoch: 18   Global Step: 311070   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:10:03,555-Speed 3317.20 samples/sec   Loss 0.1187   LearningRate 0.0005   Epoch: 18   Global Step: 311080   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:10:06,651-Speed 3308.44 samples/sec   Loss 0.1247   LearningRate 0.0005   Epoch: 18   Global Step: 311090   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:10:09,719-Speed 3338.56 samples/sec   Loss 0.1273   LearningRate 0.0005   Epoch: 18   Global Step: 311100   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:10:12,789-Speed 3336.66 samples/sec   Loss 0.1269   LearningRate 0.0005   Epoch: 18   Global Step: 311110   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:10:15,896-Speed 3296.24 samples/sec   Loss 0.1295   LearningRate 0.0005   Epoch: 18   Global Step: 311120   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:10:18,968-Speed 3334.25 samples/sec   Loss 0.1319   LearningRate 0.0005   Epoch: 18   Global Step: 311130   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:10:22,031-Speed 3343.82 samples/sec   Loss 0.1375   LearningRate 0.0005   Epoch: 18   Global Step: 311140   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:10:25,093-Speed 3344.48 samples/sec   Loss 0.1295   LearningRate 0.0005   Epoch: 18   Global Step: 311150   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:10:28,166-Speed 3333.40 samples/sec   Loss 0.1188   LearningRate 0.0005   Epoch: 18   Global Step: 311160   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:10:31,274-Speed 3296.06 samples/sec   Loss 0.1203   LearningRate 0.0005   Epoch: 18   Global Step: 311170   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:10:34,352-Speed 3326.93 samples/sec   Loss 0.1241   LearningRate 0.0005   Epoch: 18   Global Step: 311180   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:10:37,450-Speed 3305.99 samples/sec   Loss 0.1273   LearningRate 0.0005   Epoch: 18   Global Step: 311190   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:10:40,536-Speed 3319.50 samples/sec   Loss 0.1165   LearningRate 0.0005   Epoch: 18   Global Step: 311200   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:10:43,605-Speed 3337.28 samples/sec   Loss 0.1292   LearningRate 0.0005   Epoch: 18   Global Step: 311210   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-12 08:10:46,689-Speed 3321.30 samples/sec   Loss 0.1238   LearningRate 0.0005   Epoch: 18   Global Step: 311220   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-12 08:10:49,765-Speed 3328.74 samples/sec   Loss 0.1215   LearningRate 0.0005   Epoch: 18   Global Step: 311230   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-12 08:10:52,914-Speed 3253.94 samples/sec   Loss 0.1200   LearningRate 0.0005   Epoch: 18   Global Step: 311240   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-12 08:10:56,044-Speed 3271.69 samples/sec   Loss 0.1305   LearningRate 0.0005   Epoch: 18   Global Step: 311250   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-12 08:10:59,117-Speed 3333.75 samples/sec   Loss 0.1181   LearningRate 0.0005   Epoch: 18   Global Step: 311260   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-12 08:11:02,195-Speed 3327.09 samples/sec   Loss 0.1230   LearningRate 0.0005   Epoch: 18   Global Step: 311270   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-12 08:11:05,338-Speed 3258.64 samples/sec   Loss 0.1146   LearningRate 0.0005   Epoch: 18   Global Step: 311280   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-12 08:11:08,415-Speed 3328.13 samples/sec   Loss 0.1177   LearningRate 0.0005   Epoch: 18   Global Step: 311290   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-12 08:11:11,503-Speed 3316.75 samples/sec   Loss 0.1206   LearningRate 0.0005   Epoch: 18   Global Step: 311300   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-12 08:11:14,618-Speed 3288.81 samples/sec   Loss 0.1172   LearningRate 0.0005   Epoch: 18   Global Step: 311310   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:11:17,693-Speed 3330.01 samples/sec   Loss 0.1032   LearningRate 0.0005   Epoch: 18   Global Step: 311320   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:11:20,763-Speed 3337.38 samples/sec   Loss 0.1168   LearningRate 0.0005   Epoch: 18   Global Step: 311330   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:11:23,928-Speed 3236.52 samples/sec   Loss 0.1170   LearningRate 0.0005   Epoch: 18   Global Step: 311340   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:11:27,008-Speed 3324.50 samples/sec   Loss 0.1229   LearningRate 0.0005   Epoch: 18   Global Step: 311350   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:11:30,077-Speed 3337.46 samples/sec   Loss 0.1277   LearningRate 0.0005   Epoch: 18   Global Step: 311360   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:11:33,156-Speed 3327.15 samples/sec   Loss 0.1178   LearningRate 0.0005   Epoch: 18   Global Step: 311370   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:11:36,242-Speed 3318.36 samples/sec   Loss 0.1195   LearningRate 0.0005   Epoch: 18   Global Step: 311380   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:11:39,322-Speed 3325.96 samples/sec   Loss 0.1309   LearningRate 0.0005   Epoch: 18   Global Step: 311390   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:11:42,400-Speed 3326.78 samples/sec   Loss 0.1397   LearningRate 0.0005   Epoch: 18   Global Step: 311400   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:11:45,525-Speed 3278.44 samples/sec   Loss 0.1262   LearningRate 0.0005   Epoch: 18   Global Step: 311410   Fp16 Grad Scale: 262144   Required: 2 hours
Training: 2022-04-12 08:11:48,603-Speed 3327.57 samples/sec   Loss 0.1325   LearningRate 0.0005   Epoch: 18   Global Step: 311420   Fp16 Grad Scale: 262144   Required: 2 hours
Training: 2022-04-12 08:11:51,668-Speed 3340.97 samples/sec   Loss 0.1231   LearningRate 0.0004   Epoch: 18   Global Step: 311430   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:11:54,737-Speed 3337.91 samples/sec   Loss 0.1201   LearningRate 0.0004   Epoch: 18   Global Step: 311440   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:11:57,810-Speed 3332.89 samples/sec   Loss 0.1265   LearningRate 0.0004   Epoch: 18   Global Step: 311450   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:12:00,879-Speed 3336.94 samples/sec   Loss 0.1283   LearningRate 0.0004   Epoch: 18   Global Step: 311460   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:12:03,962-Speed 3321.85 samples/sec   Loss 0.1247   LearningRate 0.0004   Epoch: 18   Global Step: 311470   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:12:07,040-Speed 3327.99 samples/sec   Loss 0.1206   LearningRate 0.0004   Epoch: 18   Global Step: 311480   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:12:10,119-Speed 3327.18 samples/sec   Loss 0.1229   LearningRate 0.0004   Epoch: 18   Global Step: 311490   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:12:13,221-Speed 3302.12 samples/sec   Loss 0.1188   LearningRate 0.0004   Epoch: 18   Global Step: 311500   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:12:16,337-Speed 3286.34 samples/sec   Loss 0.1284   LearningRate 0.0004   Epoch: 18   Global Step: 311510   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:12:19,414-Speed 3329.18 samples/sec   Loss 0.1168   LearningRate 0.0004   Epoch: 18   Global Step: 311520   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:12:22,493-Speed 3326.87 samples/sec   Loss 0.1208   LearningRate 0.0004   Epoch: 18   Global Step: 311530   Fp16 Grad Scale: 262144   Required: 2 hours
Training: 2022-04-12 08:12:25,561-Speed 3337.90 samples/sec   Loss 0.1288   LearningRate 0.0004   Epoch: 18   Global Step: 311540   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:12:28,752-Speed 3209.29 samples/sec   Loss 0.1278   LearningRate 0.0004   Epoch: 18   Global Step: 311550   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:12:31,857-Speed 3298.87 samples/sec   Loss 0.1371   LearningRate 0.0004   Epoch: 18   Global Step: 311560   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:12:35,009-Speed 3249.54 samples/sec   Loss 0.1375   LearningRate 0.0004   Epoch: 18   Global Step: 311570   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:12:38,113-Speed 3299.80 samples/sec   Loss 0.1299   LearningRate 0.0004   Epoch: 18   Global Step: 311580   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:12:41,181-Speed 3338.86 samples/sec   Loss 0.1224   LearningRate 0.0004   Epoch: 18   Global Step: 311590   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:12:44,254-Speed 3333.10 samples/sec   Loss 0.1122   LearningRate 0.0004   Epoch: 18   Global Step: 311600   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:12:47,331-Speed 3328.46 samples/sec   Loss 0.1111   LearningRate 0.0004   Epoch: 18   Global Step: 311610   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:12:50,395-Speed 3343.17 samples/sec   Loss 0.1321   LearningRate 0.0004   Epoch: 18   Global Step: 311620   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:12:53,488-Speed 3311.53 samples/sec   Loss 0.1283   LearningRate 0.0004   Epoch: 18   Global Step: 311630   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:12:56,579-Speed 3313.42 samples/sec   Loss 0.1221   LearningRate 0.0004   Epoch: 18   Global Step: 311640   Fp16 Grad Scale: 262144   Required: 2 hours
Training: 2022-04-12 08:12:59,700-Speed 3280.99 samples/sec   Loss 0.1089   LearningRate 0.0004   Epoch: 18   Global Step: 311650   Fp16 Grad Scale: 262144   Required: 2 hours
Training: 2022-04-12 08:13:02,799-Speed 3305.53 samples/sec   Loss 0.1198   LearningRate 0.0004   Epoch: 18   Global Step: 311660   Fp16 Grad Scale: 262144   Required: 2 hours
Training: 2022-04-12 08:13:05,908-Speed 3294.68 samples/sec   Loss 0.1230   LearningRate 0.0004   Epoch: 18   Global Step: 311670   Fp16 Grad Scale: 262144   Required: 2 hours
Training: 2022-04-12 08:13:08,971-Speed 3344.46 samples/sec   Loss 0.1213   LearningRate 0.0004   Epoch: 18   Global Step: 311680   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:13:12,041-Speed 3335.46 samples/sec   Loss 0.1269   LearningRate 0.0004   Epoch: 18   Global Step: 311690   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:13:15,106-Speed 3341.88 samples/sec   Loss 0.1262   LearningRate 0.0004   Epoch: 18   Global Step: 311700   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:13:18,226-Speed 3282.84 samples/sec   Loss 0.1176   LearningRate 0.0004   Epoch: 18   Global Step: 311710   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:13:21,303-Speed 3328.48 samples/sec   Loss 0.1191   LearningRate 0.0004   Epoch: 18   Global Step: 311720   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:13:24,366-Speed 3344.19 samples/sec   Loss 0.1204   LearningRate 0.0004   Epoch: 18   Global Step: 311730   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:13:27,570-Speed 3196.57 samples/sec   Loss 0.1168   LearningRate 0.0004   Epoch: 18   Global Step: 311740   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:13:30,710-Speed 3262.23 samples/sec   Loss 0.1415   LearningRate 0.0004   Epoch: 18   Global Step: 311750   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:13:33,847-Speed 3264.81 samples/sec   Loss 0.1309   LearningRate 0.0004   Epoch: 18   Global Step: 311760   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:13:36,913-Speed 3340.26 samples/sec   Loss 0.1198   LearningRate 0.0004   Epoch: 18   Global Step: 311770   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:13:40,000-Speed 3318.29 samples/sec   Loss 0.1181   LearningRate 0.0004   Epoch: 18   Global Step: 311780   Fp16 Grad Scale: 262144   Required: 2 hours
Training: 2022-04-12 08:13:43,084-Speed 3321.27 samples/sec   Loss 0.1156   LearningRate 0.0004   Epoch: 18   Global Step: 311790   Fp16 Grad Scale: 262144   Required: 2 hours
Training: 2022-04-12 08:13:46,140-Speed 3351.31 samples/sec   Loss 0.1349   LearningRate 0.0004   Epoch: 18   Global Step: 311800   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:13:49,206-Speed 3341.01 samples/sec   Loss 0.1137   LearningRate 0.0004   Epoch: 18   Global Step: 311810   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:13:52,274-Speed 3338.41 samples/sec   Loss 0.1393   LearningRate 0.0004   Epoch: 18   Global Step: 311820   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:13:55,348-Speed 3331.80 samples/sec   Loss 0.1271   LearningRate 0.0004   Epoch: 18   Global Step: 311830   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:13:58,438-Speed 3314.76 samples/sec   Loss 0.1129   LearningRate 0.0004   Epoch: 18   Global Step: 311840   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:14:01,520-Speed 3323.17 samples/sec   Loss 0.1200   LearningRate 0.0004   Epoch: 18   Global Step: 311850   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:14:04,594-Speed 3331.94 samples/sec   Loss 0.1238   LearningRate 0.0004   Epoch: 18   Global Step: 311860   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:14:07,689-Speed 3310.01 samples/sec   Loss 0.1178   LearningRate 0.0004   Epoch: 18   Global Step: 311870   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:14:10,763-Speed 3331.69 samples/sec   Loss 0.1177   LearningRate 0.0004   Epoch: 18   Global Step: 311880   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:14:13,829-Speed 3340.11 samples/sec   Loss 0.1247   LearningRate 0.0004   Epoch: 18   Global Step: 311890   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:14:16,906-Speed 3329.07 samples/sec   Loss 0.1169   LearningRate 0.0004   Epoch: 18   Global Step: 311900   Fp16 Grad Scale: 262144   Required: 2 hours
Training: 2022-04-12 08:14:19,978-Speed 3333.73 samples/sec   Loss 0.1230   LearningRate 0.0004   Epoch: 18   Global Step: 311910   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:14:23,056-Speed 3327.73 samples/sec   Loss 0.1286   LearningRate 0.0004   Epoch: 18   Global Step: 311920   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:14:26,136-Speed 3325.04 samples/sec   Loss 0.1184   LearningRate 0.0004   Epoch: 18   Global Step: 311930   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:14:29,219-Speed 3322.17 samples/sec   Loss 0.1241   LearningRate 0.0004   Epoch: 18   Global Step: 311940   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:14:32,308-Speed 3316.11 samples/sec   Loss 0.1241   LearningRate 0.0004   Epoch: 18   Global Step: 311950   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:14:35,374-Speed 3341.29 samples/sec   Loss 0.1290   LearningRate 0.0004   Epoch: 18   Global Step: 311960   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:14:38,436-Speed 3343.99 samples/sec   Loss 0.1276   LearningRate 0.0004   Epoch: 18   Global Step: 311970   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-12 08:14:41,501-Speed 3341.53 samples/sec   Loss 0.1192   LearningRate 0.0004   Epoch: 18   Global Step: 311980   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-12 08:14:44,570-Speed 3337.64 samples/sec   Loss 0.1150   LearningRate 0.0004   Epoch: 18   Global Step: 311990   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-12 08:14:47,639-Speed 3337.64 samples/sec   Loss 0.1200   LearningRate 0.0004   Epoch: 18   Global Step: 312000   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-12 08:15:31,203-[lfw][312000]XNorm: 20.808169
Training: 2022-04-12 08:15:31,204-[lfw][312000]Accuracy-Flip: 0.99783+-0.00259
Training: 2022-04-12 08:15:31,204-[lfw][312000]Accuracy-Highest: 0.99817
Training: 2022-04-12 08:16:21,853-[cfp_fp][312000]XNorm: 22.615881
Training: 2022-04-12 08:16:21,853-[cfp_fp][312000]Accuracy-Flip: 0.99171+-0.00382
Training: 2022-04-12 08:16:21,854-[cfp_fp][312000]Accuracy-Highest: 0.99200
Training: 2022-04-12 08:17:05,602-[agedb_30][312000]XNorm: 22.862102
Training: 2022-04-12 08:17:05,603-[agedb_30][312000]Accuracy-Flip: 0.98633+-0.00562
Training: 2022-04-12 08:17:05,603-[agedb_30][312000]Accuracy-Highest: 0.98650
Training: 2022-04-12 08:17:08,673-Speed 72.61 samples/sec   Loss 0.1255   LearningRate 0.0004   Epoch: 18   Global Step: 312010   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-12 08:17:11,765-Speed 3312.90 samples/sec   Loss 0.1355   LearningRate 0.0004   Epoch: 18   Global Step: 312020   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-12 08:17:14,823-Speed 3349.21 samples/sec   Loss 0.1204   LearningRate 0.0004   Epoch: 18   Global Step: 312030   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-12 08:17:17,913-Speed 3314.97 samples/sec   Loss 0.1212   LearningRate 0.0004   Epoch: 18   Global Step: 312040   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-12 08:17:20,973-Speed 3346.63 samples/sec   Loss 0.1296   LearningRate 0.0004   Epoch: 18   Global Step: 312050   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-12 08:17:24,055-Speed 3323.77 samples/sec   Loss 0.1234   LearningRate 0.0004   Epoch: 18   Global Step: 312060   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-12 08:17:27,259-Speed 3196.69 samples/sec   Loss 0.1192   LearningRate 0.0004   Epoch: 18   Global Step: 312070   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:17:30,394-Speed 3267.26 samples/sec   Loss 0.1222   LearningRate 0.0004   Epoch: 18   Global Step: 312080   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:17:33,457-Speed 3343.40 samples/sec   Loss 0.1299   LearningRate 0.0004   Epoch: 18   Global Step: 312090   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:17:36,522-Speed 3341.55 samples/sec   Loss 0.1239   LearningRate 0.0004   Epoch: 18   Global Step: 312100   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:17:39,633-Speed 3292.48 samples/sec   Loss 0.1233   LearningRate 0.0004   Epoch: 18   Global Step: 312110   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:17:42,706-Speed 3333.30 samples/sec   Loss 0.1180   LearningRate 0.0004   Epoch: 18   Global Step: 312120   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:17:45,862-Speed 3245.49 samples/sec   Loss 0.1158   LearningRate 0.0004   Epoch: 18   Global Step: 312130   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:17:48,960-Speed 3305.53 samples/sec   Loss 0.1147   LearningRate 0.0004   Epoch: 18   Global Step: 312140   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:17:52,056-Speed 3308.58 samples/sec   Loss 0.1080   LearningRate 0.0004   Epoch: 18   Global Step: 312150   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:17:55,124-Speed 3338.71 samples/sec   Loss 0.1329   LearningRate 0.0004   Epoch: 18   Global Step: 312160   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:17:58,183-Speed 3348.44 samples/sec   Loss 0.1201   LearningRate 0.0004   Epoch: 18   Global Step: 312170   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:18:01,259-Speed 3329.26 samples/sec   Loss 0.1212   LearningRate 0.0004   Epoch: 18   Global Step: 312180   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:18:04,322-Speed 3343.89 samples/sec   Loss 0.1232   LearningRate 0.0004   Epoch: 18   Global Step: 312190   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:18:07,392-Speed 3336.65 samples/sec   Loss 0.1325   LearningRate 0.0004   Epoch: 18   Global Step: 312200   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:18:10,540-Speed 3253.57 samples/sec   Loss 0.1211   LearningRate 0.0004   Epoch: 18   Global Step: 312210   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:18:13,673-Speed 3268.83 samples/sec   Loss 0.1313   LearningRate 0.0004   Epoch: 18   Global Step: 312220   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:18:16,819-Speed 3255.31 samples/sec   Loss 0.1314   LearningRate 0.0004   Epoch: 18   Global Step: 312230   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:18:19,942-Speed 3280.61 samples/sec   Loss 0.1187   LearningRate 0.0004   Epoch: 18   Global Step: 312240   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:18:23,092-Speed 3251.03 samples/sec   Loss 0.1174   LearningRate 0.0004   Epoch: 18   Global Step: 312250   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:18:26,160-Speed 3338.37 samples/sec   Loss 0.1184   LearningRate 0.0004   Epoch: 18   Global Step: 312260   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:18:29,225-Speed 3341.97 samples/sec   Loss 0.1214   LearningRate 0.0004   Epoch: 18   Global Step: 312270   Fp16 Grad Scale: 262144   Required: 2 hours
Training: 2022-04-12 08:18:32,358-Speed 3269.28 samples/sec   Loss 0.1254   LearningRate 0.0004   Epoch: 18   Global Step: 312280   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:18:35,498-Speed 3261.39 samples/sec   Loss 0.1098   LearningRate 0.0004   Epoch: 18   Global Step: 312290   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:18:38,756-Speed 3144.45 samples/sec   Loss 0.1199   LearningRate 0.0004   Epoch: 18   Global Step: 312300   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:18:41,832-Speed 3329.36 samples/sec   Loss 0.1215   LearningRate 0.0004   Epoch: 18   Global Step: 312310   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:18:44,898-Speed 3340.83 samples/sec   Loss 0.1258   LearningRate 0.0004   Epoch: 18   Global Step: 312320   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:18:47,967-Speed 3338.19 samples/sec   Loss 0.1159   LearningRate 0.0004   Epoch: 18   Global Step: 312330   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:18:51,020-Speed 3354.35 samples/sec   Loss 0.1363   LearningRate 0.0004   Epoch: 18   Global Step: 312340   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-12 08:18:54,117-Speed 3306.61 samples/sec   Loss 0.1259   LearningRate 0.0004   Epoch: 18   Global Step: 312350   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-12 08:18:57,333-Speed 3184.79 samples/sec   Loss 0.1273   LearningRate 0.0004   Epoch: 18   Global Step: 312360   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-12 08:19:00,437-Speed 3300.46 samples/sec   Loss 0.1274   LearningRate 0.0004   Epoch: 18   Global Step: 312370   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-12 08:19:03,571-Speed 3268.24 samples/sec   Loss 0.1218   LearningRate 0.0004   Epoch: 18   Global Step: 312380   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-12 08:19:06,649-Speed 3327.56 samples/sec   Loss 0.1177   LearningRate 0.0004   Epoch: 18   Global Step: 312390   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-12 08:19:09,711-Speed 3344.42 samples/sec   Loss 0.1268   LearningRate 0.0004   Epoch: 18   Global Step: 312400   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-12 08:19:12,780-Speed 3337.79 samples/sec   Loss 0.1177   LearningRate 0.0004   Epoch: 18   Global Step: 312410   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-12 08:19:15,879-Speed 3304.93 samples/sec   Loss 0.1263   LearningRate 0.0004   Epoch: 18   Global Step: 312420   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-12 08:19:18,942-Speed 3344.73 samples/sec   Loss 0.1208   LearningRate 0.0004   Epoch: 18   Global Step: 312430   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-12 08:19:22,004-Speed 3344.29 samples/sec   Loss 0.1228   LearningRate 0.0004   Epoch: 18   Global Step: 312440   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:19:25,076-Speed 3334.18 samples/sec   Loss 0.1305   LearningRate 0.0004   Epoch: 18   Global Step: 312450   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:19:28,175-Speed 3304.83 samples/sec   Loss 0.1182   LearningRate 0.0004   Epoch: 18   Global Step: 312460   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:19:31,241-Speed 3340.74 samples/sec   Loss 0.1287   LearningRate 0.0004   Epoch: 18   Global Step: 312470   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:19:34,316-Speed 3331.06 samples/sec   Loss 0.1315   LearningRate 0.0004   Epoch: 18   Global Step: 312480   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-12 08:19:37,424-Speed 3295.22 samples/sec   Loss 0.1222   LearningRate 0.0004   Epoch: 18   Global Step: 312490   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-12 08:19:40,574-Speed 3251.03 samples/sec   Loss 0.1212   LearningRate 0.0004   Epoch: 18   Global Step: 312500   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-12 08:19:43,639-Speed 3342.88 samples/sec   Loss 0.1228   LearningRate 0.0004   Epoch: 18   Global Step: 312510   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-12 08:19:46,776-Speed 3264.88 samples/sec   Loss 0.1240   LearningRate 0.0004   Epoch: 18   Global Step: 312520   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-12 08:19:49,887-Speed 3291.56 samples/sec   Loss 0.1159   LearningRate 0.0004   Epoch: 18   Global Step: 312530   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-12 08:19:52,971-Speed 3320.97 samples/sec   Loss 0.1296   LearningRate 0.0004   Epoch: 18   Global Step: 312540   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-12 08:19:56,047-Speed 3329.99 samples/sec   Loss 0.1294   LearningRate 0.0004   Epoch: 18   Global Step: 312550   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-12 08:19:59,135-Speed 3328.26 samples/sec   Loss 0.1201   LearningRate 0.0004   Epoch: 18   Global Step: 312560   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-12 08:20:02,243-Speed 3295.73 samples/sec   Loss 0.1230   LearningRate 0.0004   Epoch: 18   Global Step: 312570   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-12 08:20:05,315-Speed 3333.48 samples/sec   Loss 0.1244   LearningRate 0.0004   Epoch: 18   Global Step: 312580   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:20:08,398-Speed 3323.11 samples/sec   Loss 0.1228   LearningRate 0.0004   Epoch: 18   Global Step: 312590   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:20:11,520-Speed 3279.80 samples/sec   Loss 0.1218   LearningRate 0.0004   Epoch: 18   Global Step: 312600   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:20:14,701-Speed 3220.73 samples/sec   Loss 0.1230   LearningRate 0.0004   Epoch: 18   Global Step: 312610   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:20:17,872-Speed 3229.08 samples/sec   Loss 0.1121   LearningRate 0.0004   Epoch: 18   Global Step: 312620   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:20:20,993-Speed 3282.13 samples/sec   Loss 0.1266   LearningRate 0.0004   Epoch: 18   Global Step: 312630   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:20:24,076-Speed 3321.77 samples/sec   Loss 0.1187   LearningRate 0.0004   Epoch: 18   Global Step: 312640   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:20:27,139-Speed 3344.18 samples/sec   Loss 0.1114   LearningRate 0.0004   Epoch: 18   Global Step: 312650   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:20:30,253-Speed 3289.38 samples/sec   Loss 0.1261   LearningRate 0.0004   Epoch: 18   Global Step: 312660   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:20:33,317-Speed 3343.49 samples/sec   Loss 0.1131   LearningRate 0.0004   Epoch: 18   Global Step: 312670   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:20:36,394-Speed 3328.65 samples/sec   Loss 0.1222   LearningRate 0.0004   Epoch: 18   Global Step: 312680   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:20:39,461-Speed 3338.91 samples/sec   Loss 0.1246   LearningRate 0.0004   Epoch: 18   Global Step: 312690   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:20:42,531-Speed 3336.97 samples/sec   Loss 0.1164   LearningRate 0.0004   Epoch: 18   Global Step: 312700   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:20:45,635-Speed 3299.33 samples/sec   Loss 0.1312   LearningRate 0.0004   Epoch: 18   Global Step: 312710   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:20:48,724-Speed 3316.55 samples/sec   Loss 0.1314   LearningRate 0.0004   Epoch: 18   Global Step: 312720   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:20:51,819-Speed 3308.73 samples/sec   Loss 0.1233   LearningRate 0.0004   Epoch: 18   Global Step: 312730   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:20:54,890-Speed 3334.74 samples/sec   Loss 0.1220   LearningRate 0.0004   Epoch: 18   Global Step: 312740   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:20:58,047-Speed 3244.46 samples/sec   Loss 0.1263   LearningRate 0.0004   Epoch: 18   Global Step: 312750   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:21:01,122-Speed 3331.78 samples/sec   Loss 0.1281   LearningRate 0.0004   Epoch: 18   Global Step: 312760   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:21:04,203-Speed 3324.47 samples/sec   Loss 0.1189   LearningRate 0.0004   Epoch: 18   Global Step: 312770   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:21:07,258-Speed 3351.66 samples/sec   Loss 0.1244   LearningRate 0.0004   Epoch: 18   Global Step: 312780   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:21:10,324-Speed 3340.63 samples/sec   Loss 0.1303   LearningRate 0.0004   Epoch: 18   Global Step: 312790   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:21:13,392-Speed 3338.48 samples/sec   Loss 0.1171   LearningRate 0.0004   Epoch: 18   Global Step: 312800   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:21:16,499-Speed 3297.34 samples/sec   Loss 0.1332   LearningRate 0.0004   Epoch: 18   Global Step: 312810   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:21:19,598-Speed 3305.27 samples/sec   Loss 0.1320   LearningRate 0.0004   Epoch: 18   Global Step: 312820   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:21:22,705-Speed 3295.56 samples/sec   Loss 0.1315   LearningRate 0.0004   Epoch: 18   Global Step: 312830   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:21:25,816-Speed 3293.81 samples/sec   Loss 0.1111   LearningRate 0.0004   Epoch: 18   Global Step: 312840   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:21:28,997-Speed 3219.64 samples/sec   Loss 0.1206   LearningRate 0.0004   Epoch: 18   Global Step: 312850   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:21:32,068-Speed 3335.36 samples/sec   Loss 0.1166   LearningRate 0.0004   Epoch: 18   Global Step: 312860   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:21:35,136-Speed 3338.55 samples/sec   Loss 0.1216   LearningRate 0.0004   Epoch: 18   Global Step: 312870   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:21:38,200-Speed 3342.50 samples/sec   Loss 0.1184   LearningRate 0.0004   Epoch: 18   Global Step: 312880   Fp16 Grad Scale: 262144   Required: 2 hours
Training: 2022-04-12 08:21:41,271-Speed 3335.39 samples/sec   Loss 0.1166   LearningRate 0.0004   Epoch: 18   Global Step: 312890   Fp16 Grad Scale: 262144   Required: 2 hours
Training: 2022-04-12 08:21:44,325-Speed 3352.74 samples/sec   Loss 0.1231   LearningRate 0.0004   Epoch: 18   Global Step: 312900   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:21:47,415-Speed 3314.74 samples/sec   Loss 0.1239   LearningRate 0.0004   Epoch: 18   Global Step: 312910   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:21:50,482-Speed 3340.25 samples/sec   Loss 0.1270   LearningRate 0.0004   Epoch: 18   Global Step: 312920   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:21:53,555-Speed 3332.68 samples/sec   Loss 0.1275   LearningRate 0.0004   Epoch: 18   Global Step: 312930   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:21:56,686-Speed 3272.14 samples/sec   Loss 0.1260   LearningRate 0.0004   Epoch: 18   Global Step: 312940   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:21:59,891-Speed 3194.88 samples/sec   Loss 0.1261   LearningRate 0.0004   Epoch: 18   Global Step: 312950   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:22:02,987-Speed 3308.87 samples/sec   Loss 0.1218   LearningRate 0.0004   Epoch: 18   Global Step: 312960   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:22:06,057-Speed 3336.25 samples/sec   Loss 0.1326   LearningRate 0.0004   Epoch: 18   Global Step: 312970   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:22:09,162-Speed 3297.86 samples/sec   Loss 0.1219   LearningRate 0.0004   Epoch: 18   Global Step: 312980   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:22:12,235-Speed 3333.81 samples/sec   Loss 0.1259   LearningRate 0.0004   Epoch: 18   Global Step: 312990   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:22:15,386-Speed 3250.12 samples/sec   Loss 0.1301   LearningRate 0.0004   Epoch: 18   Global Step: 313000   Fp16 Grad Scale: 262144   Required: 2 hours
Training: 2022-04-12 08:22:18,460-Speed 3332.01 samples/sec   Loss 0.1220   LearningRate 0.0004   Epoch: 18   Global Step: 313010   Fp16 Grad Scale: 262144   Required: 2 hours
Training: 2022-04-12 08:22:21,515-Speed 3352.54 samples/sec   Loss 0.1232   LearningRate 0.0004   Epoch: 18   Global Step: 313020   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:22:24,592-Speed 3328.53 samples/sec   Loss 0.1235   LearningRate 0.0004   Epoch: 18   Global Step: 313030   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:22:27,657-Speed 3342.53 samples/sec   Loss 0.1264   LearningRate 0.0004   Epoch: 18   Global Step: 313040   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:22:30,741-Speed 3320.49 samples/sec   Loss 0.1246   LearningRate 0.0004   Epoch: 18   Global Step: 313050   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:22:33,806-Speed 3342.10 samples/sec   Loss 0.1201   LearningRate 0.0004   Epoch: 18   Global Step: 313060   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:22:36,879-Speed 3333.05 samples/sec   Loss 0.1108   LearningRate 0.0004   Epoch: 18   Global Step: 313070   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:22:39,953-Speed 3331.46 samples/sec   Loss 0.1220   LearningRate 0.0004   Epoch: 18   Global Step: 313080   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:22:43,034-Speed 3324.66 samples/sec   Loss 0.1292   LearningRate 0.0004   Epoch: 18   Global Step: 313090   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:22:46,128-Speed 3310.43 samples/sec   Loss 0.1252   LearningRate 0.0004   Epoch: 18   Global Step: 313100   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:22:49,231-Speed 3300.92 samples/sec   Loss 0.1179   LearningRate 0.0004   Epoch: 18   Global Step: 313110   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:22:52,307-Speed 3330.12 samples/sec   Loss 0.1383   LearningRate 0.0004   Epoch: 18   Global Step: 313120   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:22:55,467-Speed 3240.24 samples/sec   Loss 0.1166   LearningRate 0.0004   Epoch: 18   Global Step: 313130   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:22:58,567-Speed 3304.26 samples/sec   Loss 0.1105   LearningRate 0.0004   Epoch: 18   Global Step: 313140   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:23:01,660-Speed 3311.91 samples/sec   Loss 0.1259   LearningRate 0.0004   Epoch: 18   Global Step: 313150   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:23:04,802-Speed 3259.38 samples/sec   Loss 0.1248   LearningRate 0.0004   Epoch: 18   Global Step: 313160   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:23:08,011-Speed 3191.90 samples/sec   Loss 0.1140   LearningRate 0.0004   Epoch: 18   Global Step: 313170   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:23:11,086-Speed 3330.48 samples/sec   Loss 0.1131   LearningRate 0.0004   Epoch: 18   Global Step: 313180   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:23:14,158-Speed 3334.21 samples/sec   Loss 0.1199   LearningRate 0.0004   Epoch: 18   Global Step: 313190   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:23:17,313-Speed 3246.29 samples/sec   Loss 0.1219   LearningRate 0.0004   Epoch: 18   Global Step: 313200   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:23:20,412-Speed 3305.20 samples/sec   Loss 0.1132   LearningRate 0.0004   Epoch: 18   Global Step: 313210   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:23:23,571-Speed 3242.14 samples/sec   Loss 0.1197   LearningRate 0.0004   Epoch: 18   Global Step: 313220   Fp16 Grad Scale: 262144   Required: 2 hours
Training: 2022-04-12 08:23:26,659-Speed 3317.36 samples/sec   Loss 0.1259   LearningRate 0.0004   Epoch: 18   Global Step: 313230   Fp16 Grad Scale: 262144   Required: 2 hours
Training: 2022-04-12 08:23:29,851-Speed 3208.69 samples/sec   Loss 0.1234   LearningRate 0.0004   Epoch: 18   Global Step: 313240   Fp16 Grad Scale: 262144   Required: 2 hours
Training: 2022-04-12 08:23:33,023-Speed 3229.08 samples/sec   Loss 0.1328   LearningRate 0.0004   Epoch: 18   Global Step: 313250   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:23:36,235-Speed 3188.69 samples/sec   Loss 0.1286   LearningRate 0.0004   Epoch: 18   Global Step: 313260   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:23:39,411-Speed 3225.10 samples/sec   Loss 0.1112   LearningRate 0.0004   Epoch: 18   Global Step: 313270   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:23:42,479-Speed 3338.14 samples/sec   Loss 0.1273   LearningRate 0.0004   Epoch: 18   Global Step: 313280   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:23:45,544-Speed 3341.95 samples/sec   Loss 0.1271   LearningRate 0.0004   Epoch: 18   Global Step: 313290   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-12 08:23:48,649-Speed 3298.71 samples/sec   Loss 0.1236   LearningRate 0.0004   Epoch: 18   Global Step: 313300   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-12 08:23:51,722-Speed 3332.47 samples/sec   Loss 0.1302   LearningRate 0.0004   Epoch: 18   Global Step: 313310   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-12 08:23:54,791-Speed 3337.63 samples/sec   Loss 0.1164   LearningRate 0.0004   Epoch: 18   Global Step: 313320   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-12 08:23:57,963-Speed 3228.67 samples/sec   Loss 0.1199   LearningRate 0.0004   Epoch: 18   Global Step: 313330   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-12 08:24:01,058-Speed 3309.40 samples/sec   Loss 0.1302   LearningRate 0.0004   Epoch: 18   Global Step: 313340   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-12 08:24:04,142-Speed 3321.64 samples/sec   Loss 0.1249   LearningRate 0.0004   Epoch: 18   Global Step: 313350   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-12 08:24:07,213-Speed 3335.19 samples/sec   Loss 0.1295   LearningRate 0.0004   Epoch: 18   Global Step: 313360   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-12 08:24:10,289-Speed 3329.68 samples/sec   Loss 0.1172   LearningRate 0.0004   Epoch: 18   Global Step: 313370   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-12 08:24:13,390-Speed 3303.35 samples/sec   Loss 0.1222   LearningRate 0.0004   Epoch: 18   Global Step: 313380   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-12 08:24:16,462-Speed 3333.61 samples/sec   Loss 0.1187   LearningRate 0.0004   Epoch: 18   Global Step: 313390   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:24:19,541-Speed 3326.12 samples/sec   Loss 0.1177   LearningRate 0.0004   Epoch: 18   Global Step: 313400   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:24:22,608-Speed 3339.83 samples/sec   Loss 0.1282   LearningRate 0.0004   Epoch: 18   Global Step: 313410   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:24:25,678-Speed 3336.21 samples/sec   Loss 0.1245   LearningRate 0.0004   Epoch: 18   Global Step: 313420   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:24:28,764-Speed 3319.63 samples/sec   Loss 0.1140   LearningRate 0.0004   Epoch: 18   Global Step: 313430   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:24:31,834-Speed 3335.83 samples/sec   Loss 0.1166   LearningRate 0.0004   Epoch: 18   Global Step: 313440   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:24:34,897-Speed 3343.88 samples/sec   Loss 0.1275   LearningRate 0.0004   Epoch: 18   Global Step: 313450   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:24:37,970-Speed 3333.67 samples/sec   Loss 0.1233   LearningRate 0.0004   Epoch: 18   Global Step: 313460   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:24:41,037-Speed 3339.13 samples/sec   Loss 0.1233   LearningRate 0.0004   Epoch: 18   Global Step: 313470   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:24:44,158-Speed 3281.95 samples/sec   Loss 0.1222   LearningRate 0.0004   Epoch: 18   Global Step: 313480   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:24:47,239-Speed 3323.87 samples/sec   Loss 0.1204   LearningRate 0.0004   Epoch: 18   Global Step: 313490   Fp16 Grad Scale: 262144   Required: 2 hours
Training: 2022-04-12 08:24:50,317-Speed 3327.66 samples/sec   Loss 0.1180   LearningRate 0.0004   Epoch: 18   Global Step: 313500   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:24:53,387-Speed 3336.91 samples/sec   Loss 0.1180   LearningRate 0.0004   Epoch: 18   Global Step: 313510   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:24:56,463-Speed 3329.16 samples/sec   Loss 0.1160   LearningRate 0.0004   Epoch: 18   Global Step: 313520   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:24:59,538-Speed 3331.62 samples/sec   Loss 0.1166   LearningRate 0.0004   Epoch: 18   Global Step: 313530   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:25:02,622-Speed 3320.48 samples/sec   Loss 0.1259   LearningRate 0.0004   Epoch: 18   Global Step: 313540   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:25:05,690-Speed 3338.21 samples/sec   Loss 0.1164   LearningRate 0.0004   Epoch: 18   Global Step: 313550   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:25:08,758-Speed 3339.10 samples/sec   Loss 0.1352   LearningRate 0.0004   Epoch: 18   Global Step: 313560   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:25:11,946-Speed 3212.17 samples/sec   Loss 0.1261   LearningRate 0.0004   Epoch: 18   Global Step: 313570   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:25:15,075-Speed 3273.73 samples/sec   Loss 0.1313   LearningRate 0.0004   Epoch: 18   Global Step: 313580   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:25:18,193-Speed 3285.16 samples/sec   Loss 0.1380   LearningRate 0.0004   Epoch: 18   Global Step: 313590   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:25:21,365-Speed 3228.44 samples/sec   Loss 0.1202   LearningRate 0.0004   Epoch: 18   Global Step: 313600   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-12 08:25:24,461-Speed 3308.67 samples/sec   Loss 0.1263   LearningRate 0.0004   Epoch: 18   Global Step: 313610   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-12 08:25:27,570-Speed 3294.31 samples/sec   Loss 0.1179   LearningRate 0.0004   Epoch: 18   Global Step: 313620   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-12 08:25:30,661-Speed 3313.54 samples/sec   Loss 0.1334   LearningRate 0.0004   Epoch: 18   Global Step: 313630   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-12 08:25:33,749-Speed 3317.30 samples/sec   Loss 0.1292   LearningRate 0.0004   Epoch: 18   Global Step: 313640   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-12 08:25:36,838-Speed 3315.40 samples/sec   Loss 0.1150   LearningRate 0.0004   Epoch: 18   Global Step: 313650   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-12 08:25:39,939-Speed 3302.53 samples/sec   Loss 0.1308   LearningRate 0.0004   Epoch: 18   Global Step: 313660   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-12 08:25:43,010-Speed 3335.03 samples/sec   Loss 0.1245   LearningRate 0.0004   Epoch: 18   Global Step: 313670   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-12 08:25:46,106-Speed 3308.35 samples/sec   Loss 0.1275   LearningRate 0.0004   Epoch: 18   Global Step: 313680   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-12 08:25:49,189-Speed 3322.43 samples/sec   Loss 0.1149   LearningRate 0.0004   Epoch: 18   Global Step: 313690   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-12 08:25:52,258-Speed 3337.76 samples/sec   Loss 0.1231   LearningRate 0.0004   Epoch: 18   Global Step: 313700   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-12 08:25:55,332-Speed 3332.06 samples/sec   Loss 0.1188   LearningRate 0.0004   Epoch: 18   Global Step: 313710   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-12 08:25:58,432-Speed 3303.82 samples/sec   Loss 0.1300   LearningRate 0.0004   Epoch: 18   Global Step: 313720   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-12 08:26:01,552-Speed 3282.36 samples/sec   Loss 0.1157   LearningRate 0.0004   Epoch: 18   Global Step: 313730   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-12 08:26:04,684-Speed 3270.72 samples/sec   Loss 0.1197   LearningRate 0.0004   Epoch: 18   Global Step: 313740   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-12 08:26:07,795-Speed 3292.06 samples/sec   Loss 0.1242   LearningRate 0.0004   Epoch: 18   Global Step: 313750   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-12 08:26:10,864-Speed 3337.78 samples/sec   Loss 0.1228   LearningRate 0.0004   Epoch: 18   Global Step: 313760   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-12 08:26:13,960-Speed 3308.12 samples/sec   Loss 0.1226   LearningRate 0.0004   Epoch: 18   Global Step: 313770   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-12 08:26:17,109-Speed 3251.91 samples/sec   Loss 0.1241   LearningRate 0.0004   Epoch: 18   Global Step: 313780   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-12 08:26:20,194-Speed 3321.36 samples/sec   Loss 0.1173   LearningRate 0.0004   Epoch: 18   Global Step: 313790   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-12 08:26:23,341-Speed 3253.99 samples/sec   Loss 0.1153   LearningRate 0.0004   Epoch: 18   Global Step: 313800   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:26:26,435-Speed 3310.53 samples/sec   Loss 0.1285   LearningRate 0.0004   Epoch: 18   Global Step: 313810   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:26:29,534-Speed 3304.83 samples/sec   Loss 0.1213   LearningRate 0.0004   Epoch: 18   Global Step: 313820   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:26:32,597-Speed 3343.72 samples/sec   Loss 0.1231   LearningRate 0.0004   Epoch: 18   Global Step: 313830   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:26:35,685-Speed 3317.71 samples/sec   Loss 0.1348   LearningRate 0.0004   Epoch: 18   Global Step: 313840   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:26:38,750-Speed 3341.72 samples/sec   Loss 0.1151   LearningRate 0.0004   Epoch: 18   Global Step: 313850   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:26:41,840-Speed 3314.63 samples/sec   Loss 0.1357   LearningRate 0.0004   Epoch: 18   Global Step: 313860   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:26:44,903-Speed 3342.75 samples/sec   Loss 0.1210   LearningRate 0.0004   Epoch: 18   Global Step: 313870   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:26:47,987-Speed 3322.01 samples/sec   Loss 0.1273   LearningRate 0.0004   Epoch: 18   Global Step: 313880   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:26:51,077-Speed 3315.18 samples/sec   Loss 0.1272   LearningRate 0.0004   Epoch: 18   Global Step: 313890   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:26:54,258-Speed 3219.21 samples/sec   Loss 0.1187   LearningRate 0.0004   Epoch: 18   Global Step: 313900   Fp16 Grad Scale: 262144   Required: 2 hours
Training: 2022-04-12 08:26:57,432-Speed 3227.43 samples/sec   Loss 0.1259   LearningRate 0.0004   Epoch: 18   Global Step: 313910   Fp16 Grad Scale: 262144   Required: 2 hours
Training: 2022-04-12 08:27:00,643-Speed 3189.42 samples/sec   Loss 0.1173   LearningRate 0.0004   Epoch: 18   Global Step: 313920   Fp16 Grad Scale: 262144   Required: 2 hours
Training: 2022-04-12 08:27:03,781-Speed 3263.54 samples/sec   Loss 0.1135   LearningRate 0.0004   Epoch: 18   Global Step: 313930   Fp16 Grad Scale: 262144   Required: 2 hours
Training: 2022-04-12 08:27:06,842-Speed 3346.96 samples/sec   Loss 0.1355   LearningRate 0.0004   Epoch: 18   Global Step: 313940   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:27:09,911-Speed 3337.21 samples/sec   Loss 0.1141   LearningRate 0.0004   Epoch: 18   Global Step: 313950   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:27:12,997-Speed 3319.01 samples/sec   Loss 0.1283   LearningRate 0.0004   Epoch: 18   Global Step: 313960   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:27:16,071-Speed 3332.71 samples/sec   Loss 0.1161   LearningRate 0.0004   Epoch: 18   Global Step: 313970   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:27:19,135-Speed 3342.11 samples/sec   Loss 0.1257   LearningRate 0.0004   Epoch: 18   Global Step: 313980   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:27:22,215-Speed 3326.14 samples/sec   Loss 0.1312   LearningRate 0.0004   Epoch: 18   Global Step: 313990   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:27:25,289-Speed 3332.07 samples/sec   Loss 0.1275   LearningRate 0.0004   Epoch: 18   Global Step: 314000   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:28:09,470-[lfw][314000]XNorm: 20.510956
Training: 2022-04-12 08:28:09,471-[lfw][314000]Accuracy-Flip: 0.99783+-0.00259
Training: 2022-04-12 08:28:09,471-[lfw][314000]Accuracy-Highest: 0.99817
Training: 2022-04-12 08:29:00,618-[cfp_fp][314000]XNorm: 22.500072
Training: 2022-04-12 08:29:00,619-[cfp_fp][314000]Accuracy-Flip: 0.99171+-0.00355
Training: 2022-04-12 08:29:00,619-[cfp_fp][314000]Accuracy-Highest: 0.99200
Training: 2022-04-12 08:29:44,690-[agedb_30][314000]XNorm: 22.635913
Training: 2022-04-12 08:29:44,691-[agedb_30][314000]Accuracy-Flip: 0.98500+-0.00645
Training: 2022-04-12 08:29:44,691-[agedb_30][314000]Accuracy-Highest: 0.98650
Training: 2022-04-12 08:29:47,761-Speed 71.87 samples/sec   Loss 0.1225   LearningRate 0.0004   Epoch: 18   Global Step: 314010   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:29:50,822-Speed 3346.15 samples/sec   Loss 0.1187   LearningRate 0.0004   Epoch: 18   Global Step: 314020   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:29:53,875-Speed 3354.41 samples/sec   Loss 0.1231   LearningRate 0.0004   Epoch: 18   Global Step: 314030   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:29:56,935-Speed 3347.27 samples/sec   Loss 0.1212   LearningRate 0.0004   Epoch: 18   Global Step: 314040   Fp16 Grad Scale: 262144   Required: 2 hours
Training: 2022-04-12 08:29:59,995-Speed 3347.10 samples/sec   Loss 0.1251   LearningRate 0.0004   Epoch: 18   Global Step: 314050   Fp16 Grad Scale: 262144   Required: 2 hours
Training: 2022-04-12 08:30:03,110-Speed 3288.02 samples/sec   Loss 0.1338   LearningRate 0.0004   Epoch: 18   Global Step: 314060   Fp16 Grad Scale: 262144   Required: 2 hours
Training: 2022-04-12 08:30:06,173-Speed 3343.32 samples/sec   Loss 0.1286   LearningRate 0.0004   Epoch: 18   Global Step: 314070   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:30:09,297-Speed 3279.32 samples/sec   Loss 0.1324   LearningRate 0.0003   Epoch: 18   Global Step: 314080   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:30:12,442-Speed 3256.01 samples/sec   Loss 0.1168   LearningRate 0.0003   Epoch: 18   Global Step: 314090   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:30:15,604-Speed 3240.19 samples/sec   Loss 0.1211   LearningRate 0.0003   Epoch: 18   Global Step: 314100   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:30:18,664-Speed 3346.76 samples/sec   Loss 0.1149   LearningRate 0.0003   Epoch: 18   Global Step: 314110   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:30:21,723-Speed 3347.73 samples/sec   Loss 0.1295   LearningRate 0.0003   Epoch: 18   Global Step: 314120   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:30:24,783-Speed 3347.16 samples/sec   Loss 0.1293   LearningRate 0.0003   Epoch: 18   Global Step: 314130   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:30:27,923-Speed 3262.37 samples/sec   Loss 0.1285   LearningRate 0.0003   Epoch: 18   Global Step: 314140   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:30:31,044-Speed 3281.64 samples/sec   Loss 0.1317   LearningRate 0.0003   Epoch: 18   Global Step: 314150   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:30:34,170-Speed 3276.65 samples/sec   Loss 0.1536   LearningRate 0.0003   Epoch: 18   Global Step: 314160   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-12 08:30:37,248-Speed 3327.44 samples/sec   Loss 0.1280   LearningRate 0.0003   Epoch: 18   Global Step: 314170   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-12 08:30:40,318-Speed 3336.18 samples/sec   Loss 0.1187   LearningRate 0.0003   Epoch: 18   Global Step: 314180   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-12 08:30:43,407-Speed 3316.35 samples/sec   Loss 0.1271   LearningRate 0.0003   Epoch: 18   Global Step: 314190   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-12 08:30:46,472-Speed 3341.43 samples/sec   Loss 0.1155   LearningRate 0.0003   Epoch: 18   Global Step: 314200   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-12 08:30:49,542-Speed 3336.49 samples/sec   Loss 0.1329   LearningRate 0.0003   Epoch: 18   Global Step: 314210   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-12 08:30:52,610-Speed 3337.95 samples/sec   Loss 0.1427   LearningRate 0.0003   Epoch: 18   Global Step: 314220   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-12 08:30:55,678-Speed 3338.02 samples/sec   Loss 0.1231   LearningRate 0.0003   Epoch: 18   Global Step: 314230   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-12 08:30:58,746-Speed 3338.52 samples/sec   Loss 0.1218   LearningRate 0.0003   Epoch: 18   Global Step: 314240   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-12 08:31:01,846-Speed 3304.01 samples/sec   Loss 0.1180   LearningRate 0.0003   Epoch: 18   Global Step: 314250   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-12 08:31:04,930-Speed 3321.45 samples/sec   Loss 0.1292   LearningRate 0.0003   Epoch: 18   Global Step: 314260   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:31:08,005-Speed 3331.67 samples/sec   Loss 0.1216   LearningRate 0.0003   Epoch: 18   Global Step: 314270   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:31:11,082-Speed 3328.60 samples/sec   Loss 0.1177   LearningRate 0.0003   Epoch: 18   Global Step: 314280   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:31:14,150-Speed 3337.84 samples/sec   Loss 0.1215   LearningRate 0.0003   Epoch: 18   Global Step: 314290   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:31:17,217-Speed 3339.67 samples/sec   Loss 0.1203   LearningRate 0.0003   Epoch: 18   Global Step: 314300   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:31:20,377-Speed 3240.96 samples/sec   Loss 0.1198   LearningRate 0.0003   Epoch: 18   Global Step: 314310   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:31:23,437-Speed 3347.58 samples/sec   Loss 0.1267   LearningRate 0.0003   Epoch: 18   Global Step: 314320   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-12 08:31:26,519-Speed 3322.66 samples/sec   Loss 0.1245   LearningRate 0.0003   Epoch: 18   Global Step: 314330   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-12 08:31:29,616-Speed 3307.40 samples/sec   Loss 0.1334   LearningRate 0.0003   Epoch: 18   Global Step: 314340   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-12 08:31:32,733-Speed 3286.78 samples/sec   Loss 0.1259   LearningRate 0.0003   Epoch: 18   Global Step: 314350   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-12 08:31:35,842-Speed 3294.71 samples/sec   Loss 0.1321   LearningRate 0.0003   Epoch: 18   Global Step: 314360   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-12 08:31:38,974-Speed 3269.65 samples/sec   Loss 0.1169   LearningRate 0.0003   Epoch: 18   Global Step: 314370   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-12 08:31:42,124-Speed 3251.61 samples/sec   Loss 0.1163   LearningRate 0.0003   Epoch: 18   Global Step: 314380   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-12 08:31:45,243-Speed 3283.68 samples/sec   Loss 0.1221   LearningRate 0.0003   Epoch: 18   Global Step: 314390   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-12 08:31:48,340-Speed 3307.12 samples/sec   Loss 0.1169   LearningRate 0.0003   Epoch: 18   Global Step: 314400   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-12 08:31:51,463-Speed 3280.31 samples/sec   Loss 0.1254   LearningRate 0.0003   Epoch: 18   Global Step: 314410   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-12 08:31:54,587-Speed 3278.15 samples/sec   Loss 0.1258   LearningRate 0.0003   Epoch: 18   Global Step: 314420   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:31:57,721-Speed 3268.88 samples/sec   Loss 0.1228   LearningRate 0.0003   Epoch: 18   Global Step: 314430   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:32:00,962-Speed 3159.69 samples/sec   Loss 0.1226   LearningRate 0.0003   Epoch: 18   Global Step: 314440   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:32:04,099-Speed 3265.63 samples/sec   Loss 0.1257   LearningRate 0.0003   Epoch: 18   Global Step: 314450   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:32:07,216-Speed 3284.93 samples/sec   Loss 0.1164   LearningRate 0.0003   Epoch: 18   Global Step: 314460   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:32:10,285-Speed 3337.59 samples/sec   Loss 0.1195   LearningRate 0.0003   Epoch: 18   Global Step: 314470   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:32:13,372-Speed 3317.72 samples/sec   Loss 0.1270   LearningRate 0.0003   Epoch: 18   Global Step: 314480   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:32:16,440-Speed 3338.47 samples/sec   Loss 0.1295   LearningRate 0.0003   Epoch: 18   Global Step: 314490   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:32:19,508-Speed 3338.18 samples/sec   Loss 0.1245   LearningRate 0.0003   Epoch: 18   Global Step: 314500   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:32:22,593-Speed 3320.96 samples/sec   Loss 0.1280   LearningRate 0.0003   Epoch: 18   Global Step: 314510   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:32:25,666-Speed 3333.24 samples/sec   Loss 0.1248   LearningRate 0.0003   Epoch: 18   Global Step: 314520   Fp16 Grad Scale: 262144   Required: 2 hours
Training: 2022-04-12 08:32:28,764-Speed 3306.07 samples/sec   Loss 0.1214   LearningRate 0.0003   Epoch: 18   Global Step: 314530   Fp16 Grad Scale: 262144   Required: 2 hours
Training: 2022-04-12 08:32:31,920-Speed 3245.30 samples/sec   Loss 0.1288   LearningRate 0.0003   Epoch: 18   Global Step: 314540   Fp16 Grad Scale: 262144   Required: 2 hours
Training: 2022-04-12 08:32:35,038-Speed 3285.63 samples/sec   Loss 0.1248   LearningRate 0.0003   Epoch: 18   Global Step: 314550   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:32:38,147-Speed 3293.81 samples/sec   Loss 0.1324   LearningRate 0.0003   Epoch: 18   Global Step: 314560   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:32:41,234-Speed 3318.37 samples/sec   Loss 0.1322   LearningRate 0.0003   Epoch: 18   Global Step: 314570   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:32:44,404-Speed 3230.41 samples/sec   Loss 0.1237   LearningRate 0.0003   Epoch: 18   Global Step: 314580   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:32:47,523-Speed 3284.68 samples/sec   Loss 0.1188   LearningRate 0.0003   Epoch: 18   Global Step: 314590   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:32:50,593-Speed 3335.66 samples/sec   Loss 0.1290   LearningRate 0.0003   Epoch: 18   Global Step: 314600   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:32:53,659-Speed 3341.31 samples/sec   Loss 0.1074   LearningRate 0.0003   Epoch: 18   Global Step: 314610   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:32:56,743-Speed 3320.34 samples/sec   Loss 0.1216   LearningRate 0.0003   Epoch: 18   Global Step: 314620   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:32:59,858-Speed 3288.61 samples/sec   Loss 0.1127   LearningRate 0.0003   Epoch: 18   Global Step: 314630   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:33:02,999-Speed 3260.95 samples/sec   Loss 0.1256   LearningRate 0.0003   Epoch: 18   Global Step: 314640   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:33:06,120-Speed 3281.37 samples/sec   Loss 0.1141   LearningRate 0.0003   Epoch: 18   Global Step: 314650   Fp16 Grad Scale: 262144   Required: 2 hours
Training: 2022-04-12 08:33:09,253-Speed 3268.82 samples/sec   Loss 0.1181   LearningRate 0.0003   Epoch: 18   Global Step: 314660   Fp16 Grad Scale: 262144   Required: 2 hours
Training: 2022-04-12 08:33:12,350-Speed 3307.42 samples/sec   Loss 0.1259   LearningRate 0.0003   Epoch: 18   Global Step: 314670   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:33:15,448-Speed 3306.16 samples/sec   Loss 0.1254   LearningRate 0.0003   Epoch: 18   Global Step: 314680   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:33:18,518-Speed 3336.30 samples/sec   Loss 0.1343   LearningRate 0.0003   Epoch: 18   Global Step: 314690   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:33:21,741-Speed 3178.22 samples/sec   Loss 0.1091   LearningRate 0.0003   Epoch: 18   Global Step: 314700   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:33:24,825-Speed 3320.90 samples/sec   Loss 0.1253   LearningRate 0.0003   Epoch: 18   Global Step: 314710   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:33:27,910-Speed 3320.75 samples/sec   Loss 0.1350   LearningRate 0.0003   Epoch: 18   Global Step: 314720   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:33:30,988-Speed 3326.76 samples/sec   Loss 0.1253   LearningRate 0.0003   Epoch: 18   Global Step: 314730   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:33:34,049-Speed 3346.85 samples/sec   Loss 0.1273   LearningRate 0.0003   Epoch: 18   Global Step: 314740   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:33:37,111-Speed 3344.06 samples/sec   Loss 0.1221   LearningRate 0.0003   Epoch: 18   Global Step: 314750   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-12 08:33:40,186-Speed 3331.82 samples/sec   Loss 0.1172   LearningRate 0.0003   Epoch: 18   Global Step: 314760   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-12 08:33:43,249-Speed 3343.44 samples/sec   Loss 0.1248   LearningRate 0.0003   Epoch: 18   Global Step: 314770   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-12 08:33:46,335-Speed 3319.43 samples/sec   Loss 0.1256   LearningRate 0.0003   Epoch: 18   Global Step: 314780   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-12 08:33:49,415-Speed 3324.57 samples/sec   Loss 0.1162   LearningRate 0.0003   Epoch: 18   Global Step: 314790   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-12 08:33:52,498-Speed 3322.32 samples/sec   Loss 0.1294   LearningRate 0.0003   Epoch: 18   Global Step: 314800   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-12 08:33:55,637-Speed 3263.74 samples/sec   Loss 0.1281   LearningRate 0.0003   Epoch: 18   Global Step: 314810   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-12 08:33:58,753-Speed 3286.21 samples/sec   Loss 0.1354   LearningRate 0.0003   Epoch: 18   Global Step: 314820   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-12 08:34:01,918-Speed 3236.02 samples/sec   Loss 0.1220   LearningRate 0.0003   Epoch: 18   Global Step: 314830   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-12 08:34:04,997-Speed 3326.74 samples/sec   Loss 0.1226   LearningRate 0.0003   Epoch: 18   Global Step: 314840   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-12 08:34:08,102-Speed 3299.14 samples/sec   Loss 0.1231   LearningRate 0.0003   Epoch: 18   Global Step: 314850   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:34:11,193-Speed 3313.57 samples/sec   Loss 0.1349   LearningRate 0.0003   Epoch: 18   Global Step: 314860   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:34:14,278-Speed 3320.29 samples/sec   Loss 0.1267   LearningRate 0.0003   Epoch: 18   Global Step: 314870   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:34:17,396-Speed 3285.50 samples/sec   Loss 0.1300   LearningRate 0.0003   Epoch: 18   Global Step: 314880   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-12 08:34:20,495-Speed 3304.16 samples/sec   Loss 0.1274   LearningRate 0.0003   Epoch: 18   Global Step: 314890   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-12 08:34:23,673-Speed 3222.70 samples/sec   Loss 0.1108   LearningRate 0.0003   Epoch: 18   Global Step: 314900   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-12 08:34:26,808-Speed 3267.60 samples/sec   Loss 0.1270   LearningRate 0.0003   Epoch: 18   Global Step: 314910   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-12 08:34:29,929-Speed 3281.23 samples/sec   Loss 0.1112   LearningRate 0.0003   Epoch: 18   Global Step: 314920   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-12 08:34:32,999-Speed 3336.18 samples/sec   Loss 0.1310   LearningRate 0.0003   Epoch: 18   Global Step: 314930   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-12 08:34:36,121-Speed 3281.02 samples/sec   Loss 0.1330   LearningRate 0.0003   Epoch: 18   Global Step: 314940   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-12 08:34:39,211-Speed 3314.88 samples/sec   Loss 0.1310   LearningRate 0.0003   Epoch: 18   Global Step: 314950   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-12 08:34:42,396-Speed 3216.33 samples/sec   Loss 0.1148   LearningRate 0.0003   Epoch: 18   Global Step: 314960   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-12 08:34:45,526-Speed 3271.91 samples/sec   Loss 0.1267   LearningRate 0.0003   Epoch: 18   Global Step: 314970   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-12 08:34:48,600-Speed 3332.08 samples/sec   Loss 0.1294   LearningRate 0.0003   Epoch: 18   Global Step: 314980   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:34:51,690-Speed 3314.23 samples/sec   Loss 0.1165   LearningRate 0.0003   Epoch: 18   Global Step: 314990   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:34:54,766-Speed 3329.51 samples/sec   Loss 0.1223   LearningRate 0.0003   Epoch: 18   Global Step: 315000   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:34:57,859-Speed 3312.09 samples/sec   Loss 0.1178   LearningRate 0.0003   Epoch: 18   Global Step: 315010   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:35:00,933-Speed 3331.77 samples/sec   Loss 0.1204   LearningRate 0.0003   Epoch: 18   Global Step: 315020   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:35:04,036-Speed 3301.31 samples/sec   Loss 0.1230   LearningRate 0.0003   Epoch: 18   Global Step: 315030   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:35:07,109-Speed 3333.00 samples/sec   Loss 0.1283   LearningRate 0.0003   Epoch: 18   Global Step: 315040   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:35:10,182-Speed 3332.90 samples/sec   Loss 0.1255   LearningRate 0.0003   Epoch: 18   Global Step: 315050   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:35:13,265-Speed 3322.33 samples/sec   Loss 0.1298   LearningRate 0.0003   Epoch: 18   Global Step: 315060   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:35:16,334-Speed 3336.69 samples/sec   Loss 0.1234   LearningRate 0.0003   Epoch: 18   Global Step: 315070   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:35:19,408-Speed 3332.01 samples/sec   Loss 0.1212   LearningRate 0.0003   Epoch: 18   Global Step: 315080   Fp16 Grad Scale: 262144   Required: 2 hours
Training: 2022-04-12 08:35:22,459-Speed 3356.67 samples/sec   Loss 0.1239   LearningRate 0.0003   Epoch: 18   Global Step: 315090   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:35:25,528-Speed 3338.18 samples/sec   Loss 0.1235   LearningRate 0.0003   Epoch: 18   Global Step: 315100   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:35:28,610-Speed 3323.46 samples/sec   Loss 0.1313   LearningRate 0.0003   Epoch: 18   Global Step: 315110   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:35:31,661-Speed 3356.51 samples/sec   Loss 0.1262   LearningRate 0.0003   Epoch: 18   Global Step: 315120   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-12 08:35:34,729-Speed 3342.60 samples/sec   Loss 0.1254   LearningRate 0.0003   Epoch: 18   Global Step: 315130   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-12 08:35:37,815-Speed 3318.38 samples/sec   Loss 0.1219   LearningRate 0.0003   Epoch: 18   Global Step: 315140   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-12 08:35:40,922-Speed 3296.38 samples/sec   Loss 0.1143   LearningRate 0.0003   Epoch: 18   Global Step: 315150   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-12 08:35:43,989-Speed 3339.71 samples/sec   Loss 0.1199   LearningRate 0.0003   Epoch: 18   Global Step: 315160   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-12 08:35:47,056-Speed 3339.91 samples/sec   Loss 0.1289   LearningRate 0.0003   Epoch: 18   Global Step: 315170   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-12 08:35:50,221-Speed 3236.21 samples/sec   Loss 0.1250   LearningRate 0.0003   Epoch: 18   Global Step: 315180   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-12 08:35:53,417-Speed 3204.66 samples/sec   Loss 0.1246   LearningRate 0.0003   Epoch: 18   Global Step: 315190   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-12 08:35:56,486-Speed 3337.22 samples/sec   Loss 0.1259   LearningRate 0.0003   Epoch: 18   Global Step: 315200   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-12 08:35:59,572-Speed 3319.05 samples/sec   Loss 0.1234   LearningRate 0.0003   Epoch: 18   Global Step: 315210   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-12 08:36:02,649-Speed 3329.35 samples/sec   Loss 0.1199   LearningRate 0.0003   Epoch: 18   Global Step: 315220   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:36:05,707-Speed 3349.21 samples/sec   Loss 0.1226   LearningRate 0.0003   Epoch: 18   Global Step: 315230   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-12 08:36:08,832-Speed 3276.81 samples/sec   Loss 0.1238   LearningRate 0.0003   Epoch: 18   Global Step: 315240   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-12 08:36:11,907-Speed 3331.46 samples/sec   Loss 0.1273   LearningRate 0.0003   Epoch: 18   Global Step: 315250   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-12 08:36:14,975-Speed 3338.56 samples/sec   Loss 0.1305   LearningRate 0.0003   Epoch: 18   Global Step: 315260   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-12 08:36:18,046-Speed 3334.13 samples/sec   Loss 0.1188   LearningRate 0.0003   Epoch: 18   Global Step: 315270   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-12 08:36:21,118-Speed 3335.12 samples/sec   Loss 0.1246   LearningRate 0.0003   Epoch: 18   Global Step: 315280   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-12 08:36:24,197-Speed 3325.90 samples/sec   Loss 0.1316   LearningRate 0.0003   Epoch: 18   Global Step: 315290   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-12 08:36:27,265-Speed 3339.35 samples/sec   Loss 0.1357   LearningRate 0.0003   Epoch: 18   Global Step: 315300   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-12 08:36:30,346-Speed 3323.48 samples/sec   Loss 0.1338   LearningRate 0.0003   Epoch: 18   Global Step: 315310   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-12 08:36:33,416-Speed 3336.73 samples/sec   Loss 0.1220   LearningRate 0.0003   Epoch: 18   Global Step: 315320   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-12 08:36:36,485-Speed 3337.34 samples/sec   Loss 0.1194   LearningRate 0.0003   Epoch: 18   Global Step: 315330   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:36:39,573-Speed 3316.07 samples/sec   Loss 0.1132   LearningRate 0.0003   Epoch: 18   Global Step: 315340   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:36:42,649-Speed 3330.06 samples/sec   Loss 0.1352   LearningRate 0.0003   Epoch: 18   Global Step: 315350   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:36:45,731-Speed 3323.99 samples/sec   Loss 0.1219   LearningRate 0.0003   Epoch: 18   Global Step: 315360   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:36:48,796-Speed 3341.37 samples/sec   Loss 0.1259   LearningRate 0.0003   Epoch: 18   Global Step: 315370   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:36:51,865-Speed 3337.23 samples/sec   Loss 0.1148   LearningRate 0.0003   Epoch: 18   Global Step: 315380   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:36:54,972-Speed 3297.16 samples/sec   Loss 0.1218   LearningRate 0.0003   Epoch: 18   Global Step: 315390   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:36:58,039-Speed 3339.50 samples/sec   Loss 0.1367   LearningRate 0.0003   Epoch: 18   Global Step: 315400   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:37:01,106-Speed 3338.98 samples/sec   Loss 0.1223   LearningRate 0.0003   Epoch: 18   Global Step: 315410   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:37:04,169-Speed 3344.30 samples/sec   Loss 0.1131   LearningRate 0.0003   Epoch: 18   Global Step: 315420   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:37:07,235-Speed 3340.21 samples/sec   Loss 0.1305   LearningRate 0.0003   Epoch: 18   Global Step: 315430   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:37:10,293-Speed 3349.49 samples/sec   Loss 0.1209   LearningRate 0.0003   Epoch: 18   Global Step: 315440   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-12 08:37:13,375-Speed 3323.04 samples/sec   Loss 0.1335   LearningRate 0.0003   Epoch: 18   Global Step: 315450   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-12 08:37:16,481-Speed 3298.51 samples/sec   Loss 0.1247   LearningRate 0.0003   Epoch: 18   Global Step: 315460   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-12 08:37:19,575-Speed 3309.54 samples/sec   Loss 0.1208   LearningRate 0.0003   Epoch: 18   Global Step: 315470   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-12 08:37:22,656-Speed 3324.79 samples/sec   Loss 0.1141   LearningRate 0.0003   Epoch: 18   Global Step: 315480   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-12 08:37:25,739-Speed 3322.01 samples/sec   Loss 0.1333   LearningRate 0.0003   Epoch: 18   Global Step: 315490   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-12 08:37:28,802-Speed 3344.25 samples/sec   Loss 0.1225   LearningRate 0.0003   Epoch: 18   Global Step: 315500   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-12 08:37:31,867-Speed 3341.56 samples/sec   Loss 0.1182   LearningRate 0.0003   Epoch: 18   Global Step: 315510   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-12 08:37:34,975-Speed 3294.90 samples/sec   Loss 0.1247   LearningRate 0.0003   Epoch: 18   Global Step: 315520   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-12 08:37:38,053-Speed 3327.81 samples/sec   Loss 0.1132   LearningRate 0.0003   Epoch: 18   Global Step: 315530   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-12 08:37:41,165-Speed 3291.35 samples/sec   Loss 0.1347   LearningRate 0.0003   Epoch: 18   Global Step: 315540   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:37:44,234-Speed 3337.47 samples/sec   Loss 0.1274   LearningRate 0.0003   Epoch: 18   Global Step: 315550   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:37:47,380-Speed 3256.02 samples/sec   Loss 0.1230   LearningRate 0.0003   Epoch: 18   Global Step: 315560   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:37:50,609-Speed 3171.26 samples/sec   Loss 0.1262   LearningRate 0.0003   Epoch: 18   Global Step: 315570   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:37:53,762-Speed 3249.13 samples/sec   Loss 0.1204   LearningRate 0.0003   Epoch: 18   Global Step: 315580   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:37:56,935-Speed 3227.60 samples/sec   Loss 0.1279   LearningRate 0.0003   Epoch: 18   Global Step: 315590   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:38:00,062-Speed 3274.96 samples/sec   Loss 0.1307   LearningRate 0.0003   Epoch: 18   Global Step: 315600   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:38:03,159-Speed 3307.58 samples/sec   Loss 0.1023   LearningRate 0.0003   Epoch: 18   Global Step: 315610   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:38:06,251-Speed 3312.41 samples/sec   Loss 0.1221   LearningRate 0.0003   Epoch: 18   Global Step: 315620   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:38:09,366-Speed 3289.28 samples/sec   Loss 0.1292   LearningRate 0.0003   Epoch: 18   Global Step: 315630   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:38:12,466-Speed 3303.19 samples/sec   Loss 0.1322   LearningRate 0.0003   Epoch: 18   Global Step: 315640   Fp16 Grad Scale: 262144   Required: 2 hours
Training: 2022-04-12 08:38:15,572-Speed 3297.83 samples/sec   Loss 0.1203   LearningRate 0.0003   Epoch: 18   Global Step: 315650   Fp16 Grad Scale: 262144   Required: 2 hours
Training: 2022-04-12 08:38:18,689-Speed 3285.87 samples/sec   Loss 0.1215   LearningRate 0.0003   Epoch: 18   Global Step: 315660   Fp16 Grad Scale: 262144   Required: 2 hours
Training: 2022-04-12 08:38:21,744-Speed 3352.70 samples/sec   Loss 0.1273   LearningRate 0.0003   Epoch: 18   Global Step: 315670   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:38:24,820-Speed 3330.17 samples/sec   Loss 0.1171   LearningRate 0.0003   Epoch: 18   Global Step: 315680   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:38:27,889-Speed 3336.54 samples/sec   Loss 0.1238   LearningRate 0.0003   Epoch: 18   Global Step: 315690   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:38:30,973-Speed 3321.03 samples/sec   Loss 0.1259   LearningRate 0.0003   Epoch: 18   Global Step: 315700   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:38:34,119-Speed 3256.13 samples/sec   Loss 0.1207   LearningRate 0.0003   Epoch: 18   Global Step: 315710   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:38:37,188-Speed 3337.77 samples/sec   Loss 0.1242   LearningRate 0.0003   Epoch: 18   Global Step: 315720   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:38:40,274-Speed 3319.46 samples/sec   Loss 0.1192   LearningRate 0.0003   Epoch: 18   Global Step: 315730   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:38:43,339-Speed 3341.78 samples/sec   Loss 0.1212   LearningRate 0.0003   Epoch: 18   Global Step: 315740   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:38:46,420-Speed 3323.67 samples/sec   Loss 0.1166   LearningRate 0.0003   Epoch: 18   Global Step: 315750   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:38:49,542-Speed 3280.80 samples/sec   Loss 0.1307   LearningRate 0.0003   Epoch: 18   Global Step: 315760   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:38:52,610-Speed 3337.90 samples/sec   Loss 0.1159   LearningRate 0.0003   Epoch: 18   Global Step: 315770   Fp16 Grad Scale: 262144   Required: 2 hours
Training: 2022-04-12 08:38:55,692-Speed 3324.29 samples/sec   Loss 0.1251   LearningRate 0.0003   Epoch: 18   Global Step: 315780   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:38:58,763-Speed 3334.26 samples/sec   Loss 0.1309   LearningRate 0.0003   Epoch: 18   Global Step: 315790   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:39:01,856-Speed 3312.53 samples/sec   Loss 0.1200   LearningRate 0.0003   Epoch: 18   Global Step: 315800   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:39:04,927-Speed 3334.18 samples/sec   Loss 0.1213   LearningRate 0.0003   Epoch: 18   Global Step: 315810   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:39:07,993-Speed 3341.75 samples/sec   Loss 0.1290   LearningRate 0.0003   Epoch: 18   Global Step: 315820   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:39:11,072-Speed 3326.05 samples/sec   Loss 0.1239   LearningRate 0.0003   Epoch: 18   Global Step: 315830   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:39:14,141-Speed 3337.74 samples/sec   Loss 0.1263   LearningRate 0.0003   Epoch: 18   Global Step: 315840   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:39:17,246-Speed 3298.66 samples/sec   Loss 0.1067   LearningRate 0.0003   Epoch: 18   Global Step: 315850   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:39:20,382-Speed 3266.08 samples/sec   Loss 0.1279   LearningRate 0.0003   Epoch: 18   Global Step: 315860   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:39:23,456-Speed 3331.06 samples/sec   Loss 0.1233   LearningRate 0.0003   Epoch: 18   Global Step: 315870   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:39:26,513-Speed 3351.29 samples/sec   Loss 0.1213   LearningRate 0.0003   Epoch: 18   Global Step: 315880   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:39:29,663-Speed 3251.59 samples/sec   Loss 0.1276   LearningRate 0.0003   Epoch: 18   Global Step: 315890   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:39:32,726-Speed 3344.04 samples/sec   Loss 0.1178   LearningRate 0.0003   Epoch: 18   Global Step: 315900   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-12 08:39:35,824-Speed 3305.80 samples/sec   Loss 0.1223   LearningRate 0.0003   Epoch: 18   Global Step: 315910   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-12 08:39:38,956-Speed 3270.39 samples/sec   Loss 0.1206   LearningRate 0.0003   Epoch: 18   Global Step: 315920   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-12 08:39:42,062-Speed 3297.53 samples/sec   Loss 0.1227   LearningRate 0.0003   Epoch: 18   Global Step: 315930   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-12 08:39:45,134-Speed 3333.97 samples/sec   Loss 0.1222   LearningRate 0.0003   Epoch: 18   Global Step: 315940   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-12 08:39:48,209-Speed 3330.82 samples/sec   Loss 0.1100   LearningRate 0.0003   Epoch: 18   Global Step: 315950   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-12 08:39:51,287-Speed 3327.65 samples/sec   Loss 0.1280   LearningRate 0.0003   Epoch: 18   Global Step: 315960   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-12 08:39:54,441-Speed 3247.15 samples/sec   Loss 0.1185   LearningRate 0.0003   Epoch: 18   Global Step: 315970   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-12 08:39:57,539-Speed 3306.21 samples/sec   Loss 0.1195   LearningRate 0.0003   Epoch: 18   Global Step: 315980   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-12 08:40:00,643-Speed 3300.03 samples/sec   Loss 0.1215   LearningRate 0.0003   Epoch: 18   Global Step: 315990   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-12 08:40:03,709-Speed 3340.97 samples/sec   Loss 0.1262   LearningRate 0.0003   Epoch: 18   Global Step: 316000   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:40:47,524-[lfw][316000]XNorm: 20.722701
Training: 2022-04-12 08:40:47,525-[lfw][316000]Accuracy-Flip: 0.99800+-0.00221
Training: 2022-04-12 08:40:47,525-[lfw][316000]Accuracy-Highest: 0.99817
Training: 2022-04-12 08:41:38,617-[cfp_fp][316000]XNorm: 22.582108
Training: 2022-04-12 08:41:38,617-[cfp_fp][316000]Accuracy-Flip: 0.99157+-0.00370
Training: 2022-04-12 08:41:38,618-[cfp_fp][316000]Accuracy-Highest: 0.99200
Training: 2022-04-12 08:42:22,395-[agedb_30][316000]XNorm: 22.778612
Training: 2022-04-12 08:42:22,395-[agedb_30][316000]Accuracy-Flip: 0.98650+-0.00565
Training: 2022-04-12 08:42:22,396-[agedb_30][316000]Accuracy-Highest: 0.98650
Training: 2022-04-12 08:42:25,465-Speed 72.24 samples/sec   Loss 0.1300   LearningRate 0.0003   Epoch: 18   Global Step: 316010   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:42:28,552-Speed 3318.01 samples/sec   Loss 0.1143   LearningRate 0.0003   Epoch: 18   Global Step: 316020   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:42:31,627-Speed 3331.66 samples/sec   Loss 0.1182   LearningRate 0.0003   Epoch: 18   Global Step: 316030   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:42:34,686-Speed 3347.90 samples/sec   Loss 0.1125   LearningRate 0.0003   Epoch: 18   Global Step: 316040   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:42:37,743-Speed 3349.86 samples/sec   Loss 0.1272   LearningRate 0.0003   Epoch: 18   Global Step: 316050   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:42:40,798-Speed 3353.12 samples/sec   Loss 0.1259   LearningRate 0.0003   Epoch: 18   Global Step: 316060   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:42:43,909-Speed 3292.45 samples/sec   Loss 0.1337   LearningRate 0.0003   Epoch: 18   Global Step: 316070   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:42:47,002-Speed 3310.86 samples/sec   Loss 0.1244   LearningRate 0.0003   Epoch: 18   Global Step: 316080   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:42:50,095-Speed 3312.27 samples/sec   Loss 0.1281   LearningRate 0.0003   Epoch: 18   Global Step: 316090   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:42:53,158-Speed 3343.65 samples/sec   Loss 0.1301   LearningRate 0.0003   Epoch: 18   Global Step: 316100   Fp16 Grad Scale: 262144   Required: 2 hours
Training: 2022-04-12 08:42:56,238-Speed 3325.57 samples/sec   Loss 0.1230   LearningRate 0.0003   Epoch: 18   Global Step: 316110   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:42:59,343-Speed 3298.14 samples/sec   Loss 0.1168   LearningRate 0.0003   Epoch: 18   Global Step: 316120   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:43:02,408-Speed 3341.94 samples/sec   Loss 0.1142   LearningRate 0.0003   Epoch: 18   Global Step: 316130   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:43:05,479-Speed 3335.55 samples/sec   Loss 0.1161   LearningRate 0.0003   Epoch: 18   Global Step: 316140   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:43:08,551-Speed 3333.92 samples/sec   Loss 0.1263   LearningRate 0.0003   Epoch: 18   Global Step: 316150   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:43:11,652-Speed 3302.79 samples/sec   Loss 0.1214   LearningRate 0.0003   Epoch: 18   Global Step: 316160   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:43:14,755-Speed 3300.26 samples/sec   Loss 0.1256   LearningRate 0.0003   Epoch: 18   Global Step: 316170   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:43:17,925-Speed 3232.12 samples/sec   Loss 0.1321   LearningRate 0.0003   Epoch: 18   Global Step: 316180   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:43:20,983-Speed 3348.84 samples/sec   Loss 0.1249   LearningRate 0.0003   Epoch: 18   Global Step: 316190   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:43:24,140-Speed 3244.01 samples/sec   Loss 0.1319   LearningRate 0.0003   Epoch: 18   Global Step: 316200   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:43:27,282-Speed 3259.78 samples/sec   Loss 0.1241   LearningRate 0.0003   Epoch: 18   Global Step: 316210   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:43:30,366-Speed 3321.22 samples/sec   Loss 0.1164   LearningRate 0.0003   Epoch: 18   Global Step: 316220   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:43:33,474-Speed 3295.02 samples/sec   Loss 0.1171   LearningRate 0.0003   Epoch: 18   Global Step: 316230   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:43:36,542-Speed 3339.00 samples/sec   Loss 0.1253   LearningRate 0.0003   Epoch: 18   Global Step: 316240   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:43:39,669-Speed 3275.10 samples/sec   Loss 0.1106   LearningRate 0.0003   Epoch: 18   Global Step: 316250   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:43:42,835-Speed 3235.10 samples/sec   Loss 0.1285   LearningRate 0.0003   Epoch: 18   Global Step: 316260   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:43:45,908-Speed 3333.05 samples/sec   Loss 0.1276   LearningRate 0.0003   Epoch: 18   Global Step: 316270   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:43:48,996-Speed 3316.81 samples/sec   Loss 0.1221   LearningRate 0.0003   Epoch: 18   Global Step: 316280   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:43:52,102-Speed 3298.42 samples/sec   Loss 0.1223   LearningRate 0.0003   Epoch: 18   Global Step: 316290   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:43:55,228-Speed 3275.41 samples/sec   Loss 0.1162   LearningRate 0.0003   Epoch: 18   Global Step: 316300   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:43:58,321-Speed 3311.57 samples/sec   Loss 0.1160   LearningRate 0.0003   Epoch: 18   Global Step: 316310   Fp16 Grad Scale: 262144   Required: 2 hours
Training: 2022-04-12 08:44:01,406-Speed 3320.37 samples/sec   Loss 0.1196   LearningRate 0.0003   Epoch: 18   Global Step: 316320   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:44:04,492-Speed 3318.96 samples/sec   Loss 0.1220   LearningRate 0.0003   Epoch: 18   Global Step: 316330   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:44:07,554-Speed 3345.09 samples/sec   Loss 0.1249   LearningRate 0.0003   Epoch: 18   Global Step: 316340   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:44:10,648-Speed 3311.13 samples/sec   Loss 0.1227   LearningRate 0.0003   Epoch: 18   Global Step: 316350   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:44:13,721-Speed 3332.57 samples/sec   Loss 0.1187   LearningRate 0.0003   Epoch: 18   Global Step: 316360   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:44:16,882-Speed 3240.51 samples/sec   Loss 0.1370   LearningRate 0.0003   Epoch: 18   Global Step: 316370   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:44:20,113-Speed 3169.29 samples/sec   Loss 0.1223   LearningRate 0.0003   Epoch: 18   Global Step: 316380   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:44:23,192-Speed 3326.57 samples/sec   Loss 0.1178   LearningRate 0.0003   Epoch: 18   Global Step: 316390   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:44:26,271-Speed 3326.13 samples/sec   Loss 0.1406   LearningRate 0.0003   Epoch: 18   Global Step: 316400   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:44:29,353-Speed 3324.34 samples/sec   Loss 0.1171   LearningRate 0.0003   Epoch: 18   Global Step: 316410   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:44:32,424-Speed 3334.49 samples/sec   Loss 0.1254   LearningRate 0.0003   Epoch: 18   Global Step: 316420   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-12 08:44:35,551-Speed 3276.14 samples/sec   Loss 0.1191   LearningRate 0.0003   Epoch: 18   Global Step: 316430   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-12 08:44:38,609-Speed 3348.84 samples/sec   Loss 0.1210   LearningRate 0.0003   Epoch: 18   Global Step: 316440   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-12 08:44:41,671-Speed 3345.79 samples/sec   Loss 0.1230   LearningRate 0.0003   Epoch: 18   Global Step: 316450   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-12 08:44:44,734-Speed 3342.91 samples/sec   Loss 0.1100   LearningRate 0.0003   Epoch: 18   Global Step: 316460   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-12 08:44:47,913-Speed 3222.60 samples/sec   Loss 0.1141   LearningRate 0.0003   Epoch: 18   Global Step: 316470   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-12 08:44:51,037-Speed 3277.82 samples/sec   Loss 0.1172   LearningRate 0.0003   Epoch: 18   Global Step: 316480   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-12 08:44:54,108-Speed 3335.46 samples/sec   Loss 0.1264   LearningRate 0.0003   Epoch: 18   Global Step: 316490   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-12 08:44:57,279-Speed 3229.99 samples/sec   Loss 0.1200   LearningRate 0.0003   Epoch: 18   Global Step: 316500   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-12 08:45:00,417-Speed 3264.10 samples/sec   Loss 0.1257   LearningRate 0.0003   Epoch: 18   Global Step: 316510   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-12 08:45:03,488-Speed 3335.04 samples/sec   Loss 0.1269   LearningRate 0.0003   Epoch: 18   Global Step: 316520   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:45:06,554-Speed 3340.90 samples/sec   Loss 0.1167   LearningRate 0.0003   Epoch: 18   Global Step: 316530   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:45:09,617-Speed 3344.10 samples/sec   Loss 0.1189   LearningRate 0.0003   Epoch: 18   Global Step: 316540   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:45:12,679-Speed 3344.29 samples/sec   Loss 0.1353   LearningRate 0.0003   Epoch: 18   Global Step: 316550   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:45:15,792-Speed 3290.51 samples/sec   Loss 0.1165   LearningRate 0.0003   Epoch: 18   Global Step: 316560   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:45:18,912-Speed 3283.07 samples/sec   Loss 0.1404   LearningRate 0.0003   Epoch: 18   Global Step: 316570   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:45:22,023-Speed 3291.58 samples/sec   Loss 0.1307   LearningRate 0.0003   Epoch: 18   Global Step: 316580   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:45:25,092-Speed 3336.93 samples/sec   Loss 0.1257   LearningRate 0.0003   Epoch: 18   Global Step: 316590   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:45:28,166-Speed 3332.19 samples/sec   Loss 0.1204   LearningRate 0.0003   Epoch: 18   Global Step: 316600   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:45:31,258-Speed 3313.53 samples/sec   Loss 0.1311   LearningRate 0.0003   Epoch: 18   Global Step: 316610   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:45:34,324-Speed 3340.49 samples/sec   Loss 0.1298   LearningRate 0.0003   Epoch: 18   Global Step: 316620   Fp16 Grad Scale: 262144   Required: 2 hours
Training: 2022-04-12 08:45:37,374-Speed 3357.42 samples/sec   Loss 0.1349   LearningRate 0.0003   Epoch: 18   Global Step: 316630   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:45:40,531-Speed 3244.90 samples/sec   Loss 0.1293   LearningRate 0.0003   Epoch: 18   Global Step: 316640   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:45:43,635-Speed 3299.59 samples/sec   Loss 0.1262   LearningRate 0.0003   Epoch: 18   Global Step: 316650   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:45:46,762-Speed 3274.90 samples/sec   Loss 0.1231   LearningRate 0.0003   Epoch: 18   Global Step: 316660   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:45:49,827-Speed 3342.01 samples/sec   Loss 0.1176   LearningRate 0.0003   Epoch: 18   Global Step: 316670   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:45:52,900-Speed 3333.18 samples/sec   Loss 0.1175   LearningRate 0.0003   Epoch: 18   Global Step: 316680   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:45:55,967-Speed 3339.21 samples/sec   Loss 0.1195   LearningRate 0.0003   Epoch: 18   Global Step: 316690   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:45:59,032-Speed 3341.97 samples/sec   Loss 0.1236   LearningRate 0.0003   Epoch: 18   Global Step: 316700   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:46:02,104-Speed 3333.87 samples/sec   Loss 0.1173   LearningRate 0.0003   Epoch: 18   Global Step: 316710   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:46:05,169-Speed 3342.15 samples/sec   Loss 0.1205   LearningRate 0.0003   Epoch: 18   Global Step: 316720   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:46:08,245-Speed 3329.88 samples/sec   Loss 0.1256   LearningRate 0.0003   Epoch: 18   Global Step: 316730   Fp16 Grad Scale: 262144   Required: 2 hours
Training: 2022-04-12 08:46:11,326-Speed 3323.44 samples/sec   Loss 0.1181   LearningRate 0.0003   Epoch: 18   Global Step: 316740   Fp16 Grad Scale: 262144   Required: 2 hours
Training: 2022-04-12 08:46:14,423-Speed 3307.34 samples/sec   Loss 0.1055   LearningRate 0.0003   Epoch: 18   Global Step: 316750   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:46:17,653-Speed 3171.06 samples/sec   Loss 0.1197   LearningRate 0.0003   Epoch: 18   Global Step: 316760   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:46:20,797-Speed 3258.32 samples/sec   Loss 0.1246   LearningRate 0.0003   Epoch: 18   Global Step: 316770   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:46:23,972-Speed 3226.12 samples/sec   Loss 0.1296   LearningRate 0.0003   Epoch: 18   Global Step: 316780   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:46:27,051-Speed 3326.42 samples/sec   Loss 0.1260   LearningRate 0.0003   Epoch: 18   Global Step: 316790   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:46:30,116-Speed 3341.13 samples/sec   Loss 0.1200   LearningRate 0.0003   Epoch: 18   Global Step: 316800   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:46:33,189-Speed 3333.11 samples/sec   Loss 0.1189   LearningRate 0.0003   Epoch: 18   Global Step: 316810   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:46:36,256-Speed 3339.95 samples/sec   Loss 0.1273   LearningRate 0.0003   Epoch: 18   Global Step: 316820   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:46:39,316-Speed 3346.97 samples/sec   Loss 0.1116   LearningRate 0.0003   Epoch: 18   Global Step: 316830   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:46:42,386-Speed 3335.89 samples/sec   Loss 0.1238   LearningRate 0.0003   Epoch: 18   Global Step: 316840   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:46:45,488-Speed 3301.88 samples/sec   Loss 0.1211   LearningRate 0.0003   Epoch: 18   Global Step: 316850   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:46:48,580-Speed 3313.28 samples/sec   Loss 0.1213   LearningRate 0.0003   Epoch: 18   Global Step: 316860   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:46:51,669-Speed 3315.17 samples/sec   Loss 0.1287   LearningRate 0.0003   Epoch: 18   Global Step: 316870   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:46:54,753-Speed 3320.69 samples/sec   Loss 0.1290   LearningRate 0.0003   Epoch: 18   Global Step: 316880   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:46:57,818-Speed 3342.61 samples/sec   Loss 0.1336   LearningRate 0.0003   Epoch: 18   Global Step: 316890   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:47:00,892-Speed 3331.44 samples/sec   Loss 0.1249   LearningRate 0.0003   Epoch: 18   Global Step: 316900   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:47:03,959-Speed 3339.96 samples/sec   Loss 0.1256   LearningRate 0.0003   Epoch: 18   Global Step: 316910   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:47:07,038-Speed 3325.67 samples/sec   Loss 0.1237   LearningRate 0.0003   Epoch: 18   Global Step: 316920   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:47:10,105-Speed 3340.04 samples/sec   Loss 0.1289   LearningRate 0.0003   Epoch: 18   Global Step: 316930   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:47:13,176-Speed 3335.34 samples/sec   Loss 0.1199   LearningRate 0.0003   Epoch: 18   Global Step: 316940   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:47:16,258-Speed 3323.48 samples/sec   Loss 0.1108   LearningRate 0.0003   Epoch: 18   Global Step: 316950   Fp16 Grad Scale: 262144   Required: 2 hours
Training: 2022-04-12 08:47:19,305-Speed 3360.90 samples/sec   Loss 0.1360   LearningRate 0.0003   Epoch: 18   Global Step: 316960   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:47:22,388-Speed 3322.87 samples/sec   Loss 0.1087   LearningRate 0.0003   Epoch: 18   Global Step: 316970   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:47:25,450-Speed 3344.47 samples/sec   Loss 0.1175   LearningRate 0.0003   Epoch: 18   Global Step: 316980   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:47:28,521-Speed 3335.48 samples/sec   Loss 0.1229   LearningRate 0.0003   Epoch: 18   Global Step: 316990   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:47:31,653-Speed 3269.90 samples/sec   Loss 0.1135   LearningRate 0.0003   Epoch: 18   Global Step: 317000   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:47:34,801-Speed 3253.80 samples/sec   Loss 0.1208   LearningRate 0.0003   Epoch: 18   Global Step: 317010   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:47:37,876-Speed 3330.95 samples/sec   Loss 0.1268   LearningRate 0.0003   Epoch: 18   Global Step: 317020   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:47:40,943-Speed 3338.68 samples/sec   Loss 0.1184   LearningRate 0.0003   Epoch: 18   Global Step: 317030   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:47:44,027-Speed 3322.01 samples/sec   Loss 0.1294   LearningRate 0.0003   Epoch: 18   Global Step: 317040   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:47:47,092-Speed 3341.49 samples/sec   Loss 0.1257   LearningRate 0.0003   Epoch: 18   Global Step: 317050   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-12 08:47:50,164-Speed 3334.20 samples/sec   Loss 0.1223   LearningRate 0.0003   Epoch: 18   Global Step: 317060   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-12 08:47:53,283-Speed 3284.28 samples/sec   Loss 0.1236   LearningRate 0.0003   Epoch: 18   Global Step: 317070   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-12 08:47:56,349-Speed 3340.45 samples/sec   Loss 0.1266   LearningRate 0.0003   Epoch: 18   Global Step: 317080   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-12 08:47:59,418-Speed 3336.82 samples/sec   Loss 0.1301   LearningRate 0.0003   Epoch: 18   Global Step: 317090   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-12 08:48:02,585-Speed 3234.41 samples/sec   Loss 0.1224   LearningRate 0.0003   Epoch: 18   Global Step: 317100   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-12 08:48:05,682-Speed 3307.24 samples/sec   Loss 0.1211   LearningRate 0.0003   Epoch: 18   Global Step: 317110   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-12 08:48:09,003-Speed 3083.58 samples/sec   Loss 0.1170   LearningRate 0.0003   Epoch: 18   Global Step: 317120   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-12 08:48:47,598-Speed 265.33 samples/sec   Loss 0.1116   LearningRate 0.0002   Epoch: 19   Global Step: 317130   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-12 08:48:50,888-Speed 3113.78 samples/sec   Loss 0.1068   LearningRate 0.0002   Epoch: 19   Global Step: 317140   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-12 08:48:54,347-Speed 2961.00 samples/sec   Loss 0.1154   LearningRate 0.0002   Epoch: 19   Global Step: 317150   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:48:57,436-Speed 3316.06 samples/sec   Loss 0.1148   LearningRate 0.0002   Epoch: 19   Global Step: 317160   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:49:00,503-Speed 3339.59 samples/sec   Loss 0.1078   LearningRate 0.0002   Epoch: 19   Global Step: 317170   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:49:03,577-Speed 3331.52 samples/sec   Loss 0.1025   LearningRate 0.0002   Epoch: 19   Global Step: 317180   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:49:06,676-Speed 3304.96 samples/sec   Loss 0.1065   LearningRate 0.0002   Epoch: 19   Global Step: 317190   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:49:09,787-Speed 3292.40 samples/sec   Loss 0.1047   LearningRate 0.0002   Epoch: 19   Global Step: 317200   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:49:12,923-Speed 3266.38 samples/sec   Loss 0.1034   LearningRate 0.0002   Epoch: 19   Global Step: 317210   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:49:16,013-Speed 3314.28 samples/sec   Loss 0.1030   LearningRate 0.0002   Epoch: 19   Global Step: 317220   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:49:19,101-Speed 3317.33 samples/sec   Loss 0.1042   LearningRate 0.0002   Epoch: 19   Global Step: 317230   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:49:22,233-Speed 3269.91 samples/sec   Loss 0.0975   LearningRate 0.0002   Epoch: 19   Global Step: 317240   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:49:25,347-Speed 3289.66 samples/sec   Loss 0.1065   LearningRate 0.0002   Epoch: 19   Global Step: 317250   Fp16 Grad Scale: 262144   Required: 2 hours
Training: 2022-04-12 08:49:28,596-Speed 3152.05 samples/sec   Loss 0.1084   LearningRate 0.0002   Epoch: 19   Global Step: 317260   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:49:31,693-Speed 3307.06 samples/sec   Loss 0.1022   LearningRate 0.0002   Epoch: 19   Global Step: 317270   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:49:34,836-Speed 3258.64 samples/sec   Loss 0.1041   LearningRate 0.0002   Epoch: 19   Global Step: 317280   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:49:37,936-Speed 3304.45 samples/sec   Loss 0.1050   LearningRate 0.0002   Epoch: 19   Global Step: 317290   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:49:41,043-Speed 3295.74 samples/sec   Loss 0.1044   LearningRate 0.0002   Epoch: 19   Global Step: 317300   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:49:44,122-Speed 3327.17 samples/sec   Loss 0.0976   LearningRate 0.0002   Epoch: 19   Global Step: 317310   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:49:47,198-Speed 3329.95 samples/sec   Loss 0.1014   LearningRate 0.0002   Epoch: 19   Global Step: 317320   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:49:50,362-Speed 3236.92 samples/sec   Loss 0.1084   LearningRate 0.0002   Epoch: 19   Global Step: 317330   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:49:53,459-Speed 3306.98 samples/sec   Loss 0.1031   LearningRate 0.0002   Epoch: 19   Global Step: 317340   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:49:56,531-Speed 3334.68 samples/sec   Loss 0.1058   LearningRate 0.0002   Epoch: 19   Global Step: 317350   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:49:59,599-Speed 3338.28 samples/sec   Loss 0.1045   LearningRate 0.0002   Epoch: 19   Global Step: 317360   Fp16 Grad Scale: 262144   Required: 2 hours
Training: 2022-04-12 08:50:03,071-Speed 2949.51 samples/sec   Loss 0.1018   LearningRate 0.0002   Epoch: 19   Global Step: 317370   Fp16 Grad Scale: 262144   Required: 2 hours
Training: 2022-04-12 08:50:06,426-Speed 3053.37 samples/sec   Loss 0.1010   LearningRate 0.0002   Epoch: 19   Global Step: 317380   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:50:09,937-Speed 2916.45 samples/sec   Loss 0.1108   LearningRate 0.0002   Epoch: 19   Global Step: 317390   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:50:14,578-Speed 2207.05 samples/sec   Loss 0.1144   LearningRate 0.0002   Epoch: 19   Global Step: 317400   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:50:18,470-Speed 2631.59 samples/sec   Loss 0.1088   LearningRate 0.0002   Epoch: 19   Global Step: 317410   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-12 08:50:23,429-Speed 2065.23 samples/sec   Loss 0.1146   LearningRate 0.0002   Epoch: 19   Global Step: 317420   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-12 08:50:26,498-Speed 3337.83 samples/sec   Loss 0.1030   LearningRate 0.0002   Epoch: 19   Global Step: 317430   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-12 08:50:29,573-Speed 3330.00 samples/sec   Loss 0.1093   LearningRate 0.0002   Epoch: 19   Global Step: 317440   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-12 08:50:32,632-Speed 3348.61 samples/sec   Loss 0.1015   LearningRate 0.0002   Epoch: 19   Global Step: 317450   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-12 08:50:35,726-Speed 3310.08 samples/sec   Loss 0.1046   LearningRate 0.0002   Epoch: 19   Global Step: 317460   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-12 08:50:38,804-Speed 3328.34 samples/sec   Loss 0.1076   LearningRate 0.0002   Epoch: 19   Global Step: 317470   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-12 08:50:41,933-Speed 3272.59 samples/sec   Loss 0.1046   LearningRate 0.0002   Epoch: 19   Global Step: 317480   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-12 08:50:44,990-Speed 3351.16 samples/sec   Loss 0.0936   LearningRate 0.0002   Epoch: 19   Global Step: 317490   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-12 08:50:48,058-Speed 3338.01 samples/sec   Loss 0.0958   LearningRate 0.0002   Epoch: 19   Global Step: 317500   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-12 08:50:51,159-Speed 3303.64 samples/sec   Loss 0.1075   LearningRate 0.0002   Epoch: 19   Global Step: 317510   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:50:54,272-Speed 3290.12 samples/sec   Loss 0.1133   LearningRate 0.0002   Epoch: 19   Global Step: 317520   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:50:57,447-Speed 3226.12 samples/sec   Loss 0.1065   LearningRate 0.0002   Epoch: 19   Global Step: 317530   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:51:00,563-Speed 3285.89 samples/sec   Loss 0.1066   LearningRate 0.0002   Epoch: 19   Global Step: 317540   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:51:03,632-Speed 3337.78 samples/sec   Loss 0.1028   LearningRate 0.0002   Epoch: 19   Global Step: 317550   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:51:06,710-Speed 3328.12 samples/sec   Loss 0.0998   LearningRate 0.0002   Epoch: 19   Global Step: 317560   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:51:09,788-Speed 3327.63 samples/sec   Loss 0.1155   LearningRate 0.0002   Epoch: 19   Global Step: 317570   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:51:12,911-Speed 3279.21 samples/sec   Loss 0.1116   LearningRate 0.0002   Epoch: 19   Global Step: 317580   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:51:15,989-Speed 3328.25 samples/sec   Loss 0.1064   LearningRate 0.0002   Epoch: 19   Global Step: 317590   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:51:19,076-Speed 3317.19 samples/sec   Loss 0.1026   LearningRate 0.0002   Epoch: 19   Global Step: 317600   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:51:22,160-Speed 3321.53 samples/sec   Loss 0.1113   LearningRate 0.0002   Epoch: 19   Global Step: 317610   Fp16 Grad Scale: 262144   Required: 2 hours
Training: 2022-04-12 08:51:25,231-Speed 3334.87 samples/sec   Loss 0.1003   LearningRate 0.0002   Epoch: 19   Global Step: 317620   Fp16 Grad Scale: 262144   Required: 2 hours
Training: 2022-04-12 08:51:28,300-Speed 3337.49 samples/sec   Loss 0.1122   LearningRate 0.0002   Epoch: 19   Global Step: 317630   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:51:31,370-Speed 3336.61 samples/sec   Loss 0.1068   LearningRate 0.0002   Epoch: 19   Global Step: 317640   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:51:34,439-Speed 3338.02 samples/sec   Loss 0.1023   LearningRate 0.0002   Epoch: 19   Global Step: 317650   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:51:37,540-Speed 3302.40 samples/sec   Loss 0.1086   LearningRate 0.0002   Epoch: 19   Global Step: 317660   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:51:40,671-Speed 3271.71 samples/sec   Loss 0.0998   LearningRate 0.0002   Epoch: 19   Global Step: 317670   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:51:43,760-Speed 3315.96 samples/sec   Loss 0.0948   LearningRate 0.0002   Epoch: 19   Global Step: 317680   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:51:46,822-Speed 3344.58 samples/sec   Loss 0.1001   LearningRate 0.0002   Epoch: 19   Global Step: 317690   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:51:49,988-Speed 3234.99 samples/sec   Loss 0.0953   LearningRate 0.0002   Epoch: 19   Global Step: 317700   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:51:53,054-Speed 3340.73 samples/sec   Loss 0.1056   LearningRate 0.0002   Epoch: 19   Global Step: 317710   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:51:56,135-Speed 3324.32 samples/sec   Loss 0.0961   LearningRate 0.0002   Epoch: 19   Global Step: 317720   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:51:59,198-Speed 3344.24 samples/sec   Loss 0.1023   LearningRate 0.0002   Epoch: 19   Global Step: 317730   Fp16 Grad Scale: 262144   Required: 2 hours
Training: 2022-04-12 08:52:02,259-Speed 3346.38 samples/sec   Loss 0.1031   LearningRate 0.0002   Epoch: 19   Global Step: 317740   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:52:05,385-Speed 3276.48 samples/sec   Loss 0.1034   LearningRate 0.0002   Epoch: 19   Global Step: 317750   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:52:08,459-Speed 3331.13 samples/sec   Loss 0.1102   LearningRate 0.0002   Epoch: 19   Global Step: 317760   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:52:11,589-Speed 3272.53 samples/sec   Loss 0.1100   LearningRate 0.0002   Epoch: 19   Global Step: 317770   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:52:14,666-Speed 3328.42 samples/sec   Loss 0.0979   LearningRate 0.0002   Epoch: 19   Global Step: 317780   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:52:17,743-Speed 3329.21 samples/sec   Loss 0.1067   LearningRate 0.0002   Epoch: 19   Global Step: 317790   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:52:20,828-Speed 3319.54 samples/sec   Loss 0.1061   LearningRate 0.0002   Epoch: 19   Global Step: 317800   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:52:23,925-Speed 3307.28 samples/sec   Loss 0.1120   LearningRate 0.0002   Epoch: 19   Global Step: 317810   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:52:27,054-Speed 3274.09 samples/sec   Loss 0.1115   LearningRate 0.0002   Epoch: 19   Global Step: 317820   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:52:30,121-Speed 3339.43 samples/sec   Loss 0.1064   LearningRate 0.0002   Epoch: 19   Global Step: 317830   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:52:33,187-Speed 3340.73 samples/sec   Loss 0.1060   LearningRate 0.0002   Epoch: 19   Global Step: 317840   Fp16 Grad Scale: 262144   Required: 2 hours
Training: 2022-04-12 08:52:36,248-Speed 3345.65 samples/sec   Loss 0.0983   LearningRate 0.0002   Epoch: 19   Global Step: 317850   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:52:39,316-Speed 3337.83 samples/sec   Loss 0.1073   LearningRate 0.0002   Epoch: 19   Global Step: 317860   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:52:42,390-Speed 3332.89 samples/sec   Loss 0.1054   LearningRate 0.0002   Epoch: 19   Global Step: 317870   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:52:45,531-Speed 3260.77 samples/sec   Loss 0.1042   LearningRate 0.0002   Epoch: 19   Global Step: 317880   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:52:48,617-Speed 3318.61 samples/sec   Loss 0.0997   LearningRate 0.0002   Epoch: 19   Global Step: 317890   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:52:51,767-Speed 3250.92 samples/sec   Loss 0.0977   LearningRate 0.0002   Epoch: 19   Global Step: 317900   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:52:54,890-Speed 3280.92 samples/sec   Loss 0.1019   LearningRate 0.0002   Epoch: 19   Global Step: 317910   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:52:57,994-Speed 3299.12 samples/sec   Loss 0.1011   LearningRate 0.0002   Epoch: 19   Global Step: 317920   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:53:01,080-Speed 3319.90 samples/sec   Loss 0.0997   LearningRate 0.0002   Epoch: 19   Global Step: 317930   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:53:04,214-Speed 3267.53 samples/sec   Loss 0.1048   LearningRate 0.0002   Epoch: 19   Global Step: 317940   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:53:07,328-Speed 3289.44 samples/sec   Loss 0.0973   LearningRate 0.0002   Epoch: 19   Global Step: 317950   Fp16 Grad Scale: 262144   Required: 2 hours
Training: 2022-04-12 08:53:10,481-Speed 3248.60 samples/sec   Loss 0.0955   LearningRate 0.0002   Epoch: 19   Global Step: 317960   Fp16 Grad Scale: 262144   Required: 2 hours
Training: 2022-04-12 08:53:13,537-Speed 3350.91 samples/sec   Loss 0.1045   LearningRate 0.0002   Epoch: 19   Global Step: 317970   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:53:16,675-Speed 3264.33 samples/sec   Loss 0.1036   LearningRate 0.0002   Epoch: 19   Global Step: 317980   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:53:19,782-Speed 3296.97 samples/sec   Loss 0.1043   LearningRate 0.0002   Epoch: 19   Global Step: 317990   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:53:22,862-Speed 3325.53 samples/sec   Loss 0.1158   LearningRate 0.0002   Epoch: 19   Global Step: 318000   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:54:06,964-[lfw][318000]XNorm: 20.634739
Training: 2022-04-12 08:54:06,965-[lfw][318000]Accuracy-Flip: 0.99800+-0.00221
Training: 2022-04-12 08:54:06,965-[lfw][318000]Accuracy-Highest: 0.99817
Training: 2022-04-12 08:54:58,016-[cfp_fp][318000]XNorm: 22.552284
Training: 2022-04-12 08:54:58,017-[cfp_fp][318000]Accuracy-Flip: 0.99157+-0.00370
Training: 2022-04-12 08:54:58,017-[cfp_fp][318000]Accuracy-Highest: 0.99200
Training: 2022-04-12 08:55:41,796-[agedb_30][318000]XNorm: 22.757094
Training: 2022-04-12 08:55:41,797-[agedb_30][318000]Accuracy-Flip: 0.98633+-0.00557
Training: 2022-04-12 08:55:41,797-[agedb_30][318000]Accuracy-Highest: 0.98650
Training: 2022-04-12 08:55:44,868-Speed 72.11 samples/sec   Loss 0.1105   LearningRate 0.0002   Epoch: 19   Global Step: 318010   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:55:47,946-Speed 3327.82 samples/sec   Loss 0.1078   LearningRate 0.0002   Epoch: 19   Global Step: 318020   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:55:51,077-Speed 3271.15 samples/sec   Loss 0.0962   LearningRate 0.0002   Epoch: 19   Global Step: 318030   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:55:54,272-Speed 3206.04 samples/sec   Loss 0.1039   LearningRate 0.0002   Epoch: 19   Global Step: 318040   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:55:57,479-Speed 3193.03 samples/sec   Loss 0.1155   LearningRate 0.0002   Epoch: 19   Global Step: 318050   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:56:00,570-Speed 3314.12 samples/sec   Loss 0.1013   LearningRate 0.0002   Epoch: 19   Global Step: 318060   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:56:03,632-Speed 3345.04 samples/sec   Loss 0.1007   LearningRate 0.0002   Epoch: 19   Global Step: 318070   Fp16 Grad Scale: 262144   Required: 2 hours
Training: 2022-04-12 08:56:06,741-Speed 3294.33 samples/sec   Loss 0.1014   LearningRate 0.0002   Epoch: 19   Global Step: 318080   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:56:09,802-Speed 3345.91 samples/sec   Loss 0.1003   LearningRate 0.0002   Epoch: 19   Global Step: 318090   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:56:12,896-Speed 3311.32 samples/sec   Loss 0.1091   LearningRate 0.0002   Epoch: 19   Global Step: 318100   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:56:16,149-Speed 3147.85 samples/sec   Loss 0.0987   LearningRate 0.0002   Epoch: 19   Global Step: 318110   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:56:19,281-Speed 3270.31 samples/sec   Loss 0.1057   LearningRate 0.0002   Epoch: 19   Global Step: 318120   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:56:22,346-Speed 3341.33 samples/sec   Loss 0.1022   LearningRate 0.0002   Epoch: 19   Global Step: 318130   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:56:25,418-Speed 3334.83 samples/sec   Loss 0.1131   LearningRate 0.0002   Epoch: 19   Global Step: 318140   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:56:28,506-Speed 3316.59 samples/sec   Loss 0.1055   LearningRate 0.0002   Epoch: 19   Global Step: 318150   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:56:31,595-Speed 3315.54 samples/sec   Loss 0.1000   LearningRate 0.0002   Epoch: 19   Global Step: 318160   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:56:34,673-Speed 3327.58 samples/sec   Loss 0.1009   LearningRate 0.0002   Epoch: 19   Global Step: 318170   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:56:37,732-Speed 3348.62 samples/sec   Loss 0.1032   LearningRate 0.0002   Epoch: 19   Global Step: 318180   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:56:40,840-Speed 3295.99 samples/sec   Loss 0.0985   LearningRate 0.0002   Epoch: 19   Global Step: 318190   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:56:43,919-Speed 3325.87 samples/sec   Loss 0.1006   LearningRate 0.0002   Epoch: 19   Global Step: 318200   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:56:46,980-Speed 3346.32 samples/sec   Loss 0.1084   LearningRate 0.0002   Epoch: 19   Global Step: 318210   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:56:50,049-Speed 3337.39 samples/sec   Loss 0.0969   LearningRate 0.0002   Epoch: 19   Global Step: 318220   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:56:53,115-Speed 3340.33 samples/sec   Loss 0.1092   LearningRate 0.0002   Epoch: 19   Global Step: 318230   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:56:56,189-Speed 3331.87 samples/sec   Loss 0.0965   LearningRate 0.0002   Epoch: 19   Global Step: 318240   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-12 08:56:59,279-Speed 3315.26 samples/sec   Loss 0.1054   LearningRate 0.0002   Epoch: 19   Global Step: 318250   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-12 08:57:02,351-Speed 3333.35 samples/sec   Loss 0.0980   LearningRate 0.0002   Epoch: 19   Global Step: 318260   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-12 08:57:05,496-Speed 3256.98 samples/sec   Loss 0.0996   LearningRate 0.0002   Epoch: 19   Global Step: 318270   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-12 08:57:08,586-Speed 3316.04 samples/sec   Loss 0.1063   LearningRate 0.0002   Epoch: 19   Global Step: 318280   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-12 08:57:11,744-Speed 3242.76 samples/sec   Loss 0.0932   LearningRate 0.0002   Epoch: 19   Global Step: 318290   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-12 08:57:14,885-Speed 3260.18 samples/sec   Loss 0.0974   LearningRate 0.0002   Epoch: 19   Global Step: 318300   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-12 08:57:18,042-Speed 3244.84 samples/sec   Loss 0.0985   LearningRate 0.0002   Epoch: 19   Global Step: 318310   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-12 08:57:21,154-Speed 3290.88 samples/sec   Loss 0.1054   LearningRate 0.0002   Epoch: 19   Global Step: 318320   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-12 08:57:24,234-Speed 3325.68 samples/sec   Loss 0.0973   LearningRate 0.0002   Epoch: 19   Global Step: 318330   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-12 08:57:27,392-Speed 3243.88 samples/sec   Loss 0.0989   LearningRate 0.0002   Epoch: 19   Global Step: 318340   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:57:30,491-Speed 3304.52 samples/sec   Loss 0.1125   LearningRate 0.0002   Epoch: 19   Global Step: 318350   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:57:33,554-Speed 3344.36 samples/sec   Loss 0.1018   LearningRate 0.0002   Epoch: 19   Global Step: 318360   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:57:36,615-Speed 3345.87 samples/sec   Loss 0.1077   LearningRate 0.0002   Epoch: 19   Global Step: 318370   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:57:39,683-Speed 3338.77 samples/sec   Loss 0.1112   LearningRate 0.0002   Epoch: 19   Global Step: 318380   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:57:42,784-Speed 3303.02 samples/sec   Loss 0.1012   LearningRate 0.0002   Epoch: 19   Global Step: 318390   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:57:45,891-Speed 3296.21 samples/sec   Loss 0.0933   LearningRate 0.0002   Epoch: 19   Global Step: 318400   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:57:49,038-Speed 3254.44 samples/sec   Loss 0.1152   LearningRate 0.0002   Epoch: 19   Global Step: 318410   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:57:52,144-Speed 3297.38 samples/sec   Loss 0.0958   LearningRate 0.0002   Epoch: 19   Global Step: 318420   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:57:55,258-Speed 3289.94 samples/sec   Loss 0.0972   LearningRate 0.0002   Epoch: 19   Global Step: 318430   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:57:58,331-Speed 3333.01 samples/sec   Loss 0.1074   LearningRate 0.0002   Epoch: 19   Global Step: 318440   Fp16 Grad Scale: 262144   Required: 2 hours
Training: 2022-04-12 08:58:01,398-Speed 3339.63 samples/sec   Loss 0.1069   LearningRate 0.0002   Epoch: 19   Global Step: 318450   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:58:04,465-Speed 3339.70 samples/sec   Loss 0.1063   LearningRate 0.0002   Epoch: 19   Global Step: 318460   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:58:07,523-Speed 3349.16 samples/sec   Loss 0.0998   LearningRate 0.0002   Epoch: 19   Global Step: 318470   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:58:10,599-Speed 3329.42 samples/sec   Loss 0.1018   LearningRate 0.0002   Epoch: 19   Global Step: 318480   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:58:13,658-Speed 3348.65 samples/sec   Loss 0.1053   LearningRate 0.0002   Epoch: 19   Global Step: 318490   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:58:16,712-Speed 3353.08 samples/sec   Loss 0.1018   LearningRate 0.0002   Epoch: 19   Global Step: 318500   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:58:19,774-Speed 3345.09 samples/sec   Loss 0.1192   LearningRate 0.0002   Epoch: 19   Global Step: 318510   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:58:22,849-Speed 3330.45 samples/sec   Loss 0.1139   LearningRate 0.0002   Epoch: 19   Global Step: 318520   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:58:25,926-Speed 3329.60 samples/sec   Loss 0.1024   LearningRate 0.0002   Epoch: 19   Global Step: 318530   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:58:28,985-Speed 3348.84 samples/sec   Loss 0.1057   LearningRate 0.0002   Epoch: 19   Global Step: 318540   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:58:32,042-Speed 3349.41 samples/sec   Loss 0.1074   LearningRate 0.0002   Epoch: 19   Global Step: 318550   Fp16 Grad Scale: 262144   Required: 2 hours
Training: 2022-04-12 08:58:35,091-Speed 3359.37 samples/sec   Loss 0.1085   LearningRate 0.0002   Epoch: 19   Global Step: 318560   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:58:38,164-Speed 3333.69 samples/sec   Loss 0.0889   LearningRate 0.0002   Epoch: 19   Global Step: 318570   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:58:41,226-Speed 3345.00 samples/sec   Loss 0.0975   LearningRate 0.0002   Epoch: 19   Global Step: 318580   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:58:44,303-Speed 3328.57 samples/sec   Loss 0.0937   LearningRate 0.0002   Epoch: 19   Global Step: 318590   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:58:47,413-Speed 3292.94 samples/sec   Loss 0.1027   LearningRate 0.0002   Epoch: 19   Global Step: 318600   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:58:50,507-Speed 3309.81 samples/sec   Loss 0.1054   LearningRate 0.0002   Epoch: 19   Global Step: 318610   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:58:53,550-Speed 3366.64 samples/sec   Loss 0.1063   LearningRate 0.0002   Epoch: 19   Global Step: 318620   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-12 08:58:56,623-Speed 3332.81 samples/sec   Loss 0.1096   LearningRate 0.0002   Epoch: 19   Global Step: 318630   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-12 08:58:59,688-Speed 3341.94 samples/sec   Loss 0.1048   LearningRate 0.0002   Epoch: 19   Global Step: 318640   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-12 08:59:02,766-Speed 3328.18 samples/sec   Loss 0.1017   LearningRate 0.0002   Epoch: 19   Global Step: 318650   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-12 08:59:05,822-Speed 3350.77 samples/sec   Loss 0.1117   LearningRate 0.0002   Epoch: 19   Global Step: 318660   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-12 08:59:08,883-Speed 3346.47 samples/sec   Loss 0.0978   LearningRate 0.0002   Epoch: 19   Global Step: 318670   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-12 08:59:11,944-Speed 3346.29 samples/sec   Loss 0.1102   LearningRate 0.0002   Epoch: 19   Global Step: 318680   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-12 08:59:15,038-Speed 3309.49 samples/sec   Loss 0.0983   LearningRate 0.0002   Epoch: 19   Global Step: 318690   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-12 08:59:18,106-Speed 3339.09 samples/sec   Loss 0.1055   LearningRate 0.0002   Epoch: 19   Global Step: 318700   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-12 08:59:21,184-Speed 3327.60 samples/sec   Loss 0.1073   LearningRate 0.0002   Epoch: 19   Global Step: 318710   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-12 08:59:24,243-Speed 3347.95 samples/sec   Loss 0.1086   LearningRate 0.0002   Epoch: 19   Global Step: 318720   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 08:59:27,332-Speed 3315.99 samples/sec   Loss 0.1033   LearningRate 0.0002   Epoch: 19   Global Step: 318730   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-12 08:59:30,488-Speed 3245.22 samples/sec   Loss 0.1146   LearningRate 0.0002   Epoch: 19   Global Step: 318740   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-12 08:59:33,666-Speed 3223.14 samples/sec   Loss 0.0912   LearningRate 0.0002   Epoch: 19   Global Step: 318750   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-12 08:59:36,727-Speed 3346.01 samples/sec   Loss 0.0974   LearningRate 0.0002   Epoch: 19   Global Step: 318760   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-12 08:59:39,795-Speed 3338.63 samples/sec   Loss 0.1102   LearningRate 0.0002   Epoch: 19   Global Step: 318770   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-12 08:59:42,857-Speed 3345.07 samples/sec   Loss 0.1067   LearningRate 0.0002   Epoch: 19   Global Step: 318780   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-12 08:59:45,928-Speed 3334.89 samples/sec   Loss 0.1101   LearningRate 0.0002   Epoch: 19   Global Step: 318790   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-12 08:59:48,989-Speed 3346.24 samples/sec   Loss 0.1013   LearningRate 0.0002   Epoch: 19   Global Step: 318800   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-12 08:59:52,120-Speed 3270.85 samples/sec   Loss 0.1048   LearningRate 0.0002   Epoch: 19   Global Step: 318810   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-12 08:59:55,187-Speed 3339.91 samples/sec   Loss 0.1126   LearningRate 0.0002   Epoch: 19   Global Step: 318820   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-12 08:59:58,246-Speed 3347.97 samples/sec   Loss 0.1049   LearningRate 0.0002   Epoch: 19   Global Step: 318830   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 09:00:01,310-Speed 3343.81 samples/sec   Loss 0.1024   LearningRate 0.0002   Epoch: 19   Global Step: 318840   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 09:00:04,384-Speed 3331.14 samples/sec   Loss 0.1019   LearningRate 0.0002   Epoch: 19   Global Step: 318850   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 09:00:07,487-Speed 3301.19 samples/sec   Loss 0.1052   LearningRate 0.0002   Epoch: 19   Global Step: 318860   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 09:00:10,533-Speed 3362.62 samples/sec   Loss 0.1124   LearningRate 0.0002   Epoch: 19   Global Step: 318870   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-12 09:00:13,607-Speed 3332.11 samples/sec   Loss 0.1026   LearningRate 0.0002   Epoch: 19   Global Step: 318880   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-12 09:00:16,666-Speed 3348.49 samples/sec   Loss 0.1171   LearningRate 0.0002   Epoch: 19   Global Step: 318890   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-12 09:00:19,765-Speed 3304.57 samples/sec   Loss 0.1062   LearningRate 0.0002   Epoch: 19   Global Step: 318900   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-12 09:00:22,827-Speed 3344.67 samples/sec   Loss 0.0970   LearningRate 0.0002   Epoch: 19   Global Step: 318910   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-12 09:00:25,887-Speed 3346.98 samples/sec   Loss 0.1129   LearningRate 0.0002   Epoch: 19   Global Step: 318920   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-12 09:00:28,947-Speed 3347.75 samples/sec   Loss 0.1077   LearningRate 0.0002   Epoch: 19   Global Step: 318930   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-12 09:00:32,018-Speed 3334.87 samples/sec   Loss 0.1006   LearningRate 0.0002   Epoch: 19   Global Step: 318940   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-12 09:00:35,074-Speed 3352.12 samples/sec   Loss 0.1083   LearningRate 0.0002   Epoch: 19   Global Step: 318950   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-12 09:00:38,190-Speed 3286.13 samples/sec   Loss 0.0927   LearningRate 0.0002   Epoch: 19   Global Step: 318960   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-12 09:00:41,353-Speed 3239.00 samples/sec   Loss 0.1176   LearningRate 0.0002   Epoch: 19   Global Step: 318970   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 09:00:44,411-Speed 3349.20 samples/sec   Loss 0.0943   LearningRate 0.0002   Epoch: 19   Global Step: 318980   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 09:00:47,472-Speed 3346.70 samples/sec   Loss 0.1067   LearningRate 0.0002   Epoch: 19   Global Step: 318990   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 09:00:50,528-Speed 3351.01 samples/sec   Loss 0.1085   LearningRate 0.0002   Epoch: 19   Global Step: 319000   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 09:00:53,585-Speed 3350.52 samples/sec   Loss 0.1137   LearningRate 0.0002   Epoch: 19   Global Step: 319010   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 09:00:56,641-Speed 3351.03 samples/sec   Loss 0.1048   LearningRate 0.0002   Epoch: 19   Global Step: 319020   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 09:00:59,711-Speed 3336.39 samples/sec   Loss 0.1170   LearningRate 0.0002   Epoch: 19   Global Step: 319030   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 09:01:02,774-Speed 3343.87 samples/sec   Loss 0.0998   LearningRate 0.0002   Epoch: 19   Global Step: 319040   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 09:01:05,829-Speed 3352.44 samples/sec   Loss 0.1031   LearningRate 0.0002   Epoch: 19   Global Step: 319050   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 09:01:08,901-Speed 3334.42 samples/sec   Loss 0.1142   LearningRate 0.0002   Epoch: 19   Global Step: 319060   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 09:01:12,110-Speed 3191.41 samples/sec   Loss 0.1022   LearningRate 0.0002   Epoch: 19   Global Step: 319070   Fp16 Grad Scale: 262144   Required: 2 hours
Training: 2022-04-12 09:01:15,195-Speed 3320.73 samples/sec   Loss 0.1069   LearningRate 0.0002   Epoch: 19   Global Step: 319080   Fp16 Grad Scale: 262144   Required: 2 hours
Training: 2022-04-12 09:01:18,274-Speed 3326.72 samples/sec   Loss 0.1065   LearningRate 0.0002   Epoch: 19   Global Step: 319090   Fp16 Grad Scale: 262144   Required: 2 hours
Training: 2022-04-12 09:01:21,344-Speed 3336.25 samples/sec   Loss 0.1087   LearningRate 0.0002   Epoch: 19   Global Step: 319100   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 09:01:24,406-Speed 3344.45 samples/sec   Loss 0.0999   LearningRate 0.0002   Epoch: 19   Global Step: 319110   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 09:01:27,530-Speed 3278.90 samples/sec   Loss 0.0997   LearningRate 0.0002   Epoch: 19   Global Step: 319120   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 09:01:30,616-Speed 3318.81 samples/sec   Loss 0.1009   LearningRate 0.0002   Epoch: 19   Global Step: 319130   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 09:01:33,684-Speed 3338.02 samples/sec   Loss 0.1060   LearningRate 0.0002   Epoch: 19   Global Step: 319140   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 09:01:36,823-Speed 3263.61 samples/sec   Loss 0.1078   LearningRate 0.0002   Epoch: 19   Global Step: 319150   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 09:01:39,891-Speed 3338.42 samples/sec   Loss 0.0995   LearningRate 0.0002   Epoch: 19   Global Step: 319160   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 09:01:42,963-Speed 3334.51 samples/sec   Loss 0.1082   LearningRate 0.0002   Epoch: 19   Global Step: 319170   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 09:01:46,025-Speed 3344.00 samples/sec   Loss 0.1132   LearningRate 0.0002   Epoch: 19   Global Step: 319180   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 09:01:49,090-Speed 3342.06 samples/sec   Loss 0.1073   LearningRate 0.0002   Epoch: 19   Global Step: 319190   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 09:01:52,169-Speed 3326.33 samples/sec   Loss 0.0948   LearningRate 0.0002   Epoch: 19   Global Step: 319200   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 09:01:55,252-Speed 3322.32 samples/sec   Loss 0.1081   LearningRate 0.0002   Epoch: 19   Global Step: 319210   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 09:01:58,410-Speed 3243.61 samples/sec   Loss 0.1080   LearningRate 0.0002   Epoch: 19   Global Step: 319220   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 09:02:01,685-Speed 3127.04 samples/sec   Loss 0.1091   LearningRate 0.0002   Epoch: 19   Global Step: 319230   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 09:02:04,758-Speed 3333.67 samples/sec   Loss 0.1049   LearningRate 0.0002   Epoch: 19   Global Step: 319240   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 09:02:07,829-Speed 3335.22 samples/sec   Loss 0.1012   LearningRate 0.0002   Epoch: 19   Global Step: 319250   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 09:02:10,895-Speed 3340.69 samples/sec   Loss 0.1074   LearningRate 0.0002   Epoch: 19   Global Step: 319260   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 09:02:13,988-Speed 3310.58 samples/sec   Loss 0.0997   LearningRate 0.0002   Epoch: 19   Global Step: 319270   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 09:02:17,052-Speed 3343.02 samples/sec   Loss 0.0959   LearningRate 0.0002   Epoch: 19   Global Step: 319280   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 09:02:20,110-Speed 3349.39 samples/sec   Loss 0.1100   LearningRate 0.0002   Epoch: 19   Global Step: 319290   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 09:02:23,181-Speed 3335.38 samples/sec   Loss 0.1065   LearningRate 0.0002   Epoch: 19   Global Step: 319300   Fp16 Grad Scale: 262144   Required: 2 hours
Training: 2022-04-12 09:02:26,230-Speed 3359.04 samples/sec   Loss 0.1076   LearningRate 0.0002   Epoch: 19   Global Step: 319310   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 09:02:29,303-Speed 3332.96 samples/sec   Loss 0.1120   LearningRate 0.0002   Epoch: 19   Global Step: 319320   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 09:02:32,384-Speed 3325.19 samples/sec   Loss 0.1067   LearningRate 0.0002   Epoch: 19   Global Step: 319330   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 09:02:35,446-Speed 3344.39 samples/sec   Loss 0.1188   LearningRate 0.0002   Epoch: 19   Global Step: 319340   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 09:02:38,506-Speed 3347.89 samples/sec   Loss 0.1016   LearningRate 0.0002   Epoch: 19   Global Step: 319350   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 09:02:41,653-Speed 3254.08 samples/sec   Loss 0.1081   LearningRate 0.0002   Epoch: 19   Global Step: 319360   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 09:02:44,808-Speed 3246.28 samples/sec   Loss 0.1070   LearningRate 0.0002   Epoch: 19   Global Step: 319370   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 09:02:47,884-Speed 3330.00 samples/sec   Loss 0.1074   LearningRate 0.0002   Epoch: 19   Global Step: 319380   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 09:02:50,950-Speed 3340.74 samples/sec   Loss 0.1095   LearningRate 0.0002   Epoch: 19   Global Step: 319390   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 09:02:54,021-Speed 3334.30 samples/sec   Loss 0.0942   LearningRate 0.0002   Epoch: 19   Global Step: 319400   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 09:02:57,124-Speed 3301.29 samples/sec   Loss 0.0981   LearningRate 0.0002   Epoch: 19   Global Step: 319410   Fp16 Grad Scale: 262144   Required: 2 hours
Training: 2022-04-12 09:03:00,297-Speed 3228.80 samples/sec   Loss 0.1019   LearningRate 0.0002   Epoch: 19   Global Step: 319420   Fp16 Grad Scale: 262144   Required: 2 hours
Training: 2022-04-12 09:03:03,529-Speed 3168.79 samples/sec   Loss 0.0970   LearningRate 0.0002   Epoch: 19   Global Step: 319430   Fp16 Grad Scale: 262144   Required: 2 hours
Training: 2022-04-12 09:03:06,635-Speed 3297.86 samples/sec   Loss 0.0949   LearningRate 0.0002   Epoch: 19   Global Step: 319440   Fp16 Grad Scale: 262144   Required: 2 hours
Training: 2022-04-12 09:03:09,689-Speed 3354.13 samples/sec   Loss 0.1103   LearningRate 0.0002   Epoch: 19   Global Step: 319450   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 09:03:12,787-Speed 3305.56 samples/sec   Loss 0.0927   LearningRate 0.0002   Epoch: 19   Global Step: 319460   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 09:03:15,904-Speed 3286.37 samples/sec   Loss 0.0995   LearningRate 0.0002   Epoch: 19   Global Step: 319470   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-12 09:03:18,977-Speed 3333.47 samples/sec   Loss 0.1198   LearningRate 0.0002   Epoch: 19   Global Step: 319480   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-12 09:03:22,042-Speed 3340.72 samples/sec   Loss 0.1019   LearningRate 0.0002   Epoch: 19   Global Step: 319490   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-12 09:03:25,112-Speed 3336.83 samples/sec   Loss 0.1155   LearningRate 0.0002   Epoch: 19   Global Step: 319500   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-12 09:03:28,181-Speed 3337.01 samples/sec   Loss 0.1000   LearningRate 0.0002   Epoch: 19   Global Step: 319510   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-12 09:03:31,270-Speed 3316.45 samples/sec   Loss 0.1055   LearningRate 0.0002   Epoch: 19   Global Step: 319520   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-12 09:03:34,407-Speed 3264.65 samples/sec   Loss 0.0957   LearningRate 0.0002   Epoch: 19   Global Step: 319530   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-12 09:03:37,481-Speed 3332.58 samples/sec   Loss 0.1103   LearningRate 0.0002   Epoch: 19   Global Step: 319540   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-12 09:03:40,540-Speed 3348.04 samples/sec   Loss 0.1046   LearningRate 0.0002   Epoch: 19   Global Step: 319550   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-12 09:03:43,607-Speed 3338.94 samples/sec   Loss 0.1082   LearningRate 0.0002   Epoch: 19   Global Step: 319560   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-12 09:03:46,698-Speed 3313.89 samples/sec   Loss 0.0938   LearningRate 0.0002   Epoch: 19   Global Step: 319570   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 09:03:49,761-Speed 3344.49 samples/sec   Loss 0.1042   LearningRate 0.0002   Epoch: 19   Global Step: 319580   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 09:03:52,857-Speed 3308.28 samples/sec   Loss 0.0925   LearningRate 0.0002   Epoch: 19   Global Step: 319590   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 09:03:55,972-Speed 3287.56 samples/sec   Loss 0.1019   LearningRate 0.0002   Epoch: 19   Global Step: 319600   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 09:03:59,052-Speed 3325.40 samples/sec   Loss 0.0963   LearningRate 0.0002   Epoch: 19   Global Step: 319610   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 09:04:02,148-Speed 3307.90 samples/sec   Loss 0.1010   LearningRate 0.0002   Epoch: 19   Global Step: 319620   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 09:04:05,210-Speed 3345.39 samples/sec   Loss 0.0999   LearningRate 0.0002   Epoch: 19   Global Step: 319630   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-12 09:04:08,280-Speed 3336.09 samples/sec   Loss 0.0998   LearningRate 0.0002   Epoch: 19   Global Step: 319640   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:04:11,340-Speed 3346.92 samples/sec   Loss 0.1082   LearningRate 0.0002   Epoch: 19   Global Step: 319650   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:04:14,407-Speed 3339.75 samples/sec   Loss 0.1116   LearningRate 0.0002   Epoch: 19   Global Step: 319660   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:04:17,466-Speed 3349.25 samples/sec   Loss 0.1162   LearningRate 0.0002   Epoch: 19   Global Step: 319670   Fp16 Grad Scale: 262144   Required: 1 hours
Training: 2022-04-12 09:04:20,650-Speed 3216.27 samples/sec   Loss 0.0970   LearningRate 0.0002   Epoch: 19   Global Step: 319680   Fp16 Grad Scale: 262144   Required: 1 hours
Training: 2022-04-12 09:04:23,723-Speed 3333.17 samples/sec   Loss 0.1070   LearningRate 0.0002   Epoch: 19   Global Step: 319690   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:04:26,782-Speed 3348.17 samples/sec   Loss 0.1021   LearningRate 0.0002   Epoch: 19   Global Step: 319700   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:04:29,849-Speed 3339.07 samples/sec   Loss 0.1018   LearningRate 0.0002   Epoch: 19   Global Step: 319710   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:04:32,921-Speed 3334.37 samples/sec   Loss 0.1122   LearningRate 0.0002   Epoch: 19   Global Step: 319720   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:04:35,995-Speed 3331.63 samples/sec   Loss 0.1029   LearningRate 0.0002   Epoch: 19   Global Step: 319730   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:04:39,065-Speed 3336.48 samples/sec   Loss 0.1127   LearningRate 0.0002   Epoch: 19   Global Step: 319740   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:04:42,123-Speed 3349.61 samples/sec   Loss 0.0945   LearningRate 0.0002   Epoch: 19   Global Step: 319750   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:04:45,261-Speed 3264.75 samples/sec   Loss 0.0983   LearningRate 0.0002   Epoch: 19   Global Step: 319760   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:04:48,397-Speed 3265.41 samples/sec   Loss 0.1021   LearningRate 0.0002   Epoch: 19   Global Step: 319770   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:04:51,557-Speed 3241.41 samples/sec   Loss 0.1003   LearningRate 0.0002   Epoch: 19   Global Step: 319780   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:04:54,624-Speed 3339.84 samples/sec   Loss 0.1024   LearningRate 0.0002   Epoch: 19   Global Step: 319790   Fp16 Grad Scale: 262144   Required: 1 hours
Training: 2022-04-12 09:04:57,696-Speed 3333.90 samples/sec   Loss 0.1098   LearningRate 0.0002   Epoch: 19   Global Step: 319800   Fp16 Grad Scale: 262144   Required: 1 hours
Training: 2022-04-12 09:05:00,796-Speed 3303.02 samples/sec   Loss 0.0960   LearningRate 0.0002   Epoch: 19   Global Step: 319810   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:05:03,884-Speed 3317.85 samples/sec   Loss 0.1055   LearningRate 0.0002   Epoch: 19   Global Step: 319820   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:05:06,989-Speed 3298.05 samples/sec   Loss 0.1182   LearningRate 0.0002   Epoch: 19   Global Step: 319830   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:05:10,061-Speed 3334.17 samples/sec   Loss 0.1008   LearningRate 0.0002   Epoch: 19   Global Step: 319840   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:05:13,245-Speed 3217.77 samples/sec   Loss 0.0991   LearningRate 0.0002   Epoch: 19   Global Step: 319850   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:05:16,351-Speed 3296.99 samples/sec   Loss 0.1107   LearningRate 0.0002   Epoch: 19   Global Step: 319860   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:05:19,602-Speed 3150.12 samples/sec   Loss 0.1001   LearningRate 0.0002   Epoch: 19   Global Step: 319870   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:05:22,817-Speed 3185.69 samples/sec   Loss 0.1060   LearningRate 0.0002   Epoch: 19   Global Step: 319880   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:05:26,002-Speed 3216.02 samples/sec   Loss 0.0992   LearningRate 0.0002   Epoch: 19   Global Step: 319890   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:05:29,117-Speed 3288.01 samples/sec   Loss 0.0938   LearningRate 0.0002   Epoch: 19   Global Step: 319900   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:05:32,192-Speed 3330.99 samples/sec   Loss 0.1082   LearningRate 0.0002   Epoch: 19   Global Step: 319910   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:05:35,268-Speed 3330.18 samples/sec   Loss 0.1036   LearningRate 0.0002   Epoch: 19   Global Step: 319920   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:05:38,342-Speed 3332.25 samples/sec   Loss 0.0961   LearningRate 0.0002   Epoch: 19   Global Step: 319930   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:05:41,477-Speed 3267.55 samples/sec   Loss 0.1022   LearningRate 0.0002   Epoch: 19   Global Step: 319940   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:05:44,537-Speed 3346.49 samples/sec   Loss 0.1043   LearningRate 0.0002   Epoch: 19   Global Step: 319950   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:05:47,594-Speed 3351.10 samples/sec   Loss 0.1175   LearningRate 0.0002   Epoch: 19   Global Step: 319960   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 09:05:50,666-Speed 3333.71 samples/sec   Loss 0.1129   LearningRate 0.0002   Epoch: 19   Global Step: 319970   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 09:05:53,758-Speed 3312.21 samples/sec   Loss 0.1061   LearningRate 0.0002   Epoch: 19   Global Step: 319980   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 09:05:56,831-Speed 3332.80 samples/sec   Loss 0.1026   LearningRate 0.0002   Epoch: 19   Global Step: 319990   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 09:05:59,904-Speed 3333.54 samples/sec   Loss 0.1044   LearningRate 0.0002   Epoch: 19   Global Step: 320000   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 09:06:43,760-[lfw][320000]XNorm: 20.726945
Training: 2022-04-12 09:06:43,760-[lfw][320000]Accuracy-Flip: 0.99783+-0.00259
Training: 2022-04-12 09:06:43,761-[lfw][320000]Accuracy-Highest: 0.99817
Training: 2022-04-12 09:07:34,591-[cfp_fp][320000]XNorm: 22.576026
Training: 2022-04-12 09:07:34,592-[cfp_fp][320000]Accuracy-Flip: 0.99129+-0.00347
Training: 2022-04-12 09:07:34,592-[cfp_fp][320000]Accuracy-Highest: 0.99200
Training: 2022-04-12 09:08:18,031-[agedb_30][320000]XNorm: 22.776800
Training: 2022-04-12 09:08:18,031-[agedb_30][320000]Accuracy-Flip: 0.98467+-0.00670
Training: 2022-04-12 09:08:18,032-[agedb_30][320000]Accuracy-Highest: 0.98650
Training: 2022-04-12 09:08:21,094-Speed 72.53 samples/sec   Loss 0.1043   LearningRate 0.0002   Epoch: 19   Global Step: 320010   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 09:08:24,163-Speed 3337.27 samples/sec   Loss 0.0976   LearningRate 0.0002   Epoch: 19   Global Step: 320020   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 09:08:27,291-Speed 3274.85 samples/sec   Loss 0.1045   LearningRate 0.0002   Epoch: 19   Global Step: 320030   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 09:08:30,359-Speed 3338.25 samples/sec   Loss 0.0987   LearningRate 0.0002   Epoch: 19   Global Step: 320040   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 09:08:33,417-Speed 3348.86 samples/sec   Loss 0.0986   LearningRate 0.0002   Epoch: 19   Global Step: 320050   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 09:08:36,518-Speed 3303.84 samples/sec   Loss 0.1042   LearningRate 0.0002   Epoch: 19   Global Step: 320060   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:08:39,582-Speed 3342.57 samples/sec   Loss 0.1091   LearningRate 0.0002   Epoch: 19   Global Step: 320070   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:08:42,732-Speed 3251.67 samples/sec   Loss 0.1106   LearningRate 0.0002   Epoch: 19   Global Step: 320080   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:08:45,866-Speed 3267.87 samples/sec   Loss 0.1103   LearningRate 0.0002   Epoch: 19   Global Step: 320090   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:08:48,950-Speed 3321.07 samples/sec   Loss 0.0962   LearningRate 0.0002   Epoch: 19   Global Step: 320100   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:08:52,011-Speed 3345.53 samples/sec   Loss 0.0979   LearningRate 0.0002   Epoch: 19   Global Step: 320110   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:08:55,073-Speed 3345.90 samples/sec   Loss 0.1025   LearningRate 0.0002   Epoch: 19   Global Step: 320120   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:08:58,136-Speed 3343.33 samples/sec   Loss 0.1100   LearningRate 0.0002   Epoch: 19   Global Step: 320130   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:09:01,194-Speed 3349.46 samples/sec   Loss 0.0944   LearningRate 0.0002   Epoch: 19   Global Step: 320140   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:09:04,271-Speed 3328.75 samples/sec   Loss 0.1028   LearningRate 0.0002   Epoch: 19   Global Step: 320150   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:09:07,342-Speed 3335.63 samples/sec   Loss 0.1202   LearningRate 0.0002   Epoch: 19   Global Step: 320160   Fp16 Grad Scale: 262144   Required: 1 hours
Training: 2022-04-12 09:09:10,407-Speed 3340.82 samples/sec   Loss 0.1138   LearningRate 0.0002   Epoch: 19   Global Step: 320170   Fp16 Grad Scale: 262144   Required: 1 hours
Training: 2022-04-12 09:09:13,576-Speed 3232.40 samples/sec   Loss 0.1142   LearningRate 0.0002   Epoch: 19   Global Step: 320180   Fp16 Grad Scale: 262144   Required: 1 hours
Training: 2022-04-12 09:09:16,654-Speed 3327.75 samples/sec   Loss 0.1100   LearningRate 0.0002   Epoch: 19   Global Step: 320190   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:09:19,726-Speed 3334.41 samples/sec   Loss 0.1117   LearningRate 0.0002   Epoch: 19   Global Step: 320200   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:09:22,832-Speed 3297.53 samples/sec   Loss 0.1080   LearningRate 0.0002   Epoch: 19   Global Step: 320210   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:09:25,892-Speed 3347.62 samples/sec   Loss 0.1023   LearningRate 0.0002   Epoch: 19   Global Step: 320220   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:09:28,950-Speed 3349.28 samples/sec   Loss 0.1253   LearningRate 0.0002   Epoch: 19   Global Step: 320230   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:09:32,072-Speed 3280.76 samples/sec   Loss 0.1130   LearningRate 0.0002   Epoch: 19   Global Step: 320240   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:09:35,142-Speed 3335.32 samples/sec   Loss 0.1034   LearningRate 0.0002   Epoch: 19   Global Step: 320250   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:09:38,201-Speed 3348.88 samples/sec   Loss 0.1060   LearningRate 0.0002   Epoch: 19   Global Step: 320260   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:09:41,271-Speed 3335.46 samples/sec   Loss 0.1136   LearningRate 0.0002   Epoch: 19   Global Step: 320270   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:09:44,348-Speed 3328.84 samples/sec   Loss 0.1055   LearningRate 0.0002   Epoch: 19   Global Step: 320280   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:09:47,424-Speed 3329.56 samples/sec   Loss 0.1073   LearningRate 0.0002   Epoch: 19   Global Step: 320290   Fp16 Grad Scale: 262144   Required: 1 hours
Training: 2022-04-12 09:09:50,516-Speed 3313.34 samples/sec   Loss 0.1055   LearningRate 0.0002   Epoch: 19   Global Step: 320300   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:09:53,574-Speed 3349.16 samples/sec   Loss 0.1017   LearningRate 0.0002   Epoch: 19   Global Step: 320310   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 09:09:56,632-Speed 3349.14 samples/sec   Loss 0.1010   LearningRate 0.0002   Epoch: 19   Global Step: 320320   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 09:09:59,693-Speed 3346.91 samples/sec   Loss 0.0918   LearningRate 0.0002   Epoch: 19   Global Step: 320330   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 09:10:02,748-Speed 3351.83 samples/sec   Loss 0.1011   LearningRate 0.0002   Epoch: 19   Global Step: 320340   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 09:10:05,809-Speed 3346.17 samples/sec   Loss 0.1014   LearningRate 0.0002   Epoch: 19   Global Step: 320350   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 09:10:08,867-Speed 3349.72 samples/sec   Loss 0.1127   LearningRate 0.0002   Epoch: 19   Global Step: 320360   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 09:10:11,922-Speed 3351.83 samples/sec   Loss 0.0951   LearningRate 0.0002   Epoch: 19   Global Step: 320370   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 09:10:14,981-Speed 3348.47 samples/sec   Loss 0.0947   LearningRate 0.0002   Epoch: 19   Global Step: 320380   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 09:10:18,103-Speed 3280.40 samples/sec   Loss 0.1112   LearningRate 0.0002   Epoch: 19   Global Step: 320390   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 09:10:21,188-Speed 3320.09 samples/sec   Loss 0.0995   LearningRate 0.0002   Epoch: 19   Global Step: 320400   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 09:10:24,278-Speed 3315.48 samples/sec   Loss 0.1022   LearningRate 0.0002   Epoch: 19   Global Step: 320410   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:10:27,378-Speed 3304.03 samples/sec   Loss 0.1016   LearningRate 0.0002   Epoch: 19   Global Step: 320420   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:10:30,563-Speed 3215.84 samples/sec   Loss 0.1020   LearningRate 0.0002   Epoch: 19   Global Step: 320430   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:10:33,686-Speed 3279.67 samples/sec   Loss 0.1015   LearningRate 0.0002   Epoch: 19   Global Step: 320440   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:10:36,839-Speed 3248.11 samples/sec   Loss 0.0964   LearningRate 0.0002   Epoch: 19   Global Step: 320450   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:10:39,895-Speed 3351.65 samples/sec   Loss 0.1047   LearningRate 0.0002   Epoch: 19   Global Step: 320460   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:10:42,950-Speed 3352.80 samples/sec   Loss 0.1015   LearningRate 0.0002   Epoch: 19   Global Step: 320470   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:10:46,018-Speed 3337.59 samples/sec   Loss 0.0987   LearningRate 0.0002   Epoch: 19   Global Step: 320480   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:10:49,169-Speed 3251.58 samples/sec   Loss 0.1089   LearningRate 0.0002   Epoch: 19   Global Step: 320490   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:10:52,267-Speed 3306.17 samples/sec   Loss 0.1057   LearningRate 0.0002   Epoch: 19   Global Step: 320500   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:10:55,343-Speed 3329.08 samples/sec   Loss 0.1033   LearningRate 0.0002   Epoch: 19   Global Step: 320510   Fp16 Grad Scale: 262144   Required: 1 hours
Training: 2022-04-12 09:10:58,405-Speed 3344.79 samples/sec   Loss 0.0965   LearningRate 0.0002   Epoch: 19   Global Step: 320520   Fp16 Grad Scale: 262144   Required: 1 hours
Training: 2022-04-12 09:11:01,463-Speed 3349.82 samples/sec   Loss 0.1084   LearningRate 0.0002   Epoch: 19   Global Step: 320530   Fp16 Grad Scale: 262144   Required: 1 hours
Training: 2022-04-12 09:11:04,585-Speed 3280.34 samples/sec   Loss 0.0982   LearningRate 0.0002   Epoch: 19   Global Step: 320540   Fp16 Grad Scale: 262144   Required: 1 hours
Training: 2022-04-12 09:11:07,679-Speed 3310.51 samples/sec   Loss 0.0976   LearningRate 0.0002   Epoch: 19   Global Step: 320550   Fp16 Grad Scale: 262144   Required: 1 hours
Training: 2022-04-12 09:11:10,723-Speed 3364.66 samples/sec   Loss 0.1024   LearningRate 0.0002   Epoch: 19   Global Step: 320560   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:11:13,778-Speed 3353.14 samples/sec   Loss 0.0959   LearningRate 0.0002   Epoch: 19   Global Step: 320570   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:11:16,874-Speed 3308.49 samples/sec   Loss 0.0955   LearningRate 0.0002   Epoch: 19   Global Step: 320580   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:11:19,959-Speed 3319.70 samples/sec   Loss 0.1045   LearningRate 0.0002   Epoch: 19   Global Step: 320590   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:11:23,033-Speed 3331.55 samples/sec   Loss 0.1049   LearningRate 0.0002   Epoch: 19   Global Step: 320600   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:11:26,139-Speed 3297.75 samples/sec   Loss 0.1057   LearningRate 0.0002   Epoch: 19   Global Step: 320610   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:11:29,199-Speed 3347.60 samples/sec   Loss 0.1010   LearningRate 0.0002   Epoch: 19   Global Step: 320620   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:11:32,274-Speed 3330.73 samples/sec   Loss 0.1000   LearningRate 0.0002   Epoch: 19   Global Step: 320630   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:11:35,348-Speed 3331.79 samples/sec   Loss 0.1087   LearningRate 0.0002   Epoch: 19   Global Step: 320640   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:11:38,418-Speed 3336.67 samples/sec   Loss 0.1051   LearningRate 0.0002   Epoch: 19   Global Step: 320650   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:11:41,478-Speed 3347.20 samples/sec   Loss 0.1064   LearningRate 0.0002   Epoch: 19   Global Step: 320660   Fp16 Grad Scale: 262144   Required: 1 hours
Training: 2022-04-12 09:11:44,533-Speed 3353.10 samples/sec   Loss 0.1054   LearningRate 0.0002   Epoch: 19   Global Step: 320670   Fp16 Grad Scale: 262144   Required: 1 hours
Training: 2022-04-12 09:11:47,577-Speed 3364.37 samples/sec   Loss 0.1039   LearningRate 0.0002   Epoch: 19   Global Step: 320680   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:11:50,670-Speed 3311.33 samples/sec   Loss 0.0995   LearningRate 0.0002   Epoch: 19   Global Step: 320690   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:11:53,728-Speed 3349.79 samples/sec   Loss 0.1024   LearningRate 0.0002   Epoch: 19   Global Step: 320700   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:11:56,793-Speed 3341.42 samples/sec   Loss 0.1009   LearningRate 0.0002   Epoch: 19   Global Step: 320710   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:11:59,871-Speed 3327.87 samples/sec   Loss 0.0988   LearningRate 0.0002   Epoch: 19   Global Step: 320720   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:12:02,927-Speed 3351.08 samples/sec   Loss 0.1073   LearningRate 0.0002   Epoch: 19   Global Step: 320730   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:12:05,991-Speed 3342.44 samples/sec   Loss 0.0987   LearningRate 0.0002   Epoch: 19   Global Step: 320740   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:12:09,049-Speed 3349.74 samples/sec   Loss 0.1041   LearningRate 0.0002   Epoch: 19   Global Step: 320750   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:12:12,108-Speed 3347.98 samples/sec   Loss 0.0965   LearningRate 0.0002   Epoch: 19   Global Step: 320760   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:12:15,192-Speed 3321.85 samples/sec   Loss 0.0997   LearningRate 0.0002   Epoch: 19   Global Step: 320770   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:12:18,264-Speed 3334.26 samples/sec   Loss 0.1088   LearningRate 0.0002   Epoch: 19   Global Step: 320780   Fp16 Grad Scale: 262144   Required: 1 hours
Training: 2022-04-12 09:12:21,319-Speed 3352.67 samples/sec   Loss 0.0962   LearningRate 0.0002   Epoch: 19   Global Step: 320790   Fp16 Grad Scale: 262144   Required: 1 hours
Training: 2022-04-12 09:12:24,393-Speed 3331.35 samples/sec   Loss 0.1049   LearningRate 0.0002   Epoch: 19   Global Step: 320800   Fp16 Grad Scale: 262144   Required: 1 hours
Training: 2022-04-12 09:12:27,439-Speed 3362.47 samples/sec   Loss 0.1025   LearningRate 0.0002   Epoch: 19   Global Step: 320810   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:12:30,519-Speed 3325.65 samples/sec   Loss 0.1054   LearningRate 0.0002   Epoch: 19   Global Step: 320820   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:12:33,591-Speed 3334.12 samples/sec   Loss 0.1048   LearningRate 0.0002   Epoch: 19   Global Step: 320830   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:12:36,764-Speed 3227.95 samples/sec   Loss 0.1033   LearningRate 0.0002   Epoch: 19   Global Step: 320840   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:12:39,856-Speed 3312.42 samples/sec   Loss 0.0995   LearningRate 0.0002   Epoch: 19   Global Step: 320850   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:12:42,949-Speed 3312.13 samples/sec   Loss 0.1161   LearningRate 0.0002   Epoch: 19   Global Step: 320860   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:12:46,013-Speed 3342.71 samples/sec   Loss 0.1010   LearningRate 0.0002   Epoch: 19   Global Step: 320870   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:12:49,074-Speed 3345.38 samples/sec   Loss 0.1029   LearningRate 0.0002   Epoch: 19   Global Step: 320880   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:12:52,128-Speed 3353.75 samples/sec   Loss 0.1009   LearningRate 0.0002   Epoch: 19   Global Step: 320890   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:12:55,188-Speed 3347.20 samples/sec   Loss 0.1045   LearningRate 0.0001   Epoch: 19   Global Step: 320900   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:12:58,244-Speed 3351.38 samples/sec   Loss 0.1034   LearningRate 0.0001   Epoch: 19   Global Step: 320910   Fp16 Grad Scale: 262144   Required: 1 hours
Training: 2022-04-12 09:13:01,296-Speed 3356.44 samples/sec   Loss 0.1104   LearningRate 0.0001   Epoch: 19   Global Step: 320920   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:13:04,359-Speed 3343.70 samples/sec   Loss 0.1063   LearningRate 0.0001   Epoch: 19   Global Step: 320930   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:13:07,421-Speed 3345.18 samples/sec   Loss 0.1047   LearningRate 0.0001   Epoch: 19   Global Step: 320940   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:13:10,641-Speed 3181.54 samples/sec   Loss 0.0989   LearningRate 0.0001   Epoch: 19   Global Step: 320950   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:13:13,704-Speed 3343.77 samples/sec   Loss 0.1080   LearningRate 0.0001   Epoch: 19   Global Step: 320960   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:13:16,763-Speed 3348.02 samples/sec   Loss 0.1072   LearningRate 0.0001   Epoch: 19   Global Step: 320970   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:13:19,867-Speed 3299.15 samples/sec   Loss 0.1111   LearningRate 0.0001   Epoch: 19   Global Step: 320980   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:13:22,943-Speed 3330.44 samples/sec   Loss 0.1001   LearningRate 0.0001   Epoch: 19   Global Step: 320990   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:13:26,020-Speed 3328.22 samples/sec   Loss 0.1017   LearningRate 0.0001   Epoch: 19   Global Step: 321000   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:13:29,084-Speed 3342.66 samples/sec   Loss 0.1067   LearningRate 0.0001   Epoch: 19   Global Step: 321010   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:13:32,174-Speed 3314.18 samples/sec   Loss 0.1109   LearningRate 0.0001   Epoch: 19   Global Step: 321020   Fp16 Grad Scale: 262144   Required: 1 hours
Training: 2022-04-12 09:13:35,229-Speed 3353.92 samples/sec   Loss 0.0980   LearningRate 0.0001   Epoch: 19   Global Step: 321030   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:13:38,300-Speed 3335.10 samples/sec   Loss 0.0992   LearningRate 0.0001   Epoch: 19   Global Step: 321040   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:13:41,353-Speed 3355.37 samples/sec   Loss 0.0980   LearningRate 0.0001   Epoch: 19   Global Step: 321050   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:13:44,408-Speed 3352.61 samples/sec   Loss 0.1095   LearningRate 0.0001   Epoch: 19   Global Step: 321060   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:13:47,463-Speed 3352.21 samples/sec   Loss 0.1157   LearningRate 0.0001   Epoch: 19   Global Step: 321070   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:13:50,526-Speed 3343.61 samples/sec   Loss 0.1034   LearningRate 0.0001   Epoch: 19   Global Step: 321080   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:13:53,593-Speed 3339.74 samples/sec   Loss 0.1039   LearningRate 0.0001   Epoch: 19   Global Step: 321090   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:13:56,711-Speed 3284.93 samples/sec   Loss 0.1105   LearningRate 0.0001   Epoch: 19   Global Step: 321100   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:13:59,775-Speed 3342.19 samples/sec   Loss 0.1062   LearningRate 0.0001   Epoch: 19   Global Step: 321110   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:14:02,857-Speed 3323.79 samples/sec   Loss 0.1013   LearningRate 0.0001   Epoch: 19   Global Step: 321120   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:14:05,957-Speed 3304.09 samples/sec   Loss 0.1063   LearningRate 0.0001   Epoch: 19   Global Step: 321130   Fp16 Grad Scale: 262144   Required: 1 hours
Training: 2022-04-12 09:14:09,026-Speed 3338.15 samples/sec   Loss 0.0985   LearningRate 0.0001   Epoch: 19   Global Step: 321140   Fp16 Grad Scale: 262144   Required: 1 hours
Training: 2022-04-12 09:14:12,095-Speed 3336.71 samples/sec   Loss 0.1015   LearningRate 0.0001   Epoch: 19   Global Step: 321150   Fp16 Grad Scale: 262144   Required: 1 hours
Training: 2022-04-12 09:14:15,154-Speed 3348.53 samples/sec   Loss 0.1067   LearningRate 0.0001   Epoch: 19   Global Step: 321160   Fp16 Grad Scale: 262144   Required: 1 hours
Training: 2022-04-12 09:14:18,207-Speed 3354.27 samples/sec   Loss 0.1165   LearningRate 0.0001   Epoch: 19   Global Step: 321170   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:14:21,274-Speed 3340.24 samples/sec   Loss 0.0996   LearningRate 0.0001   Epoch: 19   Global Step: 321180   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:14:24,395-Speed 3281.97 samples/sec   Loss 0.1002   LearningRate 0.0001   Epoch: 19   Global Step: 321190   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:14:27,521-Speed 3275.65 samples/sec   Loss 0.1066   LearningRate 0.0001   Epoch: 19   Global Step: 321200   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:14:30,627-Speed 3298.39 samples/sec   Loss 0.1026   LearningRate 0.0001   Epoch: 19   Global Step: 321210   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:14:33,683-Speed 3350.86 samples/sec   Loss 0.1072   LearningRate 0.0001   Epoch: 19   Global Step: 321220   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:14:36,745-Speed 3345.41 samples/sec   Loss 0.1115   LearningRate 0.0001   Epoch: 19   Global Step: 321230   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:14:39,813-Speed 3338.32 samples/sec   Loss 0.1021   LearningRate 0.0001   Epoch: 19   Global Step: 321240   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 09:14:42,877-Speed 3345.10 samples/sec   Loss 0.1022   LearningRate 0.0001   Epoch: 19   Global Step: 321250   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 09:14:45,963-Speed 3318.37 samples/sec   Loss 0.1122   LearningRate 0.0001   Epoch: 19   Global Step: 321260   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 09:14:49,021-Speed 3350.23 samples/sec   Loss 0.1054   LearningRate 0.0001   Epoch: 19   Global Step: 321270   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 09:14:52,090-Speed 3336.67 samples/sec   Loss 0.1087   LearningRate 0.0001   Epoch: 19   Global Step: 321280   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 09:14:55,173-Speed 3322.27 samples/sec   Loss 0.1092   LearningRate 0.0001   Epoch: 19   Global Step: 321290   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 09:14:58,251-Speed 3327.86 samples/sec   Loss 0.1047   LearningRate 0.0001   Epoch: 19   Global Step: 321300   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 09:15:01,354-Speed 3301.09 samples/sec   Loss 0.1045   LearningRate 0.0001   Epoch: 19   Global Step: 321310   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 09:15:04,449-Speed 3309.49 samples/sec   Loss 0.0971   LearningRate 0.0001   Epoch: 19   Global Step: 321320   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 09:15:07,578-Speed 3273.05 samples/sec   Loss 0.0988   LearningRate 0.0001   Epoch: 19   Global Step: 321330   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 09:15:10,662-Speed 3320.74 samples/sec   Loss 0.0994   LearningRate 0.0001   Epoch: 19   Global Step: 321340   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:15:13,748-Speed 3319.65 samples/sec   Loss 0.1053   LearningRate 0.0001   Epoch: 19   Global Step: 321350   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:15:16,804-Speed 3350.89 samples/sec   Loss 0.1053   LearningRate 0.0001   Epoch: 19   Global Step: 321360   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:15:19,884-Speed 3325.41 samples/sec   Loss 0.1151   LearningRate 0.0001   Epoch: 19   Global Step: 321370   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:15:22,976-Speed 3312.76 samples/sec   Loss 0.0884   LearningRate 0.0001   Epoch: 19   Global Step: 321380   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:15:26,057-Speed 3324.62 samples/sec   Loss 0.1054   LearningRate 0.0001   Epoch: 19   Global Step: 321390   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:15:29,116-Speed 3348.74 samples/sec   Loss 0.1067   LearningRate 0.0001   Epoch: 19   Global Step: 321400   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:15:32,239-Speed 3279.70 samples/sec   Loss 0.0955   LearningRate 0.0001   Epoch: 19   Global Step: 321410   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:15:35,409-Speed 3230.82 samples/sec   Loss 0.1023   LearningRate 0.0001   Epoch: 19   Global Step: 321420   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:15:38,460-Speed 3356.73 samples/sec   Loss 0.1018   LearningRate 0.0001   Epoch: 19   Global Step: 321430   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 09:15:41,524-Speed 3343.41 samples/sec   Loss 0.1101   LearningRate 0.0001   Epoch: 19   Global Step: 321440   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 09:15:44,590-Speed 3339.83 samples/sec   Loss 0.1096   LearningRate 0.0001   Epoch: 19   Global Step: 321450   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 09:15:47,678-Speed 3317.14 samples/sec   Loss 0.1138   LearningRate 0.0001   Epoch: 19   Global Step: 321460   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 09:15:50,771-Speed 3311.56 samples/sec   Loss 0.1013   LearningRate 0.0001   Epoch: 19   Global Step: 321470   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 09:15:53,831-Speed 3347.16 samples/sec   Loss 0.0972   LearningRate 0.0001   Epoch: 19   Global Step: 321480   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 09:15:56,939-Speed 3295.95 samples/sec   Loss 0.1053   LearningRate 0.0001   Epoch: 19   Global Step: 321490   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 09:15:59,998-Speed 3347.50 samples/sec   Loss 0.1071   LearningRate 0.0001   Epoch: 19   Global Step: 321500   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 09:16:03,062-Speed 3343.33 samples/sec   Loss 0.1027   LearningRate 0.0001   Epoch: 19   Global Step: 321510   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 09:16:06,217-Speed 3245.85 samples/sec   Loss 0.0903   LearningRate 0.0001   Epoch: 19   Global Step: 321520   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 09:16:09,350-Speed 3268.91 samples/sec   Loss 0.1047   LearningRate 0.0001   Epoch: 19   Global Step: 321530   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:16:12,419-Speed 3337.95 samples/sec   Loss 0.1084   LearningRate 0.0001   Epoch: 19   Global Step: 321540   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:16:15,476-Speed 3350.05 samples/sec   Loss 0.0981   LearningRate 0.0001   Epoch: 19   Global Step: 321550   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:16:18,536-Speed 3348.11 samples/sec   Loss 0.1061   LearningRate 0.0001   Epoch: 19   Global Step: 321560   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:16:21,614-Speed 3326.96 samples/sec   Loss 0.1073   LearningRate 0.0001   Epoch: 19   Global Step: 321570   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:16:24,673-Speed 3348.77 samples/sec   Loss 0.1010   LearningRate 0.0001   Epoch: 19   Global Step: 321580   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:16:27,788-Speed 3287.72 samples/sec   Loss 0.1116   LearningRate 0.0001   Epoch: 19   Global Step: 321590   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:16:30,851-Speed 3344.13 samples/sec   Loss 0.1126   LearningRate 0.0001   Epoch: 19   Global Step: 321600   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:16:33,923-Speed 3333.79 samples/sec   Loss 0.1112   LearningRate 0.0001   Epoch: 19   Global Step: 321610   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:16:36,999-Speed 3329.99 samples/sec   Loss 0.1057   LearningRate 0.0001   Epoch: 19   Global Step: 321620   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:16:40,071-Speed 3333.38 samples/sec   Loss 0.1004   LearningRate 0.0001   Epoch: 19   Global Step: 321630   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:16:43,182-Speed 3292.84 samples/sec   Loss 0.0989   LearningRate 0.0001   Epoch: 19   Global Step: 321640   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:16:46,246-Speed 3343.56 samples/sec   Loss 0.1013   LearningRate 0.0001   Epoch: 19   Global Step: 321650   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:16:49,317-Speed 3335.29 samples/sec   Loss 0.1080   LearningRate 0.0001   Epoch: 19   Global Step: 321660   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:16:52,376-Speed 3347.80 samples/sec   Loss 0.0942   LearningRate 0.0001   Epoch: 19   Global Step: 321670   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:16:55,438-Speed 3344.93 samples/sec   Loss 0.1071   LearningRate 0.0001   Epoch: 19   Global Step: 321680   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:16:58,512-Speed 3331.71 samples/sec   Loss 0.1067   LearningRate 0.0001   Epoch: 19   Global Step: 321690   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:17:01,579-Speed 3339.32 samples/sec   Loss 0.0993   LearningRate 0.0001   Epoch: 19   Global Step: 321700   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:17:04,672-Speed 3311.03 samples/sec   Loss 0.0908   LearningRate 0.0001   Epoch: 19   Global Step: 321710   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:17:07,723-Speed 3356.92 samples/sec   Loss 0.1068   LearningRate 0.0001   Epoch: 19   Global Step: 321720   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 09:17:10,793-Speed 3336.83 samples/sec   Loss 0.1017   LearningRate 0.0001   Epoch: 19   Global Step: 321730   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 09:17:13,859-Speed 3341.29 samples/sec   Loss 0.0992   LearningRate 0.0001   Epoch: 19   Global Step: 321740   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 09:17:16,930-Speed 3335.64 samples/sec   Loss 0.1028   LearningRate 0.0001   Epoch: 19   Global Step: 321750   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 09:17:20,007-Speed 3328.49 samples/sec   Loss 0.1108   LearningRate 0.0001   Epoch: 19   Global Step: 321760   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 09:17:23,180-Speed 3227.04 samples/sec   Loss 0.0971   LearningRate 0.0001   Epoch: 19   Global Step: 321770   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 09:17:26,317-Speed 3265.10 samples/sec   Loss 0.1074   LearningRate 0.0001   Epoch: 19   Global Step: 321780   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 09:17:29,438-Speed 3281.92 samples/sec   Loss 0.1046   LearningRate 0.0001   Epoch: 19   Global Step: 321790   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 09:17:32,548-Speed 3293.56 samples/sec   Loss 0.1043   LearningRate 0.0001   Epoch: 19   Global Step: 321800   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 09:17:35,625-Speed 3327.94 samples/sec   Loss 0.0956   LearningRate 0.0001   Epoch: 19   Global Step: 321810   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 09:17:39,485-Speed 2653.72 samples/sec   Loss 0.0987   LearningRate 0.0001   Epoch: 19   Global Step: 321820   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:17:42,547-Speed 3344.36 samples/sec   Loss 0.1018   LearningRate 0.0001   Epoch: 19   Global Step: 321830   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:17:45,633-Speed 3320.03 samples/sec   Loss 0.1031   LearningRate 0.0001   Epoch: 19   Global Step: 321840   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:17:48,698-Speed 3340.81 samples/sec   Loss 0.1073   LearningRate 0.0001   Epoch: 19   Global Step: 321850   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:17:51,765-Speed 3340.46 samples/sec   Loss 0.1046   LearningRate 0.0001   Epoch: 19   Global Step: 321860   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 09:17:54,833-Speed 3338.44 samples/sec   Loss 0.1004   LearningRate 0.0001   Epoch: 19   Global Step: 321870   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 09:17:57,962-Speed 3273.22 samples/sec   Loss 0.1028   LearningRate 0.0001   Epoch: 19   Global Step: 321880   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 09:18:01,046-Speed 3321.06 samples/sec   Loss 0.1122   LearningRate 0.0001   Epoch: 19   Global Step: 321890   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 09:18:04,138-Speed 3312.06 samples/sec   Loss 0.1077   LearningRate 0.0001   Epoch: 19   Global Step: 321900   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 09:18:07,225-Speed 3318.05 samples/sec   Loss 0.1061   LearningRate 0.0001   Epoch: 19   Global Step: 321910   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 09:18:10,364-Speed 3262.53 samples/sec   Loss 0.1079   LearningRate 0.0001   Epoch: 19   Global Step: 321920   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 09:18:13,456-Speed 3312.97 samples/sec   Loss 0.0926   LearningRate 0.0001   Epoch: 19   Global Step: 321930   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 09:18:16,555-Speed 3305.33 samples/sec   Loss 0.1005   LearningRate 0.0001   Epoch: 19   Global Step: 321940   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 09:18:19,619-Speed 3342.28 samples/sec   Loss 0.1039   LearningRate 0.0001   Epoch: 19   Global Step: 321950   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 09:18:22,807-Speed 3212.97 samples/sec   Loss 0.1001   LearningRate 0.0001   Epoch: 19   Global Step: 321960   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:18:25,871-Speed 3343.08 samples/sec   Loss 0.1040   LearningRate 0.0001   Epoch: 19   Global Step: 321970   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:18:28,952-Speed 3324.51 samples/sec   Loss 0.1014   LearningRate 0.0001   Epoch: 19   Global Step: 321980   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:18:32,029-Speed 3327.91 samples/sec   Loss 0.1068   LearningRate 0.0001   Epoch: 19   Global Step: 321990   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:18:35,116-Speed 3317.63 samples/sec   Loss 0.1001   LearningRate 0.0001   Epoch: 19   Global Step: 322000   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:19:18,554-[lfw][322000]XNorm: 20.678382
Training: 2022-04-12 09:19:18,554-[lfw][322000]Accuracy-Flip: 0.99783+-0.00259
Training: 2022-04-12 09:19:18,555-[lfw][322000]Accuracy-Highest: 0.99817
Training: 2022-04-12 09:20:09,300-[cfp_fp][322000]XNorm: 22.553855
Training: 2022-04-12 09:20:09,300-[cfp_fp][322000]Accuracy-Flip: 0.99186+-0.00356
Training: 2022-04-12 09:20:09,301-[cfp_fp][322000]Accuracy-Highest: 0.99200
Training: 2022-04-12 09:20:52,709-[agedb_30][322000]XNorm: 22.799332
Training: 2022-04-12 09:20:52,709-[agedb_30][322000]Accuracy-Flip: 0.98650+-0.00565
Training: 2022-04-12 09:20:52,710-[agedb_30][322000]Accuracy-Highest: 0.98650
Training: 2022-04-12 09:20:55,779-Speed 72.80 samples/sec   Loss 0.1039   LearningRate 0.0001   Epoch: 19   Global Step: 322010   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:20:58,941-Speed 3238.71 samples/sec   Loss 0.1092   LearningRate 0.0001   Epoch: 19   Global Step: 322020   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:21:02,000-Speed 3348.73 samples/sec   Loss 0.0996   LearningRate 0.0001   Epoch: 19   Global Step: 322030   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:21:05,050-Speed 3357.69 samples/sec   Loss 0.1031   LearningRate 0.0001   Epoch: 19   Global Step: 322040   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:21:08,101-Speed 3356.71 samples/sec   Loss 0.1061   LearningRate 0.0001   Epoch: 19   Global Step: 322050   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:21:11,155-Speed 3353.89 samples/sec   Loss 0.1099   LearningRate 0.0001   Epoch: 19   Global Step: 322060   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:21:14,237-Speed 3324.09 samples/sec   Loss 0.0988   LearningRate 0.0001   Epoch: 19   Global Step: 322070   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:21:17,296-Speed 3348.01 samples/sec   Loss 0.1144   LearningRate 0.0001   Epoch: 19   Global Step: 322080   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:21:20,389-Speed 3311.23 samples/sec   Loss 0.0995   LearningRate 0.0001   Epoch: 19   Global Step: 322090   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:21:23,443-Speed 3353.72 samples/sec   Loss 0.0934   LearningRate 0.0001   Epoch: 19   Global Step: 322100   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:21:26,499-Speed 3351.72 samples/sec   Loss 0.0947   LearningRate 0.0001   Epoch: 19   Global Step: 322110   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:21:29,555-Speed 3351.10 samples/sec   Loss 0.1039   LearningRate 0.0001   Epoch: 19   Global Step: 322120   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:21:32,646-Speed 3313.56 samples/sec   Loss 0.0976   LearningRate 0.0001   Epoch: 19   Global Step: 322130   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:21:35,764-Speed 3284.82 samples/sec   Loss 0.1025   LearningRate 0.0001   Epoch: 19   Global Step: 322140   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:21:38,949-Speed 3216.23 samples/sec   Loss 0.0954   LearningRate 0.0001   Epoch: 19   Global Step: 322150   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:21:42,224-Speed 3127.32 samples/sec   Loss 0.0984   LearningRate 0.0001   Epoch: 19   Global Step: 322160   Fp16 Grad Scale: 262144   Required: 1 hours
Training: 2022-04-12 09:21:45,352-Speed 3274.69 samples/sec   Loss 0.1057   LearningRate 0.0001   Epoch: 19   Global Step: 322170   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:21:48,420-Speed 3338.46 samples/sec   Loss 0.1031   LearningRate 0.0001   Epoch: 19   Global Step: 322180   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:21:51,473-Speed 3354.99 samples/sec   Loss 0.1013   LearningRate 0.0001   Epoch: 19   Global Step: 322190   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 09:21:54,604-Speed 3271.07 samples/sec   Loss 0.1016   LearningRate 0.0001   Epoch: 19   Global Step: 322200   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 09:21:57,715-Speed 3292.40 samples/sec   Loss 0.0974   LearningRate 0.0001   Epoch: 19   Global Step: 322210   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 09:22:00,802-Speed 3317.92 samples/sec   Loss 0.1006   LearningRate 0.0001   Epoch: 19   Global Step: 322220   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 09:22:03,900-Speed 3306.48 samples/sec   Loss 0.1160   LearningRate 0.0001   Epoch: 19   Global Step: 322230   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 09:22:06,985-Speed 3320.00 samples/sec   Loss 0.1097   LearningRate 0.0001   Epoch: 19   Global Step: 322240   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 09:22:10,149-Speed 3237.08 samples/sec   Loss 0.1094   LearningRate 0.0001   Epoch: 19   Global Step: 322250   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 09:22:13,213-Speed 3342.90 samples/sec   Loss 0.1116   LearningRate 0.0001   Epoch: 19   Global Step: 322260   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 09:22:16,280-Speed 3339.51 samples/sec   Loss 0.1062   LearningRate 0.0001   Epoch: 19   Global Step: 322270   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 09:22:19,385-Speed 3298.22 samples/sec   Loss 0.1101   LearningRate 0.0001   Epoch: 19   Global Step: 322280   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 09:22:22,527-Speed 3260.60 samples/sec   Loss 0.1034   LearningRate 0.0001   Epoch: 19   Global Step: 322290   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:22:25,588-Speed 3345.38 samples/sec   Loss 0.1132   LearningRate 0.0001   Epoch: 19   Global Step: 322300   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:22:28,701-Speed 3289.89 samples/sec   Loss 0.1030   LearningRate 0.0001   Epoch: 19   Global Step: 322310   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:22:31,771-Speed 3336.58 samples/sec   Loss 0.1097   LearningRate 0.0001   Epoch: 19   Global Step: 322320   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:22:34,854-Speed 3323.02 samples/sec   Loss 0.1005   LearningRate 0.0001   Epoch: 19   Global Step: 322330   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:22:37,946-Speed 3312.36 samples/sec   Loss 0.1018   LearningRate 0.0001   Epoch: 19   Global Step: 322340   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:22:41,014-Speed 3338.35 samples/sec   Loss 0.1073   LearningRate 0.0001   Epoch: 19   Global Step: 322350   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:22:44,102-Speed 3316.62 samples/sec   Loss 0.0949   LearningRate 0.0001   Epoch: 19   Global Step: 322360   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:22:47,170-Speed 3338.21 samples/sec   Loss 0.1037   LearningRate 0.0001   Epoch: 19   Global Step: 322370   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:22:50,251-Speed 3324.53 samples/sec   Loss 0.1092   LearningRate 0.0001   Epoch: 19   Global Step: 322380   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:22:53,316-Speed 3341.26 samples/sec   Loss 0.1000   LearningRate 0.0001   Epoch: 19   Global Step: 322390   Fp16 Grad Scale: 262144   Required: 1 hours
Training: 2022-04-12 09:22:56,370-Speed 3353.56 samples/sec   Loss 0.1032   LearningRate 0.0001   Epoch: 19   Global Step: 322400   Fp16 Grad Scale: 262144   Required: 1 hours
Training: 2022-04-12 09:22:59,436-Speed 3341.99 samples/sec   Loss 0.1042   LearningRate 0.0001   Epoch: 19   Global Step: 322410   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:23:02,499-Speed 3343.31 samples/sec   Loss 0.1047   LearningRate 0.0001   Epoch: 19   Global Step: 322420   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:23:05,571-Speed 3334.29 samples/sec   Loss 0.1063   LearningRate 0.0001   Epoch: 19   Global Step: 322430   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:23:08,615-Speed 3364.31 samples/sec   Loss 0.1187   LearningRate 0.0001   Epoch: 19   Global Step: 322440   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 09:23:11,674-Speed 3348.78 samples/sec   Loss 0.1033   LearningRate 0.0001   Epoch: 19   Global Step: 322450   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 09:23:14,735-Speed 3345.25 samples/sec   Loss 0.0997   LearningRate 0.0001   Epoch: 19   Global Step: 322460   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 09:23:17,812-Speed 3329.52 samples/sec   Loss 0.1158   LearningRate 0.0001   Epoch: 19   Global Step: 322470   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 09:23:20,875-Speed 3343.19 samples/sec   Loss 0.0944   LearningRate 0.0001   Epoch: 19   Global Step: 322480   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 09:23:23,960-Speed 3320.55 samples/sec   Loss 0.1116   LearningRate 0.0001   Epoch: 19   Global Step: 322490   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 09:23:27,024-Speed 3342.17 samples/sec   Loss 0.1003   LearningRate 0.0001   Epoch: 19   Global Step: 322500   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 09:23:30,100-Speed 3330.22 samples/sec   Loss 0.1051   LearningRate 0.0001   Epoch: 19   Global Step: 322510   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 09:23:33,162-Speed 3345.30 samples/sec   Loss 0.1015   LearningRate 0.0001   Epoch: 19   Global Step: 322520   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 09:23:36,226-Speed 3343.01 samples/sec   Loss 0.1011   LearningRate 0.0001   Epoch: 19   Global Step: 322530   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 09:23:39,291-Speed 3341.62 samples/sec   Loss 0.1068   LearningRate 0.0001   Epoch: 19   Global Step: 322540   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:23:42,358-Speed 3338.71 samples/sec   Loss 0.1126   LearningRate 0.0001   Epoch: 19   Global Step: 322550   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:23:45,505-Speed 3255.03 samples/sec   Loss 0.0980   LearningRate 0.0001   Epoch: 19   Global Step: 322560   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:23:48,643-Speed 3263.66 samples/sec   Loss 0.1054   LearningRate 0.0001   Epoch: 19   Global Step: 322570   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:23:51,803-Speed 3241.87 samples/sec   Loss 0.1050   LearningRate 0.0001   Epoch: 19   Global Step: 322580   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:23:55,014-Speed 3189.85 samples/sec   Loss 0.1124   LearningRate 0.0001   Epoch: 19   Global Step: 322590   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:23:58,094-Speed 3325.02 samples/sec   Loss 0.1083   LearningRate 0.0001   Epoch: 19   Global Step: 322600   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:24:01,321-Speed 3173.81 samples/sec   Loss 0.1045   LearningRate 0.0001   Epoch: 19   Global Step: 322610   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:24:04,388-Speed 3340.08 samples/sec   Loss 0.0989   LearningRate 0.0001   Epoch: 19   Global Step: 322620   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:24:07,450-Speed 3345.37 samples/sec   Loss 0.1121   LearningRate 0.0001   Epoch: 19   Global Step: 322630   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:24:10,555-Speed 3297.61 samples/sec   Loss 0.1131   LearningRate 0.0001   Epoch: 19   Global Step: 322640   Fp16 Grad Scale: 262144   Required: 1 hours
Training: 2022-04-12 09:24:13,623-Speed 3338.65 samples/sec   Loss 0.1103   LearningRate 0.0001   Epoch: 19   Global Step: 322650   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:24:16,686-Speed 3344.10 samples/sec   Loss 0.0990   LearningRate 0.0001   Epoch: 19   Global Step: 322660   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:24:19,759-Speed 3332.60 samples/sec   Loss 0.1131   LearningRate 0.0001   Epoch: 19   Global Step: 322670   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:24:22,831-Speed 3334.75 samples/sec   Loss 0.1086   LearningRate 0.0001   Epoch: 19   Global Step: 322680   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:24:25,945-Speed 3289.59 samples/sec   Loss 0.1018   LearningRate 0.0001   Epoch: 19   Global Step: 322690   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:24:29,127-Speed 3218.41 samples/sec   Loss 0.1108   LearningRate 0.0001   Epoch: 19   Global Step: 322700   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:24:32,193-Speed 3340.51 samples/sec   Loss 0.1089   LearningRate 0.0001   Epoch: 19   Global Step: 322710   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:24:35,254-Speed 3346.29 samples/sec   Loss 0.1051   LearningRate 0.0001   Epoch: 19   Global Step: 322720   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:24:38,318-Speed 3342.72 samples/sec   Loss 0.1023   LearningRate 0.0001   Epoch: 19   Global Step: 322730   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:24:41,387-Speed 3337.29 samples/sec   Loss 0.1049   LearningRate 0.0001   Epoch: 19   Global Step: 322740   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:24:44,463-Speed 3329.81 samples/sec   Loss 0.1126   LearningRate 0.0001   Epoch: 19   Global Step: 322750   Fp16 Grad Scale: 262144   Required: 1 hours
Training: 2022-04-12 09:24:47,535-Speed 3333.17 samples/sec   Loss 0.1120   LearningRate 0.0001   Epoch: 19   Global Step: 322760   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:24:50,613-Speed 3328.41 samples/sec   Loss 0.1211   LearningRate 0.0001   Epoch: 19   Global Step: 322770   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:24:53,739-Speed 3276.98 samples/sec   Loss 0.1033   LearningRate 0.0001   Epoch: 19   Global Step: 322780   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:24:56,814-Speed 3330.91 samples/sec   Loss 0.1050   LearningRate 0.0001   Epoch: 19   Global Step: 322790   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:24:59,878-Speed 3342.26 samples/sec   Loss 0.0991   LearningRate 0.0001   Epoch: 19   Global Step: 322800   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:25:02,946-Speed 3338.47 samples/sec   Loss 0.0927   LearningRate 0.0001   Epoch: 19   Global Step: 322810   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:25:06,028-Speed 3323.12 samples/sec   Loss 0.1004   LearningRate 0.0001   Epoch: 19   Global Step: 322820   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:25:09,110-Speed 3323.60 samples/sec   Loss 0.1003   LearningRate 0.0001   Epoch: 19   Global Step: 322830   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:25:12,196-Speed 3318.75 samples/sec   Loss 0.1100   LearningRate 0.0001   Epoch: 19   Global Step: 322840   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:25:15,271-Speed 3331.36 samples/sec   Loss 0.1028   LearningRate 0.0001   Epoch: 19   Global Step: 322850   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:25:18,362-Speed 3313.34 samples/sec   Loss 0.0975   LearningRate 0.0001   Epoch: 19   Global Step: 322860   Fp16 Grad Scale: 262144   Required: 1 hours
Training: 2022-04-12 09:25:21,434-Speed 3334.30 samples/sec   Loss 0.1104   LearningRate 0.0001   Epoch: 19   Global Step: 322870   Fp16 Grad Scale: 262144   Required: 1 hours
Training: 2022-04-12 09:25:24,529-Speed 3309.95 samples/sec   Loss 0.1123   LearningRate 0.0001   Epoch: 19   Global Step: 322880   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:25:27,605-Speed 3328.54 samples/sec   Loss 0.0994   LearningRate 0.0001   Epoch: 19   Global Step: 322890   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:25:30,684-Speed 3326.91 samples/sec   Loss 0.0984   LearningRate 0.0001   Epoch: 19   Global Step: 322900   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:25:33,770-Speed 3318.71 samples/sec   Loss 0.0966   LearningRate 0.0001   Epoch: 19   Global Step: 322910   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:25:36,868-Speed 3306.58 samples/sec   Loss 0.0985   LearningRate 0.0001   Epoch: 19   Global Step: 322920   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:25:40,013-Speed 3256.77 samples/sec   Loss 0.0923   LearningRate 0.0001   Epoch: 19   Global Step: 322930   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:25:43,095-Speed 3322.97 samples/sec   Loss 0.1080   LearningRate 0.0001   Epoch: 19   Global Step: 322940   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:25:46,184-Speed 3316.38 samples/sec   Loss 0.1094   LearningRate 0.0001   Epoch: 19   Global Step: 322950   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:25:49,330-Speed 3255.96 samples/sec   Loss 0.1039   LearningRate 0.0001   Epoch: 19   Global Step: 322960   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:25:52,479-Speed 3252.59 samples/sec   Loss 0.1010   LearningRate 0.0001   Epoch: 19   Global Step: 322970   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:25:55,592-Speed 3289.68 samples/sec   Loss 0.1067   LearningRate 0.0001   Epoch: 19   Global Step: 322980   Fp16 Grad Scale: 262144   Required: 1 hours
Training: 2022-04-12 09:25:58,654-Speed 3345.24 samples/sec   Loss 0.0989   LearningRate 0.0001   Epoch: 19   Global Step: 322990   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:26:01,719-Speed 3341.41 samples/sec   Loss 0.1148   LearningRate 0.0001   Epoch: 19   Global Step: 323000   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:26:04,794-Speed 3330.79 samples/sec   Loss 0.1044   LearningRate 0.0001   Epoch: 19   Global Step: 323010   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:26:07,890-Speed 3308.40 samples/sec   Loss 0.0936   LearningRate 0.0001   Epoch: 19   Global Step: 323020   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:26:10,969-Speed 3326.38 samples/sec   Loss 0.1075   LearningRate 0.0001   Epoch: 19   Global Step: 323030   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:26:14,087-Speed 3284.98 samples/sec   Loss 0.0995   LearningRate 0.0001   Epoch: 19   Global Step: 323040   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:26:17,161-Speed 3331.93 samples/sec   Loss 0.1025   LearningRate 0.0001   Epoch: 19   Global Step: 323050   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:26:20,226-Speed 3342.17 samples/sec   Loss 0.1016   LearningRate 0.0001   Epoch: 19   Global Step: 323060   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:26:23,298-Speed 3334.36 samples/sec   Loss 0.0973   LearningRate 0.0001   Epoch: 19   Global Step: 323070   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:26:26,366-Speed 3338.51 samples/sec   Loss 0.1028   LearningRate 0.0001   Epoch: 19   Global Step: 323080   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:26:29,503-Speed 3264.42 samples/sec   Loss 0.1120   LearningRate 0.0001   Epoch: 19   Global Step: 323090   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:26:32,633-Speed 3272.07 samples/sec   Loss 0.1048   LearningRate 0.0001   Epoch: 19   Global Step: 323100   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:26:35,726-Speed 3311.98 samples/sec   Loss 0.1140   LearningRate 0.0001   Epoch: 19   Global Step: 323110   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:26:38,815-Speed 3315.08 samples/sec   Loss 0.1089   LearningRate 0.0001   Epoch: 19   Global Step: 323120   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:26:41,880-Speed 3342.04 samples/sec   Loss 0.1106   LearningRate 0.0001   Epoch: 19   Global Step: 323130   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:26:44,979-Speed 3305.63 samples/sec   Loss 0.1022   LearningRate 0.0001   Epoch: 19   Global Step: 323140   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:26:48,051-Speed 3333.42 samples/sec   Loss 0.1127   LearningRate 0.0001   Epoch: 19   Global Step: 323150   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:26:51,114-Speed 3344.26 samples/sec   Loss 0.1082   LearningRate 0.0001   Epoch: 19   Global Step: 323160   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:26:54,189-Speed 3330.38 samples/sec   Loss 0.0971   LearningRate 0.0001   Epoch: 19   Global Step: 323170   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:26:57,254-Speed 3341.95 samples/sec   Loss 0.1105   LearningRate 0.0001   Epoch: 19   Global Step: 323180   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:27:00,316-Speed 3344.68 samples/sec   Loss 0.1106   LearningRate 0.0001   Epoch: 19   Global Step: 323190   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:27:03,465-Speed 3252.86 samples/sec   Loss 0.0963   LearningRate 0.0001   Epoch: 19   Global Step: 323200   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:27:06,665-Speed 3200.58 samples/sec   Loss 0.1080   LearningRate 0.0001   Epoch: 19   Global Step: 323210   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:27:09,771-Speed 3297.53 samples/sec   Loss 0.1037   LearningRate 0.0001   Epoch: 19   Global Step: 323220   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:27:12,845-Speed 3332.37 samples/sec   Loss 0.1027   LearningRate 0.0001   Epoch: 19   Global Step: 323230   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:27:15,910-Speed 3341.61 samples/sec   Loss 0.1113   LearningRate 0.0001   Epoch: 19   Global Step: 323240   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:27:19,027-Speed 3286.33 samples/sec   Loss 0.1105   LearningRate 0.0001   Epoch: 19   Global Step: 323250   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:27:22,095-Speed 3338.39 samples/sec   Loss 0.1055   LearningRate 0.0001   Epoch: 19   Global Step: 323260   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:27:25,163-Speed 3338.54 samples/sec   Loss 0.0985   LearningRate 0.0001   Epoch: 19   Global Step: 323270   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:27:28,234-Speed 3334.30 samples/sec   Loss 0.0979   LearningRate 0.0001   Epoch: 19   Global Step: 323280   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:27:31,299-Speed 3342.33 samples/sec   Loss 0.1044   LearningRate 0.0001   Epoch: 19   Global Step: 323290   Fp16 Grad Scale: 262144   Required: 1 hours
Training: 2022-04-12 09:27:34,383-Speed 3320.36 samples/sec   Loss 0.1136   LearningRate 0.0001   Epoch: 19   Global Step: 323300   Fp16 Grad Scale: 262144   Required: 1 hours
Training: 2022-04-12 09:27:37,448-Speed 3342.87 samples/sec   Loss 0.1024   LearningRate 0.0001   Epoch: 19   Global Step: 323310   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:27:40,525-Speed 3328.50 samples/sec   Loss 0.1094   LearningRate 0.0001   Epoch: 19   Global Step: 323320   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:27:43,588-Speed 3343.84 samples/sec   Loss 0.1027   LearningRate 0.0001   Epoch: 19   Global Step: 323330   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:27:46,664-Speed 3328.90 samples/sec   Loss 0.0998   LearningRate 0.0001   Epoch: 19   Global Step: 323340   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:27:49,738-Speed 3331.93 samples/sec   Loss 0.1068   LearningRate 0.0001   Epoch: 19   Global Step: 323350   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:27:52,813-Speed 3331.55 samples/sec   Loss 0.1060   LearningRate 0.0001   Epoch: 19   Global Step: 323360   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:27:55,879-Speed 3340.66 samples/sec   Loss 0.1099   LearningRate 0.0001   Epoch: 19   Global Step: 323370   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:27:58,977-Speed 3305.42 samples/sec   Loss 0.1036   LearningRate 0.0001   Epoch: 19   Global Step: 323380   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:28:02,045-Speed 3338.85 samples/sec   Loss 0.0999   LearningRate 0.0001   Epoch: 19   Global Step: 323390   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:28:05,121-Speed 3329.59 samples/sec   Loss 0.0960   LearningRate 0.0001   Epoch: 19   Global Step: 323400   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:28:08,189-Speed 3338.30 samples/sec   Loss 0.0973   LearningRate 0.0001   Epoch: 19   Global Step: 323410   Fp16 Grad Scale: 262144   Required: 1 hours
Training: 2022-04-12 09:28:11,242-Speed 3355.82 samples/sec   Loss 0.1005   LearningRate 0.0001   Epoch: 19   Global Step: 323420   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:28:14,308-Speed 3340.54 samples/sec   Loss 0.1030   LearningRate 0.0001   Epoch: 19   Global Step: 323430   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:28:17,382-Speed 3331.70 samples/sec   Loss 0.0985   LearningRate 0.0001   Epoch: 19   Global Step: 323440   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:28:20,448-Speed 3340.01 samples/sec   Loss 0.0993   LearningRate 0.0001   Epoch: 19   Global Step: 323450   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:28:23,522-Speed 3331.73 samples/sec   Loss 0.1078   LearningRate 0.0001   Epoch: 19   Global Step: 323460   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:28:26,587-Speed 3342.72 samples/sec   Loss 0.1063   LearningRate 0.0001   Epoch: 19   Global Step: 323470   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:28:29,652-Speed 3341.67 samples/sec   Loss 0.1040   LearningRate 0.0001   Epoch: 19   Global Step: 323480   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:28:32,706-Speed 3353.83 samples/sec   Loss 0.1106   LearningRate 0.0001   Epoch: 19   Global Step: 323490   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 09:28:35,771-Speed 3342.38 samples/sec   Loss 0.1016   LearningRate 0.0001   Epoch: 19   Global Step: 323500   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 09:28:38,869-Speed 3306.16 samples/sec   Loss 0.1006   LearningRate 0.0001   Epoch: 19   Global Step: 323510   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 09:28:41,969-Speed 3303.80 samples/sec   Loss 0.0988   LearningRate 0.0001   Epoch: 19   Global Step: 323520   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 09:28:45,128-Speed 3241.86 samples/sec   Loss 0.1006   LearningRate 0.0001   Epoch: 19   Global Step: 323530   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 09:28:48,198-Speed 3336.07 samples/sec   Loss 0.0990   LearningRate 0.0001   Epoch: 19   Global Step: 323540   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 09:28:51,289-Speed 3313.43 samples/sec   Loss 0.1093   LearningRate 0.0001   Epoch: 19   Global Step: 323550   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 09:28:54,352-Speed 3344.90 samples/sec   Loss 0.0995   LearningRate 0.0001   Epoch: 19   Global Step: 323560   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 09:28:57,424-Speed 3333.80 samples/sec   Loss 0.1069   LearningRate 0.0001   Epoch: 19   Global Step: 323570   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 09:29:00,489-Speed 3341.26 samples/sec   Loss 0.1070   LearningRate 0.0001   Epoch: 19   Global Step: 323580   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 09:29:03,557-Speed 3338.78 samples/sec   Loss 0.0932   LearningRate 0.0001   Epoch: 19   Global Step: 323590   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:29:06,643-Speed 3319.14 samples/sec   Loss 0.0950   LearningRate 0.0001   Epoch: 19   Global Step: 323600   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:29:09,710-Speed 3339.07 samples/sec   Loss 0.1059   LearningRate 0.0001   Epoch: 19   Global Step: 323610   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:29:12,783-Speed 3333.04 samples/sec   Loss 0.0997   LearningRate 0.0001   Epoch: 19   Global Step: 323620   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 09:29:15,845-Speed 3344.82 samples/sec   Loss 0.1112   LearningRate 0.0001   Epoch: 19   Global Step: 323630   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 09:29:18,934-Speed 3316.11 samples/sec   Loss 0.0999   LearningRate 0.0001   Epoch: 19   Global Step: 323640   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 09:29:21,998-Speed 3343.20 samples/sec   Loss 0.1022   LearningRate 0.0001   Epoch: 19   Global Step: 323650   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 09:29:25,067-Speed 3337.77 samples/sec   Loss 0.0957   LearningRate 0.0001   Epoch: 19   Global Step: 323660   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 09:29:28,130-Speed 3343.19 samples/sec   Loss 0.1024   LearningRate 0.0001   Epoch: 19   Global Step: 323670   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 09:29:31,195-Speed 3342.00 samples/sec   Loss 0.1071   LearningRate 0.0001   Epoch: 19   Global Step: 323680   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 09:29:34,284-Speed 3315.06 samples/sec   Loss 0.1012   LearningRate 0.0001   Epoch: 19   Global Step: 323690   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 09:29:37,347-Speed 3344.33 samples/sec   Loss 0.1047   LearningRate 0.0001   Epoch: 19   Global Step: 323700   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 09:29:40,427-Speed 3325.23 samples/sec   Loss 0.0951   LearningRate 0.0001   Epoch: 19   Global Step: 323710   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 09:29:43,503-Speed 3330.38 samples/sec   Loss 0.1150   LearningRate 0.0001   Epoch: 19   Global Step: 323720   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:29:46,567-Speed 3343.24 samples/sec   Loss 0.1066   LearningRate 0.0001   Epoch: 19   Global Step: 323730   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:29:49,631-Speed 3342.24 samples/sec   Loss 0.0984   LearningRate 0.0001   Epoch: 19   Global Step: 323740   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:29:52,698-Speed 3340.59 samples/sec   Loss 0.0954   LearningRate 0.0001   Epoch: 19   Global Step: 323750   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:29:55,782-Speed 3320.58 samples/sec   Loss 0.0989   LearningRate 0.0001   Epoch: 19   Global Step: 323760   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:29:58,846-Speed 3342.04 samples/sec   Loss 0.0954   LearningRate 0.0001   Epoch: 19   Global Step: 323770   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:30:01,941-Speed 3310.28 samples/sec   Loss 0.1048   LearningRate 0.0001   Epoch: 19   Global Step: 323780   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:30:05,003-Speed 3344.36 samples/sec   Loss 0.1094   LearningRate 0.0001   Epoch: 19   Global Step: 323790   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:30:08,201-Speed 3203.08 samples/sec   Loss 0.1104   LearningRate 0.0001   Epoch: 19   Global Step: 323800   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:30:11,322-Speed 3281.35 samples/sec   Loss 0.0938   LearningRate 0.0001   Epoch: 19   Global Step: 323810   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:30:14,379-Speed 3350.80 samples/sec   Loss 0.1098   LearningRate 0.0001   Epoch: 19   Global Step: 323820   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:30:17,449-Speed 3336.52 samples/sec   Loss 0.1097   LearningRate 0.0001   Epoch: 19   Global Step: 323830   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:30:20,533-Speed 3320.84 samples/sec   Loss 0.0979   LearningRate 0.0001   Epoch: 19   Global Step: 323840   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:30:23,609-Speed 3329.59 samples/sec   Loss 0.1013   LearningRate 0.0001   Epoch: 19   Global Step: 323850   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:30:26,675-Speed 3341.23 samples/sec   Loss 0.0997   LearningRate 0.0001   Epoch: 19   Global Step: 323860   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:30:29,760-Speed 3319.87 samples/sec   Loss 0.1064   LearningRate 0.0001   Epoch: 19   Global Step: 323870   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:30:32,833-Speed 3333.10 samples/sec   Loss 0.0967   LearningRate 0.0001   Epoch: 19   Global Step: 323880   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:30:35,907-Speed 3332.04 samples/sec   Loss 0.1052   LearningRate 0.0001   Epoch: 19   Global Step: 323890   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:30:38,999-Speed 3312.84 samples/sec   Loss 0.0978   LearningRate 0.0001   Epoch: 19   Global Step: 323900   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:30:42,096-Speed 3306.49 samples/sec   Loss 0.1129   LearningRate 0.0001   Epoch: 19   Global Step: 323910   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:30:45,167-Speed 3336.01 samples/sec   Loss 0.1005   LearningRate 0.0001   Epoch: 19   Global Step: 323920   Fp16 Grad Scale: 262144   Required: 1 hours
Training: 2022-04-12 09:30:48,253-Speed 3318.06 samples/sec   Loss 0.1048   LearningRate 0.0001   Epoch: 19   Global Step: 323930   Fp16 Grad Scale: 262144   Required: 1 hours
Training: 2022-04-12 09:30:51,323-Speed 3336.92 samples/sec   Loss 0.1089   LearningRate 0.0001   Epoch: 19   Global Step: 323940   Fp16 Grad Scale: 262144   Required: 1 hours
Training: 2022-04-12 09:30:54,401-Speed 3327.73 samples/sec   Loss 0.0940   LearningRate 0.0001   Epoch: 19   Global Step: 323950   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:30:57,470-Speed 3337.48 samples/sec   Loss 0.1000   LearningRate 0.0001   Epoch: 19   Global Step: 323960   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:31:00,558-Speed 3315.89 samples/sec   Loss 0.0971   LearningRate 0.0001   Epoch: 19   Global Step: 323970   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:31:03,634-Speed 3330.19 samples/sec   Loss 0.0933   LearningRate 0.0001   Epoch: 19   Global Step: 323980   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:31:06,714-Speed 3325.96 samples/sec   Loss 0.1060   LearningRate 0.0001   Epoch: 19   Global Step: 323990   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:31:09,779-Speed 3341.66 samples/sec   Loss 0.1064   LearningRate 0.0001   Epoch: 19   Global Step: 324000   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:31:53,369-[lfw][324000]XNorm: 20.725748
Training: 2022-04-12 09:31:53,369-[lfw][324000]Accuracy-Flip: 0.99783+-0.00259
Training: 2022-04-12 09:31:53,369-[lfw][324000]Accuracy-Highest: 0.99817
Training: 2022-04-12 09:32:43,980-[cfp_fp][324000]XNorm: 22.611323
Training: 2022-04-12 09:32:43,981-[cfp_fp][324000]Accuracy-Flip: 0.99143+-0.00361
Training: 2022-04-12 09:32:43,981-[cfp_fp][324000]Accuracy-Highest: 0.99200
Training: 2022-04-12 09:33:27,499-[agedb_30][324000]XNorm: 22.784648
Training: 2022-04-12 09:33:27,500-[agedb_30][324000]Accuracy-Flip: 0.98600+-0.00544
Training: 2022-04-12 09:33:27,500-[agedb_30][324000]Accuracy-Highest: 0.98650
Training: 2022-04-12 09:33:30,560-Speed 72.74 samples/sec   Loss 0.1049   LearningRate 0.0001   Epoch: 19   Global Step: 324010   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:33:33,618-Speed 3349.61 samples/sec   Loss 0.1036   LearningRate 0.0001   Epoch: 19   Global Step: 324020   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:33:36,669-Speed 3357.12 samples/sec   Loss 0.0989   LearningRate 0.0001   Epoch: 19   Global Step: 324030   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:33:39,731-Speed 3344.28 samples/sec   Loss 0.1026   LearningRate 0.0001   Epoch: 19   Global Step: 324040   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:33:42,785-Speed 3354.17 samples/sec   Loss 0.1016   LearningRate 0.0001   Epoch: 19   Global Step: 324050   Fp16 Grad Scale: 262144   Required: 1 hours
Training: 2022-04-12 09:33:45,840-Speed 3352.24 samples/sec   Loss 0.1077   LearningRate 0.0001   Epoch: 19   Global Step: 324060   Fp16 Grad Scale: 262144   Required: 1 hours
Training: 2022-04-12 09:33:48,881-Speed 3368.02 samples/sec   Loss 0.0990   LearningRate 0.0001   Epoch: 19   Global Step: 324070   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:33:51,944-Speed 3344.42 samples/sec   Loss 0.1067   LearningRate 0.0001   Epoch: 19   Global Step: 324080   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:33:54,999-Speed 3351.94 samples/sec   Loss 0.1109   LearningRate 0.0001   Epoch: 19   Global Step: 324090   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:33:58,059-Speed 3347.09 samples/sec   Loss 0.1061   LearningRate 0.0001   Epoch: 19   Global Step: 324100   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:34:01,132-Speed 3333.60 samples/sec   Loss 0.1053   LearningRate 0.0001   Epoch: 19   Global Step: 324110   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:34:04,191-Speed 3348.20 samples/sec   Loss 0.1101   LearningRate 0.0001   Epoch: 19   Global Step: 324120   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:34:07,254-Speed 3343.51 samples/sec   Loss 0.1068   LearningRate 0.0001   Epoch: 19   Global Step: 324130   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:34:10,325-Speed 3335.80 samples/sec   Loss 0.1025   LearningRate 0.0001   Epoch: 19   Global Step: 324140   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:34:13,388-Speed 3343.56 samples/sec   Loss 0.0981   LearningRate 0.0001   Epoch: 19   Global Step: 324150   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:34:16,450-Speed 3344.93 samples/sec   Loss 0.1107   LearningRate 0.0001   Epoch: 19   Global Step: 324160   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:34:19,561-Speed 3292.41 samples/sec   Loss 0.1026   LearningRate 0.0001   Epoch: 19   Global Step: 324170   Fp16 Grad Scale: 262144   Required: 1 hours
Training: 2022-04-12 09:34:22,625-Speed 3342.94 samples/sec   Loss 0.1027   LearningRate 0.0001   Epoch: 19   Global Step: 324180   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:34:25,696-Speed 3335.32 samples/sec   Loss 0.1024   LearningRate 0.0001   Epoch: 19   Global Step: 324190   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:34:28,775-Speed 3326.20 samples/sec   Loss 0.1119   LearningRate 0.0001   Epoch: 19   Global Step: 324200   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:34:31,903-Speed 3273.65 samples/sec   Loss 0.1088   LearningRate 0.0001   Epoch: 19   Global Step: 324210   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:34:34,983-Speed 3325.55 samples/sec   Loss 0.1096   LearningRate 0.0001   Epoch: 19   Global Step: 324220   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:34:38,070-Speed 3317.94 samples/sec   Loss 0.1106   LearningRate 0.0001   Epoch: 19   Global Step: 324230   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:34:41,174-Speed 3299.61 samples/sec   Loss 0.0948   LearningRate 0.0001   Epoch: 19   Global Step: 324240   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:34:44,330-Speed 3246.09 samples/sec   Loss 0.0974   LearningRate 0.0001   Epoch: 19   Global Step: 324250   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:34:47,472-Speed 3259.42 samples/sec   Loss 0.1027   LearningRate 0.0001   Epoch: 19   Global Step: 324260   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:34:50,623-Speed 3250.40 samples/sec   Loss 0.1006   LearningRate 0.0001   Epoch: 19   Global Step: 324270   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:34:53,696-Speed 3333.52 samples/sec   Loss 0.0924   LearningRate 0.0001   Epoch: 19   Global Step: 324280   Fp16 Grad Scale: 262144   Required: 1 hours
Training: 2022-04-12 09:34:56,768-Speed 3333.28 samples/sec   Loss 0.1000   LearningRate 0.0001   Epoch: 19   Global Step: 324290   Fp16 Grad Scale: 262144   Required: 1 hours
Training: 2022-04-12 09:34:59,858-Speed 3315.22 samples/sec   Loss 0.1021   LearningRate 0.0001   Epoch: 19   Global Step: 324300   Fp16 Grad Scale: 262144   Required: 1 hours
Training: 2022-04-12 09:35:03,045-Speed 3213.65 samples/sec   Loss 0.1002   LearningRate 0.0001   Epoch: 19   Global Step: 324310   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:35:06,114-Speed 3337.34 samples/sec   Loss 0.1026   LearningRate 0.0001   Epoch: 19   Global Step: 324320   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:35:09,217-Speed 3300.10 samples/sec   Loss 0.1024   LearningRate 0.0001   Epoch: 19   Global Step: 324330   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:35:12,291-Speed 3332.04 samples/sec   Loss 0.1119   LearningRate 0.0001   Epoch: 19   Global Step: 324340   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:35:15,378-Speed 3318.45 samples/sec   Loss 0.1088   LearningRate 0.0001   Epoch: 19   Global Step: 324350   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:35:18,464-Speed 3318.81 samples/sec   Loss 0.1000   LearningRate 0.0001   Epoch: 19   Global Step: 324360   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:35:21,675-Speed 3190.24 samples/sec   Loss 0.1040   LearningRate 0.0001   Epoch: 19   Global Step: 324370   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:35:24,809-Speed 3267.84 samples/sec   Loss 0.1123   LearningRate 0.0001   Epoch: 19   Global Step: 324380   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:35:27,872-Speed 3344.29 samples/sec   Loss 0.1030   LearningRate 0.0001   Epoch: 19   Global Step: 324390   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:35:30,935-Speed 3343.06 samples/sec   Loss 0.0984   LearningRate 0.0001   Epoch: 19   Global Step: 324400   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:35:34,024-Speed 3316.64 samples/sec   Loss 0.0970   LearningRate 0.0001   Epoch: 19   Global Step: 324410   Fp16 Grad Scale: 262144   Required: 1 hours
Training: 2022-04-12 09:35:37,075-Speed 3356.52 samples/sec   Loss 0.1097   LearningRate 0.0001   Epoch: 19   Global Step: 324420   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:35:40,137-Speed 3345.11 samples/sec   Loss 0.1156   LearningRate 0.0001   Epoch: 19   Global Step: 324430   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:35:43,212-Speed 3330.10 samples/sec   Loss 0.1029   LearningRate 0.0001   Epoch: 19   Global Step: 324440   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:35:46,284-Speed 3335.18 samples/sec   Loss 0.1040   LearningRate 0.0001   Epoch: 19   Global Step: 324450   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:35:49,377-Speed 3311.36 samples/sec   Loss 0.1065   LearningRate 0.0001   Epoch: 19   Global Step: 324460   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:35:52,503-Speed 3276.42 samples/sec   Loss 0.0993   LearningRate 0.0001   Epoch: 19   Global Step: 324470   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:35:55,573-Speed 3336.49 samples/sec   Loss 0.1040   LearningRate 0.0001   Epoch: 19   Global Step: 324480   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:35:58,641-Speed 3337.93 samples/sec   Loss 0.1091   LearningRate 0.0001   Epoch: 19   Global Step: 324490   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:36:01,724-Speed 3321.79 samples/sec   Loss 0.1121   LearningRate 0.0001   Epoch: 19   Global Step: 324500   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:36:04,804-Speed 3325.96 samples/sec   Loss 0.0988   LearningRate 0.0001   Epoch: 19   Global Step: 324510   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:36:07,875-Speed 3335.22 samples/sec   Loss 0.1082   LearningRate 0.0001   Epoch: 19   Global Step: 324520   Fp16 Grad Scale: 262144   Required: 1 hours
Training: 2022-04-12 09:36:10,966-Speed 3313.69 samples/sec   Loss 0.1000   LearningRate 0.0001   Epoch: 19   Global Step: 324530   Fp16 Grad Scale: 262144   Required: 1 hours
Training: 2022-04-12 09:36:14,042-Speed 3329.31 samples/sec   Loss 0.1079   LearningRate 0.0001   Epoch: 19   Global Step: 324540   Fp16 Grad Scale: 262144   Required: 1 hours
Training: 2022-04-12 09:36:17,109-Speed 3340.28 samples/sec   Loss 0.0987   LearningRate 0.0001   Epoch: 19   Global Step: 324550   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:36:20,236-Speed 3274.97 samples/sec   Loss 0.0934   LearningRate 0.0001   Epoch: 19   Global Step: 324560   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:36:23,351-Speed 3288.01 samples/sec   Loss 0.1061   LearningRate 0.0001   Epoch: 19   Global Step: 324570   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:36:26,459-Speed 3295.25 samples/sec   Loss 0.1013   LearningRate 0.0001   Epoch: 19   Global Step: 324580   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:36:29,537-Speed 3328.13 samples/sec   Loss 0.1131   LearningRate 0.0001   Epoch: 19   Global Step: 324590   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:36:32,714-Speed 3223.94 samples/sec   Loss 0.1014   LearningRate 0.0001   Epoch: 19   Global Step: 324600   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:36:35,847-Speed 3268.58 samples/sec   Loss 0.1117   LearningRate 0.0001   Epoch: 19   Global Step: 324610   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:36:38,932-Speed 3320.55 samples/sec   Loss 0.1066   LearningRate 0.0001   Epoch: 19   Global Step: 324620   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:36:41,986-Speed 3354.01 samples/sec   Loss 0.1083   LearningRate 0.0001   Epoch: 19   Global Step: 324630   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 09:36:45,107-Speed 3281.87 samples/sec   Loss 0.1032   LearningRate 0.0001   Epoch: 19   Global Step: 324640   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 09:36:48,174-Speed 3338.40 samples/sec   Loss 0.0976   LearningRate 0.0001   Epoch: 19   Global Step: 324650   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 09:36:51,239-Speed 3342.20 samples/sec   Loss 0.1022   LearningRate 0.0001   Epoch: 19   Global Step: 324660   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 09:36:54,308-Speed 3337.14 samples/sec   Loss 0.1056   LearningRate 0.0001   Epoch: 19   Global Step: 324670   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 09:36:57,428-Speed 3282.85 samples/sec   Loss 0.0976   LearningRate 0.0001   Epoch: 19   Global Step: 324680   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 09:37:00,492-Speed 3342.60 samples/sec   Loss 0.1088   LearningRate 0.0001   Epoch: 19   Global Step: 324690   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 09:37:03,594-Speed 3301.62 samples/sec   Loss 0.1036   LearningRate 0.0001   Epoch: 19   Global Step: 324700   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 09:37:06,728-Speed 3268.75 samples/sec   Loss 0.0958   LearningRate 0.0001   Epoch: 19   Global Step: 324710   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 09:37:09,807-Speed 3326.33 samples/sec   Loss 0.1094   LearningRate 0.0001   Epoch: 19   Global Step: 324720   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 09:37:12,974-Speed 3234.45 samples/sec   Loss 0.0956   LearningRate 0.0001   Epoch: 19   Global Step: 324730   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:37:16,067-Speed 3311.73 samples/sec   Loss 0.1064   LearningRate 0.0001   Epoch: 19   Global Step: 324740   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:37:19,161-Speed 3310.28 samples/sec   Loss 0.0968   LearningRate 0.0001   Epoch: 19   Global Step: 324750   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:37:22,234-Speed 3332.74 samples/sec   Loss 0.1053   LearningRate 0.0001   Epoch: 19   Global Step: 324760   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:37:25,308-Speed 3331.73 samples/sec   Loss 0.1099   LearningRate 0.0001   Epoch: 19   Global Step: 324770   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:37:28,380-Speed 3334.23 samples/sec   Loss 0.0936   LearningRate 0.0001   Epoch: 19   Global Step: 324780   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:37:31,510-Speed 3272.24 samples/sec   Loss 0.0987   LearningRate 0.0001   Epoch: 19   Global Step: 324790   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:37:34,618-Speed 3295.15 samples/sec   Loss 0.1081   LearningRate 0.0001   Epoch: 19   Global Step: 324800   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:37:37,701-Speed 3323.34 samples/sec   Loss 0.1073   LearningRate 0.0001   Epoch: 19   Global Step: 324810   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:37:40,814-Speed 3289.51 samples/sec   Loss 0.1030   LearningRate 0.0001   Epoch: 19   Global Step: 324820   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:37:43,942-Speed 3274.96 samples/sec   Loss 0.1043   LearningRate 0.0001   Epoch: 19   Global Step: 324830   Fp16 Grad Scale: 262144   Required: 1 hours
Training: 2022-04-12 09:37:47,030-Speed 3316.93 samples/sec   Loss 0.1046   LearningRate 0.0001   Epoch: 19   Global Step: 324840   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:37:50,196-Speed 3234.51 samples/sec   Loss 0.1050   LearningRate 0.0001   Epoch: 19   Global Step: 324850   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:37:53,345-Speed 3252.71 samples/sec   Loss 0.0931   LearningRate 0.0001   Epoch: 19   Global Step: 324860   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:37:56,415-Speed 3336.14 samples/sec   Loss 0.0987   LearningRate 0.0001   Epoch: 19   Global Step: 324870   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:37:59,532-Speed 3285.68 samples/sec   Loss 0.0990   LearningRate 0.0001   Epoch: 19   Global Step: 324880   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:38:02,634-Speed 3302.15 samples/sec   Loss 0.0908   LearningRate 0.0001   Epoch: 19   Global Step: 324890   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:38:05,709-Speed 3330.79 samples/sec   Loss 0.1088   LearningRate 0.0001   Epoch: 19   Global Step: 324900   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:38:08,827-Speed 3285.27 samples/sec   Loss 0.1108   LearningRate 0.0001   Epoch: 19   Global Step: 324910   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:38:11,913-Speed 3318.88 samples/sec   Loss 0.0980   LearningRate 0.0001   Epoch: 19   Global Step: 324920   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:38:14,983-Speed 3336.06 samples/sec   Loss 0.1142   LearningRate 0.0001   Epoch: 19   Global Step: 324930   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:38:18,050-Speed 3340.15 samples/sec   Loss 0.1159   LearningRate 0.0001   Epoch: 19   Global Step: 324940   Fp16 Grad Scale: 262144   Required: 1 hours
Training: 2022-04-12 09:38:21,160-Speed 3292.47 samples/sec   Loss 0.1227   LearningRate 0.0001   Epoch: 19   Global Step: 324950   Fp16 Grad Scale: 262144   Required: 1 hours
Training: 2022-04-12 09:38:24,315-Speed 3246.81 samples/sec   Loss 0.1043   LearningRate 0.0001   Epoch: 19   Global Step: 324960   Fp16 Grad Scale: 262144   Required: 1 hours
Training: 2022-04-12 09:38:27,422-Speed 3296.26 samples/sec   Loss 0.1055   LearningRate 0.0001   Epoch: 19   Global Step: 324970   Fp16 Grad Scale: 262144   Required: 1 hours
Training: 2022-04-12 09:38:30,527-Speed 3298.99 samples/sec   Loss 0.1015   LearningRate 0.0001   Epoch: 19   Global Step: 324980   Fp16 Grad Scale: 262144   Required: 1 hours
Training: 2022-04-12 09:38:33,608-Speed 3324.58 samples/sec   Loss 0.1048   LearningRate 0.0001   Epoch: 19   Global Step: 324990   Fp16 Grad Scale: 262144   Required: 1 hours
Training: 2022-04-12 09:38:36,691-Speed 3322.18 samples/sec   Loss 0.1126   LearningRate 0.0001   Epoch: 19   Global Step: 325000   Fp16 Grad Scale: 262144   Required: 1 hours
Training: 2022-04-12 09:38:39,768-Speed 3328.83 samples/sec   Loss 0.0972   LearningRate 0.0001   Epoch: 19   Global Step: 325010   Fp16 Grad Scale: 262144   Required: 1 hours
Training: 2022-04-12 09:38:42,823-Speed 3352.00 samples/sec   Loss 0.1028   LearningRate 0.0001   Epoch: 19   Global Step: 325020   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:38:45,890-Speed 3340.05 samples/sec   Loss 0.0967   LearningRate 0.0001   Epoch: 19   Global Step: 325030   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:38:48,972-Speed 3323.00 samples/sec   Loss 0.1013   LearningRate 0.0001   Epoch: 19   Global Step: 325040   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 09:38:52,035-Speed 3343.42 samples/sec   Loss 0.1050   LearningRate 0.0001   Epoch: 19   Global Step: 325050   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 09:38:55,148-Speed 3290.32 samples/sec   Loss 0.1095   LearningRate 0.0001   Epoch: 19   Global Step: 325060   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 09:38:58,218-Speed 3336.50 samples/sec   Loss 0.1068   LearningRate 0.0001   Epoch: 19   Global Step: 325070   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 09:39:01,290-Speed 3334.30 samples/sec   Loss 0.1044   LearningRate 0.0001   Epoch: 19   Global Step: 325080   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 09:39:04,370-Speed 3325.46 samples/sec   Loss 0.0957   LearningRate 0.0001   Epoch: 19   Global Step: 325090   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 09:39:07,469-Speed 3304.53 samples/sec   Loss 0.0930   LearningRate 0.0001   Epoch: 19   Global Step: 325100   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 09:39:10,544-Speed 3331.34 samples/sec   Loss 0.0992   LearningRate 0.0001   Epoch: 19   Global Step: 325110   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 09:39:13,616-Speed 3333.79 samples/sec   Loss 0.1087   LearningRate 0.0001   Epoch: 19   Global Step: 325120   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 09:39:16,684-Speed 3338.74 samples/sec   Loss 0.1163   LearningRate 0.0001   Epoch: 19   Global Step: 325130   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 09:39:19,746-Speed 3345.19 samples/sec   Loss 0.1007   LearningRate 0.0001   Epoch: 19   Global Step: 325140   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:39:22,886-Speed 3260.93 samples/sec   Loss 0.1135   LearningRate 0.0001   Epoch: 19   Global Step: 325150   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:39:25,966-Speed 3326.58 samples/sec   Loss 0.1144   LearningRate 0.0001   Epoch: 19   Global Step: 325160   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:39:29,042-Speed 3329.50 samples/sec   Loss 0.0961   LearningRate 0.0001   Epoch: 19   Global Step: 325170   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:39:32,116-Speed 3331.92 samples/sec   Loss 0.1052   LearningRate 0.0001   Epoch: 19   Global Step: 325180   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:39:35,184-Speed 3338.74 samples/sec   Loss 0.1111   LearningRate 0.0001   Epoch: 19   Global Step: 325190   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:39:38,267-Speed 3322.11 samples/sec   Loss 0.1042   LearningRate 0.0001   Epoch: 19   Global Step: 325200   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:39:41,353-Speed 3318.12 samples/sec   Loss 0.1018   LearningRate 0.0001   Epoch: 19   Global Step: 325210   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:39:44,504-Speed 3250.41 samples/sec   Loss 0.1086   LearningRate 0.0001   Epoch: 19   Global Step: 325220   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:39:47,618-Speed 3289.31 samples/sec   Loss 0.1053   LearningRate 0.0001   Epoch: 19   Global Step: 325230   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:39:50,695-Speed 3329.05 samples/sec   Loss 0.1081   LearningRate 0.0001   Epoch: 19   Global Step: 325240   Fp16 Grad Scale: 262144   Required: 1 hours
Training: 2022-04-12 09:39:53,800-Speed 3298.66 samples/sec   Loss 0.1031   LearningRate 0.0001   Epoch: 19   Global Step: 325250   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:39:57,014-Speed 3187.28 samples/sec   Loss 0.1007   LearningRate 0.0001   Epoch: 19   Global Step: 325260   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:40:00,149-Speed 3266.86 samples/sec   Loss 0.1087   LearningRate 0.0001   Epoch: 19   Global Step: 325270   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:40:03,232-Speed 3322.42 samples/sec   Loss 0.1004   LearningRate 0.0001   Epoch: 19   Global Step: 325280   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:40:06,302-Speed 3335.25 samples/sec   Loss 0.1031   LearningRate 0.0001   Epoch: 19   Global Step: 325290   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:40:09,456-Speed 3247.56 samples/sec   Loss 0.1032   LearningRate 0.0001   Epoch: 19   Global Step: 325300   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:40:12,597-Speed 3260.65 samples/sec   Loss 0.1056   LearningRate 0.0001   Epoch: 19   Global Step: 325310   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:40:15,682-Speed 3320.69 samples/sec   Loss 0.1069   LearningRate 0.0001   Epoch: 19   Global Step: 325320   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:40:18,768-Speed 3319.42 samples/sec   Loss 0.1110   LearningRate 0.0001   Epoch: 19   Global Step: 325330   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:40:21,821-Speed 3354.39 samples/sec   Loss 0.1096   LearningRate 0.0001   Epoch: 19   Global Step: 325340   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 09:40:24,891-Speed 3335.89 samples/sec   Loss 0.1014   LearningRate 0.0001   Epoch: 19   Global Step: 325350   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 09:40:27,962-Speed 3335.01 samples/sec   Loss 0.1101   LearningRate 0.0001   Epoch: 19   Global Step: 325360   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 09:40:31,045-Speed 3322.45 samples/sec   Loss 0.1006   LearningRate 0.0001   Epoch: 19   Global Step: 325370   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 09:40:34,124-Speed 3326.17 samples/sec   Loss 0.1089   LearningRate 0.0001   Epoch: 19   Global Step: 325380   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 09:40:37,194-Speed 3336.09 samples/sec   Loss 0.1061   LearningRate 0.0001   Epoch: 19   Global Step: 325390   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 09:40:40,270-Speed 3330.45 samples/sec   Loss 0.1030   LearningRate 0.0001   Epoch: 19   Global Step: 325400   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 09:40:43,349-Speed 3326.29 samples/sec   Loss 0.1081   LearningRate 0.0001   Epoch: 19   Global Step: 325410   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 09:40:46,419-Speed 3336.94 samples/sec   Loss 0.0987   LearningRate 0.0001   Epoch: 19   Global Step: 325420   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 09:40:49,491-Speed 3334.07 samples/sec   Loss 0.1078   LearningRate 0.0001   Epoch: 19   Global Step: 325430   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 09:40:52,629-Speed 3263.54 samples/sec   Loss 0.1051   LearningRate 0.0001   Epoch: 19   Global Step: 325440   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:40:55,705-Speed 3329.58 samples/sec   Loss 0.0986   LearningRate 0.0001   Epoch: 19   Global Step: 325450   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:40:58,769-Speed 3342.92 samples/sec   Loss 0.1150   LearningRate 0.0001   Epoch: 19   Global Step: 325460   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:41:01,868-Speed 3304.80 samples/sec   Loss 0.0961   LearningRate 0.0001   Epoch: 19   Global Step: 325470   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:41:04,938-Speed 3336.03 samples/sec   Loss 0.1070   LearningRate 0.0001   Epoch: 19   Global Step: 325480   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:41:08,004-Speed 3340.56 samples/sec   Loss 0.0953   LearningRate 0.0001   Epoch: 19   Global Step: 325490   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:41:11,080-Speed 3330.41 samples/sec   Loss 0.1140   LearningRate 0.0001   Epoch: 19   Global Step: 325500   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:41:14,184-Speed 3299.58 samples/sec   Loss 0.1047   LearningRate 0.0001   Epoch: 19   Global Step: 325510   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:41:17,334-Speed 3251.23 samples/sec   Loss 0.1061   LearningRate 0.0001   Epoch: 19   Global Step: 325520   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:41:20,462-Speed 3274.58 samples/sec   Loss 0.0951   LearningRate 0.0001   Epoch: 19   Global Step: 325530   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:41:23,520-Speed 3349.84 samples/sec   Loss 0.1096   LearningRate 0.0001   Epoch: 19   Global Step: 325540   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:41:26,593-Speed 3332.80 samples/sec   Loss 0.0991   LearningRate 0.0001   Epoch: 19   Global Step: 325550   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:41:29,698-Speed 3298.97 samples/sec   Loss 0.1010   LearningRate 0.0001   Epoch: 19   Global Step: 325560   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:41:32,761-Speed 3343.65 samples/sec   Loss 0.1151   LearningRate 0.0001   Epoch: 19   Global Step: 325570   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:41:35,839-Speed 3327.21 samples/sec   Loss 0.1074   LearningRate 0.0001   Epoch: 19   Global Step: 325580   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:41:38,913-Speed 3332.07 samples/sec   Loss 0.1137   LearningRate 0.0001   Epoch: 19   Global Step: 325590   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:41:41,976-Speed 3344.43 samples/sec   Loss 0.0961   LearningRate 0.0001   Epoch: 19   Global Step: 325600   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:41:45,119-Speed 3258.33 samples/sec   Loss 0.1006   LearningRate 0.0001   Epoch: 19   Global Step: 325610   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:41:48,223-Speed 3300.22 samples/sec   Loss 0.1111   LearningRate 0.0001   Epoch: 19   Global Step: 325620   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:41:51,292-Speed 3337.20 samples/sec   Loss 0.1074   LearningRate 0.0001   Epoch: 19   Global Step: 325630   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:41:54,358-Speed 3339.84 samples/sec   Loss 0.1006   LearningRate 0.0001   Epoch: 19   Global Step: 325640   Fp16 Grad Scale: 262144   Required: 1 hours
Training: 2022-04-12 09:41:57,435-Speed 3329.24 samples/sec   Loss 0.1051   LearningRate 0.0001   Epoch: 19   Global Step: 325650   Fp16 Grad Scale: 262144   Required: 1 hours
Training: 2022-04-12 09:42:00,491-Speed 3351.07 samples/sec   Loss 0.1098   LearningRate 0.0001   Epoch: 19   Global Step: 325660   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:42:03,567-Speed 3330.28 samples/sec   Loss 0.1053   LearningRate 0.0001   Epoch: 19   Global Step: 325670   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:42:06,630-Speed 3344.30 samples/sec   Loss 0.1048   LearningRate 0.0001   Epoch: 19   Global Step: 325680   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:42:09,711-Speed 3324.07 samples/sec   Loss 0.1032   LearningRate 0.0001   Epoch: 19   Global Step: 325690   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:42:12,785-Speed 3332.15 samples/sec   Loss 0.0982   LearningRate 0.0001   Epoch: 19   Global Step: 325700   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:42:15,853-Speed 3338.55 samples/sec   Loss 0.1007   LearningRate 0.0001   Epoch: 19   Global Step: 325710   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:42:18,921-Speed 3338.12 samples/sec   Loss 0.0984   LearningRate 0.0001   Epoch: 19   Global Step: 325720   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:42:22,008-Speed 3318.36 samples/sec   Loss 0.1062   LearningRate 0.0001   Epoch: 19   Global Step: 325730   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:42:25,080-Speed 3333.86 samples/sec   Loss 0.1095   LearningRate 0.0001   Epoch: 19   Global Step: 325740   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:42:28,151-Speed 3335.29 samples/sec   Loss 0.1129   LearningRate 0.0001   Epoch: 19   Global Step: 325750   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:42:31,250-Speed 3305.14 samples/sec   Loss 0.1021   LearningRate 0.0001   Epoch: 19   Global Step: 325760   Fp16 Grad Scale: 262144   Required: 1 hours
Training: 2022-04-12 09:42:34,463-Speed 3187.22 samples/sec   Loss 0.0923   LearningRate 0.0001   Epoch: 19   Global Step: 325770   Fp16 Grad Scale: 262144   Required: 1 hours
Training: 2022-04-12 09:42:37,599-Speed 3266.10 samples/sec   Loss 0.1010   LearningRate 0.0001   Epoch: 19   Global Step: 325780   Fp16 Grad Scale: 262144   Required: 1 hours
Training: 2022-04-12 09:42:40,654-Speed 3352.95 samples/sec   Loss 0.1145   LearningRate 0.0001   Epoch: 19   Global Step: 325790   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:42:43,723-Speed 3336.74 samples/sec   Loss 0.1050   LearningRate 0.0001   Epoch: 19   Global Step: 325800   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:42:46,788-Speed 3342.37 samples/sec   Loss 0.1016   LearningRate 0.0001   Epoch: 19   Global Step: 325810   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:42:49,855-Speed 3338.79 samples/sec   Loss 0.1022   LearningRate 0.0001   Epoch: 19   Global Step: 325820   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:42:52,925-Speed 3336.99 samples/sec   Loss 0.1028   LearningRate 0.0001   Epoch: 19   Global Step: 325830   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:42:56,012-Speed 3318.37 samples/sec   Loss 0.1051   LearningRate 0.0001   Epoch: 19   Global Step: 325840   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:42:59,079-Speed 3339.04 samples/sec   Loss 0.1137   LearningRate 0.0001   Epoch: 19   Global Step: 325850   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:43:02,149-Speed 3335.97 samples/sec   Loss 0.1045   LearningRate 0.0001   Epoch: 19   Global Step: 325860   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:43:05,217-Speed 3339.01 samples/sec   Loss 0.1092   LearningRate 0.0001   Epoch: 19   Global Step: 325870   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:43:08,283-Speed 3340.16 samples/sec   Loss 0.0974   LearningRate 0.0001   Epoch: 19   Global Step: 325880   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:43:11,353-Speed 3336.51 samples/sec   Loss 0.1049   LearningRate 0.0001   Epoch: 19   Global Step: 325890   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:43:14,518-Speed 3236.33 samples/sec   Loss 0.0939   LearningRate 0.0001   Epoch: 19   Global Step: 325900   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:43:17,644-Speed 3276.26 samples/sec   Loss 0.1000   LearningRate 0.0001   Epoch: 19   Global Step: 325910   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:43:20,728-Speed 3320.98 samples/sec   Loss 0.0976   LearningRate 0.0001   Epoch: 19   Global Step: 325920   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:43:23,801-Speed 3332.98 samples/sec   Loss 0.1168   LearningRate 0.0001   Epoch: 19   Global Step: 325930   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:43:26,856-Speed 3353.29 samples/sec   Loss 0.0963   LearningRate 0.0001   Epoch: 19   Global Step: 325940   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 09:43:29,976-Speed 3282.04 samples/sec   Loss 0.1044   LearningRate 0.0001   Epoch: 19   Global Step: 325950   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 09:43:33,165-Speed 3211.86 samples/sec   Loss 0.0986   LearningRate 0.0001   Epoch: 19   Global Step: 325960   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 09:43:36,234-Speed 3336.96 samples/sec   Loss 0.1010   LearningRate 0.0001   Epoch: 19   Global Step: 325970   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 09:43:39,303-Speed 3338.18 samples/sec   Loss 0.1060   LearningRate 0.0001   Epoch: 19   Global Step: 325980   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 09:43:42,368-Speed 3341.61 samples/sec   Loss 0.0958   LearningRate 0.0001   Epoch: 19   Global Step: 325990   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 09:43:45,435-Speed 3339.41 samples/sec   Loss 0.1036   LearningRate 0.0001   Epoch: 19   Global Step: 326000   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 09:44:29,394-[lfw][326000]XNorm: 20.776914
Training: 2022-04-12 09:44:29,396-[lfw][326000]Accuracy-Flip: 0.99783+-0.00259
Training: 2022-04-12 09:44:29,396-[lfw][326000]Accuracy-Highest: 0.99817
Training: 2022-04-12 09:45:20,448-[cfp_fp][326000]XNorm: 22.677527
Training: 2022-04-12 09:45:20,449-[cfp_fp][326000]Accuracy-Flip: 0.99143+-0.00361
Training: 2022-04-12 09:45:20,449-[cfp_fp][326000]Accuracy-Highest: 0.99200
Training: 2022-04-12 09:46:04,339-[agedb_30][326000]XNorm: 22.861861
Training: 2022-04-12 09:46:04,340-[agedb_30][326000]Accuracy-Flip: 0.98633+-0.00572
Training: 2022-04-12 09:46:04,340-[agedb_30][326000]Accuracy-Highest: 0.98650
Training: 2022-04-12 09:46:07,399-Speed 72.13 samples/sec   Loss 0.1075   LearningRate 0.0001   Epoch: 19   Global Step: 326010   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 09:46:10,480-Speed 3323.87 samples/sec   Loss 0.1093   LearningRate 0.0001   Epoch: 19   Global Step: 326020   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 09:46:13,547-Speed 3339.45 samples/sec   Loss 0.1024   LearningRate 0.0001   Epoch: 19   Global Step: 326030   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 09:46:16,618-Speed 3335.64 samples/sec   Loss 0.1078   LearningRate 0.0001   Epoch: 19   Global Step: 326040   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:46:19,671-Speed 3354.40 samples/sec   Loss 0.1166   LearningRate 0.0001   Epoch: 19   Global Step: 326050   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:46:22,762-Speed 3313.46 samples/sec   Loss 0.0982   LearningRate 0.0001   Epoch: 19   Global Step: 326060   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:46:25,826-Speed 3343.37 samples/sec   Loss 0.1100   LearningRate 0.0001   Epoch: 19   Global Step: 326070   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:46:28,883-Speed 3350.35 samples/sec   Loss 0.1008   LearningRate 0.0001   Epoch: 19   Global Step: 326080   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:46:31,939-Speed 3351.17 samples/sec   Loss 0.1117   LearningRate 0.0001   Epoch: 19   Global Step: 326090   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:46:35,141-Speed 3198.55 samples/sec   Loss 0.0986   LearningRate 0.0001   Epoch: 19   Global Step: 326100   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:46:38,284-Speed 3258.37 samples/sec   Loss 0.1006   LearningRate 0.0001   Epoch: 19   Global Step: 326110   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:46:41,366-Speed 3324.19 samples/sec   Loss 0.1110   LearningRate 0.0001   Epoch: 19   Global Step: 326120   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:46:44,428-Speed 3345.21 samples/sec   Loss 0.1004   LearningRate 0.0001   Epoch: 19   Global Step: 326130   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:46:47,488-Speed 3346.66 samples/sec   Loss 0.1128   LearningRate 0.0001   Epoch: 19   Global Step: 326140   Fp16 Grad Scale: 262144   Required: 1 hours
Training: 2022-04-12 09:46:50,537-Speed 3359.65 samples/sec   Loss 0.0959   LearningRate 0.0001   Epoch: 19   Global Step: 326150   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:46:53,626-Speed 3315.10 samples/sec   Loss 0.1139   LearningRate 0.0001   Epoch: 19   Global Step: 326160   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:46:56,698-Speed 3334.90 samples/sec   Loss 0.1001   LearningRate 0.0001   Epoch: 19   Global Step: 326170   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:46:59,790-Speed 3312.61 samples/sec   Loss 0.1055   LearningRate 0.0001   Epoch: 19   Global Step: 326180   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:47:02,916-Speed 3276.29 samples/sec   Loss 0.0882   LearningRate 0.0001   Epoch: 19   Global Step: 326190   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:47:06,124-Speed 3192.28 samples/sec   Loss 0.1077   LearningRate 0.0001   Epoch: 19   Global Step: 326200   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:47:09,189-Speed 3341.91 samples/sec   Loss 0.1096   LearningRate 0.0001   Epoch: 19   Global Step: 326210   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:47:12,285-Speed 3309.09 samples/sec   Loss 0.1056   LearningRate 0.0001   Epoch: 19   Global Step: 326220   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:47:15,481-Speed 3204.81 samples/sec   Loss 0.1103   LearningRate 0.0001   Epoch: 19   Global Step: 326230   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:47:18,566-Speed 3320.03 samples/sec   Loss 0.1055   LearningRate 0.0001   Epoch: 19   Global Step: 326240   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:47:21,649-Speed 3321.56 samples/sec   Loss 0.1027   LearningRate 0.0001   Epoch: 19   Global Step: 326250   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:47:24,784-Speed 3266.79 samples/sec   Loss 0.1090   LearningRate 0.0001   Epoch: 19   Global Step: 326260   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:47:27,860-Speed 3330.04 samples/sec   Loss 0.1089   LearningRate 0.0001   Epoch: 19   Global Step: 326270   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:47:30,939-Speed 3326.91 samples/sec   Loss 0.0959   LearningRate 0.0001   Epoch: 19   Global Step: 326280   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:47:34,001-Speed 3344.09 samples/sec   Loss 0.1115   LearningRate 0.0001   Epoch: 19   Global Step: 326290   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:47:37,069-Speed 3339.31 samples/sec   Loss 0.1015   LearningRate 0.0001   Epoch: 19   Global Step: 326300   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:47:40,152-Speed 3322.00 samples/sec   Loss 0.1052   LearningRate 0.0001   Epoch: 19   Global Step: 326310   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:47:43,220-Speed 3338.96 samples/sec   Loss 0.1127   LearningRate 0.0001   Epoch: 19   Global Step: 326320   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:47:46,322-Speed 3301.71 samples/sec   Loss 0.1100   LearningRate 0.0001   Epoch: 19   Global Step: 326330   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:47:49,391-Speed 3337.23 samples/sec   Loss 0.1161   LearningRate 0.0001   Epoch: 19   Global Step: 326340   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:47:52,469-Speed 3327.22 samples/sec   Loss 0.1011   LearningRate 0.0001   Epoch: 19   Global Step: 326350   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:47:55,534-Speed 3341.65 samples/sec   Loss 0.1057   LearningRate 0.0000   Epoch: 19   Global Step: 326360   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:47:58,605-Speed 3335.01 samples/sec   Loss 0.1135   LearningRate 0.0000   Epoch: 19   Global Step: 326370   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:48:01,674-Speed 3337.35 samples/sec   Loss 0.1046   LearningRate 0.0000   Epoch: 19   Global Step: 326380   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:48:04,764-Speed 3315.32 samples/sec   Loss 0.0966   LearningRate 0.0000   Epoch: 19   Global Step: 326390   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:48:07,969-Speed 3196.36 samples/sec   Loss 0.1081   LearningRate 0.0000   Epoch: 19   Global Step: 326400   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:48:11,117-Speed 3253.34 samples/sec   Loss 0.1068   LearningRate 0.0000   Epoch: 19   Global Step: 326410   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:48:14,260-Speed 3259.08 samples/sec   Loss 0.1045   LearningRate 0.0000   Epoch: 19   Global Step: 326420   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:48:17,335-Speed 3330.21 samples/sec   Loss 0.0940   LearningRate 0.0000   Epoch: 19   Global Step: 326430   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:48:20,458-Speed 3280.12 samples/sec   Loss 0.1106   LearningRate 0.0000   Epoch: 19   Global Step: 326440   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:48:23,538-Speed 3324.52 samples/sec   Loss 0.1062   LearningRate 0.0000   Epoch: 19   Global Step: 326450   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:48:26,642-Speed 3299.61 samples/sec   Loss 0.0961   LearningRate 0.0000   Epoch: 19   Global Step: 326460   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:48:29,765-Speed 3279.78 samples/sec   Loss 0.1008   LearningRate 0.0000   Epoch: 19   Global Step: 326470   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:48:32,829-Speed 3343.02 samples/sec   Loss 0.0966   LearningRate 0.0000   Epoch: 19   Global Step: 326480   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:48:35,921-Speed 3313.09 samples/sec   Loss 0.1016   LearningRate 0.0000   Epoch: 19   Global Step: 326490   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:48:39,085-Speed 3236.50 samples/sec   Loss 0.1035   LearningRate 0.0000   Epoch: 19   Global Step: 326500   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:48:42,288-Speed 3198.55 samples/sec   Loss 0.0996   LearningRate 0.0000   Epoch: 19   Global Step: 326510   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:48:45,374-Speed 3318.40 samples/sec   Loss 0.1035   LearningRate 0.0000   Epoch: 19   Global Step: 326520   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 09:48:48,523-Speed 3253.26 samples/sec   Loss 0.1022   LearningRate 0.0000   Epoch: 19   Global Step: 326530   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 09:48:51,608-Speed 3319.14 samples/sec   Loss 0.1066   LearningRate 0.0000   Epoch: 19   Global Step: 326540   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 09:48:54,713-Speed 3298.96 samples/sec   Loss 0.1050   LearningRate 0.0000   Epoch: 19   Global Step: 326550   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 09:48:57,815-Speed 3302.16 samples/sec   Loss 0.1047   LearningRate 0.0000   Epoch: 19   Global Step: 326560   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 09:49:00,899-Speed 3324.03 samples/sec   Loss 0.1052   LearningRate 0.0000   Epoch: 19   Global Step: 326570   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 09:49:03,963-Speed 3341.80 samples/sec   Loss 0.0976   LearningRate 0.0000   Epoch: 19   Global Step: 326580   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 09:49:07,045-Speed 3323.35 samples/sec   Loss 0.1122   LearningRate 0.0000   Epoch: 19   Global Step: 326590   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 09:49:10,103-Speed 3349.05 samples/sec   Loss 0.1024   LearningRate 0.0000   Epoch: 19   Global Step: 326600   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 09:49:13,219-Speed 3287.84 samples/sec   Loss 0.1053   LearningRate 0.0000   Epoch: 19   Global Step: 326610   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 09:49:16,282-Speed 3342.86 samples/sec   Loss 0.0949   LearningRate 0.0000   Epoch: 19   Global Step: 326620   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:49:19,350-Speed 3338.73 samples/sec   Loss 0.1042   LearningRate 0.0000   Epoch: 19   Global Step: 326630   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:49:22,423-Speed 3333.57 samples/sec   Loss 0.1016   LearningRate 0.0000   Epoch: 19   Global Step: 326640   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:49:25,498-Speed 3330.19 samples/sec   Loss 0.1079   LearningRate 0.0000   Epoch: 19   Global Step: 326650   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:49:28,563-Speed 3342.07 samples/sec   Loss 0.1079   LearningRate 0.0000   Epoch: 19   Global Step: 326660   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:49:31,665-Speed 3303.12 samples/sec   Loss 0.1140   LearningRate 0.0000   Epoch: 19   Global Step: 326670   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:49:34,817-Speed 3249.01 samples/sec   Loss 0.1086   LearningRate 0.0000   Epoch: 19   Global Step: 326680   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:49:37,951-Speed 3267.83 samples/sec   Loss 0.1029   LearningRate 0.0000   Epoch: 19   Global Step: 326690   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:49:41,130-Speed 3222.16 samples/sec   Loss 0.0971   LearningRate 0.0000   Epoch: 19   Global Step: 326700   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:49:44,232-Speed 3302.35 samples/sec   Loss 0.0987   LearningRate 0.0000   Epoch: 19   Global Step: 326710   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:49:47,301-Speed 3338.04 samples/sec   Loss 0.0970   LearningRate 0.0000   Epoch: 19   Global Step: 326720   Fp16 Grad Scale: 262144   Required: 1 hours
Training: 2022-04-12 09:49:50,388-Speed 3317.45 samples/sec   Loss 0.0986   LearningRate 0.0000   Epoch: 19   Global Step: 326730   Fp16 Grad Scale: 262144   Required: 1 hours
Training: 2022-04-12 09:49:53,441-Speed 3354.39 samples/sec   Loss 0.1023   LearningRate 0.0000   Epoch: 19   Global Step: 326740   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:49:56,510-Speed 3338.21 samples/sec   Loss 0.1081   LearningRate 0.0000   Epoch: 19   Global Step: 326750   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:49:59,563-Speed 3355.01 samples/sec   Loss 0.1012   LearningRate 0.0000   Epoch: 19   Global Step: 326760   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 09:50:02,634-Speed 3334.81 samples/sec   Loss 0.0994   LearningRate 0.0000   Epoch: 19   Global Step: 326770   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 09:50:05,701-Speed 3339.71 samples/sec   Loss 0.1079   LearningRate 0.0000   Epoch: 19   Global Step: 326780   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 09:50:08,767-Speed 3340.45 samples/sec   Loss 0.1143   LearningRate 0.0000   Epoch: 19   Global Step: 326790   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 09:50:11,836-Speed 3337.30 samples/sec   Loss 0.0972   LearningRate 0.0000   Epoch: 19   Global Step: 326800   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 09:50:14,913-Speed 3328.42 samples/sec   Loss 0.0997   LearningRate 0.0000   Epoch: 19   Global Step: 326810   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 09:50:17,990-Speed 3329.87 samples/sec   Loss 0.0997   LearningRate 0.0000   Epoch: 19   Global Step: 326820   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 09:50:21,063-Speed 3332.61 samples/sec   Loss 0.1001   LearningRate 0.0000   Epoch: 19   Global Step: 326830   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 09:50:24,128-Speed 3341.60 samples/sec   Loss 0.1074   LearningRate 0.0000   Epoch: 19   Global Step: 326840   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 09:50:27,195-Speed 3339.65 samples/sec   Loss 0.0935   LearningRate 0.0000   Epoch: 19   Global Step: 326850   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 09:50:30,265-Speed 3335.99 samples/sec   Loss 0.1113   LearningRate 0.0000   Epoch: 19   Global Step: 326860   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:50:33,374-Speed 3294.00 samples/sec   Loss 0.1012   LearningRate 0.0000   Epoch: 19   Global Step: 326870   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:50:36,442-Speed 3339.48 samples/sec   Loss 0.1080   LearningRate 0.0000   Epoch: 19   Global Step: 326880   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:50:39,518-Speed 3329.14 samples/sec   Loss 0.1079   LearningRate 0.0000   Epoch: 19   Global Step: 326890   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:50:42,649-Speed 3271.28 samples/sec   Loss 0.1059   LearningRate 0.0000   Epoch: 19   Global Step: 326900   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:50:45,732-Speed 3322.57 samples/sec   Loss 0.0991   LearningRate 0.0000   Epoch: 19   Global Step: 326910   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:50:48,805-Speed 3332.99 samples/sec   Loss 0.1038   LearningRate 0.0000   Epoch: 19   Global Step: 326920   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:50:51,894-Speed 3315.68 samples/sec   Loss 0.0955   LearningRate 0.0000   Epoch: 19   Global Step: 326930   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:50:54,987-Speed 3311.73 samples/sec   Loss 0.1173   LearningRate 0.0000   Epoch: 19   Global Step: 326940   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:50:58,050-Speed 3344.27 samples/sec   Loss 0.1105   LearningRate 0.0000   Epoch: 19   Global Step: 326950   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:51:01,127-Speed 3327.68 samples/sec   Loss 0.1058   LearningRate 0.0000   Epoch: 19   Global Step: 326960   Fp16 Grad Scale: 262144   Required: 1 hours
Training: 2022-04-12 09:51:04,212-Speed 3319.95 samples/sec   Loss 0.1034   LearningRate 0.0000   Epoch: 19   Global Step: 326970   Fp16 Grad Scale: 262144   Required: 1 hours
Training: 2022-04-12 09:51:07,324-Speed 3291.21 samples/sec   Loss 0.0976   LearningRate 0.0000   Epoch: 19   Global Step: 326980   Fp16 Grad Scale: 262144   Required: 1 hours
Training: 2022-04-12 09:51:10,471-Speed 3255.46 samples/sec   Loss 0.1107   LearningRate 0.0000   Epoch: 19   Global Step: 326990   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:51:13,617-Speed 3255.77 samples/sec   Loss 0.1123   LearningRate 0.0000   Epoch: 19   Global Step: 327000   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:51:16,733-Speed 3286.47 samples/sec   Loss 0.1057   LearningRate 0.0000   Epoch: 19   Global Step: 327010   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:51:19,840-Speed 3297.02 samples/sec   Loss 0.1106   LearningRate 0.0000   Epoch: 19   Global Step: 327020   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:51:22,965-Speed 3277.69 samples/sec   Loss 0.1053   LearningRate 0.0000   Epoch: 19   Global Step: 327030   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:51:26,030-Speed 3341.14 samples/sec   Loss 0.1020   LearningRate 0.0000   Epoch: 19   Global Step: 327040   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:51:29,177-Speed 3254.56 samples/sec   Loss 0.1045   LearningRate 0.0000   Epoch: 19   Global Step: 327050   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:51:32,245-Speed 3338.86 samples/sec   Loss 0.0991   LearningRate 0.0000   Epoch: 19   Global Step: 327060   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:51:35,304-Speed 3347.76 samples/sec   Loss 0.1079   LearningRate 0.0000   Epoch: 19   Global Step: 327070   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:51:38,378-Speed 3332.15 samples/sec   Loss 0.0971   LearningRate 0.0000   Epoch: 19   Global Step: 327080   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:51:41,472-Speed 3311.63 samples/sec   Loss 0.1041   LearningRate 0.0000   Epoch: 19   Global Step: 327090   Fp16 Grad Scale: 262144   Required: 1 hours
Training: 2022-04-12 09:51:44,581-Speed 3294.56 samples/sec   Loss 0.0944   LearningRate 0.0000   Epoch: 19   Global Step: 327100   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:51:47,643-Speed 3344.65 samples/sec   Loss 0.1137   LearningRate 0.0000   Epoch: 19   Global Step: 327110   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:51:50,711-Speed 3337.70 samples/sec   Loss 0.1104   LearningRate 0.0000   Epoch: 19   Global Step: 327120   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:51:53,790-Speed 3326.98 samples/sec   Loss 0.1035   LearningRate 0.0000   Epoch: 19   Global Step: 327130   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:51:56,869-Speed 3326.81 samples/sec   Loss 0.1099   LearningRate 0.0000   Epoch: 19   Global Step: 327140   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:51:59,951-Speed 3322.57 samples/sec   Loss 0.1048   LearningRate 0.0000   Epoch: 19   Global Step: 327150   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:52:03,029-Speed 3328.28 samples/sec   Loss 0.1155   LearningRate 0.0000   Epoch: 19   Global Step: 327160   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:52:06,089-Speed 3346.71 samples/sec   Loss 0.1152   LearningRate 0.0000   Epoch: 19   Global Step: 327170   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:52:09,195-Speed 3297.82 samples/sec   Loss 0.1083   LearningRate 0.0000   Epoch: 19   Global Step: 327180   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:52:12,265-Speed 3336.90 samples/sec   Loss 0.1021   LearningRate 0.0000   Epoch: 19   Global Step: 327190   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:52:15,404-Speed 3262.49 samples/sec   Loss 0.1200   LearningRate 0.0000   Epoch: 19   Global Step: 327200   Fp16 Grad Scale: 262144   Required: 1 hours
Training: 2022-04-12 09:52:18,468-Speed 3342.96 samples/sec   Loss 0.1062   LearningRate 0.0000   Epoch: 19   Global Step: 327210   Fp16 Grad Scale: 262144   Required: 1 hours
Training: 2022-04-12 09:52:21,577-Speed 3294.19 samples/sec   Loss 0.1001   LearningRate 0.0000   Epoch: 19   Global Step: 327220   Fp16 Grad Scale: 262144   Required: 1 hours
Training: 2022-04-12 09:52:24,730-Speed 3248.77 samples/sec   Loss 0.1006   LearningRate 0.0000   Epoch: 19   Global Step: 327230   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:52:27,874-Speed 3258.10 samples/sec   Loss 0.1102   LearningRate 0.0000   Epoch: 19   Global Step: 327240   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:52:31,040-Speed 3234.29 samples/sec   Loss 0.1150   LearningRate 0.0000   Epoch: 19   Global Step: 327250   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:52:34,109-Speed 3338.39 samples/sec   Loss 0.0973   LearningRate 0.0000   Epoch: 19   Global Step: 327260   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:52:37,352-Speed 3157.61 samples/sec   Loss 0.1063   LearningRate 0.0000   Epoch: 19   Global Step: 327270   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:52:40,416-Speed 3343.59 samples/sec   Loss 0.0962   LearningRate 0.0000   Epoch: 19   Global Step: 327280   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:52:43,522-Speed 3297.64 samples/sec   Loss 0.1007   LearningRate 0.0000   Epoch: 19   Global Step: 327290   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:52:46,605-Speed 3321.87 samples/sec   Loss 0.1175   LearningRate 0.0000   Epoch: 19   Global Step: 327300   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:52:49,746-Speed 3260.84 samples/sec   Loss 0.1000   LearningRate 0.0000   Epoch: 19   Global Step: 327310   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:52:52,980-Speed 3166.89 samples/sec   Loss 0.1071   LearningRate 0.0000   Epoch: 19   Global Step: 327320   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:52:56,051-Speed 3335.12 samples/sec   Loss 0.1099   LearningRate 0.0000   Epoch: 19   Global Step: 327330   Fp16 Grad Scale: 262144   Required: 1 hours
Training: 2022-04-12 09:52:59,112-Speed 3346.02 samples/sec   Loss 0.1019   LearningRate 0.0000   Epoch: 19   Global Step: 327340   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:53:02,210-Speed 3306.17 samples/sec   Loss 0.0951   LearningRate 0.0000   Epoch: 19   Global Step: 327350   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:53:05,282-Speed 3334.44 samples/sec   Loss 0.1031   LearningRate 0.0000   Epoch: 19   Global Step: 327360   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 09:53:08,511-Speed 3172.34 samples/sec   Loss 0.0977   LearningRate 0.0000   Epoch: 19   Global Step: 327370   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 09:53:11,700-Speed 3211.34 samples/sec   Loss 0.1141   LearningRate 0.0000   Epoch: 19   Global Step: 327380   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 09:53:14,802-Speed 3301.77 samples/sec   Loss 0.1006   LearningRate 0.0000   Epoch: 19   Global Step: 327390   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 09:53:17,934-Speed 3270.67 samples/sec   Loss 0.1108   LearningRate 0.0000   Epoch: 19   Global Step: 327400   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 09:53:21,088-Speed 3246.38 samples/sec   Loss 0.1042   LearningRate 0.0000   Epoch: 19   Global Step: 327410   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 09:53:24,184-Speed 3309.02 samples/sec   Loss 0.1009   LearningRate 0.0000   Epoch: 19   Global Step: 327420   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 09:53:27,271-Speed 3317.67 samples/sec   Loss 0.1065   LearningRate 0.0000   Epoch: 19   Global Step: 327430   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 09:53:30,333-Speed 3344.98 samples/sec   Loss 0.0942   LearningRate 0.0000   Epoch: 19   Global Step: 327440   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 09:53:33,490-Speed 3245.02 samples/sec   Loss 0.1130   LearningRate 0.0000   Epoch: 19   Global Step: 327450   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 09:53:36,642-Speed 3248.94 samples/sec   Loss 0.1101   LearningRate 0.0000   Epoch: 19   Global Step: 327460   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:53:39,901-Speed 3142.69 samples/sec   Loss 0.0981   LearningRate 0.0000   Epoch: 19   Global Step: 327470   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:53:42,967-Speed 3341.00 samples/sec   Loss 0.1050   LearningRate 0.0000   Epoch: 19   Global Step: 327480   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:53:46,054-Speed 3317.23 samples/sec   Loss 0.1076   LearningRate 0.0000   Epoch: 19   Global Step: 327490   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:53:49,133-Speed 3326.75 samples/sec   Loss 0.0990   LearningRate 0.0000   Epoch: 19   Global Step: 327500   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:53:52,224-Speed 3314.01 samples/sec   Loss 0.1097   LearningRate 0.0000   Epoch: 19   Global Step: 327510   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:53:55,305-Speed 3324.22 samples/sec   Loss 0.1032   LearningRate 0.0000   Epoch: 19   Global Step: 327520   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:53:58,391-Speed 3319.64 samples/sec   Loss 0.1094   LearningRate 0.0000   Epoch: 19   Global Step: 327530   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:54:01,498-Speed 3296.49 samples/sec   Loss 0.1040   LearningRate 0.0000   Epoch: 19   Global Step: 327540   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:54:04,587-Speed 3315.11 samples/sec   Loss 0.1162   LearningRate 0.0000   Epoch: 19   Global Step: 327550   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:54:07,690-Speed 3300.92 samples/sec   Loss 0.1037   LearningRate 0.0000   Epoch: 19   Global Step: 327560   Fp16 Grad Scale: 262144   Required: 1 hours
Training: 2022-04-12 09:54:10,832-Speed 3259.58 samples/sec   Loss 0.1065   LearningRate 0.0000   Epoch: 19   Global Step: 327570   Fp16 Grad Scale: 262144   Required: 1 hours
Training: 2022-04-12 09:54:13,989-Speed 3244.18 samples/sec   Loss 0.0973   LearningRate 0.0000   Epoch: 19   Global Step: 327580   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:54:17,074-Speed 3320.16 samples/sec   Loss 0.1013   LearningRate 0.0000   Epoch: 19   Global Step: 327590   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:54:20,162-Speed 3316.79 samples/sec   Loss 0.1067   LearningRate 0.0000   Epoch: 19   Global Step: 327600   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:54:23,244-Speed 3324.14 samples/sec   Loss 0.0997   LearningRate 0.0000   Epoch: 19   Global Step: 327610   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:54:26,322-Speed 3327.39 samples/sec   Loss 0.1108   LearningRate 0.0000   Epoch: 19   Global Step: 327620   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:54:29,404-Speed 3323.83 samples/sec   Loss 0.1033   LearningRate 0.0000   Epoch: 19   Global Step: 327630   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:54:32,468-Speed 3342.51 samples/sec   Loss 0.1025   LearningRate 0.0000   Epoch: 19   Global Step: 327640   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:54:35,542-Speed 3331.43 samples/sec   Loss 0.0935   LearningRate 0.0000   Epoch: 19   Global Step: 327650   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:54:38,596-Speed 3353.28 samples/sec   Loss 0.1066   LearningRate 0.0000   Epoch: 19   Global Step: 327660   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 09:54:41,665-Speed 3338.42 samples/sec   Loss 0.1058   LearningRate 0.0000   Epoch: 19   Global Step: 327670   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 09:54:44,741-Speed 3328.95 samples/sec   Loss 0.0969   LearningRate 0.0000   Epoch: 19   Global Step: 327680   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 09:54:47,819-Speed 3327.31 samples/sec   Loss 0.0975   LearningRate 0.0000   Epoch: 19   Global Step: 327690   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 09:54:50,901-Speed 3323.51 samples/sec   Loss 0.1005   LearningRate 0.0000   Epoch: 19   Global Step: 327700   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 09:54:53,972-Speed 3335.30 samples/sec   Loss 0.1095   LearningRate 0.0000   Epoch: 19   Global Step: 327710   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 09:54:57,082-Speed 3294.29 samples/sec   Loss 0.0983   LearningRate 0.0000   Epoch: 19   Global Step: 327720   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 09:55:00,158-Speed 3329.60 samples/sec   Loss 0.0936   LearningRate 0.0000   Epoch: 19   Global Step: 327730   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 09:55:03,226-Speed 3337.66 samples/sec   Loss 0.1084   LearningRate 0.0000   Epoch: 19   Global Step: 327740   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 09:55:06,295-Speed 3337.70 samples/sec   Loss 0.1171   LearningRate 0.0000   Epoch: 19   Global Step: 327750   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 09:55:09,365-Speed 3336.01 samples/sec   Loss 0.0996   LearningRate 0.0000   Epoch: 19   Global Step: 327760   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:55:12,561-Speed 3204.33 samples/sec   Loss 0.1135   LearningRate 0.0000   Epoch: 19   Global Step: 327770   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:55:15,639-Speed 3328.18 samples/sec   Loss 0.1024   LearningRate 0.0000   Epoch: 19   Global Step: 327780   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:55:18,735-Speed 3308.99 samples/sec   Loss 0.1201   LearningRate 0.0000   Epoch: 19   Global Step: 327790   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:55:21,831-Speed 3308.00 samples/sec   Loss 0.1006   LearningRate 0.0000   Epoch: 19   Global Step: 327800   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:55:24,958-Speed 3275.21 samples/sec   Loss 0.1126   LearningRate 0.0000   Epoch: 19   Global Step: 327810   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:55:28,043-Speed 3320.48 samples/sec   Loss 0.1088   LearningRate 0.0000   Epoch: 19   Global Step: 327820   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:55:31,111-Speed 3337.72 samples/sec   Loss 0.1100   LearningRate 0.0000   Epoch: 19   Global Step: 327830   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:55:34,197-Speed 3319.44 samples/sec   Loss 0.1065   LearningRate 0.0000   Epoch: 19   Global Step: 327840   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:55:37,297-Speed 3303.31 samples/sec   Loss 0.1091   LearningRate 0.0000   Epoch: 19   Global Step: 327850   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:55:40,500-Speed 3198.23 samples/sec   Loss 0.1023   LearningRate 0.0000   Epoch: 19   Global Step: 327860   Fp16 Grad Scale: 262144   Required: 1 hours
Training: 2022-04-12 09:55:43,572-Speed 3334.76 samples/sec   Loss 0.0933   LearningRate 0.0000   Epoch: 19   Global Step: 327870   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:55:46,650-Speed 3327.43 samples/sec   Loss 0.1001   LearningRate 0.0000   Epoch: 19   Global Step: 327880   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:55:49,721-Speed 3334.65 samples/sec   Loss 0.1126   LearningRate 0.0000   Epoch: 19   Global Step: 327890   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:55:52,810-Speed 3316.31 samples/sec   Loss 0.1006   LearningRate 0.0000   Epoch: 19   Global Step: 327900   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:55:55,994-Speed 3216.11 samples/sec   Loss 0.1007   LearningRate 0.0000   Epoch: 19   Global Step: 327910   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:55:59,079-Speed 3320.44 samples/sec   Loss 0.1045   LearningRate 0.0000   Epoch: 19   Global Step: 327920   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:56:02,182-Speed 3300.41 samples/sec   Loss 0.1000   LearningRate 0.0000   Epoch: 19   Global Step: 327930   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:56:05,277-Speed 3309.49 samples/sec   Loss 0.1073   LearningRate 0.0000   Epoch: 19   Global Step: 327940   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:56:08,372-Speed 3309.80 samples/sec   Loss 0.1001   LearningRate 0.0000   Epoch: 19   Global Step: 327950   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:56:11,442-Speed 3336.68 samples/sec   Loss 0.0946   LearningRate 0.0000   Epoch: 19   Global Step: 327960   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:56:14,546-Speed 3299.36 samples/sec   Loss 0.1001   LearningRate 0.0000   Epoch: 19   Global Step: 327970   Fp16 Grad Scale: 262144   Required: 1 hours
Training: 2022-04-12 09:56:17,607-Speed 3346.46 samples/sec   Loss 0.1092   LearningRate 0.0000   Epoch: 19   Global Step: 327980   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:56:20,702-Speed 3308.49 samples/sec   Loss 0.1105   LearningRate 0.0000   Epoch: 19   Global Step: 327990   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:56:23,801-Speed 3305.03 samples/sec   Loss 0.1157   LearningRate 0.0000   Epoch: 19   Global Step: 328000   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:57:07,538-[lfw][328000]XNorm: 20.898565
Training: 2022-04-12 09:57:07,538-[lfw][328000]Accuracy-Flip: 0.99783+-0.00259
Training: 2022-04-12 09:57:07,539-[lfw][328000]Accuracy-Highest: 0.99817
Training: 2022-04-12 09:57:58,286-[cfp_fp][328000]XNorm: 22.769441
Training: 2022-04-12 09:57:58,286-[cfp_fp][328000]Accuracy-Flip: 0.99143+-0.00361
Training: 2022-04-12 09:57:58,287-[cfp_fp][328000]Accuracy-Highest: 0.99200
Training: 2022-04-12 09:58:42,000-[agedb_30][328000]XNorm: 22.953820
Training: 2022-04-12 09:58:42,001-[agedb_30][328000]Accuracy-Flip: 0.98567+-0.00655
Training: 2022-04-12 09:58:42,001-[agedb_30][328000]Accuracy-Highest: 0.98650
Training: 2022-04-12 09:58:45,081-Speed 72.48 samples/sec   Loss 0.1061   LearningRate 0.0000   Epoch: 19   Global Step: 328010   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:58:48,143-Speed 3345.48 samples/sec   Loss 0.1115   LearningRate 0.0000   Epoch: 19   Global Step: 328020   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:58:51,272-Speed 3273.49 samples/sec   Loss 0.1024   LearningRate 0.0000   Epoch: 19   Global Step: 328030   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:58:54,319-Speed 3360.98 samples/sec   Loss 0.1060   LearningRate 0.0000   Epoch: 19   Global Step: 328040   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 09:58:57,401-Speed 3323.33 samples/sec   Loss 0.1089   LearningRate 0.0000   Epoch: 19   Global Step: 328050   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 09:59:00,501-Speed 3303.27 samples/sec   Loss 0.1155   LearningRate 0.0000   Epoch: 19   Global Step: 328060   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 09:59:03,589-Speed 3317.64 samples/sec   Loss 0.1123   LearningRate 0.0000   Epoch: 19   Global Step: 328070   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 09:59:06,657-Speed 3338.48 samples/sec   Loss 0.1042   LearningRate 0.0000   Epoch: 19   Global Step: 328080   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 09:59:09,754-Speed 3306.86 samples/sec   Loss 0.1140   LearningRate 0.0000   Epoch: 19   Global Step: 328090   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 09:59:12,828-Speed 3331.83 samples/sec   Loss 0.1076   LearningRate 0.0000   Epoch: 19   Global Step: 328100   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 09:59:16,050-Speed 3178.78 samples/sec   Loss 0.1026   LearningRate 0.0000   Epoch: 19   Global Step: 328110   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 09:59:19,160-Speed 3293.79 samples/sec   Loss 0.0961   LearningRate 0.0000   Epoch: 19   Global Step: 328120   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 09:59:22,223-Speed 3343.31 samples/sec   Loss 0.1165   LearningRate 0.0000   Epoch: 19   Global Step: 328130   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 09:59:25,300-Speed 3329.52 samples/sec   Loss 0.1097   LearningRate 0.0000   Epoch: 19   Global Step: 328140   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 09:59:28,370-Speed 3335.56 samples/sec   Loss 0.1008   LearningRate 0.0000   Epoch: 19   Global Step: 328150   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 09:59:31,437-Speed 3339.47 samples/sec   Loss 0.1034   LearningRate 0.0000   Epoch: 19   Global Step: 328160   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 09:59:34,499-Speed 3345.71 samples/sec   Loss 0.0986   LearningRate 0.0000   Epoch: 19   Global Step: 328170   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 09:59:37,558-Speed 3348.37 samples/sec   Loss 0.1006   LearningRate 0.0000   Epoch: 19   Global Step: 328180   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 09:59:40,633-Speed 3330.44 samples/sec   Loss 0.1082   LearningRate 0.0000   Epoch: 19   Global Step: 328190   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 09:59:43,734-Speed 3302.95 samples/sec   Loss 0.0969   LearningRate 0.0000   Epoch: 19   Global Step: 328200   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 09:59:46,905-Speed 3230.11 samples/sec   Loss 0.1104   LearningRate 0.0000   Epoch: 19   Global Step: 328210   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 09:59:49,976-Speed 3334.87 samples/sec   Loss 0.0992   LearningRate 0.0000   Epoch: 19   Global Step: 328220   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 09:59:53,048-Speed 3334.13 samples/sec   Loss 0.0932   LearningRate 0.0000   Epoch: 19   Global Step: 328230   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 09:59:56,124-Speed 3329.72 samples/sec   Loss 0.0959   LearningRate 0.0000   Epoch: 19   Global Step: 328240   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 09:59:59,204-Speed 3326.30 samples/sec   Loss 0.1026   LearningRate 0.0000   Epoch: 19   Global Step: 328250   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 10:00:02,297-Speed 3311.21 samples/sec   Loss 0.0977   LearningRate 0.0000   Epoch: 19   Global Step: 328260   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 10:00:05,368-Speed 3334.91 samples/sec   Loss 0.1082   LearningRate 0.0000   Epoch: 19   Global Step: 328270   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 10:00:08,438-Speed 3335.90 samples/sec   Loss 0.1064   LearningRate 0.0000   Epoch: 19   Global Step: 328280   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 10:00:11,542-Speed 3300.18 samples/sec   Loss 0.1041   LearningRate 0.0000   Epoch: 19   Global Step: 328290   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 10:00:14,764-Speed 3178.87 samples/sec   Loss 0.0968   LearningRate 0.0000   Epoch: 19   Global Step: 328300   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 10:00:17,894-Speed 3271.64 samples/sec   Loss 0.1013   LearningRate 0.0000   Epoch: 19   Global Step: 328310   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 10:00:20,975-Speed 3325.13 samples/sec   Loss 0.1004   LearningRate 0.0000   Epoch: 19   Global Step: 328320   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 10:00:24,062-Speed 3317.90 samples/sec   Loss 0.1061   LearningRate 0.0000   Epoch: 19   Global Step: 328330   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 10:00:27,133-Speed 3334.70 samples/sec   Loss 0.1064   LearningRate 0.0000   Epoch: 19   Global Step: 328340   Fp16 Grad Scale: 262144   Required: 1 hours
Training: 2022-04-12 10:00:30,202-Speed 3337.73 samples/sec   Loss 0.1027   LearningRate 0.0000   Epoch: 19   Global Step: 328350   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 10:00:33,309-Speed 3296.16 samples/sec   Loss 0.0975   LearningRate 0.0000   Epoch: 19   Global Step: 328360   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 10:00:36,385-Speed 3330.04 samples/sec   Loss 0.1069   LearningRate 0.0000   Epoch: 19   Global Step: 328370   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 10:00:39,468-Speed 3322.26 samples/sec   Loss 0.1029   LearningRate 0.0000   Epoch: 19   Global Step: 328380   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 10:00:42,551-Speed 3321.95 samples/sec   Loss 0.1146   LearningRate 0.0000   Epoch: 19   Global Step: 328390   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 10:00:45,649-Speed 3306.57 samples/sec   Loss 0.1112   LearningRate 0.0000   Epoch: 19   Global Step: 328400   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 10:00:48,718-Speed 3337.23 samples/sec   Loss 0.1042   LearningRate 0.0000   Epoch: 19   Global Step: 328410   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 10:00:51,824-Speed 3297.73 samples/sec   Loss 0.0954   LearningRate 0.0000   Epoch: 19   Global Step: 328420   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 10:00:54,887-Speed 3344.12 samples/sec   Loss 0.1045   LearningRate 0.0000   Epoch: 19   Global Step: 328430   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 10:00:57,974-Speed 3317.69 samples/sec   Loss 0.1062   LearningRate 0.0000   Epoch: 19   Global Step: 328440   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 10:01:01,037-Speed 3343.38 samples/sec   Loss 0.1010   LearningRate 0.0000   Epoch: 19   Global Step: 328450   Fp16 Grad Scale: 262144   Required: 1 hours
Training: 2022-04-12 10:01:04,102-Speed 3342.30 samples/sec   Loss 0.0909   LearningRate 0.0000   Epoch: 19   Global Step: 328460   Fp16 Grad Scale: 262144   Required: 1 hours
Training: 2022-04-12 10:01:07,155-Speed 3354.13 samples/sec   Loss 0.0986   LearningRate 0.0000   Epoch: 19   Global Step: 328470   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 10:01:10,226-Speed 3335.46 samples/sec   Loss 0.0945   LearningRate 0.0000   Epoch: 19   Global Step: 328480   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 10:01:13,290-Speed 3343.04 samples/sec   Loss 0.0958   LearningRate 0.0000   Epoch: 19   Global Step: 328490   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 10:01:16,459-Speed 3231.50 samples/sec   Loss 0.0986   LearningRate 0.0000   Epoch: 19   Global Step: 328500   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 10:01:19,680-Speed 3180.89 samples/sec   Loss 0.1011   LearningRate 0.0000   Epoch: 19   Global Step: 328510   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 10:01:22,750-Speed 3335.50 samples/sec   Loss 0.1066   LearningRate 0.0000   Epoch: 19   Global Step: 328520   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 10:01:25,877-Speed 3275.55 samples/sec   Loss 0.1065   LearningRate 0.0000   Epoch: 19   Global Step: 328530   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 10:01:28,944-Speed 3339.30 samples/sec   Loss 0.0985   LearningRate 0.0000   Epoch: 19   Global Step: 328540   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 10:01:32,013-Speed 3337.17 samples/sec   Loss 0.1075   LearningRate 0.0000   Epoch: 19   Global Step: 328550   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 10:01:35,087-Speed 3332.25 samples/sec   Loss 0.1033   LearningRate 0.0000   Epoch: 19   Global Step: 328560   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 10:01:38,151-Speed 3342.63 samples/sec   Loss 0.1176   LearningRate 0.0000   Epoch: 19   Global Step: 328570   Fp16 Grad Scale: 262144   Required: 1 hours
Training: 2022-04-12 10:01:41,287-Speed 3266.37 samples/sec   Loss 0.1009   LearningRate 0.0000   Epoch: 19   Global Step: 328580   Fp16 Grad Scale: 262144   Required: 1 hours
Training: 2022-04-12 10:01:44,359-Speed 3333.91 samples/sec   Loss 0.1089   LearningRate 0.0000   Epoch: 19   Global Step: 328590   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 10:01:47,476-Speed 3285.93 samples/sec   Loss 0.1012   LearningRate 0.0000   Epoch: 19   Global Step: 328600   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 10:01:50,545-Speed 3337.99 samples/sec   Loss 0.0980   LearningRate 0.0000   Epoch: 19   Global Step: 328610   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 10:01:53,608-Speed 3343.16 samples/sec   Loss 0.0980   LearningRate 0.0000   Epoch: 19   Global Step: 328620   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 10:01:56,671-Speed 3344.26 samples/sec   Loss 0.1029   LearningRate 0.0000   Epoch: 19   Global Step: 328630   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 10:01:59,771-Speed 3304.01 samples/sec   Loss 0.1046   LearningRate 0.0000   Epoch: 19   Global Step: 328640   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 10:02:02,874-Speed 3300.15 samples/sec   Loss 0.1048   LearningRate 0.0000   Epoch: 19   Global Step: 328650   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 10:02:05,949-Speed 3331.63 samples/sec   Loss 0.0953   LearningRate 0.0000   Epoch: 19   Global Step: 328660   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 10:02:09,011-Speed 3344.09 samples/sec   Loss 0.1024   LearningRate 0.0000   Epoch: 19   Global Step: 328670   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 10:02:12,107-Speed 3308.65 samples/sec   Loss 0.0998   LearningRate 0.0000   Epoch: 19   Global Step: 328680   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 10:02:15,180-Speed 3333.73 samples/sec   Loss 0.1024   LearningRate 0.0000   Epoch: 19   Global Step: 328690   Fp16 Grad Scale: 262144   Required: 1 hours
Training: 2022-04-12 10:02:18,240-Speed 3346.72 samples/sec   Loss 0.1134   LearningRate 0.0000   Epoch: 19   Global Step: 328700   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 10:02:21,300-Speed 3347.83 samples/sec   Loss 0.1027   LearningRate 0.0000   Epoch: 19   Global Step: 328710   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 10:02:24,379-Speed 3325.73 samples/sec   Loss 0.1175   LearningRate 0.0000   Epoch: 19   Global Step: 328720   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 10:02:27,454-Speed 3330.78 samples/sec   Loss 0.1112   LearningRate 0.0000   Epoch: 19   Global Step: 328730   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 10:02:30,531-Speed 3328.54 samples/sec   Loss 0.1097   LearningRate 0.0000   Epoch: 19   Global Step: 328740   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 10:02:33,639-Speed 3296.10 samples/sec   Loss 0.0988   LearningRate 0.0000   Epoch: 19   Global Step: 328750   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 10:02:36,800-Speed 3239.86 samples/sec   Loss 0.0971   LearningRate 0.0000   Epoch: 19   Global Step: 328760   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 10:02:39,923-Speed 3280.01 samples/sec   Loss 0.1027   LearningRate 0.0000   Epoch: 19   Global Step: 328770   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 10:02:43,022-Speed 3304.88 samples/sec   Loss 0.1049   LearningRate 0.0000   Epoch: 19   Global Step: 328780   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 10:02:46,087-Speed 3341.93 samples/sec   Loss 0.1118   LearningRate 0.0000   Epoch: 19   Global Step: 328790   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 10:02:49,151-Speed 3342.73 samples/sec   Loss 0.1068   LearningRate 0.0000   Epoch: 19   Global Step: 328800   Fp16 Grad Scale: 262144   Required: 1 hours
Training: 2022-04-12 10:02:52,230-Speed 3326.88 samples/sec   Loss 0.1002   LearningRate 0.0000   Epoch: 19   Global Step: 328810   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 10:02:55,308-Speed 3327.76 samples/sec   Loss 0.1154   LearningRate 0.0000   Epoch: 19   Global Step: 328820   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 10:02:58,379-Speed 3335.12 samples/sec   Loss 0.1041   LearningRate 0.0000   Epoch: 19   Global Step: 328830   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 10:03:01,474-Speed 3309.17 samples/sec   Loss 0.0921   LearningRate 0.0000   Epoch: 19   Global Step: 328840   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 10:03:04,554-Speed 3325.00 samples/sec   Loss 0.1089   LearningRate 0.0000   Epoch: 19   Global Step: 328850   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 10:03:07,749-Speed 3205.89 samples/sec   Loss 0.0972   LearningRate 0.0000   Epoch: 19   Global Step: 328860   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 10:03:10,822-Speed 3332.82 samples/sec   Loss 0.1135   LearningRate 0.0000   Epoch: 19   Global Step: 328870   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 10:03:13,911-Speed 3316.73 samples/sec   Loss 0.1091   LearningRate 0.0000   Epoch: 19   Global Step: 328880   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 10:03:17,061-Speed 3250.48 samples/sec   Loss 0.1096   LearningRate 0.0000   Epoch: 19   Global Step: 328890   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 10:03:20,164-Speed 3301.54 samples/sec   Loss 0.1068   LearningRate 0.0000   Epoch: 19   Global Step: 328900   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 10:03:23,243-Speed 3326.14 samples/sec   Loss 0.1030   LearningRate 0.0000   Epoch: 19   Global Step: 328910   Fp16 Grad Scale: 262144   Required: 1 hours
Training: 2022-04-12 10:03:26,367-Speed 3278.48 samples/sec   Loss 0.1045   LearningRate 0.0000   Epoch: 19   Global Step: 328920   Fp16 Grad Scale: 262144   Required: 1 hours
Training: 2022-04-12 10:03:29,450-Speed 3321.81 samples/sec   Loss 0.0975   LearningRate 0.0000   Epoch: 19   Global Step: 328930   Fp16 Grad Scale: 262144   Required: 1 hours
Training: 2022-04-12 10:03:32,535-Speed 3321.21 samples/sec   Loss 0.1082   LearningRate 0.0000   Epoch: 19   Global Step: 328940   Fp16 Grad Scale: 262144   Required: 1 hours
Training: 2022-04-12 10:03:35,614-Speed 3326.58 samples/sec   Loss 0.1033   LearningRate 0.0000   Epoch: 19   Global Step: 328950   Fp16 Grad Scale: 262144   Required: 1 hours
Training: 2022-04-12 10:03:38,670-Speed 3351.71 samples/sec   Loss 0.1164   LearningRate 0.0000   Epoch: 19   Global Step: 328960   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 10:03:41,734-Speed 3342.89 samples/sec   Loss 0.1061   LearningRate 0.0000   Epoch: 19   Global Step: 328970   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 10:03:44,813-Speed 3325.79 samples/sec   Loss 0.1165   LearningRate 0.0000   Epoch: 19   Global Step: 328980   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 10:03:47,907-Speed 3311.00 samples/sec   Loss 0.0943   LearningRate 0.0000   Epoch: 19   Global Step: 328990   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 10:03:50,979-Speed 3333.43 samples/sec   Loss 0.0999   LearningRate 0.0000   Epoch: 19   Global Step: 329000   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 10:03:54,089-Speed 3292.82 samples/sec   Loss 0.1014   LearningRate 0.0000   Epoch: 19   Global Step: 329010   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 10:03:57,162-Speed 3333.35 samples/sec   Loss 0.1157   LearningRate 0.0000   Epoch: 19   Global Step: 329020   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 10:04:00,302-Speed 3261.98 samples/sec   Loss 0.1028   LearningRate 0.0000   Epoch: 19   Global Step: 329030   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 10:04:03,397-Speed 3309.39 samples/sec   Loss 0.0995   LearningRate 0.0000   Epoch: 19   Global Step: 329040   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 10:04:06,509-Speed 3291.71 samples/sec   Loss 0.0983   LearningRate 0.0000   Epoch: 19   Global Step: 329050   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 10:04:09,602-Speed 3311.67 samples/sec   Loss 0.1104   LearningRate 0.0000   Epoch: 19   Global Step: 329060   Fp16 Grad Scale: 262144   Required: 1 hours
Training: 2022-04-12 10:04:12,701-Speed 3304.54 samples/sec   Loss 0.1055   LearningRate 0.0000   Epoch: 19   Global Step: 329070   Fp16 Grad Scale: 262144   Required: 1 hours
Training: 2022-04-12 10:04:15,759-Speed 3349.84 samples/sec   Loss 0.1076   LearningRate 0.0000   Epoch: 19   Global Step: 329080   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 10:04:18,937-Speed 3222.31 samples/sec   Loss 0.1128   LearningRate 0.0000   Epoch: 19   Global Step: 329090   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 10:04:22,046-Speed 3294.48 samples/sec   Loss 0.1000   LearningRate 0.0000   Epoch: 19   Global Step: 329100   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-12 10:04:25,132-Speed 3318.45 samples/sec   Loss 0.1117   LearningRate 0.0000   Epoch: 19   Global Step: 329110   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-12 10:04:28,204-Speed 3334.48 samples/sec   Loss 0.0958   LearningRate 0.0000   Epoch: 19   Global Step: 329120   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-12 10:04:31,270-Speed 3341.68 samples/sec   Loss 0.1109   LearningRate 0.0000   Epoch: 19   Global Step: 329130   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-12 10:04:34,334-Speed 3341.83 samples/sec   Loss 0.0974   LearningRate 0.0000   Epoch: 19   Global Step: 329140   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-12 10:04:37,408-Speed 3332.31 samples/sec   Loss 0.1037   LearningRate 0.0000   Epoch: 19   Global Step: 329150   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-12 10:04:40,472-Speed 3342.46 samples/sec   Loss 0.1086   LearningRate 0.0000   Epoch: 19   Global Step: 329160   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-12 10:04:43,540-Speed 3338.36 samples/sec   Loss 0.1081   LearningRate 0.0000   Epoch: 19   Global Step: 329170   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-12 10:04:46,607-Speed 3340.00 samples/sec   Loss 0.1160   LearningRate 0.0000   Epoch: 19   Global Step: 329180   Fp16 Grad Scale: 262144   Required: 0 hours
Training: 2022-04-12 10:04:49,750-Speed 3259.21 samples/sec   Loss 0.1045   LearningRate 0.0000   Epoch: 19   Global Step: 329190   Fp16 Grad Scale: 262144   Required: 0 hours
Training: 2022-04-12 10:04:52,854-Speed 3299.36 samples/sec   Loss 0.1008   LearningRate 0.0000   Epoch: 19   Global Step: 329200   Fp16 Grad Scale: 262144   Required: 0 hours
Training: 2022-04-12 10:04:55,926-Speed 3334.09 samples/sec   Loss 0.0954   LearningRate 0.0000   Epoch: 19   Global Step: 329210   Fp16 Grad Scale: 262144   Required: 0 hours
Training: 2022-04-12 10:04:58,997-Speed 3335.74 samples/sec   Loss 0.1135   LearningRate 0.0000   Epoch: 19   Global Step: 329220   Fp16 Grad Scale: 262144   Required: 0 hours
Training: 2022-04-12 10:05:02,079-Speed 3322.87 samples/sec   Loss 0.1137   LearningRate 0.0000   Epoch: 19   Global Step: 329230   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-12 10:05:05,198-Speed 3284.41 samples/sec   Loss 0.1112   LearningRate 0.0000   Epoch: 19   Global Step: 329240   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-12 10:05:08,261-Speed 3342.82 samples/sec   Loss 0.0933   LearningRate 0.0000   Epoch: 19   Global Step: 329250   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-12 10:05:11,407-Speed 3256.23 samples/sec   Loss 0.1053   LearningRate 0.0000   Epoch: 19   Global Step: 329260   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-12 10:05:14,524-Speed 3285.27 samples/sec   Loss 0.1108   LearningRate 0.0000   Epoch: 19   Global Step: 329270   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-12 10:05:17,642-Speed 3285.58 samples/sec   Loss 0.1013   LearningRate 0.0000   Epoch: 19   Global Step: 329280   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-12 10:05:20,714-Speed 3333.69 samples/sec   Loss 0.1045   LearningRate 0.0000   Epoch: 19   Global Step: 329290   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-12 10:05:23,796-Speed 3323.23 samples/sec   Loss 0.1016   LearningRate 0.0000   Epoch: 19   Global Step: 329300   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-12 10:05:26,865-Speed 3337.95 samples/sec   Loss 0.1108   LearningRate 0.0000   Epoch: 19   Global Step: 329310   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-12 10:05:30,036-Speed 3229.69 samples/sec   Loss 0.1056   LearningRate 0.0000   Epoch: 19   Global Step: 329320   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-12 10:05:33,106-Speed 3336.50 samples/sec   Loss 0.1032   LearningRate 0.0000   Epoch: 19   Global Step: 329330   Fp16 Grad Scale: 262144   Required: 0 hours
Training: 2022-04-12 10:05:36,163-Speed 3351.00 samples/sec   Loss 0.1040   LearningRate 0.0000   Epoch: 19   Global Step: 329340   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-12 10:05:39,276-Speed 3290.16 samples/sec   Loss 0.0929   LearningRate 0.0000   Epoch: 19   Global Step: 329350   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-12 10:05:42,358-Speed 3322.20 samples/sec   Loss 0.1030   LearningRate 0.0000   Epoch: 19   Global Step: 329360   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-12 10:05:45,469-Speed 3292.48 samples/sec   Loss 0.1108   LearningRate 0.0000   Epoch: 19   Global Step: 329370   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-12 10:05:48,590-Speed 3282.19 samples/sec   Loss 0.1070   LearningRate 0.0000   Epoch: 19   Global Step: 329380   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-12 10:05:51,696-Speed 3297.55 samples/sec   Loss 0.1022   LearningRate 0.0000   Epoch: 19   Global Step: 329390   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-12 10:05:54,773-Speed 3328.71 samples/sec   Loss 0.1101   LearningRate 0.0000   Epoch: 19   Global Step: 329400   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-12 10:05:57,844-Speed 3335.31 samples/sec   Loss 0.1027   LearningRate 0.0000   Epoch: 19   Global Step: 329410   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-12 10:06:00,912-Speed 3337.86 samples/sec   Loss 0.1074   LearningRate 0.0000   Epoch: 19   Global Step: 329420   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-12 10:06:03,979-Speed 3339.75 samples/sec   Loss 0.1135   LearningRate 0.0000   Epoch: 19   Global Step: 329430   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 10:06:07,115-Speed 3266.65 samples/sec   Loss 0.1083   LearningRate 0.0000   Epoch: 19   Global Step: 329440   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 10:06:10,192-Speed 3328.13 samples/sec   Loss 0.1035   LearningRate 0.0000   Epoch: 19   Global Step: 329450   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 10:06:13,291-Speed 3304.80 samples/sec   Loss 0.1074   LearningRate 0.0000   Epoch: 19   Global Step: 329460   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 10:06:16,369-Speed 3328.05 samples/sec   Loss 0.1111   LearningRate 0.0000   Epoch: 19   Global Step: 329470   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 10:06:19,450-Speed 3323.92 samples/sec   Loss 0.1045   LearningRate 0.0000   Epoch: 19   Global Step: 329480   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 10:06:22,526-Speed 3330.57 samples/sec   Loss 0.1011   LearningRate 0.0000   Epoch: 19   Global Step: 329490   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 10:06:25,738-Speed 3188.96 samples/sec   Loss 0.1089   LearningRate 0.0000   Epoch: 19   Global Step: 329500   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 10:06:28,822-Speed 3321.23 samples/sec   Loss 0.1094   LearningRate 0.0000   Epoch: 19   Global Step: 329510   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 10:06:32,041-Speed 3181.27 samples/sec   Loss 0.1154   LearningRate 0.0000   Epoch: 19   Global Step: 329520   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 10:06:35,104-Speed 3343.87 samples/sec   Loss 0.1064   LearningRate 0.0000   Epoch: 19   Global Step: 329530   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-12 10:06:38,191-Speed 3318.41 samples/sec   Loss 0.1051   LearningRate 0.0000   Epoch: 19   Global Step: 329540   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-12 10:06:41,300-Speed 3293.53 samples/sec   Loss 0.0953   LearningRate 0.0000   Epoch: 19   Global Step: 329550   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-12 10:06:44,375-Speed 3331.85 samples/sec   Loss 0.1155   LearningRate 0.0000   Epoch: 19   Global Step: 329560   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-12 10:06:47,439-Speed 3342.87 samples/sec   Loss 0.0959   LearningRate 0.0000   Epoch: 19   Global Step: 329570   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-12 10:06:50,598-Speed 3242.04 samples/sec   Loss 0.1039   LearningRate 0.0000   Epoch: 19   Global Step: 329580   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-12 10:06:53,826-Speed 3172.77 samples/sec   Loss 0.1154   LearningRate 0.0000   Epoch: 19   Global Step: 329590   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-12 10:06:56,934-Speed 3295.69 samples/sec   Loss 0.1083   LearningRate 0.0000   Epoch: 19   Global Step: 329600   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-12 10:07:00,030-Speed 3307.61 samples/sec   Loss 0.0939   LearningRate 0.0000   Epoch: 19   Global Step: 329610   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-12 10:07:03,114-Speed 3321.48 samples/sec   Loss 0.1077   LearningRate 0.0000   Epoch: 19   Global Step: 329620   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-12 10:07:06,194-Speed 3325.99 samples/sec   Loss 0.0996   LearningRate 0.0000   Epoch: 19   Global Step: 329630   Fp16 Grad Scale: 262144   Required: 0 hours
Training: 2022-04-12 10:07:09,255-Speed 3345.96 samples/sec   Loss 0.0970   LearningRate 0.0000   Epoch: 19   Global Step: 329640   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-12 10:07:12,329-Speed 3331.81 samples/sec   Loss 0.1112   LearningRate 0.0000   Epoch: 19   Global Step: 329650   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-12 10:07:15,392-Speed 3343.85 samples/sec   Loss 0.1023   LearningRate 0.0000   Epoch: 19   Global Step: 329660   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-12 10:07:18,471-Speed 3326.86 samples/sec   Loss 0.1029   LearningRate 0.0000   Epoch: 19   Global Step: 329670   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-12 10:07:21,542-Speed 3334.98 samples/sec   Loss 0.1072   LearningRate 0.0000   Epoch: 19   Global Step: 329680   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-12 10:07:24,642-Speed 3304.24 samples/sec   Loss 0.1035   LearningRate 0.0000   Epoch: 19   Global Step: 329690   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-12 10:07:27,715-Speed 3332.84 samples/sec   Loss 0.0963   LearningRate 0.0000   Epoch: 19   Global Step: 329700   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-12 10:07:30,786-Speed 3335.47 samples/sec   Loss 0.0958   LearningRate 0.0000   Epoch: 19   Global Step: 329710   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-12 10:07:33,859-Speed 3332.10 samples/sec   Loss 0.1116   LearningRate 0.0000   Epoch: 19   Global Step: 329720   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-12 10:07:36,944-Speed 3321.28 samples/sec   Loss 0.1100   LearningRate 0.0000   Epoch: 19   Global Step: 329730   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-12 10:07:40,013-Speed 3337.41 samples/sec   Loss 0.1057   LearningRate 0.0000   Epoch: 19   Global Step: 329740   Fp16 Grad Scale: 262144   Required: 0 hours
Training: 2022-04-12 10:07:43,085-Speed 3333.95 samples/sec   Loss 0.1030   LearningRate 0.0000   Epoch: 19   Global Step: 329750   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-12 10:07:46,162-Speed 3328.51 samples/sec   Loss 0.1075   LearningRate 0.0000   Epoch: 19   Global Step: 329760   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-12 10:07:49,225-Speed 3343.34 samples/sec   Loss 0.1046   LearningRate 0.0000   Epoch: 19   Global Step: 329770   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-12 10:07:52,335-Speed 3293.64 samples/sec   Loss 0.0970   LearningRate 0.0000   Epoch: 19   Global Step: 329780   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-12 10:07:55,436-Speed 3302.48 samples/sec   Loss 0.1035   LearningRate 0.0000   Epoch: 19   Global Step: 329790   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-12 10:07:58,503-Speed 3340.01 samples/sec   Loss 0.1069   LearningRate 0.0000   Epoch: 19   Global Step: 329800   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-12 10:08:01,583-Speed 3325.18 samples/sec   Loss 0.1002   LearningRate 0.0000   Epoch: 19   Global Step: 329810   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-12 10:08:04,691-Speed 3296.08 samples/sec   Loss 0.1100   LearningRate 0.0000   Epoch: 19   Global Step: 329820   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-12 10:08:07,764-Speed 3333.02 samples/sec   Loss 0.1057   LearningRate 0.0000   Epoch: 19   Global Step: 329830   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-12 10:08:10,861-Speed 3307.31 samples/sec   Loss 0.1059   LearningRate 0.0000   Epoch: 19   Global Step: 329840   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-12 10:08:14,012-Speed 3250.01 samples/sec   Loss 0.1008   LearningRate 0.0000   Epoch: 19   Global Step: 329850   Fp16 Grad Scale: 262144   Required: 0 hours
Training: 2022-04-12 10:08:17,161-Speed 3252.63 samples/sec   Loss 0.1048   LearningRate 0.0000   Epoch: 19   Global Step: 329860   Fp16 Grad Scale: 262144   Required: 0 hours
Training: 2022-04-12 10:08:20,355-Speed 3207.20 samples/sec   Loss 0.1178   LearningRate 0.0000   Epoch: 19   Global Step: 329870   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-12 10:08:23,435-Speed 3325.55 samples/sec   Loss 0.1076   LearningRate 0.0000   Epoch: 19   Global Step: 329880   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-12 10:08:26,517-Speed 3322.73 samples/sec   Loss 0.0970   LearningRate 0.0000   Epoch: 19   Global Step: 329890   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-12 10:08:29,617-Speed 3304.68 samples/sec   Loss 0.1150   LearningRate 0.0000   Epoch: 19   Global Step: 329900   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-12 10:08:32,706-Speed 3315.65 samples/sec   Loss 0.1154   LearningRate 0.0000   Epoch: 19   Global Step: 329910   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-12 10:08:35,773-Speed 3339.28 samples/sec   Loss 0.0907   LearningRate 0.0000   Epoch: 19   Global Step: 329920   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-12 10:08:38,842-Speed 3337.57 samples/sec   Loss 0.0960   LearningRate 0.0000   Epoch: 19   Global Step: 329930   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-12 10:08:41,918-Speed 3329.24 samples/sec   Loss 0.1026   LearningRate 0.0000   Epoch: 19   Global Step: 329940   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-12 10:08:44,996-Speed 3327.39 samples/sec   Loss 0.1030   LearningRate 0.0000   Epoch: 19   Global Step: 329950   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-12 10:08:48,067-Speed 3336.21 samples/sec   Loss 0.1033   LearningRate 0.0000   Epoch: 19   Global Step: 329960   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-12 10:08:51,174-Speed 3296.15 samples/sec   Loss 0.1057   LearningRate 0.0000   Epoch: 19   Global Step: 329970   Fp16 Grad Scale: 262144   Required: 0 hours
Training: 2022-04-12 10:08:54,231-Speed 3350.53 samples/sec   Loss 0.1128   LearningRate 0.0000   Epoch: 19   Global Step: 329980   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-12 10:08:57,422-Speed 3209.06 samples/sec   Loss 0.0935   LearningRate 0.0000   Epoch: 19   Global Step: 329990   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-12 10:09:00,495-Speed 3333.53 samples/sec   Loss 0.1111   LearningRate 0.0000   Epoch: 19   Global Step: 330000   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-12 10:09:44,827-[lfw][330000]XNorm: 20.755443
Training: 2022-04-12 10:09:44,827-[lfw][330000]Accuracy-Flip: 0.99800+-0.00221
Training: 2022-04-12 10:09:44,828-[lfw][330000]Accuracy-Highest: 0.99817
Training: 2022-04-12 10:10:36,432-[cfp_fp][330000]XNorm: 22.637739
Training: 2022-04-12 10:10:36,433-[cfp_fp][330000]Accuracy-Flip: 0.99143+-0.00361
Training: 2022-04-12 10:10:36,433-[cfp_fp][330000]Accuracy-Highest: 0.99200
Training: 2022-04-12 10:11:20,926-[agedb_30][330000]XNorm: 22.813890
Training: 2022-04-12 10:11:20,927-[agedb_30][330000]Accuracy-Flip: 0.98583+-0.00616
Training: 2022-04-12 10:11:20,927-[agedb_30][330000]Accuracy-Highest: 0.98650
Training: 2022-04-12 10:11:23,996-Speed 71.36 samples/sec   Loss 0.0965   LearningRate 0.0000   Epoch: 19   Global Step: 330010   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-12 10:11:27,085-Speed 3314.96 samples/sec   Loss 0.0986   LearningRate 0.0000   Epoch: 19   Global Step: 330020   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-12 10:11:30,141-Speed 3351.60 samples/sec   Loss 0.1084   LearningRate 0.0000   Epoch: 19   Global Step: 330030   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-12 10:11:33,205-Speed 3343.14 samples/sec   Loss 0.0977   LearningRate 0.0000   Epoch: 19   Global Step: 330040   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-12 10:11:36,264-Speed 3347.36 samples/sec   Loss 0.1160   LearningRate 0.0000   Epoch: 19   Global Step: 330050   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-12 10:11:39,371-Speed 3296.86 samples/sec   Loss 0.0951   LearningRate 0.0000   Epoch: 19   Global Step: 330060   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-12 10:11:42,445-Speed 3332.38 samples/sec   Loss 0.1112   LearningRate 0.0000   Epoch: 19   Global Step: 330070   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-12 10:11:45,513-Speed 3337.96 samples/sec   Loss 0.1002   LearningRate 0.0000   Epoch: 19   Global Step: 330080   Fp16 Grad Scale: 262144   Required: 0 hours
Training: 2022-04-12 10:11:48,658-Speed 3256.32 samples/sec   Loss 0.0950   LearningRate 0.0000   Epoch: 19   Global Step: 330090   Fp16 Grad Scale: 262144   Required: 0 hours
Training: 2022-04-12 10:11:51,752-Speed 3311.35 samples/sec   Loss 0.1019   LearningRate 0.0000   Epoch: 19   Global Step: 330100   Fp16 Grad Scale: 262144   Required: 0 hours
Training: 2022-04-12 10:11:54,865-Speed 3290.44 samples/sec   Loss 0.1077   LearningRate 0.0000   Epoch: 19   Global Step: 330110   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-12 10:11:58,028-Speed 3237.63 samples/sec   Loss 0.1145   LearningRate 0.0000   Epoch: 19   Global Step: 330120   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-12 10:12:01,178-Speed 3251.17 samples/sec   Loss 0.1129   LearningRate 0.0000   Epoch: 19   Global Step: 330130   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-12 10:12:04,258-Speed 3326.32 samples/sec   Loss 0.1078   LearningRate 0.0000   Epoch: 19   Global Step: 330140   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-12 10:12:07,468-Speed 3190.56 samples/sec   Loss 0.1067   LearningRate 0.0000   Epoch: 19   Global Step: 330150   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-12 10:12:10,539-Speed 3335.14 samples/sec   Loss 0.1118   LearningRate 0.0000   Epoch: 19   Global Step: 330160   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-12 10:12:13,590-Speed 3356.49 samples/sec   Loss 0.1024   LearningRate 0.0000   Epoch: 19   Global Step: 330170   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 10:12:16,666-Speed 3329.61 samples/sec   Loss 0.0997   LearningRate 0.0000   Epoch: 19   Global Step: 330180   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 10:12:19,733-Speed 3340.49 samples/sec   Loss 0.0985   LearningRate 0.0000   Epoch: 19   Global Step: 330190   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 10:12:22,797-Speed 3342.49 samples/sec   Loss 0.1026   LearningRate 0.0000   Epoch: 19   Global Step: 330200   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 10:12:25,864-Speed 3339.42 samples/sec   Loss 0.1044   LearningRate 0.0000   Epoch: 19   Global Step: 330210   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 10:12:28,962-Speed 3305.76 samples/sec   Loss 0.0996   LearningRate 0.0000   Epoch: 19   Global Step: 330220   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 10:12:32,041-Speed 3327.30 samples/sec   Loss 0.1016   LearningRate 0.0000   Epoch: 19   Global Step: 330230   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 10:12:35,132-Speed 3313.02 samples/sec   Loss 0.1060   LearningRate 0.0000   Epoch: 19   Global Step: 330240   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 10:12:38,297-Speed 3236.42 samples/sec   Loss 0.1101   LearningRate 0.0000   Epoch: 19   Global Step: 330250   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 10:12:41,374-Speed 3327.80 samples/sec   Loss 0.1017   LearningRate 0.0000   Epoch: 19   Global Step: 330260   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 10:12:44,459-Speed 3320.34 samples/sec   Loss 0.1011   LearningRate 0.0000   Epoch: 19   Global Step: 330270   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-12 10:12:47,571-Speed 3292.27 samples/sec   Loss 0.0989   LearningRate 0.0000   Epoch: 19   Global Step: 330280   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-12 10:12:50,639-Speed 3337.85 samples/sec   Loss 0.1014   LearningRate 0.0000   Epoch: 19   Global Step: 330290   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-12 10:12:53,723-Speed 3321.43 samples/sec   Loss 0.1064   LearningRate 0.0000   Epoch: 19   Global Step: 330300   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-12 10:12:56,824-Speed 3302.39 samples/sec   Loss 0.1074   LearningRate 0.0000   Epoch: 19   Global Step: 330310   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-12 10:12:59,913-Speed 3316.18 samples/sec   Loss 0.1082   LearningRate 0.0000   Epoch: 19   Global Step: 330320   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-12 10:13:02,983-Speed 3336.17 samples/sec   Loss 0.0962   LearningRate 0.0000   Epoch: 19   Global Step: 330330   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-12 10:13:06,056-Speed 3332.88 samples/sec   Loss 0.1126   LearningRate 0.0000   Epoch: 19   Global Step: 330340   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-12 10:13:09,199-Speed 3259.09 samples/sec   Loss 0.1123   LearningRate 0.0000   Epoch: 19   Global Step: 330350   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-12 10:13:12,298-Speed 3304.28 samples/sec   Loss 0.1032   LearningRate 0.0000   Epoch: 19   Global Step: 330360   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-12 10:13:15,461-Speed 3239.12 samples/sec   Loss 0.0998   LearningRate 0.0000   Epoch: 19   Global Step: 330370   Fp16 Grad Scale: 262144   Required: 0 hours
Training: 2022-04-12 10:13:18,528-Speed 3338.65 samples/sec   Loss 0.0998   LearningRate 0.0000   Epoch: 19   Global Step: 330380   Fp16 Grad Scale: 262144   Required: 0 hours
Training: 2022-04-12 10:13:21,582-Speed 3353.85 samples/sec   Loss 0.0973   LearningRate 0.0000   Epoch: 19   Global Step: 330390   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-12 10:13:24,635-Speed 3355.02 samples/sec   Loss 0.1001   LearningRate 0.0000   Epoch: 19   Global Step: 330400   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 10:13:27,719-Speed 3320.67 samples/sec   Loss 0.1045   LearningRate 0.0000   Epoch: 19   Global Step: 330410   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 10:13:30,815-Speed 3308.99 samples/sec   Loss 0.0974   LearningRate 0.0000   Epoch: 19   Global Step: 330420   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 10:13:33,945-Speed 3272.13 samples/sec   Loss 0.1114   LearningRate 0.0000   Epoch: 19   Global Step: 330430   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 10:13:37,036-Speed 3313.19 samples/sec   Loss 0.1026   LearningRate 0.0000   Epoch: 19   Global Step: 330440   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 10:13:40,124-Speed 3317.56 samples/sec   Loss 0.1164   LearningRate 0.0000   Epoch: 19   Global Step: 330450   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 10:13:43,242-Speed 3285.26 samples/sec   Loss 0.1042   LearningRate 0.0000   Epoch: 19   Global Step: 330460   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 10:13:46,332-Speed 3314.64 samples/sec   Loss 0.0999   LearningRate 0.0000   Epoch: 19   Global Step: 330470   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 10:13:49,517-Speed 3215.47 samples/sec   Loss 0.1000   LearningRate 0.0000   Epoch: 19   Global Step: 330480   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 10:13:52,590-Speed 3333.33 samples/sec   Loss 0.1028   LearningRate 0.0000   Epoch: 19   Global Step: 330490   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 10:13:55,691-Speed 3301.85 samples/sec   Loss 0.1040   LearningRate 0.0000   Epoch: 19   Global Step: 330500   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-12 10:13:58,776-Speed 3320.73 samples/sec   Loss 0.1051   LearningRate 0.0000   Epoch: 19   Global Step: 330510   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-12 10:14:01,854-Speed 3326.85 samples/sec   Loss 0.1086   LearningRate 0.0000   Epoch: 19   Global Step: 330520   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-12 10:14:04,963-Speed 3294.55 samples/sec   Loss 0.0985   LearningRate 0.0000   Epoch: 19   Global Step: 330530   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-12 10:14:08,044-Speed 3324.60 samples/sec   Loss 0.1036   LearningRate 0.0000   Epoch: 19   Global Step: 330540   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-12 10:14:11,118-Speed 3332.31 samples/sec   Loss 0.1105   LearningRate 0.0000   Epoch: 19   Global Step: 330550   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-12 10:14:14,222-Speed 3299.86 samples/sec   Loss 0.1055   LearningRate 0.0000   Epoch: 19   Global Step: 330560   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-12 10:14:17,312-Speed 3314.62 samples/sec   Loss 0.1035   LearningRate 0.0000   Epoch: 19   Global Step: 330570   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-12 10:14:20,392-Speed 3325.13 samples/sec   Loss 0.1066   LearningRate 0.0000   Epoch: 19   Global Step: 330580   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-12 10:14:23,467-Speed 3330.38 samples/sec   Loss 0.0996   LearningRate 0.0000   Epoch: 19   Global Step: 330590   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-12 10:14:26,530-Speed 3344.06 samples/sec   Loss 0.1066   LearningRate 0.0000   Epoch: 19   Global Step: 330600   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-12 10:14:29,596-Speed 3340.35 samples/sec   Loss 0.0991   LearningRate 0.0000   Epoch: 19   Global Step: 330610   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-12 10:14:32,664-Speed 3339.20 samples/sec   Loss 0.0986   LearningRate 0.0000   Epoch: 19   Global Step: 330620   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-12 10:14:35,732-Speed 3338.70 samples/sec   Loss 0.0954   LearningRate 0.0000   Epoch: 19   Global Step: 330630   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-12 10:14:38,814-Speed 3323.44 samples/sec   Loss 0.1066   LearningRate 0.0000   Epoch: 19   Global Step: 330640   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-12 10:14:41,889-Speed 3330.63 samples/sec   Loss 0.1081   LearningRate 0.0000   Epoch: 19   Global Step: 330650   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-12 10:14:45,040-Speed 3249.98 samples/sec   Loss 0.1022   LearningRate 0.0000   Epoch: 19   Global Step: 330660   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-12 10:14:48,107-Speed 3339.58 samples/sec   Loss 0.1061   LearningRate 0.0000   Epoch: 19   Global Step: 330670   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-12 10:14:51,213-Speed 3297.70 samples/sec   Loss 0.1027   LearningRate 0.0000   Epoch: 19   Global Step: 330680   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-12 10:14:54,376-Speed 3238.28 samples/sec   Loss 0.1016   LearningRate 0.0000   Epoch: 19   Global Step: 330690   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-12 10:14:57,449-Speed 3332.28 samples/sec   Loss 0.1022   LearningRate 0.0000   Epoch: 19   Global Step: 330700   Fp16 Grad Scale: 262144   Required: 0 hours
Training: 2022-04-12 10:15:00,499-Speed 3358.88 samples/sec   Loss 0.1010   LearningRate 0.0000   Epoch: 19   Global Step: 330710   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-12 10:15:03,572-Speed 3332.78 samples/sec   Loss 0.1085   LearningRate 0.0000   Epoch: 19   Global Step: 330720   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-12 10:15:06,653-Speed 3324.12 samples/sec   Loss 0.1076   LearningRate 0.0000   Epoch: 19   Global Step: 330730   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-12 10:15:09,735-Speed 3323.93 samples/sec   Loss 0.1031   LearningRate 0.0000   Epoch: 19   Global Step: 330740   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-12 10:15:12,809-Speed 3331.65 samples/sec   Loss 0.1109   LearningRate 0.0000   Epoch: 19   Global Step: 330750   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-12 10:15:15,873-Speed 3342.89 samples/sec   Loss 0.1014   LearningRate 0.0000   Epoch: 19   Global Step: 330760   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-12 10:15:18,936-Speed 3343.16 samples/sec   Loss 0.1057   LearningRate 0.0000   Epoch: 19   Global Step: 330770   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-12 10:15:22,013-Speed 3328.86 samples/sec   Loss 0.0938   LearningRate 0.0000   Epoch: 19   Global Step: 330780   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-12 10:15:25,096-Speed 3322.81 samples/sec   Loss 0.1063   LearningRate 0.0000   Epoch: 19   Global Step: 330790   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-12 10:15:28,185-Speed 3315.49 samples/sec   Loss 0.1039   LearningRate 0.0000   Epoch: 19   Global Step: 330800   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-12 10:15:31,408-Speed 3178.46 samples/sec   Loss 0.1076   LearningRate 0.0000   Epoch: 19   Global Step: 330810   Fp16 Grad Scale: 262144   Required: 0 hours
Training: 2022-04-12 10:15:34,474-Speed 3339.82 samples/sec   Loss 0.1081   LearningRate 0.0000   Epoch: 19   Global Step: 330820   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-12 10:15:37,571-Speed 3307.34 samples/sec   Loss 0.1063   LearningRate 0.0000   Epoch: 19   Global Step: 330830   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-12 10:15:40,632-Speed 3346.79 samples/sec   Loss 0.0923   LearningRate 0.0000   Epoch: 19   Global Step: 330840   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-12 10:15:43,703-Speed 3334.80 samples/sec   Loss 0.1177   LearningRate 0.0000   Epoch: 19   Global Step: 330850   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-12 10:15:46,761-Speed 3349.51 samples/sec   Loss 0.1003   LearningRate 0.0000   Epoch: 19   Global Step: 330860   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-12 10:15:49,847-Speed 3318.87 samples/sec   Loss 0.1020   LearningRate 0.0000   Epoch: 19   Global Step: 330870   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-12 10:15:52,917-Speed 3335.91 samples/sec   Loss 0.1050   LearningRate 0.0000   Epoch: 19   Global Step: 330880   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-12 10:15:56,000-Speed 3322.47 samples/sec   Loss 0.1100   LearningRate 0.0000   Epoch: 19   Global Step: 330890   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-12 10:15:59,091-Speed 3314.09 samples/sec   Loss 0.1104   LearningRate 0.0000   Epoch: 19   Global Step: 330900   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-12 10:16:02,173-Speed 3323.88 samples/sec   Loss 0.1123   LearningRate 0.0000   Epoch: 19   Global Step: 330910   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-12 10:16:05,278-Speed 3298.23 samples/sec   Loss 0.1051   LearningRate 0.0000   Epoch: 19   Global Step: 330920   Fp16 Grad Scale: 262144   Required: 0 hours
Training: 2022-04-12 10:16:08,343-Speed 3341.87 samples/sec   Loss 0.1059   LearningRate 0.0000   Epoch: 19   Global Step: 330930   Fp16 Grad Scale: 262144   Required: 0 hours
Training: 2022-04-12 10:16:11,408-Speed 3341.50 samples/sec   Loss 0.1046   LearningRate 0.0000   Epoch: 19   Global Step: 330940   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-12 10:16:14,478-Speed 3335.48 samples/sec   Loss 0.1041   LearningRate 0.0000   Epoch: 19   Global Step: 330950   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-12 10:16:17,554-Speed 3329.73 samples/sec   Loss 0.1063   LearningRate 0.0000   Epoch: 19   Global Step: 330960   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-12 10:16:20,733-Speed 3222.39 samples/sec   Loss 0.0999   LearningRate 0.0000   Epoch: 19   Global Step: 330970   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-12 10:16:23,928-Speed 3205.49 samples/sec   Loss 0.1155   LearningRate 0.0000   Epoch: 19   Global Step: 330980   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 10:16:27,163-Speed 3166.71 samples/sec   Loss 0.1053   LearningRate 0.0000   Epoch: 19   Global Step: 330990   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 10:16:30,224-Speed 3345.60 samples/sec   Loss 0.1141   LearningRate 0.0000   Epoch: 19   Global Step: 331000   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 10:16:33,286-Speed 3345.57 samples/sec   Loss 0.1095   LearningRate 0.0000   Epoch: 19   Global Step: 331010   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 10:16:36,353-Speed 3338.81 samples/sec   Loss 0.1020   LearningRate 0.0000   Epoch: 19   Global Step: 331020   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 10:16:39,430-Speed 3328.82 samples/sec   Loss 0.0993   LearningRate 0.0000   Epoch: 19   Global Step: 331030   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 10:16:42,496-Speed 3340.07 samples/sec   Loss 0.1130   LearningRate 0.0000   Epoch: 19   Global Step: 331040   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 10:16:45,566-Speed 3336.68 samples/sec   Loss 0.1080   LearningRate 0.0000   Epoch: 19   Global Step: 331050   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 10:16:48,632-Speed 3341.03 samples/sec   Loss 0.0958   LearningRate 0.0000   Epoch: 19   Global Step: 331060   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 10:16:51,721-Speed 3316.11 samples/sec   Loss 0.1080   LearningRate 0.0000   Epoch: 19   Global Step: 331070   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 10:16:54,789-Speed 3337.98 samples/sec   Loss 0.1055   LearningRate 0.0000   Epoch: 19   Global Step: 331080   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-12 10:16:57,998-Speed 3191.93 samples/sec   Loss 0.0991   LearningRate 0.0000   Epoch: 19   Global Step: 331090   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-12 10:17:01,177-Speed 3221.65 samples/sec   Loss 0.1157   LearningRate 0.0000   Epoch: 19   Global Step: 331100   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-12 10:17:04,245-Speed 3337.96 samples/sec   Loss 0.1089   LearningRate 0.0000   Epoch: 19   Global Step: 331110   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-12 10:17:07,329-Speed 3321.93 samples/sec   Loss 0.1045   LearningRate 0.0000   Epoch: 19   Global Step: 331120   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-12 10:17:10,392-Speed 3343.22 samples/sec   Loss 0.0998   LearningRate 0.0000   Epoch: 19   Global Step: 331130   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-12 10:17:13,461-Speed 3337.44 samples/sec   Loss 0.1139   LearningRate 0.0000   Epoch: 19   Global Step: 331140   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-12 10:17:16,524-Speed 3344.71 samples/sec   Loss 0.0963   LearningRate 0.0000   Epoch: 19   Global Step: 331150   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-12 10:17:19,594-Speed 3336.02 samples/sec   Loss 0.1109   LearningRate 0.0000   Epoch: 19   Global Step: 331160   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-12 10:17:22,739-Speed 3256.97 samples/sec   Loss 0.1012   LearningRate 0.0000   Epoch: 19   Global Step: 331170   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-12 10:17:25,878-Speed 3262.16 samples/sec   Loss 0.1013   LearningRate 0.0000   Epoch: 19   Global Step: 331180   Fp16 Grad Scale: 262144   Required: 0 hours
Training: 2022-04-12 10:17:28,937-Speed 3348.82 samples/sec   Loss 0.1058   LearningRate 0.0000   Epoch: 19   Global Step: 331190   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-12 10:17:31,998-Speed 3345.43 samples/sec   Loss 0.1100   LearningRate 0.0000   Epoch: 19   Global Step: 331200   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-12 10:17:35,069-Speed 3334.74 samples/sec   Loss 0.1105   LearningRate 0.0000   Epoch: 19   Global Step: 331210   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-12 10:17:38,153-Speed 3321.63 samples/sec   Loss 0.1075   LearningRate 0.0000   Epoch: 19   Global Step: 331220   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-12 10:17:41,214-Speed 3345.70 samples/sec   Loss 0.0973   LearningRate 0.0000   Epoch: 19   Global Step: 331230   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-12 10:17:44,282-Speed 3338.98 samples/sec   Loss 0.1100   LearningRate 0.0000   Epoch: 19   Global Step: 331240   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-12 10:17:47,348-Speed 3340.35 samples/sec   Loss 0.1059   LearningRate 0.0000   Epoch: 19   Global Step: 331250   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-12 10:17:50,460-Speed 3291.18 samples/sec   Loss 0.1062   LearningRate 0.0000   Epoch: 19   Global Step: 331260   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-12 10:17:53,525-Speed 3341.93 samples/sec   Loss 0.1023   LearningRate 0.0000   Epoch: 19   Global Step: 331270   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-12 10:17:56,589-Speed 3342.45 samples/sec   Loss 0.1104   LearningRate 0.0000   Epoch: 19   Global Step: 331280   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-12 10:17:59,663-Speed 3331.86 samples/sec   Loss 0.1037   LearningRate 0.0000   Epoch: 19   Global Step: 331290   Fp16 Grad Scale: 262144   Required: 0 hours
Training: 2022-04-12 10:18:02,788-Speed 3278.23 samples/sec   Loss 0.1020   LearningRate 0.0000   Epoch: 19   Global Step: 331300   Fp16 Grad Scale: 262144   Required: 0 hours
Training: 2022-04-12 10:18:05,892-Speed 3298.72 samples/sec   Loss 0.1133   LearningRate 0.0000   Epoch: 19   Global Step: 331310   Fp16 Grad Scale: 262144   Required: 0 hours
Training: 2022-04-12 10:18:08,975-Speed 3322.72 samples/sec   Loss 0.0998   LearningRate 0.0000   Epoch: 19   Global Step: 331320   Fp16 Grad Scale: 262144   Required: 0 hours
Training: 2022-04-12 10:18:12,101-Speed 3276.96 samples/sec   Loss 0.1107   LearningRate 0.0000   Epoch: 19   Global Step: 331330   Fp16 Grad Scale: 262144   Required: 0 hours
Training: 2022-04-12 10:18:15,165-Speed 3343.00 samples/sec   Loss 0.1044   LearningRate 0.0000   Epoch: 19   Global Step: 331340   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-12 10:18:18,231-Speed 3340.41 samples/sec   Loss 0.1065   LearningRate 0.0000   Epoch: 19   Global Step: 331350   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-12 10:18:21,296-Speed 3341.42 samples/sec   Loss 0.1045   LearningRate 0.0000   Epoch: 19   Global Step: 331360   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-12 10:18:24,513-Speed 3183.79 samples/sec   Loss 0.1044   LearningRate 0.0000   Epoch: 19   Global Step: 331370   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-12 10:18:27,615-Speed 3302.05 samples/sec   Loss 0.1108   LearningRate 0.0000   Epoch: 19   Global Step: 331380   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-12 10:18:30,687-Speed 3333.58 samples/sec   Loss 0.0925   LearningRate 0.0000   Epoch: 19   Global Step: 331390   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-12 10:18:33,769-Speed 3322.96 samples/sec   Loss 0.1077   LearningRate 0.0000   Epoch: 19   Global Step: 331400   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-12 10:18:36,945-Speed 3225.93 samples/sec   Loss 0.1073   LearningRate 0.0000   Epoch: 19   Global Step: 331410   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-12 10:18:40,028-Speed 3322.57 samples/sec   Loss 0.1066   LearningRate 0.0000   Epoch: 19   Global Step: 331420   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-12 10:18:43,100-Speed 3333.36 samples/sec   Loss 0.1038   LearningRate 0.0000   Epoch: 19   Global Step: 331430   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-12 10:18:46,167-Speed 3339.81 samples/sec   Loss 0.1032   LearningRate 0.0000   Epoch: 19   Global Step: 331440   Fp16 Grad Scale: 262144   Required: 0 hours
Training: 2022-04-12 10:18:49,267-Speed 3303.67 samples/sec   Loss 0.1028   LearningRate 0.0000   Epoch: 19   Global Step: 331450   Fp16 Grad Scale: 262144   Required: 0 hours
Training: 2022-04-12 10:18:52,334-Speed 3339.79 samples/sec   Loss 0.1044   LearningRate 0.0000   Epoch: 19   Global Step: 331460   Fp16 Grad Scale: 262144   Required: 0 hours
Training: 2022-04-12 10:18:55,417-Speed 3322.30 samples/sec   Loss 0.1035   LearningRate 0.0000   Epoch: 19   Global Step: 331470   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-12 10:18:58,505-Speed 3315.95 samples/sec   Loss 0.1091   LearningRate 0.0000   Epoch: 19   Global Step: 331480   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-12 10:19:01,618-Speed 3291.40 samples/sec   Loss 0.1034   LearningRate 0.0000   Epoch: 19   Global Step: 331490   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-12 10:19:04,693-Speed 3330.39 samples/sec   Loss 0.1065   LearningRate 0.0000   Epoch: 19   Global Step: 331500   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-12 10:19:07,763-Speed 3335.89 samples/sec   Loss 0.0943   LearningRate 0.0000   Epoch: 19   Global Step: 331510   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-12 10:19:10,909-Speed 3255.59 samples/sec   Loss 0.0975   LearningRate 0.0000   Epoch: 19   Global Step: 331520   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-12 10:19:13,983-Speed 3331.81 samples/sec   Loss 0.1007   LearningRate 0.0000   Epoch: 19   Global Step: 331530   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-12 10:19:17,078-Speed 3309.12 samples/sec   Loss 0.0965   LearningRate 0.0000   Epoch: 19   Global Step: 331540   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-12 10:19:20,261-Speed 3218.52 samples/sec   Loss 0.1087   LearningRate 0.0000   Epoch: 19   Global Step: 331550   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-12 10:19:23,387-Speed 3276.30 samples/sec   Loss 0.1081   LearningRate 0.0000   Epoch: 19   Global Step: 331560   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-12 10:19:26,478-Speed 3313.67 samples/sec   Loss 0.0957   LearningRate 0.0000   Epoch: 19   Global Step: 331570   Fp16 Grad Scale: 262144   Required: 0 hours
Training: 2022-04-12 10:19:29,565-Speed 3317.66 samples/sec   Loss 0.1069   LearningRate 0.0000   Epoch: 19   Global Step: 331580   Fp16 Grad Scale: 262144   Required: 0 hours
Training: 2022-04-12 10:19:32,644-Speed 3326.91 samples/sec   Loss 0.1025   LearningRate 0.0000   Epoch: 19   Global Step: 331590   Fp16 Grad Scale: 262144   Required: 0 hours
Training: 2022-04-12 10:19:35,705-Speed 3346.09 samples/sec   Loss 0.1067   LearningRate 0.0000   Epoch: 19   Global Step: 331600   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-12 10:19:38,779-Speed 3331.52 samples/sec   Loss 0.0965   LearningRate 0.0000   Epoch: 19   Global Step: 331610   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-12 10:19:41,854-Speed 3330.81 samples/sec   Loss 0.1022   LearningRate 0.0000   Epoch: 19   Global Step: 331620   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-12 10:19:44,939-Speed 3320.60 samples/sec   Loss 0.1047   LearningRate 0.0000   Epoch: 19   Global Step: 331630   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-12 10:19:48,020-Speed 3324.02 samples/sec   Loss 0.1019   LearningRate 0.0000   Epoch: 19   Global Step: 331640   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-12 10:19:51,244-Speed 3176.42 samples/sec   Loss 0.1008   LearningRate 0.0000   Epoch: 19   Global Step: 331650   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-12 10:19:54,362-Speed 3285.40 samples/sec   Loss 0.1091   LearningRate 0.0000   Epoch: 19   Global Step: 331660   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-12 10:19:57,447-Speed 3320.37 samples/sec   Loss 0.1082   LearningRate 0.0000   Epoch: 19   Global Step: 331670   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-12 10:20:00,540-Speed 3310.75 samples/sec   Loss 0.1110   LearningRate 0.0000   Epoch: 19   Global Step: 331680   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-12 10:20:03,677-Speed 3265.20 samples/sec   Loss 0.1090   LearningRate 0.0000   Epoch: 19   Global Step: 331690   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-12 10:20:06,776-Speed 3305.50 samples/sec   Loss 0.1024   LearningRate 0.0000   Epoch: 19   Global Step: 331700   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-12 10:20:09,985-Speed 3191.65 samples/sec   Loss 0.1025   LearningRate 0.0000   Epoch: 19   Global Step: 331710   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-12 10:20:13,079-Speed 3309.65 samples/sec   Loss 0.1096   LearningRate 0.0000   Epoch: 19   Global Step: 331720   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-12 10:20:16,214-Speed 3267.19 samples/sec   Loss 0.0998   LearningRate 0.0000   Epoch: 19   Global Step: 331730   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-12 10:20:19,322-Speed 3295.77 samples/sec   Loss 0.1088   LearningRate 0.0000   Epoch: 19   Global Step: 331740   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-12 10:20:22,425-Speed 3301.17 samples/sec   Loss 0.0989   LearningRate 0.0000   Epoch: 19   Global Step: 331750   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-12 10:20:25,488-Speed 3344.15 samples/sec   Loss 0.1148   LearningRate 0.0000   Epoch: 19   Global Step: 331760   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-12 10:20:28,562-Speed 3331.09 samples/sec   Loss 0.1032   LearningRate 0.0000   Epoch: 19   Global Step: 331770   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-12 10:20:31,631-Speed 3338.10 samples/sec   Loss 0.1022   LearningRate 0.0000   Epoch: 19   Global Step: 331780   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-12 10:20:34,722-Speed 3312.93 samples/sec   Loss 0.1062   LearningRate 0.0000   Epoch: 19   Global Step: 331790   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 10:20:37,790-Speed 3337.92 samples/sec   Loss 0.1065   LearningRate 0.0000   Epoch: 19   Global Step: 331800   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 10:20:40,886-Speed 3308.29 samples/sec   Loss 0.1058   LearningRate 0.0000   Epoch: 19   Global Step: 331810   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 10:20:43,976-Speed 3315.11 samples/sec   Loss 0.1025   LearningRate 0.0000   Epoch: 19   Global Step: 331820   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 10:20:47,050-Speed 3332.26 samples/sec   Loss 0.1037   LearningRate 0.0000   Epoch: 19   Global Step: 331830   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 10:20:50,148-Speed 3306.19 samples/sec   Loss 0.1039   LearningRate 0.0000   Epoch: 19   Global Step: 331840   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 10:20:53,221-Speed 3333.11 samples/sec   Loss 0.1056   LearningRate 0.0000   Epoch: 19   Global Step: 331850   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 10:20:56,293-Speed 3333.71 samples/sec   Loss 0.1197   LearningRate 0.0000   Epoch: 19   Global Step: 331860   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 10:20:59,393-Speed 3303.33 samples/sec   Loss 0.1033   LearningRate 0.0000   Epoch: 19   Global Step: 331870   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 10:21:02,486-Speed 3311.80 samples/sec   Loss 0.1049   LearningRate 0.0000   Epoch: 19   Global Step: 331880   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 10:21:05,631-Speed 3257.20 samples/sec   Loss 0.1022   LearningRate 0.0000   Epoch: 19   Global Step: 331890   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-12 10:21:08,730-Speed 3304.22 samples/sec   Loss 0.1055   LearningRate 0.0000   Epoch: 19   Global Step: 331900   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-12 10:21:11,798-Speed 3338.39 samples/sec   Loss 0.1029   LearningRate 0.0000   Epoch: 19   Global Step: 331910   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-12 10:21:14,866-Speed 3339.57 samples/sec   Loss 0.1060   LearningRate 0.0000   Epoch: 19   Global Step: 331920   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-12 10:21:17,969-Speed 3300.41 samples/sec   Loss 0.1088   LearningRate 0.0000   Epoch: 19   Global Step: 331930   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-12 10:21:21,043-Speed 3331.70 samples/sec   Loss 0.0989   LearningRate 0.0000   Epoch: 19   Global Step: 331940   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-12 10:21:24,119-Speed 3330.30 samples/sec   Loss 0.0988   LearningRate 0.0000   Epoch: 19   Global Step: 331950   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-12 10:21:27,183-Speed 3342.49 samples/sec   Loss 0.0957   LearningRate 0.0000   Epoch: 19   Global Step: 331960   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-12 10:21:30,234-Speed 3357.01 samples/sec   Loss 0.1022   LearningRate 0.0000   Epoch: 19   Global Step: 331970   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 10:21:33,308-Speed 3331.69 samples/sec   Loss 0.1020   LearningRate 0.0000   Epoch: 19   Global Step: 331980   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 10:21:36,395-Speed 3318.05 samples/sec   Loss 0.0962   LearningRate 0.0000   Epoch: 19   Global Step: 331990   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 10:21:39,476-Speed 3323.76 samples/sec   Loss 0.1030   LearningRate 0.0000   Epoch: 19   Global Step: 332000   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 10:22:23,409-[lfw][332000]XNorm: 20.656130
Training: 2022-04-12 10:22:23,409-[lfw][332000]Accuracy-Flip: 0.99783+-0.00259
Training: 2022-04-12 10:22:23,410-[lfw][332000]Accuracy-Highest: 0.99817
Training: 2022-04-12 10:23:14,012-[cfp_fp][332000]XNorm: 22.591580
Training: 2022-04-12 10:23:14,012-[cfp_fp][332000]Accuracy-Flip: 0.99100+-0.00394
Training: 2022-04-12 10:23:14,013-[cfp_fp][332000]Accuracy-Highest: 0.99200
Training: 2022-04-12 10:23:57,424-[agedb_30][332000]XNorm: 22.777670
Training: 2022-04-12 10:23:57,425-[agedb_30][332000]Accuracy-Flip: 0.98650+-0.00555
Training: 2022-04-12 10:23:57,425-[agedb_30][332000]Accuracy-Highest: 0.98650
Training: 2022-04-12 10:24:00,488-Speed 72.62 samples/sec   Loss 0.0908   LearningRate 0.0000   Epoch: 19   Global Step: 332010   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 10:24:03,549-Speed 3345.10 samples/sec   Loss 0.1115   LearningRate 0.0000   Epoch: 19   Global Step: 332020   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 10:24:06,625-Speed 3330.13 samples/sec   Loss 0.1069   LearningRate 0.0000   Epoch: 19   Global Step: 332030   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 10:24:09,701-Speed 3329.82 samples/sec   Loss 0.1014   LearningRate 0.0000   Epoch: 19   Global Step: 332040   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 10:24:12,761-Speed 3347.54 samples/sec   Loss 0.0984   LearningRate 0.0000   Epoch: 19   Global Step: 332050   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 10:24:15,818-Speed 3349.74 samples/sec   Loss 0.1074   LearningRate 0.0000   Epoch: 19   Global Step: 332060   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 10:24:18,889-Speed 3335.44 samples/sec   Loss 0.1104   LearningRate 0.0000   Epoch: 19   Global Step: 332070   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-12 10:24:21,958-Speed 3337.96 samples/sec   Loss 0.1022   LearningRate 0.0000   Epoch: 19   Global Step: 332080   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-12 10:24:25,039-Speed 3323.87 samples/sec   Loss 0.1003   LearningRate 0.0000   Epoch: 19   Global Step: 332090   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-12 10:24:28,169-Speed 3272.54 samples/sec   Loss 0.0978   LearningRate 0.0000   Epoch: 19   Global Step: 332100   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-12 10:24:31,282-Speed 3290.08 samples/sec   Loss 0.0968   LearningRate 0.0000   Epoch: 19   Global Step: 332110   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-12 10:24:34,356-Speed 3331.77 samples/sec   Loss 0.0925   LearningRate 0.0000   Epoch: 19   Global Step: 332120   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-12 10:24:37,435-Speed 3326.54 samples/sec   Loss 0.1105   LearningRate 0.0000   Epoch: 19   Global Step: 332130   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-12 10:24:40,494-Speed 3348.28 samples/sec   Loss 0.1008   LearningRate 0.0000   Epoch: 19   Global Step: 332140   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-12 10:24:43,579-Speed 3320.14 samples/sec   Loss 0.1024   LearningRate 0.0000   Epoch: 19   Global Step: 332150   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-12 10:24:46,641-Speed 3344.29 samples/sec   Loss 0.0981   LearningRate 0.0000   Epoch: 19   Global Step: 332160   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-12 10:24:49,704-Speed 3344.27 samples/sec   Loss 0.1020   LearningRate 0.0000   Epoch: 19   Global Step: 332170   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-12 10:24:52,797-Speed 3311.63 samples/sec   Loss 0.0974   LearningRate 0.0000   Epoch: 19   Global Step: 332180   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-12 10:24:55,874-Speed 3328.32 samples/sec   Loss 0.1037   LearningRate 0.0000   Epoch: 19   Global Step: 332190   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-12 10:24:58,966-Speed 3312.16 samples/sec   Loss 0.1123   LearningRate 0.0000   Epoch: 19   Global Step: 332200   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-12 10:25:02,038-Speed 3334.30 samples/sec   Loss 0.1123   LearningRate 0.0000   Epoch: 19   Global Step: 332210   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-12 10:25:05,107-Speed 3338.06 samples/sec   Loss 0.1075   LearningRate 0.0000   Epoch: 19   Global Step: 332220   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-12 10:25:08,207-Speed 3304.39 samples/sec   Loss 0.1098   LearningRate 0.0000   Epoch: 19   Global Step: 332230   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-12 10:25:11,268-Speed 3346.27 samples/sec   Loss 0.1018   LearningRate 0.0000   Epoch: 19   Global Step: 332240   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-12 10:25:14,353-Speed 3319.26 samples/sec   Loss 0.0948   LearningRate 0.0000   Epoch: 19   Global Step: 332250   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-12 10:25:17,418-Speed 3341.29 samples/sec   Loss 0.1180   LearningRate 0.0000   Epoch: 19   Global Step: 332260   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-12 10:25:20,545-Speed 3275.15 samples/sec   Loss 0.1043   LearningRate 0.0000   Epoch: 19   Global Step: 332270   Fp16 Grad Scale: 262144   Required: 0 hours
Training: 2022-04-12 10:25:23,635-Speed 3314.96 samples/sec   Loss 0.1093   LearningRate 0.0000   Epoch: 19   Global Step: 332280   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-12 10:25:26,695-Speed 3347.16 samples/sec   Loss 0.0971   LearningRate 0.0000   Epoch: 19   Global Step: 332290   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-12 10:25:29,783-Speed 3316.83 samples/sec   Loss 0.1049   LearningRate 0.0000   Epoch: 19   Global Step: 332300   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-12 10:25:32,840-Speed 3350.57 samples/sec   Loss 0.0949   LearningRate 0.0000   Epoch: 19   Global Step: 332310   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 10:25:35,904-Speed 3343.17 samples/sec   Loss 0.1020   LearningRate 0.0000   Epoch: 19   Global Step: 332320   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 10:25:38,984-Speed 3325.29 samples/sec   Loss 0.1076   LearningRate 0.0000   Epoch: 19   Global Step: 332330   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 10:25:42,049-Speed 3342.31 samples/sec   Loss 0.1100   LearningRate 0.0000   Epoch: 19   Global Step: 332340   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 10:25:45,131-Speed 3323.05 samples/sec   Loss 0.1096   LearningRate 0.0000   Epoch: 19   Global Step: 332350   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 10:25:48,220-Speed 3314.97 samples/sec   Loss 0.0951   LearningRate 0.0000   Epoch: 19   Global Step: 332360   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 10:25:51,289-Speed 3337.66 samples/sec   Loss 0.1034   LearningRate 0.0000   Epoch: 19   Global Step: 332370   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 10:25:54,372-Speed 3322.39 samples/sec   Loss 0.1071   LearningRate 0.0000   Epoch: 19   Global Step: 332380   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 10:25:57,443-Speed 3334.88 samples/sec   Loss 0.0967   LearningRate 0.0000   Epoch: 19   Global Step: 332390   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 10:26:00,527-Speed 3321.53 samples/sec   Loss 0.1068   LearningRate 0.0000   Epoch: 19   Global Step: 332400   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 10:26:03,634-Speed 3296.15 samples/sec   Loss 0.1089   LearningRate 0.0000   Epoch: 19   Global Step: 332410   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-12 10:26:06,785-Speed 3250.80 samples/sec   Loss 0.1057   LearningRate 0.0000   Epoch: 19   Global Step: 332420   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-12 10:26:09,850-Speed 3341.62 samples/sec   Loss 0.0957   LearningRate 0.0000   Epoch: 19   Global Step: 332430   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-12 10:26:12,913-Speed 3343.44 samples/sec   Loss 0.1206   LearningRate 0.0000   Epoch: 19   Global Step: 332440   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-12 10:26:15,985-Speed 3334.26 samples/sec   Loss 0.1000   LearningRate 0.0000   Epoch: 19   Global Step: 332450   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-12 10:26:19,154-Speed 3232.48 samples/sec   Loss 0.1153   LearningRate 0.0000   Epoch: 19   Global Step: 332460   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-12 10:26:22,379-Speed 3175.93 samples/sec   Loss 0.0948   LearningRate 0.0000   Epoch: 19   Global Step: 332470   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-12 10:26:25,531-Speed 3249.71 samples/sec   Loss 0.1039   LearningRate 0.0000   Epoch: 19   Global Step: 332480   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-12 10:26:28,591-Speed 3346.64 samples/sec   Loss 0.1008   LearningRate 0.0000   Epoch: 19   Global Step: 332490   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-12 10:26:31,719-Speed 3274.80 samples/sec   Loss 0.0921   LearningRate 0.0000   Epoch: 19   Global Step: 332500   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-12 10:26:34,806-Speed 3318.13 samples/sec   Loss 0.1020   LearningRate 0.0000   Epoch: 19   Global Step: 332510   Fp16 Grad Scale: 262144   Required: 0 hours
Training: 2022-04-12 10:26:37,867-Speed 3345.49 samples/sec   Loss 0.1066   LearningRate 0.0000   Epoch: 19   Global Step: 332520   Fp16 Grad Scale: 262144   Required: 0 hours
Training: 2022-04-12 10:26:40,945-Speed 3327.91 samples/sec   Loss 0.1021   LearningRate 0.0000   Epoch: 19   Global Step: 332530   Fp16 Grad Scale: 262144   Required: 0 hours
Training: 2022-04-12 10:26:43,998-Speed 3354.42 samples/sec   Loss 0.0898   LearningRate 0.0000   Epoch: 19   Global Step: 332540   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-12 10:26:47,151-Speed 3248.55 samples/sec   Loss 0.1006   LearningRate 0.0000   Epoch: 19   Global Step: 332550   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-12 10:26:50,347-Speed 3204.94 samples/sec   Loss 0.0976   LearningRate 0.0000   Epoch: 19   Global Step: 332560   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-12 10:26:53,442-Speed 3309.49 samples/sec   Loss 0.1008   LearningRate 0.0000   Epoch: 19   Global Step: 332570   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-12 10:26:56,534-Speed 3312.19 samples/sec   Loss 0.0971   LearningRate 0.0000   Epoch: 19   Global Step: 332580   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-12 10:26:59,640-Speed 3298.14 samples/sec   Loss 0.0978   LearningRate 0.0000   Epoch: 19   Global Step: 332590   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-12 10:27:02,708-Speed 3337.93 samples/sec   Loss 0.1133   LearningRate 0.0000   Epoch: 19   Global Step: 332600   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-12 10:27:05,781-Speed 3333.17 samples/sec   Loss 0.0945   LearningRate 0.0000   Epoch: 19   Global Step: 332610   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-12 10:27:08,863-Speed 3323.52 samples/sec   Loss 0.1055   LearningRate 0.0000   Epoch: 19   Global Step: 332620   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-12 10:27:11,926-Speed 3343.18 samples/sec   Loss 0.1076   LearningRate 0.0000   Epoch: 19   Global Step: 332630   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-12 10:27:14,994-Speed 3339.17 samples/sec   Loss 0.0999   LearningRate 0.0000   Epoch: 19   Global Step: 332640   Fp16 Grad Scale: 262144   Required: 0 hours
Training: 2022-04-12 10:27:18,047-Speed 3354.90 samples/sec   Loss 0.1024   LearningRate 0.0000   Epoch: 19   Global Step: 332650   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-12 10:27:21,182-Speed 3266.82 samples/sec   Loss 0.1093   LearningRate 0.0000   Epoch: 19   Global Step: 332660   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-12 10:27:24,250-Speed 3338.81 samples/sec   Loss 0.1066   LearningRate 0.0000   Epoch: 19   Global Step: 332670   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-12 10:27:27,347-Speed 3307.15 samples/sec   Loss 0.1105   LearningRate 0.0000   Epoch: 19   Global Step: 332680   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-12 10:27:30,425-Speed 3326.71 samples/sec   Loss 0.1114   LearningRate 0.0000   Epoch: 19   Global Step: 332690   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-12 10:27:33,499-Speed 3332.59 samples/sec   Loss 0.1009   LearningRate 0.0000   Epoch: 19   Global Step: 332700   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-12 10:27:36,562-Speed 3343.60 samples/sec   Loss 0.1064   LearningRate 0.0000   Epoch: 19   Global Step: 332710   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-12 10:27:39,663-Speed 3302.44 samples/sec   Loss 0.1048   LearningRate 0.0000   Epoch: 19   Global Step: 332720   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-12 10:27:42,737-Speed 3332.90 samples/sec   Loss 0.0910   LearningRate 0.0000   Epoch: 19   Global Step: 332730   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-12 10:27:45,821-Speed 3321.45 samples/sec   Loss 0.1083   LearningRate 0.0000   Epoch: 19   Global Step: 332740   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-12 10:27:48,898-Speed 3328.39 samples/sec   Loss 0.1058   LearningRate 0.0000   Epoch: 19   Global Step: 332750   Fp16 Grad Scale: 262144   Required: 0 hours
Training: 2022-04-12 10:27:51,955-Speed 3350.11 samples/sec   Loss 0.1032   LearningRate 0.0000   Epoch: 19   Global Step: 332760   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-12 10:27:55,024-Speed 3337.26 samples/sec   Loss 0.1115   LearningRate 0.0000   Epoch: 19   Global Step: 332770   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-12 10:27:58,079-Speed 3352.51 samples/sec   Loss 0.1050   LearningRate 0.0000   Epoch: 19   Global Step: 332780   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 10:28:01,181-Speed 3302.50 samples/sec   Loss 0.1054   LearningRate 0.0000   Epoch: 19   Global Step: 332790   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 10:28:04,242-Speed 3345.49 samples/sec   Loss 0.1053   LearningRate 0.0000   Epoch: 19   Global Step: 332800   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 10:28:07,314-Speed 3334.59 samples/sec   Loss 0.1043   LearningRate 0.0000   Epoch: 19   Global Step: 332810   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 10:28:10,384-Speed 3336.61 samples/sec   Loss 0.1104   LearningRate 0.0000   Epoch: 19   Global Step: 332820   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 10:28:13,487-Speed 3300.23 samples/sec   Loss 0.0985   LearningRate 0.0000   Epoch: 19   Global Step: 332830   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 10:28:16,588-Speed 3303.22 samples/sec   Loss 0.1013   LearningRate 0.0000   Epoch: 19   Global Step: 332840   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 10:28:19,654-Speed 3340.29 samples/sec   Loss 0.1047   LearningRate 0.0000   Epoch: 19   Global Step: 332850   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 10:28:22,739-Speed 3320.74 samples/sec   Loss 0.0982   LearningRate 0.0000   Epoch: 19   Global Step: 332860   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 10:28:25,808-Speed 3336.97 samples/sec   Loss 0.1071   LearningRate 0.0000   Epoch: 19   Global Step: 332870   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 10:28:28,884-Speed 3329.14 samples/sec   Loss 0.1002   LearningRate 0.0000   Epoch: 19   Global Step: 332880   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-12 10:28:32,009-Speed 3278.17 samples/sec   Loss 0.0962   LearningRate 0.0000   Epoch: 19   Global Step: 332890   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 10:28:35,110-Speed 3303.04 samples/sec   Loss 0.1075   LearningRate 0.0000   Epoch: 19   Global Step: 332900   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 10:28:38,217-Speed 3296.18 samples/sec   Loss 0.1125   LearningRate 0.0000   Epoch: 19   Global Step: 332910   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 10:28:41,413-Speed 3204.81 samples/sec   Loss 0.1064   LearningRate 0.0000   Epoch: 19   Global Step: 332920   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 10:28:44,489-Speed 3330.10 samples/sec   Loss 0.0968   LearningRate 0.0000   Epoch: 19   Global Step: 332930   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 10:28:47,591-Speed 3301.70 samples/sec   Loss 0.1106   LearningRate 0.0000   Epoch: 19   Global Step: 332940   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 10:28:50,695-Speed 3299.62 samples/sec   Loss 0.1020   LearningRate 0.0000   Epoch: 19   Global Step: 332950   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 10:28:53,780-Speed 3319.53 samples/sec   Loss 0.1055   LearningRate 0.0000   Epoch: 19   Global Step: 332960   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 10:28:56,850-Speed 3336.37 samples/sec   Loss 0.0985   LearningRate 0.0000   Epoch: 19   Global Step: 332970   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 10:28:59,928-Speed 3327.33 samples/sec   Loss 0.0987   LearningRate 0.0000   Epoch: 19   Global Step: 332980   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 10:29:03,000-Speed 3334.88 samples/sec   Loss 0.1012   LearningRate 0.0000   Epoch: 19   Global Step: 332990   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-12 10:29:06,064-Speed 3343.06 samples/sec   Loss 0.1133   LearningRate 0.0000   Epoch: 19   Global Step: 333000   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-12 10:29:09,138-Speed 3332.19 samples/sec   Loss 0.1042   LearningRate 0.0000   Epoch: 19   Global Step: 333010   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-12 10:29:12,215-Speed 3328.23 samples/sec   Loss 0.0963   LearningRate 0.0000   Epoch: 19   Global Step: 333020   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-12 10:29:15,279-Speed 3342.68 samples/sec   Loss 0.1002   LearningRate 0.0000   Epoch: 19   Global Step: 333030   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-12 10:29:18,350-Speed 3334.63 samples/sec   Loss 0.1136   LearningRate 0.0000   Epoch: 19   Global Step: 333040   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-12 10:29:21,423-Speed 3332.86 samples/sec   Loss 0.1140   LearningRate 0.0000   Epoch: 19   Global Step: 333050   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-12 10:29:24,502-Speed 3326.53 samples/sec   Loss 0.1051   LearningRate 0.0000   Epoch: 19   Global Step: 333060   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-12 10:29:27,575-Speed 3333.06 samples/sec   Loss 0.1031   LearningRate 0.0000   Epoch: 19   Global Step: 333070   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 10:29:30,651-Speed 3330.13 samples/sec   Loss 0.1118   LearningRate 0.0000   Epoch: 19   Global Step: 333080   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 10:29:33,766-Speed 3288.11 samples/sec   Loss 0.0994   LearningRate 0.0000   Epoch: 19   Global Step: 333090   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 10:29:36,837-Speed 3335.27 samples/sec   Loss 0.1157   LearningRate 0.0000   Epoch: 19   Global Step: 333100   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 10:29:39,956-Speed 3284.02 samples/sec   Loss 0.1055   LearningRate 0.0000   Epoch: 19   Global Step: 333110   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 10:29:43,039-Speed 3322.33 samples/sec   Loss 0.0950   LearningRate 0.0000   Epoch: 19   Global Step: 333120   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 10:29:46,123-Speed 3320.51 samples/sec   Loss 0.1059   LearningRate 0.0000   Epoch: 19   Global Step: 333130   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 10:29:49,205-Speed 3323.62 samples/sec   Loss 0.1046   LearningRate 0.0000   Epoch: 19   Global Step: 333140   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 10:29:52,463-Speed 3143.31 samples/sec   Loss 0.1022   LearningRate 0.0000   Epoch: 19   Global Step: 333150   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 10:29:55,542-Speed 3326.31 samples/sec   Loss 0.1152   LearningRate 0.0000   Epoch: 19   Global Step: 333160   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 10:29:58,619-Speed 3329.51 samples/sec   Loss 0.1126   LearningRate 0.0000   Epoch: 19   Global Step: 333170   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-12 10:30:01,691-Speed 3333.51 samples/sec   Loss 0.0982   LearningRate 0.0000   Epoch: 19   Global Step: 333180   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-12 10:30:04,767-Speed 3330.33 samples/sec   Loss 0.1056   LearningRate 0.0000   Epoch: 19   Global Step: 333190   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-12 10:30:07,832-Speed 3341.29 samples/sec   Loss 0.0988   LearningRate 0.0000   Epoch: 19   Global Step: 333200   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-12 10:30:10,918-Speed 3319.37 samples/sec   Loss 0.1104   LearningRate 0.0000   Epoch: 19   Global Step: 333210   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-12 10:30:13,984-Speed 3340.07 samples/sec   Loss 0.1010   LearningRate 0.0000   Epoch: 19   Global Step: 333220   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-12 10:30:17,053-Speed 3337.33 samples/sec   Loss 0.1103   LearningRate 0.0000   Epoch: 19   Global Step: 333230   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-12 10:30:20,126-Speed 3333.40 samples/sec   Loss 0.1089   LearningRate 0.0000   Epoch: 19   Global Step: 333240   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-12 10:30:23,216-Speed 3314.69 samples/sec   Loss 0.1057   LearningRate 0.0000   Epoch: 19   Global Step: 333250   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-12 10:30:26,279-Speed 3343.89 samples/sec   Loss 0.1096   LearningRate 0.0000   Epoch: 19   Global Step: 333260   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-12 10:30:29,429-Speed 3251.33 samples/sec   Loss 0.1013   LearningRate 0.0000   Epoch: 19   Global Step: 333270   Fp16 Grad Scale: 262144   Required: 0 hours
Training: 2022-04-12 10:30:32,537-Speed 3295.62 samples/sec   Loss 0.1061   LearningRate 0.0000   Epoch: 19   Global Step: 333280   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-12 10:30:35,602-Speed 3341.70 samples/sec   Loss 0.0974   LearningRate 0.0000   Epoch: 19   Global Step: 333290   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-12 10:30:38,667-Speed 3341.15 samples/sec   Loss 0.1067   LearningRate 0.0000   Epoch: 19   Global Step: 333300   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-12 10:30:41,780-Speed 3291.01 samples/sec   Loss 0.1030   LearningRate 0.0000   Epoch: 19   Global Step: 333310   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-12 10:30:44,954-Speed 3226.24 samples/sec   Loss 0.1155   LearningRate 0.0000   Epoch: 19   Global Step: 333320   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-12 10:30:48,100-Speed 3256.45 samples/sec   Loss 0.1085   LearningRate 0.0000   Epoch: 19   Global Step: 333330   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-12 10:30:51,307-Speed 3194.33 samples/sec   Loss 0.1000   LearningRate 0.0000   Epoch: 19   Global Step: 333340   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-12 10:30:54,475-Speed 3235.11 samples/sec   Loss 0.0968   LearningRate 0.0000   Epoch: 19   Global Step: 333350   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-12 10:30:57,753-Speed 3124.71 samples/sec   Loss 0.1043   LearningRate 0.0000   Epoch: 19   Global Step: 333360   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-12 10:31:00,866-Speed 3289.68 samples/sec   Loss 0.0960   LearningRate 0.0000   Epoch: 19   Global Step: 333370   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-12 10:31:03,935-Speed 3337.99 samples/sec   Loss 0.1010   LearningRate 0.0000   Epoch: 19   Global Step: 333380   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-12 10:31:07,026-Speed 3313.39 samples/sec   Loss 0.1042   LearningRate 0.0000   Epoch: 19   Global Step: 333390   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-12 10:31:10,161-Speed 3266.68 samples/sec   Loss 0.1072   LearningRate 0.0000   Epoch: 19   Global Step: 333400   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-12 10:31:13,307-Speed 3256.21 samples/sec   Loss 0.1103   LearningRate 0.0000   Epoch: 19   Global Step: 333410   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-12 10:31:16,389-Speed 3322.80 samples/sec   Loss 0.1014   LearningRate 0.0000   Epoch: 19   Global Step: 333420   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-12 10:31:19,456-Speed 3340.32 samples/sec   Loss 0.1115   LearningRate 0.0000   Epoch: 19   Global Step: 333430   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-12 10:31:22,520-Speed 3342.59 samples/sec   Loss 0.1019   LearningRate 0.0000   Epoch: 19   Global Step: 333440   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-12 10:31:25,589-Speed 3337.55 samples/sec   Loss 0.0993   LearningRate 0.0000   Epoch: 19   Global Step: 333450   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-12 10:31:28,662-Speed 3332.42 samples/sec   Loss 0.1036   LearningRate 0.0000   Epoch: 19   Global Step: 333460   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-12 10:31:31,734-Speed 3334.29 samples/sec   Loss 0.1009   LearningRate 0.0000   Epoch: 19   Global Step: 333470   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-12 10:31:34,803-Speed 3337.12 samples/sec   Loss 0.1037   LearningRate 0.0000   Epoch: 19   Global Step: 333480   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-12 10:31:37,910-Speed 3296.89 samples/sec   Loss 0.0921   LearningRate 0.0000   Epoch: 19   Global Step: 333490   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-12 10:31:40,989-Speed 3326.04 samples/sec   Loss 0.1001   LearningRate 0.0000   Epoch: 19   Global Step: 333500   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-12 10:31:44,082-Speed 3311.84 samples/sec   Loss 0.1023   LearningRate 0.0000   Epoch: 19   Global Step: 333510   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-12 10:31:47,149-Speed 3339.63 samples/sec   Loss 0.1106   LearningRate 0.0000   Epoch: 19   Global Step: 333520   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-12 10:31:50,232-Speed 3321.76 samples/sec   Loss 0.1141   LearningRate 0.0000   Epoch: 19   Global Step: 333530   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-12 10:31:53,300-Speed 3338.33 samples/sec   Loss 0.1081   LearningRate 0.0000   Epoch: 19   Global Step: 333540   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-12 10:31:56,377-Speed 3328.64 samples/sec   Loss 0.1042   LearningRate 0.0000   Epoch: 19   Global Step: 333550   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-12 10:31:59,440-Speed 3344.43 samples/sec   Loss 0.1069   LearningRate 0.0000   Epoch: 19   Global Step: 333560   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-12 10:32:02,519-Speed 3326.70 samples/sec   Loss 0.1101   LearningRate 0.0000   Epoch: 19   Global Step: 333570   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-12 10:32:05,604-Speed 3319.41 samples/sec   Loss 0.1021   LearningRate 0.0000   Epoch: 19   Global Step: 333580   Fp16 Grad Scale: 262144   Required: 0 hours
Training: 2022-04-12 10:32:08,670-Speed 3341.09 samples/sec   Loss 0.1103   LearningRate 0.0000   Epoch: 19   Global Step: 333590   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-12 10:32:11,741-Speed 3334.63 samples/sec   Loss 0.1001   LearningRate 0.0000   Epoch: 19   Global Step: 333600   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-12 10:32:14,843-Speed 3302.58 samples/sec   Loss 0.1084   LearningRate 0.0000   Epoch: 19   Global Step: 333610   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-12 10:32:17,918-Speed 3330.47 samples/sec   Loss 0.1103   LearningRate 0.0000   Epoch: 19   Global Step: 333620   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-12 10:32:20,984-Speed 3340.80 samples/sec   Loss 0.1112   LearningRate 0.0000   Epoch: 19   Global Step: 333630   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-12 10:32:24,113-Speed 3273.27 samples/sec   Loss 0.1067   LearningRate 0.0000   Epoch: 19   Global Step: 333640   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-12 10:32:27,184-Speed 3335.22 samples/sec   Loss 0.1027   LearningRate 0.0000   Epoch: 19   Global Step: 333650   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-12 10:32:30,278-Speed 3310.24 samples/sec   Loss 0.1009   LearningRate 0.0000   Epoch: 19   Global Step: 333660   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-12 10:32:33,400-Speed 3280.33 samples/sec   Loss 0.1029   LearningRate 0.0000   Epoch: 19   Global Step: 333670   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-12 10:32:36,488-Speed 3317.27 samples/sec   Loss 0.1018   LearningRate 0.0000   Epoch: 19   Global Step: 333680   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-12 10:32:39,555-Speed 3339.87 samples/sec   Loss 0.1099   LearningRate 0.0000   Epoch: 19   Global Step: 333690   Fp16 Grad Scale: 262144   Required: 0 hours
Training: 2022-04-12 10:32:42,632-Speed 3328.87 samples/sec   Loss 0.1027   LearningRate 0.0000   Epoch: 19   Global Step: 333700   Fp16 Grad Scale: 262144   Required: 0 hours
Training: 2022-04-12 10:32:45,691-Speed 3347.94 samples/sec   Loss 0.1124   LearningRate 0.0000   Epoch: 19   Global Step: 333710   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-12 10:32:48,873-Speed 3218.79 samples/sec   Loss 0.1027   LearningRate 0.0000   Epoch: 19   Global Step: 333720   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-12 10:32:51,936-Speed 3343.50 samples/sec   Loss 0.1108   LearningRate 0.0000   Epoch: 19   Global Step: 333730   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-12 10:32:55,004-Speed 3338.91 samples/sec   Loss 0.0954   LearningRate 0.0000   Epoch: 19   Global Step: 333740   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-12 10:32:58,160-Speed 3245.10 samples/sec   Loss 0.0955   LearningRate 0.0000   Epoch: 19   Global Step: 333750   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-12 10:33:01,313-Speed 3248.03 samples/sec   Loss 0.0887   LearningRate 0.0000   Epoch: 19   Global Step: 333760   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-12 10:33:04,388-Speed 3332.03 samples/sec   Loss 0.1106   LearningRate 0.0000   Epoch: 19   Global Step: 333770   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-12 10:33:07,488-Speed 3304.00 samples/sec   Loss 0.1023   LearningRate 0.0000   Epoch: 19   Global Step: 333780   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-12 10:33:10,556-Speed 3338.62 samples/sec   Loss 0.0962   LearningRate 0.0000   Epoch: 19   Global Step: 333790   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-12 10:33:13,634-Speed 3327.11 samples/sec   Loss 0.1104   LearningRate 0.0000   Epoch: 19   Global Step: 333800   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-12 10:33:16,976-Speed 3064.13 samples/sec   Loss 0.1025   LearningRate 0.0000   Epoch: 19   Global Step: 333810   Fp16 Grad Scale: 262144   Required: 0 hours
Training: 2022-04-12 10:33:20,028-Speed 3356.27 samples/sec   Loss 0.1039   LearningRate 0.0000   Epoch: 19   Global Step: 333820   Fp16 Grad Scale: 131072   Required: -0 hours