Namespace(accumulate=1, batch_norm=False, batch_size=32, clip_grad=40, crop_ratio=0.875, data_dir='/home/ubuntu/.mxnet/datasets/hmdb51/rawframes', dataset='hmdb51', dtype='float32', eval=False, hard_weight=0.5, input_5d=False, input_size=224, kvstore=None, label_smoothing=False, last_gamma=False, log_interval=50, logging_file='2d_rgb_res50_seg3_hmdb51_b32_g8_run2.txt', lr=0.01, lr_decay=0.1, lr_decay_epoch='15,25,35', lr_decay_period=0, lr_mode='step', mixup=False, mixup_alpha=0.2, mixup_off_epoch=0, mode='hybrid', model='resnet50_v1b_hmdb51', momentum=0.9, new_height=256, new_length=1, new_step=1, new_width=340, no_wd=False, num_classes=51, num_crop=1, num_epochs=35, num_gpus=8, num_segments=3, num_workers=32, partial_bn=False, prefetch_ratio=1.0, resume_epoch=0, resume_params='', resume_states='', save_dir='/home/ubuntu/yizhu/logs/mxnet/hmdb51/2d_rgb_res50_seg3_hmdb51_b32_g8_run2', save_frequency=5, scale_ratios='1.0,0.8', teacher=None, temperature=20, train_list='/home/ubuntu/.mxnet/datasets/hmdb51/testTrainMulti_7030_splits/hmdb51_train_split_1_rawframes.txt', use_amp=False, use_decord=False, use_gn=False, use_pretrained=False, use_se=False, use_tsn=False, val_data_dir='~/.mxnet/datasets/ucf101/rawframes', val_list='/home/ubuntu/.mxnet/datasets/hmdb51/testTrainMulti_7030_splits/hmdb51_val_split_1_rawframes.txt', video_loader=False, warmup_epochs=0, warmup_lr=0.0, wd=0.0001)
Total batch size is set to 256 on 8 GPUs
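For context, the sketch below shows how a network matching this configuration could be built (batch_size=32 per GPU x num_gpus=8 gives the total of 256 reported above). It is a minimal, hedged example assuming GluonCV's action-recognition model zoo; the exact keyword arguments of resnet50_v1b_hmdb51 may differ across GluonCV versions, and this is not the training script itself. Printing the network yields an architecture dump like the one that follows.

    import mxnet as mx
    from gluoncv.model_zoo import get_model

    # Build the 3-segment ResNet50-v1b recognizer named in the config above.
    # pretrained_base=False skips the ImageNet weight download for this sketch.
    net = get_model('resnet50_v1b_hmdb51', nclass=51, num_segments=3,
                    pretrained=False, pretrained_base=False)
    net.initialize(ctx=mx.cpu())
    net.hybridize()  # matches mode='hybrid' in the Namespace
    print(net)       # emits an architecture dump like the one below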
ActionRecResNetV1b(
  (conv1): Conv2D(3 -> 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False)
  (bn1): BatchNorm(axis=1, eps=1e-05, momentum=0.9, fix_gamma=False, use_global_stats=False, in_channels=64)
  (relu): Activation(relu)
  (maxpool): MaxPool2D(size=(3, 3), stride=(2, 2), padding=(1, 1), ceil_mode=False, global_pool=False, pool_type=max, layout=NCHW)
  (layer1): HybridSequential(
    (0): BottleneckV1b(
      (conv1): Conv2D(64 -> 64, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn1): BatchNorm(axis=1, eps=1e-05, momentum=0.9, fix_gamma=False, use_global_stats=False, in_channels=64)
      (relu1): Activation(relu)
      (conv2): Conv2D(64 -> 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm(axis=1, eps=1e-05, momentum=0.9, fix_gamma=False, use_global_stats=False, in_channels=64)
      (relu2): Activation(relu)
      (conv3): Conv2D(64 -> 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn3): BatchNorm(axis=1, eps=1e-05, momentum=0.9, fix_gamma=False, use_global_stats=False, in_channels=256)
      (relu3): Activation(relu)
      (downsample): HybridSequential(
        (0): Conv2D(64 -> 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (1): BatchNorm(axis=1, eps=1e-05, momentum=0.9, fix_gamma=False, use_global_stats=False, in_channels=256)
      )
    )
    (1): BottleneckV1b(
      (conv1): Conv2D(256 -> 64, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn1): BatchNorm(axis=1, eps=1e-05, momentum=0.9, fix_gamma=False, use_global_stats=False, in_channels=64)
      (relu1): Activation(relu)
      (conv2): Conv2D(64 -> 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm(axis=1, eps=1e-05, momentum=0.9, fix_gamma=False, use_global_stats=False, in_channels=64)
      (relu2): Activation(relu)
      (conv3): Conv2D(64 -> 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn3): BatchNorm(axis=1, eps=1e-05, momentum=0.9, fix_gamma=False, use_global_stats=False, in_channels=256)
      (relu3): Activation(relu)
    )
    (2): BottleneckV1b(
      (conv1): Conv2D(256 -> 64, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn1): BatchNorm(axis=1, eps=1e-05, momentum=0.9, fix_gamma=False, use_global_stats=False, in_channels=64)
      (relu1): Activation(relu)
      (conv2): Conv2D(64 -> 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm(axis=1, eps=1e-05, momentum=0.9, fix_gamma=False, use_global_stats=False, in_channels=64)
      (relu2): Activation(relu)
      (conv3): Conv2D(64 -> 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn3): BatchNorm(axis=1, eps=1e-05, momentum=0.9, fix_gamma=False, use_global_stats=False, in_channels=256)
      (relu3): Activation(relu)
    )
  )
  (layer2): HybridSequential(
    (0): BottleneckV1b(
      (conv1): Conv2D(256 -> 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn1): BatchNorm(axis=1, eps=1e-05, momentum=0.9, fix_gamma=False, use_global_stats=False, in_channels=128)
      (relu1): Activation(relu)
      (conv2): Conv2D(128 -> 128, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
      (bn2): BatchNorm(axis=1, eps=1e-05, momentum=0.9, fix_gamma=False, use_global_stats=False, in_channels=128)
      (relu2): Activation(relu)
      (conv3): Conv2D(128 -> 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn3): BatchNorm(axis=1, eps=1e-05, momentum=0.9, fix_gamma=False, use_global_stats=False, in_channels=512)
      (relu3): Activation(relu)
      (downsample): HybridSequential(
        (0): Conv2D(256 -> 512, kernel_size=(1, 1), stride=(2, 2), bias=False)
        (1): BatchNorm(axis=1, eps=1e-05, momentum=0.9, fix_gamma=False, use_global_stats=False, in_channels=512)
      )
    )
    (1): BottleneckV1b(
      (conv1): Conv2D(512 -> 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn1): BatchNorm(axis=1, eps=1e-05, momentum=0.9, fix_gamma=False, use_global_stats=False, in_channels=128)
      (relu1): Activation(relu)
      (conv2): Conv2D(128 -> 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm(axis=1, eps=1e-05, momentum=0.9, fix_gamma=False, use_global_stats=False, in_channels=128)
      (relu2): Activation(relu)
      (conv3): Conv2D(128 -> 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn3): BatchNorm(axis=1, eps=1e-05, momentum=0.9, fix_gamma=False, use_global_stats=False, in_channels=512)
      (relu3): Activation(relu)
    )
    (2): BottleneckV1b(
      (conv1): Conv2D(512 -> 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn1): BatchNorm(axis=1, eps=1e-05, momentum=0.9, fix_gamma=False, use_global_stats=False, in_channels=128)
      (relu1): Activation(relu)
      (conv2): Conv2D(128 -> 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm(axis=1, eps=1e-05, momentum=0.9, fix_gamma=False, use_global_stats=False, in_channels=128)
      (relu2): Activation(relu)
      (conv3): Conv2D(128 -> 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn3): BatchNorm(axis=1, eps=1e-05, momentum=0.9, fix_gamma=False, use_global_stats=False, in_channels=512)
      (relu3): Activation(relu)
    )
    (3): BottleneckV1b(
      (conv1): Conv2D(512 -> 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn1): BatchNorm(axis=1, eps=1e-05, momentum=0.9, fix_gamma=False, use_global_stats=False, in_channels=128)
      (relu1): Activation(relu)
      (conv2): Conv2D(128 -> 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm(axis=1, eps=1e-05, momentum=0.9, fix_gamma=False, use_global_stats=False, in_channels=128)
      (relu2): Activation(relu)
      (conv3): Conv2D(128 -> 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn3): BatchNorm(axis=1, eps=1e-05, momentum=0.9, fix_gamma=False, use_global_stats=False, in_channels=512)
      (relu3): Activation(relu)
    )
  )
  (layer3): HybridSequential(
    (0): BottleneckV1b(
      (conv1): Conv2D(512 -> 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn1): BatchNorm(axis=1, eps=1e-05, momentum=0.9, fix_gamma=False, use_global_stats=False, in_channels=256)
      (relu1): Activation(relu)
      (conv2): Conv2D(256 -> 256, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
      (bn2): BatchNorm(axis=1, eps=1e-05, momentum=0.9, fix_gamma=False, use_global_stats=False, in_channels=256)
      (relu2): Activation(relu)
      (conv3): Conv2D(256 -> 1024, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn3): BatchNorm(axis=1, eps=1e-05, momentum=0.9, fix_gamma=False, use_global_stats=False, in_channels=1024)
      (relu3): Activation(relu)
      (downsample): HybridSequential(
        (0): Conv2D(512 -> 1024, kernel_size=(1, 1), stride=(2, 2), bias=False)
        (1): BatchNorm(axis=1, eps=1e-05, momentum=0.9, fix_gamma=False, use_global_stats=False, in_channels=1024)
      )
    )
    (1): BottleneckV1b(
      (conv1): Conv2D(1024 -> 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn1): BatchNorm(axis=1, eps=1e-05, momentum=0.9, fix_gamma=False, use_global_stats=False, in_channels=256)
      (relu1): Activation(relu)
      (conv2): Conv2D(256 -> 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm(axis=1, eps=1e-05, momentum=0.9, fix_gamma=False, use_global_stats=False, in_channels=256)
      (relu2): Activation(relu)
      (conv3): Conv2D(256 -> 1024, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn3): BatchNorm(axis=1, eps=1e-05, momentum=0.9, fix_gamma=False, use_global_stats=False, in_channels=1024)
      (relu3): Activation(relu)
    )
    (2): BottleneckV1b(
      (conv1): Conv2D(1024 -> 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn1): BatchNorm(axis=1, eps=1e-05, momentum=0.9, fix_gamma=False, use_global_stats=False, in_channels=256)
      (relu1): Activation(relu)
      (conv2): Conv2D(256 -> 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm(axis=1, eps=1e-05, momentum=0.9, fix_gamma=False, use_global_stats=False, in_channels=256)
      (relu2): Activation(relu)
      (conv3): Conv2D(256 -> 1024, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn3): BatchNorm(axis=1, eps=1e-05, momentum=0.9, fix_gamma=False, use_global_stats=False, in_channels=1024)
      (relu3): Activation(relu)
    )
    (3): BottleneckV1b(
      (conv1): Conv2D(1024 -> 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn1): BatchNorm(axis=1, eps=1e-05, momentum=0.9, fix_gamma=False, use_global_stats=False, in_channels=256)
      (relu1): Activation(relu)
      (conv2): Conv2D(256 -> 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm(axis=1, eps=1e-05, momentum=0.9, fix_gamma=False, use_global_stats=False, in_channels=256)
      (relu2): Activation(relu)
      (conv3): Conv2D(256 -> 1024, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn3): BatchNorm(axis=1, eps=1e-05, momentum=0.9, fix_gamma=False, use_global_stats=False, in_channels=1024)
      (relu3): Activation(relu)
    )
    (4): BottleneckV1b(
      (conv1): Conv2D(1024 -> 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn1): BatchNorm(axis=1, eps=1e-05, momentum=0.9, fix_gamma=False, use_global_stats=False, in_channels=256)
      (relu1): Activation(relu)
      (conv2): Conv2D(256 -> 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm(axis=1, eps=1e-05, momentum=0.9, fix_gamma=False, use_global_stats=False, in_channels=256)
      (relu2): Activation(relu)
      (conv3): Conv2D(256 -> 1024, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn3): BatchNorm(axis=1, eps=1e-05, momentum=0.9, fix_gamma=False, use_global_stats=False, in_channels=1024)
      (relu3): Activation(relu)
    )
    (5): BottleneckV1b(
      (conv1): Conv2D(1024 -> 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn1): BatchNorm(axis=1, eps=1e-05, momentum=0.9, fix_gamma=False, use_global_stats=False, in_channels=256)
      (relu1): Activation(relu)
      (conv2): Conv2D(256 -> 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm(axis=1, eps=1e-05, momentum=0.9, fix_gamma=False, use_global_stats=False, in_channels=256)
      (relu2): Activation(relu)
      (conv3): Conv2D(256 -> 1024, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn3): BatchNorm(axis=1, eps=1e-05, momentum=0.9, fix_gamma=False, use_global_stats=False, in_channels=1024)
      (relu3): Activation(relu)
    )
  )
  (layer4): HybridSequential(
    (0): BottleneckV1b(
      (conv1): Conv2D(1024 -> 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn1): BatchNorm(axis=1, eps=1e-05, momentum=0.9, fix_gamma=False, use_global_stats=False, in_channels=512)
      (relu1): Activation(relu)
      (conv2): Conv2D(512 -> 512, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
      (bn2): BatchNorm(axis=1, eps=1e-05, momentum=0.9, fix_gamma=False, use_global_stats=False, in_channels=512)
      (relu2): Activation(relu)
      (conv3): Conv2D(512 -> 2048, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn3): BatchNorm(axis=1, eps=1e-05, momentum=0.9, fix_gamma=False, use_global_stats=False, in_channels=2048)
      (relu3): Activation(relu)
      (downsample): HybridSequential(
        (0): Conv2D(1024 -> 2048, kernel_size=(1, 1), stride=(2, 2), bias=False)
        (1): BatchNorm(axis=1, eps=1e-05, momentum=0.9, fix_gamma=False, use_global_stats=False, in_channels=2048)
      )
    )
    (1): BottleneckV1b(
      (conv1): Conv2D(2048 -> 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn1): BatchNorm(axis=1, eps=1e-05, momentum=0.9, fix_gamma=False, use_global_stats=False, in_channels=512)
      (relu1): Activation(relu)
      (conv2): Conv2D(512 -> 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm(axis=1, eps=1e-05, momentum=0.9, fix_gamma=False, use_global_stats=False, in_channels=512)
      (relu2): Activation(relu)
      (conv3): Conv2D(512 -> 2048, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn3): BatchNorm(axis=1, eps=1e-05, momentum=0.9, fix_gamma=False, use_global_stats=False, in_channels=2048)
      (relu3): Activation(relu)
    )
    (2): BottleneckV1b(
      (conv1): Conv2D(2048 -> 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn1): BatchNorm(axis=1, eps=1e-05, momentum=0.9, fix_gamma=False, use_global_stats=False, in_channels=512)
      (relu1): Activation(relu)
      (conv2): Conv2D(512 -> 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm(axis=1, eps=1e-05, momentum=0.9, fix_gamma=False, use_global_stats=False, in_channels=512)
      (relu2): Activation(relu)
      (conv3): Conv2D(512 -> 2048, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn3): BatchNorm(axis=1, eps=1e-05, momentum=0.9, fix_gamma=False, use_global_stats=False, in_channels=2048)
      (relu3): Activation(relu)
    )
  )
  (avgpool): GlobalAvgPool2D(size=(1, 1), stride=(1, 1), padding=(0, 0), ceil_mode=True, global_pool=True, pool_type=avg, layout=NCHW)
  (flat): Flatten
  (drop): Dropout(p = 0.9, axes=())
  (output): Dense(2048 -> 51, linear)
)
Load 3570 training samples and 1530 validation samples.
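As a shape sanity check, the sketch below reuses net from the snippet above. It assumes the 2D recognizers consume frames with clips and segments flattened into the batch axis (the segmental-consensus reshape inside the model restores the clip axis); treat the layout as an assumption about the data loader, not a documented contract.

    # Hypothetical shape check: 2 clips x 3 segments of 224x224 RGB crops.
    x = mx.nd.random.uniform(shape=(2, 3, 3, 224, 224))  # (N, segments, C, H, W)
    y = net(x.reshape((-1, 3, 224, 224)))                # feed (N*segments, C, H, W)
    print(y.shape)                                       # expected: (2, 51)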
[Epoch 000] training: accuracy=10.787260 loss=3.828077
[Epoch 000] speed: 141 samples/sec time cost: 26.926219
[Epoch 000] validation: acc-top1=23.437500 acc-top5=59.609375 loss=3.507447
[Epoch 001] training: accuracy=30.050223 loss=3.275561
[Epoch 001] speed: 321 samples/sec time cost: 15.457790
[Epoch 001] validation: acc-top1=34.375000 acc-top5=68.906250 loss=2.795604
[Epoch 002] training: accuracy=40.513393 loss=2.539770
[Epoch 002] speed: 307 samples/sec time cost: 15.987915
[Epoch 002] validation: acc-top1=39.453125 acc-top5=70.312500 loss=2.293683
[Epoch 003] training: accuracy=46.958705 loss=2.011451
[Epoch 003] speed: 308 samples/sec time cost: 15.940676
[Epoch 003] validation: acc-top1=48.515625 acc-top5=78.984375 loss=1.915870
[Epoch 004] training: accuracy=52.092634 loss=1.719302
[Epoch 004] speed: 311 samples/sec time cost: 15.804543
[Epoch 004] validation: acc-top1=46.093750 acc-top5=75.703125 loss=1.924335
[Epoch 005] training: accuracy=58.789062 loss=1.507441
[Epoch 005] speed: 313 samples/sec time cost: 16.576678
[Epoch 005] validation: acc-top1=51.171875 acc-top5=80.390625 loss=1.779573
[Epoch 006] training: accuracy=63.281250 loss=1.330775
[Epoch 006] speed: 313 samples/sec time cost: 17.278347
[Epoch 006] validation: acc-top1=47.109375 acc-top5=73.437500 loss=1.988118
[Epoch 007] training: accuracy=65.931920 loss=1.201734
[Epoch 007] speed: 321 samples/sec time cost: 15.615073
[Epoch 007] validation: acc-top1=48.750000 acc-top5=79.921875 loss=1.789808
[Epoch 008] training: accuracy=69.893973 loss=1.078497
[Epoch 008] speed: 325 samples/sec time cost: 15.571808
[Epoch 008] validation: acc-top1=51.250000 acc-top5=81.093750 loss=1.774634
[Epoch 009] training: accuracy=69.949777 loss=1.018738
[Epoch 009] speed: 322 samples/sec time cost: 15.726058
[Epoch 009] validation: acc-top1=54.765625 acc-top5=84.687500 loss=1.608350
[Epoch 010] training: accuracy=73.800223 loss=0.912908
[Epoch 010] speed: 298 samples/sec time cost: 16.380226
[Epoch 010] validation: acc-top1=53.046875 acc-top5=83.750000 loss=1.647892
[Epoch 011] training: accuracy=76.395089 loss=0.841036
[Epoch 011] speed: 329 samples/sec time cost: 15.229286
[Epoch 011] validation: acc-top1=48.359375 acc-top5=78.906250 loss=1.923353
[Epoch 012] training: accuracy=77.985491 loss=0.771780
[Epoch 012] speed: 303 samples/sec time cost: 15.891799
[Epoch 012] validation: acc-top1=53.750000 acc-top5=83.359375 loss=1.700463
[Epoch 013] training: accuracy=79.073661 loss=0.717152
[Epoch 013] speed: 301 samples/sec time cost: 16.982080
[Epoch 013] validation: acc-top1=54.140625 acc-top5=85.000000 loss=1.657834
[Epoch 014] training: accuracy=80.245536 loss=0.684261
[Epoch 014] speed: 336 samples/sec time cost: 15.607970
[Epoch 014] validation: acc-top1=53.828125 acc-top5=86.015625 loss=1.729456
[Epoch 015] training: accuracy=80.691964 loss=0.664447
[Epoch 015] speed: 341 samples/sec time cost: 15.332082
[Epoch 015] validation: acc-top1=56.406250 acc-top5=86.250000 loss=1.613456
[Epoch 016] training: accuracy=82.672991 loss=0.611647
[Epoch 016] speed: 332 samples/sec time cost: 15.772933
[Epoch 016] validation: acc-top1=56.796875 acc-top5=86.484375 loss=1.580554
[Epoch 017] training: accuracy=82.338170 loss=0.602388
[Epoch 017] speed: 339 samples/sec time cost: 15.426174
[Epoch 017] validation: acc-top1=57.265625 acc-top5=87.031250 loss=1.588854
[Epoch 018] training: accuracy=83.143029 loss=0.585449
[Epoch 018] speed: 350 samples/sec time cost: 14.005368
[Epoch 018] validation: acc-top1=56.953125 acc-top5=87.109375 loss=1.587513
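The jump in top-1 around epoch 15 (54.140625 at epoch 13 to 56.406250 at epoch 15) lines up with the first step of the learning-rate schedule: lr=0.01 decays by lr_decay=0.1 at epochs 15 and 25, and the third decay point (35) never fires in a 35-epoch run. A small sketch of that schedule follows; lr_at is a hypothetical helper, not part of the training script.

    # Step schedule implied by lr=0.01, lr_decay=0.1, lr_decay_epoch='15,25,35'.
    def lr_at(epoch, base_lr=0.01, decay=0.1, decay_epochs=(15, 25, 35)):
        return base_lr * decay ** sum(epoch >= e for e in decay_epochs)

    for e in (0, 14, 15, 24, 25, 34):
        print(e, lr_at(e))  # 0.01 through epoch 14, ~0.001 from 15, ~0.0001 from 25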
[Epoch 019] training: accuracy=83.900670 loss=0.572270
[Epoch 019] speed: 356 samples/sec time cost: 14.917467
[Epoch 019] validation: acc-top1=58.593750 acc-top5=87.265625 loss=1.550956
[Epoch 020] training: accuracy=84.151786 loss=0.556646
[Epoch 020] speed: 326 samples/sec time cost: 15.634466
[Epoch 020] validation: acc-top1=58.125000 acc-top5=86.718750 loss=1.581718
[Epoch 021] training: accuracy=84.626116 loss=0.542892
[Epoch 021] speed: 335 samples/sec time cost: 14.979640
[Epoch 021] validation: acc-top1=58.359375 acc-top5=87.187500 loss=1.565735
[Epoch 022] training: accuracy=84.849330 loss=0.543301
[Epoch 022] speed: 316 samples/sec time cost: 15.288911
[Epoch 022] validation: acc-top1=57.890625 acc-top5=87.031250 loss=1.581604
[Epoch 023] training: accuracy=84.737723 loss=0.532924
[Epoch 023] speed: 350 samples/sec time cost: 15.637734
[Epoch 023] validation: acc-top1=57.656250 acc-top5=86.406250 loss=1.598465
[Epoch 024] training: accuracy=85.295759 loss=0.521783
[Epoch 024] speed: 343 samples/sec time cost: 14.920343
[Epoch 024] validation: acc-top1=57.578125 acc-top5=87.109375 loss=1.592211
[Epoch 025] training: accuracy=84.095982 loss=0.529248
[Epoch 025] speed: 350 samples/sec time cost: 14.490086
[Epoch 025] validation: acc-top1=56.953125 acc-top5=86.953125 loss=1.591435
[Epoch 026] training: accuracy=84.821429 loss=0.523576
[Epoch 026] speed: 319 samples/sec time cost: 15.871717
[Epoch 026] validation: acc-top1=57.890625 acc-top5=87.109375 loss=1.582557
[Epoch 027] training: accuracy=84.626116 loss=0.525866
[Epoch 027] speed: 344 samples/sec time cost: 14.901063
[Epoch 027] validation: acc-top1=57.812500 acc-top5=87.031250 loss=1.574968
[Epoch 028] training: accuracy=84.040179 loss=0.547114
[Epoch 028] speed: 338 samples/sec time cost: 15.139198
[Epoch 028] validation: acc-top1=58.125000 acc-top5=87.265625 loss=1.574651
[Epoch 029] training: accuracy=84.626116 loss=0.533297
[Epoch 029] speed: 353 samples/sec time cost: 14.094021
[Epoch 029] validation: acc-top1=58.046875 acc-top5=87.031250 loss=1.579369
[Epoch 030] training: accuracy=84.793527 loss=0.531224
[Epoch 030] speed: 353 samples/sec time cost: 14.149825
[Epoch 030] validation: acc-top1=57.890625 acc-top5=87.187500 loss=1.585400
[Epoch 031] training: accuracy=85.128348 loss=0.530558
[Epoch 031] speed: 342 samples/sec time cost: 15.208911
[Epoch 031] validation: acc-top1=57.500000 acc-top5=87.265625 loss=1.581352
[Epoch 032] training: accuracy=85.686384 loss=0.514999
[Epoch 032] speed: 347 samples/sec time cost: 14.534368
[Epoch 032] validation: acc-top1=58.203125 acc-top5=87.109375 loss=1.587227
[Epoch 033] training: accuracy=85.044643 loss=0.523267
[Epoch 033] speed: 343 samples/sec time cost: 15.035856
[Epoch 033] validation: acc-top1=57.500000 acc-top5=86.484375 loss=1.592650
[Epoch 034] training: accuracy=85.881696 loss=0.506416
[Epoch 034] speed: 329 samples/sec time cost: 16.005230
[Epoch 034] validation: acc-top1=56.953125 acc-top5=86.875000 loss=1.594530
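Validation top-1 plateaus near 58% after the second decay at epoch 25; by this log the best epoch is 019 at 58.593750 top-1. A short post-processing sketch for recovering that from the log file named in the config (a hypothetical snippet, not part of the training run):

    import re

    # Pick the best top-1 validation accuracy out of the epoch lines above.
    pattern = re.compile(r'\[Epoch (\d+)\] validation: acc-top1=([\d.]+)')
    with open('2d_rgb_res50_seg3_hmdb51_b32_g8_run2.txt') as f:
        scores = [(float(acc), int(ep)) for ep, acc in pattern.findall(f.read())]
    best_acc, best_epoch = max(scores)
    print('best top-1 %.6f at epoch %03d' % (best_acc, best_epoch))
    # -> best top-1 58.593750 at epoch 019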