# Age, Gender, and Race Classification with Multi-output Convolutional Neural Networks

This project built a multi-output deep convolutional neural network to classify the age, gender, and race of each image in the UTKFace dataset, reaching an accuracy of 91.22% for gender and 81.23% for race. The best model consists of 16 convolutional layers, 3 fully connected layers for each output (age, gender, and race), and a final 8-way softmax for the age class, a 2-way softmax for the gender class, and a 5-way softmax for the race class. To reduce overfitting, this project 1) added max-pooling and batch normalization layers between successive convolutional layers and a dropout layer before each fully connected layer, 2) applied data augmentation, and 3) used early stopping.

## 1 Dataset

### 1.1 Overview

*[image: sample faces from the UTKFace dataset]*

The dataset used in this project is the UTKFace dataset, which consists of over 20,000 face images with annotations of age, gender, and ethnicity. As shown in the picture above, although the images are properly cropped and contain only the face region, they vary in pose, facial expression, illumination, etc., which is why this project applies data augmentation. For more information about the dataset, please see [this website](http://aicip.eecs.utk.edu/wiki/UTKFace).

### 1.2 Distribution of Gender, Age, and Race in the Dataset

#### 1.2.1 Gender

*[image: gender distribution]*

#### 1.2.2 Age

*[image: age distribution]*

#### 1.2.3 Race

*[image: race distribution]*

## 2 Strategies Applied for Reducing Overfitting

### 2.1 Max-pooling

Overfitting happens when the training dataset is not large enough to contain all the features present in the whole dataset. Adding max-pooling layers reduces the spatial size of the feature maps and the number of parameters (i.e., only the feature with the maximum value in each pooling window is kept), so the model is less likely to learn false patterns.

### 2.2 Batch Normalization

Regularization introduces additional information to the model and thus reduces overfitting. One problem often encountered when training deep neural networks is internal covariate shift, which is caused by the changing distribution of each layer's inputs during training. Batch normalization mitigates this problem by reducing the amount by which the hidden unit values shift around, and it also lets each layer learn more independently of the other layers.

### 2.3 Dropout

Dropout randomly deactivates a fraction of the units during training, which prevents units from co-adapting and forces the network to learn more robust features. In this project, a dropout layer is placed before each fully connected layer.

### 2.4 Early Stopping

One of the major challenges in training a deep neural network is deciding when to stop training: with too short a training time, underfitting occurs, while with too long a training time, overfitting occurs. This project therefore also applies early stopping, i.e., training stops when performance on the validation dataset starts to degrade.

### 2.5 Data Augmentation

Data augmentation reduces overfitting by enlarging the training set with label-preserving transformations, which keeps the model from learning false patterns. It has been widely used and has shown effective results in image classification. A sketch showing how the strategies in this section fit together appears at the end of Section 3. Below is an example of one image from the dataset after data augmentation:

*[image: augmented versions of a sample image]*

## 3 Model Architecture

### 3.1 Model 1

*[image: Model 1 architecture]*

Model 1 has 5 convolutional layers and 3 fully connected layers for each output.

### 3.2 Model 2

*[image: Model 2 architecture]*

Model 2 has 10 convolutional layers and 3 fully connected layers for each output.

### 3.3 Model 3

*[image: Model 3 architecture]*

Model 3 has 16 convolutional layers with residual learning and 3 fully connected layers for each output.
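To make the pieces above concrete, below is a minimal Keras (functional API) sketch of a multi-output CNN in the spirit of Models 1-3, not the exact 16-layer architecture used here: shared convolutional blocks with batch normalization and max-pooling, then a dropout-regularized 3-layer fully connected head per task, ending in 8-way (age), 2-way (gender), and 5-way (race) softmax outputs. The input resolution, filter counts, and dense sizes are assumptions for illustration.

```python
# Minimal multi-output CNN sketch (hypothetical sizes, not the project's exact model).
from tensorflow.keras import Input, Model
from tensorflow.keras.layers import (Conv2D, BatchNormalization, MaxPooling2D,
                                     Flatten, Dense, Dropout)

def conv_block(x, filters):
    # Two convolutions, then batch normalization and max-pooling, which
    # halves the spatial size and reduces the parameter count (Section 2.1-2.2).
    x = Conv2D(filters, 3, padding="same", activation="relu")(x)
    x = Conv2D(filters, 3, padding="same", activation="relu")(x)
    x = BatchNormalization()(x)
    return MaxPooling2D(2)(x)

def head(x, units, n_classes, name):
    # Task-specific head: 3 fully connected layers, with dropout before
    # each of them (Section 2.3); the last layer is the softmax output.
    for _ in range(2):
        x = Dropout(0.5)(x)
        x = Dense(units, activation="relu")(x)
    x = Dropout(0.5)(x)
    return Dense(n_classes, activation="softmax", name=name)(x)

inputs = Input(shape=(128, 128, 3))   # assumed input resolution
x = inputs
for filters in (32, 64, 128):         # assumed filter counts
    x = conv_block(x, filters)
x = Flatten()(x)

model = Model(inputs, [head(x, 256, 8, "age"),
                       head(x, 256, 2, "gender"),
                       head(x, 256, 5, "race")])
model.summary()
```

The three heads share one convolutional trunk, so the face features are learned once and only the classification layers are task-specific.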
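A minimal training sketch for the remaining two strategies from Section 2, data augmentation and early stopping, wired up for the three outputs of the model above. The variable names (`x_train`, `y_age`, and so on) are hypothetical placeholders for the preprocessed UTKFace arrays, and the augmentation and patience settings are assumptions.

```python
# Training sketch with augmentation and early stopping (hypothetical variable names).
import tensorflow as tf
from tensorflow.keras.callbacks import EarlyStopping

# One categorical loss per output; the dict keys match the output layer names.
model.compile(optimizer="adam",
              loss={"age": "categorical_crossentropy",
                    "gender": "categorical_crossentropy",
                    "race": "categorical_crossentropy"},
              metrics=["accuracy"])

def augment(image, labels):
    # Label-preserving transformations enlarge the effective training set (Section 2.5).
    image = tf.image.random_flip_left_right(image)
    image = tf.image.random_brightness(image, max_delta=0.1)
    return image, labels

# x_train etc. are placeholders; the y_* arrays are assumed one-hot encoded.
train_ds = (tf.data.Dataset
            .from_tensor_slices((x_train, {"age": y_age,
                                           "gender": y_gender,
                                           "race": y_race}))
            .shuffle(1000)
            .map(augment)
            .batch(64))

val_ds = tf.data.Dataset.from_tensor_slices(
    (x_val, {"age": y_age_val, "gender": y_gender_val, "race": y_race_val})
).batch(64)

# Stop once validation loss stops improving and keep the best weights (Section 2.4).
stopper = EarlyStopping(monitor="val_loss", patience=5,
                        restore_best_weights=True)

model.fit(train_ds, validation_data=val_ds, epochs=100, callbacks=[stopper])
```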
## 4 Results

Model 3 has the best performance: after adding more convolutional layers and applying residual learning, the classification accuracy improved for age, gender, and race. With Model 3, the accuracies for classifying age, gender, and race are 59.49%, 91.22%, and 81.23%, respectively. Below are Model 3's predictions for 16 randomly selected images:

*[image: Model 3 predictions on 16 randomly selected images]*

## References

- UTKFace dataset: http://aicip.eecs.utk.edu/wiki/UTKFace
- Keras multi-output (functional API) documentation: https://keras.io/getting-started/functional-api-guide/
- Krizhevsky, A., Sutskever, I., & Hinton, G. E. (2012). ImageNet Classification with Deep Convolutional Neural Networks. In NIPS.
- Simonyan, K., & Zisserman, A. (2015). Very Deep Convolutional Networks for Large-Scale Image Recognition. In ICLR.
- Wang, J., & Perez, L. (2017). The Effectiveness of Data Augmentation in Image Classification using Deep Learning. http://cs231n.stanford.edu/reports/2017/pdfs/300.pdf
- Zhang, C., Vinyals, O., Munos, R., et al. (2018). A Study on Overfitting in Deep Reinforcement Learning. https://arxiv.org/pdf/1804.06893.pdf
- Ioffe, S., & Szegedy, C. (2015). Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift. CoRR, abs/1502.03167.
- Hernandez-Garcia, A., & König, P. (2018). Do deep nets really need weight decay and dropout? https://arxiv.org/pdf/1802.07042.pdf
- Prechelt, L. (1996). Early Stopping - But When? In G. B. Orr & K.-R. Müller (Eds.), Neural Networks: Tricks of the Trade (Vol. 1524, pp. 55-69). Springer.
- Wong, S. C., Gatt, A., Stamatescu, V., & McDonnell, M. D. (2016). Understanding Data Augmentation for Classification: When to Warp? CoRR, abs/1609.08764.