--- title: "CNN on the CIFAR100 Data" subtitle: "Econ 425T / Biostat 203B" author: "Dr. Hua Zhou @ UCLA" date: "`r format(Sys.time(), '%d %B, %Y')`" format: html: theme: cosmo number-sections: true toc: true toc-depth: 4 toc-location: left code-fold: false engine: knitr knitr: opts_chunk: fig.align: 'center' # fig.width: 6 # fig.height: 4 message: FALSE cache: false --- Display system information for reproducibility. ::: {.panel-tabset} #### Python ```{python} import IPython print(IPython.sys_info()) ``` #### R ```{r} sessionInfo() ``` ::: Load some libraries. ::: {.panel-tabset} #### Python ```{python} # Load the pandas library import pandas as pd # Load numpy for array manipulation import numpy as np # Load seaborn plotting library import seaborn as sns import matplotlib.pyplot as plt # Set font sizes in plots sns.set(font_scale = 1.2) # Display all columns pd.set_option('display.max_columns', None) # Load Tensorflow and Keras import tensorflow as tf from tensorflow import keras from tensorflow.keras import layers ``` #### R ```{r} library(keras) library(jpeg) ``` ::: In this example, we train a CNN (convolution neural network) on the [CIFAR100](https://www.cs.toronto.edu/~kriz/cifar.html) data set. Achieve testing accuracy 44.5% after 30 epochs. Random guess would have an accuracy of about 1%. - The **CIFAR100** database is a large database of $32 \times 32$ color images that is commonly used for training and testing machine learning algorithms. - 50,000 training images, 10,000 testing images. ## Prepare data Acquire data: ::: {.panel-tabset} #### Python ```{python} # Load the data and split it between train and test sets (x_train, y_train), (x_test, y_test) = keras.datasets.cifar100.load_data() # Training set x_train.shape y_train.shape # Test set x_test.shape y_test.shape ``` #### R ```{r} cifar100 <- dataset_cifar100() x_train <- cifar100$train$x y_train <- cifar100$train$y x_test <- cifar100$test$x y_test <- cifar100$test$y ``` Training set: ```{r} dim(x_train) dim(y_train) ``` Testing set: ```{r} dim(y_train) dim(y_test) ``` ::: For CNN, we keep the $32 \times 32 \times 3$ tensor structure, instead of vectorizing into a long vector. 
::: {.panel-tabset}

#### Python

```{python}
# Rescale
x_train = x_train / 255
x_test = x_test / 255

# Train
x_train.shape
# Test
x_test.shape
```

#### R

```{r}
# Rescale
x_train <- x_train / 255
x_test <- x_test / 255

dim(x_train)
dim(x_test)
```

:::

Encode $y$ as a binary class matrix:

::: {.panel-tabset}

#### Python

```{python}
y_train = keras.utils.to_categorical(y_train, 100)
y_test = keras.utils.to_categorical(y_test, 100)

# Train
y_train.shape
# Test
y_test.shape
# First train instance
y_train[0]
```

#### R

```{r}
y_train <- to_categorical(y_train, 100)
y_test <- to_categorical(y_test, 100)

dim(y_train)
dim(y_test)
# head(y_train)
```

:::

Show a few images:

::: {.panel-tabset}

#### Python

```{python}
# Feature: 32x32 color image
for i in range(25):
  plt.figure()
  plt.imshow(x_train[i]);
  plt.show()
```

#### R

```{r}
par(mar = c(0, 0, 0, 0), mfrow = c(5, 5))
index <- sample(seq(50000), 25)
for (i in index) plot(as.raster(x_train[i, , , ]))
```

:::

## Define the model

Define a **sequential model** (a linear stack of layers) with four convolution + max-pooling blocks (32, 64, 128, and 256 filters), followed by dropout and a fully-connected hidden layer (512 neurons):

::: {.panel-tabset}

#### Python

```{python}
model = keras.Sequential(
  [
    keras.Input(shape = (32, 32, 3)),
    layers.Conv2D(
      filters = 32, 
      kernel_size = (3, 3), 
      padding = 'same',
      activation = 'relu', 
      # input_shape = (32, 32, 3)
      ),
    layers.MaxPooling2D(pool_size = (2, 2)),
    layers.Conv2D(
      filters = 64, 
      kernel_size = (3, 3), 
      padding = 'same',
      activation = 'relu'
      ),
    layers.MaxPooling2D(pool_size = (2, 2)),
    layers.Conv2D(
      filters = 128, 
      kernel_size = (3, 3), 
      padding = 'same',
      activation = 'relu'
      ),
    layers.MaxPooling2D(pool_size = (2, 2)),
    layers.Conv2D(
      filters = 256, 
      kernel_size = (3, 3), 
      padding = 'same',
      activation = 'relu'
      ),
    layers.MaxPooling2D(pool_size = (2, 2)),
    layers.Flatten(),
    layers.Dropout(rate = 0.5),
    layers.Dense(units = 512, activation = 'relu'),
    layers.Dense(units = 100, activation = 'softmax')
  ]
)

model.summary()
```

Plot the model:

```{python}
tf.keras.utils.plot_model(
    model,
    to_file = "model.png",
    show_shapes = True,
    show_dtype = False,
    show_layer_names = True,
    rankdir = "TB",
    expand_nested = False,
    dpi = 96,
    layer_range = None,
    show_layer_activations = False,
)
```
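Note that `tf.keras.utils.plot_model` requires the `pydot` Python package and the Graphviz binaries; a guarded call (a sketch, not evaluated here) skips the plot gracefully when they are missing:

```{python}
#| eval: false
# plot_model raises ImportError when pydot / Graphviz are unavailable
try:
    tf.keras.utils.plot_model(model, to_file = "model.png", show_shapes = True)
except ImportError as err:
    print("Skipping model plot:", err)
```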

![](./model.png){width=500px}
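As a sanity check on the architecture (a sketch using the `model` defined above): with `padding = 'same'` each convolution preserves the spatial size, and each $2 \times 2$ max-pooling halves it, $32 \to 16 \to 8 \to 4 \to 2$, so the `Flatten` layer feeds $2 \times 2 \times 256 = 1024$ features to the 512-neuron dense layer.

```{python}
# Print the output shape of each layer;
# the spatial size halves at every pooling step
for layer in model.layers:
  print(layer.name, layer.output_shape)
```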

#### R

```{r}
model <- keras_model_sequential() %>%
  layer_conv_2d(
    filters = 32,
    kernel_size = c(3, 3),
    padding = "same",
    activation = "relu",
    input_shape = c(32, 32, 3)
    ) %>%
  layer_max_pooling_2d(pool_size = c(2, 2)) %>%
  layer_conv_2d(
    filters = 64,
    kernel_size = c(3, 3),
    padding = "same",
    activation = "relu"
    ) %>%
  layer_max_pooling_2d(pool_size = c(2, 2)) %>%
  layer_conv_2d(
    filters = 128,
    kernel_size = c(3, 3),
    padding = "same",
    activation = "relu"
    ) %>%
  layer_max_pooling_2d(pool_size = c(2, 2)) %>%
  layer_conv_2d(
    filters = 256,
    kernel_size = c(3, 3),
    padding = "same",
    activation = "relu"
    ) %>%
  layer_max_pooling_2d(pool_size = c(2, 2)) %>%
  layer_flatten() %>%
  layer_dropout(rate = 0.5) %>%
  layer_dense(units = 512, activation = "relu") %>%
  layer_dense(units = 100, activation = "softmax")

summary(model)
```

:::

Compile the model with appropriate loss function, optimizer, and metrics:

::: {.panel-tabset}

#### Python

```{python}
model.compile(
  loss = "categorical_crossentropy",
  optimizer = "rmsprop",
  metrics = ["accuracy"]
)
```

#### R

```{r}
model %>% compile(
  loss = 'categorical_crossentropy',
  optimizer = optimizer_rmsprop(),
  metrics = c('accuracy')
)
```

:::

## Training and validation

We use an 80%/20% split for the training/validation sets. On my laptop, each epoch takes about half a minute.

::: {.panel-tabset}

#### Python

```{python}
#| output: false
batch_size = 128
epochs = 30

history = model.fit(
  x_train,
  y_train,
  batch_size = batch_size,
  epochs = epochs,
  validation_split = 0.2
)
```

Plot training history:

```{python}
#| code-fold: true
hist = pd.DataFrame(history.history)
hist['epoch'] = np.arange(1, epochs + 1)
hist = hist.melt(
  id_vars = ['epoch'],
  value_vars = ['loss', 'accuracy', 'val_loss', 'val_accuracy'],
  var_name = 'type',
  value_name = 'value'
)
hist['split'] = np.where(['val' in s for s in hist['type']], 'validation', 'train')
hist['metric'] = np.where(['loss' in s for s in hist['type']], 'loss', 'accuracy')

# Accuracy trace plot
plt.figure()
sns.relplot(
  data = hist[hist['metric'] == 'accuracy'],
  kind = 'scatter',
  x = 'epoch',
  y = 'value',
  hue = 'split'
).set(
  xlabel = 'Epoch',
  ylabel = 'Accuracy'
);
plt.show()

# Loss trace plot
plt.figure()
sns.relplot(
  data = hist[hist['metric'] == 'loss'],
  kind = 'scatter',
  x = 'epoch',
  y = 'value',
  hue = 'split'
).set(
  xlabel = 'Epoch',
  ylabel = 'Loss'
);
plt.show()
```

#### R

```{r}
system.time({
  history <- model %>% fit(
    x_train,
    y_train,
    epochs = 30,
    batch_size = 128,
    validation_split = 0.2
  )
})

plot(history)
```

:::

## Testing

Evaluate model performance on the test data:

::: {.panel-tabset}

#### Python

```{python}
score = model.evaluate(x_test, y_test, verbose = 0)
print("Test loss:", score[0])
print("Test accuracy:", score[1])
```

#### R

```{r}
model %>% evaluate(x_test, y_test)
```

Generate predictions on new data:

```{r}
model %>% predict(x_test) %>% k_argmax()
```

:::
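With 100 classes, top-5 accuracy is also commonly reported. A short sketch in Python (assuming the fitted `model` and the one-hot `y_test` from above): a prediction counts as correct if the true class is among the five highest-probability classes.

```{python}
# Top-5 accuracy on the test set
probs = model.predict(x_test)
top5 = tf.keras.metrics.top_k_categorical_accuracy(y_test, probs, k = 5)
print("Test top-5 accuracy:", float(tf.reduce_mean(top5)))
```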