---
title: "Neural Networks and Deep Learning - Part II (Practice)"
author: "Dr. Hua Zhou @ UCLA"
date: "2/22/2022"
subtitle: Biostat 203B
output:
  html_document:
    toc: true
    toc_depth: 4
---

```{r setup, include=FALSE}
options(width = 120)
knitr::opts_chunk$set(echo = TRUE)
sessionInfo()
```

## Learning sources

This lecture draws heavily on the following sources.

- _Learning Deep Learning_ lectures by Dr. Qiyang Hu (UCLA Office of Advanced Research Computing):

## Software

- High-level software focuses on a user-friendly interface for specifying and training models: [Keras](https://keras.io), [scikit-learn](http://scikit-learn.org/stable/), ...

- Lower-level software focuses on developer tools for implementing deep learning models: [TensorFlow](https://www.tensorflow.org), [PyTorch](http://pytorch.org), [Theano](http://deeplearning.net/software/theano/#), [CNTK](https://github.com/Microsoft/CNTK), [Caffe](http://caffe.berkeleyvision.org), [Torch](http://torch.ch), ...

- Most tools are developed in Python plus a low-level language (C/C++, CUDA).

## TensorFlow

- Developed by the Google Brain team for internal Google use. Formerly DistBelief.

- Open sourced in Nov 2015.

- OS: Linux, MacOS, and Windows (since Nov 2016).

- GPU support: NVIDIA CUDA.

- TPU (tensor processing unit), built specifically for machine learning and tailored for TensorFlow.

- Mobile device deployment: TensorFlow Lite (May 2017) for Android and iOS.

![](./tf_toolkit_hierarchy.png){width=600px}

- Used in a variety of Google apps: speech recognition (Google Assistant), Gmail (Smart Reply), search, translate, self-driving cars, ...

> When you have a hammer, everything looks like a nail.

![](./hammer.jpg){width=200px}

- [Machine Learning Crash Course (MLCC)](https://developers.google.com/machine-learning/crash-course/?utm_source=google-ai&utm_medium=card-image&utm_campaign=training-hub&utm_content=ml-crash-course). A 15-hour workshop available to the public since March 1, 2018.

## R/RStudio

R users can access Keras and TensorFlow via the `keras` and `tensorflow` packages.

```{r, eval=FALSE}
# install.packages("keras")
library(keras)
install_keras()
# install_keras(tensorflow = "gpu") # if an NVIDIA GPU is available
```

On the teaching server, it may be necessary to run

```{r, eval=FALSE}
library(reticulate)
virtualenv_create("r-reticulate")
```

to create a virtual environment `~/.virtualenvs/r-reticulate` in which Keras is installed locally.
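
After installation, a quick sanity check that R can see the Python backend. This is a minimal sketch; `is_keras_available()` and `tf_config()` are exported by the `keras` and `tensorflow` R packages, respectively.

```{r, eval=FALSE}
library(keras)
# TRUE if R can import the Keras Python backend
is_keras_available()
# Report which TensorFlow build reticulate found
tensorflow::tf_config()
```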

## Workflow for a deep learning network
![](./dl_workflow.png){width=750px}

### Step 1: Data ingestion, preparation, and processing

![](./data_scientists.png){width=750px}

Source: [CrowdFlower](https://visit.figure-eight.com/rs/416-ZBE-142/images/CrowdFlower_DataScienceReport_2016.pdf)

- The most time-consuming but the most _creative_ job. Takes $>80\%$ of the time and requires experience and domain knowledge.

- Determines the upper limit for the goodness of DL: "garbage in, garbage out".

- For structured/tabular data:

![](./dataprep_tabular_data.png){width=500px}
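
As a concrete illustration of the figure above, a minimal sketch of tabular data prep in R. The `x`/`y` data here are made up; `to_categorical()` is from the `keras` package.

```{r, eval=FALSE}
library(keras)
# Hypothetical tabular data: 20 numeric features, class labels in 0..9
x <- matrix(rnorm(1000 * 20), nrow = 1000, ncol = 20)
y <- sample(0:9, 1000, replace = TRUE)
# Standardize each feature to mean 0, sd 1
x_scaled <- scale(x)
# One-hot encode the labels for a softmax output layer
y_onehot <- to_categorical(y, num_classes = 10)
# Hold out 20% of rows for testing
test_idx <- sample(nrow(x_scaled), 0.2 * nrow(x_scaled))
x_train <- x_scaled[-test_idx, ]
y_train <- y_onehot[-test_idx, ]
```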

- Data prep for special DL tasks (a tokenization/embedding sketch follows this list).

    - Image data: pixel scaling, train-time augmentation, test-time augmentation, convolution and flattening.

    - Data tokenization: break sequences into units, map units to vectors, align and pad sequences.

    - Data embedding: sparse to dense, merge diverse data, preserve relationships, dimension reduction, Word2Vec; can be part of model training.
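
A minimal sketch of tokenization and embedding with the `keras` R interface. The example texts are made up; `text_tokenizer()`, `pad_sequences()`, and `layer_embedding()` are keras functions.

```{r, eval=FALSE}
library(keras)
texts <- c("the cat sat on the mat", "the dog ate my homework")
# Break sequences into units (words) and map each unit to an integer id
tokenizer <- text_tokenizer(num_words = 1000) %>% fit_text_tokenizer(texts)
seqs <- texts_to_sequences(tokenizer, texts)
# Align sequences by padding them to a common length
x <- pad_sequences(seqs, maxlen = 10)
# Map sparse integer ids to dense 8-dimensional vectors, learned during training
model <- keras_model_sequential() %>%
  layer_embedding(input_dim = 1000, output_dim = 8, input_length = 10)
```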

### Step 2: Select neural network

- Architecture.
![](./NeuralNetworkZo19High.png){width=500px}

Source:

- Activation function.

![](./choose_activation.png){width=750px}
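
For instance, a minimal sketch of where activations are specified in the `keras` R interface; the layer sizes here are arbitrary.

```{r, eval=FALSE}
library(keras)
model <- keras_model_sequential() %>%
  # ReLU is the usual default for hidden layers
  layer_dense(units = 64, activation = "relu", input_shape = c(20)) %>%
  layer_dense(units = 64, activation = "tanh") %>%
  # Softmax turns the 10 output scores into class probabilities;
  # use "sigmoid" with 1 unit for binary classification
  layer_dense(units = 10, activation = "softmax")
```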

### Step 3: Select loss function

- Regression loss: MSE/quadratic loss/L2 loss, mean absolute error/L1 loss.

- Classification loss: cross-entropy loss, ...

- Customized losses.

### Step 4: Train and evaluate model

- Choose an optimization algorithm: generalization (SGD) vs convergence rate (adaptive). A `compile()`/`fit()` sketch follows the cross-validation figure below.

    - Stochastic GD.

    - Adding momentum: classical momentum, Nesterov acceleration.

    - Adaptive learning rate: AdaGrad, AdaDelta, RMSprop.

    - Combining acceleration and adaptive learning rate: ADAM (default in many libraries).

    - Beyond ADAM: [lookahead](https://arxiv.org/abs/1907.08610), [RAdam](https://arxiv.org/abs/1908.03265), [AdaBound/AmsBound](https://syncedreview.com/2019/03/07/iclr-2019-fast-as-adam-good-as-sgd-new-optimizer-has-both/), [Ranger](https://arxiv.org/abs/1908.00700v2), [AdaBelief](https://arxiv.org/abs/2010.07468).

    _A Visual Explanation of Gradient Descent Methods (Momentum, AdaGrad, RMSProp, Adam)_ by Lili Jiang:

- Fitness of model: underfitting vs overfitting.

![](./overfitting_vs_underfitting.png){width=500px}

Source:

- Model selection: $K$-fold cross validation.

![](./cross_validation.png){width=750px}
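
Putting Steps 3 and 4 together, a minimal sketch of compiling and fitting with the `keras` R interface. It assumes `model`, `x_train`, and `y_train` from the sketches above, and uses a single held-out validation split rather than full $K$-fold CV.

```{r, eval=FALSE}
library(keras)
model %>% compile(
  loss = "categorical_crossentropy", # Step 3: classification loss
  optimizer = optimizer_adam(),      # Step 4: ADAM; optimizer_sgd(), optimizer_rmsprop(), ...
  metrics = "accuracy"
)
history <- model %>% fit(
  x_train, y_train,
  epochs = 20, batch_size = 128,
  validation_split = 0.2             # hold out 20% to monitor over/underfitting
)
plot(history)                        # training vs validation curves
```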

## Example: MNIST - MLP

[Rmd](https://raw.githubusercontent.com/ucla-biostat-203b/2022winter/main/slides/15-nn/mnist_mlp/mnist_mlp.Rmd), [html](./mnist_mlp/mnist_mlp.html).

## Example: MNIST - CNN

[Rmd](https://raw.githubusercontent.com/ucla-biostat-203b/2022winter/main/slides/15-nn/mnist_cnn/mnist_cnn.Rmd), [html](./mnist_cnn/mnist_cnn.html).

## Example: Generate text from Nietzsche’s writings - RNN LSTM

[Rmd](https://raw.githubusercontent.com/ucla-biostat-203b/2022winter/main/slides/15-nn/nietzsche_lstm/nietzsche_lstm.Rmd), [html](./nietzsche_lstm/nietzsche_lstm.html).

## Example: IMDB review sentiment analysis - RNN LSTM

[Rmd](https://raw.githubusercontent.com/ucla-biostat-203b/2022winter/main/slides/15-nn/imdb_lstm/imdb_lstm.Rmd), [html](./imdb_lstm/imdb_lstm.html).

## Example: Generate handwritten digits from MNIST - GAN

[Rmd](https://raw.githubusercontent.com/ucla-biostat-203b/2022winter/main/slides/15-nn/mnist_acgan/mnist_acgan.Rmd), [html](./mnist_acgan/mnist_acgan.html).