---
title: "Neural Network and Deep Learning (Practice)"
subtitle: "Econ 425T / Biostat 203B"
author: "Dr. Hua Zhou @ UCLA"
date: "`r format(Sys.time(), '%d %B, %Y')`"
format:
  html:
    theme: cosmo
    embed-resources: true
    number-sections: true
    toc: true
    toc-depth: 4
    toc-location: left
    code-fold: false
engine: knitr
knitr:
  opts_chunk: 
    fig.align: 'center'
    # fig.width: 6
    # fig.height: 4
    message: FALSE
    cache: false
---

```{r setup, include=FALSE}
options(width = 120)
knitr::opts_chunk$set(echo = TRUE)
sessionInfo()
```

## Learning sources

This lecture draws heavily on the following sources.

- [_Deep Learning with Python_](https://www.manning.com/books/deep-learning-with-python) by Francois Chollet.

- [_Deep Learning Tuning Playbook_](https://github.com/google-research/tuning_playbook) by Google Research.

- _Learning Deep Learning_ lectures by Dr. Qiyang Hu (UCLA Office of Advanced Research Computing):

## Software

- High-level software focuses on a user-friendly interface for specifying and training models: [Keras](https://keras.io), [PyTorch](http://pytorch.org), [scikit-learn](http://scikit-learn.org/stable/), ...

- Lower-level software focuses on developer tools for implementing deep learning models: [TensorFlow](https://www.tensorflow.org), [PyTorch](http://pytorch.org), [CNTK](https://github.com/Microsoft/CNTK), [Theano](https://github.com/Theano/Theano) (development stopped!), [Caffe](http://caffe.berkeleyvision.org), [Torch](http://torch.ch), ...

- Most tools are developed in Python plus a low-level language (C/C++, CUDA).

![](./karas-pytorch-tensorflow.png){width=500px}

Source:

## TensorFlow

- Developed by the Google Brain team for internal Google use. Formerly DistBelief.

- Open sourced in Nov 2015.

- OS: Linux, macOS, and Windows (since Nov 2016).

- GPU support: NVIDIA CUDA.

- TPU (tensor processing unit): an accelerator built specifically for machine learning and tailored for TensorFlow.

- Mobile device deployment: TensorFlow Lite (May 2017) for Android and iOS.

![](./tf_toolkit_hierarchy.png){width=600px}

- TensorFlow supports [distributed training](https://www.tensorflow.org/guide/distributed_training).

- TensorFlow does not support Apple Silicon (M1/M2) directly, but Apple provides the `tensorflow-macos` package for running on M1/M2 GPUs.

- Used in a variety of Google apps: speech recognition (Google Assistant), Gmail (Smart Reply), search, translate, self-driving cars, ...

> When you have a hammer, everything looks like a nail.

![](./hammer.jpg){width=200px}

## Workflow for a deep learning network

![](./dl_workflow.png){width=750px}

### Step 1: Data ingestion, preparation, and processing

![](./data_scientists.png){width=750px}

Source: [CrowdFlower](https://visit.figure-eight.com/rs/416-ZBE-142/images/CrowdFlower_DataScienceReport_2016.pdf)

- The most time-consuming but also the most _creative_ job. It takes >80% of the time and requires experience and domain knowledge.

- Determines the upper limit for the goodness of DL: ``garbage in, garbage out``.

- For structured/tabular data (a preprocessing sketch follows the figure below):

![](./dataprep_tabular_data.png){width=500px}
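As a concrete illustration, here is a minimal sketch of two common tabular preprocessing steps in base R: standardizing a numeric feature and one-hot encoding a categorical feature. The toy data frame `df` and its columns are made up for illustration.

```{r, eval = FALSE}
# Toy data frame (hypothetical): one numeric and one categorical feature
df <- data.frame(
  income = c(50000, 82000, 61000, 95000),
  region = factor(c("west", "east", "east", "south"))
)

# Standardize the numeric feature to mean 0, sd 1
df$income_std <- as.numeric(scale(df$income))

# One-hot encode the categorical feature (drop the intercept column)
onehot <- model.matrix(~ region - 1, data = df)
cbind(df, onehot)
```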

- Data prep for special DL tasks (a tokenization sketch follows this list):

    - Image data: pixel scaling, train-time augmentation, test-time augmentation, convolution and flattening.

    - Data tokenization: break sequences into units, map units to vectors, align and pad sequences.

    - Data embedding: sparse to dense, merge diverse data, preserve relationships, dimension reduction, Word2Vec; can be part of model training.
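A minimal sketch of the tokenization steps above using the R keras text utilities; the example sentences are made up, and the `num_words` and `maxlen` values are arbitrary illustrative choices.

```{r, eval = FALSE}
library(keras)

# Made-up example sentences
texts <- c("the cat sat on the mat", "the dog ate my homework")

# Break sequences into units and map units to integer indices
tokenizer <- text_tokenizer(num_words = 1000) %>%
  fit_text_tokenizer(texts)
seqs <- texts_to_sequences(tokenizer, texts)

# Align and pad sequences to a common length
padded <- pad_sequences(seqs, maxlen = 8, padding = "post")
padded

# A trainable embedding layer then maps sparse indices to dense vectors
embedding <- layer_embedding(input_dim = 1000, output_dim = 8)
```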

### Step 2: Select neural network

- Architecture.

![](./NeuralNetworkZo19High.png){width=500px}

Source:

- Activation function (a model sketch follows the figure below).

![](./choose_activation.png){width=750px}
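To make Step 2 concrete, a minimal sketch of specifying an architecture and activations with the R keras interface; the layer sizes and the 784-dimensional input (e.g., flattened 28x28 images) are illustrative choices, not recommendations.

```{r, eval = FALSE}
library(keras)

# A small MLP: ReLU activations in the hidden layers, softmax output
# for 10-class classification; all sizes here are illustrative
model <- keras_model_sequential() %>%
  layer_dense(units = 64, activation = "relu", input_shape = c(784)) %>%
  layer_dense(units = 64, activation = "relu") %>%
  layer_dense(units = 10, activation = "softmax")

summary(model)
```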

### Step 3: Select loss function

- Regression loss: MSE/quadratic loss/L2 loss, mean absolute error/L1 loss.

- Classification loss: cross-entropy loss, ...

- Customized losses (a compile sketch follows this list).
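A minimal sketch of attaching a loss at the compile step in R keras, reusing `model` from the earlier sketch; the custom MSE clone is purely illustrative, not part of the lecture.

```{r, eval = FALSE}
library(keras)

# Built-in losses are selected by name at the compile step
model %>% compile(
  loss = "categorical_crossentropy",  # a classification loss
  optimizer = "rmsprop",
  metrics = "accuracy"
)

# A customized loss is just a function of (y_true, y_pred) built from
# backend ops; this MSE clone is purely illustrative
my_mse <- function(y_true, y_pred) {
  k_mean(k_square(y_true - y_pred), axis = -1)
}
# model %>% compile(loss = my_mse, optimizer = "rmsprop")
```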

### Step 4: Train and evaluate model

- Choose the optimization algorithm: a trade-off between generalization (SGD) and convergence rate (adaptive methods). (A sketch of configuring optimizers follows this list.)

    - Stochastic GD.

    - Adding momentum: classical momentum, Nesterov acceleration. [Visualize](https://ucla-biostat-216.github.io/2022fall/slides/13-optim/13-optim.html)

    - Adaptive learning rate: AdaGrad, AdaDelta, RMSprop.

    - Combining acceleration and adaptive learning rate: ADAM (default in many libraries).

    - Beyond ADAM: [lookahead](https://arxiv.org/abs/1907.08610), [RAdam](https://arxiv.org/abs/1908.03265), [AdaBound/AmsBound](https://syncedreview.com/2019/03/07/iclr-2019-fast-as-adam-good-as-sgd-new-optimizer-has-both/), [Ranger](https://arxiv.org/abs/1908.00700v2), [AdaBelief](https://arxiv.org/abs/2010.07468).

    _A Visual Explanation of Gradient Descent Methods (Momentum, AdaGrad, RMSProp, Adam)_ by Lili Jiang:
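A minimal sketch of swapping optimizers at the compile step in R keras, again reusing `model` from the earlier sketch; the learning-rate and momentum values are illustrative defaults, not tuned recommendations.

```{r, eval = FALSE}
library(keras)

# SGD with classical momentum (set nesterov = TRUE for Nesterov acceleration)
opt_sgd <- optimizer_sgd(learning_rate = 0.01, momentum = 0.9)

# Adaptive learning rate
opt_rmsprop <- optimizer_rmsprop(learning_rate = 0.001)

# ADAM: momentum-style acceleration + adaptive learning rate
opt_adam <- optimizer_adam(learning_rate = 0.001)

model %>% compile(
  loss = "categorical_crossentropy",
  optimizer = opt_adam,
  metrics = "accuracy"
)
```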

- Fitness of model: underfitting vs overfitting.

![](./overfitting_vs_underfitting.png){width=500px}

Source:

- Model selection: $K$-fold cross validation (a splitting sketch follows the figure below).

![](./cross_validation.png){width=750px}
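A minimal sketch of $K$-fold splitting in base R, scoring a model on each held-out fold; `fit_and_score()` is a hypothetical stand-in for fitting the network on the training folds and evaluating on the validation fold.

```{r, eval = FALSE}
set.seed(425)
n <- 1000   # sample size (illustrative)
K <- 5      # number of folds

# Randomly assign each observation to one of K folds
fold_id <- sample(rep_len(1:K, n))

scores <- sapply(1:K, function(k) {
  train_idx <- which(fold_id != k)
  val_idx   <- which(fold_id == k)
  # fit_and_score() is hypothetical: train on train_idx, evaluate on val_idx
  fit_and_score(train_idx, val_idx)
})
mean(scores)   # cross-validated performance estimate
```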

## Keras examples

Following are selected examples from the collection of [Keras code examples](https://keras.io/examples/).

## Example: MNIST - MLP

[qmd](https://raw.githubusercontent.com/ucla-econ-425t/2023winter/master/slides/10-nn/mnist_mlp/mnist_mlp.qmd), [html](https://ucla-econ-425t.github.io/2023winter/slides/10-nn/mnist_mlp/mnist_mlp.html).

## Example: CIFAR100 - CNN

[qmd](https://raw.githubusercontent.com/ucla-econ-425t/2023winter/master/slides/10-nn/cifar100_cnn/cifar100_cnn.qmd), [html](https://ucla-econ-425t.github.io/2023winter/slides/10-nn/cifar100_cnn/cifar100_cnn.html).

## Example: Using Pretrained ResNet50 to classify natural images

[qmd](https://raw.githubusercontent.com/ucla-econ-425t/2023winter/master/slides/10-nn/pretrained_resnet50/pretrained_resnet50.qmd), [html](https://ucla-econ-425t.github.io/2023winter/slides/10-nn/pretrained_resnet50/pretrained_resnet50.html).

## Example: IMDB review sentiment analysis - Lasso, MLP, RNN, LSTM, Transformer

- [Lasso penalized logistic regression](https://ucla-econ-425t.github.io/2023winter/slides/10-nn/imdb/imdb_lasso.html)

- [MLP](https://ucla-econ-425t.github.io/2023winter/slides/10-nn/imdb/imdb_mlp.html)

- [RNN](https://ucla-econ-425t.github.io/2023winter/slides/10-nn/imdb/imdb_rnn.html)

- [LSTM](https://ucla-econ-425t.github.io/2023winter/slides/10-nn/imdb/imdb_lstm.html)

- [Transformer](https://ucla-econ-425t.github.io/2023winter/slides/10-nn/imdb/imdb_transformer.html)

- [Warm-start using pre-trained embedding in TF Hub](https://ucla-econ-425t.github.io/2023winter/slides/10-nn/imdb/imdb_tfhub.html)

## Example: Generate Artificial Faces with GAN

- [Generate Artificial Faces with CelebA Progressive GAN Model](https://ucla-econ-425t.github.io/2023winter/slides/10-nn/progan/progan.html)

## Example: Neural style transfer

- [Neural style transfer](https://ucla-econ-425t.github.io/2023winter/slides/10-nn/style_transfer/style_transfer.html)