# Machine Learning and Statistics for Physicists

Material for a [UC Irvine](https://uci.edu/) course offered by the [Department of Physics and Astronomy](https://www.physics.uci.edu/).

Content is maintained on [github](https://github.com/dkirkby/MachineLearningStatistics) and distributed under a [BSD3 license](https://opensource.org/licenses/BSD-3-Clause).

##### ► [View course table of contents](Contents.ipynb)

## Prerequisites

- [Create a github account](https://github.com/join) if you don't already have one.
- [Download](https://git-scm.com/downloads) and install the git command-line tools on your computer, if necessary.
- [Download](https://www.anaconda.com/download/) and install the current python 3.x version of the [anaconda distribution](https://www.anaconda.com/distribution/) on your computer, if necessary.
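
Once these tools are installed, a quick check that they are visible from your shell can save time later. The following is a minimal sketch, not part of the course material: the helper name `find_tools` is hypothetical, and it only assumes that the instructions below invoke `git` and `conda` by those names.

```python
import shutil

# Command-line tools that the instructions in this document call directly.
REQUIRED_TOOLS = ("git", "conda")


def find_tools(names=REQUIRED_TOOLS):
    """Map each tool name to its full path, or None if it is not on PATH."""
    return {name: shutil.which(name) for name in names}


for name, path in find_tools().items():
    print(f"{name}: {path or 'NOT FOUND on PATH'}")
```

If a tool is reported as not found, open a new terminal window after installing it and try again. On windows, `conda` may only be visible from the Anaconda Prompt.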

> Notes that are indented like this provide additional details on more advanced topics.

> If you already have an older version of anaconda installed and want to keep it, the instructions below create a new environment that should not disrupt your previous work. You can probably continue with the instructions below after running:
```
conda update conda
```

This course assumes a basic familiarity with the core python language. If you are rusty or still learning, I recommend the free ebook [A Whirlwind Tour of Python](https://www.oreilly.com/programming/free/files/a-whirlwind-tour-of-python.pdf), which is *"a fast-paced introduction to essential components of the Python language for researchers and developers who are already familiar with programming in another language"*.

> If you are currently using python 2.x and reluctant to move to python 3, read [this](https://wiki.python.org/moin/Python2orPython3) and [this](http://www.python3statement.org/).

No previous experience with git or github is necessary for this course (but they are useful research tools so worth learning - [here](https://guides.github.com/introduction/git-handbook/) is a good starting point). If you are finding the git learning curve to be steep, you are [not alone](https://explainxkcd.com/wiki/index.php/1597:_Git).

## Install course material

Clone the course material from github with the following command, which will create a subdirectory called `MachineLearningStatistics`:
```
git clone https://github.com/dkirkby/MachineLearningStatistics.git
```

> This command may prompt you for your github username and password but you can streamline future [github access using ssh](https://help.github.com/articles/which-remote-url-should-i-use/).

You should now have a subdirectory called `MachineLearningStatistics`. Enter it for the remaining steps:
```
cd MachineLearningStatistics
```
Any instructions below that assume you are in this subdirectory will be prefaced with the shell comment:
```
# cd MachineLearningStatistics/
```

## Create the course python environment

We will use the [conda command](https://conda.io/docs/commands.html) to create a standard [python environment](https://conda.io/docs/user-guide/tasks/manage-environments.html) for this course.

Create a new environment using the following command at a shell prompt:
```
# cd MachineLearningStatistics/
conda env create -f environment.yml
```
This command may run for several minutes.

> We are using python version 3.8 for this environment which, as of Apr 2021, is the anaconda default.

Activate the new environment using the following command, which should add "(MLS)" to your command prompt as a reminder of your current environment:
```
conda activate MLS
```
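
If your prompt does not show the environment name, you can also check from within python. This is a small sketch with a hypothetical helper name (`active_conda_env`). It relies on conda setting the `CONDA_DEFAULT_ENV` environment variable on activation.

```python
import os
import sys


def active_conda_env(environ=None):
    """Return the name of the active conda environment, or None."""
    environ = os.environ if environ is None else environ
    # conda sets CONDA_DEFAULT_ENV when an environment is activated.
    return environ.get("CONDA_DEFAULT_ENV")


print("active environment:", active_conda_env())
print("python executable: ", sys.executable)
print("python version:    ", ".".join(map(str, sys.version_info[:3])))
```

With the course environment active, the first line should report `MLS` and the executable path should point inside that environment.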

> To "deactivate" this environment and return to your default base environment, use:
```
conda deactivate
```

> Older versions of conda used a [different syntax](https://conda.io/docs/user-guide/tasks/manage-environments.html#activating-an-environment) to activate and deactivate an environment.

Install the [tensorflow](https://www.tensorflow.org/) machine-learning framework:
```
pip install tensorflow
```

> This pip command will only install tensorflow into your MLS environment (since it uses a version of pip that is local to the MLS environment).

> If you are installing onto a linux or windows system with a [CUDA-enabled GPU card](https://www.tensorflow.org/install/gpu), use instead:
```
pip install tensorflow-gpu
```

Install the [pytorch](https://pytorch.org/) machine-learning framework. The exact command is different for Mac and linux/windows. On a Mac, use:
```
conda install pytorch torchvision -c pytorch
```
On a windows or linux system, use:
```
conda install pytorch torchvision cpuonly -c pytorch
```

> See [here](https://pytorch.org/get-started/locally/) for alternate install commands for a linux or windows system with a CUDA-enabled GPU card.

Install the [jax](https://jax.readthedocs.io/en/latest/notebooks/quickstart.html) machine-learning framework:
```
pip install jax jaxlib
```

> See [here](https://github.com/google/jax#pip-installation) to install on a system with a CUDA-enabled GPU card.

Enable a jupyter notebook [extension](https://github.com/ipython-contrib/jupyter_contrib_nbextensions/tree/master/src/jupyter_contrib_nbextensions/nbextensions/exercise2) we will use for in-class exercises:
```
jupyter nbextension enable exercise2/main
```

> In case something goes wrong with your installation and you want to start again, shut down any jupyter sessions using the old environment, then use:
```
conda deactivate
conda remove --name MLS --all
```

Finally, install the course code and data into your new environment using:
```
# cd MachineLearningStatistics/
pip install .
```
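
At this point the three frameworks installed above should be importable from the MLS environment. The following sketch checks this without actually importing them (which can be slow). The helper name `missing_modules` is hypothetical, and `torch` is the import name of the pytorch package.

```python
import importlib.util

# Import names of the frameworks installed in the steps above.
FRAMEWORKS = ("tensorflow", "torch", "jax")


def missing_modules(names=FRAMEWORKS):
    """Return the module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_modules()
print("all frameworks found" if not missing else f"missing: {', '.join(missing)}")
```

If anything is reported missing, check that the MLS environment is active and repeat the corresponding install step.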

## Launch notebook server

To launch the [notebook server](http://jupyter-notebook.readthedocs.io/en/stable/notebook.html) at any time, you can now use:
```
# cd MachineLearningStatistics/
conda activate MLS
cd notebooks
jupyter notebook
```

> We are not using the newer [JupyterLab](https://blog.jupyter.org/jupyterlab-is-ready-for-users-5a6f039b8906) since it is [not compatible with notebook extensions](https://github.com/ipython-contrib/jupyter_contrib_nbextensions#jupyterlab).

Click on `Contents.ipynb` if this is your first time doing this, to check that you can view a notebook.

These instructions allow you to modify and run python code on your local computer from within your browser. If you just want to view these notebooks online, try this [nbviewer link](https://nbviewer.jupyter.org/github/dkirkby/MachineLearningStatistics/tree/master/notebooks/).

## Update course material

You can skip this section if you are installing MLS for the first time.

These instructions are only needed when you want to update your local copy of the MLS files to synchronize with changes on github.

> For git experts: you will normally be working on the master branch to simplify the workflow. This means that your local work must be discarded or saved to another branch each time you update, using the instructions below.

The first step is to "factory reset" your installation before getting the updates. The simplest method is to throw away any changes you have made using:
```
# cd MachineLearningStatistics/
git checkout master
git reset --hard
```
Alternatively, you can keep a permanent record of your changes in a [git branch](https://git-scm.com/book/en/v2/Git-Branching-Branches-in-a-Nutshell) with a name of your choice, for example "08-Jan-2021":
```
# cd MachineLearningStatistics/
git checkout -b "08-Jan-2021"
git commit -a -m "Save work in progress"
git checkout master
```
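
Before choosing between the two options above, it can help to see which files you have changed locally. The sketch below wraps `git status --porcelain`, whose output lines consist of a two-character status code, a space, and a path. The function names and the sample file names are hypothetical.

```python
import subprocess


def parse_porcelain(output):
    """Extract paths from `git status --porcelain` output.

    Each line is a two-character status code, a space, then the path.
    """
    return [line[3:] for line in output.splitlines() if line]


def uncommitted_changes(repo="."):
    """List paths with local changes (modified, staged, or untracked)."""
    result = subprocess.run(
        ["git", "status", "--porcelain"],
        cwd=repo, capture_output=True, text=True, check=True,
    )
    return parse_porcelain(result.stdout)


# Sample output: one modified notebook and one untracked file (made-up names).
sample = " M notebooks/Example.ipynb\n?? scratch.txt\n"
print(parse_porcelain(sample))
```

Calling `uncommitted_changes()` from inside the `MachineLearningStatistics` directory lists your own changes. Note that `git reset --hard` discards changes to tracked files but leaves untracked files in place.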

The second step is to download the changes from github:
```
# cd MachineLearningStatistics/
git pull
```
If this command reports `Already up-to-date.` (or `Already up to date.` in newer versions of git), then there are no updates to download.

The final step is to update your local python environment:
```
# cd MachineLearningStatistics/
conda activate MLS
pip install . --upgrade
```