{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Data Analysis and Machine Learning Applications for Physicists" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "*Material for a* [*University of Illinois*](http://illinois.edu) *course offered by the* [*Physics Department*](https://physics.illinois.edu). *This content is maintained on* [*GitHub*](https://github.com/illinois-mla) *and is distributed under a* [*BSD3 license*](https://opensource.org/licenses/BSD-3-Clause).\n", "\n", "[Table of contents](Contents.ipynb)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Prerequisites" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "- [Create a github account](https://github.com/join) if you don't already have one.\n", "- [Install the git command-line tools](https://git-scm.com/downloads) on your computer, if necessary.\n", "- Install the Python 3.7 version of [anaconda](https://www.anaconda.com/download/) on your computer, if necessary.\n", "\n", "This course assumes a basic familiarity with the core python language. If you are rusty or still learning, I recommend the free ebook [A Whirlwind Tour of Python](http://www.oreilly.com/programming/free/a-whirlwind-tour-of-python.csp), which is *\"a fast-paced introduction to essential components of the Python language for researchers and developers who are already familiar with programming in another language\"*.\n", "\n", "If you are currently using python 2.x and reluctant to move to python 3, read [this](https://wiki.python.org/moin/Python2orPython3) and [this](http://www.python3statement.org/).\n", "\n", "No previous experience with git or github is necessary for this course (but they are useful research tools so worth learning - [here](https://guides.github.com/introduction/git-handbook/) is a good starting point). If you are finding the git learning curve to be steep, you are [not alone](https://explainxkcd.com/wiki/index.php/1597:_Git)." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Create the course python environment" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Clone the course material from github with the following command, which will create a subdirectory called `syllabus`:\n", "```\n", "git clone --recurse-submodules https://github.com/illinois-mla/syllabus.git\n", "```\n", "This should ask you for your github username and password (but you can streamline future [github access using ssh](https://help.github.com/articles/which-remote-url-should-i-use/))." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We will use the [conda command](https://conda.io/docs/commands.html) to create a standard [python environment](https://conda.io/docs/user-guide/tasks/manage-environments.html) for this course. These instructions assume that you have already satisfied the prerequisites." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Create a new environment by entering (or pasting) the following command at a shell prompt in the top level directory of the course syllabus repo.\n", "```\n", "conda env create -f DAMLA-env/environment.yml\n", "```\n", "Activate the new environment using (this should add \"(DAMLA)\" to your command prompt, as a reminder of your current environment):\n", "```\n", "source activate DAMLA\n", "```\n", "Add some additional packages from other sources (details [here](https://github.com/ipython-contrib/jupyter_contrib_nbextensions#conda) and [here](https://github.com/ipython-contrib/jupyter_contrib_nbextensions/issues/1153)):\n", "```\n", "conda install -c conda-forge keras libiconv jupyter_contrib_nbextensions\n", "conda install -c astropy emcee astroml\n", "conda install pytorch-cpu -c pytorch\n", "```\n", "You might see something like\n", "```\n", "You are using pip version 10.0.1, however version 18.0 is available.\n", "You should consider upgrading via the 'pip install --upgrade pip' command.\n", "```\n", "at which point you should dutifully follow and do\n", "```\n", "pip install --upgrade pip\n", "```\n", "Enable a jupyter notebook [extension](https://github.com/ipython-contrib/jupyter_contrib_nbextensions/tree/master/src/jupyter_contrib_nbextensions/nbextensions/exercise2) we will use for in-class exercises:\n", "```\n", "jupyter nbextension enable exercise2/main\n", "```" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Install course material" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Activate the course environment, if necessary (check your command prompt, but it doesn't do any harm to reactivate the current environment):\n", "```\n", "source activate DAMLA\n", "```\n", "Install the course code and data using:\n", "```\n", "cd syllabus\n", "pip install .\n", "```" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Launch notebook server" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To launch the [notebook server](http://jupyter-notebook.readthedocs.io/en/stable/notebook.html) at any time, you can now use:\n", "```\n", "[[syllabus]]\n", "source activate DAMLA\n", "cd notebooks\n", "jupyter notebook\n", "```\n", "Note that `[[syllabus]]` is a reminder that you must be in your `syllabus` directory before typing the following commands. If you are unsure about this, refer to the [`pwd`](http://www.linfo.org/pwd.html) and [`cd`](http://www.linfo.org/cd.html) commands.\n", "\n", "**Windows users:** Wherever you see `source activate DAMLA`, use `activate DAMLA` instead. Details [here](https://conda.io/docs/user-guide/tasks/manage-environments.html#activating-an-environment).\n", "\n", "This should have opened a jupyter notebook tab or window in your browser. If this is your first time doing this, to validate that you can open and view a notebook, do `File->Open` and click on `Contents.ipynb`. Jupyter notebooks (formerly called IPython notebooks) have the file extension `.ipynb`.\n", "\n", "*(For git experts: you will normally be working on the master branch to simplify the workflow. This means that your local work must be discarded or saved to another branch each time you update, using the instructions below).*\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## In case of emergency, break glass..." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In case something goes wrong with your installation and you want to start again, use:\n", "```\n", "conda remove --name DAMLA --all\n", "```\n", "You will need to shutdown any jupyter sessions with the old environment first." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Update course material" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "You can skip this section if you are installing for the first time, but remember these instructions for later.\n", "\n", "The first step is to \"factory reset\" your installation before getting the updates. The simplest method is to throw away any changes you have made using:\n", "```\n", "[[syllabus]]\n", "git checkout master\n", "git reset --hard\n", "```\n", "Alternatively, you can keep a permanent record of your changes in a [git branch](https://git-scm.com/book/en/v2/Git-Branching-Branches-in-a-Nutshell) with a name of your choice, for example \"08-Jan-2018\":\n", "```\n", "[[syllabus]]\n", "git checkout -b \"08-Jan-2018\"\n", "git commit -a -m \"Save work in progress\"\n", "git checkout master\n", "```\n", "\n", "The second step is to download the changes from github:\n", "```\n", "[[syllabus]]\n", "git pull\n", "```\n", "If this commands reports `Already up-to-date.` then there are no updates to download.\n", "\n", "The final step is to update your local python environment:\n", "```\n", "[[syllabus]]\n", "source activate DAMLA\n", "pip install . --upgrade\n", "```" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Using the environment Docker image" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "If there is a problem that keeps you from being able to easily install the Conda environment on your local machine you can use the environment on the [provided Docker image](https://github.com/illinois-mla/DAMLA-env). The Docker image is meant to provide the compute environment, but not to be used as an area to store your work, so you should still clone the repo down to your local machine." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Installation" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To install Docker Community Edition on your Linux, Mac, or Windows machine follow the [instructions in the Docker docs](https://docs.docker.com/install/)." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Running the environment\n", "\n", "To use the Docker image first [pull](https://docs.docker.com/engine/reference/commandline/pull/) it down from Docker Hub\n", "\n", "`\n", "docker pull illinoismla/damla-env\n", "`\n", "\n", "If you want anything you do in the container to safely persist then you should bindmount your local machine's file system to the container as a [volume](https://docs.docker.com/storage/volumes/). So [run](https://docs.docker.com/engine/reference/commandline/run/) the image in a container while [exposing](https://docs.docker.com/engine/reference/run/#expose-incoming-ports) the container's internal port `8888` with the `-p` flag (this is necessary for Jupyter to be able to talk to the `localhost`) and bindmount the directory of the course Git repo on your local machine\n", "\n", "`\n", "docker run --rm -it -v :/home/physicist/data -p 8888:8888 illinoismla/damla-env:latest\n", "`\n", "\n", "Once inside the container activate note that the DAMLA Conda environment is already activated and should be shown in the terminal prompt\n", "\n", "```\n", "(DAMLA) root@:~/data#\n", "```\n", "\n", "though you can also verify this by listing the conda environments\n", "\n", "```\n", "conda env list\n", "# conda environments:\n", "#\n", "base /root/miniconda\n", "DAML\n", "```" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Verifying the setup\n", "\n", "To verify that things are working as expected, launch a Jupyter notebook server and test basic imports (the \"Hello World\" of data analysis).\n", "\n", "Inside the running Docker container with the `DAMLA` environment activated navigate to `/home/physicist/data` (this is an arbitrary path that we chose when running the Docker image — you can make this whatever you want). If you `ls` you will see that you are actually inside your Git repo on your local machine.\n", "\n", "Then launch the Jupyter server\n", "\n", "`\n", "jupyter notebook\n", "`\n", "\n", "which will cause a login URL with a token to be printed to your terminal\n", "\n", "`\n", "http://localhost:8888/?token=\n", "`\n", "\n", "Click on the URL, and copy and paste it into your web browser on your local machine. This should then display the Jupyter server in your web browser.\n", "\n", "Create a new Jupyter notebook (select from the \"New\" drop down menu on the upper right) and then when the notebook opens import NumPy and run a simple test\n", "\n", "```\n", "import numpy as np\n", "np.arange(0, 10, 0.5)\n", "```\n", "`\n", "array([0. , 0.5, 1. , 1.5, 2. , 2.5, 3. , 3.5, 4. , 4.5, 5. , 5.5, 6. ,\n", " 6.5, 7. , 7.5, 8. , 8.5, 9. , 9.5])\n", "`\n", "\n", "If you now save the notebook and in a different terminal window on your local machine you navigate to the Git repo directory you will now see that **on your local machine** there is the Jupyter notebook. If you shutdown and exit the Jupter server, and exit the Docker container you will see that though the environment has exited and been [cleaned up](https://docs.docker.com/engine/reference/run/#clean-up---rm) the notebook has persisted." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Container persistance\n", "\n", "There should be no need to reuse the same container, as the thing you care about is the data and files you're writing which should exist on your local machine (nicely versioned with Git). However, if in the event you do want the container to persist between uses you can remove the [`--rm` flag](https://docs.docker.com/engine/reference/run/#clean-up---rm) from the `docker run` command to keep the container from being [removed](https://docs.docker.com/engine/reference/commandline/rm/). In this situation it is a good idea to also name the container with the [`--name` flag](https://docs.docker.com/engine/reference/run/#name---name)\n", "\n", "`\n", "docker run --name DAMLA-env-container -it -v :/home/physicist/data -p 8888:8888 illinoismla/damla-env:latest\n", "`\n", "\n", "After you exit the container if you [list](https://docs.docker.com/engine/reference/commandline/ps/) the Docker containers on your local machine\n", "\n", "`\n", "docker ps -a\n", "`\n", "\n", "you will see your exited container. To resume using _that specific container_ [start](https://docs.docker.com/engine/reference/commandline/start/) it again \n", "\n", "`\n", "docker start DAMLA-env-container\n", "`\n", "\n", "and then [attach](https://docs.docker.com/engine/reference/commandline/attach/) it to your shell\n", "\n", "`\n", "docker attach DAMLA-env-container\n", "`" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### A Tool to diff two notebooks\n", "\n", "Sometimes seemingly small changes to a Juypter notebook can be hard to discern from large changes using git diff. You can add a tool called `nbdime` to your conda enviroment. First activate your DAMLA conda environment and then do:\n", "```\n", "pip install nbdime\n", "nbdime config-git --enable --global\n", "```\n", "Then diff'ing notebooks can use a web tool like:\n", "`nbdiff-web `\n", "\n", "For a complete set of `nbdime` commands, see them [here](https://nbdime.readthedocs.io/en/stable/cli.html#nbdiff-web)\n", "\n" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.4" }, "toc": { "base_numbering": 1, "nav_menu": {}, "number_sections": true, "sideBar": true, "skip_h1_title": false, "title_cell": "Table of Contents", "title_sidebar": "Contents", "toc_cell": false, "toc_position": {}, "toc_section_display": true, "toc_window_display": false } }, "nbformat": 4, "nbformat_minor": 2 }