{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Setting up computing resources\n", "\n", "In this lesson, you will set up a GitHub account, learn about Google Colab, and set up a Python computing environment for scientific computing. \n", "\n", "It is advantageous to learn how to set up a Python distribution and manage packages on your own machine, as each person can have different needs. That said, we will make some use of [Google Colab](https://colab.research.google.com/) in this class, mainly because it is easier for remote collaborators to work together using Colab. It is also FREE! (Thanks, Google!) We will first introduce Google Colab, and then proceed with instructions for installation on your own machine using [Anaconda](https://www.anaconda.com)." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Getting a GitHub account\n", "\n", "We will make extensive use of Git during the course. We will use [GitHub](http://github.com/) to host the repositories. You need to set up a GitHub account. Go to [http://github.com/](http://github.com/) to get an account. You should register with your academic email address so you get free private repositories as academics. You should also think carefully about picking your user name. There is a good chance other people in your professional life will see this.\n", "\n", "Once you have a GitHub account, send an email to `bois at caltech dot edu` with your account ID to get access to the [BE/Bi 103 Group on GitHub](https://github.com/bebi103). Within this group, you will form a team. Your team consists of your partners for homework submission." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Setting up Google Colab\n", "\n", "In order to use Google Colab, you must have a Google account. Caltech students and employees have an account through Caltech's G Suite. Many of you may have a personal Google account, usually set up for things like GMail, YouTube, etc. 
For your work in this class, **use your Caltech account.** This will facilitate collaboration with your teammates in the course, as well as with course staff.\n", "\n", "Many of you probably use your personal Google account on your machine, so it can get annoying to log in and out of it. A trick that I find useful is to use one browser, e.g., Safari or Microsoft Edge, for your personal use, web browsing, etc., and a different browser for your scientific work, including the work in this class. Google Colab is tested most thoroughly with Chrome, Firefox, and Safari (in fact, JupyterLab, which you will use on your own machine, only supports these three browsers).\n", "\n", "Once you have either logged out of all of your personal accounts or have a different browser open, you can launch a Colab notebook by simply navigating to [https://colab.research.google.com/](https://colab.research.google.com/). Alternatively, you can click the \"Launch in Colab\" badge at the top right of this page, and you will launch this notebook in Colab. That badge will appear in the top right of all pages in the course content generated from notebooks." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Watchouts when using Colab\n", "\n", "If you do run a notebook in Colab, you are doing your computing on one of Google's computers via a virtual machine. You get two CPU cores and 12 GB of RAM. You can also get GPUs and TPUs (Google's tensor processing units), but we will not use those in this course. The computing resources should be enough for all of our calculations this term (though you will need more computing power in the sequel to this course). However, there are some limitations you should be aware of.\n", "\n", "- If your notebook is idle for too long, you will get disconnected from your notebook. \"Idle\" means that cells are not being edited or executed. 
The idle timeout varies depending on the load on Google's computers; I find that I almost always get disconnected if idle for an hour.\n", "- Your virtual machine will disconnect if it is used for too long. It is typically available for only 12 hours before disconnecting, though times can vary, again based on load.\n", "\n", "These limitations are in place so that Google can offer Colab for free. If you want more cores, longer timeouts, etc., you might want to check out [Colab Pro](https://colab.research.google.com/signup). However, the free tier should work well for you in the course. You of course can always run on your own machine, and in fact are encouraged to do so except where collaboration is necessary.\n", "\n", "There are additional software-specific watchouts when using Colab.\n", "\n", "- Later in the course, we will use [HoloViews](http://holoviews.org/) for high-level plotting. Colab will not render HoloViews plots unless `hv.extension('bokeh')` is called *in each cell* that has a HoloViews plot.\n", "- Colab does not support fully functional [Bokeh](http://bokeh.pydata.org/) apps, nor some of the [Panel](http://panel.holoviz.org/) functionality that we will use later in the course when we do dashboarding.\n", "- Colab instances have specific software installed, so you will need to install anything else you need in your notebook. This is not a major burden, and is discussed in the next section.\n", "\n", "I recommend reading the [Colab FAQs](https://research.google.com/colaboratory/faq.html) for more information about Colab." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Software in Colab\n", "\n", "When you launch a Google Colab notebook, much of the software we will use in class is already installed. It is not always the latest version of the software, however. In fact, as of early September 2020, Colab is running Python 3.6, whereas you will run Python 3.8 on your machine through your Anaconda installation. 
Nonetheless, most (but not all) of the analyses we do for this class will work just fine in Colab. We will make every effort to let you know when Colab will not be able to handle activities in class, the most important example being some dashboarding applications.\n", "\n", "Because the notebooks in Colab have software preinstalled, and no more, you will often need to install software before you can run the rest of the code in a notebook. To enable this, when necessary, in the first code cell of each notebook in this class, we will have the following code (or a variant thereof depending on what is needed or if the default installations of Colab change). Running this code will not affect running your notebook on your local machine; the same notebook will work on your local machine or on Colab." ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "# Colab setup ------------------\n", "import os, sys, subprocess\n", "if \"google.colab\" in sys.modules:\n", " cmd = \"pip install --upgrade iqplot colorcet datashader bebi103 watermark\"\n", " process = subprocess.Popen(cmd.split(), stdout=subprocess.PIPE, stderr=subprocess.PIPE)\n", " stdout, stderr = process.communicate()\n", " data_path = \"https://s3.amazonaws.com/bebi103.caltech.edu/data/\"\n", "else:\n", " data_path = \"../data/\"\n", "# ------------------------------" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In addition to installing the necessary software on a Colab instance, this also sets the relative path to data sets we will use in the course. When running in Colab, the data set is fetched from cloud storage on [AWS](https://aws.amazon.com). When running on your local machine, the path to the data is one directory up from where you are working. We will discuss relative paths in a future lesson." 
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Collaborating with Colab\n", "\n", "If you want to collaborate with another student or with the course staff on a notebook, you can click \"Share\" in the top right corner of the Colab window and choose with whom and how (the defaults are fine) you want to share.\n", "\n", "When we talk about Git in a future lesson, we will discuss Colab's GitHub support, which will be necessary for version control and submitting and sharing your homework." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Installation on your own machine\n", "\n", "We now proceed to discuss installation of the necessary software on your own machine. Before we get into that, there are some preliminaries to configure your computer." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## macOS users: Install XCode\n", "\n", "If you are using macOS, you should install [XCode](https://developer.apple.com/xcode/), if you haven't already. It's a large piece of software, taking up about 5 GB on your hard drive, so make sure you have enough space. You can install it through the App Store.\n", "\n", "After installing it, you need to open the program. Be sure to do that, for example by clicking on the XCode icon in your Applications folder. Upon opening XCode, it may perform more installations. After these are completed, you can close XCode." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Windows users: Install Git and Chrome or Firefox\n", "\n", "We will be using [JupyterLab](https://jupyterlab.readthedocs.io/en/stable/) in this course. It is browser-based, and Chrome, Firefox, and Safari are supported. Microsoft Edge is **not**. Therefore, if you are a Windows user, you need to be sure you have either Chrome or Firefox installed.\n", "\n", "Git is installed on Macs with XCode. For Windows users, you need to install Git. You can do this by following the instructions [here](https://gitforwindows.org)." 
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Uninstalling Anaconda\n", "\n", "Unless you have experience with Anaconda and know how to set up environments, if you have previously installed Anaconda with a version of Python other than 3.8, you need to uninstall it, removing it completely from your computer. You can find instructions on how to do that from the [official uninstallation documentation](https://docs.anaconda.com/anaconda/install/uninstall/)." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Downloading and installing Anaconda\n", "\n", "Downloading and installing Anaconda is simple. \n", "1. Go to the [Anaconda distribution homepage](https://www.anaconda.com/distribution/) and download the graphical installer. \n", "2. Be sure to download Anaconda for Python 3.8 for the appropriate operating system.\n", "3. Follow the on-screen instructions for installation. When prompted, be sure to \"Install for me only.\"\n", "4. You may be prompted for optional installations, like [PyCharm](https://www.jetbrains.com/pycharm/). You will not need these for the course.\n", "\n", "That's it! After you do that, you will have a functioning Python distribution." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Launching JupyterLab and a terminal\n", "\n", "After installing the Anaconda distribution, you should be able to launch the **Anaconda Navigator**. If you are using macOS, this is available in your `Applications` menu. If you are using Windows, you can do this from the Start menu. Launch Anaconda Navigator.\n", "\n", "We will be using JupyterLab throughout the course (more on that in [Lesson 0b](intro_to_jupyterlab.ipynb)). You should see an option to launch JupyterLab. When you do that, a new browser window or tab will open with JupyterLab running. Within the JupyterLab window, you will have the option to launch a notebook, a console, a terminal, or a text editor. We will use all of these during the course. 
To update and install the necessary packages, click on `Terminal` to launch a terminal. You will get a terminal window (probably black) with a bash prompt. We refer to this text interface in the terminal as the **command line**." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## The conda package manager\n", "\n", "conda is a package manager for keeping all of your packages up-to-date. It has plenty of functionality beyond our basic usage in class, which you can learn more about by reading the [docs](http://conda.pydata.org/docs/get-started.html). We will primarily be using conda to install and update packages.\n", "\n", "conda works from the command line. Now that you know how to get a command line prompt, you can start using conda. The first thing we'll do is update conda itself. Enter the following on the command line:\n", "\n", "    conda update conda\n", "\n", "You will be prompted to continue this operation, so press `y` to continue. Next, we'll update the packages that came with the Anaconda distribution. To do this, enter the following on the command line:\n", "\n", "    conda update --all\n", "\n", "If anything is out of date, you will be prompted to perform the updates; press `y` to continue. (If everything is up to date, you will just see a list of all the installed packages.) There may even be some downgrades. This happens when there are package conflicts where one package requires an earlier version of another. conda is very smart and figures all of this out for you, so you can almost always say \"yes\" (or \"`y`\") to conda when it prompts you." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Installations\n", "\n", "There are several additional installations you need to do. We will first install some plotting packages we need.\n", "\n", "    conda install holoviews param panel colorcet hvplot datashader\n", "\n", "These packages are part of the [HoloViz](http://holoviz.org) suite of packages. 
The suite also includes [GeoViews](https://geoviews.org), which is an excellent plotting package for geographical data. Because we will not be using geographical data sets in class and because, as of early September 2020, GeoViews has conflicts with other packages, we are not installing it. If your research involves geographical data, you may wish to install it as well and accept the package downgrades that conda enforces. The easiest way to install the HoloViz packages including GeoViews (and accept the downgrades) is to skip the above installation and instead do `conda install -c pyviz holoviz`. If you are not going to be using GeoViews, do *not* do this, but rather execute the suggested installation above.\n", "\n", "Next, we need to install a few packages that we will use indirectly during the course and that are handy to have.\n", "\n", "    conda install nodejs netcdf4 black\n", " \n", "We will also install [watermark](https://github.com/rasbt/watermark), which enables us to conveniently display version numbers of the software we are using. For this installation, we will use `pip`. There are a few other packages from pip we will need, so we can go ahead and install those now.\n", "\n", "    pip install watermark blackcellmagic multiprocess jupytext cmdstanpy arviz iqplot bebi103\n", "\n", "Finally, we need to configure JupyterLab to work with the plotting packages we will use.\n", "\n", "    jupyter labextension install --no-build @pyviz/jupyterlab_pyviz\n", " \n", "You may also wish to install a spell-checker (this one isn't necessary).\n", "\n", "    jupyter labextension install --no-build @ijmbarr/jupyterlab_spellchecker\n", "\n", "After installing all of these extensions, you can rebuild JupyterLab.\n", "\n", "    jupyter lab build\n", " \n", "You should close your JupyterLab session and terminate Anaconda Navigator after you have completed the build. Relaunch Anaconda Navigator and launch a fresh JupyterLab instance. 
As before, after JupyterLab launches, launch a new terminal window so that you can proceed with setting up Git." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Checking your distribution\n", "We'll now run a quick test to make sure things are working properly. We will make a quick plot that requires some of the scientific libraries we will use.\n", "\n", "Use the JupyterLab launcher (you can get a new launcher by clicking on the `+` icon on the left pane of your JupyterLab window) to launch a notebook. In the first cell (the box next to the `[ ]:` prompt), paste the code below. To run the code, press `Shift+Enter` while the cursor is active inside the cell. You should see a plot that looks like the one below. If you do, you have a functioning Python environment for scientific computing!\n", "\n", "You can also test this in Colab (and it should work with no problems)." ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "
" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "application/javascript": [ "\n", "(function(root) {\n", " function now() {\n", " return new Date();\n", " }\n", "\n", " var force = true;\n", "\n", " if (typeof root._bokeh_onload_callbacks === \"undefined\" || force === true) {\n", " root._bokeh_onload_callbacks = [];\n", " root._bokeh_is_loading = undefined;\n", " }\n", "\n", " var JS_MIME_TYPE = 'application/javascript';\n", " var HTML_MIME_TYPE = 'text/html';\n", " var EXEC_MIME_TYPE = 'application/vnd.bokehjs_exec.v0+json';\n", " var CLASS_NAME = 'output_bokeh rendered_html';\n", "\n", " /**\n", " * Render data to the DOM node\n", " */\n", " function render(props, node) {\n", " var script = document.createElement(\"script\");\n", " node.appendChild(script);\n", " }\n", "\n", " /**\n", " * Handle when an output is cleared or removed\n", " */\n", " function handleClearOutput(event, handle) {\n", " var cell = handle.cell;\n", "\n", " var id = cell.output_area._bokeh_element_id;\n", " var server_id = cell.output_area._bokeh_server_id;\n", " // Clean up Bokeh references\n", " if (id != null && id in Bokeh.index) {\n", " Bokeh.index[id].model.document.clear();\n", " delete Bokeh.index[id];\n", " }\n", "\n", " if (server_id !== undefined) {\n", " // Clean up Bokeh references\n", " var cmd = \"from bokeh.io.state import curstate; print(curstate().uuid_to_server['\" + server_id + \"'].get_sessions()[0].document.roots[0]._id)\";\n", " cell.notebook.kernel.execute(cmd, {\n", " iopub: {\n", " output: function(msg) {\n", " var id = msg.content.text.trim();\n", " if (id in Bokeh.index) {\n", " Bokeh.index[id].model.document.clear();\n", " delete Bokeh.index[id];\n", " }\n", " }\n", " }\n", " });\n", " // Destroy server and session\n", " var cmd = \"import bokeh.io.notebook as ion; ion.destroy_server('\" + server_id + \"')\";\n", " cell.notebook.kernel.execute(cmd);\n", " }\n", " }\n", "\n", " /**\n", " * Handle when a new output is added\n", " */\n", " 
function handleAddOutput(event, handle) {\n", " var output_area = handle.output_area;\n", " var output = handle.output;\n", "\n", " // limit handleAddOutput to display_data with EXEC_MIME_TYPE content only\n", " if ((output.output_type != \"display_data\") || (!output.data.hasOwnProperty(EXEC_MIME_TYPE))) {\n", " return\n", " }\n", "\n", " var toinsert = output_area.element.find(\".\" + CLASS_NAME.split(' ')[0]);\n", "\n", " if (output.metadata[EXEC_MIME_TYPE][\"id\"] !== undefined) {\n", " toinsert[toinsert.length - 1].firstChild.textContent = output.data[JS_MIME_TYPE];\n", " // store reference to embed id on output_area\n", " output_area._bokeh_element_id = output.metadata[EXEC_MIME_TYPE][\"id\"];\n", " }\n", " if (output.metadata[EXEC_MIME_TYPE][\"server_id\"] !== undefined) {\n", " var bk_div = document.createElement(\"div\");\n", " bk_div.innerHTML = output.data[HTML_MIME_TYPE];\n", " var script_attrs = bk_div.children[0].attributes;\n", " for (var i = 0; i < script_attrs.length; i++) {\n", " toinsert[toinsert.length - 1].firstChild.setAttribute(script_attrs[i].name, script_attrs[i].value);\n", " toinsert[toinsert.length - 1].firstChild.textContent = bk_div.children[0].textContent\n", " }\n", " // store reference to server id on output_area\n", " output_area._bokeh_server_id = output.metadata[EXEC_MIME_TYPE][\"server_id\"];\n", " }\n", " }\n", "\n", " function register_renderer(events, OutputArea) {\n", "\n", " function append_mime(data, metadata, element) {\n", " // create a DOM node to render to\n", " var toinsert = this.create_output_subarea(\n", " metadata,\n", " CLASS_NAME,\n", " EXEC_MIME_TYPE\n", " );\n", " this.keyboard_manager.register_events(toinsert);\n", " // Render to node\n", " var props = {data: data, metadata: metadata[EXEC_MIME_TYPE]};\n", " render(props, toinsert[toinsert.length - 1]);\n", " element.append(toinsert);\n", " return toinsert\n", " }\n", "\n", " /* Handle when an output is cleared or removed */\n", " 
events.on('clear_output.CodeCell', handleClearOutput);\n", " events.on('delete.Cell', handleClearOutput);\n", "\n", " /* Handle when a new output is added */\n", " events.on('output_added.OutputArea', handleAddOutput);\n", "\n", " /**\n", " * Register the mime type and append_mime function with output_area\n", " */\n", " OutputArea.prototype.register_mime_type(EXEC_MIME_TYPE, append_mime, {\n", " /* Is output safe? */\n", " safe: true,\n", " /* Index of renderer in `output_area.display_order` */\n", " index: 0\n", " });\n", " }\n", "\n", " // register the mime type if in Jupyter Notebook environment and previously unregistered\n", " if (root.Jupyter !== undefined) {\n", " var events = require('base/js/events');\n", " var OutputArea = require('notebook/js/outputarea').OutputArea;\n", "\n", " if (OutputArea.prototype.mime_types().indexOf(EXEC_MIME_TYPE) == -1) {\n", " register_renderer(events, OutputArea);\n", " }\n", " }\n", "\n", " \n", " if (typeof (root._bokeh_timeout) === \"undefined\" || force === true) {\n", " root._bokeh_timeout = Date.now() + 5000;\n", " root._bokeh_failed_load = false;\n", " }\n", "\n", " var NB_LOAD_WARNING = {'data': {'text/html':\n", " \"\\n\"+\n", " \"BokehJS does not appear to have successfully loaded. If loading BokehJS from CDN, this \\n\"+\n", " \"may be due to a slow or bad network connection. Possible fixes:\\n\"+\n", " \"
\\n\"+\n", " \"\\n\"+\n",
" \"from bokeh.resources import INLINE\\n\"+\n",
" \"output_notebook(resources=INLINE)\\n\"+\n",
" \"
\\n\"+\n",
" \"\\n\"+\n \"BokehJS does not appear to have successfully loaded. If loading BokehJS from CDN, this \\n\"+\n \"may be due to a slow or bad network connection. Possible fixes:\\n\"+\n \"
\\n\"+\n \"\\n\"+\n \"from bokeh.resources import INLINE\\n\"+\n \"output_notebook(resources=INLINE)\\n\"+\n \"
\\n\"+\n \"