{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Setting up computing resources" ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "nbsphinx": "hidden", "tags": [] }, "outputs": [], "source": [ "# Colab setup ------------------\n", "import os, sys, subprocess\n", "if \"google.colab\" in sys.modules:\n", " cmd = \"pip install --upgrade watermark\"\n", " process = subprocess.Popen(cmd.split(), stdout=subprocess.PIPE, stderr=subprocess.PIPE)\n", " stdout, stderr = process.communicate()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In this lesson you will set up a Python computing environment for scientific computing on your own computer and also learn a bit about Google Colab, a cloud service for running Jupyter notebooks.\n", "\n", "It is advantageous to learn how to set up a Python distribution and manage packages on your own machine, as each person can have different needs. That said, [Google Colab](https://colab.research.google.com/) is a nice, free resource to run Jupyter Notebooks on Google's computers without any local installations necessary." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Setting up Google Colab\n", "\n", "In order to use Google Colab, you must have a Google account. Caltech students and employees have an account through Caltech's G Suite. Many of you may have a personal Google account, usually set up for things like GMail, YouTube, etc. For your work in this class, **use your Caltech account.** This will facilitate collaboration with your teammates in the course, as well as with course staff.\n", "\n", "Many of you probably use your personal Google account on your machine, so it can get annoying to log in and out of it. A trick that I find useful is to use one browser, e.g., Safari or Microsoft Edge, for your personal use, web browsing, etc., and a different browser for your scientific work, including the work in this class. 
Google Colab is most thoroughly tested on Chrome, Firefox, and Safari (in fact, JupyterLab, which you will use on your own machine, only supports these three browsers).\n", "\n", "Once you have either logged out of all of your personal accounts or have a different browser open, you can launch a Colab notebook by simply navigating to [https://colab.research.google.com/](https://colab.research.google.com/). Alternatively, you can click the \"Launch in Colab\" badge at the top right of this page, and you will launch this notebook in Colab. That badge will appear in the top right of all pages in the course content generated from notebooks." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Watchouts when using Colab\n", "\n", "If you do run a notebook in Colab, you are doing your computing on one of Google's computers via a virtual machine. You get two CPU cores and a limited amount of RAM (about 12 GB, though it varies). You can also get GPUs and TPUs (Google's tensor processing units), but we will not use those in this course. The computing resources should be enough for all of our calculations this term (though you will need more computing power in the sequel to this course). However, there are some limitations you should be aware of.\n", "\n", "- If your notebook is idle for too long, you will get disconnected from your notebook. \"Idle\" means that cells are not being edited or executed. The idle timeout varies depending on the load on Google's computers; I find that I almost always get disconnected if idle for an hour.\n", "- Your virtual machine will disconnect if it is used for too long. It will typically only be available for 12 hours before disconnecting, though times can vary, again based on load.\n", "\n", "These limitations are in place so that Google can offer Colab for free. If you want more cores, longer timeouts, etc., you might want to check out [Colab Pro](https://colab.research.google.com/signup). 
However, the free tier should work well for you in the course. You can of course always run on your own machine, and in fact are encouraged to do so.\n", "\n", "There are additional software-specific watchouts when using Colab.\n", "\n", "- Colab does not allow for full functionality of [Bokeh](http://bokeh.pydata.org/) apps. \n", "- Colab instances have specific software installed, so you will need to install anything else you need in your notebook. This is not a major burden, and is discussed in the next section.\n", "\n", "I recommend reading the [Colab FAQs](https://research.google.com/colaboratory/faq.html) for more information about Colab." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Software in Colab\n", "\n", "When you launch a Google Colab notebook, much of the software we will use in class is already installed. It is not always the latest version of the software, however. In fact, as of July 2024, Colab is running Python 3.10, whereas you will run Python 3.12 on your machine through your Anaconda installation. Nonetheless, most (but not all) of the analyses we do for this class will work just fine in Colab. We will make every effort to let you know when Colab will not be able to handle activities in class, the most important example being some dashboarding applications.\n", "\n", "Because the notebooks in Colab have software preinstalled, and no more, you will often need to install software before you can run the rest of the code in a notebook. To enable this, when necessary, in the first code cell of each notebook in this class, we will have the following code (or a variant thereof depending on what is needed or if the default installations of Colab change). 
Running this code will not affect running your notebook on your local machine; the same notebook will work on your local machine or on Colab.\n", "\n", "```python\n", "# Colab setup ------------------\n", "import os, sys, subprocess\n", "if \"google.colab\" in sys.modules:\n", " cmd = \"pip install --upgrade iqplot bebi103 watermark\"\n", " process = subprocess.Popen(cmd.split(), stdout=subprocess.PIPE, stderr=subprocess.PIPE)\n", " stdout, stderr = process.communicate()\n", " data_path = \"https://s3.amazonaws.com/bebi103.caltech.edu/data/\"\n", "else:\n", " data_path = \"../data/\"\n", "# ------------------------------\n", "```\n", "\n", "In addition to installing the necessary software on a Colab instance, this also sets the relative path to data sets we will use in the course. When running in Colab, the data set is fetched from cloud storage on [AWS](https://aws.amazon.com). When running on your local machine for homeworks, the path to the data is one directory up from where you are working.\n", "\n", "In most notebooks, the Colab and data path setup code cells are hidden in the HTML rendering to avoid clutter, but will be present when you download the notebooks." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Collaborating with Colab\n", "\n", "If you want to collaborate with another student or with the course staff on a notebook, you can click \"Share\" on the top right corner of the Colab window and choose with whom and how (the defaults are fine) you want to share.\n", "\n", "When we talk about Git in a future lesson, we will discuss Colab's GitHub support, which will be necessary for version control and submitting and sharing your homework." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Installation on your own machine\n", "\n", "We now proceed to discuss installation of the necessary software on your own machine." 
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Downloading and installing Miniconda\n", "\n", "If you already have Anaconda or Miniconda installed on your machine, you can skip this step and proceed to install node.js.\n", "\n", "To download and install Miniconda, do the following." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Windows\n", "\n", "1. Go to the [Miniconda page](https://docs.anaconda.com/miniconda/#quick-command-line-install) and go to the \"Quick command line install\" section.\n", "2. Click on the \"Windows PowerShell\" tab.\n", "3. Copy all of the contents in the gray box (starting with `curl`).\n", "4. Go to the Start menu and search for \"PowerShell.\" Click to open a PowerShell window. Alternatively, you can hit `Windows + R` and type `PowerShell` in the text box.\n", "5. Paste the copied text into the PowerShell window and hit enter." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### macOS\n", "\n", "1. Go to the [Miniconda page](https://docs.anaconda.com/miniconda/#quick-command-line-install) and go to the \"Quick command line install\" section.\n", "2. Click on the \"macOS\" tab.\n", "3. Copy all of the contents in the gray box (starting with `mkdir`).\n", "4. Open a Terminal window. You can do this by hitting `Command-space bar`, typing `Terminal`, and hitting enter. Alternatively, the Terminal application is located in the `/System/Applications/Utilities/` folder, which you can navigate to using Finder.\n", "5. Paste the copied text into the Terminal window and hit enter." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Linux\n", "\n", "1. Go to the [Miniconda page](https://docs.anaconda.com/miniconda/#quick-command-line-install) and go to the \"Quick command line install\" section.\n", "2. Click on the \"Linux\" tab.\n", "3. Copy all of the contents in the gray box (starting with `mkdir`).\n", "4. Open a terminal window. 
I assume you know how to do this if you are using Linux.\n", "5. Paste the copied text into the terminal window and hit enter." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Install node.js\n", "\n", "[node.js](https://nodejs.org/) is a platform that enables you to run JavaScript outside of the browser. We will not use it directly, but it needs to be installed for some of the more sophisticated JupyterLab functionality. Install node.js by downloading the appropriate installer for your machine [here](https://nodejs.org/en/download/)." ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "## Setting up a conda environment\n", "\n", "I have created a conda environment for use in this class. You can download the YML specification for the environment via the link below (you may need to right-click and then download).\n", "\n", "> [Download bebi103.yml](https://raw.githubusercontent.com/bebi103a/bebi103a.github.io/main/_static/bebi103.yml)\n", "\n", "You can set up and activate the environment on the command line. (By \"command line,\" I mean a prompt in a PowerShell or terminal window.) Navigate to the directory where you saved the `bebi103.yml` file. (For example, if it is in a directory called `Downloads` in your home directory, you would do `cd ~/Downloads` on the command line to navigate there.) Then, on the command line, enter\n", "\n", " conda env create -f bebi103.yml\n", "\n", "This should build the environment for you (it may take several minutes). To then activate the environment, enter\n", "\n", " conda activate bebi103\n", "\n", "on the command line." ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "## Launching JupyterLab\n", "\n", "You can launch JupyterLab via your operating system's terminal program (Terminal on macOS and PowerShell on Windows). If you are on a Mac, open the `Terminal` program. 
You can do this by hitting `Command + space bar` and searching for \"terminal.\" On Windows, you should launch PowerShell. You can do this by hitting `Windows + R` and typing \"powershell\" in the text box.\n", "\n", "Once you have a terminal or PowerShell window open, you will have a prompt. At the prompt, type\n", "\n", "    conda activate bebi103\n", "\n", "This will ensure you are using the bebi103 environment you just created.\n", "\n", "**You need to make sure you are using the bebi103 environment whenever you launch JupyterLab, so you should run `conda activate bebi103` each time you open a terminal.**\n", "\n", "Now that you have activated the bebi103 environment, you can launch JupyterLab by typing\n", "\n", "    jupyter lab\n", "\n", "on the command line. You will have an instance of JupyterLab running in your default browser. If you want to specify the browser, you can, for example, type\n", "\n", "    jupyter lab --browser=firefox\n", "\n", "on the command line." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Checking your distribution\n", "We'll now run a quick test to make sure things are working properly. We will make a quick plot that requires some of the scientific libraries we will use.\n", "\n", "Use the JupyterLab launcher (you can get a new launcher by clicking on the `+` icon on the left pane of your JupyterLab window) to launch a notebook. In the first cell (the box next to the `[ ]:` prompt), paste the code below. To run the code, press `Shift+Enter` while the cursor is active inside the cell. You should see a plot that looks like the one below. If you do, you have a functioning Python environment for scientific computing!\n", "\n", "You can also test this in Colab (and it should work with no problems)." ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "data": { "text/html": [ " \n", "
\n" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "application/javascript": [ "'use strict';\n", "(function(root) {\n", " function now() {\n", " return new Date();\n", " }\n", "\n", " const force = true;\n", "\n", " if (typeof root._bokeh_onload_callbacks === \"undefined\" || force === true) {\n", " root._bokeh_onload_callbacks = [];\n", " root._bokeh_is_loading = undefined;\n", " }\n", "\n", "const JS_MIME_TYPE = 'application/javascript';\n", " const HTML_MIME_TYPE = 'text/html';\n", " const EXEC_MIME_TYPE = 'application/vnd.bokehjs_exec.v0+json';\n", " const CLASS_NAME = 'output_bokeh rendered_html';\n", "\n", " /**\n", " * Render data to the DOM node\n", " */\n", " function render(props, node) {\n", " const script = document.createElement(\"script\");\n", " node.appendChild(script);\n", " }\n", "\n", " /**\n", " * Handle when an output is cleared or removed\n", " */\n", " function handleClearOutput(event, handle) {\n", " function drop(id) {\n", " const view = Bokeh.index.get_by_id(id)\n", " if (view != null) {\n", " view.model.document.clear()\n", " Bokeh.index.delete(view)\n", " }\n", " }\n", "\n", " const cell = handle.cell;\n", "\n", " const id = cell.output_area._bokeh_element_id;\n", " const server_id = cell.output_area._bokeh_server_id;\n", "\n", " // Clean up Bokeh references\n", " if (id != null) {\n", " drop(id)\n", " }\n", "\n", " if (server_id !== undefined) {\n", " // Clean up Bokeh references\n", " const cmd_clean = \"from bokeh.io.state import curstate; print(curstate().uuid_to_server['\" + server_id + \"'].get_sessions()[0].document.roots[0]._id)\";\n", " cell.notebook.kernel.execute(cmd_clean, {\n", " iopub: {\n", " output: function(msg) {\n", " const id = msg.content.text.trim()\n", " drop(id)\n", " }\n", " }\n", " });\n", " // Destroy server and session\n", " const cmd_destroy = \"import bokeh.io.notebook as ion; ion.destroy_server('\" + server_id + \"')\";\n", " cell.notebook.kernel.execute(cmd_destroy);\n", " }\n", " }\n", 
"\n", " /**\n", " * Handle when a new output is added\n", " */\n", " function handleAddOutput(event, handle) {\n", " const output_area = handle.output_area;\n", " const output = handle.output;\n", "\n", " // limit handleAddOutput to display_data with EXEC_MIME_TYPE content only\n", " if ((output.output_type != \"display_data\") || (!Object.prototype.hasOwnProperty.call(output.data, EXEC_MIME_TYPE))) {\n", " return\n", " }\n", "\n", " const toinsert = output_area.element.find(\".\" + CLASS_NAME.split(' ')[0]);\n", "\n", " if (output.metadata[EXEC_MIME_TYPE][\"id\"] !== undefined) {\n", " toinsert[toinsert.length - 1].firstChild.textContent = output.data[JS_MIME_TYPE];\n", " // store reference to embed id on output_area\n", " output_area._bokeh_element_id = output.metadata[EXEC_MIME_TYPE][\"id\"];\n", " }\n", " if (output.metadata[EXEC_MIME_TYPE][\"server_id\"] !== undefined) {\n", " const bk_div = document.createElement(\"div\");\n", " bk_div.innerHTML = output.data[HTML_MIME_TYPE];\n", " const script_attrs = bk_div.children[0].attributes;\n", " for (let i = 0; i < script_attrs.length; i++) {\n", " toinsert[toinsert.length - 1].firstChild.setAttribute(script_attrs[i].name, script_attrs[i].value);\n", " toinsert[toinsert.length - 1].firstChild.textContent = bk_div.children[0].textContent\n", " }\n", " // store reference to server id on output_area\n", " output_area._bokeh_server_id = output.metadata[EXEC_MIME_TYPE][\"server_id\"];\n", " }\n", " }\n", "\n", " function register_renderer(events, OutputArea) {\n", "\n", " function append_mime(data, metadata, element) {\n", " // create a DOM node to render to\n", " const toinsert = this.create_output_subarea(\n", " metadata,\n", " CLASS_NAME,\n", " EXEC_MIME_TYPE\n", " );\n", " this.keyboard_manager.register_events(toinsert);\n", " // Render to node\n", " const props = {data: data, metadata: metadata[EXEC_MIME_TYPE]};\n", " render(props, toinsert[toinsert.length - 1]);\n", " element.append(toinsert);\n", " return 
toinsert\n", " }\n", "\n", " /* Handle when an output is cleared or removed */\n", " events.on('clear_output.CodeCell', handleClearOutput);\n", " events.on('delete.Cell', handleClearOutput);\n", "\n", " /* Handle when a new output is added */\n", " events.on('output_added.OutputArea', handleAddOutput);\n", "\n", " /**\n", " * Register the mime type and append_mime function with output_area\n", " */\n", " OutputArea.prototype.register_mime_type(EXEC_MIME_TYPE, append_mime, {\n", " /* Is output safe? */\n", " safe: true,\n", " /* Index of renderer in `output_area.display_order` */\n", " index: 0\n", " });\n", " }\n", "\n", " // register the mime type if in Jupyter Notebook environment and previously unregistered\n", " if (root.Jupyter !== undefined) {\n", " const events = require('base/js/events');\n", " const OutputArea = require('notebook/js/outputarea').OutputArea;\n", "\n", " if (OutputArea.prototype.mime_types().indexOf(EXEC_MIME_TYPE) == -1) {\n", " register_renderer(events, OutputArea);\n", " }\n", " }\n", " if (typeof (root._bokeh_timeout) === \"undefined\" || force === true) {\n", " root._bokeh_timeout = Date.now() + 5000;\n", " root._bokeh_failed_load = false;\n", " }\n", "\n", " const NB_LOAD_WARNING = {'data': {'text/html':\n", " \"\\n\"+\n", " \"BokehJS does not appear to have successfully loaded. If loading BokehJS from CDN, this \\n\"+\n", " \"may be due to a slow or bad network connection. Possible fixes:\\n\"+\n", " \"
\\n\"+\n", " \"\\n\"+\n",
" \"from bokeh.resources import INLINE\\n\"+\n",
" \"output_notebook(resources=INLINE)\\n\"+\n",
" \"
\\n\"+\n",
" \"\\n\"+\n \"BokehJS does not appear to have successfully loaded. If loading BokehJS from CDN, this \\n\"+\n \"may be due to a slow or bad network connection. Possible fixes:\\n\"+\n \"
\\n\"+\n \"\\n\"+\n \"from bokeh.resources import INLINE\\n\"+\n \"output_notebook(resources=INLINE)\\n\"+\n \"
\\n\"+\n \"