{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# AWS setup and usage\n", "\n", "
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "As we start to do more involved statistical inference using Stan, you may want to run your calculations on more powerful machines. While a local installation on your own machine may suffice, it will serve you well to have more expansive computing resources available. There are many options for this, including [Amazon Web Services](https://aws.amazon.com/) (AWS), [Google Cloud Platform](https://cloud.google.com), [Microsoft Azure](https://azure.microsoft.com/), and Caltech's own [high performance computing center](http://www.hpc.caltech.edu/). In this lesson, we will show you how to get up and running with AWS. While it looks like a lot of steps, some of the steps are done only once, so it is not much more work to launch instances after the initial setup. The [next part of this lesson](aws_usage.ipynb) serves as a quick reference for how to spin up instances and use them after you have completed the setup outlined in this part of the lesson." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 1. Create an Amazon Web Services account" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "You should create an AWS account if you have not already. For a total cost of about \\$20, you can have over 100 hours of compute time on a four core machine, which should be plenty for the entire course." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 2. Launch your instance\n", "\n", "Log on to the AWS Console by going to [https://aws.amazon.com](https://aws.amazon.com/) and clicking on the button at the upper right of the page.\n", "\n", "Once you are on the AWS Console page, you can launch your instance. We have set up an Amazon Machine Image (AMI), which has the software you need for the course installed and the data sets pre-loaded. The AMI is available in **Oregon** (us-west-2). Be sure to select this region from the top right corner of the console. You should use the same region throughout the course, since that is physically where your machine will live.\n", "\n", "1. To launch an instance with this AMI, choose EC2 among the services available from your AWS console. You can select EC2 from the `Services` pulldown menu at the top left of your screen. \n", "2. After selecting EC2, you will see a menu of options the left pane. Under `Images` there, click `AMIs`. \n", "3. The resulting menu will default to AMIs `Owned by me` (you likely do not have any). Select instead `Public images`. \n", "4. In the search menu, search for `bebi103`, and the class AMI should appear. If it does not, double check to make sure your region is Oregon.\n", "5. You will see the `bebi103` AMI listed. Right click on it and select `Launch instance from AMI`. (You may also select `Request Spot instance` if you want to save some money, but when you stop a spot instance, you will lose whatever you stored there.) \n", "6. You will then be taken to the `Launch an instance` page. In the `Name and tags` pane, you can give your instance a name, which I recommend, since it will help you find it more easily in your menu of instances when you come back to it. I simply named mine `bebi103`.\n", "7. You can skip the `Application and OS Images` pane because you already selected your AMI.\n", "8. In the `Instance type` pane, you are given many choices of instance types. For our beginning usage of Stan, I recommend a `c5.xlarge` instance, which as 4 cores and costs about 17ยข/hour to use. When we start doing simulation based calibration (SBC), which is the most computationally intensive part of the course, I would use a `c5.2xlarge` instance or larger (8 or more cores).\n", "9. In the `Key pair (login)` pane, you will need to create a new key pair if you do not already have an AWS key pair. Click `Create new key pair` and in the pop-up window, you can enter a name for the key pair (something like `bebi103_aws_keypair` would be fine). Leave the default radio buttons selected and click `Create key pair`. Once you create the key pair, it will be downloaded to your local machine. **DO NOT, I repeat, DO NOT store the key pair in any git repository, or anything that is backed up to the cloud, like Dropbox. ONLY store it locally on your machine and never, ever let it out to the internet.** You only need to create the key pair once; you can reuse it going forward.\n", "10. In the `Network settings` pane, you can select sources to allow SSH traffic. You can leave this as \"Anywhere\" and leave everything else as the defaults. If you are concerned about security, you can instead only allow SSH traffic from your IP address, but this may prove inconvenient if you want to access your instance from home and on campus.\n", "11. In the `Configure storage` pane, select 30 GiB gp2 root volume. This should be enough storage for installed software, data sets, MCMC samples, and the rest of your work.\n", "12. Click `Launch instance` at the bottom right of the right `Summary` pane.\n", "\n", "After following those steps, your instance will spin up. You can view your running instances by clicking on `Instances` on the dashboard on the left part of the screen. \n", "\n", "It will take a while for your instance to spin up. When the `Instance State` says `running` and the `Status Checks` are complete, your instance is ready for you." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 3. Connect to your instance\n", "\n", "Now that your instance is launched, you can connect to it using your computer and the *ssh* protocol. The instructions work for Windows, macOS, or Linux, assuming you have a terminal running bash or zsh. In Windows, this is accomplished using [GitBash](https://git-scm.com), which you should install if you have not already. For macOS, use Terminal.\n", "\n", "1. Identify where you put your keypair file. For the purposes of this exercise, I will assume that you have a directory in your home directory called `key_pairs/` and that your keypair file is `~/key_pairs/bebi103_aws_keypair.pem`.\n", "2. Change permissions on your keypair for security. Do this in the terminal using\n", "\n", " `chmod 400 ~/key_pairs/bebi103_aws_keypair.pem`\n", "\n", "3. Open a new GitBash (Windows) or Terminal (macOS) window. \n", "4. SSH into your instance in the terminal. To do this, click on your instance on the `Instances` page from the dashboard. Clink on the instance you started. At the bottom of the webpage will appear information about your instance, including the Public IPv4 address. It will look something like `54.92.67.113`. Copy this. In what following, I refer to this as ``. SSH into your instance by doing the following on the command line.\n", "\n", " `ssh -i \"~/key_pairs/bebi103_aws_keypair.pem\" ec2-user@`\n", "\n", "5. (optional, may only work for macOS) To avoid having to use `-i \"~/key_pairs/bebi103_aws_keypair.pem\"` each time, you can add your keypair to your bash or zsh profile by doing, e.g. for zsh,\n", "\n", " `echo ssh-add -K $HOME/keypairs/bebi103_aws_keypair.pem >> ~/.zshrc;`\n", " `source ~/.zshrc`\n", " \n", "6. Now that you are SSH'd into the instance, you can access it through the command line. You may notice that the `bebi103` environment is already active. From here, the world is your oyster. For example, you can clone a repository where you keep your work for the class.\n", "\n", " `git clone https://github.com/my_user_name/my_favorite_repository.git`\n", " \n", "7. The data for the course are located in the `~/data` folder. So, to keep the appropriate relative path, you should make a directory for your homework, `~homework/` where each `.ipynb` file is stored.\n", "\n", "11. Whenever you log in to your instance, you should update the instance in case I updated any software or added data sets. To do this, run\n", "\n", " `bebi103_update`\n", " \n", "on the command line after SSH-ing into your instance.\n", "\n", "9. You now have your repository and all of the data for the course on your instance! Any documents you create or edit on your AWS instance can be managed with git and push/pulled to/from GitHub." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 4. Launch JupyterLab\n", "\n", "While connected to your instance, you can launch JupyterLab by executing\n", "\n", " jupyter lab --no-browser\n", " \n", "on the command line. You will see output like this:\n", "\n", " To access the server, open this file in a browser:\n", " file:///home/ec2-user/.local/share/jupyter/runtime/jpserver-30060-open.html\n", " Or copy and paste one of these URLs:\n", " http://localhost:8888/lab?token=e52184f06c9fb0f9ceea176b1d51d9cb36c72a019e688fed\n", " or http://127.0.0.1:8888/lab?token=e52184f06c9fb0f9ceea176b1d51d9cb36c72a019e688fed\n", "\n", "\n", "Keep this window open.\n", "\n", "In order to use JupyterLab through a browser on your machine, you need to set up a socket. To do so, open up another GitBash or Terminal window and execute the following.\n", " \n", " ssh -i \"~/key_pairs/bebi103_aws_keypair.pem\" -L 8000:localhost:8888 ec2-user@\n", " \n", "This sets up a socket connecting port `8888` on your EC2 instance to port `8000` on your local machine. You can change these numbers as necessary. For example, in the URL listed above that you got with you launched JupyterLab, the port may be `localhost:8889`, in which case you need to substitute `8889` for `8888` in your ssh command. You may also want a different local port if you already have a JupyterLab instance running on port `8000`, e.g., `8001`. In what follows, I will use port number `8000` and `8888`, which you will probably use 90% of the time, but you can make changes as you see fit.\n", "\n", "After you have set up the socket, you can paste the URL given when you launched JupyterLab on your EC2 instance into your browser, but substitute `8000` for `8888`. That is, direct your browser to\n", "\n", " http://localhost:8000/lab?token=e52184f06c9fb0f9ceea176b1d51d9cb36c72a019e688fed\n", " \n", "You will now have JupyterLab up and running!\n", "\n", "Note that you may be running JupyterLab locally on your own machine. You should make sure you do not use the same port number of any JupyterLab instance running on your local machine when you launch JupyterLab on AWS. You can specify the port number to be, for example 8890, by launching JupyterLab with\n", "\n", " jupyter lab --no-browser --port 8890\n", " \n", "If you do that, make sure you use the corresponding port numbers when setting up your socket." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 5. Copying results to and from AWS to your local machine\n", "\n", "As you work on notebooks and create new files you want to save, you may want to move them to your local machine. If you are working on a notebook or `.stan` file, the best option is to use git and commit and push those files to a git repository directly from the command line on your EC2 instance.\n", "\n", "Some files, though, such as MCMC results or intermediate data processing results, are not meant to be under version control. For these file, you an use `scp`. Within your GitBash or Terminal window on your local machine (you probably have to open yet another window), you can copy files as follows.\n", "\n", " scp -i \"~/key_pairs/bebi103_aws_keypair.pem\" ec2-user@:~/my_file.csv ./\n", "\n", "This command will copy files from your EC2 instance to your working directory. Simply put the full path to the file you want to transfer after the colon above (remember `~/` means \"home directory\"). The second argument of `scp` is where you want to copy the file.\n", "\n", "Similarly, you can upload files to your EC2 instance as follows (in this example to the home directory in your instance).\n", "\n", " scp -i \"~/key_pairs/bebi103_aws_keypair.pem\" my_file.txt ec2-user@:~/\n", " " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 6. Exiting\n", "\n", "When you are finished with your session, you can shut down your notebook in the browser. If the shutdown does not give you a new prompt in the terminal window, you can do a hard shutdown of JupyterLab by pressing `Ctrl-c`.\n", "\n", "After you are finished with your work on your instance, you should stop your instance. To do this, go back to the `Instances` page on your EC2 console. Right click your instance and click `Stop instance`. **Do not terminate your instance** unless you really want to. Terminating an instance will get rid of any changes you made to it." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 7. Seriously. Stop your instances if you are not using them.\n", "\n", "**If your instance is not stopped and you leave it running, you will get charged!** You can rack up a massive bill with idle, but running, instances. You should stop your instances whenever you are not using them. It is a minor pain to wait for them to spin up again, but forgetting about a running instance will cause more pain than that to your pocketbook." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 8. Using your instance again\n", "\n", "After the initial setup and your first session with your instance, you will likely start it up again. This is much easier.\n", "\n", "In the AWS console, navigate to the `Instances` page, either through the left menu or via the EC2 dashboard. Find the instance you set up (mine is named `bebi103`), right click it, and select `Start instance`. After it has spun up, (`Instance state` is `Running`), you can access the Public IPv4 address, and fire up JupyterLab as per the above instructions." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 9. Terminate your instances after the class is over\n", "\n", "After the class is over, you might want to terminate your instance. This is because the storage in your instance (stored using AWS's EBS, which is what keeps your repository, installations, etc., all intact) is not free. Once your free tier accessibility expires in a year if you are new to AWS, and/or your promo codes expire, you will start getting bills for your EBS usage. These get wiped if you terminate your instance and you will not get billed." ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.11.5" } }, "nbformat": 4, "nbformat_minor": 4 }