{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Setting up Python environments\n", "\n", "
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Learning objectives\n", "- Learn why we need virtual environments. \n", "- Learn how to create and use virtual environments using `venv`. \n", "- Learn how to install packages using `pip`.\n", "- Learn how to create an environment specification file using `pip freeze`.\n", "- Learn the differences between major environment management tools `venv`, `conda`, and `poetry`. \n", "- Learn the basics of Anaconda environments and conda. \n", "- Learn how to create a virtual environment and install packages. \n", "- Learn how to use the kernel with Jupyter notebooks/lab. " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "## \"Works-on-my-machine\" problem and computing environment\n", "\n", "Since the beginning of computing, this has been the most common problem:\n", "\n", "\n", "\n", "To run any piece of code on any computer, you need a lot of supporting software: the computing environment. The computing environment includes the operating system, the programming language, libraries, and so on. \n", "\n", "To address this, we version software, ensure compatibility across versions, track dependencies, and so on. But these measures are not perfect, and we may (maybe eternally, *gasp*) have to deal with the problem of \"it worked on my machine.\"\n", "\n", "Still, there are many things we can do to mitigate the \"works-on-my-machine\" problem. Here, I'd like to provide some general ideas and pointers to help you set up your Python environment." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "## Reasons to learn Python environment management\n", "\n", "Setting up and maintaining Python environments can be painful! The Python project has not been particularly good at dependency management. Python is easy to pick up and start coding in because it's an interpreted language. As a result of this approachability, it has a great many users and use cases. 
\n", "\n", "This is a good thing, but it also contributes to the messiness of the Python ecosystem. For use cases where performance is vital (e.g., machine learning and scientific computing), people had to write performance-critical code in C/C++/Fortran and wrap it in Python, because Python is a slow, interpreted language. This is great for performance, but it makes dependency management even more complicated.\n", "\n", "\n", "\n", "(Python Environment, by Randall Munroe, https://xkcd.com/1987/)\n", "\n", "I must admit that learning how to set up and manage Python environments is not the most exciting thing. Also, you may be lucky enough to work with a dedicated DevOps team that takes care of this for you, or you may be in a situation where you can exclusively use cloud-based services like Google Colab.\n", "\n", "On the other hand, you may also be in a situation where you have to set up your own environment, either to use cutting-edge packages or because of constraints in your organization. In such cases, whether you can set up and manage your Python environment, and whether you can successfully install and use a single critical package, can make or break your project.\n", "\n", "In addition, because environment management involves technical details about how your computer works, it can be a great learning opportunity. It can help you understand how your computer and Python work under the hood. Therefore, I encourage you to keep learning how to set up and manage Python environments and packages!\n", "\n", "Having said that, struggling to learn how to manage Python environments on top of completing the weekly assignments and other courses may be too much for you, especially if you are new to programming. So, while I encourage everyone to learn the basics of Python environment management, it will not be a requirement. 
Please do feel free to use [Google Colab](https://colab.research.google.com/) or other cloud-based services to circumvent environment management if you get stuck!\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## A general principle: use virtual environments!\n", "\n", "### Problem with having a single global environment\n", "Imagine a data scientist, Alice, who works on two very different projects. In one project, she is working on a machine learning model, which requires a cutting-edge package that is being actively developed. This package makes use of the most recent features (and thus most recent versions) of other foundational packages (e.g., the latest version of `numpy`). \n", "\n", "In another project, she is debugging and maintaining a legacy codebase that breaks if she uses a recent version of `numpy`. What should she do in this situation?\n", "\n", "If Alice installs the latest version of `numpy` globally, she will not be able to test the legacy codebase, which requires an older version of `numpy`, and vice versa. She can potentially buy a new computer for each project, but that's just not practical. 🤑\n", "\n", "### Solution: virtual environments\n", "The solution is to use **virtual environments**! A virtual environment is a self-contained environment that contains its own versions of software packages. Whether a package is pure Python code or binary code, there is nothing wrong with keeping multiple versions on the same computer, as long as they are isolated from each other and we can clearly specify which version to use for each project. \n", "\n", "This is exactly what all virtual environment tools do. Usually a virtual environment is essentially a folder somewhere on your computer (e.g., in your project directory or a dedicated directory for virtual environments) that contains a copy of Python and other packages. 
When you _activate_ a virtual environment, it modifies your `PATH` environment variable so that you use whatever is in your virtual environment instead of the global version. Then when you _deactivate_ the virtual environment, it restores the `PATH` variable to its original state.\n", "\n", "In the most basic sense, that's it! \n", "\n", "### Virtual environments are not just for Python\n", "\n", "Virtual environments are not just for Python. For example, `Node.js` has `nvm` (Node Version Manager), `Ruby` has `rvm` (Ruby Version Manager) and `bundler`, and so on. \n", "\n", "If you go one step further, you can use a more general virtual environment tool like `Docker` to create a virtual environment that contains not only Python but also specific versions of other software (e.g., a database) that your project requires. \n", "\n", "\n", "\n", "### Which tools to use? \n", "\n", "There are (too) many tools to manage virtual environments and packages in the Python ecosystem. You can use the barebones `venv` module, or you can use super powerful, yet complex tools like `conda` (and the whole Anaconda ecosystem). Moreover, there are many online services that provide cloud-based Python environments, like Google Colab. Here is my quick and dirty recommendation:\n", "\n", "1. Try **Google Colab** if you don't want to deal with environment management right now and just want to get something done quickly. \n", "2. If you want to understand how the whole virtual environment system and Python packages work, learn the basics of **`venv` and `pip`** first. They are the most basic tools and let you learn the core concepts. \n", "3. If your project uses fairly standard packages, start with **`uv`**. It is a modern, super-fast Python package manager that is gaining huge support right now. I believe it will likely become the de facto standard in the future. \n", "4. 
If you are working on a data science project and you need to use lots of scientific computing packages, you may have to use **`conda`** and the Anaconda ecosystem. The main reason to use `conda` is that it can install many non-Python packages, which live outside the Python packaging system and which you generally cannot install with `pip`, `uv`, or other Python package managers. \n", "\n", "Below, you can find the basic instructions for each of these tools. \n", "\n", "- See [Python on the cloud](#python-on-the-cloud) for some basic instructions on how to use Google Colab. \n", "- See [Basic virtual environment management with `venv`](#basic-virtual-environment-management-with-venv) for the basics of Python virtual environments. \n", "- See [Anaconda: a powerful Python environment management tool for data science](#anaconda-a-powerful-python-environment-management-tool-for-data-science) for the basics of `conda` and Anaconda. \n", "- See [uv: A modern, fast Python package installer and resolver](#uv-a-modern-fast-python-package-installer-and-resolver) for the basics of `uv`. \n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Basic virtual environment management with `venv`\n", "\n", "### What is `venv`?\n", "\n", "`venv` is a built-in module in Python that allows us to create isolated virtual environments. Remember that you can think of each virtual environment as a directory that contains Python libraries, each with a specific version. This isolation means you can work on multiple Python projects with different dependencies on the same machine. 
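\n", "\n", "To make the idea that an environment is just a directory concrete, here is a minimal sketch (not part of the original walkthrough) that uses the standard-library `venv` module from Python to create a throwaway environment in a temporary directory and list what ends up in it. The directory name `demo-env` and the helper function names are hypothetical, and `with_pip=False` is chosen only to keep the demo fast.\n", "\n", "
```python
import sys
import tempfile
import venv
from pathlib import Path


def in_virtual_env():
    # In a venv, sys.prefix is the environment directory, while
    # sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


def create_demo_env(parent_dir):
    # 'demo-env' is a hypothetical name used only for this illustration.
    env_dir = Path(parent_dir) / 'demo-env'
    # with_pip=False skips installing pip, which keeps the demo fast;
    # the command-line form 'python -m venv' installs pip by default.
    venv.create(env_dir, with_pip=False)
    return env_dir


with tempfile.TemporaryDirectory() as tmp:
    env_dir = create_demo_env(tmp)
    # pyvenv.cfg is the marker file that identifies a directory as a venv.
    has_cfg = (env_dir / 'pyvenv.cfg').is_file()
    top_level = sorted(p.name for p in env_dir.iterdir())

print('Running inside a virtual environment:', in_virtual_env())
print('pyvenv.cfg present:', has_cfg)
print('Top-level contents:', top_level)
```
\n", "\n", "Comparing `sys.prefix` with `sys.base_prefix` is also a handy check for whether the interpreter you are currently running belongs to a virtual environment.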
\n", "\n", "### Creating a Virtual Environment\n", "\n", "- Open your terminal (Command Prompt on Windows, Terminal on macOS/Linux).\n", "- Navigate to the directory where you want to create your virtual environment using the `cd` command.\n", "- Run the following command:\n", "\n", "```sh\n", "python -m venv myenv\n", "```\n", "\n", "or \n", "\n", "```sh\n", "python3 -m venv myenv\n", "```\n", " \n", "**What does it do?** This command creates a new directory named `myenv` (or whatever you name it) in your current directory. This directory will contain the Python interpreter, a copy of the `pip` package manager, and other necessary files. It's a self-contained environment where you can install packages without affecting the global Python installation.\n", "\n", "But, this step does not _activate_ your virtual environment. You need to activate it to use it. \n", "\n", "### Activating the Virtual Environment\n", "\n", "- Once the environment is created, you need to activate it.\n", "- **On Windows:** Run \n", "\n", "```sh\n", "myenv\\Scripts\\activate\n", "```\n", "\n", "- **On macOS/Linux:** Run \n", "\n", "```sh\n", "source myenv/bin/activate\n", "```\n", "\n", "**What does this do?** Activating the virtual environment adjusts your shell’s environment variables so that when you run `python`, it uses the environment’s Python interpreter and when you run `pip`, it manages the environment’s packages. It changes your prompt to show the name of the activated environment to let you know that you're using a virtual environment. Always pay attention to which environment you're using!\n", "\n", "### Installing Packages in the Virtual Environment\n", "- With the environment activated, you can install Python packages using `pip`.\n", "- For example, to install `networkx`, run \n", "\n", "```sh\n", "pip install networkx\n", "```\n", "\n", "If you have activated the virtual environment, `networkx` should be installed in the virtual environment. \n", "\n", "Let's check this. 
First, we can run `pip list` to see what packages are installed in the virtual environment. \n", "\n", "```sh\n", "pip list\n", "```\n", "\n", "You should be able to see something like this:\n", "\n", "```\n", "Package    Version\n", "---------- -------\n", "networkx   X.X.X\n", "pip        XX.X.X\n", "setuptools XX.X.X\n", "```\n", "\n", "You can also see this by navigating into the virtual environment (remember, it's just a directory). \n", "\n", "```sh\n", "cd myenv\n", "ls\n", "```\n", "\n", "From the directory that contains `myenv`, something along these lines lists all the packages installed in the virtual environment (on macOS/Linux; adjust `python3.11` to match your Python version):\n", "\n", "```sh\n", "ls myenv/lib/python3.11/site-packages\n", "```\n", "\n", "### Letting others create the same environment\n", "\n", "You can share the list of packages installed in your virtual environment with others by creating a `requirements.txt` file. \n", "\n", "```sh\n", "pip freeze > requirements.txt\n", "```\n", "\n", "This will create a `requirements.txt` file that contains the list of packages installed in your virtual environment, with their exact versions. Try it and see what's in the file. Others can then install the same packages into their own environment with `pip install -r requirements.txt`. " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "### Deactivating the Virtual Environment\n", "\n", "When you’re done working in the virtual environment, you can deactivate it by running\n", "\n", "```sh\n", "deactivate\n", "```\n", "\n", "This restores your shell’s environment variables to their normal state, so that `python` and `pip` refer to the global Python installation again. So when you're done with a particular project, be sure to deactivate the virtual environment.\n", "\n", "\n", "### Deleting the Virtual Environment\n", "\n", "- If you no longer need the virtual environment, you can simply delete the environment's folder. Everything installed in the virtual environment will be deleted.\n", "- Use your file manager or the command line to delete the `myenv` directory.\n", "\n", "**What happens?** This is just a clean-up step. 
Since all the environment's files are contained within this directory, deleting it removes the environment completely.\n", "\n", "Now you've learned the most basic usage of `venv`: how to create, use, and manage a basic Python virtual environment. This is a fundamental skill in Python development, especially when working on multiple projects or when projects have differing dependencies. Remember, each virtual environment is independent, so feel free to experiment without worrying about affecting other projects or your system's Python setup! \n", "\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Anaconda: a powerful Python environment management tool for data science\n", "\n", "Although `venv` may be all you need for many use cases (including this course), there are more powerful tools. One such tool, probably the most popular for data science, is Anaconda. Using Anaconda is optional for this course, but I recommend that you play with it and see if it works for you. It also provides a nice graphical user interface (GUI) for managing environments, which may be helpful if you are not comfortable with the command line.\n", "\n", "### What is Anaconda?\n", "\n", "Anaconda is not just a virtual environment management tool, but a complete Python distribution that comes with many useful packages for data science. It is often preferred by data scientists because it comes with many packages pre-installed, and it is easy to install additional packages, even those that are nontrivial to install using `pip`. Another feature is that it comes as an isolated environment (a single folder), so it is easy to use and manage even if you are working on a shared computer that you do not have admin access to.\n", "\n", "### Installing Anaconda\n", "\n", "- [Anaconda](https://www.anaconda.com/)\n", "\n", "Simply follow the instructions on the website based on your operating system. 
The default installation comes with Python and many useful packages for data science.\n", "It also installs `conda`, a command-line tool for managing environments. You can think of `conda` as a combined tool that does more or less what `venv` and `pip` do.\n", "\n", "### Using Anaconda\n", "\n", "There are two ways to use Anaconda. If you are not yet comfortable with the command line, you can use the Anaconda Navigator GUI. If you are comfortable with the command line, you can use the `conda` command. I will not go into details here, but you can find many tutorials online.\n", "\n", "#### Creating a virtual environment with conda\n", "\n", "Creating a virtual environment is similar to `venv`. \n", "\n", "```sh\n", "conda create --name myenv\n", "```\n", "\n", "However, conda does not create the virtual environment (folder) in the current directory. Instead, it creates it in a specific directory that is dedicated to all conda environments. The location depends on your setup, and conda will tell you where it is when you create an environment. It also does some \"magic\" in the background that we will not go into here.\n", "\n", "A nice feature of conda, compared with `venv`, is that it allows you to specify which version of Python you want to use. For example, if you want to use Python 3.8, you can run:\n", "\n", "```sh\n", "conda create --name myenv python=3.8\n", "```\n", "\n", "So, it is much more straightforward to use a specific version of Python with conda than with `venv`.\n", "\n", "### Activating the virtual environment\n", "\n", "Once you have created a conda environment, you can activate it using the `conda activate` command.\n", "\n", "```sh\n", "conda activate myenv\n", "```\n", "\n", "### Installing packages\n", "\n", "You can install packages using the `conda install` command. For example, to install `networkx`, you can run:\n", "\n", "```sh \n", "conda install networkx\n", "```\n", "\n", "This looks more or less the same as `pip install`. 
However, conda is more powerful than `pip` in that it can also install packages that are not pure Python packages, some of which are difficult or impossible to install with `pip`. Although it is becoming easier to install such packages with `pip`, there are still some packages that are tricky to install with `pip`. This is where Anaconda/conda shines and the reason why so many data science projects use it.\n", "\n", "On the other hand, conda does not \"know\" every single package out there that can be installed with `pip`, particularly those that are not related to data science. So, you'd often need to use both `conda` and `pip` to install all the packages you need, which is annoying and confusing! Making things even more complicated, there are conda channels, which are like package repositories, that contain packages that are not available in the default conda channel. You may not need to worry about this for this course, as long as you don't go deep into cutting-edge packages, but it is something to keep in mind.\n", "\n", "\n", "### Deactivating the virtual environment\n", "\n", "You can deactivate the virtual environment using the `conda deactivate` command.\n", "\n", "```sh\n", "conda deactivate\n", "```\n", "\n", "### Deleting the virtual environment\n", "\n", "You can delete the virtual environment using the `conda remove` command.\n", "\n", "```sh\n", "conda remove --name myenv --all\n", "```\n", "\n", "### Sharing the environment specification\n", "\n", "You can share the environment specification using the `conda env export` command.\n", "\n", "```sh\n", "conda env export > environment.yml\n", "```\n", "\n", "And you can create an environment from the specification using the `conda env create` command.\n", "\n", "```sh\n", "conda env create -f environment.yml\n", "```\n", "\n", "The `environment.yml` file is similar to a `requirements.txt` file, but it contains more information about the environment that conda uses (why??? 
[xkcd: standards](https://xkcd.com/927/)).\n", "\n", "### Summary\n", "\n", "- Anaconda is a Python distribution that comes with many useful packages for data science.\n", "- It also comes with `conda`, a powerful command-line tool for managing environments.\n", "- You can use the Anaconda Navigator GUI or the `conda` command to manage environments.\n", "- conda's primary power comes from its ability to install packages that are not pure Python packages and are hard to install with `pip` (somewhat common in data science and scientific computing in general).\n", "- Another superpower of conda is that it allows you to specify which version of Python you want to use.\n", "- However, conda cannot install many non-scientific packages (which can be installed by `pip`). Thus you'd often need to use both `conda` and `pip` (it is ok to use both in the same environment).\n", "- conda uses its own environment specification file (`environment.yml`), which is similar to `requirements.txt`. \n", "\n", "Refer to the official documentation for more details: https://docs.conda.io/projects/conda/en/latest/user-guide/tasks/manage-environments.html\n", "\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## uv: A modern, fast Python package installer and resolver\n", "\n", "`uv` is a new, high-performance Python package installer and resolver written in Rust. It's designed to be a faster, more reliable alternative to `pip`, with several key advantages:\n", "\n", "### Key Features of uv\n", "\n", "1. **Speed**: `uv` is crazy fast, often 10-100x faster than `pip`.\n", "\n", "2. **Reliability**: Built-in dependency resolution that's more robust than `pip`'s, helping avoid common dependency conflicts.\n", "\n", "3. **Compatibility**: Works seamlessly with existing Python tooling and virtual environments.\n", "\n", "4. 
**Modern Design**: Written in Rust, with a focus on performance and reliability.\n", "\n", "### Installing uv\n", "\n", "See https://docs.astral.sh/uv/#installation for more details and up-to-date installation instructions.\n", "\n", "Note that `uv` is installed independently of your Python installation. In other words, as a package manager it is not tied to the Python version you have installed, which is one of the reasons why it can work with multiple Python versions (powerful!).\n", "\n", "### Basic Usage\n", "\n", "You can either use `uv` as a drop-in replacement for `pip` or as a full-fledged package manager that replaces tools like `poetry`. It is still evolving fast, and new features are being added regularly. So, I'd encourage you to check the [official documentation](https://docs.astral.sh/uv/). See [Projects](https://docs.astral.sh/uv/#projects) for the usage as a high-level package manager and [The pip interface](https://docs.astral.sh/uv/#the-pip-interface) for the usage as a drop-in replacement for `pip`.\n", "\n", "Note that because `uv` is new and evolving, some AI models may not be aware of its full features. So, be sure to check the official documentation! " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Python on the cloud\n", "\n", "Cloud-based Python environments are becoming more and more popular with the rise of cloud computing. Among many advantages of cloud-based environments, there are two that are particularly relevant. \n", "\n", "First, once you have a cloud-based environment set up, you can access it from anywhere. You can work on your project from your laptop, desktop, or even your phone. Second, you don't have to worry about setting up environments, and whatever you have on the cloud tends to be exactly replicable. Usually, these cloud environments are pre-configured with all the necessary software, so you can start working on your project right away. 
Whenever you fire up a Python notebook on the cloud, you get a clean (exactly the same) environment that is ready to use, although changing this base environment can be difficult. \n", "\n", "Thanks to these strengths, cloud-based environments are becoming more and more popular. Many research papers and tutorials are now published as Jupyter notebooks on Google Colaboratory (Colab), so that they can be easily reproduced and run by anyone.\n", "\n", "### Google Colaboratory\n", "\n", "- https://colab.research.google.com/notebook\n", "\n", "Google Colab is a free cloud-based Python environment that comes with many useful packages pre-installed. It is based on Jupyter notebook, so it is easy to use and share. It is also integrated with Google Drive, so you can easily share your notebooks with others and can use data stored in your Google Drive.\n", "\n", "This is what I recommend that you use for this course if you are not comfortable with setting up your own environment on your computer.\n", "\n", "#### Installing packages on Colab\n", "\n", "Each Colab notebook is like a virtual computer that is created whenever you create or open a notebook. It also allows you to install packages. You can install packages using `pip` as you would do on your own computer. However, because it does not provide you with a command line interface, you need to run the `pip` command _inside_ your notebook. \n", "\n", "For example, you can run the following code in a Colab notebook to install `networkx` (it is already installed though).\n", "\n", "```python\n", "!pip install networkx\n", "```\n", "\n", "Note that you need to put `!` in front of the command. Whenever we start a line with `!` in a Jupyter notebook, it runs the rest of the line as a shell command instead of as Python code. So this command is equivalent to running `pip install networkx` in the terminal.\n", "\n", "Also note that you need to run this command every time you open a new notebook. 
This is because each notebook starts on what is effectively a new computer, which has only the pre-installed packages.\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Jupyter notebook\n", "\n", "Google Colab is essentially a Jupyter notebook that's running on virtual computers in Google Cloud. So, what is Jupyter notebook?\n", "\n", "### Interactivity can be incredibly powerful \n", "\n", "Python is an interpreted language, so we can have a \"conversation\" with the Python interpreter. For example, we can first define some variables and then use them in the next line. \n", "\n", "```python\n", "In [1]: x = 1\n", "\n", "In [2]: y = 2\n", "\n", "In [3]: x + y\n", "\n", "Out[3]: 3\n", "```\n", "\n", "People realized that this ability to converse with a programming language can be incredibly powerful for data science (and scientific computing in general). For instance, when you load a big tabular dataset, it is super useful to be able to explore the dataset interactively and process the data step by step. \n", "\n", "Imagine writing a script that performs a series of complicated data processing operations and analyses, where each step depends on the results of the previous steps. To develop this script without any interactivity, you need to go through a tedious loop of (1) writing the initial script, (2) running it, (3) checking the results, and (4) going back to step (1) to change the script. This is not only tedious but also error-prone. Moreover, as the size of the data increases, this process becomes more and more inefficient, because you have to wait a long time for each iteration. \n", "\n", "### The idea of computational/computable document\n", "\n", "The pioneer that first realized this potential was probably Mathematica, a popular software package for mathematical computing. It provides a powerful interface where you can interactively create a document that contains text, code, and visualizations. 
This document can not only present the results of an analysis, but it can also contain the code that can be executed to reproduce the results and even modified to explore alternatives. This was a revolutionary idea. \n", "\n", "However, it was not free and open-source, and it was not Python! This great idea began to spread to other languages, leading to the IPython and Jupyter projects. \n", "\n", "### IPython notebook and Jupyter\n", "\n", "The IPython (Interactive Python) project was a nice attempt to bring this idea to Python. It was a command-line tool (an enhanced replacement for the default interactive Python shell) that allowed you to interactively write and execute Python code. It also implemented the idea of a computational document, via \"IPython notebook\", which was pretty much what Mathematica was doing, but without the fancy GUI and with a limited ability to create and interact with visualizations. \n", "\n", "In terms of its capacity, it was not a match for Mathematica, but it was free and open-source, and it was Python! It was also a great tool for data science, and it became very popular among data scientists. This eventually led to the creation of the Jupyter project. \n", "\n", "- https://jupyter.org/\n", "\n", "The Jupyter project was born because the same idea can be applied to many other languages besides Python, and because people realized that it is possible to extract the _interface_ of IPython notebook and make it language-agnostic (and web-based!). Whatever interpreted language we use, we can have exactly the same interface, which does not care about what language people are inputting. 
And then the language interpreter can interpret the code and return the results to the interface.\n", "\n", "The name Jupyter came from the combination of Julia, Python, and R, the three languages that the Jupyter project initially supported (now it supports many more).\n", "\n", "### What is JupyterLab?\n", "\n", "- https://jupyterlab.readthedocs.io/en/stable/\n", "\n", "JupyterLab is a successor of the initial Jupyter notebook interface. It aims to be a more comprehensive development environment where you can put together multiple notebooks, a terminal, a text editor, and so on. It is under active development, and this is what you want to use in most cases. \n", "\n", "### Installing JupyterLab\n", "\n", "- https://jupyter.org/install\n", "\n", "You can install JupyterLab using `pip` or `conda`. \n", "\n", "```sh\n", "pip install jupyterlab\n", "```\n", "\n", "or \n", "\n", "```sh\n", "conda install -c conda-forge jupyterlab\n", "```\n", "\n", "### Running JupyterLab\n", "\n", "You can run JupyterLab using the `jupyter lab` command.\n", "\n", "```sh\n", "jupyter lab\n", "```\n", "\n", "This will open a new tab in your browser. You can create a new notebook by clicking the `+` button on the top left corner. You can also create a new notebook by clicking `File -> New -> Notebook` in the menu bar.\n", "\n", "### \"Kernels\" and ipykernel\n", "\n", "One thing that may be quite confusing at first, especially with respect to virtual environments, is the concept of a \"kernel\". Let's say you have created a virtual environment and then run `jupyter lab` in the terminal. If you're using Jupyter for the first time, I'd bet that your natural assumption is that JupyterLab is using the virtual environment you have created and that you can use all the packages that you have just installed. But that's not the case! 😬\n", "\n", "As mentioned earlier, Jupyter is a _language-agnostic web interface_. It does not care about what language you are using. 
It just sends the code you write to the language interpreter and returns the results. So, you need to tell Jupyter which language (and which _version_!) you want to use. This is what \"kernel\" means. \n", "\n", "Say you have created a virtual environment named `myenv` and installed `networkx` in it. The Python interpreter in this virtual environment knows about `networkx`, but Jupyter does not know about this particular Python interpreter (a \"kernel\") and the `myenv` environment until we tell it.\n", "\n", "So, how can we let Jupyter know about our virtual environment and the corresponding kernel (Python interpreter)? Here are the steps that we need to take:\n", "\n", "1. Install `ipykernel` in the virtual environment.\n", "2. Register the virtual environment as a kernel.\n", "3. Run JupyterLab and select the kernel.\n", "\n", "#### 1. Install `ipykernel` in the virtual environment\n", "\n", "First, we need to install `ipykernel` in the virtual environment. This is a package that allows us to register the virtual environment as a kernel. \n", "\n", "```sh\n", "pip install ipykernel\n", "```\n", "\n", "#### 2. Register the virtual environment as a kernel\n", "\n", "Next, we need to register the virtual environment as a kernel. This is done by running the following command with the virtual environment activated:\n", "\n", "```sh\n", "python -m ipykernel install --user --name=myenv\n", "```\n", "\n", "This will register the current virtual environment as a kernel named `myenv`. You can check this by running the following command:\n", "\n", "```sh\n", "jupyter kernelspec list\n", "```\n", "\n", "And then when we open JupyterLab, we can select the kernel we want to use. Whenever we open a new or existing notebook, we can select or change the kernel by clicking `Kernel -> Change kernel` in the menu bar. When you change the kernel, the notebook gets access to all the packages installed in the corresponding virtual environment.\n", "\n", "Now you are ready to use JupyterLab with your virtual environment! 
🎉\n", "\n", "\n", "### A convenient way to work with Jupyter notebooks: VSCode and other IDEs\n", "\n", "VS Code and other IDEs let you work with Jupyter notebooks more or less in the same way as JupyterLab (remember that Jupyter is just a language-agnostic interface for Python and other languages). VS Code also removes a lot of the tedious steps that you need to take with JupyterLab. For example, you don't need to register the virtual environment as a kernel by hand. VS Code lets you simply choose the virtual environment you want to use (it may prompt you to install `ipykernel` into that environment if it is missing).\n", "\n", "I strongly recommend using VS Code for this course and for data science in general. It is a great IDE with a powerful plugin ecosystem. \n", "\n", "You can find great tutorials on how to use VS Code for Jupyter notebooks. Here are some examples: https://www.youtube.com/results?search_query=vscode+jupyter+notebook " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Assignments\n", "\n", "**Q1: Create a virtual environment using your preferred method (e.g., `venv`, `conda`, or `poetry`). Install the following packages into your virtual environment. Create your environment file and submit it.**\n", "\n", "List of packages to install (you can install more if you want):\n", "- `numpy`\n", "- `pandas`\n", "- `matplotlib`\n", "- `networkx`\n", "- `scipy`\n", "- `jupyterlab`\n", "- `ipykernel`\n", "- `nbformat`\n", "\n", "The environment file can be the following:\n", "- If you use `venv`, submit the `requirements.txt` file created by `pip freeze`.\n", "- If you use `conda`, submit the `environment.yml` file created by `conda env export > environment.yml`.\n", "- If you use `poetry`, submit the `pyproject.toml` and `poetry.lock` files created by `poetry`.\n", "\n", "**Q2: Set up JupyterLab and `ipykernel` in your virtual environment. Create a notebook, choose the right kernel, create a cell and import the packages that you have installed into the virtual environment (see below). 
Make sure to run this cell and check whether they can be imported successfully. Submit this notebook as an HTML file and a notebook file.** \n", "\n", "Check [this document](https://github.com/yy/dviz-course/wiki/How-to-export-a-notebook-in-HTML) for instructions on exporting a notebook as an HTML file.\n", "\n", "```python\n", "import numpy as np\n", "import pandas as pd\n", "import matplotlib.pyplot as plt\n", "import networkx as nx\n", "import scipy as sp\n", "```" ] }, { "cell_type": "markdown", "metadata": {}, "source": [] } ], "metadata": { "anaconda-cloud": {}, "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.3" } }, "nbformat": 4, "nbformat_minor": 1 }