{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/fonnesbeck/Bios8366/blob/master/notebooks/Section0_1-IPython_and_Jupyter.ipynb)\n", "\n", "# IPython\n", "\n", "**IPython** (Interactive Python) is an enhanced Python shell which provides a more robust and productive development environment for users. There are several key features that set it apart from the standard Python shell.\n", "\n", "### History\n", "\n", "In IPython, all your inputs and outputs are saved. There are two variables named `In` and `Out` which are assigned as you work with your results. All outputs are saved automatically to variables of the form `_N`, where `N` is the prompt number, and inputs to `_iN`. This allows you to recover quickly the result of a prior computation by referring to its number even if you forgot to store it as a variable. " ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import numpy as np\n", "np.sin(4)**2" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "_1" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "_i1" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "_1 / 4." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Output is asynchronous\n", "\n", "All output is displayed asynchronously as it is generated in the Kernel. If you execute the next cell, you will see the output one piece at a time, not all at the end." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import time, sys\n", "for i in range(8):\n", " print(i)\n", " time.sleep(0.5)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Introspection\n", "\n", "If you want details regarding the properties and functionality of any Python objects currently loaded into IPython, you can use the `?` to reveal any details that are available:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "some_dict = {}\n", "some_dict?" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "If available, additional detail is provided with two question marks, including the source code of the object itself." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from numpy.linalg import cholesky\n", "cholesky??" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This syntax can also be used to search namespaces with wildcards (\\*)." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "%matplotlib inline\n", "import pylab as plt\n", "plt.*plot*?" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Tab completion\n", "\n", "Because IPython allows for introspection, it is able to afford the user the ability to tab-complete commands that have been partially typed. 
This is done by pressing the `Tab` key at any point during the process of typing a command.\n", "\n", "**Place your cursor after the partially-completed command below and press Tab:**" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "np.ar" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### System commands\n", "\n", "In IPython, you can type `ls` to see your files or `cd` to change directories, just like you would at a regular system prompt:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "ls ../data" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Virtually any system command can be accessed by prepending `!`, which passes any subsequent command directly to the OS." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "scrolled": true }, "outputs": [], "source": [ "!touch test.txt" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "You can even use Python variables in commands sent to the OS:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "file_type = 'csv'\n", "!ls ../data/*$file_type" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The output of a system command using the exclamation point syntax can be assigned to a Python variable." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "data_files = !ls ../data/microbiome/" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "data_files" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Jupyter Notebook\n", "\n", "Over time, the IPython project grew to include several components, including:\n", "\n", "* an interactive shell\n", "* a REPL protocol\n", "* a notebook document format\n", "* a notebook document conversion tool\n", "* a web-based notebook authoring tool\n", "* tools for building interactive UI (widgets)\n", "* interactive parallel Python\n", "\n", "As these components evolved, several grew to the point that they warranted projects of their own. For example, pieces like the notebook and protocol are not even specific to Python. As a result, the IPython team created Project Jupyter, which is the new home of language-agnostic projects that began as part of IPython, such as the notebook in which you are reading this text.\n", "\n", "The HTML notebook that is part of the Jupyter project supports **interactive data visualization** and easy high-performance **parallel computing**. 
\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import numpy as np\n", "import matplotlib.pyplot as plt\n", "\n", "plt.style.use('fivethirtyeight')\n", "\n", "def f(x):\n", "    # Cubic polynomial to plot\n", "    return (x-3)*(x-5)*(x-7)+85\n", "\n", "x = np.linspace(0, 10, 200)\n", "y = f(x)\n", "p = plt.plot(x, y)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The notebook lets you document your workflow using either HTML or Markdown, providing a complete and self-contained record of a computation that can be exported to various formats and shared.\n", "\n", "The Jupyter Notebook consists of three interacting components:\n", "\n", "* A notebook web application: An interactive web application for writing and running code and for authoring notebook documents.\n", "* Kernels: Separate processes started by the notebook web application that run notebook code and return output to the web application. Kernels also handle secondary features such as interactive widgets, tab completion and introspection.\n", "* Notebook documents: JSON documents that contain a representation of all content visible in the notebook web application, including inputs and outputs of the computations, narrative text, equations, images, and rich media representations of objects. They are stored on your filesystem with an `.ipynb` extension.\n", "\n", "The Notebook can be used by starting the Notebook server with the command:\n", "\n", "    $ jupyter notebook\n", " \n", "This opens a Jupyter notebook dashboard that acts as a home page for your Jupyter instance. It displays the notebooks and other files in your current directory.\n", "\n", "The notebook web application provides a rich computing environment for data science work. 
For example, you can embed images, videos, or entire websites into notebooks:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from IPython.display import HTML\n", "HTML(\"\")" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "slideshow": { "slide_type": "slide" } }, "outputs": [], "source": [ "from IPython.display import YouTubeVideo\n", "YouTubeVideo(\"GExKsQ-OU78\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The next-generation web interface for Jupyter notebooks is called **JupyterLab**. JupyterLab is closer to an integrated development environment (IDE) in its design, including things like custom components, terminals, and text editors in a customizable, panel-based interface. It is quickly becoming the default Jupyter front-end.\n", "\n", "JupyterLab can be invoked from the command line via:\n", "\n", "    $ jupyter lab\n", "\n", "Due to its extensibility, Jupyter can also be run from a variety of third-party apps, including the popular [Visual Studio Code (VSCode)](https://code.visualstudio.com/docs/datascience/jupyter-notebooks) editor." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Remote Code\n", "\n", "Use the `%load` magic to pull code from a remote URL into a cell." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "%load http://matplotlib.org/mpl_examples/shapes_and_collections/scatter_demo.py" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### MathJax Support\n", "\n", "MathJax is a JavaScript library that renders LaTeX math, such as $\\alpha$, so that equations can be embedded into HTML. For example, this markup:\n", "\n", "    \"\"\"$$ \\int_{a}^{b} f(x)\\, dx \\approx \\frac{1}{2} \\sum_{k=1}^{N} \\left( x_{k} - x_{k-1} \\right) \\left( f(x_{k}) + f(x_{k-1}) \\right). 
$$\"\"\"\n", " \n", "becomes this:\n", "\n", "$$\n", "\\int_{a}^{b} f(x)\\, dx \\approx \\frac{1}{2} \\sum_{k=1}^{N} \\left( x_{k} - x_{k-1} \\right) \\left( f(x_{k}) + f(x_{k-1}) \\right).\n", "$$" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## SymPy Support\n", "\n", "SymPy is a Python library for symbolic mathematics. It supports:\n", "\n", "* polynomials\n", "* calculus\n", "* solving equations\n", "* discrete math\n", "* matrices" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Never import like this!\n", "from sympy import *\n", "import warnings\n", "warnings.filterwarnings('ignore')\n", "\n", "init_printing()\n", "x, y = symbols(\"x y\")" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "eq = ((x+y)**2 * (x+1))\n", "eq" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "expand(eq)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "(1/cos(x)).series(x, 0, 6)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "limit((sin(x)-x)/x**3, x, 0)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "diff(cos(x**2)**2 / (1+x), x)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Magic functions\n", "\n", "Jupyter has a set of predefined ‘magic functions’ that you can call with a command line style syntax. 
These include:\n", "\n", "* `%run`\n", "* `%edit`\n", "* `%debug`\n", "* `%timeit`\n", "* `%paste`\n", "* `%load_ext`\n", "\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "%lsmagic" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The `timeit` magic times the execution of code and exists in both line (`%timeit`) and cell (`%%timeit`) form:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "%timeit np.linalg.eigvals(np.random.rand(100,100))" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "%%timeit a = np.random.rand(100, 100)\n", "np.linalg.eigvals(a)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "IPython also creates aliases for a few common interpreters, such as bash, ruby, perl, etc.\n", "\n", "These are all equivalent to `%%script <name>`" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "%%ruby\n", "puts \"Hello from Ruby #{RUBY_VERSION}\"" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "%%bash\n", "echo \"hello from $BASH\"" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The `rpy2` package provides an IPython extension (formerly `rmagic`) that contains some magic functions for working with R. This extension can be loaded using the `%load_ext` magic as follows:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "%load_ext rpy2.ipython" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "If the above generates an error, it is likely that you do not have the `rpy2` module installed. 
You can install this now via:\n", "\n", "    !pip install rpy2" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "x,y = np.arange(10), np.random.normal(size=10)\n", "%R print(lm(rnorm(10)~rnorm(10)))" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "%%R -i x,y -o XYcoef\n", "lm.fit <- lm(y~x)\n", "par(mfrow=c(2,2))\n", "print(summary(lm.fit))\n", "plot(lm.fit)\n", "XYcoef <- coef(lm.fit)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "XYcoef" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### LaTeX\n", "\n", "In addition to MathJax support, you may declare a LaTeX cell using the `%%latex` cell magic:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "%%latex\n", "\\begin{align}\n", "\\nabla \\times \\vec{\\mathbf{B}} -\\, \\frac1c\\, \\frac{\\partial\\vec{\\mathbf{E}}}{\\partial t} & = \\frac{4\\pi}{c}\\vec{\\mathbf{j}} \\\\\n", "\\nabla \\cdot \\vec{\\mathbf{E}} & = 4 \\pi \\rho \\\\\n", "\\nabla \\times \\vec{\\mathbf{E}}\\, +\\, \\frac1c\\, \\frac{\\partial\\vec{\\mathbf{B}}}{\\partial t} & = \\vec{\\mathbf{0}} \\\\\n", "\\nabla \\cdot \\vec{\\mathbf{B}} & = 0\n", "\\end{align}" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## JavaScript\n", "\n", "Jupyter also enables objects to declare a JavaScript representation. At first, this may seem odd as output is inherently visual and JavaScript is a programming language. However, this opens the door for rich output that leverages the full power of JavaScript and associated libraries such as D3 for output." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "%%javascript\n", "\n", "alert(\"Hello world!\");" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Exporting and Converting Notebooks\n", "\n", "In Jupyter, one can convert an `.ipynb` notebook document file into various static formats via the `nbconvert` tool. Currently, nbconvert is a command line tool, run as a script using Jupyter." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "!jupyter nbconvert --to html Section0_1-IPython_and_Jupyter.ipynb" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Currently, `nbconvert` supports HTML (default), LaTeX, Markdown, reStructuredText, Python and HTML5 slides for presentations. Some types can be post-processed, such as LaTeX to PDF (this requires [Pandoc](http://johnmacfarlane.net/pandoc/) to be installed, however)." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "!jupyter nbconvert --to pdf Section2_1-Introduction-to-Pandas.ipynb" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "A very useful online service is the [Jupyter Notebook Viewer](https://nbviewer.org/) which allows you to display your notebook as a static HTML page, which is useful for sharing with others:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "%%html\n", "" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Also, GitHub supports the [rendering of Jupyter Notebooks](https://github.com/fonnesbeck/Bios8366/blob/master/notebooks/Section1_2-Programming-with-Python.ipynb) stored on its repositories." 
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Reproducible Research\n", "\n", "> reproducing conclusions from a single experiment based on the measurements from that experiment\n", "\n", "The most basic form of reproducibility is a complete description of the data and associated analyses (including code!) so the results can be *exactly* reproduced by others.\n", "\n", "Reproducing calculations can be onerous, even with one's own work!\n", "\n", "Scientific data are becoming larger and more complex, making simple descriptions inadequate for reproducibility. As a result, most modern research is irreproducible without tremendous effort.\n", "\n", "**Reproducible research is not yet part of the culture of science in general, or scientific computing in particular.**" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Scientific Computing Workflow\n", "\n", "There are a number of steps to scientific endeavors that involve computing:\n", "\n", "![workflow](images/workflow.png)\n", "\n", "\n", "Many of the standard tools impose barriers between one or more of these steps. This can make it difficult to iterate on or reproduce work.\n", "\n", "The Jupyter notebook eliminates or reduces these barriers to reproducibility." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "---\n", "## Links and References" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "[IPython Notebook Viewer](http://nbviewer.ipython.org) Displays static HTML versions of notebooks, and includes a gallery of notebook examples." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "[NotebookCloud](https://notebookcloud.appspot.com) A service that allows you to launch and control IPython Notebook servers on Amazon EC2 from your browser." 
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "[A Reference-Free Algorithm for Computational Normalization of Shotgun Sequencing Data](http://ged.msu.edu/papers/2012-diginorm/) A landmark example of reproducible research in genomics: Git repo, IPython notebook, data and scripts.\n", "\n", "Jacques Ravel and K Eric Wommack. 2014. [All Hail Reproducibility in Microbiome Research](http://www.microbiomejournal.com/content/pdf/2049-2618-2-8.pdf). Microbiome, 2:8.\n", "\n", "Benjamin Ragan-Kelley et al. 2013. [Collaborative cloud-enabled tools allow rapid, reproducible biological insights](http://www.nature.com/ismej/journal/v7/n3/full/ismej2012123a.html). The ISME Journal, 7, 461–464; doi:10.1038/ismej.2012.123." ] } ], "metadata": { "anaconda-cloud": {}, "interpreter": { "hash": "df328b9eace1fcfab6504863805cd8f08ec69a21df2d51a0bb47c89d219bdc7e" }, "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.9.9" }, "nav_menu": {}, "toc": { "navigate_menu": true, "number_sections": false, "sideBar": true, "threshold": 6, "toc_cell": false, "toc_section_display": "block", "toc_window_display": false } }, "nbformat": 4, "nbformat_minor": 4 }