{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "> This is one of the 100 recipes of the [IPython Cookbook](http://ipython-books.github.io/), the definitive guide to high-performance scientific computing and data science in Python.\n" ] }, { "cell_type": "markdown", "metadata": { "word_id": "4818_07_r" }, "source": [ "# 7.8. Analyzing data with R in the IPython notebook" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**UPDATE (2014-09-29)**: in newer versions of rpy2, the IPython extension with the R magic is `rpy2.ipython` and not `rmagic` as stated in the book." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "There are three steps to use R from IPython. First, install R and rpy2 (R to Python interface). Of course, you only need to do this step once. Then, to use R in an IPython session, you need to load the IPython R extension." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "1. Download and install R for your operating system. (http://cran.r-project.org/mirrors.html)\n", "2. Download and install [rpy2](http://rpy.sourceforge.net/rpy2.html). Windows users can try to download an *experimental* installer on Chris Gohlke's webpage. (http://www.lfd.uci.edu/~gohlke/pythonlibs/#rpy2)\n", "3. Then, to be able to execute R code in an IPython notebook, execute `%load_ext rpy2.ipython` first." ] }, { "cell_type": "markdown", "metadata": { "style": "tip" }, "source": [ "rpy2 does not appear to work well on Windows. We recommend using Linux or Mac OS X." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To install R and rpy2 on Ubuntu, run the following commands:\n", "\n", " sudo apt-get install r-base-dev\n", " sudo apt-get install python-rpy2" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Here, we will use the following workflow. First, we load data from Python. Then, we use R to design and fit a model, and to make some plots in the IPython notebook. 
We could also load data from R, or design and fit a statistical model with Python's statsmodels package, etc. In particular, the analysis we do here could be done entirely in Python, without resorting to the R language. This recipe just shows the basics of R and illustrates how R and Python can play together within an IPython session." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "1. Let's load the *longley* dataset with the statsmodels package. This dataset contains a few economic indicators in the US from 1947 to 1962. We also load the IPython R extension." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "import statsmodels.datasets as sd" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "data = sd.longley.load_pandas()" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "%load_ext rpy2.ipython" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "2. We define `x` and `y` as the exogenous (independent) and endogenous (dependent) variables, respectively. The endogenous variable quantifies the total employment in the country." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "data.endog_name, data.exog_name" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "y, x = data.endog, data.exog" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "3. For convenience, we add the endogenous variable to the `x` DataFrame." 
] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "x['TOTEMP'] = y" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false, "strip_output": [ 4, 3 ] }, "outputs": [], "source": [ "x" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "4. We will make a simple plot in R. First, we need to pass Python variables to R. We can use the `%R -i var1,var2` magic. Then, we can call R's `plot` command." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "gnp = x['GNP']\n", "totemp = x['TOTEMP']" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "%R" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "%R -i totemp,gnp plot(gnp, totemp)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "5. Now that the data has been passed to R, we can fit a linear model to the data. The `lm` function lets us perform a linear regression. Here, we want to express `totemp` (total employment) as a function of the country's GNP." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "%%R\n", "fit <- lm(totemp ~ gnp); # Least-squares regression\n", "print(fit$coefficients) # Display the coefficients of the fit.\n", "plot(gnp, totemp) # Plot the data points.\n", "abline(fit) # And plot the linear regression." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "> You'll find all the explanations, figures, references, and much more in the book (to be released later this summer).\n", "\n", "> [IPython Cookbook](http://ipython-books.github.io/), by [Cyrille Rossant](http://cyrille.rossant.net), Packt Publishing, 2014 (500 pages)." 
] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.4.2" } }, "nbformat": 4, "nbformat_minor": 0 }