{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "> This is one of the 100 recipes of the [IPython Cookbook](http://ipython-books.github.io/), the definitive guide to high-performance scientific computing and data science in Python.\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "word_id": "4818_07_r"
   },
   "source": [
    "# 7.8. Analyzing data with R in the IPython notebook"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**UPDATE (2014-09-29)**: in newer versions of rpy2, the IPython extension with the R magic is `rpy2.ipython` and not `rmagic` as stated in the book."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "There are three steps to use R from IPython. First, install R and rpy2 (R to Python interface). Of course, you only need to do this step once. Then, to use R in an IPython session, you need to load the IPython R extension."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "1. Download and install R for your operating system. (http://cran.r-project.org/mirrors.html)\n",
    "2. Download and install [rpy2](http://rpy.sourceforge.net/rpy2.html). Windows users can try to download an *experimental* installer on Chris Gohlke's webpage. (http://www.lfd.uci.edu/~gohlke/pythonlibs/#rpy2)\n",
    "3. Then, to be able to execute R code in an IPython notebook, execute `%load_ext rpy2.ipython` first."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "style": "tip"
   },
   "source": [
    "rpy2 does not appear to work well on Windows. We recommend using Linux or Mac OS X."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To install R and rpy2 on Ubuntu, run the following commands:\n",
    "\n",
    "    sudo apt-get install r-base-dev\n",
    "    sudo apt-get install python-rpy2"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Here, we will use the following workflow. First, we load data from Python. Then, we use R to design and fit a model, and to make some plots in the IPython notebook. We could also load data from R, or design and fit a statistical model with Python's statsmodels package, etc. In particular, the analysis we do here could be done entirely in Python, without resorting to the R language. This recipe just shows the basics of R and illustrates how R and Python can play together within an IPython session."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "1. Let's load the *longley* dataset with the statsmodels package. This dataset contains a few economic indicators in the US from 1947 to 1962. We also load the IPython R extension."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "import statsmodels.datasets as sd"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "data = sd.longley.load_pandas()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "%load_ext rpy2.ipython"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "2. We define `x` and `y` as the exogeneous (independent) and endogenous (dependent) variables, respectively. The endogenous variable quantifies the total employment in the country."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "data.endog_name, data.exog_name"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "y, x = data.endog, data.exog"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "3. For convenience, we add the endogenous variable to the `x` DataFrame."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "x['TOTEMP'] = y"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false,
    "strip_output": [
     4,
     3
    ]
   },
   "outputs": [],
   "source": [
    "x"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "4. We will make a simple plot in R. First, we need to pass Python variables to R. We can use the `%R -i var1,var2` magic. Then, we can call R's `plot` command."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "gnp = x['GNP']\n",
    "totemp = x['TOTEMP']"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "%R"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "%R -i totemp,gnp plot(gnp, totemp)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "5. Now that the data has been passed to R, we can fit a linear model to the data. The `lm` function lets us perform a linear regression. Here, we want to express `totemp` (total employement) as a function of the country's GNP."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "%%R\n",
    "fit <- lm(totemp ~ gnp);  # Least-squares regression\n",
    "print(fit$coefficients)  # Display the coefficients of the fit.\n",
    "plot(gnp, totemp)  # Plot the data points.\n",
    "abline(fit)  # And plot the linear regression."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "> You'll find all the explanations, figures, references, and much more in the book (to be released later this summer).\n",
    "\n",
    "> [IPython Cookbook](http://ipython-books.github.io/), by [Cyrille Rossant](http://cyrille.rossant.net), Packt Publishing, 2014 (500 pages)."
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.4.2"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 0
}