{
 "metadata": {
  "name": "rmagic_extension"
 },
 "nbformat": 3,
 "nbformat_minor": 0,
 "worksheets": [
  {
   "cells": [
    {
     "cell_type": "heading",
     "level": 1,
     "metadata": {},
     "source": [
      "Rmagic Functions Extension"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "%pylab inline"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "heading",
     "level": 2,
     "metadata": {},
     "source": [
      "Line magics"
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "* IPython has an `rmagic` extension that contains a some magic functions for working with R via rpy2. \n",
      "* This extension can be loaded using the `%load_ext` magic as follows:"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": true,
     "input": [
      "%load_ext rmagic "
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "* We can go from numpy arrays to compute some statistics in R and back\n",
      "* Let's suppose we just want to fit a simple linear model to a scatterplot."
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "import numpy as np\n",
      "import pylab\n",
      "X = np.array([0,1,2,3,4])\n",
      "Y = np.array([3,5,4,6,7])\n",
      "pylab.scatter(X, Y)"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "* We can accomplish this by first pushing variables to R\n",
      "* Then fitting a model\n",
      "* And finally returning the results\n",
      "* The line magic `%Rpush` copies its arguments to variables of the same name in rpy2\n",
      "* The `%R` line magic evaluates the string in rpy2 and returns the result"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "%Rpush X Y\n",
      "%R lm(Y~X)$coef"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "We can check that this is correct fairly easily:"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "Xr = X - X.mean(); Yr = Y - Y.mean()\n",
      "slope = (Xr*Yr).sum() / (Xr**2).sum()\n",
      "intercept = Y.mean() - X.mean() * slope\n",
      "(intercept, slope)"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "It is also possible to return more than one value with %R."
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "%R resid(lm(Y~X)); coef(lm(Y~X))"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "* One can also easily capture the results of %R into python objects. \n",
      "* Like R, the return value of this multiline expression is the final value, which is the `coef(lm(X~Y))`. \n",
      "* To pull other variables from R, there is one more set of magic functions"
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "* `%Rpull` and `%Rget` \n",
      "* Both are useful to retrieve variables in the rpy2 namespace\n",
      "* The main difference is that one returns the value (%Rget), while the other pulls it to the user's namespace.\n",
      "* Imagine we've stored the results of some calculation in the variable `a` in rpy2's namespace. \n",
      "* By using the %R magic, we can obtain these results and store them in b. \n",
      "* We can also pull them directly to the namespace with %Rpull. \n",
      "* Note that they are both views on the same data."
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "b = %R a=resid(lm(Y~X))\n",
      "%Rpull a\n",
      "print a\n",
      "assert id(b.data) == id(a.data)\n",
      "%R -o a"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "%Rpull is equivalent to calling %R with just -o\n"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "%R d=resid(lm(Y~X)); e=coef(lm(Y~X))\n",
      "%R -o d -o e\n",
      "%Rpull e\n",
      "print d\n",
      "print e\n",
      "import numpy as np\n",
      "np.testing.assert_almost_equal(d, a)"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "On the other hand %Rpush is equivalent to calling %R with just -i and no trailing code."
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "A = np.arange(20)\n",
      "%R -i A\n",
      "%R mean(A)"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "The magic %Rget retrieves one variable from R."
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "%Rget A"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "heading",
     "level": 2,
     "metadata": {},
     "source": [
      "Plotting and capturing output"
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "* R's console (i.e. its stdout() connection) is captured by ipython,\n",
      "* So are any plots which are published as PNG files\n",
      "* As a call to %R may produce a return value (see above), what happens to a magic like the one below?\n",
      "* The R code specifies that something is published to the notebook. \n",
      "* If anything is published to the notebook, that call to %R returns None."
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "v1 = %R plot(X,Y); print(summary(lm(Y~X))); vv=mean(X)*mean(Y)\n",
      "print 'v1 is:', v1\n",
      "v2 = %R mean(X)*mean(Y)\n",
      "print 'v2 is:', v2"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "heading",
     "level": 2,
     "metadata": {},
     "source": [
      "What value is returned from %R?"
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "* Some calls have no particularly interesting return value, the magic `%R` will not return anything in this case. \n",
      "* The return value in rpy2 is actually NULL so `%R` returns None."
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "v = %R plot(X,Y)\n",
      "assert v == None"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "Also, if the return value of a call to %R (inline mode) has just been printed to the console, then its value is also not returned."
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "v = %R print(X)\n",
      "assert v == None"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "* If the last value did not print anything to console, the value is returned"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "v = %R print(summary(X)); X\n",
      "print 'v:', v"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "* The return value can be suppressed by a trailing ';' or an -n argument"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": true,
     "input": [
      "%R -n X"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "code",
     "collapsed": true,
     "input": [
      "%R X; "
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "heading",
     "level": 2,
     "metadata": {},
     "source": [
      "Cell level magic"
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "* What if we want to run several lines of R code\n",
      "* This is the cell-level magic.\n",
      "* For the cell level magic, inputs can be passed via the -i or --inputs argument in the line\n",
      "* These variables are copied from the shell namespace to R's namespace using `rpy2.robjects.r.assign` \n",
      "* It would be nice not to have to copy these into R: rnumpy ( http://bitbucket.org/njs/rnumpy/wiki/API ) has done some work to limit or at least make transparent the number of copies of an array. \n",
      "* Arrays can be output from R via the -o or --outputs argument in the line. All other arguments are sent to R's png function, which is the graphics device used to create the plots.\n",
      "* We can redo the above calculations in one ipython cell. \n",
      "* We might also want to add a summary or perhaps the standard plotting diagnostics of the lm."
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "%%R -i X,Y -o XYcoef\n",
      "XYlm = lm(Y~X)\n",
      "XYcoef = coef(XYlm)\n",
      "print(summary(XYlm))\n",
      "par(mfrow=c(2,2))\n",
      "plot(XYlm)"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "heading",
     "level": 2,
     "metadata": {},
     "source": [
      "Passing data back and forth"
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "* Currently (Summer 2012), data is passed through `RMagics.pyconverter` when going from python to R and `RMagics.Rconverter` when going from R to python. \n",
      "* These currently default to numpy.ndarray. Future work will involve writing better converters, most likely involving integration with http://pandas.sourceforge.net.\n",
      "* Passing ndarrays into R requires a copy, though once an object is returned to python, this object is NOT copied, and it is possible to change its values."
     ]
    },
    {
     "cell_type": "code",
     "collapsed": true,
     "input": [
      "seq1 = np.arange(10)"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "%%R -i seq1 -o seq2\n",
      "seq2 = rep(seq1, 2)\n",
      "print(seq2)"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "seq2[::2] = 0\n",
      "seq2"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "%%R\n",
      "print(seq2)"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "* Once the array data has been passed to R, modifring its contents does not modify R's copy of the data."
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "seq1[0] = 200\n",
      "%R print(seq1)"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "* If we pass data as both input and output, then the value of \"data\" in the user's namespace will be overwritten \n",
      "* the new array will be a view of the data in R's copy."
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "print seq1\n",
      "%R -i seq1 -o seq1\n",
      "print seq1\n",
      "seq1[0] = 200\n",
      "%R print(seq1)\n",
      "seq1_view = %R seq1\n",
      "assert(id(seq1_view.data) == id(seq1.data))"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "heading",
     "level": 2,
     "metadata": {},
     "source": [
      "Exception handling\n"
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "Exceptions are handled by passing back rpy2's exception and the line that triggered it."
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "try:\n",
      "    %R -n nosuchvar\n",
      "except Exception as e:\n",
      "    print e"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "heading",
     "level": 2,
     "metadata": {},
     "source": [
      "Structured arrays and data frames\n"
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "* In R, data frames play an important role as they allow array-like objects of mixed type with column names (and row names). \n",
      "* In numpy, the closest analogy is a structured array with named fields. \n",
      "* In future work, it would be nice to use pandas to return full-fledged DataFrames from rpy2. \n",
      "* In the mean time, structured arrays can be passed back and forth with the -d flag to %R, %Rpull, and %Rget"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": true,
     "input": [
      "datapy= np.array([(1, 2.9, 'a'), (2, 3.5, 'b'), (3, 2.1, 'c')],\n",
      "          dtype=[('x', '<i4'), ('y', '<f8'), ('z', '|S1')])"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "code",
     "collapsed": true,
     "input": [
      "%%R -i datapy -d datar\n",
      "datar = datapy"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "datar"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "%R datar2 = datapy\n",
      "%Rpull -d datar2\n",
      "datar2"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "%Rget -d datar2"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "For arrays without names, the -d argument has no effect because the R object has no colnames or names."
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "Z = np.arange(6)\n",
      "%R -i Z\n",
      "%Rget -d Z"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "* For mixed-type data frames in R, if the -d flag is not used, then an array of a single type is returned \n",
      "* Its value is transposed. \n",
      "* This would be nice to fix, but it seems something that should be fixed at the rpy2 level. See [here](https://bitbucket.org/lgautier/rpy2/issue/44/numpyrecarray-as-dataframe)"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "%Rget datar2"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    }
   ],
   "metadata": {}
  }
 ]
}