{ "metadata": { "name": "rmagic_extension" }, "nbformat": 3, "nbformat_minor": 0, "worksheets": [ { "cells": [ { "cell_type": "heading", "level": 1, "metadata": {}, "source": [ "Rmagic Functions Extension" ] }, { "cell_type": "code", "collapsed": false, "input": [ "%pylab inline" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "heading", "level": 2, "metadata": {}, "source": [ "Line magics" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "* IPython has an `rmagic` extension that contains a some magic functions for working with R via rpy2. \n", "* This extension can be loaded using the `%load_ext` magic as follows:" ] }, { "cell_type": "code", "collapsed": true, "input": [ "%load_ext rmagic " ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "* We can go from numpy arrays to compute some statistics in R and back\n", "* Let's suppose we just want to fit a simple linear model to a scatterplot." ] }, { "cell_type": "code", "collapsed": false, "input": [ "import numpy as np\n", "import pylab\n", "X = np.array([0,1,2,3,4])\n", "Y = np.array([3,5,4,6,7])\n", "pylab.scatter(X, Y)" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "* We can accomplish this by first pushing variables to R\n", "* Then fitting a model\n", "* And finally returning the results\n", "* The line magic `%Rpush` copies its arguments to variables of the same name in rpy2\n", "* The `%R` line magic evaluates the string in rpy2 and returns the result" ] }, { "cell_type": "code", "collapsed": false, "input": [ "%Rpush X Y\n", "%R lm(Y~X)$coef" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can check that this is correct fairly easily:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "Xr = X - X.mean(); Yr = Y - Y.mean()\n", "slope = (Xr*Yr).sum() / (Xr**2).sum()\n", "intercept = Y.mean() - X.mean() * slope\n", "(intercept, slope)" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "It is also possible to return more than one value with %R." ] }, { "cell_type": "code", "collapsed": false, "input": [ "%R resid(lm(Y~X)); coef(lm(Y~X))" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "* One can also easily capture the results of %R into python objects. \n", "* Like R, the return value of this multiline expression is the final value, which is the `coef(lm(X~Y))`. \n", "* To pull other variables from R, there is one more set of magic functions" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "* `%Rpull` and `%Rget` \n", "* Both are useful to retrieve variables in the rpy2 namespace\n", "* The main difference is that one returns the value (%Rget), while the other pulls it to the user's namespace.\n", "* Imagine we've stored the results of some calculation in the variable `a` in rpy2's namespace. \n", "* By using the %R magic, we can obtain these results and store them in b. \n", "* We can also pull them directly to the namespace with %Rpull. \n", "* Note that they are both views on the same data." ] }, { "cell_type": "code", "collapsed": false, "input": [ "b = %R a=resid(lm(Y~X))\n", "%Rpull a\n", "print a\n", "assert id(b.data) == id(a.data)\n", "%R -o a" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "%Rpull is equivalent to calling %R with just -o\n" ] }, { "cell_type": "code", "collapsed": false, "input": [ "%R d=resid(lm(Y~X)); e=coef(lm(Y~X))\n", "%R -o d -o e\n", "%Rpull e\n", "print d\n", "print e\n", "import numpy as np\n", "np.testing.assert_almost_equal(d, a)" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "On the other hand %Rpush is equivalent to calling %R with just -i and no trailing code." ] }, { "cell_type": "code", "collapsed": false, "input": [ "A = np.arange(20)\n", "%R -i A\n", "%R mean(A)" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The magic %Rget retrieves one variable from R." ] }, { "cell_type": "code", "collapsed": false, "input": [ "%Rget A" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "heading", "level": 2, "metadata": {}, "source": [ "Plotting and capturing output" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "* R's console (i.e. its stdout() connection) is captured by ipython,\n", "* So are any plots which are published as PNG files\n", "* As a call to %R may produce a return value (see above), what happens to a magic like the one below?\n", "* The R code specifies that something is published to the notebook. \n", "* If anything is published to the notebook, that call to %R returns None." ] }, { "cell_type": "code", "collapsed": false, "input": [ "v1 = %R plot(X,Y); print(summary(lm(Y~X))); vv=mean(X)*mean(Y)\n", "print 'v1 is:', v1\n", "v2 = %R mean(X)*mean(Y)\n", "print 'v2 is:', v2" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "heading", "level": 2, "metadata": {}, "source": [ "What value is returned from %R?" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "* Some calls have no particularly interesting return value, the magic `%R` will not return anything in this case. \n", "* The return value in rpy2 is actually NULL so `%R` returns None." ] }, { "cell_type": "code", "collapsed": false, "input": [ "v = %R plot(X,Y)\n", "assert v == None" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Also, if the return value of a call to %R (inline mode) has just been printed to the console, then its value is also not returned." ] }, { "cell_type": "code", "collapsed": false, "input": [ "v = %R print(X)\n", "assert v == None" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "* If the last value did not print anything to console, the value is returned" ] }, { "cell_type": "code", "collapsed": false, "input": [ "v = %R print(summary(X)); X\n", "print 'v:', v" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "* The return value can be suppressed by a trailing ';' or an -n argument" ] }, { "cell_type": "code", "collapsed": true, "input": [ "%R -n X" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "code", "collapsed": true, "input": [ "%R X; " ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "heading", "level": 2, "metadata": {}, "source": [ "Cell level magic" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "* What if we want to run several lines of R code\n", "* This is the cell-level magic.\n", "* For the cell level magic, inputs can be passed via the -i or --inputs argument in the line\n", "* These variables are copied from the shell namespace to R's namespace using `rpy2.robjects.r.assign` \n", "* It would be nice not to have to copy these into R: rnumpy ( http://bitbucket.org/njs/rnumpy/wiki/API ) has done some work to limit or at least make transparent the number of copies of an array. \n", "* Arrays can be output from R via the -o or --outputs argument in the line. All other arguments are sent to R's png function, which is the graphics device used to create the plots.\n", "* We can redo the above calculations in one ipython cell. \n", "* We might also want to add a summary or perhaps the standard plotting diagnostics of the lm." ] }, { "cell_type": "code", "collapsed": false, "input": [ "%%R -i X,Y -o XYcoef\n", "XYlm = lm(Y~X)\n", "XYcoef = coef(XYlm)\n", "print(summary(XYlm))\n", "par(mfrow=c(2,2))\n", "plot(XYlm)" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "heading", "level": 2, "metadata": {}, "source": [ "Passing data back and forth" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "* Currently (Summer 2012), data is passed through `RMagics.pyconverter` when going from python to R and `RMagics.Rconverter` when going from R to python. \n", "* These currently default to numpy.ndarray. Future work will involve writing better converters, most likely involving integration with http://pandas.sourceforge.net.\n", "* Passing ndarrays into R requires a copy, though once an object is returned to python, this object is NOT copied, and it is possible to change its values." ] }, { "cell_type": "code", "collapsed": true, "input": [ "seq1 = np.arange(10)" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "code", "collapsed": false, "input": [ "%%R -i seq1 -o seq2\n", "seq2 = rep(seq1, 2)\n", "print(seq2)" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "code", "collapsed": false, "input": [ "seq2[::2] = 0\n", "seq2" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "code", "collapsed": false, "input": [ "%%R\n", "print(seq2)" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "* Once the array data has been passed to R, modifring its contents does not modify R's copy of the data." ] }, { "cell_type": "code", "collapsed": false, "input": [ "seq1[0] = 200\n", "%R print(seq1)" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "* If we pass data as both input and output, then the value of \"data\" in the user's namespace will be overwritten \n", "* the new array will be a view of the data in R's copy." ] }, { "cell_type": "code", "collapsed": false, "input": [ "print seq1\n", "%R -i seq1 -o seq1\n", "print seq1\n", "seq1[0] = 200\n", "%R print(seq1)\n", "seq1_view = %R seq1\n", "assert(id(seq1_view.data) == id(seq1.data))" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "heading", "level": 2, "metadata": {}, "source": [ "Exception handling\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Exceptions are handled by passing back rpy2's exception and the line that triggered it." ] }, { "cell_type": "code", "collapsed": false, "input": [ "try:\n", " %R -n nosuchvar\n", "except Exception as e:\n", " print e" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "heading", "level": 2, "metadata": {}, "source": [ "Structured arrays and data frames\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "* In R, data frames play an important role as they allow array-like objects of mixed type with column names (and row names). \n", "* In numpy, the closest analogy is a structured array with named fields. \n", "* In future work, it would be nice to use pandas to return full-fledged DataFrames from rpy2. \n", "* In the mean time, structured arrays can be passed back and forth with the -d flag to %R, %Rpull, and %Rget" ] }, { "cell_type": "code", "collapsed": true, "input": [ "datapy= np.array([(1, 2.9, 'a'), (2, 3.5, 'b'), (3, 2.1, 'c')],\n", " dtype=[('x', '