{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Reading saved R data frames in Julia\n", "\n", "One thing that is done very well in the [R](http://R-project.org) language is saving \"data sets\" compactly while preserving metadata.\n", "\n", "The basic R structure corresponding to a data set in, say, SPSS or SAS, a table in an SQL database, or a sheet in a spreadsheet is the [data.frame](https://stat.ethz.ch/R-manual/R-devel/.../data.frame.html).\n", "\n", "Recently [Hadley Wickham](https://twitter.com/hadleywickham) tweeted about building an R package containing the injury data from [NEISS](http://www.cpsc.gov/en/Research--Statistics/NEISS-Injury-Data/) and [Jenny Bryan](https://twitter.com/JennyBryan) asked about CSV files. This prompted a discussion about saving data frames in the [.rds](https://stat.ethz.ch/R-manual/R-devel/library/base/html/readRDS.html) or [.rda/.RData](https://stat.ethz.ch/R-manual/R-devel/library/base/html/save.html) formats, as opposed to a quasi-format like a CSV file. (Die spreadsheets! 
Die!)\n", "\n", "I completely agree that `.rds` or `.rda` is the way to go, which is why I wrote the `read_rda` function for the [Julia](http://julialang.org) [DataFrames](https://github.com/JuliaStats/DataFrames.jl) package and later, with others, wrote the [RCall](https://github.com/JuliaStats/RCall.jl) package.\n", "\n", "To successfully install the `RCall` package you must have a version of R accessible but, as R is an open source project, that should not be an impediment.\n", "\n", "To obtain the data from the [neiss](https://github.com/hadley/neiss) package for R as a Julia dataframe, you install the R package as described at that repository then use " ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "(2332957,18)" ] }, "execution_count": 1, "metadata": {}, "output_type": "execute_result" } ], "source": [ "using RCall\n", "population = rcopy(\"neiss::population\");\n", "products = rcopy(\"neiss::products\");\n", "injuries = rcopy(\"neiss::injuries\");\n", "size(injuries)" ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "DataFrames.DataFrame 2332957 observations of 18 variables\n", " case_num: DataArrays.DataArray{Int32,1}(2332957) Int32[90101432,90101434,90101435,90101436]\n", " trmt_date: DataArrays.DataArray{Float64,1}(2332957) [14245.0,14245.0,14245.0,14245.0]\n", " psu: DataArrays.DataArray{Float64,1}(2332957) [61.0,61.0,61.0,61.0]\n", " weight: DataArrays.DataArray{Float64,1}(2332957) [15.3491,15.3491,15.3491,15.3491]\n", " stratum: DataArrays.DataArray{ASCIIString,1}(2332957) ASCIIString[\"V\",\"V\",\"V\",\"V\"]\n", " age: DataArrays.DataArray{Float64,1}(2332957) [5.0,51.0,2.0,20.0]\n", " sex: DataArrays.DataArray{ASCIIString,1}(2332957) ASCIIString[\"Male\",\"Male\",\"Female\",\"Male\"]\n", " race: DataArrays.DataArray{ASCIIString,1}(2332957) ASCIIString[\"Other / Mixed 
Race\",\"White\",\"White\",\"White\"]\n", " race_other: DataArrays.DataArray{ASCIIString,1}(2332957) ASCIIString[\"hispanic\",NA,NA,NA]\n", " diag: DataArrays.DataArray{ASCIIString,1}(2332957) ASCIIString[\"Strain, Sprain\",\"Contusion Or Abrasion\",\"Laceration\",\"Contusion Or Abrasion\"]\n", " diag_other: DataArrays.DataArray{ASCIIString,1}(2332957) ASCIIString[NA,NA,NA,NA]\n", " body_part: DataArrays.DataArray{ASCIIString,1}(2332957) ASCIIString[\"Neck\",\"Eyeball\",\"Face\",\"Toe\"]\n", " disposition: DataArrays.DataArray{ASCIIString,1}(2332957) ASCIIString[\"Released\",\"Released\",\"Released\",\"Released\"]\n", " location: DataArrays.DataArray{ASCIIString,1}(2332957) ASCIIString[\"Home\",\"Home\",\"Home\",\"Home\"]\n", " fmv: DataArrays.DataArray{ASCIIString,1}(2332957) ASCIIString[\"No fire/flame/smoke\",\"No fire/flame/smoke\",\"No fire/flame/smoke\",\"No fire/flame/smoke\"]\n", " prod1: DataArrays.DataArray{Float64,1}(2332957) [1807.0,899.0,4057.0,1884.0]\n", " prod2: DataArrays.DataArray{Float64,1}(2332957) [NA,NA,NA,NA]\n", " narrative: DataArrays.DataArray{ASCIIString,1}(2332957) ASCIIString[\"5 YOM ROLLING ON FLOOR DOING A SOMERSAULT AND SUSTAINED A CERVICAL STRA IN\",\"51 YOM C/O PAIN AND IRRITATION TO RIGHT EYE, HAD BEEN GRINDING METAL AT HOME AND POSSIBLY THE CAUSE, FOUND TO HAVE METAL IN EYE, CORNEAL ABRAS\",\"2 YOF WAS RUNNING THROUGH HOUSE AND FELL INTO CORNER OF TABLE SUSTAININ G A LACERATION TO FACE NEAR INSIDE CORNER OF RIGHT EYE ALONGSIDE NOSE\",\"20 YOM PUNCHED AND KICKED A WALL D/T DRINKING TOO MUCH LAST NIGHT, SUST AINED CONTUSIONS AND ABRASIONS TO RIGHT MIDDLE TOE, RIGHT HAND\"]\n" ] } ], "source": [ "dump(injuries) # sort of like R's str function but not as polished" ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "18-element Array{Symbol,1}:\n", " :case_num \n", " :trmt_date \n", " :psu \n", " :weight \n", " :stratum \n", " :age \n", " :sex \n", " :race \n", 
" :race_other \n", " :diag \n", " :diag_other \n", " :body_part \n", " :disposition\n", " :location \n", " :fmv \n", " :prod1 \n", " :prod2 \n", " :narrative " ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "names(injuries)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "That's it.\n", "\n", "If you want to be a little more fancy you can import the entire R package as a Julia module" ] }, { "cell_type": "code", "execution_count": 4, "metadata": { "collapsed": true }, "outputs": [], "source": [ "@rimport neiss" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The objects in this module are Julia types corresponding to the underlying R objects called SEXPRECs." ] }, { "cell_type": "code", "execution_count": 5, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " injuries 8 bytes RCall.RObject{RCall.VecSxp}\n", " neiss 53 bytes Module\n", " population 8 bytes RCall.RObject{RCall.VecSxp}\n", " products 8 bytes RCall.RObject{RCall.VecSxp}\n" ] } ], "source": [ "whos(neiss) # reads like Santa Claus deciding who's naughty and .." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "You can work with the R objects in Julia, calling R functions, etc. but most of the time it is simpler to copy the R data frame to Julia.\n", "\n", "A point of interest for those who may have tried something like this in other languages, there is no \"glue\" code written in C or C++ in the RCall package. It is all done in Julia." ] } ], "metadata": { "kernelspec": { "display_name": "Julia 0.4.2", "language": "julia", "name": "julia-0.4" }, "language_info": { "file_extension": ".jl", "mimetype": "application/julia", "name": "julia", "version": "0.4.2" } }, "nbformat": 4, "nbformat_minor": 0 }