{ "metadata": { "language": "Julia", "name": "" }, "nbformat": 3, "nbformat_minor": 0, "worksheets": [ { "cells": [ { "cell_type": "heading", "level": 2, "metadata": {}, "source": [ "An Introduction to Gadfly" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Gadfly is an easy-to-use visualization package for Julia, the new high-level, high-performance language for technical computing. It follows grammar of graphics principles to simplify translating your ideas into plots: you describe how y changes with x across levels of z. \n", "\n", "This tutorial aims to make Gadfly approachable through a series of examples. It complements Daniel Jones's [Gadfly Manual](http://dcjones.github.io/Gadfly.jl/) and my reference sheets for Gadfly and Julia.\n", "\n", "Translating your ideas into plots is more efficient with DataFrames, but we'll start with one-dimensional and multidimensional arrays because your data may already be in that format. After that we'll look at combining data into DataFrames and how that can make visualizing your data easier." ] }, { "cell_type": "heading", "level": 3, "metadata": {}, "source": [ "Starting Up" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "One of the easiest ways to use Julia is with an IPython Notebook. This lets you edit the code, add annotations, and keep your plots, just as I'm doing here (see Appendix One for installation instructions). This notebook is available on GitHub so you can copy and paste from it or use it as you wish. To start IJulia, open a terminal, change to the directory where you keep your notebooks (and perhaps your data), and enter this command:\n", "\n", "    ipython notebook --profile julia\n", "\n", "That will open an IPython Dashboard, where you can open an existing notebook from that directory or begin fresh with **New Notebook**. In Julia, when you want to use a package you start by entering \"using packagename\" and then wait a few seconds for it to load. 
Let's begin:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "# we only need the package Gadfly today\n", "using Gadfly" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 3 }, { "cell_type": "code", "collapsed": false, "input": [ "# First read in your files\n", "\n", "# if the file's separator is a comma then you don't need to specify it\n", "# similarly, by default the first row is read as a header row\n", "# if it was a tab separated file with a header row we would use:\n", "# mydat = readtable(\"filenameNoHeader.csv\", separator='\\t')\n", "\n", "d_age = collect(readdlm(\"f_age.csv\"))\n", "d_sex = collect(readdlm(\"f_sex.csv\"))\n", "d_dbp = collect(readdlm(\"f_dBP.csv\")) ;\n", "\n", "# open 3 files and store them\n", "# collect is used to create one dimensional arrays instead of 2d arrays because\n", "# each of these files has one column of data\n", "# If we'd had two columns in f_sex.csv then we'd skip the collect() and address\n", "# the columns as d_sex[:,1] and d_sex[:,2]\n", "# note the semicolon on the last line to stop Julia printing the final output\n", "\n", "# let's just check what we read into the arrays\n", "print(\"sa \", size(d_age), \" ss \", size(d_sex), \" sd \", size(d_dbp),)\n", "# and let's have a look at the first few rows of column 1 for each one\n", "d_age[1:6]" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "sa (" ] }, { "output_type": "stream", "stream": "stdout", "text": [ "50,) ss (50,) sd (50,)" ] }, { "metadata": {}, "output_type": "pyout", "prompt_number": 4, "text": [ "6-element Array{Float64,1}:\n", " 39.0\n", " 46.0\n", " 48.0\n", " 61.0\n", " 46.0\n", " 43.0" ] } ], "prompt_number": 4 }, { "cell_type": "code", "collapsed": false, "input": [ "# I can do that one at a time or use a trick instead:\n", "# [array1 array2 array3], with spaces between the arrays,\n", "# concatenates them into 3 columns and displays them\n", 
"[d_age[1:6] d_sex[1:6] d_dbp[1:6]]" ], "language": "python", "metadata": {}, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 6, "text": [ "6x3 Array{Any,2}:\n", " 39.0 \"F\" 70.0\n", " 46.0 \"M\" 81.0\n", " 48.0 \"F\" 80.0\n", " 61.0 \"M\" 95.0\n", " 46.0 \"M\" 84.0\n", " 43.0 \"M\" 110.0" ] } ], "prompt_number": 6 }, { "cell_type": "code", "collapsed": false, "input": [ "# I'm interested in the age distribution so let's plot a histogram\n", "plot(x=d_age, Geom.histogram)" ], "language": "python", "metadata": {}, "outputs": [ { "html": [ "
\n", "\n" ], "metadata": {}, "output_type": "display_data", "text": [ "D3(120.0,80.0,IOBuffer([0x66,0x75,0x6e,0x63,0x74,0x69,0x6f,0x6e,0x20,0x64 \u2026 0x74,0x5f,0x69,0x64,0x29,0x3b,0x0a,0x7d,0x3b,0x0a],true,true,true,false,12319,9223372036854775807,12320),false,0,[],[],0,5,Dict{Uint64,(Any,Int64)}(),true,true)" ] }, { "html": [], "metadata": {}, "output_type": "pyout", "prompt_number": 7, "text": [ "Plot(...)" ] } ], "prompt_number": 7 }, { "cell_type": "code", "collapsed": false, "input": [ "# It's good practice to check both coarse and fine histograms\n", "plot(x=d_age, Geom.histogram(bincount=25))" ], "language": "python", "metadata": {}, "outputs": [ { "html": [ "
\n", "\n" ], "metadata": {}, "output_type": "display_data", "text": [ "D3(120.0,80.0,IOBuffer([0x66,0x75,0x6e,0x63,0x74,0x69,0x6f,0x6e,0x20,0x64 \u2026 0x74,0x5f,0x69,0x64,0x29,0x3b,0x0a,0x7d,0x3b,0x0a],true,true,true,false,20374,9223372036854775807,20375),false,0,[],[],0,5,Dict{Uint64,(Any,Int64)}(),true,true)" ] }, { "html": [], "metadata": {}, "output_type": "pyout", "prompt_number": 8, "text": [ "Plot(...)" ] } ], "prompt_number": 8 }, { "cell_type": "code", "collapsed": false, "input": [ "# And let's look at a box plot. First narrow the plot:\n", "set_default_plot_size(6cm, 10cm)\n", "plot(y=d_age, Geom.boxplot, Theme(boxplot_spacing=10mm))" ], "language": "python", "metadata": {}, "outputs": [ { "html": [ "
\n", "\n" ], "metadata": {}, "output_type": "display_data", "text": [ "D3(60.0,100.0,IOBuffer([0x66,0x75,0x6e,0x63,0x74,0x69,0x6f,0x6e,0x20,0x64 \u2026 0x74,0x5f,0x69,0x64,0x29,0x3b,0x0a,0x7d,0x3b,0x0a],true,true,true,false,10053,9223372036854775807,10054),false,0,[],[],0,5,Dict{Uint64,(Any,Int64)}(),true,true)" ] }, { "html": [], "metadata": {}, "output_type": "pyout", "prompt_number": 9, "text": [ "Plot(...)" ] } ], "prompt_number": 9 }, { "cell_type": "code", "collapsed": false, "input": [ "# what if I want to compare men with women?\n", "set_default_plot_size(8cm, 10cm)\n", "plot(x=d_sex, y=d_age, Geom.boxplot, Theme(boxplot_spacing=15mm))" ], "language": "python", "metadata": {}, "outputs": [ { "html": [ "
\n", "\n" ], "metadata": {}, "output_type": "display_data", "text": [ "D3(80.0,100.0,IOBuffer([0x66,0x75,0x6e,0x63,0x74,0x69,0x6f,0x6e,0x20,0x64 \u2026 0x74,0x5f,0x69,0x64,0x29,0x3b,0x0a,0x7d,0x3b,0x0a],true,true,true,false,12328,9223372036854775807,12329),false,0,[],[],0,5,Dict{Uint64,(Any,Int64)}(),true,true)" ] }, { "html": [], "metadata": {}, "output_type": "pyout", "prompt_number": 10, "text": [ "Plot(...)" ] } ], "prompt_number": 10 }, { "cell_type": "code", "collapsed": false, "input": [ "# If I want summary statistics for the sample, it's easy to get them\n", "[mean(d_age), std(d_age), mode(d_age), \"\", quantile(d_age,[0.75,0.5,0.25])]" ], "language": "python", "metadata": {}, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 11, "text": [ "7-element Array{Any,1}:\n", " 47.86 \n", " 8.1941\n", " 43.0 \n", " \"\" \n", " 52.75 \n", " 46.0 \n", " 42.0 " ] } ], "prompt_number": 11 }, { "cell_type": "code", "collapsed": false, "input": [ "# OK, let's do a scatter plot\n", "# resize the plot to something larger\n", "set_default_plot_size(20cm, 12cm)\n", "plot(x=1:50, y=d_age)\n", "\n", "# note that rather than hard-coding the number of rows I could have used the size function\n", "# and entered plot(x=1:size(d_age,1), y=d_age)" ], "language": "python", "metadata": {}, "outputs": [ { "html": [ "
\n", "\n" ], "metadata": {}, "output_type": "display_data", "text": [ "D3(200.0,120.0,IOBuffer([0x66,0x75,0x6e,0x63,0x74,0x69,0x6f,0x6e,0x20,0x64 \u2026 0x74,0x5f,0x69,0x64,0x29,0x3b,0x0a,0x7d,0x3b,0x0a],true,true,true,false,16120,9223372036854775807,16121),false,0,[],[],1,5,[0x1d9604fd10b1e8fa=>([31.89135220125786,35.121226415094334,38.351100628930816,41.58097484276729,44.81084905660377,48.040723270440246,51.27059748427672,54.5004716981132,57.730345911949684,60.960220125786165 \u2026 161.08632075471698,164.31619496855348,167.54606918238994,170.7759433962264,174.0058176100629,177.23569182389937,180.46556603773584,183.69544025157234,186.9253144654088,190.15518867924527],0),0x0950cb1b68275f87=>([75.40796645702306,60.10188679245284,55.7287211740042,27.303144654088058,60.10188679245284,66.66163522012579,22.92997903563942,62.28846960167716,46.98238993710692,66.66163522012579 \u2026 68.84821802935011,81.96771488469602,66.66163522012579,71.03480083857443,46.98238993710692,42.609224318658285,44.795807127882604,53.54213836477988,18.55681341719078,60.10188679245284],1)],true,true)" ] }, { "html": [], "metadata": {}, "output_type": "pyout", "prompt_number": 12, "text": [ "Plot(...)" ] } ], "prompt_number": 12 }, { "cell_type": "code", "collapsed": false, "input": [ "# Having looked at the plot, we decide to plot it with an estimated confidence interval\n", "# and a loess smoothing of the data. Let's also make the labels more relevant.\n", "# For the confidence intervals we add Geom.errorbar and calculate a min and max\n", "# (here, each value plus or minus 1.96 standard deviations).\n", "# For the smoothing we add Geom.smooth, but we also have to add Geom.point because, although\n", "# it's the default, it is replaced as soon as any other Geom is given. 
If we wanted a line \n", "# we could use Geom.line.\n", "# Let's show which respondents are male and which are female with the color aesthetic.\n", "# If we do that then 2 smoothing lines are drawn (comment out the 4th line of the\n", "# plot call, the one that sets color, and only one is drawn).\n", "# Finally, notice that the plot command no longer fits on one line: the parentheses contain it.\n", "\n", "plot(x=1:size(d_age,1), y=d_age, \n", " Guide.xlabel(\"Respondent\"), Guide.ylabel(\"Age\"),\n", " Geom.errorbar, ymin=d_age-1.96*std(d_age), ymax=d_age+1.96*std(d_age),\n", " color=collect(d_sex), Guide.colorkey(\"Sex\"),\n", " Geom.smooth, Geom.point)" ], "language": "python", "metadata": {}, "outputs": [ { "html": [ "
\n", "\n" ], "metadata": {}, "output_type": "display_data", "text": [ "D3(200.0,120.0,IOBuffer([0x66,0x75,0x6e,0x63,0x74,0x69,0x6f,0x6e,0x20,0x64 \u2026 0x74,0x5f,0x69,0x64,0x29,0x3b,0x0a,0x7d,0x3b,0x0a],true,true,true,false,53212,9223372036854775807,53213),false,0,[],[],1,5,[0x816dffa0e25a2390=>({LCHab(55.0,60.0,240.0),LCHab(65.0,70.0,100.43478260869566),LCHab(55.0,60.0,240.0),LCHab(65.0,70.0,100.43478260869566),LCHab(65.0,70.0,100.43478260869566),LCHab(65.0,70.0,100.43478260869566),LCHab(65.0,70.0,100.43478260869566),LCHab(65.0,70.0,100.43478260869566),LCHab(55.0,60.0,240.0),LCHab(55.0,60.0,240.0) \u2026 LCHab(65.0,70.0,100.43478260869566),LCHab(55.0,60.0,240.0),LCHab(65.0,70.0,100.43478260869566),LCHab(65.0,70.0,100.43478260869566),LCHab(65.0,70.0,100.43478260869566),LCHab(55.0,60.0,240.0),LCHab(65.0,70.0,100.43478260869566),LCHab(65.0,70.0,100.43478260869566),LCHab(65.0,70.0,100.43478260869566),LCHab(55.0,60.0,240.0)},3),0x3ab7712e54ecfa88=>([\"geometry color_F\",\"geometry color_M\",\"geometry color_F\",\"geometry color_M\",\"geometry color_M\",\"geometry color_M\",\"geometry color_M\",\"geometry color_M\",\"geometry color_F\",\"geometry color_F\" \u2026 \"geometry color_M\",\"geometry color_F\",\"geometry color_M\",\"geometry color_M\",\"geometry color_M\",\"geometry color_F\",\"geometry color_M\",\"geometry color_M\",\"geometry color_M\",\"geometry color_F\"],2),0x4b834682e917bb0d=>([LCHab(70.0,60.0,240.0),LCHab(80.0,70.0,100.43478260869566),LCHab(70.0,60.0,240.0),LCHab(80.0,70.0,100.43478260869566),LCHab(80.0,70.0,100.43478260869566),LCHab(80.0,70.0,100.43478260869566),LCHab(80.0,70.0,100.43478260869566),LCHab(80.0,70.0,100.43478260869566),LCHab(70.0,60.0,240.0),LCHab(70.0,60.0,240.0) \u2026 
LCHab(80.0,70.0,100.43478260869566),LCHab(70.0,60.0,240.0),LCHab(80.0,70.0,100.43478260869566),LCHab(80.0,70.0,100.43478260869566),LCHab(80.0,70.0,100.43478260869566),LCHab(70.0,60.0,240.0),LCHab(80.0,70.0,100.43478260869566),LCHab(80.0,70.0,100.43478260869566),LCHab(80.0,70.0,100.43478260869566),LCHab(70.0,60.0,240.0)],4),0x6c4f407bb81ce376=>([32.61620627989059,35.571577680735714,38.52694908158084,41.48232048242597,44.43769188327109,47.393063284116224,50.34843468496135,53.30380608580647,56.2591774866516,59.214548887496726 \u2026 150.83106231369564,153.78643371454075,156.7418051153859,159.69717651623102,162.65254791707616,165.60791931792124,168.56329071876638,171.51866211961152,174.47403352045666,177.4294049213018],0),0x1c42c01d1803e116=>([60.97651991614256,54.854088050314466,53.10482180293501,41.73459119496855,54.854088050314466,57.477987421383645,39.9853249475891,55.72872117400419,49.6062893081761,57.477987421383645 \u2026 58.35262054507337,63.600419287211736,57.477987421383645,59.2272536687631,49.6062893081761,47.85702306079664,48.731656184486376,52.23018867924528,38.23605870020964,54.854088050314466],1)],true,true)" ] }, { "html": [], "metadata": {}, "output_type": "pyout", "prompt_number": 14, "text": [ "Plot(...)" ] } ], "prompt_number": 14 }, { "cell_type": "code", "collapsed": false, "input": [ "# The other major chart type is the bar chart, comparing y with x so lets\n", "# compare blood pressure with age\n", "set_default_plot_size(20cm, 12cm)\n", "plot(x=d_age, y=d_dbp, Geom.bar, Geom.smooth)" ], "language": "python", "metadata": {}, "outputs": [ { "html": [ "
\n", "\n" ], "metadata": {}, "output_type": "display_data", "text": [ "D3(200.0,120.0,IOBuffer([0x66,0x75,0x6e,0x63,0x74,0x69,0x6f,0x6e,0x20,0x64 \u2026 0x74,0x5f,0x69,0x64,0x29,0x3b,0x0a,0x7d,0x3b,0x0a],true,true,true,false,34966,9223372036854775807,34967),false,0,[],[],0,5,Dict{Uint64,(Any,Int64)}(),true,true)" ] }, { "html": [], "metadata": {}, "output_type": "pyout", "prompt_number": 15, "text": [ "Plot(...)" ] } ], "prompt_number": 15 }, { "cell_type": "code", "collapsed": false, "input": [ "# and using color to identify sex\n", "plot(x=d_age, y=d_dbp, color=d_sex, Geom.bar(position=:dodge))" ], "language": "python", "metadata": {}, "outputs": [ { "html": [ "
\n", "\n" ], "metadata": {}, "output_type": "display_data", "text": [ "D3(200.0,120.0,IOBuffer([0x66,0x75,0x6e,0x63,0x74,0x69,0x6f,0x6e,0x20,0x64 \u2026 0x74,0x5f,0x69,0x64,0x29,0x3b,0x0a,0x7d,0x3b,0x0a],true,true,true,false,29396,9223372036854775807,29397),false,0,[],[],0,5,Dict{Uint64,(Any,Int64)}(),true,true)" ] }, { "html": [], "metadata": {}, "output_type": "pyout", "prompt_number": 16, "text": [ "Plot(...)" ] } ], "prompt_number": 16 }, { "cell_type": "code", "collapsed": false, "input": [ "# We don't have pie charts currently, but you might prefer a normalized stacked bar chart anyway\n", "#plot(x=d_age, y=d_dbp, Geom.normbar)\n" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 17 }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Finally for this section**: what if we wanted to save one of our charts in an image or document format? Gadfly makes that easy. All you need to do is pass your plot to the `draw` function, together with the output format (such as PNG or PDF), the file name, and the size." ] }, { "cell_type": "code", "collapsed": false, "input": [ "draw(PNG(\"myplot.png\", 6inch, 3inch), plot(x=d_age, y=d_dbp, Geom.bar))\n", "\n", "# or with a plot object:\n", "p = plot(x=1:size(d_age,1), y=d_age, \n", " Guide.xlabel(\"Respondent\"), Guide.ylabel(\"Age\"),\n", " Geom.errorbar, ymin=d_age-1.96*std(d_age), ymax=d_age+1.96*std(d_age),\n", " color=collect(d_sex), Guide.colorkey(\"Sex\"),\n", " Geom.smooth, Geom.point)\n", "\n", "draw(PDF(\"myplot.pdf\", 6inch, 3inch), p)" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 18 }, { "cell_type": "heading", "level": 3, "metadata": {}, "source": [ "Using DataFrames" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "A DataFrame is like a table or a spreadsheet: columns of data, usually with headings at the top. \n", "\n", "If you use headings, you can choose the columns to plot by heading, which is a little easier than remembering all the column numbers. 
Gadfly will also use them as labels for the plot elements (such as axis titles) unless you override them." ] }, { "cell_type": "code", "collapsed": false, "input": [ "# we want to use Gadfly and DataFrames today\n", "using Gadfly; using DataFrames" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 19 }, { "cell_type": "code", "collapsed": false, "input": [ "# First read your file into a DataFrame with readtable\n", "df = readtable(\"filename.csv\")\n", "\n", "# if the file's separator is a comma then you don't need to specify it\n", "# if it was a tab separated file with no header row we would use:\n", "# mydat = readtable(\"filenameNoHeader.csv\", separator='\\t', header=false)\n", "\n", "# let's just check what we read into df\n", "print(\"size is \", size(df))\n", "# and let's have a look at the first few rows and columns\n", "# because it's a single frame no tricks are needed to display it\n", "df[1:3, 1:size(df,2)]\n" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "size is (" ] }, { "output_type": "stream", "stream": "stdout", "text": [ "50,7)" ] }, { "metadata": {}, "output_type": "pyout", "prompt_number": 20, "text": [ "3x7 DataFrame:\n", " IX Sex Age sBP dBP Drink BMI\n", "[1,] 0 1 39 106.0 70.0 0 26.97\n", "[2,] 1 2 46 121.0 81.0 0 28.73\n", "[3,] 2 1 48 127.5 80.0 1 25.34\n" ] } ], "prompt_number": 20 }, { "cell_type": "code", "collapsed": false, "input": [ "# I'd prefer M and F to 2 and 1, and similarly Y and N for Drink\n", "df[\"Sex\"]=ifelse(df[\"Sex\"].==1, \"F\", \"M\") \n", "df[\"Drink\"]=ifelse(df[\"Drink\"].==1, \"Y\", \"N\")\n", "df[1:6, 1:size(df,2)]" ], "language": "python", "metadata": {}, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 21, "text": [ "6x7 DataFrame:\n", " IX Sex Age sBP dBP Drink BMI\n", "[1,] 0 \"F\" 39 106.0 70.0 \"N\" 26.97\n", "[2,] 1 \"M\" 46 121.0 81.0 \"N\" 28.73\n", "[3,] 2 \"F\" 48 127.5 80.0 \"Y\" 25.34\n", "[4,] 3 \"M\" 61 150.0 95.0 \"Y\" 28.58\n", "[5,] 4 \"M\" 46 130.0 84.0 \"Y\" 23.1\n", "[6,] 5 \"M\" 43 180.0 110.0 \"N\" 30.3\n" ] } ], "prompt_number": 21 }, { "cell_type": "code", "collapsed": false, "input": [ "# I'm interested in the age distribution so let's plot a histogram\n", "plot(df, x=\"Age\", Geom.histogram(bincount=6))" ], "language": "python", "metadata": {}, "outputs": [ { "html": [ "
\n", "\n" ], "metadata": {}, "output_type": "display_data", "text": [ "D3(200.0,120.0,IOBuffer([0x66,0x75,0x6e,0x63,0x74,0x69,0x6f,0x6e,0x20,0x64 \u2026 0x74,0x5f,0x69,0x64,0x29,0x3b,0x0a,0x7d,0x3b,0x0a],true,true,true,false,13116,9223372036854775807,13117),false,0,[],[],0,5,Dict{Uint64,(Any,Int64)}(),true,true)" ] }, { "html": [], "metadata": {}, "output_type": "pyout", "prompt_number": 22, "text": [ "Plot(...)" ] } ], "prompt_number": 22 }, { "cell_type": "code", "collapsed": false, "input": [ "# It's good practice to check both coarse and fine histograms\n", "plot(df, x=3, Geom.histogram(bincount=15))\n", "# note that I just entered the column number instead of its heading" ], "language": "python", "metadata": {}, "outputs": [ { "html": [ "
\n", "\n" ], "metadata": {}, "output_type": "display_data", "text": [ "D3(200.0,120.0,IOBuffer([0x66,0x75,0x6e,0x63,0x74,0x69,0x6f,0x6e,0x20,0x64 \u2026 0x74,0x5f,0x69,0x64,0x29,0x3b,0x0a,0x7d,0x3b,0x0a],true,true,true,false,16248,9223372036854775807,16249),false,0,[],[],0,5,Dict{Uint64,(Any,Int64)}(),true,true)" ] }, { "html": [], "metadata": {}, "output_type": "pyout", "prompt_number": 23, "text": [ "Plot(...)" ] } ], "prompt_number": 23 }, { "cell_type": "code", "collapsed": false, "input": [ "# and let's look at box plots again, this time with the original two side by side\n", "# plus one for Drink (the 1s and 2s in Sex were already converted to \"F\" and \"M\" above)\n", "hstack( plot(df, y=\"Age\", Geom.boxplot), \n", " plot(df, x=\"Sex\", y=\"Age\", Geom.boxplot),\n", " plot(df, x=\"Drink\", y=\"Age\", Geom.boxplot) \n", ")" ], "language": "python", "metadata": {}, "outputs": [ { "metadata": {}, "output_type": "display_data", "svg": [ "[Figure: three box plots of Age side by side: overall, by Sex (F, M), and by Drink (N, Y)]" ], "text": [ "SVG(120.0,80.0,IOBuffer([0x3c,0x3f,0x78,0x6d,0x6c,0x20,0x76,0x65,0x72,0x73 \u2026 
0x73,0x3e,0x0a,0x3c,0x2f,0x73,0x76,0x67,0x3e,0x0a],true,true,true,false,16227,9223372036854775807,16228),false,0,Dict{String,String}(),[[Point(85.0mm,1.0mm),Point(103.81666666666666mm,1.0mm),Point(103.81666666666666mm,58.71111111111111mm),Point(85.0mm,58.71111111111111mm)],[Point(102.69833333333332mm,57.71111111111111mm),Point(116.11833333333334mm,57.71111111111111mm),Point(116.11833333333334mm,75.0mm),Point(102.69833333333332mm,75.0mm)],[Point(103.81666666666666mm,5.0mm),Point(115.0mm,5.0mm),Point(115.0mm,57.71111111111111mm),Point(103.81666666666666mm,57.71111111111111mm)],[Point(103.81666666666666mm,5.0mm),Point(115.0mm,5.0mm),Point(115.0mm,57.71111111111111mm),Point(103.81666666666666mm,57.71111111111111mm)],[Point(103.81666666666666mm,5.0mm),Point(115.0mm,5.0mm),Point(115.0mm,57.71111111111111mm),Point(103.81666666666666mm,57.71111111111111mm)],[Point(45.0mm,1.0mm),Point(63.81666666666666mm,1.0mm),Point(63.81666666666666mm,58.71111111111111mm),Point(45.0mm,58.71111111111111mm)],[Point(62.69833333333333mm,57.71111111111111mm),Point(76.11833333333334mm,57.71111111111111mm),Point(76.11833333333334mm,75.0mm),Point(62.69833333333333mm,75.0mm)],[Point(63.81666666666666mm,5.0mm),Point(75.0mm,5.0mm),Point(75.0mm,57.71111111111111mm),Point(63.81666666666666mm,57.71111111111111mm)],[Point(63.81666666666666mm,5.0mm),Point(75.0mm,5.0mm),Point(75.0mm,57.71111111111111mm),Point(63.81666666666666mm,57.71111111111111mm)],[Point(63.81666666666666mm,5.0mm),Point(75.0mm,5.0mm),Point(75.0mm,57.71111111111111mm),Point(63.81666666666666mm,57.71111111111111mm)],[Point(5.0mm,1.0mm),Point(23.816666666666663mm,1.0mm),Point(23.816666666666663mm,74.0mm),Point(5.0mm,74.0mm)],[Point(22.69833333333333mm,73.0mm),Point(36.11833333333333mm,73.0mm),Point(36.11833333333333mm,75.0mm),Point(22.69833333333333mm,75.0mm)],[Point(23.816666666666663mm,5.0mm),Point(35.0mm,5.0mm),Point(35.0mm,73.0mm),Point(23.816666666666663mm,73.0mm)],[Point(23.816666666666663mm,5.0mm),Point(35.0mm,5.0mm),Point(35.0mm,73
.0mm),Point(23.816666666666663mm,73.0mm)],[Point(23.816666666666663mm,5.0mm),Point(35.0mm,5.0mm),Point(35.0mm,73.0mm),Point(23.816666666666663mm,73.0mm)]],Set{String}(),[],[],[],true,true)" ] }, { "html": [], "metadata": {}, "output_type": "pyout", "prompt_number": 24, "text": [ "Canvas(List([Canvas(List([Canvas(List([Canvas(List([Canvas(List([Canvas(List([])), Canvas(List([]))])), Canvas(List([Canvas(List([])), Canvas(List([]))])), Canvas(List([])), Canvas(List([])), Canvas(List([Canvas(List([Canvas(List([Canvas(List([Canvas(List([])), Canvas(List([]))]))]))])), Canvas(List([])), Canvas(List([Canvas(List([])), Canvas(List([])), Canvas(List([]))]))]))]))]))])), Canvas(List([Canvas(List([Canvas(List([Canvas(List([Canvas(List([])), Canvas(List([]))])), Canvas(List([Canvas(List([])), Canvas(List([]))])), Canvas(List([])), Canvas(List([])), Canvas(List([Canvas(List([Canvas(List([Canvas(List([Canvas(List([])), Canvas(List([]))]))]))])), Canvas(List([])), Canvas(List([Canvas(List([])), Canvas(List([])), Canvas(List([]))]))]))]))]))])), Canvas(List([Canvas(List([Canvas(List([Canvas(List([Canvas(List([])), Canvas(List([]))])), Canvas(List([Canvas(List([]))])), Canvas(List([])), Canvas(List([])), Canvas(List([Canvas(List([Canvas(List([Canvas(List([Canvas(List([])), Canvas(List([]))]))]))])), Canvas(List([])), Canvas(List([Canvas(List([])), Canvas(List([])), Canvas(List([]))]))]))]))]))]))]))" ] } ], "prompt_number": 24 }, { "cell_type": "code", "collapsed": false, "input": [ "# It's quite a bit easier with DataFrames. Similarly, we can display all the summary stats\n", "describe(df)" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "IX\n" ] }, { "output_type": "stream", "stream": "stdout", "text": [ "Min 0.0" ] }, { "output_type": "stream", "stream": "stdout", "text": [ "\n", "1st Qu. 12.25\n", "Median 24.5\n", "Mean 24.5\n", "3rd Qu. 
36.75\n", "Max 49.0\n", "NAs 0\n", "NA% 0.0%\n", "\n", "Sex\n", "Length 50\n", "Type ASCIIString\n", "NAs 0\n", "NA% 0.0%\n", "Unique 2" ] }, { "output_type": "stream", "stream": "stdout", "text": [ "\n", "\n", "Age\n", "Min 35.0\n", "1st Qu. 42.0\n", "Median 46.0\n", "Mean 47.86\n", "3rd Qu. 52.75\n", "Max 65.0\n", "NAs 0\n", "NA% 0.0%\n", "\n", "sBP\n", "Min 96.0\n", "1st Qu. 119.5\n", "Median 131.5\n", "Mean 133.5\n", "3rd Qu. 145.75\n", "Max 206.0\n", "NAs 0\n", "NA% 0.0%\n", "\n", "dBP\n", "Min 63.0\n", "1st Qu. 76.375\n", "Median 84.25\n", "Mean 84.86\n", "3rd Qu. 90.75\n", "Max 121.0\n", "NAs 0\n", "NA% 0.0%\n", "\n", "Drink\n", "Length 50\n", "Type ASCIIString\n", "NAs 0\n", "NA% 0.0%\n", "Unique 2\n", "\n", "BMI\n", "Min 18.59\n", "1st Qu. 23.25\n", "Median 26.15\n", "Mean 26.374199999999995\n", "3rd Qu. 28.472499999999997\n", "Max 40.11\n", "NAs 0\n", "NA% 0.0%\n", "\n" ] } ], "prompt_number": 25 }, { "cell_type": "code", "collapsed": false, "input": [ "# And the standard deviations\n", "[\"Age\" std(df[\"Age\"]) \"sBP\" std(df[\"sBP\"]) \"dBP\" std(df[\"dBP\"]) \"BMI\" std(df[\"BMI\"])]" ], "language": "python", "metadata": {}, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 26, "text": [ "1x8 Array{Any,2}:\n", " \"Age\" 8.1941 \"sBP\" 22.6628 \"dBP\" 12.5921 \"BMI\" 4.41216" ] } ], "prompt_number": 26 }, { "cell_type": "code", "collapsed": false, "input": [ "# OK, let's do a scatter plot\n", "# resize the plot to something larger\n", "set_default_plot_size(20cm, 12cm)\n", "plot(df, x=\"IX\", y=\"sBP\")" ], "language": "python", "metadata": {}, "outputs": [ { "html": [ "
\n", "\n" ], "metadata": {}, "output_type": "display_data", "text": [ "D3(200.0,120.0,IOBuffer([0x66,0x75,0x6e,0x63,0x74,0x69,0x6f,0x6e,0x20,0x64 \u2026 0x74,0x5f,0x69,0x64,0x29,0x3b,0x0a,0x7d,0x3b,0x0a],true,true,true,false,16126,9223372036854775807,16127),false,0,[],[],1,5,[0xdff24cf330c20da7=>([83.42543675751222,72.49252271139062,67.75492662473795,51.35555555555555,65.93277428371768,29.489727463312363,60.101886792452824,87.79860237596085,57.550873515024456,42.60922431865828 \u2026 49.16897274633123,79.78113207547169,75.77239692522711,71.76366177498252,52.81327742837177,64.47505241090147,10.539343116701602,90.71404612159328,29.85415793151642,73.95024458420684],1),0x861cf0fafd174f78=>([28.661477987421378,31.89135220125786,35.121226415094334,38.351100628930816,41.58097484276729,44.81084905660377,48.040723270440246,51.27059748427672,54.5004716981132,57.730345911949684 \u2026 157.8564465408805,161.08632075471698,164.31619496855348,167.54606918238994,170.7759433962264,174.0058176100629,177.23569182389937,180.46556603773584,183.69544025157234,186.9253144654088],0)],true,true)" ] }, { "html": [], "metadata": {}, "output_type": "pyout", "prompt_number": 27, "text": [ "Plot(...)" ] } ], "prompt_number": 27 }, { "cell_type": "code", "collapsed": false, "input": [ "# Having looked at the plot, we decide to plot it with an estimated confidence interval\n", "# and a loess smoothing of the data. Let's also make the labels more relevant.\n", "# For the confidence intervals we add Geom.errorbar and calculate a min and max\n", "# (here, each value plus or minus 1.96 standard deviations).\n", "# For the smoothing we add Geom.smooth, but we also have to add Geom.point because, although\n", "# it's the default, it is replaced as soon as any other Geom is given. 
If we wanted a line \n", "# we could use Geom.line.\n", "# Let's show which respondents are male and which are female with the color aesthetic.\n", "# If we do that then 2 smoothing lines are drawn (comment out the 4th line of the\n", "# plot call, the one that sets color, and only one is drawn).\n", "# Finally, notice that the plot command no longer fits on one line: the parentheses contain it.\n", "\n", "plot(df, x=\"IX\", y=\"sBP\",\n", " Guide.xlabel(\"Respondent\"), Guide.ylabel(\"Blood Pressure\"),\n", " Geom.errorbar, ymin=df[\"sBP\"]-1.96*std(df[\"sBP\"]), ymax=df[\"sBP\"]+1.96*std(df[\"sBP\"]),\n", " color=\"Sex\", # Guide.colorkey(\"Sex\"),\n", " Geom.smooth, Geom.point)" ], "language": "python", "metadata": {}, "outputs": [ { "html": [ "
\n", "\n" ], "metadata": {}, "output_type": "display_data", "text": [ "D3(200.0,120.0,IOBuffer([0x66,0x75,0x6e,0x63,0x74,0x69,0x6f,0x6e,0x20,0x64 \u2026 0x74,0x5f,0x69,0x64,0x29,0x3b,0x0a,0x7d,0x3b,0x0a],true,true,true,false,51338,9223372036854775807,51339),false,0,[],[],1,5,[0x80533680bfbc0f10=>([29.660834879045463,32.61620627989059,35.571577680735714,38.52694908158084,41.48232048242597,44.43769188327109,47.393063284116224,50.34843468496135,53.30380608580647,56.2591774866516 \u2026 147.87569091285053,150.83106231369564,153.78643371454075,156.7418051153859,159.69717651623102,162.65254791707616,165.60791931792124,168.56329071876638,171.51866211961152,174.47403352045666],0),0x381c8a11af4ec1a7=>([\"geometry color_F\",\"geometry color_M\",\"geometry color_F\",\"geometry color_M\",\"geometry color_M\",\"geometry color_M\",\"geometry color_M\",\"geometry color_M\",\"geometry color_F\",\"geometry color_F\" \u2026 \"geometry color_M\",\"geometry color_F\",\"geometry color_M\",\"geometry color_M\",\"geometry color_M\",\"geometry color_F\",\"geometry color_M\",\"geometry color_M\",\"geometry color_M\",\"geometry color_F\"],2),0x49cbae3052acf2f8=>([64.18350803633822,59.810342417889586,57.91530398322851,51.355555555555554,57.1864430468204,42.609224318658285,54.854088050314466,65.93277428371768,53.83368273934312,47.85702306079664 \u2026 50.48092243186583,62.72578616352201,61.12229210342418,59.51879804332634,51.938644304682036,56.60335429769392,35.02907058001398,67.09895178197064,42.7549965059399,60.39343116701607],1),0x32bb809d248cb572=>({LCHab(55.0,60.0,240.0),LCHab(65.0,70.0,100.43478260869566),LCHab(55.0,60.0,240.0),LCHab(65.0,70.0,100.43478260869566),LCHab(65.0,70.0,100.43478260869566),LCHab(65.0,70.0,100.43478260869566),LCHab(65.0,70.0,100.43478260869566),LCHab(65.0,70.0,100.43478260869566),LCHab(55.0,60.0,240.0),LCHab(55.0,60.0,240.0) \u2026 
LCHab(65.0,70.0,100.43478260869566),LCHab(55.0,60.0,240.0),LCHab(65.0,70.0,100.43478260869566),LCHab(65.0,70.0,100.43478260869566),LCHab(65.0,70.0,100.43478260869566),LCHab(55.0,60.0,240.0),LCHab(65.0,70.0,100.43478260869566),LCHab(65.0,70.0,100.43478260869566),LCHab(65.0,70.0,100.43478260869566),LCHab(55.0,60.0,240.0)},3),0x4e38fa27ec19aac3=>([LCHab(70.0,60.0,240.0),LCHab(80.0,70.0,100.43478260869566),LCHab(70.0,60.0,240.0),LCHab(80.0,70.0,100.43478260869566),LCHab(80.0,70.0,100.43478260869566),LCHab(80.0,70.0,100.43478260869566),LCHab(80.0,70.0,100.43478260869566),LCHab(80.0,70.0,100.43478260869566),LCHab(70.0,60.0,240.0),LCHab(70.0,60.0,240.0) \u2026 LCHab(80.0,70.0,100.43478260869566),LCHab(70.0,60.0,240.0),LCHab(80.0,70.0,100.43478260869566),LCHab(80.0,70.0,100.43478260869566),LCHab(80.0,70.0,100.43478260869566),LCHab(70.0,60.0,240.0),LCHab(80.0,70.0,100.43478260869566),LCHab(80.0,70.0,100.43478260869566),LCHab(80.0,70.0,100.43478260869566),LCHab(70.0,60.0,240.0)],4)],true,true)" ] }, { "html": [], "metadata": {}, "output_type": "pyout", "prompt_number": 28, "text": [ "Plot(...)" ] } ], "prompt_number": 28 }, { "cell_type": "code", "collapsed": false, "input": [ "# The other major chart type is the bar chart, comparing y with x, so let's\n", "# compare blood pressure with age\n", "set_default_plot_size(20cm, 12cm)\n", "plot(df, x=\"Age\", y=\"sBP\", Geom.bar, Geom.smooth)" ], "language": "python", "metadata": {}, "outputs": [ { "html": [ "
\n", "\n" ], "metadata": {}, "output_type": "display_data", "text": [ "D3(200.0,120.0,IOBuffer([0x66,0x75,0x6e,0x63,0x74,0x69,0x6f,0x6e,0x20,0x64 \u2026 0x74,0x5f,0x69,0x64,0x29,0x3b,0x0a,0x7d,0x3b,0x0a],true,true,true,false,35021,9223372036854775807,35022),false,0,[],[],0,5,Dict{Uint64,(Any,Int64)}(),true,true)" ] }, { "html": [], "metadata": {}, "output_type": "pyout", "prompt_number": 29, "text": [ "Plot(...)" ] } ], "prompt_number": 29 }, { "cell_type": "code", "collapsed": false, "input": [ "# and using color to identify sex\n", "plot(df, x=\"Age\", y=\"sBP\", color=\"Sex\", Geom.bar(position=:dodge))" ], "language": "python", "metadata": {}, "outputs": [ { "html": [ "
\n", "\n" ], "metadata": {}, "output_type": "display_data", "text": [ "D3(200.0,120.0,IOBuffer([0x66,0x75,0x6e,0x63,0x74,0x69,0x6f,0x6e,0x20,0x64 \u2026 0x74,0x5f,0x69,0x64,0x29,0x3b,0x0a,0x7d,0x3b,0x0a],true,true,true,false,29417,9223372036854775807,29418),false,0,[],[],0,5,Dict{Uint64,(Any,Int64)}(),true,true)" ] }, { "html": [], "metadata": {}, "output_type": "pyout", "prompt_number": 30, "text": [ "Plot(...)" ] } ], "prompt_number": 30 }, { "cell_type": "code", "collapsed": false, "input": [ "# We don't have pie charts currently, but you might prefer a normalized stacked bar chart anyway\n", "#plot(x=d_age, y=d_dbp, Geom.normbar)\n" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 31 }, { "cell_type": "code", "collapsed": false, "input": [ "# It's easy to save your visualizations as PNG, PDF or PS files\n", "# draw(format(\"filename.formatsuffix\", width, height),(plot object or command))\n", "draw(PNG(\"myplot.png\", 6inch, 3inch), plot(df, x=\"Age\", y=\"dBP\", Geom.bar))\n", "\n", "# or with a plot object:\n", "p = plot(df, x=\"IX\", y=\"Age\", \n", " Guide.xlabel(\"Respondent\"), # Guide.ylabel(\"Age\"),\n", " Geom.errorbar, ymin=df[\"Age\"]-1.96*std(df[\"Age\"]), ymax=df[\"Age\"]+1.96*std(df[\"Age\"]),\n", " color=\"Sex\", # Guide.colorkey(\"Sex\"),\n", " Geom.smooth, Geom.point)\n", "\n", "draw(PDF(\"myplot.pdf\", 6inch, 3inch), p)" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 32 }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "**Appendix One - Installing and Updating**\n", "\n", "There are three components to install: Julia, IPython, and the Julia packages you need.\n", "\n", "Instructions for installing Julia are [here](http://julialang.org/downloads/).\n", "\n", "Instructions for installing IPython to support Julia are [here](https://github.com/JuliaLang/IJulia.jl).\n", "\n", "You can add packages simply. 
Here are a couple of lines to install the most likely packages (reinstallation does no harm if you're not sure what you have).\n", "\n", "    Pkg.add(\"IJulia\") ; Pkg.add(\"DataFrames\") ; Pkg.add(\"Gadfly\")\n", "    Pkg.add(\"Stats\") ; Pkg.add(\"GLM\") ; Pkg.add(\"Distributions\")\n", "\n", "From time to time it pays to check that your packages are up to date. Do that now with:\n", "\n", "    Pkg.update()\n", "\n", "That's it. If you have any issues, see the Julia page above, and if you're still confused, just ask at the [Julia Users Group](https://groups.google.com/forum/#!forum/julia-users)." ] } ], "metadata": {} } ] }