{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "This notebook assumes that you've already created the file `roman.db` in part one of the [intro to sql](intro to sql.ipynb) notebook. It shows how to load a sqlite database into R." ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "# first we get the packages we need:\n", "library(DBI)\n", "library(RSQLite)" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
    \n", "\t
  1. '_source_info_'
  2. \n", "\t
  3. 'amphi'
  4. \n", "\t
  5. 'aqua'
  6. \n", "\t
  7. 'sqlite_sequence'
  8. \n", "
\n" ], "text/latex": [ "\\begin{enumerate*}\n", "\\item '\\_source\\_info\\_'\n", "\\item 'amphi'\n", "\\item 'aqua'\n", "\\item 'sqlite\\_sequence'\n", "\\end{enumerate*}\n" ], "text/markdown": [ "1. '_source_info_'\n", "2. 'amphi'\n", "3. 'aqua'\n", "4. 'sqlite_sequence'\n", "\n", "\n" ], "text/plain": [ "[1] \"_source_info_\" \"amphi\" \"aqua\" \"sqlite_sequence\"" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "# now we open the connection:\n", "\n", "con = dbConnect(SQLite(), dbname=\"roman.db\")\n", "\n", "# we can see what's inside, eg, what tables are in the database?\n", "alltables = dbListTables(con)\n", "alltables\n", "\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# write the query to get the information you want\n", "\n", "myQuery <- dbSendQuery(con, \"SELECT * FROM amphi WHERE chronogroup = 'flavian'\")\n", "\n", "# pass that information to an R object. The n = -1 bit means grab everything until there's nothing left to grab. Otherwise, you can specify how many rows etc.\n", "\n", "my_data <- dbFetch(myQuery, n = -1)\n", "\n", "# You will get some warning messages as R makes sure that the data in each column is of the same type." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Now to view your dataframe, we just ask:\n", "\n", "my_data" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [], "source": [ "# now that we're done, clear cache \n", "# so that we don't eat up all the memory\n", "dbClearResult(myQuery)\n", "\n", "# now carry on and begin manipulating my_data\n", "# for more information see\n", "# http://tiffanytimbers.com/querying-sqlite-databases-from-r/\n", "# also perhaps this https://www.r-bloggers.com/using-sqlite-in-r/" ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [], "source": [ "# let's make a new dataframe with just the elevation and capacity columns\n", "# you could have done this with an SQL query from our original database.\n", "# here we're just going to extract from the dataframe we created above\n", "\n", "amphi_stats <- subset(my_data, select=c(\"id\",\"elevation\",\"capacity\"))\n", "\n", "#\"create new datafram called amph_stats and pass into it a subset of my_data, returning the array id, elevation, capacity\"" ] }, { "cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\n", "
idelevationcapacity
arlesAmphitheater 21 20000
romeFlavianAmphitheater 22 50000
newsteadAmphitheater 83 1500
pozzuoliFlavianAmphitheater 41 35700
segobrigaAmphitheater 817 7500
parisAmphitheater 46 15000
nimesAmphitheater 50 24000
chesterAmphitheater 23 0
arezzoAmphitheater 263 8000
londonAmphitheater 29 6500
urbisagliaAmphitheater 219 5150
vindonissaAmphitheater 361 15142
frejusAmphitheater 12 12000
tebessaAmphitheater 877 9400
aquileiaAmphitheater 2 0
terminiImereseAmphitheater 76 0
bolsenaAmphitheater 417 0
narbonneAmphitheater 14 0
capuaImperialAmphitheater 34 37000
dorchesterAmphitheater 78 0
autunAmphitheater 317 0
bobadelaAmphitheater 383 0
chichesterAmphitheater 14 0
\n" ], "text/latex": [ "\\begin{tabular}{r|lll}\n", " id & elevation & capacity\\\\\n", "\\hline\n", "\t arlesAmphitheater & 21 & 20000 \\\\\n", "\t romeFlavianAmphitheater & 22 & 50000 \\\\\n", "\t newsteadAmphitheater & 83 & 1500 \\\\\n", "\t pozzuoliFlavianAmphitheater & 41 & 35700 \\\\\n", "\t segobrigaAmphitheater & 817 & 7500 \\\\\n", "\t parisAmphitheater & 46 & 15000 \\\\\n", "\t nimesAmphitheater & 50 & 24000 \\\\\n", "\t chesterAmphitheater & 23 & 0 \\\\\n", "\t arezzoAmphitheater & 263 & 8000 \\\\\n", "\t londonAmphitheater & 29 & 6500 \\\\\n", "\t urbisagliaAmphitheater & 219 & 5150 \\\\\n", "\t vindonissaAmphitheater & 361 & 15142 \\\\\n", "\t frejusAmphitheater & 12 & 12000 \\\\\n", "\t tebessaAmphitheater & 877 & 9400 \\\\\n", "\t aquileiaAmphitheater & 2 & 0 \\\\\n", "\t terminiImereseAmphitheater & 76 & 0 \\\\\n", "\t bolsenaAmphitheater & 417 & 0 \\\\\n", "\t narbonneAmphitheater & 14 & 0 \\\\\n", "\t capuaImperialAmphitheater & 34 & 37000 \\\\\n", "\t dorchesterAmphitheater & 78 & 0 \\\\\n", "\t autunAmphitheater & 317 & 0 \\\\\n", "\t bobadelaAmphitheater & 383 & 0 \\\\\n", "\t chichesterAmphitheater & 14 & 0 \\\\\n", "\\end{tabular}\n" ], "text/markdown": [ "\n", "id | elevation | capacity | \n", "|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|\n", "| arlesAmphitheater | 21 | 20000 | \n", "| romeFlavianAmphitheater | 22 | 50000 | \n", "| newsteadAmphitheater | 83 | 1500 | \n", "| pozzuoliFlavianAmphitheater | 41 | 35700 | \n", "| segobrigaAmphitheater | 817 | 7500 | \n", "| parisAmphitheater | 46 | 15000 | \n", "| nimesAmphitheater | 50 | 24000 | \n", "| chesterAmphitheater | 23 | 0 | \n", "| arezzoAmphitheater | 263 | 8000 | \n", "| londonAmphitheater | 29 | 6500 | \n", "| urbisagliaAmphitheater | 219 | 5150 | \n", "| vindonissaAmphitheater | 361 | 15142 | \n", "| frejusAmphitheater | 12 | 12000 | \n", "| tebessaAmphitheater | 877 | 9400 | \n", "| aquileiaAmphitheater | 2 | 0 | \n", "| terminiImereseAmphitheater | 76 | 0 | \n", "| bolsenaAmphitheater | 417 | 0 | \n", "| narbonneAmphitheater | 14 | 0 | \n", "| capuaImperialAmphitheater | 34 | 37000 | \n", "| dorchesterAmphitheater | 78 | 0 | \n", "| autunAmphitheater | 317 | 0 | \n", "| bobadelaAmphitheater | 383 | 0 | \n", "| chichesterAmphitheater | 14 | 0 | \n", "\n", "\n" ], "text/plain": [ " id elevation capacity\n", "1 arlesAmphitheater 21 20000 \n", "2 romeFlavianAmphitheater 22 50000 \n", "3 newsteadAmphitheater 83 1500 \n", "4 pozzuoliFlavianAmphitheater 41 35700 \n", "5 segobrigaAmphitheater 817 7500 \n", "6 parisAmphitheater 46 15000 \n", "7 nimesAmphitheater 50 24000 \n", "8 chesterAmphitheater 23 0 \n", "9 arezzoAmphitheater 263 8000 \n", "10 londonAmphitheater 29 6500 \n", "11 urbisagliaAmphitheater 219 5150 \n", "12 vindonissaAmphitheater 361 15142 \n", "13 frejusAmphitheater 12 12000 \n", "14 tebessaAmphitheater 877 9400 \n", "15 aquileiaAmphitheater 2 0 \n", "16 terminiImereseAmphitheater 76 0 \n", "17 bolsenaAmphitheater 417 0 \n", "18 narbonneAmphitheater 14 0 \n", "19 capuaImperialAmphitheater 34 37000 \n", "20 dorchesterAmphitheater 78 0 \n", "21 autunAmphitheater 317 0 \n", "22 bobadelaAmphitheater 383 0 \n", "23 chichesterAmphitheater 14 0 " ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "amphi_stats" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "At this point, now that you know how to get your data out of the database, you could do a number of other kinds of data manipulations; it really depends on the kind of investigation you wish to do. Consult Marwick et al [How to Do Archaeological Science Using R](https://benmarwick.github.io/How-To-Do-Archaeological-Science-Using-R/index.html).\n", "\n", "We'll end this notebook with a simple visual plot of the data." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# a simple x,y plot\n", "# we can call for the columns in the dataframe by refering to the name of the frame and using the $ to specify the column therein.\n", "plot(amphi_stats$elevation, amphi_stats$capacity)" ] }, { "cell_type": "code", "execution_count": 25, "metadata": {}, "outputs": [ { "data": { "text/html": [ "png: 2" ], "text/latex": [ "\\textbf{png:} 2" ], "text/markdown": [ "**png:** 2" ], "text/plain": [ "png \n", " 2 " ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "# but that's ugly. let's make it pretty \n", "# and write the result to file.\n", "\n", "# Get the input values. \n", "input <- amphi_stats[,c('elevation','capacity')]\n", "\n", "# Give the chart file a name.\n", "png(file = \"R-scatterplot.png\")\n", "\n", "# Plot the chart for cars with weight between 2.5 to 5 and mileage between 15 and 30.\n", "plot(x = input$elevation,y = input$capacity,\n", " xlab = \"Elevation\",\n", " ylab = \"Capacity\", \n", " xlim = c(0,1000),\n", " ylim = c(500,60000),\n", " main = \"Elevation vs Capacity in Flavian Amphitheatres\"\n", ")\n", "\t \n", "# Save the file.\n", "dev.off()\n", "\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The scatterplot is now available if you right-click on 'jupyter' at the top of the window and open in a new browser window; you'll see it in the list of files." ] } ], "metadata": { "kernelspec": { "display_name": "R", "language": "R", "name": "ir" }, "language_info": { "codemirror_mode": "r", "file_extension": ".r", "mimetype": "text/x-r-source", "name": "R", "pygments_lexer": "r", "version": "3.4.4" } }, "nbformat": 4, "nbformat_minor": 2 }