{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Systems Immunogenetics Project\n", "\n", "## WNV Cleaning Steps\n", "\n", "### McWeeney Lab, Oregon Health & Science University\n", "\n", "**Authors: Gabrielle Choonoo (choonoo@ohsu.edu) and Michael Mooney (mooneymi@ohsu.edu)**\n", "\n", "## Introduction\n", "\n", "This is the step-by-step process for cleaning the WNV qPCR data (ByLine and ByMouse).\n", "\n", "Required Files:\n", "* qPCR ByLine and ByMouse Data Files\n", "* This notebook (`SIG_WNV_qPCR_Data_Cleaning.ipynb`): [[Download here]](https://raw.githubusercontent.com/biodev/SIG/master/SIG_WNV_qPCR_Data_Cleaning.ipynb)\n", "* The R script (`qpcr_data_cleaning_functions.r`): [[Download here]](https://raw.githubusercontent.com/biodev/SIG/master/scripts/qpcr_data_cleaning_functions.r)\n", "* The data dictionary containing all qPCR variables (`WNV_Data_Dictionary.xlsx`): [[Download here]](https://raw.githubusercontent.com/biodev/SIG/master/data/WNV_Data_Dictionary.xlsx)\n", "\n", "**Note:** this notebook can also be downloaded as an R script (only the code blocks seen below will be included): [[Download R script here]](https://raw.githubusercontent.com/biodev/SIG/master/SIG_WNV_qPCR_Data_Cleaning.r)\n", "\n", "Required R packages:\n", "- `gdata` - [https://cran.r-project.org/web/packages/gdata/index.html](https://cran.r-project.org/web/packages/gdata/index.html)\n", "\n", "**All code is available on GitHub: [https://github.com/biodev/SIG](https://github.com/biodev/SIG)**" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Step 1. 
Load Necessary R Packages and Functions" ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "collapsed": false }, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "gdata: read.xls support for 'XLS' (Excel 97-2004) files ENABLED.\n", "\n", "gdata: read.xls support for 'XLSX' (Excel 2007+) files ENABLED.\n", "\n", "Attaching package: ‘gdata’\n", "\n", "The following object is masked from ‘package:stats’:\n", "\n", " nobs\n", "\n", "The following object is masked from ‘package:utils’:\n", "\n", " object.size\n", "\n" ] } ], "source": [ "source('./scripts/qpcr_data_cleaning_functions.r')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Step 2. Read ByLine Data" ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "<table>\n", "<thead><tr><th></th><th scope=col>UW_Line</th><th scope=col>Mating</th><th scope=col>Timepoint</th><th scope=col>Condition</th><th scope=col>Tissue</th><th scope=col>Experiment</th><th scope=col>N</th><th scope=col>dCt.mean</th><th scope=col>dCt.sd</th><th scope=col>baseline.dCt</th><th scope=col>ddCt.mean</th><th scope=col>ddCt.sd</th><th scope=col>fc.mean</th><th scope=col>fc.sd</th></tr></thead>\n", "<tbody>\n", "\t<tr><th scope=row>1</th><td>4</td><td>16188x3252</td><td>12</td><td>B_d12</td><td>Brain</td><td>IFIT1</td><td>3</td><td>7.060444</td><td>4.44931</td><td>12.28855</td><td>-5.228111</td><td>4.44931</td><td>37.4816</td><td>4.44931</td></tr>\n", "\t<tr><th scope=row>2</th><td>4</td><td>16188x3252</td><td>12</td><td>B_d12</td><td>Brain</td><td>IFITM1</td><td>3</td><td>7.164444</td><td>1.29619</td><td>8.593667</td><td>-1.429223</td><td>1.29619</td><td>2.693016</td><td>1.29619</td></tr>\n", "\t<tr><th 
scope=row>3</th><td>4</td><td>16188x3252</td><td>12</td><td>B_d12</td><td>Brain</td><td>IFNb1</td><td>3</td><td>14.47589</td><td>5.691944</td><td>18.79678</td><td>-4.32089</td><td>5.691944</td><td>19.98561</td><td>5.691944</td></tr>\n", "\t<tr><th scope=row>4</th><td>4</td><td>16188x3252</td><td>12</td><td>B_d12</td><td>Brain</td><td>IL12b</td><td>3</td><td>13.98678</td><td>5.186199</td><td>18.61556</td><td>-4.628778</td><td>5.186199</td><td>24.74008</td><td>5.186199</td></tr>\n", "\t<tr><th scope=row>5</th><td>4</td><td>16188x3252</td><td>12</td><td>B_d12</td><td>Brain</td><td>WNV</td><td>3</td><td>12.963</td><td>7.274413</td><td>19.51256</td><td>-6.549556</td><td>7.274413</td><td>93.67266</td><td>7.274413</td></tr>\n", "\t<tr><th scope=row>6</th><td>4</td><td>16188x3252</td><td>12M</td><td>B_d12M</td><td>Brain</td><td>IFIT1</td><td>3</td><td>12.28855</td><td>0.7611464</td><td>12.28855</td><td>0</td><td>0.7611464</td><td>1</td><td>0.7611464</td></tr>\n", "</tbody>\n", "</table>\n" ], "text/latex": [ "\\begin{tabular}{r|llllllllllllll}\n", " & UW_Line & Mating & Timepoint & Condition & Tissue & Experiment & N & dCt.mean & dCt.sd & baseline.dCt & ddCt.mean & ddCt.sd & fc.mean & fc.sd\\\\\n", "\\hline\n", "\t1 & 4 & 16188x3252 & 12 & B_d12 & Brain & IFIT1 & 3 & 7.060444 & 4.44931 & 12.28855 & -5.228111 & 4.44931 & 37.4816 & 4.44931\\\\\n", "\t2 & 4 & 16188x3252 & 12 & B_d12 & Brain & IFITM1 & 3 & 7.164444 & 1.29619 & 8.593667 & -1.429223 & 1.29619 & 2.693016 & 1.29619\\\\\n", "\t3 & 4 & 16188x3252 & 12 & B_d12 & Brain & IFNb1 & 3 & 14.47589 & 5.691944 & 18.79678 & -4.32089 & 5.691944 & 19.98561 & 5.691944\\\\\n", "\t4 & 4 & 16188x3252 & 12 & B_d12 & Brain & IL12b & 3 & 13.98678 & 5.186199 & 18.61556 & -4.628778 & 5.186199 & 24.74008 & 5.186199\\\\\n", "\t5 & 4 & 16188x3252 & 12 & B_d12 & Brain & WNV & 3 & 12.963 & 7.274413 & 19.51256 & -6.549556 & 7.274413 & 93.67266 & 7.274413\\\\\n", "\t6 & 4 & 16188x3252 & 12M & B_d12M & Brain & IFIT1 & 3 & 12.28855 & 0.7611464 & 
12.28855 & 0 & 0.7611464 & 1 & 0.7611464\\\\\n", "\\end{tabular}\n" ], "text/plain": [ " UW_Line Mating Timepoint Condition Tissue Experiment N dCt.mean\n", "1 4 16188x3252 12 B_d12 Brain IFIT1 3 7.060444\n", "2 4 16188x3252 12 B_d12 Brain IFITM1 3 7.164444\n", "3 4 16188x3252 12 B_d12 Brain IFNb1 3 14.475888\n", "4 4 16188x3252 12 B_d12 Brain IL12b 3 13.986777\n", "5 4 16188x3252 12 B_d12 Brain WNV 3 12.963000\n", "6 4 16188x3252 12M B_d12M Brain IFIT1 3 12.288555\n", " dCt.sd baseline.dCt ddCt.mean ddCt.sd fc.mean fc.sd\n", "1 4.4493099 12.288555 -5.228111 4.4493099 37.481600 4.4493099\n", "2 1.2961898 8.593667 -1.429223 1.2961898 2.693016 1.2961898\n", "3 5.6919441 18.796778 -4.320890 5.6919441 19.985608 5.6919441\n", "4 5.1861991 18.615555 -4.628778 5.1861991 24.740076 5.1861991\n", "5 7.2744126 19.512556 -6.549556 7.2744126 93.672658 7.2744126\n", "6 0.7611464 12.288555 0.000000 0.7611464 1.000000 0.7611464" ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" }, { "data": { "text/html": [ "<ol class=list-inline>\n", "\t<li>1430</li>\n", "\t<li>14</li>\n", "</ol>\n" ], "text/latex": [ "\\begin{enumerate*}\n", "\\item 1430\n", "\\item 14\n", "\\end{enumerate*}\n" ], "text/markdown": [ "1. 1430\n", "2. 14\n", "\n", "\n" ], "text/plain": [ "[1] 1430 14" ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ "## Set data directory\n", "data_dir = \"/Users/mooneymi/Documents/SIG/WNV/qPCR\"\n", "\n", "## Read in data (byLine)\n", "qpcr_data = read.xls(file.path(data_dir, \"16-May-2016/Gale_qPCR_byLine_5-16-16 %282%29.xlsx\"), sheet=1)\n", "\n", "head(qpcr_data)\n", "dim(qpcr_data)" ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "<ol class=list-inline>\n", "\t<li>1430</li>\n", "\t<li>14</li>\n", "</ol>\n" ], "text/latex": [ "\\begin{enumerate*}\n", "\\item 1430\n", "\\item 14\n", "\\end{enumerate*}\n" ], "text/markdown": [ "1. 
1430\n", "2. 14\n", "\n", "\n" ], "text/plain": [ "[1] 1430 14" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "## Replace line 82 with fixed data (special case for May 16, 2016 data)\n", "qpcr_data = qpcr_data[qpcr_data$UW_Line != 82, ]\n", "\n", "line_82 = read.xls(file.path(data_dir, \"18-May-2016/Gale_qPCR_byLine_5-18-16.xlsx\"), sheet=1)\n", "qpcr_data = rbind(qpcr_data, line_82)\n", "\n", "dim(qpcr_data)" ] }, { "cell_type": "code", "execution_count": 4, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "<ol class=list-inline>\n", "\t<li>1560</li>\n", "\t<li>14</li>\n", "</ol>\n" ], "text/latex": [ "\\begin{enumerate*}\n", "\\item 1560\n", "\\item 14\n", "\\end{enumerate*}\n" ], "text/markdown": [ "1. 1560\n", "2. 14\n", "\n", "\n" ], "text/plain": [ "[1] 1560 14" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "## Replace lines 54 and 58 with fixed data (special case for May 16, 2016 data)\n", "qpcr_data = qpcr_data[!qpcr_data$UW_Line %in% c(54, 58), ]\n", "\n", "lines_54_58 = read.xls(file.path(data_dir, \"23-May-2016/Gale_qPCR_byLine_5-23-16.xlsx\"), sheet=1)\n", "qpcr_data = rbind(qpcr_data, lines_54_58)\n", "\n", "dim(qpcr_data)" ] }, { "cell_type": "code", "execution_count": 5, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "0" ], "text/latex": [ "0" ], "text/markdown": [ "0" ], "text/plain": [ "[1] 0" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "## Check for duplicates\n", "sum(duplicated(qpcr_data))" ] }, { "cell_type": "code", "execution_count": 6, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "<dl class=dl-horizontal>\n", "\t<dt>4</dt>\n", "\t\t<dd>16188x3252</dd>\n", "\t<dt>42</dt>\n", "\t\t<dd>8008x8016</dd>\n", "\t<dt>45</dt>\n", "\t\t<dd>16441x8024</dd>\n", "\t<dt>46</dt>\n", "\t\t<dd>8048x15155</dd>\n", "\t<dt>48</dt>\n", 
"\t\t<dd>13140x16680</dd>\n", "\t<dt>61</dt>\n", "\t\t<dd>8056x8033</dd>\n", "\t<dt>62</dt>\n", "\t\t<dd>8054x8036</dd>\n", "\t<dt>70</dt>\n", "\t\t<dd>8045x4410</dd>\n", "\t<dt>71</dt>\n", "\t\t<dd>3564x8027</dd>\n", "\t<dt>72</dt>\n", "\t\t<dd>5035x16785</dd>\n", "\t<dt>73</dt>\n", "\t\t<dd>5358x8046</dd>\n", "\t<dt>74</dt>\n", "\t\t<dd>8046x8004</dd>\n", "\t<dt>75</dt>\n", "\t\t<dd>8016x8004</dd>\n", "\t<dt>76</dt>\n", "\t\t<dd>8024x8048</dd>\n", "\t<dt>77</dt>\n", "\t\t<dd>8034x8043</dd>\n", "\t<dt>78</dt>\n", "\t\t<dd>13421x16034</dd>\n", "\t<dt>79</dt>\n", "\t\t<dd>16034x13067</dd>\n", "\t<dt>80</dt>\n", "\t\t<dd>16521x3260</dd>\n", "\t<dt>81</dt>\n", "\t\t<dd>16072x5346</dd>\n", "\t<dt>113</dt>\n", "\t\t<dd>8027x477</dd>\n", "\t<dt>82</dt>\n", "\t\t<dd>16557x3154</dd>\n", "\t<dt>54</dt>\n", "\t\t<dd>8036x18018</dd>\n", "\t<dt>58</dt>\n", "\t\t<dd>5346x16768</dd>\n", "</dl>\n" ], "text/latex": [ "\\begin{description*}\n", "\\item[4] 16188x3252\n", "\\item[42] 8008x8016\n", "\\item[45] 16441x8024\n", "\\item[46] 8048x15155\n", "\\item[48] 13140x16680\n", "\\item[61] 8056x8033\n", "\\item[62] 8054x8036\n", "\\item[70] 8045x4410\n", "\\item[71] 3564x8027\n", "\\item[72] 5035x16785\n", "\\item[73] 5358x8046\n", "\\item[74] 8046x8004\n", "\\item[75] 8016x8004\n", "\\item[76] 8024x8048\n", "\\item[77] 8034x8043\n", "\\item[78] 13421x16034\n", "\\item[79] 16034x13067\n", "\\item[80] 16521x3260\n", "\\item[81] 16072x5346\n", "\\item[113] 8027x477\n", "\\item[82] 16557x3154\n", "\\item[54] 8036x18018\n", "\\item[58] 5346x16768\n", "\\end{description*}\n" ], "text/markdown": [ "4\n", ": 16188x325242\n", ": 8008x801645\n", ": 16441x802446\n", ": 8048x1515548\n", ": 13140x1668061\n", ": 8056x803362\n", ": 8054x803670\n", ": 8045x441071\n", ": 3564x802772\n", ": 5035x1678573\n", ": 5358x804674\n", ": 8046x800475\n", ": 8016x800476\n", ": 8024x804877\n", ": 8034x804378\n", ": 13421x1603479\n", ": 16034x1306780\n", ": 16521x326081\n", ": 16072x5346113\n", ": 8027x47782\n", 
": 16557x315454\n", ": 8036x1801858\n", ": 5346x16768\n", "\n" ], "text/plain": [ " 4 42 45 46 48 61 \n", " 16188x3252 8008x8016 16441x8024 8048x15155 13140x16680 8056x8033 \n", " 62 70 71 72 73 74 \n", " 8054x8036 8045x4410 3564x8027 5035x16785 5358x8046 8046x8004 \n", " 75 76 77 78 79 80 \n", " 8016x8004 8024x8048 8034x8043 13421x16034 16034x13067 16521x3260 \n", " 81 113 82 54 58 \n", " 16072x5346 8027x477 16557x3154 8036x18018 5346x16768 \n", "23 Levels: 13140x16680 13421x16034 16034x13067 16072x5346 ... 8056x8033" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "## Check that each line has a single mating\n", "line_matings = sapply(unique(qpcr_data$UW_Line), function(x){unique(qpcr_data$Mating[qpcr_data$UW_Line==x])})\n", "names(line_matings) = unique(qpcr_data$UW_Line)\n", "line_matings" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Step 3. Format and Clean ByLine Data" ] }, { "cell_type": "code", "execution_count": 7, "metadata": { "collapsed": true }, "outputs": [], "source": [ "## Add Data_Altered and Notes columns\n", "qpcr_data$Data_Altered = NA\n", "qpcr_data$Notes = NA" ] }, { "cell_type": "code", "execution_count": 8, "metadata": { "collapsed": true }, "outputs": [], "source": [ "## Annotate Virus\n", "qpcr_data$Virus = NA\n", "qpcr_data[grep(\"M\",qpcr_data[,\"Timepoint\"]),\"Virus\"] <- \"Mock\"\n", "qpcr_data[-grep(\"M\",qpcr_data[,\"Timepoint\"]),\"Virus\"] <- \"WNV\"" ] }, { "cell_type": "code", "execution_count": 9, "metadata": { "collapsed": false }, "outputs": [], "source": [ "## Remove 'M' from time points and convert to numeric\n", "qpcr_data[,\"Timepoint\"] <- as.numeric(as.character(gsub(\"M\",\"\",qpcr_data[,\"Timepoint\"])))" ] }, { "cell_type": "code", "execution_count": 10, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "TRUE" ], "text/latex": [ "TRUE" ], "text/markdown": [ "TRUE" ], "text/plain": [ "[1] TRUE" ] }, "execution_count": 10, 
"metadata": {}, "output_type": "execute_result" }, { "data": { "text/html": [ "<dl class=dl-horizontal>\n", "\t<dt>IFIT1</dt>\n", "\t\t<dd>312</dd>\n", "\t<dt>IFITM1</dt>\n", "\t\t<dd>312</dd>\n", "\t<dt>IFNb1</dt>\n", "\t\t<dd>312</dd>\n", "\t<dt>IL12b</dt>\n", "\t\t<dd>312</dd>\n", "\t<dt>WNV</dt>\n", "\t\t<dd>312</dd>\n", "</dl>\n" ], "text/latex": [ "\\begin{description*}\n", "\\item[IFIT1] 312\n", "\\item[IFITM1] 312\n", "\\item[IFNb1] 312\n", "\\item[IL12b] 312\n", "\\item[WNV] 312\n", "\\end{description*}\n" ], "text/markdown": [ "IFIT1\n", ": 312IFITM1\n", ": 312IFNb1\n", ": 312IL12b\n", ": 312WNV\n", ": 312\n", "\n" ], "text/plain": [ " IFIT1 IFITM1 IFNb1 IL12b WNV \n", " 312 312 312 312 312 " ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "## Check experiment names\n", "sum(names(summary(qpcr_data[,\"Experiment\"])) == c(\"IFIT1\",\"IFITM1\", \"IFNb1\", \"IL12b\", \"WNV\")) == 5\n", "summary(qpcr_data[,\"Experiment\"])" ] }, { "cell_type": "code", "execution_count": 11, "metadata": { "collapsed": true }, "outputs": [], "source": [ "## Add Group column: UW Line, Timepoint, Virus, Tissue, Experiment separated by \"_\"\n", "qpcr_data$Group = paste(qpcr_data$UW_Line, qpcr_data$Timepoint, qpcr_data$Virus, qpcr_data$Tissue, qpcr_data$Experiment, sep=\"_\")" ] }, { "cell_type": "code", "execution_count": 12, "metadata": { "collapsed": true }, "outputs": [], "source": [ "## Add Lab column\n", "qpcr_data$Lab = \"Gale\"" ] }, { "cell_type": "code", "execution_count": 13, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "<table>\n", "<thead><tr><th></th><th scope=col>UW_Line</th><th scope=col>Mating</th><th scope=col>Timepoint</th><th scope=col>Condition</th><th scope=col>Tissue</th><th scope=col>Experiment</th><th scope=col>N</th><th scope=col>dCt.mean</th><th scope=col>dCt.sd</th><th scope=col>baseline.dCt</th><th scope=col>ddCt.mean</th><th scope=col>ddCt.sd</th><th scope=col>fc.mean</th><th 
scope=col>fc.sd</th><th scope=col>Data_Altered</th><th scope=col>Notes</th><th scope=col>Virus</th><th scope=col>Group</th><th scope=col>Lab</th></tr></thead>\n", "<tbody>\n", "\t<tr><th scope=row>1</th><td>4</td><td>16188x3252</td><td>12</td><td>B_d12</td><td>Brain</td><td>IFIT1</td><td>3</td><td>7.060444</td><td>4.44931</td><td>12.28855</td><td>-5.228111</td><td>4.44931</td><td>37.4816</td><td>4.44931</td><td>NA</td><td>NA</td><td>WNV</td><td>4_12_WNV_Brain_IFIT1</td><td>Gale</td></tr>\n", "\t<tr><th scope=row>2</th><td>4</td><td>16188x3252</td><td>12</td><td>B_d12</td><td>Brain</td><td>IFITM1</td><td>3</td><td>7.164444</td><td>1.29619</td><td>8.593667</td><td>-1.429223</td><td>1.29619</td><td>2.693016</td><td>1.29619</td><td>NA</td><td>NA</td><td>WNV</td><td>4_12_WNV_Brain_IFITM1</td><td>Gale</td></tr>\n", "\t<tr><th scope=row>3</th><td>4</td><td>16188x3252</td><td>12</td><td>B_d12</td><td>Brain</td><td>IFNb1</td><td>3</td><td>14.47589</td><td>5.691944</td><td>18.79678</td><td>-4.32089</td><td>5.691944</td><td>19.98561</td><td>5.691944</td><td>NA</td><td>NA</td><td>WNV</td><td>4_12_WNV_Brain_IFNb1</td><td>Gale</td></tr>\n", "\t<tr><th scope=row>4</th><td>4</td><td>16188x3252</td><td>12</td><td>B_d12</td><td>Brain</td><td>IL12b</td><td>3</td><td>13.98678</td><td>5.186199</td><td>18.61556</td><td>-4.628778</td><td>5.186199</td><td>24.74008</td><td>5.186199</td><td>NA</td><td>NA</td><td>WNV</td><td>4_12_WNV_Brain_IL12b</td><td>Gale</td></tr>\n", "\t<tr><th scope=row>5</th><td>4</td><td>16188x3252</td><td>12</td><td>B_d12</td><td>Brain</td><td>WNV</td><td>3</td><td>12.963</td><td>7.274413</td><td>19.51256</td><td>-6.549556</td><td>7.274413</td><td>93.67266</td><td>7.274413</td><td>NA</td><td>NA</td><td>WNV</td><td>4_12_WNV_Brain_WNV</td><td>Gale</td></tr>\n", "\t<tr><th 
scope=row>6</th><td>4</td><td>16188x3252</td><td>12</td><td>B_d12M</td><td>Brain</td><td>IFIT1</td><td>3</td><td>12.28855</td><td>0.7611464</td><td>12.28855</td><td>0</td><td>0.7611464</td><td>1</td><td>0.7611464</td><td>NA</td><td>NA</td><td>Mock</td><td>4_12_Mock_Brain_IFIT1</td><td>Gale</td></tr>\n", "</tbody>\n", "</table>\n" ], "text/latex": [ "\\begin{tabular}{r|lllllllllllllllllll}\n", " & UW_Line & Mating & Timepoint & Condition & Tissue & Experiment & N & dCt.mean & dCt.sd & baseline.dCt & ddCt.mean & ddCt.sd & fc.mean & fc.sd & Data_Altered & Notes & Virus & Group & Lab\\\\\n", "\\hline\n", "\t1 & 4 & 16188x3252 & 12 & B_d12 & Brain & IFIT1 & 3 & 7.060444 & 4.44931 & 12.28855 & -5.228111 & 4.44931 & 37.4816 & 4.44931 & NA & NA & WNV & 4_12_WNV_Brain_IFIT1 & Gale\\\\\n", "\t2 & 4 & 16188x3252 & 12 & B_d12 & Brain & IFITM1 & 3 & 7.164444 & 1.29619 & 8.593667 & -1.429223 & 1.29619 & 2.693016 & 1.29619 & NA & NA & WNV & 4_12_WNV_Brain_IFITM1 & Gale\\\\\n", "\t3 & 4 & 16188x3252 & 12 & B_d12 & Brain & IFNb1 & 3 & 14.47589 & 5.691944 & 18.79678 & -4.32089 & 5.691944 & 19.98561 & 5.691944 & NA & NA & WNV & 4_12_WNV_Brain_IFNb1 & Gale\\\\\n", "\t4 & 4 & 16188x3252 & 12 & B_d12 & Brain & IL12b & 3 & 13.98678 & 5.186199 & 18.61556 & -4.628778 & 5.186199 & 24.74008 & 5.186199 & NA & NA & WNV & 4_12_WNV_Brain_IL12b & Gale\\\\\n", "\t5 & 4 & 16188x3252 & 12 & B_d12 & Brain & WNV & 3 & 12.963 & 7.274413 & 19.51256 & -6.549556 & 7.274413 & 93.67266 & 7.274413 & NA & NA & WNV & 4_12_WNV_Brain_WNV & Gale\\\\\n", "\t6 & 4 & 16188x3252 & 12 & B_d12M & Brain & IFIT1 & 3 & 12.28855 & 0.7611464 & 12.28855 & 0 & 0.7611464 & 1 & 0.7611464 & NA & NA & Mock & 4_12_Mock_Brain_IFIT1 & Gale\\\\\n", "\\end{tabular}\n" ], "text/plain": [ " UW_Line Mating Timepoint Condition Tissue Experiment N dCt.mean\n", "1 4 16188x3252 12 B_d12 Brain IFIT1 3 7.060444\n", "2 4 16188x3252 12 B_d12 Brain IFITM1 3 7.164444\n", "3 4 16188x3252 12 B_d12 Brain IFNb1 3 14.475888\n", "4 4 16188x3252 12 B_d12 
Brain IL12b 3 13.986777\n", "5 4 16188x3252 12 B_d12 Brain WNV 3 12.963000\n", "6 4 16188x3252 12 B_d12M Brain IFIT1 3 12.288555\n", " dCt.sd baseline.dCt ddCt.mean ddCt.sd fc.mean fc.sd Data_Altered\n", "1 4.4493099 12.288555 -5.228111 4.4493099 37.481600 4.4493099 NA\n", "2 1.2961898 8.593667 -1.429223 1.2961898 2.693016 1.2961898 NA\n", "3 5.6919441 18.796778 -4.320890 5.6919441 19.985608 5.6919441 NA\n", "4 5.1861991 18.615555 -4.628778 5.1861991 24.740076 5.1861991 NA\n", "5 7.2744126 19.512556 -6.549556 7.2744126 93.672658 7.2744126 NA\n", "6 0.7611464 12.288555 0.000000 0.7611464 1.000000 0.7611464 NA\n", " Notes Virus Group Lab\n", "1 NA WNV 4_12_WNV_Brain_IFIT1 Gale\n", "2 NA WNV 4_12_WNV_Brain_IFITM1 Gale\n", "3 NA WNV 4_12_WNV_Brain_IFNb1 Gale\n", "4 NA WNV 4_12_WNV_Brain_IL12b Gale\n", "5 NA WNV 4_12_WNV_Brain_WNV Gale\n", "6 NA Mock 4_12_Mock_Brain_IFIT1 Gale" ] }, "execution_count": 13, "metadata": {}, "output_type": "execute_result" } ], "source": [ "head(qpcr_data)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Step 4. 
Read ByMouse Data" ] }, { "cell_type": "code", "execution_count": 14, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "<table>\n", "<thead><tr><th></th><th scope=col>ID</th><th scope=col>Mating</th><th scope=col>RIX_ID</th><th scope=col>UW_Line</th><th scope=col>UWID</th><th scope=col>Sample_Name</th><th scope=col>Experiment</th><th scope=col>Tissue</th><th scope=col>Condition</th><th scope=col>Timepoint</th><th scope=col>Ct</th><th scope=col>Ct.sd</th><th scope=col>ref.Ct</th><th scope=col>ref.sd</th><th scope=col>dCt</th><th scope=col>dCt.linear</th><th scope=col>dCt.sd</th></tr></thead>\n", "<tbody>\n", "\t<tr><th scope=row>1</th><td>16188x3252_248</td><td>16188x3252</td><td>248</td><td>4</td><td>1.12</td><td>S 1.12</td><td>IFIT1</td><td>Spleen</td><td>WNV</td><td>12</td><td>30.31133</td><td>0.1537316</td><td>20.94267</td><td>0.08401435</td><td>9.368667</td><td>0.001512691</td><td>0.1751908</td></tr>\n", "\t<tr><th scope=row>2</th><td>16188x3252_248</td><td>16188x3252</td><td>248</td><td>4</td><td>1.12</td><td>B 1.12</td><td>IFITM1</td><td>Brain</td><td>WNV</td><td>12</td><td>26.49567</td><td>0.1571378</td><td>20.519</td><td>0.076237</td><td>5.976665</td><td>0.01587978</td><td>0.174655</td></tr>\n", "\t<tr><th scope=row>3</th><td>16188x3252_248</td><td>16188x3252</td><td>248</td><td>4</td><td>1.12</td><td>K 1.12</td><td>IFIT1</td><td>Kidney</td><td>WNV</td><td>12</td><td>29.47433</td><td>0.09113332</td><td>19.27533</td><td>0.04600315</td><td>10.199</td><td>0.0008507361</td><td>0.1020861</td></tr>\n", "\t<tr><th scope=row>4</th><td>16188x3252_248</td><td>16188x3252</td><td>248</td><td>4</td><td>1.12</td><td>K 1.12</td><td>IL12b</td><td>Kidney</td><td>WNV</td><td>12</td><td>40</td><td>0</td><td>19.27533</td><td>0.04600315</td><td>20.72467</td><td>5.77e-07</td><td>0.04600315</td></tr>\n", "\t<tr><th scope=row>5</th><td>16188x3252_248</td><td>16188x3252</td><td>248</td><td>4</td><td>1.12</td><td>B 
1.12</td><td>IFNb1</td><td>Brain</td><td>WNV</td><td>12</td><td>29.12167</td><td>0.1936756</td><td>20.519</td><td>0.076237</td><td>8.602666</td><td>0.002572405</td><td>0.2081402</td></tr>\n", "\t<tr><th scope=row>6</th><td>16188x3252_248</td><td>16188x3252</td><td>248</td><td>4</td><td>1.12</td><td>B 1.12</td><td>IFIT1</td><td>Brain</td><td>WNV</td><td>12</td><td>23.525</td><td>0.02586539</td><td>20.519</td><td>0.076237</td><td>3.006</td><td>0.1244812</td><td>0.08050527</td></tr>\n", "</tbody>\n", "</table>\n" ], "text/latex": [ "\\begin{tabular}{r|lllllllllllllllll}\n", " & ID & Mating & RIX_ID & UW_Line & UWID & Sample_Name & Experiment & Tissue & Condition & Timepoint & Ct & Ct.sd & ref.Ct & ref.sd & dCt & dCt.linear & dCt.sd\\\\\n", "\\hline\n", "\t1 & 16188x3252_248 & 16188x3252 & 248 & 4 & 1.12 & S 1.12 & IFIT1 & Spleen & WNV & 12 & 30.31133 & 0.1537316 & 20.94267 & 0.08401435 & 9.368667 & 0.001512691 & 0.1751908\\\\\n", "\t2 & 16188x3252_248 & 16188x3252 & 248 & 4 & 1.12 & B 1.12 & IFITM1 & Brain & WNV & 12 & 26.49567 & 0.1571378 & 20.519 & 0.076237 & 5.976665 & 0.01587978 & 0.174655\\\\\n", "\t3 & 16188x3252_248 & 16188x3252 & 248 & 4 & 1.12 & K 1.12 & IFIT1 & Kidney & WNV & 12 & 29.47433 & 0.09113332 & 19.27533 & 0.04600315 & 10.199 & 0.0008507361 & 0.1020861\\\\\n", "\t4 & 16188x3252_248 & 16188x3252 & 248 & 4 & 1.12 & K 1.12 & IL12b & Kidney & WNV & 12 & 40 & 0 & 19.27533 & 0.04600315 & 20.72467 & 5.77e-07 & 0.04600315\\\\\n", "\t5 & 16188x3252_248 & 16188x3252 & 248 & 4 & 1.12 & B 1.12 & IFNb1 & Brain & WNV & 12 & 29.12167 & 0.1936756 & 20.519 & 0.076237 & 8.602666 & 0.002572405 & 0.2081402\\\\\n", "\t6 & 16188x3252_248 & 16188x3252 & 248 & 4 & 1.12 & B 1.12 & IFIT1 & Brain & WNV & 12 & 23.525 & 0.02586539 & 20.519 & 0.076237 & 3.006 & 0.1244812 & 0.08050527\\\\\n", "\\end{tabular}\n" ], "text/plain": [ " ID Mating RIX_ID UW_Line UWID Sample_Name Experiment Tissue\n", "1 16188x3252_248 16188x3252 248 4 1.12 S 1.12 IFIT1 Spleen\n", "2 16188x3252_248 
16188x3252 248 4 1.12 B 1.12 IFITM1 Brain\n", "3 16188x3252_248 16188x3252 248 4 1.12 K 1.12 IFIT1 Kidney\n", "4 16188x3252_248 16188x3252 248 4 1.12 K 1.12 IL12b Kidney\n", "5 16188x3252_248 16188x3252 248 4 1.12 B 1.12 IFNb1 Brain\n", "6 16188x3252_248 16188x3252 248 4 1.12 B 1.12 IFIT1 Brain\n", " Condition Timepoint Ct Ct.sd ref.Ct ref.sd dCt\n", "1 WNV 12 30.31133 0.15373160 20.94267 0.08401435 9.368667\n", "2 WNV 12 26.49567 0.15713780 20.51900 0.07623700 5.976665\n", "3 WNV 12 29.47433 0.09113332 19.27533 0.04600315 10.199001\n", "4 WNV 12 40.00000 0.00000000 19.27533 0.04600315 20.724667\n", "5 WNV 12 29.12167 0.19367563 20.51900 0.07623700 8.602666\n", "6 WNV 12 23.52500 0.02586539 20.51900 0.07623700 3.006000\n", " dCt.linear dCt.sd\n", "1 0.0015126910 0.17519080\n", "2 0.0158797774 0.17465500\n", "3 0.0008507361 0.10208610\n", "4 0.0000005770 0.04600315\n", "5 0.0025724055 0.20814017\n", "6 0.1244812292 0.08050527" ] }, "execution_count": 14, "metadata": {}, "output_type": "execute_result" }, { "data": { "text/html": [ "<ol class=list-inline>\n", "\t<li>4210</li>\n", "\t<li>17</li>\n", "</ol>\n" ], "text/latex": [ "\\begin{enumerate*}\n", "\\item 4210\n", "\\item 17\n", "\\end{enumerate*}\n" ], "text/markdown": [ "1. 4210\n", "2. 17\n", "\n", "\n" ], "text/plain": [ "[1] 4210 17" ] }, "execution_count": 14, "metadata": {}, "output_type": "execute_result" } ], "source": [ "## Read in data (byMouse)\n", "qpcr_data_mouse = read.xls(file.path(data_dir, \"16-May-2016/Gale_qPCR_byMouse_5-16-16 %281%29.xlsx\"), sheet=1)\n", "\n", "head(qpcr_data_mouse)\n", "dim(qpcr_data_mouse)" ] }, { "cell_type": "code", "execution_count": 15, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "<ol class=list-inline>\n", "\t<li>4215</li>\n", "\t<li>17</li>\n", "</ol>\n" ], "text/latex": [ "\\begin{enumerate*}\n", "\\item 4215\n", "\\item 17\n", "\\end{enumerate*}\n" ], "text/markdown": [ "1. 4215\n", "2. 
17\n", "\n", "\n" ], "text/plain": [ "[1] 4215 17" ] }, "execution_count": 15, "metadata": {}, "output_type": "execute_result" } ], "source": [ "## Replace line 82 with fixed data (special case for May 16, 2016 data)\n", "qpcr_data_mouse = qpcr_data_mouse[qpcr_data_mouse$UW_Line != 82, ]\n", "\n", "line_82_mouse = read.xls(file.path(data_dir, \"18-May-2016/Gale_qPCR_byMouse_5-18-16.xlsx\"), sheet=1)\n", "qpcr_data_mouse = rbind(qpcr_data_mouse, line_82_mouse)\n", "\n", "dim(qpcr_data_mouse)" ] }, { "cell_type": "code", "execution_count": 16, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "<ol class=list-inline>\n", "\t<li>4600</li>\n", "\t<li>17</li>\n", "</ol>\n" ], "text/latex": [ "\\begin{enumerate*}\n", "\\item 4600\n", "\\item 17\n", "\\end{enumerate*}\n" ], "text/markdown": [ "1. 4600\n", "2. 17\n", "\n", "\n" ], "text/plain": [ "[1] 4600 17" ] }, "execution_count": 16, "metadata": {}, "output_type": "execute_result" } ], "source": [ "## Replace lines 54 and 58 with fixed data (special case for May 16, 2016 data)\n", "qpcr_data_mouse = qpcr_data_mouse[!qpcr_data_mouse$UW_Line %in% c(54, 58), ]\n", "\n", "lines_54_58_mouse = read.xls(file.path(data_dir, \"23-May-2016/Gale_qPCR_byMouse_5-23-16.xlsx\"), sheet=1)\n", "qpcr_data_mouse = rbind(qpcr_data_mouse, lines_54_58_mouse)\n", "\n", "dim(qpcr_data_mouse)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Step 5. 
Format and Clean ByMouse Data" ] }, { "cell_type": "code", "execution_count": 17, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "0" ], "text/latex": [ "0" ], "text/markdown": [ "0" ], "text/plain": [ "[1] 0" ] }, "execution_count": 17, "metadata": {}, "output_type": "execute_result" } ], "source": [ "## Check if any Ct < 15\n", "length(which(qpcr_data_mouse[,\"Ct\"] < 15))" ] }, { "cell_type": "code", "execution_count": 18, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "0" ], "text/latex": [ "0" ], "text/markdown": [ "0" ], "text/plain": [ "[1] 0" ] }, "execution_count": 18, "metadata": {}, "output_type": "execute_result" } ], "source": [ "## Check if any reference Ct < 15\n", "length(which(qpcr_data_mouse[,\"ref.Ct\"] < 15))" ] }, { "cell_type": "code", "execution_count": 19, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "0" ], "text/latex": [ "0" ], "text/markdown": [ "0" ], "text/plain": [ "[1] 0" ] }, "execution_count": 19, "metadata": {}, "output_type": "execute_result" } ], "source": [ "## Check if any reference Ct == 40\n", "length(which(qpcr_data_mouse[,\"ref.Ct\"] == 40))" ] }, { "cell_type": "code", "execution_count": 20, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "<dl class=dl-horizontal>\n", "\t<dt>4</dt>\n", "\t\t<dd>16188x3252</dd>\n", "\t<dt>42</dt>\n", "\t\t<dd>8008x8016</dd>\n", "\t<dt>45</dt>\n", "\t\t<dd>16441x8024</dd>\n", "\t<dt>46</dt>\n", "\t\t<dd>8048x15155</dd>\n", "\t<dt>48</dt>\n", "\t\t<dd>13140x16680</dd>\n", "\t<dt>61</dt>\n", "\t\t<dd>8056x8033</dd>\n", "\t<dt>62</dt>\n", "\t\t<dd>8054x8036</dd>\n", "\t<dt>70</dt>\n", "\t\t<dd>8045x4410</dd>\n", "\t<dt>71</dt>\n", "\t\t<dd>3564x8027</dd>\n", "\t<dt>72</dt>\n", "\t\t<dd>5035x16785</dd>\n", "\t<dt>73</dt>\n", "\t\t<dd>5358x8046</dd>\n", "\t<dt>74</dt>\n", "\t\t<dd>8046x8004</dd>\n", "\t<dt>75</dt>\n", "\t\t<dd>8016x8004</dd>\n", "\t<dt>76</dt>\n", 
"\t\t<dd>8024x8048</dd>\n", "\t<dt>77</dt>\n", "\t\t<dd>8034x8043</dd>\n", "\t<dt>78</dt>\n", "\t\t<dd>13421x16034</dd>\n", "\t<dt>79</dt>\n", "\t\t<dd>16034x13067</dd>\n", "\t<dt>80</dt>\n", "\t\t<dd>16521x3260</dd>\n", "\t<dt>81</dt>\n", "\t\t<dd>16072x5346</dd>\n", "\t<dt>113</dt>\n", "\t\t<dd>8027x477</dd>\n", "\t<dt>82</dt>\n", "\t\t<dd>16557x3154</dd>\n", "\t<dt>54</dt>\n", "\t\t<dd>8036x18018</dd>\n", "\t<dt>58</dt>\n", "\t\t<dd>5346x16768</dd>\n", "</dl>\n" ], "text/latex": [ "\\begin{description*}\n", "\\item[4] 16188x3252\n", "\\item[42] 8008x8016\n", "\\item[45] 16441x8024\n", "\\item[46] 8048x15155\n", "\\item[48] 13140x16680\n", "\\item[61] 8056x8033\n", "\\item[62] 8054x8036\n", "\\item[70] 8045x4410\n", "\\item[71] 3564x8027\n", "\\item[72] 5035x16785\n", "\\item[73] 5358x8046\n", "\\item[74] 8046x8004\n", "\\item[75] 8016x8004\n", "\\item[76] 8024x8048\n", "\\item[77] 8034x8043\n", "\\item[78] 13421x16034\n", "\\item[79] 16034x13067\n", "\\item[80] 16521x3260\n", "\\item[81] 16072x5346\n", "\\item[113] 8027x477\n", "\\item[82] 16557x3154\n", "\\item[54] 8036x18018\n", "\\item[58] 5346x16768\n", "\\end{description*}\n" ], "text/markdown": [ "4\n", ": 16188x325242\n", ": 8008x801645\n", ": 16441x802446\n", ": 8048x1515548\n", ": 13140x1668061\n", ": 8056x803362\n", ": 8054x803670\n", ": 8045x441071\n", ": 3564x802772\n", ": 5035x1678573\n", ": 5358x804674\n", ": 8046x800475\n", ": 8016x800476\n", ": 8024x804877\n", ": 8034x804378\n", ": 13421x1603479\n", ": 16034x1306780\n", ": 16521x326081\n", ": 16072x5346113\n", ": 8027x47782\n", ": 16557x315454\n", ": 8036x1801858\n", ": 5346x16768\n", "\n" ], "text/plain": [ " 4 42 45 46 48 61 \n", " 16188x3252 8008x8016 16441x8024 8048x15155 13140x16680 8056x8033 \n", " 62 70 71 72 73 74 \n", " 8054x8036 8045x4410 3564x8027 5035x16785 5358x8046 8046x8004 \n", " 75 76 77 78 79 80 \n", " 8016x8004 8024x8048 8034x8043 13421x16034 16034x13067 16521x3260 \n", " 81 113 82 54 58 \n", " 16072x5346 8027x477 16557x3154 
8036x18018 5346x16768 \n", "23 Levels: 13140x16680 13421x16034 16034x13067 16072x5346 ... 8056x8033" ] }, "execution_count": 20, "metadata": {}, "output_type": "execute_result" } ], "source": [ "## Check that each line has a single mating\n", "mouse_line_matings = sapply(unique(qpcr_data_mouse$UW_Line), function(x){unique(qpcr_data_mouse$Mating[qpcr_data_mouse$UW_Line==x])})\n", "names(mouse_line_matings) = unique(qpcr_data_mouse$UW_Line)\n", "mouse_line_matings" ] }, { "cell_type": "code", "execution_count": 21, "metadata": { "collapsed": true }, "outputs": [], "source": [ "## Remove 'M' from time points and convert to numeric\n", "qpcr_data_mouse[,\"Timepoint\"] <- as.numeric(as.character(gsub(\"M\",\"\",qpcr_data_mouse[,\"Timepoint\"])))" ] }, { "cell_type": "code", "execution_count": 22, "metadata": { "collapsed": false }, "outputs": [], "source": [ "## Change Condition column name to Virus\n", "names(qpcr_data_mouse)[which(names(qpcr_data_mouse) == \"Condition\")] <- \"Virus\"" ] }, { "cell_type": "code", "execution_count": 23, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "TRUE" ], "text/latex": [ "TRUE" ], "text/markdown": [ "TRUE" ], "text/plain": [ "[1] TRUE" ] }, "execution_count": 23, "metadata": {}, "output_type": "execute_result" }, { "data": { "text/html": [ "<dl class=dl-horizontal>\n", "\t<dt>IFIT1</dt>\n", "\t\t<dd>920</dd>\n", "\t<dt>IFITM1</dt>\n", "\t\t<dd>920</dd>\n", "\t<dt>IFNb1</dt>\n", "\t\t<dd>920</dd>\n", "\t<dt>IL12b</dt>\n", "\t\t<dd>920</dd>\n", "\t<dt>WNV</dt>\n", "\t\t<dd>920</dd>\n", "</dl>\n" ], "text/latex": [ "\\begin{description*}\n", "\\item[IFIT1] 920\n", "\\item[IFITM1] 920\n", "\\item[IFNb1] 920\n", "\\item[IL12b] 920\n", "\\item[WNV] 920\n", "\\end{description*}\n" ], "text/markdown": [ "IFIT1\n", ": 920IFITM1\n", ": 920IFNb1\n", ": 920IL12b\n", ": 920WNV\n", ": 920\n", "\n" ], "text/plain": [ " IFIT1 IFITM1 IFNb1 IL12b WNV \n", " 920 920 920 920 920 " ] }, "execution_count": 23, "metadata": 
{}, "output_type": "execute_result" } ], "source": [ "# Check experiment names\n", "sum(names(summary(qpcr_data_mouse[,\"Experiment\"])) == c(\"IFIT1\",\"IFITM1\", \"IFNb1\", \"IL12b\", \"WNV\")) == 5\n", "summary(qpcr_data_mouse[,\"Experiment\"])" ] }, { "cell_type": "code", "execution_count": 24, "metadata": { "collapsed": true }, "outputs": [], "source": [ "## Add Group column: UW Line, Timepoint, Virus, Tissue, Experiment separated by \"_\"\n", "qpcr_data_mouse$Group <- paste(qpcr_data_mouse$UW_Line, qpcr_data_mouse$Timepoint, qpcr_data_mouse$Virus, \n", " qpcr_data_mouse$Tissue, qpcr_data_mouse$Experiment, sep=\"_\")" ] }, { "cell_type": "code", "execution_count": 25, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "0" ], "text/latex": [ "0" ], "text/markdown": [ "0" ], "text/plain": [ "[1] 0" ] }, "execution_count": 25, "metadata": {}, "output_type": "execute_result" }, { "data": { "text/html": [ "0" ], "text/latex": [ "0" ], "text/markdown": [ "0" ], "text/plain": [ "[1] 0" ] }, "execution_count": 25, "metadata": {}, "output_type": "execute_result" } ], "source": [ "## Check that ByLine file contains data from only those animals in ByMouse file\n", "length(setdiff(qpcr_data$Group, qpcr_data_mouse$Group))\n", "length(setdiff(qpcr_data_mouse$Group, qpcr_data$Group))" ] }, { "cell_type": "code", "execution_count": 26, "metadata": { "collapsed": true }, "outputs": [], "source": [ "## Add Data_Altered and Notes columns\n", "qpcr_data_mouse$Data_Altered = NA\n", "qpcr_data_mouse$Notes = NA" ] }, { "cell_type": "code", "execution_count": 27, "metadata": { "collapsed": true }, "outputs": [], "source": [ "## Add Lab column\n", "qpcr_data_mouse$Lab = \"Gale\"" ] }, { "cell_type": "code", "execution_count": 28, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "<table>\n", "<thead><tr><th></th><th scope=col>ID</th><th scope=col>Mating</th><th scope=col>RIX_ID</th><th scope=col>UW_Line</th><th scope=col>UWID</th><th 
scope=col>Sample_Name</th><th scope=col>Experiment</th><th scope=col>Tissue</th><th scope=col>Virus</th><th scope=col>Timepoint</th><th scope=col>Ct</th><th scope=col>Ct.sd</th><th scope=col>ref.Ct</th><th scope=col>ref.sd</th><th scope=col>dCt</th><th scope=col>dCt.linear</th><th scope=col>dCt.sd</th><th scope=col>Group</th><th scope=col>Data_Altered</th><th scope=col>Notes</th><th scope=col>Lab</th></tr></thead>\n", "<tbody>\n", "\t<tr><th scope=row>1</th><td>16188x3252_248</td><td>16188x3252</td><td>248</td><td>4</td><td>1.12</td><td>S 1.12</td><td>IFIT1</td><td>Spleen</td><td>WNV</td><td>12</td><td>30.31133</td><td>0.1537316</td><td>20.94267</td><td>0.08401435</td><td>9.368667</td><td>0.001512691</td><td>0.1751908</td><td>4_12_WNV_Spleen_IFIT1</td><td>NA</td><td>NA</td><td>Gale</td></tr>\n", "\t<tr><th scope=row>2</th><td>16188x3252_248</td><td>16188x3252</td><td>248</td><td>4</td><td>1.12</td><td>B 1.12</td><td>IFITM1</td><td>Brain</td><td>WNV</td><td>12</td><td>26.49567</td><td>0.1571378</td><td>20.519</td><td>0.076237</td><td>5.976665</td><td>0.01587978</td><td>0.174655</td><td>4_12_WNV_Brain_IFITM1</td><td>NA</td><td>NA</td><td>Gale</td></tr>\n", "\t<tr><th scope=row>3</th><td>16188x3252_248</td><td>16188x3252</td><td>248</td><td>4</td><td>1.12</td><td>K 1.12</td><td>IFIT1</td><td>Kidney</td><td>WNV</td><td>12</td><td>29.47433</td><td>0.09113332</td><td>19.27533</td><td>0.04600315</td><td>10.199</td><td>0.0008507361</td><td>0.1020861</td><td>4_12_WNV_Kidney_IFIT1</td><td>NA</td><td>NA</td><td>Gale</td></tr>\n", "\t<tr><th scope=row>4</th><td>16188x3252_248</td><td>16188x3252</td><td>248</td><td>4</td><td>1.12</td><td>K 1.12</td><td>IL12b</td><td>Kidney</td><td>WNV</td><td>12</td><td>40</td><td>0</td><td>19.27533</td><td>0.04600315</td><td>20.72467</td><td>5.77e-07</td><td>0.04600315</td><td>4_12_WNV_Kidney_IL12b</td><td>NA</td><td>NA</td><td>Gale</td></tr>\n", "\t<tr><th 
scope=row>5</th><td>16188x3252_248</td><td>16188x3252</td><td>248</td><td>4</td><td>1.12</td><td>B 1.12</td><td>IFNb1</td><td>Brain</td><td>WNV</td><td>12</td><td>29.12167</td><td>0.1936756</td><td>20.519</td><td>0.076237</td><td>8.602666</td><td>0.002572405</td><td>0.2081402</td><td>4_12_WNV_Brain_IFNb1</td><td>NA</td><td>NA</td><td>Gale</td></tr>\n", "\t<tr><th scope=row>6</th><td>16188x3252_248</td><td>16188x3252</td><td>248</td><td>4</td><td>1.12</td><td>B 1.12</td><td>IFIT1</td><td>Brain</td><td>WNV</td><td>12</td><td>23.525</td><td>0.02586539</td><td>20.519</td><td>0.076237</td><td>3.006</td><td>0.1244812</td><td>0.08050527</td><td>4_12_WNV_Brain_IFIT1</td><td>NA</td><td>NA</td><td>Gale</td></tr>\n", "</tbody>\n", "</table>\n" ], "text/latex": [ "\\begin{tabular}{r|lllllllllllllllllllll}\n", " & ID & Mating & RIX_ID & UW_Line & UWID & Sample_Name & Experiment & Tissue & Virus & Timepoint & Ct & Ct.sd & ref.Ct & ref.sd & dCt & dCt.linear & dCt.sd & Group & Data_Altered & Notes & Lab\\\\\n", "\\hline\n", "\t1 & 16188x3252_248 & 16188x3252 & 248 & 4 & 1.12 & S 1.12 & IFIT1 & Spleen & WNV & 12 & 30.31133 & 0.1537316 & 20.94267 & 0.08401435 & 9.368667 & 0.001512691 & 0.1751908 & 4_12_WNV_Spleen_IFIT1 & NA & NA & Gale\\\\\n", "\t2 & 16188x3252_248 & 16188x3252 & 248 & 4 & 1.12 & B 1.12 & IFITM1 & Brain & WNV & 12 & 26.49567 & 0.1571378 & 20.519 & 0.076237 & 5.976665 & 0.01587978 & 0.174655 & 4_12_WNV_Brain_IFITM1 & NA & NA & Gale\\\\\n", "\t3 & 16188x3252_248 & 16188x3252 & 248 & 4 & 1.12 & K 1.12 & IFIT1 & Kidney & WNV & 12 & 29.47433 & 0.09113332 & 19.27533 & 0.04600315 & 10.199 & 0.0008507361 & 0.1020861 & 4_12_WNV_Kidney_IFIT1 & NA & NA & Gale\\\\\n", "\t4 & 16188x3252_248 & 16188x3252 & 248 & 4 & 1.12 & K 1.12 & IL12b & Kidney & WNV & 12 & 40 & 0 & 19.27533 & 0.04600315 & 20.72467 & 5.77e-07 & 0.04600315 & 4_12_WNV_Kidney_IL12b & NA & NA & Gale\\\\\n", "\t5 & 16188x3252_248 & 16188x3252 & 248 & 4 & 1.12 & B 1.12 & IFNb1 & Brain & WNV & 12 & 29.12167 & 
0.1936756 & 20.519 & 0.076237 & 8.602666 & 0.002572405 & 0.2081402 & 4_12_WNV_Brain_IFNb1 & NA & NA & Gale\\\\\n", "\t6 & 16188x3252_248 & 16188x3252 & 248 & 4 & 1.12 & B 1.12 & IFIT1 & Brain & WNV & 12 & 23.525 & 0.02586539 & 20.519 & 0.076237 & 3.006 & 0.1244812 & 0.08050527 & 4_12_WNV_Brain_IFIT1 & NA & NA & Gale\\\\\n", "\\end{tabular}\n" ], "text/plain": [ " ID Mating RIX_ID UW_Line UWID Sample_Name Experiment Tissue\n", "1 16188x3252_248 16188x3252 248 4 1.12 S 1.12 IFIT1 Spleen\n", "2 16188x3252_248 16188x3252 248 4 1.12 B 1.12 IFITM1 Brain\n", "3 16188x3252_248 16188x3252 248 4 1.12 K 1.12 IFIT1 Kidney\n", "4 16188x3252_248 16188x3252 248 4 1.12 K 1.12 IL12b Kidney\n", "5 16188x3252_248 16188x3252 248 4 1.12 B 1.12 IFNb1 Brain\n", "6 16188x3252_248 16188x3252 248 4 1.12 B 1.12 IFIT1 Brain\n", " Virus Timepoint Ct Ct.sd ref.Ct ref.sd dCt\n", "1 WNV 12 30.31133 0.15373160 20.94267 0.08401435 9.368667\n", "2 WNV 12 26.49567 0.15713780 20.51900 0.07623700 5.976665\n", "3 WNV 12 29.47433 0.09113332 19.27533 0.04600315 10.199001\n", "4 WNV 12 40.00000 0.00000000 19.27533 0.04600315 20.724667\n", "5 WNV 12 29.12167 0.19367563 20.51900 0.07623700 8.602666\n", "6 WNV 12 23.52500 0.02586539 20.51900 0.07623700 3.006000\n", " dCt.linear dCt.sd Group Data_Altered Notes Lab\n", "1 0.0015126910 0.17519080 4_12_WNV_Spleen_IFIT1 NA NA Gale\n", "2 0.0158797774 0.17465500 4_12_WNV_Brain_IFITM1 NA NA Gale\n", "3 0.0008507361 0.10208610 4_12_WNV_Kidney_IFIT1 NA NA Gale\n", "4 0.0000005770 0.04600315 4_12_WNV_Kidney_IL12b NA NA Gale\n", "5 0.0025724055 0.20814017 4_12_WNV_Brain_IFNb1 NA NA Gale\n", "6 0.1244812292 0.08050527 4_12_WNV_Brain_IFIT1 NA NA Gale" ] }, "execution_count": 28, "metadata": {}, "output_type": "execute_result" } ], "source": [ "head(qpcr_data_mouse)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Step 6. Calculate and Check Summary Measures" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Step 6a. 
Calculate dCt mean" ] }, { "cell_type": "code", "execution_count": 29, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[1] \"All dCt mean correct\"\n" ] } ], "source": [ "## Calculate dCt mean from byMouse data\n", "dct_mean = aggregate(formula=qpcr_data_mouse[,\"dCt\"]~qpcr_data_mouse[,\"Group\"], data=qpcr_data_mouse, FUN=mean)\n", "names(dct_mean) <- c(\"Group\",\"dCt.mean.V2\")\n", "dct_mean[order(dct_mean[,1]),] -> dct_mean_order\n", "\n", "## Get dCt mean from byLine data\n", "byline_dct_mean = unique(qpcr_data[,c(\"Group\",\"dCt.mean\")])\n", "byline_dct_mean[order(byline_dct_mean[,1]),] -> byline_dct_mean_order\n", "\n", "## Check that the dCt mean calculations are the same\n", "check_dct_mean_v2 = sapply(1:dim(dct_mean_order)[1], \n", " function(x){isTRUE(all.equal(dct_mean_order[x,2], byline_dct_mean_order[x,2]))})\n", "\n", "## Print discrepancies, if they exist\n", "if(sum(check_dct_mean_v2) == nrow(dct_mean_order)){\n", " print(\"All dCt mean correct\")\n", "} else {\n", " print(\"Need to clean dCt mean\")\n", " dct_mean_errs = cbind(byline_dct_mean_order[which(check_dct_mean_v2==F),], dct_mean_order[which(check_dct_mean_v2==F),])\n", " dct_mean_errs\n", "}" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Step 6b. Update dCt mean if necessary" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "## dct_mean_errs exists only if Step 6a found discrepancies\n", "if (exists(\"dct_mean_errs\") && nrow(dct_mean_errs) > 0) {\n", " for (i in seq_len(nrow(dct_mean_errs))) {\n", " qpcr_data$dCt.mean[qpcr_data$Group==dct_mean_errs[i,3]] = dct_mean_errs[i,4]\n", " }\n", "}" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Step 6c. 
Calculate N for each group" ] }, { "cell_type": "code", "execution_count": 30, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[1] \"All N correct\"\n" ] } ], "source": [ "## Calculate the N for each group from the byMouse data\n", "data.frame(summary(as.factor(qpcr_data_mouse[,\"Group\"]),maxsum=8000)) -> bymouse_n\n", "names(bymouse_n) <- c(\"N.V2\")\n", "bymouse_n[,2] <- row.names(bymouse_n)\n", "names(bymouse_n)[2] <- \"Group\"\n", "bymouse_n = bymouse_n[,c(2,1)]\n", "bymouse_n[order(bymouse_n[,\"Group\"]),] -> bymouse_n_order\n", "\n", "## Get the N for each group from the byLine data \n", "qpcr_data[,c(\"Group\",\"N\")] -> byline_n\n", "byline_n[order(byline_n[,\"Group\"]),] -> byline_n_order\n", "\n", "## Print discrepancies, if they exist\n", "if(sum(bymouse_n_order[,2] == byline_n_order[,2]) == nrow(bymouse_n_order)){\n", " print(\"All N correct\")\n", "} else {\n", " print(\"Need to clean N\")\n", " n_errs = cbind(byline_n_order[bymouse_n_order[,2] != byline_n_order[,2],], \n", " bymouse_n_order[bymouse_n_order[,2] != byline_n_order[,2],])\n", " n_errs\n", "}" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Step 6d. Update N if necessary" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "## n_errs exists only if Step 6c found discrepancies\n", "if (exists(\"n_errs\") && nrow(n_errs) > 0) {\n", " for (i in seq_len(nrow(n_errs))) {\n", " qpcr_data$N[qpcr_data$Group==n_errs[i,3]] = n_errs[i,4]\n", " }\n", "}" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Step 6e. 
Calculate dCt SD" ] }, { "cell_type": "code", "execution_count": 31, "metadata": { "collapsed": false, "scrolled": true }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[1] \"All dCt SD correct\"\n" ] } ], "source": [ "## Calculate dCt SD from the byMouse data \n", "dct_sd = aggregate(formula=qpcr_data_mouse[,\"dCt\"]~qpcr_data_mouse[,\"Group\"], data=qpcr_data_mouse, FUN=sd)\n", "names(dct_sd) <- c(\"Group\",\"dCt.sd.V2\")\n", "dct_sd[order(dct_sd[,1]),] -> dct_sd_order\n", "\n", "## Get dCt SD from the byLine data\n", "byline_dct_sd = qpcr_data[,c(\"Group\",\"dCt.sd\")]\n", "byline_dct_sd[order(byline_dct_sd[,1]),] -> byline_dct_sd_order\n", "\n", "## Check that the dCt SD calculations are the same\n", "check_dct_sd = sapply(1:dim(dct_sd_order)[1],function(x){isTRUE(all.equal(dct_sd_order[x,2],byline_dct_sd_order[x,2]))})\n", "\n", "## Print discrepancies, if they exist\n", "if(sum(check_dct_sd) == nrow(dct_sd_order)) {\n", " print(\"All dCt SD correct\")\n", "} else {\n", " print(\"Need to clean dCt SD\")\n", " dct_sd_errs = cbind(byline_dct_sd_order[which(check_dct_sd==F),], dct_sd_order[which(check_dct_sd==F),])\n", " dct_sd_errs\n", "}" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Step 6f. Update dCt SD if necessary" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "## dct_sd_errs exists only if Step 6e found discrepancies\n", "if (exists(\"dct_sd_errs\") && nrow(dct_sd_errs) > 0) {\n", " for (i in seq_len(nrow(dct_sd_errs))) {\n", " qpcr_data$dCt.sd[qpcr_data$Group==dct_sd_errs[i,3]] = dct_sd_errs[i,4]\n", " }\n", "}" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Step 6g. 
Calculate baseline dCt" ] }, { "cell_type": "code", "execution_count": 32, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[1] \"All baseline dCt correct\"\n" ] } ], "source": [ "## Check baseline.dCt\n", "## Add new group column to annotate baseline\n", "qpcr_data$Group_g <- paste(qpcr_data[,\"UW_Line\"],qpcr_data[,\"Tissue\"], qpcr_data[,\"Experiment\"],sep=\"_\")\n", "\n", "## Calculate baseline; the baseline timepoint is 12 for this data\n", "qpcr_data$baseline.dCt.V2 = NA\n", "baseline = 12\n", "\n", "for(i in unique(qpcr_data[,\"Group_g\"])){\n", " qpcr_data[which(qpcr_data[,\"Group_g\"] == i),\"baseline.dCt.V2\"] <- \n", " qpcr_data[which(qpcr_data[,\"Group_g\"] == i & qpcr_data[,\"Virus\"] == \"Mock\" & qpcr_data[,\"Timepoint\"] == baseline),\"dCt.mean\"]\n", "}\n", "\n", "## Print discrepancies, if they exist\n", "if(sum(qpcr_data[,\"baseline.dCt\"] == qpcr_data[,\"baseline.dCt.V2\"]) == nrow(qpcr_data)){\n", " print(\"All baseline dCt correct\")\n", "} else {\n", " print(\"Need to clean baseline dCt\")\n", " baseline_dct_errs = qpcr_data[qpcr_data[,\"baseline.dCt\"] != qpcr_data[,\"baseline.dCt.V2\"], \n", " c(\"Group\", \"Group_g\", \"baseline.dCt\", \"baseline.dCt.V2\")]\n", " baseline_dct_errs\n", "}" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Step 6h. Update baseline dCt if necessary" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "## baseline_dct_errs exists only if Step 6g found discrepancies\n", "if (exists(\"baseline_dct_errs\") && nrow(baseline_dct_errs) > 0) {\n", " for (i in seq_len(nrow(baseline_dct_errs))) {\n", " qpcr_data$baseline.dCt[qpcr_data$Group==baseline_dct_errs[i,1]] = baseline_dct_errs[i,4]\n", " }\n", "}" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Step 6i. 
Calculate ddCt mean" ] }, { "cell_type": "code", "execution_count": 33, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[1] \"All ddCt mean correct\"\n" ] } ], "source": [ "## Calculate ddCt mean \n", "qpcr_data$ddCt.mean.V2 <- as.numeric(as.character(qpcr_data[,\"dCt.mean\"])) - as.numeric(as.character(qpcr_data[,\"baseline.dCt\"]))\n", "\n", "check_ddct_mean = sapply(1:dim(qpcr_data)[1],function(x)isTRUE(all.equal(qpcr_data[x,\"ddCt.mean\"], qpcr_data[x,\"ddCt.mean.V2\"], tolerance=5.5e-8)))\n", "\n", "## Print discrepancies, if they exist\n", "if(sum(check_ddct_mean) == dim(qpcr_data)[1]){\n", " print(\"All ddCt mean correct\")\n", "} else {\n", " print(\"Need to clean ddCt mean\")\n", " ddct_errs = qpcr_data[which(check_ddct_mean==F), c(\"Group\", \"Group_g\", \"ddCt.mean\", \"ddCt.mean.V2\")]\n", " ddct_errs\n", "}" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Step 6j. Update ddCt mean if necessary" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "## ddct_errs exists only if Step 6i found discrepancies\n", "if (exists(\"ddct_errs\") && nrow(ddct_errs) > 0) {\n", " for (i in seq_len(nrow(ddct_errs))) {\n", " qpcr_data$ddCt.mean[qpcr_data$Group==ddct_errs[i,1]] = ddct_errs[i,4]\n", " }\n", "}" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Step 6k. 
Calculate fold change mean" ] }, { "cell_type": "code", "execution_count": 34, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[1] \"All FC mean correct\"\n" ] } ], "source": [ "## Calculate fold change (FC) mean as 2^(-ddCt mean)\n", "qpcr_data$fc.mean.V2 <- 2^-qpcr_data[,\"ddCt.mean\"]\n", "\n", "check_fc_mean = sapply(1:dim(qpcr_data)[1],function(x)isTRUE(all.equal(qpcr_data[x,\"fc.mean\"], qpcr_data[x,\"fc.mean.V2\"])))\n", "\n", "## Print discrepancies, if they exist\n", "if(sum(check_fc_mean) == dim(qpcr_data)[1]){\n", " print(\"All FC mean correct\")\n", "} else {\n", " print(\"Need to clean FC mean\")\n", " fc_errs = qpcr_data[which(check_fc_mean==F), c(\"Group\", \"Group_g\", \"fc.mean\", \"fc.mean.V2\")]\n", " fc_errs\n", "}" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Step 6l. Update fold change mean if necessary" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "## fc_errs exists only if Step 6k found discrepancies\n", "if (exists(\"fc_errs\") && nrow(fc_errs) > 0) {\n", " for (i in seq_len(nrow(fc_errs))) {\n", " qpcr_data$fc.mean[qpcr_data$Group==fc_errs[i,1]] = fc_errs[i,4]\n", " }\n", "}" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Step 7. 
Remove Unused Columns and Add Baseline dCt.sd and ddCt.se" ] }, { "cell_type": "code", "execution_count": 35, "metadata": { "collapsed": true }, "outputs": [], "source": [ "## Remove extra (unused) columns\n", "remove_cols = c(\"ddCt.sd\", \"fc.sd\", \"baseline.dCt.V2\", \"ddCt.mean.V2\", \"fc.mean.V2\")\n", "qpcr_data_v2 = qpcr_data[,-as.vector(unlist(sapply(remove_cols,function(x)which(x==names(qpcr_data)))))]" ] }, { "cell_type": "code", "execution_count": 36, "metadata": { "collapsed": true }, "outputs": [], "source": [ "## Add baseline.dCt.sd column\n", "qpcr_data_v2$baseline.dCt.sd = NA\n", "\n", "# compute baseline sd, use baseline = 12 saved above\n", "for (i in unique(qpcr_data_v2[,\"Group_g\"])){\n", " qpcr_data_v2[which(qpcr_data_v2[,\"Group_g\"] == i),\"baseline.dCt.sd\"] <- \n", " qpcr_data_v2[which(qpcr_data_v2[,\"Group_g\"] == i & \n", " qpcr_data_v2[,\"Virus\"] == \"Mock\" & \n", " qpcr_data_v2[,\"Timepoint\"] == baseline),\"dCt.sd\"]\n", "}" ] }, { "cell_type": "code", "execution_count": 37, "metadata": { "collapsed": false }, "outputs": [], "source": [ "## Add ddCt.se\n", "ddCt.se = sapply(1:dim(qpcr_data_v2)[1],function(x){\n", " sqrt((qpcr_data_v2[x,\"dCt.sd\"]^2/qpcr_data_v2[x,\"N\"]) + \n", " (qpcr_data_v2[x,\"baseline.dCt.sd\"]^2/qpcr_data_v2[which(qpcr_data_v2[,\"Group_g\"] == qpcr_data_v2[x,\"Group_g\"] & \n", " qpcr_data_v2[,\"Virus\"] == \"Mock\" & \n", " qpcr_data_v2[,\"Timepoint\"] == baseline),\"N\"])\n", " )\n", "})\n", "qpcr_data_v2$ddCt.se <- ddCt.se" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Step 8. 
Finalize Data" ] }, { "cell_type": "code", "execution_count": 38, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "TRUE" ], "text/latex": [ "TRUE" ], "text/markdown": [ "TRUE" ], "text/plain": [ "[1] TRUE" ] }, "execution_count": 38, "metadata": {}, "output_type": "execute_result" }, { "data": { "text/html": [ "TRUE" ], "text/latex": [ "TRUE" ], "text/markdown": [ "TRUE" ], "text/plain": [ "[1] TRUE" ] }, "execution_count": 38, "metadata": {}, "output_type": "execute_result" }, { "data": { "text/html": [ "TRUE" ], "text/latex": [ "TRUE" ], "text/markdown": [ "TRUE" ], "text/plain": [ "[1] TRUE" ] }, "execution_count": 38, "metadata": {}, "output_type": "execute_result" }, { "data": { "text/html": [ "TRUE" ], "text/latex": [ "TRUE" ], "text/markdown": [ "TRUE" ], "text/plain": [ "[1] TRUE" ] }, "execution_count": 38, "metadata": {}, "output_type": "execute_result" }, { "data": { "text/html": [ "TRUE" ], "text/latex": [ "TRUE" ], "text/markdown": [ "TRUE" ], "text/plain": [ "[1] TRUE" ] }, "execution_count": 38, "metadata": {}, "output_type": "execute_result" }, { "data": { "text/html": [ "TRUE" ], "text/latex": [ "TRUE" ], "text/markdown": [ "TRUE" ], "text/plain": [ "[1] TRUE" ] }, "execution_count": 38, "metadata": {}, "output_type": "execute_result" } ], "source": [ "## Order by group\n", "qpcr_data_v3_final = qpcr_data_v2[order(qpcr_data_v2[,\"Group\"]),]\n", "\n", "## Double-check all calculations are correct\n", "## dCt mean\n", "isTRUE(all.equal(dct_mean_order[,2], qpcr_data_v3_final[,\"dCt.mean\"]))\n", "\n", "## N\n", "isTRUE(all.equal(bymouse_n_order[,2], qpcr_data_v3_final[,\"N\"]))\n", "\n", "## dCt SD\n", "isTRUE(all.equal(dct_sd_order[,2], qpcr_data_v3_final[,\"dCt.sd\"]))\n", "\n", "## baseline dCt\n", "isTRUE(all.equal(qpcr_data[order(qpcr_data[,\"Group\"]),\"baseline.dCt.V2\"],qpcr_data_v3_final[,\"baseline.dCt\"]))\n", "\n", "## ddCt mean\n", 
"isTRUE(all.equal(qpcr_data[order(qpcr_data[,\"Group\"]),\"ddCt.mean.V2\"],qpcr_data_v3_final[,\"ddCt.mean\"]))\n", "\n", "## FC mean\n", "isTRUE(all.equal(qpcr_data[order(qpcr_data[,\"Group\"]),\"fc.mean.V2\"],qpcr_data_v3_final[,\"fc.mean\"]))\n", "\n", "## baseline SD, computed using corrected dCt SD at timepoint 12\n", "\n", "## ddCt SE, computed using correct dCt SD and baseline SD\n", "\n", "## Remove Group_g column\n", "qpcr_data_final_format <- qpcr_data_v3_final[,!(names(qpcr_data_v3_final) %in% \"Group_g\")]" ] }, { "cell_type": "code", "execution_count": 39, "metadata": { "collapsed": false }, "outputs": [], "source": [ "## Order columns according to data dictionary (byLine)\n", "## Note: you may have to change the path to the data dictionary\n", "data_dict <- read.xls(xls=\"./data/WNV_Data_Dictionary.xlsx\", sheet=\"qPCR Data - By Line\", as.is=T)\n", "qpcr_data_final_format_order = qpcr_data_final_format[, data_dict[,1]]" ] }, { "cell_type": "code", "execution_count": 40, "metadata": { "collapsed": false }, "outputs": [], "source": [ "## Order columns according to data dictionary (byMouse)\n", "data_dict <- read.xls(xls=\"./data/WNV_Data_Dictionary.xlsx\", sheet=\"qPCR Data - By Mouse\", as.is=T)\n", "qpcr_data_mouse_order = qpcr_data_mouse[, data_dict[,1]]" ] }, { "cell_type": "code", "execution_count": 41, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "<ol class=list-inline>\n", "\t<li>TRUE</li>\n", "\t<li>TRUE</li>\n", "\t<li>TRUE</li>\n", "\t<li>TRUE</li>\n", "\t<li>TRUE</li>\n", "\t<li>TRUE</li>\n", "\t<li>TRUE</li>\n", "\t<li>TRUE</li>\n", "\t<li>TRUE</li>\n", "\t<li>TRUE</li>\n", "\t<li>TRUE</li>\n", "\t<li>TRUE</li>\n", "\t<li>TRUE</li>\n", "\t<li>TRUE</li>\n", "\t<li>TRUE</li>\n", "\t<li>TRUE</li>\n", "\t<li>TRUE</li>\n", "\t<li>TRUE</li>\n", "\t<li>TRUE</li>\n", "\t<li>TRUE</li>\n", "\t<li>TRUE</li>\n", "\t<li>TRUE</li>\n", "\t<li>TRUE</li>\n", "</ol>\n" ], "text/latex": [ "\\begin{enumerate*}\n", "\\item 
TRUE\n", "\\item TRUE\n", "\\item TRUE\n", "\\item TRUE\n", "\\item TRUE\n", "\\item TRUE\n", "\\item TRUE\n", "\\item TRUE\n", "\\item TRUE\n", "\\item TRUE\n", "\\item TRUE\n", "\\item TRUE\n", "\\item TRUE\n", "\\item TRUE\n", "\\item TRUE\n", "\\item TRUE\n", "\\item TRUE\n", "\\item TRUE\n", "\\item TRUE\n", "\\item TRUE\n", "\\item TRUE\n", "\\item TRUE\n", "\\item TRUE\n", "\\end{enumerate*}\n" ], "text/markdown": [ "1. TRUE\n", "2. TRUE\n", "3. TRUE\n", "4. TRUE\n", "5. TRUE\n", "6. TRUE\n", "7. TRUE\n", "8. TRUE\n", "9. TRUE\n", "10. TRUE\n", "11. TRUE\n", "12. TRUE\n", "13. TRUE\n", "14. TRUE\n", "15. TRUE\n", "16. TRUE\n", "17. TRUE\n", "18. TRUE\n", "19. TRUE\n", "20. TRUE\n", "21. TRUE\n", "22. TRUE\n", "23. TRUE\n", "\n", "\n" ], "text/plain": [ " [1] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE\n", "[16] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE" ] }, "execution_count": 41, "metadata": {}, "output_type": "execute_result" } ], "source": [ "## Check both byLine and byMouse have the same UW lines\n", "names(summary(as.factor(qpcr_data_mouse_order[,\"UW_Line\"]))) == \n", "names(summary(as.factor(qpcr_data_final_format_order[,\"UW_Line\"])))" ] }, { "cell_type": "code", "execution_count": 42, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "<ol class=list-inline>\n", "\t<li>1560</li>\n", "\t<li>19</li>\n", "</ol>\n" ], "text/latex": [ "\\begin{enumerate*}\n", "\\item 1560\n", "\\item 19\n", "\\end{enumerate*}\n" ], "text/markdown": [ "1. 1560\n", "2. 19\n", "\n", "\n" ], "text/plain": [ "[1] 1560 19" ] }, "execution_count": 42, "metadata": {}, "output_type": "execute_result" }, { "data": { "text/html": [ "<ol class=list-inline>\n", "\t<li>4600</li>\n", "\t<li>21</li>\n", "</ol>\n" ], "text/latex": [ "\\begin{enumerate*}\n", "\\item 4600\n", "\\item 21\n", "\\end{enumerate*}\n" ], "text/markdown": [ "1. 4600\n", "2. 
21\n", "\n", "\n" ], "text/plain": [ "[1] 4600 21" ] }, "execution_count": 42, "metadata": {}, "output_type": "execute_result" } ], "source": [ "dim(qpcr_data_final_format_order)\n", "dim(qpcr_data_mouse_order)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Step 9. Combine with Previously Cleaned Data" ] }, { "cell_type": "code", "execution_count": 43, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "<ol class=list-inline>\n", "\t<li>3990</li>\n", "\t<li>19</li>\n", "</ol>\n" ], "text/latex": [ "\\begin{enumerate*}\n", "\\item 3990\n", "\\item 19\n", "\\end{enumerate*}\n" ], "text/markdown": [ "1. 3990\n", "2. 19\n", "\n", "\n" ], "text/plain": [ "[1] 3990 19" ] }, "execution_count": 43, "metadata": {}, "output_type": "execute_result" } ], "source": [ "## Read in previous version of byLine data\n", "cleaned_data_dir = \"~/Documents/SIG/WNV/Cleaned_Data_Releases/23-Mar-2016/\"\n", "prev_qpcr_byline = read.xls(xls=file.path(cleaned_data_dir, \"Gale_qPCR_byLine_23-Mar-2016_final.xlsx\"), sheet=1, as.is=T)\n", "dim(prev_qpcr_byline)" ] }, { "cell_type": "code", "execution_count": 44, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "170" ], "text/latex": [ "170" ], "text/markdown": [ "170" ], "text/plain": [ "[1] 170" ] }, "execution_count": 44, "metadata": {}, "output_type": "execute_result" } ], "source": [ "## Check for duplicates (new data will overwrite old)\n", "dup_groups = intersect(qpcr_data_final_format_order$Group, prev_qpcr_byline$Group)\n", "length(dup_groups)" ] }, { "cell_type": "code", "execution_count": 45, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "<ol class=list-inline>\n", "\t<li>3820</li>\n", "\t<li>19</li>\n", "</ol>\n" ], "text/latex": [ "\\begin{enumerate*}\n", "\\item 3820\n", "\\item 19\n", "\\end{enumerate*}\n" ], "text/markdown": [ "1. 3820\n", "2. 
19\n", "\n", "\n" ], "text/plain": [ "[1] 3820 19" ] }, "execution_count": 45, "metadata": {}, "output_type": "execute_result" } ], "source": [ "## Remove duplicated groups from previous data\n", "prev_qpcr_byline = prev_qpcr_byline[!prev_qpcr_byline$Group %in% dup_groups, ]\n", "dim(prev_qpcr_byline)" ] }, { "cell_type": "code", "execution_count": 46, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "<ol class=list-inline>\n", "\t<li>5380</li>\n", "\t<li>19</li>\n", "</ol>\n" ], "text/latex": [ "\\begin{enumerate*}\n", "\\item 5380\n", "\\item 19\n", "\\end{enumerate*}\n" ], "text/markdown": [ "1. 5380\n", "2. 19\n", "\n", "\n" ], "text/plain": [ "[1] 5380 19" ] }, "execution_count": 46, "metadata": {}, "output_type": "execute_result" }, { "data": { "text/html": [ "<ol class=list-inline>\n", "\t<li>5380</li>\n", "\t<li>19</li>\n", "</ol>\n" ], "text/latex": [ "\\begin{enumerate*}\n", "\\item 5380\n", "\\item 19\n", "\\end{enumerate*}\n" ], "text/markdown": [ "1. 5380\n", "2. 19\n", "\n", "\n" ], "text/plain": [ "[1] 5380 19" ] }, "execution_count": 46, "metadata": {}, "output_type": "execute_result" } ], "source": [ "## Combine old and new data\n", "qpcr_byline_updated = rbind(prev_qpcr_byline, qpcr_data_final_format_order)\n", "dim(qpcr_byline_updated)\n", "\n", "# Set blanks to NA\n", "qpcr_byline_updated_cleaned = clean_na(qpcr_byline_updated)\n", "\n", "# Remove duplicates\n", "if(sum(duplicated(qpcr_byline_updated_cleaned)) != 0){\n", " qpcr_byline_updated_cleaned = qpcr_byline_updated_cleaned[!duplicated(qpcr_byline_updated_cleaned),]\n", "}\n", "dim(qpcr_byline_updated_cleaned)" ] }, { "cell_type": "code", "execution_count": 47, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "<ol class=list-inline>\n", "\t<li>11564</li>\n", "\t<li>21</li>\n", "</ol>\n" ], "text/latex": [ "\\begin{enumerate*}\n", "\\item 11564\n", "\\item 21\n", "\\end{enumerate*}\n" ], "text/markdown": [ "1. 11564\n", "2. 
21\n", "\n", "\n" ], "text/plain": [ "[1] 11564 21" ] }, "execution_count": 47, "metadata": {}, "output_type": "execute_result" } ], "source": [ "## Read in previous version of byMouse data\n", "prev_qpcr_bymouse = read.xls(xls=file.path(cleaned_data_dir, \"Gale_qPCR_byMouse_23-Mar-2016_final.xlsx\"), sheet=1)\n", "dim(prev_qpcr_bymouse)" ] }, { "cell_type": "code", "execution_count": 48, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "495" ], "text/latex": [ "495" ], "text/markdown": [ "495" ], "text/plain": [ "[1] 495" ] }, "execution_count": 48, "metadata": {}, "output_type": "execute_result" } ], "source": [ "## Check for duplicates (new data will overwrite old)\n", "prev_ids = paste(prev_qpcr_bymouse$ID, prev_qpcr_bymouse$Tissue, prev_qpcr_bymouse$Experiment, sep='_')\n", "new_ids = paste(qpcr_data_mouse_order$ID, qpcr_data_mouse_order$Tissue, qpcr_data_mouse_order$Experiment, sep='_')\n", "dup_ids = intersect(prev_ids, new_ids)\n", "length(dup_ids)" ] }, { "cell_type": "code", "execution_count": 49, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "<ol class=list-inline>\n", "\t<li>11069</li>\n", "\t<li>21</li>\n", "</ol>\n" ], "text/latex": [ "\\begin{enumerate*}\n", "\\item 11069\n", "\\item 21\n", "\\end{enumerate*}\n" ], "text/markdown": [ "1. 11069\n", "2. 21\n", "\n", "\n" ], "text/plain": [ "[1] 11069 21" ] }, "execution_count": 49, "metadata": {}, "output_type": "execute_result" } ], "source": [ "## Remove duplicated IDs from previous data\n", "prev_qpcr_bymouse = prev_qpcr_bymouse[which(!prev_ids %in% dup_ids), ]\n", "dim(prev_qpcr_bymouse)" ] }, { "cell_type": "code", "execution_count": 50, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "<ol class=list-inline>\n", "\t<li>15669</li>\n", "\t<li>21</li>\n", "</ol>\n" ], "text/latex": [ "\\begin{enumerate*}\n", "\\item 15669\n", "\\item 21\n", "\\end{enumerate*}\n" ], "text/markdown": [ "1. 15669\n", "2. 
21\n", "\n", "\n" ], "text/plain": [ "[1] 15669 21" ] }, "execution_count": 50, "metadata": {}, "output_type": "execute_result" }, { "data": { "text/html": [ "<ol class=list-inline>\n", "\t<li>15669</li>\n", "\t<li>21</li>\n", "</ol>\n" ], "text/latex": [ "\\begin{enumerate*}\n", "\\item 15669\n", "\\item 21\n", "\\end{enumerate*}\n" ], "text/markdown": [ "1. 15669\n", "2. 21\n", "\n", "\n" ], "text/plain": [ "[1] 15669 21" ] }, "execution_count": 50, "metadata": {}, "output_type": "execute_result" } ], "source": [ "## Combine old and new data\n", "qpcr_bymouse_updated = rbind(prev_qpcr_bymouse, qpcr_data_mouse_order)\n", "dim(qpcr_bymouse_updated)\n", "\n", "## Set blanks to NA\n", "qpcr_bymouse_updated_cleaned = clean_na(qpcr_bymouse_updated)\n", "\n", "## Remove duplicates\n", "if(sum(duplicated(qpcr_bymouse_updated_cleaned)) != 0){\n", " qpcr_bymouse_updated_cleaned = qpcr_bymouse_updated_cleaned[!duplicated(qpcr_bymouse_updated_cleaned),]\n", "}\n", "dim(qpcr_bymouse_updated_cleaned)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Step 10. Make Any Manual Corrections, If Necessary (Record These in README)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Step 11. 
Save Cleaned Data" ] }, { "cell_type": "code", "execution_count": 51, "metadata": { "collapsed": true }, "outputs": [], "source": [ "## Save ByLine Data\n", "write.table(qpcr_byline_updated_cleaned, file=file.path(data_dir, \"23-May-2016/Gale_qPCR_byLine_5-23-16_MM_updated.txt\"), \n", " col.names=T, row.names=F, sep='\\t', quote=F, na=\"\")\n", "## Save ByMouse Data\n", "write.table(qpcr_bymouse_updated_cleaned, file=file.path(data_dir, \"23-May-2016/Gale_qPCR_byMouse_5-23-16_MM_updated.txt\"), \n", " col.names=T, row.names=F, sep='\\t', quote=F, na=\"\")" ] }, { "cell_type": "markdown", "metadata": { "collapsed": true }, "source": [ "#### Last Updated: 26-May-2016" ] } ], "metadata": { "kernelspec": { "display_name": "R", "language": "R", "name": "ir" }, "language_info": { "codemirror_mode": "r", "file_extension": ".r", "mimetype": "text/x-r-source", "name": "R", "pygments_lexer": "r", "version": "3.2.2" } }, "nbformat": 4, "nbformat_minor": 0 }