{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Systems Immunogenetics Project\n", "\n", "## WNV Histology Data Cleaning Workflow\n", "\n", "### McWeeney Lab, Oregon Health & Science University\n", "\n", "#### Author: Michael Mooney (mooneymi@ohsu.edu)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Introduction\n", "\n", "This document will walk through the steps for parsing and cleaning flow cytometry spreadsheets. The result will be a single R dataframe containing all the data, with standardized column names and formatting.\n", "\n", "Required Files:\n", "\n", "- This notebook** (`SIG_WNV_Histology_Data_Cleaning.ipynb`): [[Download here]](https://raw.githubusercontent.com/biodev/SIG/master/SIG_WNV_Flow_Data_Cleaning.ipynb)\n", "- The data dictionary containing all histology variables (`WNV_Data_Dictionary.xlsx`): [[Download here]](https://raw.githubusercontent.com/biodev/SIG/master/data/WNV_Data_Dictonary.xlsx)\n", "\n", "** Note: this notebook can also be downloaded as an R script (only the code blocks seen below will be included): [[Download R script here]](https://raw.githubusercontent.com/biodev/SIG/master/SIG_WNV_Flow_Data_Cleaning.r)\n", "\n", "Required R Packages:\n", "\n", "- `gdata`: [https://cran.r-project.org/web/packages/gdata/index.html](https://cran.r-project.org/web/packages/gdata/index.html)\n", "\n", "**All code is available on GitHub: [https://github.com/biodev/SIG](https://github.com/biodev/SIG)** \n", "\n", "If you are not familiar with Jupyter Notebooks, I've created a short tutorial to get you up and running quickly. There is also plenty of documentation online:\n", "\n", "1. [Jupyter for R Tutorial](http://nbviewer.jupyter.org/github/mooneymi/jupyter_notebooks/blob/master/r/Getting_Started_R.ipynb)\n", "2. [Jupyter Documentation](http://jupyter.org/)\n", "3. [Conda and R](https://www.continuum.io/conda-for-r)\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Step 1. Prepare the Input Files\n", "\n", "Sometimes, a bit of manual cleaning of the spreadsheets is necessary for the parsing function to work correctly.\n", "\n", "1. Move histology data to the first tab in the Excel workbook.\n", "2. Remove any rows with text above the column names." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Step 2. Load the Necessary R Libraries and Functions" ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "collapsed": false }, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "gdata: read.xls support for 'XLS' (Excel 97-2004) files ENABLED.\n", "\n", "gdata: read.xls support for 'XLSX' (Excel 2007+) files ENABLED.\n", "\n", "Attaching package: ‘gdata’\n", "\n", "The following object is masked from ‘package:stats’:\n", "\n", " nobs\n", "\n", "The following object is masked from ‘package:utils’:\n", "\n", " object.size\n", "\n" ] } ], "source": [ "library(gdata)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Step 3. Read the Data into R" ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "collapsed": false }, "outputs": [], "source": [ "## Set data directories\n", "cleaned_dir = '~/Documents/SIG/WNV/Cleaned_Data_Releases/15-Jan-2016'\n", "hist_dir = '~/Documents/SIG/WNV/Histology/fixed_files'\n", "\n", "## Load column names\n", "hist_cn = read.xls(xls=\"./data/WNV_Data_Dictionary.xlsx\", sheet=\"Histology Data\", as.is=T)\n", "hist_cn = hist_cn[,1]" ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "collapsed": false }, "outputs": [], "source": [ "## Load cleaned weight data\n", "gale_weight = read.xls(file.path(cleaned_dir, 'Gale_Weight_13-Jan-2016_final.xlsx'), header=T, \n", " as.is=T, na.strings=c(\"\",\" \", \"NA\", \"na\", \"#DIV/0!\"))\n", "\n", "## Load previously cleaned histology data\n", "hist_prev = read.xls(file.path(cleaned_dir, 'Gale_Histology_5-Jan-2016_final.xlsx'), header=T, \n", " as.is=T, na.strings=c(\"\",\" \", \"NA\", \"na\", \"#DIV/0!\"))" ] }, { "cell_type": "code", "execution_count": 4, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[1] \"Copy of 15-H943_fixedMM.xlsx\" \n", "[2] \"Copy of Copy of Comparative Pathology Program Consult Request 15-H1246 imaging_fixedMM.xlsx\"\n", "[3] \"Copy of Copy of Comparative Pathology Program Consult Request 15-H1393_fixedMM.xlsx\" \n", "[4] \"Copy of Copy of Comparative Pathology Program Consult Request 15-H1508_fixedMM.xlsx\" \n", "[5] \"Copy of Copy of Gale CC mice CPP request for prefilled 16-H94_fixedMM.xlsx\" \n", "[6] \"Copy of histo 15-H792 results_fixedMM.xlsx\" \n", "[7] \"New scoring 08 19 14_fixedMM.xlsx\" \n", "[8] \"Results 15-H501+15-H502 06 10 15_fixedMM.xlsx\" \n", "[9] \"Results 15-H585 06 17 15_fixedMM.xlsx\" \n" ] } ], "source": [ "## Get files to be processed\n", "hist_files = list.files(hist_dir, pattern=\".*\\\\.xlsx\")\n", "print(hist_files)" ] }, { "cell_type": "code", "execution_count": 5, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[1] \"Copy of 15-H943_fixedMM.xlsx\"\n", "[1] \"Copy of Copy of Comparative Pathology Program Consult Request 15-H1246 imaging_fixedMM.xlsx\"\n", "[1] \"Copy of Copy of Comparative Pathology Program Consult Request 15-H1393_fixedMM.xlsx\"\n", "[1] \"Copy of Copy of Comparative Pathology Program Consult Request 15-H1508_fixedMM.xlsx\"\n", "[1] \"Copy of Copy of Gale CC mice CPP request for prefilled 16-H94_fixedMM.xlsx\"\n", "[1] \"Copy of histo 15-H792 results_fixedMM.xlsx\"\n", "[1] \"New scoring 08 19 14_fixedMM.xlsx\"\n", "[1] \"Results 15-H501+15-H502 06 10 15_fixedMM.xlsx\"\n", "[1] \"Results 15-H585 06 17 15_fixedMM.xlsx\"\n" ] } ], "source": [ "## Read all files and combine into a single dataframe\n", "print(hist_files[1])\n", "hist_data_v2 = read.xls(file.path(hist_dir, hist_files[1]), header=T, \n", " as.is=T, na.strings=c(\"\",\" \", \"NA\", \"na\", \"#DIV/0!\"))\n", "colnames(hist_data_v2) = hist_cn[c(9, c(13:35))]\n", "\n", "for (file in hist_files[2:length(hist_files)]) {\n", " print(file)\n", " hist_data = read.xls(file.path(hist_dir, file), header=T, \n", " as.is=T, na.strings=c(\"\",\" \", \"NA\", \"na\", \"#DIV/0!\"))\n", " colnames(hist_data) = hist_cn[c(9, c(13:35))]\n", " hist_data_v2 = rbind(hist_data_v2, hist_data)\n", "}" ] }, { "cell_type": "code", "execution_count": 6, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "<ol class=list-inline>\n", "\t<li>616</li>\n", "\t<li>24</li>\n", "</ol>\n" ], "text/latex": [ "\\begin{enumerate*}\n", "\\item 616\n", "\\item 24\n", "\\end{enumerate*}\n" ], "text/markdown": [ "1. 616\n", "2. 24\n", "\n", "\n" ], "text/plain": [ "[1] 616 24" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "## Check dimensions of dataframe\n", "dim(hist_data_v2)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Step 4. Clean and Reformat the Data" ] }, { "cell_type": "code", "execution_count": 7, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[1] \"ID\"\n", "[1] \"Mating\"\n", "[1] \"RIX_ID\"\n", "[1] \"UW_Line\"\n", "[1] \"UWID\"\n", "[1] \"Virus\"\n", "[1] \"Tissue\"\n", "[1] \"Timepoint\"\n", "[1] \"Lab\"\n", "[1] \"Data_Altered\"\n", "[1] \"Notes\"\n", "[1] \"GI_Lesions\"\n" ] } ], "source": [ "## Create missing columns\n", "for (cn in setdiff(hist_cn, colnames(hist_data_v2))) {\n", " print(cn)\n", " hist_data_v2[,cn] = NA\n", "}\n", "\n", "## Order columns\n", "hist_data_v2 = hist_data_v2[, hist_cn]" ] }, { "cell_type": "code", "execution_count": 8, "metadata": { "collapsed": false }, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "Warning message:\n", "In eval(expr, envir, enclos): NAs introduced by coercion" ] } ], "source": [ "## Remove any leading or trailing spaces from slide_label\n", "hist_data_v2$slide_label = trim(hist_data_v2$slide_label)\n", "\n", "## Update UW Line\n", "hist_data_v2$UW_Line = sapply(hist_data_v2$slide_label, function(x){unlist(strsplit(x, \" \"))[1]})\n", "hist_data_v2$UW_Line = as.numeric(hist_data_v2$UW_Line)" ] }, { "cell_type": "code", "execution_count": 9, "metadata": { "collapsed": true }, "outputs": [], "source": [ "## Update UWID\n", "hist_data_v2$UWID = sapply(hist_data_v2$slide_label, function(x){unlist(strsplit(x, \" \"))[2]})\n", "hist_data_v2$UWID = gsub(\"m\", \"M\", hist_data_v2$UWID)\n", "\n", "## Update Lab\n", "hist_data_v2$Lab = 'Gale'" ] }, { "cell_type": "code", "execution_count": 10, "metadata": { "collapsed": true }, "outputs": [], "source": [ "## Update time points\n", "hist_data_v2$Timepoint = sapply(hist_data_v2$UWID, function(x){unlist(strsplit(x, \"\\\\.\"))[2]})\n", "\n", "## Use time points to update virus\n", "hist_data_v2$Virus = 'WNV'\n", "hist_data_v2$Virus[grepl(\"M\", hist_data_v2$Timepoint)] = 'Mock'\n", "hist_data_v2$Virus[grepl(\"m\", hist_data_v2$Timepoint)] = 'Mock'\n", "\n", "## Update time points\n", "hist_data_v2$Timepoint = gsub(\"M\", \"\", hist_data_v2$Timepoint)\n", "hist_data_v2$Timepoint = gsub(\"m\", \"\", hist_data_v2$Timepoint)\n", "hist_data_v2$Timepoint = as.numeric(hist_data_v2$Timepoint)" ] }, { "cell_type": "code", "execution_count": 11, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[1] \"line=3260x1566; rix_id=2\" \"line=3260x1566; rix_id=15\"\n" ] } ], "source": [ "## Update Mating and RIX_ID\n", "for (i in 1:dim(hist_data_v2)[1]) {\n", " line = hist_data_v2$UW_Line[i]\n", " if (!is.na(line)) {\n", " mating = gale_weight$Mating[with(gale_weight, UW_Line==hist_data_v2$UW_Line[i])]\n", " if (length(mating)>0) {\n", " hist_data_v2$Mating[i] = mating[1]\n", " rix_id = gale_weight$RIX_ID[with(gale_weight, UW_Line==hist_data_v2$UW_Line[i] & UWID==hist_data_v2$UWID[i])]\n", " if (length(rix_id)>0) {\n", " if (length(rix_id)>1) {\n", " ## If more than one ID matches the UWID, print the IDs\n", " print(paste0(\"line=\", mating[1], \"; rix_id=\", rix_id))\n", " }\n", " hist_data_v2$RIX_ID[i] = rix_id[1]\n", " }\n", " }\n", " }\n", "}" ] }, { "cell_type": "code", "execution_count": 12, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "<table>\n", "<thead><tr><th></th><th scope=col>ID</th><th scope=col>Mating</th><th scope=col>RIX_ID</th><th scope=col>UW_Line</th><th scope=col>UWID</th><th scope=col>Virus</th><th scope=col>Tissue</th><th scope=col>Timepoint</th><th scope=col>slide_label</th><th scope=col>Lab</th><th scope=col>Data_Altered</th><th scope=col>Notes</th></tr></thead>\n", "<tbody>\n", "\t<tr><th scope=row>273</th><td>NA</td><td>3260x1566</td><td>2</td><td>100</td><td>2.12</td><td>WNV</td><td>NA</td><td>12</td><td>100 2.12</td><td>Gale</td><td>NA</td><td>NA</td></tr>\n", "\t<tr><th scope=row>274</th><td>NA</td><td>3260x1566</td><td>8</td><td>100</td><td>1.12M</td><td>Mock</td><td>NA</td><td>12</td><td>100 1.12M</td><td>Gale</td><td>NA</td><td>NA</td></tr>\n", "\t<tr><th scope=row>275</th><td>NA</td><td>3260x1566</td><td>9</td><td>100</td><td>2.12M</td><td>Mock</td><td>NA</td><td>12</td><td>100 2.12M</td><td>Gale</td><td>NA</td><td>NA</td></tr>\n", "\t<tr><th scope=row>276</th><td>NA</td><td>3260x1566</td><td>16</td><td>100</td><td>3.12M</td><td>Mock</td><td>NA</td><td>12</td><td>100 3.12M</td><td>Gale</td><td>NA</td><td>NA</td></tr>\n", "\t<tr><th scope=row>331</th><td>NA</td><td>3260x1566</td><td>NA</td><td>100</td><td>1.4</td><td>WNV</td><td>NA</td><td>4</td><td>100 1.4</td><td>Gale</td><td>NA</td><td>NA</td></tr>\n", "\t<tr><th scope=row>332</th><td>NA</td><td>3260x1566</td><td>NA</td><td>100</td><td>2.4</td><td>WNV</td><td>NA</td><td>4</td><td>100 2.4</td><td>Gale</td><td>NA</td><td>NA</td></tr>\n", "\t<tr><th scope=row>333</th><td>NA</td><td>3260x1566</td><td>NA</td><td>100</td><td>3.4</td><td>WNV</td><td>NA</td><td>4</td><td>100 3.4</td><td>Gale</td><td>NA</td><td>NA</td></tr>\n", "\t<tr><th scope=row>364</th><td>NA</td><td>3260x1566</td><td>NA</td><td>100</td><td>1.7</td><td>WNV</td><td>NA</td><td>7</td><td>100 1.7</td><td>Gale</td><td>NA</td><td>NA</td></tr>\n", "\t<tr><th scope=row>365</th><td>NA</td><td>3260x1566</td><td>NA</td><td>100</td><td>2.7</td><td>WNV</td><td>NA</td><td>7</td><td>100 2.7</td><td>Gale</td><td>NA</td><td>NA</td></tr>\n", "\t<tr><th scope=row>366</th><td>NA</td><td>3260x1566</td><td>NA</td><td>100</td><td>3.7</td><td>WNV</td><td>NA</td><td>7</td><td>100 3.7</td><td>Gale</td><td>NA</td><td>NA</td></tr>\n", "</tbody>\n", "</table>\n" ], "text/latex": [ "\\begin{tabular}{r|llllllllllll}\n", " & ID & Mating & RIX_ID & UW_Line & UWID & Virus & Tissue & Timepoint & slide_label & Lab & Data_Altered & Notes\\\\\n", "\\hline\n", "\t273 & NA & 3260x1566 & 2 & 100 & 2.12 & WNV & NA & 12 & 100 2.12 & Gale & NA & NA\\\\\n", "\t274 & NA & 3260x1566 & 8 & 100 & 1.12M & Mock & NA & 12 & 100 1.12M & Gale & NA & NA\\\\\n", "\t275 & NA & 3260x1566 & 9 & 100 & 2.12M & Mock & NA & 12 & 100 2.12M & Gale & NA & NA\\\\\n", "\t276 & NA & 3260x1566 & 16 & 100 & 3.12M & Mock & NA & 12 & 100 3.12M & Gale & NA & NA\\\\\n", "\t331 & NA & 3260x1566 & NA & 100 & 1.4 & WNV & NA & 4 & 100 1.4 & Gale & NA & NA\\\\\n", "\t332 & NA & 3260x1566 & NA & 100 & 2.4 & WNV & NA & 4 & 100 2.4 & Gale & NA & NA\\\\\n", "\t333 & NA & 3260x1566 & NA & 100 & 3.4 & WNV & NA & 4 & 100 3.4 & Gale & NA & NA\\\\\n", "\t364 & NA & 3260x1566 & NA & 100 & 1.7 & WNV & NA & 7 & 100 1.7 & Gale & NA & NA\\\\\n", "\t365 & NA & 3260x1566 & NA & 100 & 2.7 & WNV & NA & 7 & 100 2.7 & Gale & NA & NA\\\\\n", "\t366 & NA & 3260x1566 & NA & 100 & 3.7 & WNV & NA & 7 & 100 3.7 & Gale & NA & NA\\\\\n", "\\end{tabular}\n" ], "text/plain": [ " ID Mating RIX_ID UW_Line UWID Virus Tissue Timepoint slide_label Lab\n", "273 NA 3260x1566 2 100 2.12 WNV NA 12 100 2.12 Gale\n", "274 NA 3260x1566 8 100 1.12M Mock NA 12 100 1.12M Gale\n", "275 NA 3260x1566 9 100 2.12M Mock NA 12 100 2.12M Gale\n", "276 NA 3260x1566 16 100 3.12M Mock NA 12 100 3.12M Gale\n", "331 NA 3260x1566 NA 100 1.4 WNV NA 4 100 1.4 Gale\n", "332 NA 3260x1566 NA 100 2.4 WNV NA 4 100 2.4 Gale\n", "333 NA 3260x1566 NA 100 3.4 WNV NA 4 100 3.4 Gale\n", "364 NA 3260x1566 NA 100 1.7 WNV NA 7 100 1.7 Gale\n", "365 NA 3260x1566 NA 100 2.7 WNV NA 7 100 2.7 Gale\n", "366 NA 3260x1566 NA 100 3.7 WNV NA 7 100 3.7 Gale\n", " Data_Altered Notes\n", "273 NA NA\n", "274 NA NA\n", "275 NA NA\n", "276 NA NA\n", "331 NA NA\n", "332 NA NA\n", "333 NA NA\n", "364 NA NA\n", "365 NA NA\n", "366 NA NA" ] }, "execution_count": 12, "metadata": {}, "output_type": "execute_result" } ], "source": [ "## Check Matings and IDs of the discrepancies\n", "## Make manual corrections if necessary\n", "hist_data_v2[hist_data_v2$Mating=='3260x1566' & !is.na(hist_data_v2$Mating), 1:12]" ] }, { "cell_type": "code", "execution_count": 13, "metadata": { "collapsed": true }, "outputs": [], "source": [ "## Update ID\n", "hist_data_v2$ID = paste(hist_data_v2$Mating, hist_data_v2$RIX_ID, sep=\"_\")\n", "hist_data_v2$ID[grepl(\"NA\", hist_data_v2$ID)] = NA" ] }, { "cell_type": "code", "execution_count": 14, "metadata": { "collapsed": false }, "outputs": [], "source": [ "## Update tissue\n", "hist_data_v2$Tissue = 'Brain'\n", "for (i in 1:length(hist_data_v2$slide_label)) {\n", " label = hist_data_v2$slide_label[i]\n", " if (grepl(\"Br\", label)) {hist_data_v2$Tissue[i] = 'Brain'}\n", " if (grepl(\"LI\", label)) {hist_data_v2$Tissue[i] = 'Large Intestine'}\n", " if (grepl(\"SI\", label)) {hist_data_v2$Tissue[i] = 'Small Intestine'}\n", " if (grepl(\"Stom\", label)) {hist_data_v2$Tissue[i] = 'Stomach'}\n", " if (grepl(\"[Cc]ec\", label)) {hist_data_v2$Tissue[i] = 'Cecum'}\n", "}" ] }, { "cell_type": "code", "execution_count": 15, "metadata": { "collapsed": true }, "outputs": [], "source": [ "## Update GI_Lesions\n", "hist_data_v2$GI_Lesions = NA\n", "for (i in 1:length(hist_data_v2$Tissue)) {\n", " tissue = hist_data_v2$Tissue[i]\n", " if (tissue != 'Brain') {\n", " hist_data_v2$GI_Lesions[i] = hist_data_v2$Cortex_PV_Inf[i]\n", " hist_data_v2$Cortex_PV_Inf[i] = NA\n", " }\n", "}\n", "\n", "hist_data_v2$Cortex_PV_Inf = as.numeric(hist_data_v2$Cortex_PV_Inf)\n", "hist_data_v2$GI_Lesions[hist_data_v2$GI_Lesions == 'NSL'] = 0" ] }, { "cell_type": "code", "execution_count": 16, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "\n", " 0 1 \n", "98 22 " ] }, "execution_count": 16, "metadata": {}, "output_type": "execute_result" } ], "source": [ "table(hist_data_v2$GI_Lesions)" ] }, { "cell_type": "code", "execution_count": 17, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "<table>\n", "<thead><tr><th></th><th scope=col>ID</th><th scope=col>Mating</th><th scope=col>RIX_ID</th><th scope=col>UW_Line</th><th scope=col>UWID</th><th scope=col>Virus</th><th scope=col>Tissue</th><th scope=col>Timepoint</th><th scope=col>slide_label</th><th scope=col>Lab</th><th scope=col>Data_Altered</th><th scope=col>Notes</th><th scope=col>Cortex_PV_Inf</th><th scope=col>Cortex_P_Inf</th><th scope=col>Cortex_hm</th><th scope=col>Cortex_nn</th><th scope=col>Cortex_Subtotal</th><th scope=col>Hippocampus_PV_Inf</th><th scope=col>Hippocampus_P_Inf</th><th scope=col>Hippocampus_hm</th><th scope=col>Hippocampus_nn</th><th scope=col>Hippocampus_Subtotal</th><th scope=col>Thalamus_Midbrain_PV_Inf</th><th scope=col>Thalamus_Midbrain_P_Inf</th><th scope=col>Thalamus_Midbrain_hm</th><th scope=col>Thalamus_Midbrain_nn</th><th scope=col>Thalamus_Midbrain_Subtotal</th><th scope=col>Cerebellum_PV_Inf</th><th scope=col>Cerebellum_P_Inf</th><th scope=col>Cerebellum_hm</th><th scope=col>Cerebellum_nn</th><th scope=col>Cerebellum_Subtotal</th><th scope=col>Meningitis</th><th scope=col>Total</th><th scope=col>Comment</th><th scope=col>GI_Lesions</th></tr></thead>\n", "<tbody>\n", "\t<tr><th scope=row>397</th><td>NA</td><td>NA</td><td>NA</td><td>NA</td><td>Mock</td><td>WNV</td><td>Brain</td><td>NA</td><td>STI Mock K</td><td>Gale</td><td>NA</td><td>NA</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>KIDNEY- no signifincat lesions; mild artifacts MALE</td><td>NA</td></tr>\n", "\t<tr><th scope=row>398</th><td>NA</td><td>NA</td><td>NA</td><td>NA</td><td>3.6</td><td>WNV</td><td>Brain</td><td>6</td><td>STI 3.6 K</td><td>Gale</td><td>NA</td><td>NA</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>NA</td><td>NA</td></tr>\n", "</tbody>\n", "</table>\n" ], "text/latex": [ "\\begin{tabular}{r|llllllllllllllllllllllllllllllllllll}\n", " & ID & Mating & RIX_ID & UW_Line & UWID & Virus & Tissue & Timepoint & slide_label & Lab & Data_Altered & Notes & Cortex_PV_Inf & Cortex_P_Inf & Cortex_hm & Cortex_nn & Cortex_Subtotal & Hippocampus_PV_Inf & Hippocampus_P_Inf & Hippocampus_hm & Hippocampus_nn & Hippocampus_Subtotal & Thalamus_Midbrain_PV_Inf & Thalamus_Midbrain_P_Inf & Thalamus_Midbrain_hm & Thalamus_Midbrain_nn & Thalamus_Midbrain_Subtotal & Cerebellum_PV_Inf & Cerebellum_P_Inf & Cerebellum_hm & Cerebellum_nn & Cerebellum_Subtotal & Meningitis & Total & Comment & GI_Lesions\\\\\n", "\\hline\n", "\t397 & NA & NA & NA & NA & Mock & WNV & Brain & NA & STI Mock K & Gale & NA & NA & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & KIDNEY- no signifincat lesions; mild artifacts MALE & NA\\\\\n", "\t398 & NA & NA & NA & NA & 3.6 & WNV & Brain & 6 & STI 3.6 K & Gale & NA & NA & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & NA & NA\\\\\n", "\\end{tabular}\n" ], "text/plain": [ " ID Mating RIX_ID UW_Line UWID Virus Tissue Timepoint slide_label Lab\n", "397 <NA> <NA> NA NA Mock WNV Brain NA STI Mock K Gale\n", "398 <NA> <NA> NA NA 3.6 WNV Brain 6 STI 3.6 K Gale\n", " Data_Altered Notes Cortex_PV_Inf Cortex_P_Inf Cortex_hm Cortex_nn\n", "397 NA NA 0 0 0 0\n", "398 NA NA 0 0 0 0\n", " Cortex_Subtotal Hippocampus_PV_Inf Hippocampus_P_Inf Hippocampus_hm\n", "397 0 0 0 0\n", "398 0 0 0 0\n", " Hippocampus_nn Hippocampus_Subtotal Thalamus_Midbrain_PV_Inf\n", "397 0 0 0\n", "398 0 0 0\n", " Thalamus_Midbrain_P_Inf Thalamus_Midbrain_hm Thalamus_Midbrain_nn\n", "397 0 0 0\n", "398 0 0 0\n", " Thalamus_Midbrain_Subtotal Cerebellum_PV_Inf Cerebellum_P_Inf Cerebellum_hm\n", "397 0 0 0 0\n", "398 0 0 0 0\n", " Cerebellum_nn Cerebellum_Subtotal Meningitis Total\n", "397 0 0 0 0\n", "398 0 0 0 0\n", " Comment GI_Lesions\n", "397 KIDNEY- no signifincat lesions; mild artifacts MALE <NA>\n", "398 <NA> <NA>" ] }, "execution_count": 17, "metadata": {}, "output_type": "execute_result" } ], "source": [ "## Fix STI sample\n", "hist_data_v2[grepl(\"STI\", hist_data_v2$slide_label),]" ] }, { "cell_type": "code", "execution_count": 18, "metadata": { "collapsed": true }, "outputs": [], "source": [ "## Fix STI sample\n", "hist_data_v2$UWID[397] = NA\n", "hist_data_v2$Virus[397] = 'Mock'\n", "hist_data_v2$Tissue[397] = 'Kidney'\n", "hist_data_v2$Tissue[398] = 'Kidney'" ] }, { "cell_type": "code", "execution_count": 19, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "0" ], "text/latex": [ "0" ], "text/markdown": [ "0" ], "text/plain": [ "[1] 0" ] }, "execution_count": 19, "metadata": {}, "output_type": "execute_result" } ], "source": [ "## Check for duplicates\n", "sum(duplicated(hist_data_v2$ID[!is.na(hist_data_v2$ID)]))" ] }, { "cell_type": "code", "execution_count": 20, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [], "text/latex": [], "text/markdown": [], "text/plain": [ "character(0)" ] }, "execution_count": 20, "metadata": {}, "output_type": "execute_result" } ], "source": [ "## Duplicated IDs\n", "hist_data_v2$ID[duplicated(hist_data_v2$ID) & !is.na(hist_data_v2$ID)]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Step 5. Fix Missing Values & Calculate Totals" ] }, { "cell_type": "code", "execution_count": 21, "metadata": { "collapsed": false }, "outputs": [], "source": [ "cortex_cols = c('Cortex_PV_Inf','Cortex_P_Inf','Cortex_hm','Cortex_nn')\n", "hippocampus_cols = c('Hippocampus_PV_Inf','Hippocampus_P_Inf','Hippocampus_hm','Hippocampus_nn')\n", "thalamus_cols = c('Thalamus_Midbrain_PV_Inf','Thalamus_Midbrain_P_Inf','Thalamus_Midbrain_hm','Thalamus_Midbrain_nn')\n", "cerebellum_cols = c('Cerebellum_PV_Inf','Cerebellum_P_Inf','Cerebellum_hm','Cerebellum_nn')\n", "subtotal_cols = c('Cortex_Subtotal','Hippocampus_Subtotal','Thalamus_Midbrain_Subtotal','Cerebellum_Subtotal','Meningitis')\n", "\n", "## Fill in zeros if not NA\n", "for (i in 1:dim(hist_data_v2)[1]) {\n", " if (!is.na(hist_data_v2[i, 'Cortex_Subtotal']) & hist_data_v2[i, 'Cortex_Subtotal'] == 0) {\n", " hist_data_v2[i,cortex_cols] = c(0,0,0,0)\n", " }\n", " if (!is.na(hist_data_v2[i, 'Hippocampus_Subtotal']) & hist_data_v2[i, 'Hippocampus_Subtotal'] == 0) {\n", " hist_data_v2[i,hippocampus_cols] = c(0,0,0,0)\n", " }\n", " if (!is.na(hist_data_v2[i, 'Thalamus_Midbrain_Subtotal']) & hist_data_v2[i, 'Thalamus_Midbrain_Subtotal'] == 0) {\n", " hist_data_v2[i,thalamus_cols] = c(0,0,0,0)\n", " }\n", " if (!is.na(hist_data_v2[i, 'Cerebellum_Subtotal']) & hist_data_v2[i, 'Cerebellum_Subtotal'] == 0) {\n", " hist_data_v2[i,cerebellum_cols] = c(0,0,0,0)\n", " }\n", " if (!is.na(hist_data_v2[i, 'Total']) & hist_data_v2[i, 'Total'] == 0) {\n", " hist_data_v2[i,subtotal_cols] = c(0,0,0,0,0)\n", " }\n", "}" ] }, { "cell_type": "code", "execution_count": 22, "metadata": { "collapsed": false }, "outputs": [], "source": [ "hist_data_v2$Cortex_Subtotal = apply(hist_data_v2[,cortex_cols], 1, function(x){sum(x)})\n", "hist_data_v2$Hippocampus_Subtotal = apply(hist_data_v2[,hippocampus_cols], 1, function(x){sum(x)})\n", "hist_data_v2$Thalamus_Midbrain_Subtotal = apply(hist_data_v2[,thalamus_cols], 1, function(x){sum(x)})\n", "hist_data_v2$Cerebellum_Subtotal = apply(hist_data_v2[,cerebellum_cols], 1, function(x){sum(x)})\n", "hist_data_v2$Total = apply(hist_data_v2[,subtotal_cols], 1, function(x){sum(x)})" ] }, { "cell_type": "code", "execution_count": 23, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "<table>\n", "<thead><tr><th></th><th scope=col>ID</th><th scope=col>Mating</th><th scope=col>RIX_ID</th><th scope=col>UW_Line</th><th scope=col>UWID</th><th scope=col>Virus</th><th scope=col>Tissue</th><th scope=col>Timepoint</th><th scope=col>slide_label</th><th scope=col>Lab</th><th scope=col>Data_Altered</th><th scope=col>Notes</th><th scope=col>Cortex_PV_Inf</th><th scope=col>Cortex_P_Inf</th><th scope=col>Cortex_hm</th><th scope=col>Cortex_nn</th><th scope=col>Cortex_Subtotal</th><th scope=col>Hippocampus_PV_Inf</th><th scope=col>Hippocampus_P_Inf</th><th scope=col>Hippocampus_hm</th><th scope=col>Hippocampus_nn</th><th scope=col>Hippocampus_Subtotal</th><th scope=col>Thalamus_Midbrain_PV_Inf</th><th scope=col>Thalamus_Midbrain_P_Inf</th><th scope=col>Thalamus_Midbrain_hm</th><th scope=col>Thalamus_Midbrain_nn</th><th scope=col>Thalamus_Midbrain_Subtotal</th><th scope=col>Cerebellum_PV_Inf</th><th scope=col>Cerebellum_P_Inf</th><th scope=col>Cerebellum_hm</th><th scope=col>Cerebellum_nn</th><th scope=col>Cerebellum_Subtotal</th><th scope=col>Meningitis</th><th scope=col>Total</th><th scope=col>Comment</th><th scope=col>GI_Lesions</th></tr></thead>\n", "<tbody>\n", "\t<tr><th scope=row>1</th><td>3154x16012_37</td><td>3154x16012</td><td>37</td><td>83</td><td>1.7</td><td>WNV</td><td>Brain</td><td>7</td><td>083 1.7</td><td>Gale</td><td>NA</td><td>NA</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>NA</td><td>NA</td></tr>\n", "\t<tr><th scope=row>2</th><td>3154x16012_38</td><td>3154x16012</td><td>38</td><td>83</td><td>2.7</td><td>WNV</td><td>Brain</td><td>7</td><td>083 2.7</td><td>Gale</td><td>NA</td><td>NA</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>NA</td><td>NA</td></tr>\n", "\t<tr><th scope=row>3</th><td>3154x16012_45</td><td>3154x16012</td><td>45</td><td>83</td><td>3.7</td><td>WNV</td><td>Brain</td><td>7</td><td>083 3.7</td><td>Gale</td><td>NA</td><td>NA</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>NA</td><td>NA</td></tr>\n", "\t<tr><th scope=row>4</th><td>16072x5346_61</td><td>16072x5346</td><td>61</td><td>81</td><td>1.12</td><td>WNV</td><td>Brain</td><td>12</td><td>081 1.12</td><td>Gale</td><td>NA</td><td>NA</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>NA</td><td>NA</td></tr>\n", "\t<tr><th scope=row>5</th><td>16072x5346_62</td><td>16072x5346</td><td>62</td><td>81</td><td>2.12</td><td>WNV</td><td>Brain</td><td>12</td><td>081 2.12</td><td>Gale</td><td>NA</td><td>NA</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>NA</td><td>NA</td></tr>\n", "\t<tr><th scope=row>6</th><td>16072x5346_68</td><td>16072x5346</td><td>68</td><td>81</td><td>3.12</td><td>WNV</td><td>Brain</td><td>12</td><td>081 3.12</td><td>Gale</td><td>NA</td><td>NA</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>NA</td><td>NA</td></tr>\n", "\t<tr><th scope=row>7</th><td>16072x5346_63</td><td>16072x5346</td><td>63</td><td>81</td><td>1.12M</td><td>Mock</td><td>Brain</td><td>12</td><td>081 1.12M</td><td>Gale</td><td>NA</td><td>NA</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>NA</td><td>NA</td></tr>\n", "\t<tr><th scope=row>8</th><td>16072x5346_70</td><td>16072x5346</td><td>70</td><td>81</td><td>2.12M</td><td>Mock</td><td>Brain</td><td>12</td><td>081 2.12M</td><td>Gale</td><td>NA</td><td>NA</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>NA</td><td>NA</td></tr>\n", "\t<tr><th scope=row>9</th><td>16072x5346_71</td><td>16072x5346</td><td>71</td><td>81</td><td>3.12M</td><td>Mock</td><td>Brain</td><td>12</td><td>081 3.12M</td><td>Gale</td><td>NA</td><td>NA</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>NA</td><td>NA</td></tr>\n", "\t<tr><th scope=row>10</th><td>5358x8046_71</td><td>5358x8046</td><td>71</td><td>73</td><td>1.12</td><td>WNV</td><td>Brain</td><td>12</td><td>073 1.12</td><td>Gale</td><td>NA</td><td>NA</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>NA</td><td>NA</td></tr>\n", "</tbody>\n", "</table>\n" ], "text/latex": [ "\\begin{tabular}{r|llllllllllllllllllllllllllllllllllll}\n", " & ID & Mating & RIX_ID & UW_Line & UWID & Virus & Tissue & Timepoint & slide_label & Lab & Data_Altered & Notes & Cortex_PV_Inf & Cortex_P_Inf & Cortex_hm & Cortex_nn & Cortex_Subtotal & Hippocampus_PV_Inf & Hippocampus_P_Inf & Hippocampus_hm & Hippocampus_nn & Hippocampus_Subtotal & Thalamus_Midbrain_PV_Inf & Thalamus_Midbrain_P_Inf & Thalamus_Midbrain_hm & Thalamus_Midbrain_nn & Thalamus_Midbrain_Subtotal & Cerebellum_PV_Inf & Cerebellum_P_Inf & Cerebellum_hm & Cerebellum_nn & Cerebellum_Subtotal & Meningitis & Total & Comment & GI_Lesions\\\\\n", "\\hline\n", "\t1 & 3154x16012_37 & 3154x16012 & 37 & 83 & 1.7 & WNV & Brain & 7 & 083 1.7 & Gale & NA & NA & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & NA & NA\\\\\n", "\t2 & 3154x16012_38 & 3154x16012 & 38 & 83 & 2.7 & WNV & Brain & 7 & 083 2.7 & Gale & NA & NA & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & NA & NA\\\\\n", "\t3 & 3154x16012_45 & 3154x16012 & 45 & 83 & 3.7 & WNV & Brain & 7 & 083 3.7 & Gale & NA & NA & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & NA & NA\\\\\n", "\t4 & 16072x5346_61 & 16072x5346 & 61 & 81 & 1.12 & WNV & Brain & 12 & 081 1.12 & Gale & NA & NA & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & NA & NA\\\\\n", "\t5 & 16072x5346_62 & 16072x5346 & 62 & 81 & 2.12 & WNV & Brain & 12 & 081 2.12 & Gale & NA & NA & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & NA & NA\\\\\n", "\t6 & 16072x5346_68 & 16072x5346 & 68 & 81 & 3.12 & WNV & Brain & 12 & 081 3.12 & Gale & NA & NA & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & NA & NA\\\\\n", "\t7 & 16072x5346_63 & 16072x5346 & 63 & 81 & 1.12M & Mock & Brain & 12 & 081 1.12M & Gale & NA & NA & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & NA & NA\\\\\n", "\t8 & 16072x5346_70 & 16072x5346 & 70 & 81 & 2.12M & Mock & Brain & 12 & 081 2.12M & Gale & NA & NA & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & NA & NA\\\\\n", "\t9 & 16072x5346_71 & 16072x5346 & 71 & 81 & 3.12M & Mock & Brain & 12 & 081 3.12M & Gale & NA & NA & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & NA & NA\\\\\n", "\t10 & 5358x8046_71 & 5358x8046 & 71 & 73 & 1.12 & WNV & Brain & 12 & 073 1.12 & Gale & NA & NA & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & NA & NA\\\\\n", "\\end{tabular}\n" ], "text/plain": [ " ID Mating RIX_ID UW_Line UWID Virus Tissue Timepoint\n", "1 3154x16012_37 3154x16012 37 83 1.7 WNV Brain 7\n", "2 3154x16012_38 3154x16012 38 83 2.7 WNV Brain 7\n", "3 3154x16012_45 3154x16012 45 83 3.7 WNV Brain 7\n", "4 16072x5346_61 16072x5346 61 81 1.12 WNV Brain 12\n", "5 16072x5346_62 16072x5346 62 81 2.12 WNV Brain 12\n", "6 16072x5346_68 16072x5346 68 81 3.12 WNV Brain 12\n", "7 16072x5346_63 16072x5346 63 81 1.12M Mock Brain 12\n", "8 16072x5346_70 16072x5346 70 81 2.12M Mock Brain 12\n", "9 16072x5346_71 16072x5346 71 81 3.12M Mock Brain 12\n", "10 5358x8046_71 5358x8046 71 73 1.12 WNV Brain 12\n", " slide_label Lab Data_Altered Notes Cortex_PV_Inf Cortex_P_Inf Cortex_hm\n", "1 083 1.7 Gale NA NA 0 0 0\n", "2 083 2.7 Gale NA NA 0 0 0\n", "3 083 3.7 Gale NA NA 0 0 0\n", "4 081 1.12 Gale NA NA 0 0 0\n", "5 081 2.12 Gale NA NA 0 0 0\n", "6 081 3.12 Gale NA NA 0 0 0\n", "7 081 1.12M Gale NA NA 0 0 0\n", "8 081 2.12M Gale NA NA 0 0 0\n", "9 081 3.12M Gale NA NA 0 0 0\n", "10 073 1.12 Gale NA NA 0 0 0\n", " Cortex_nn Cortex_Subtotal Hippocampus_PV_Inf Hippocampus_P_Inf\n", "1 0 0 0 0\n", "2 0 0 0 0\n", "3 0 0 0 0\n", "4 0 0 0 0\n", "5 0 0 0 0\n", "6 0 0 0 0\n", "7 0 0 0 0\n", "8 0 0 0 0\n", "9 0 0 0 0\n", "10 0 0 0 0\n", " Hippocampus_hm Hippocampus_nn Hippocampus_Subtotal Thalamus_Midbrain_PV_Inf\n", "1 0 0 0 0\n", "2 0 0 0 0\n", "3 0 0 0 0\n", "4 0 0 0 0\n", "5 0 0 0 0\n", "6 0 0 0 0\n", "7 0 0 0 0\n", "8 0 0 0 0\n", "9 0 0 0 0\n", "10 0 0 0 0\n", " Thalamus_Midbrain_P_Inf Thalamus_Midbrain_hm Thalamus_Midbrain_nn\n", "1 0 0 0\n", "2 0 0 0\n", "3 0 0 0\n", "4 0 0 0\n", "5 0 0 0\n", "6 0 0 0\n", "7 0 0 0\n", "8 0 0 0\n", "9 0 0 0\n", "10 0 0 0\n", " Thalamus_Midbrain_Subtotal Cerebellum_PV_Inf Cerebellum_P_Inf Cerebellum_hm\n", "1 0 0 0 0\n", "2 0 0 0 0\n", "3 0 0 0 0\n", "4 0 0 0 0\n", "5 0 0 0 0\n", "6 0 0 0 0\n", "7 0 0 0 0\n", "8 0 0 0 0\n", "9 0 0 0 0\n", "10 0 0 0 0\n", " Cerebellum_nn Cerebellum_Subtotal Meningitis Total Comment GI_Lesions\n", "1 0 0 0 0 <NA> <NA>\n", "2 0 0 0 0 <NA> <NA>\n", "3 0 0 0 0 <NA> <NA>\n", "4 0 0 0 0 <NA> <NA>\n", "5 0 0 0 0 <NA> <NA>\n", "6 0 0 0 0 <NA> <NA>\n", "7 0 0 0 0 <NA> <NA>\n", "8 0 0 0 0 <NA> <NA>\n", "9 0 0 0 0 <NA> <NA>\n", "10 0 0 0 0 <NA> <NA>" ] }, "execution_count": 23, "metadata": {}, "output_type": "execute_result" } ], "source": [ "head(hist_data_v2, 10)" ] }, { "cell_type": "code", "execution_count": 24, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "<table>\n", "<thead><tr><th></th><th scope=col>ID</th><th scope=col>Mating</th><th scope=col>RIX_ID</th><th scope=col>UW_Line</th><th scope=col>UWID</th><th scope=col>Virus</th><th scope=col>Tissue</th><th scope=col>Timepoint</th><th scope=col>slide_label</th><th scope=col>Lab</th><th scope=col>Data_Altered</th><th scope=col>Notes</th><th scope=col>Cortex_PV_Inf</th><th scope=col>Cortex_P_Inf</th><th scope=col>Cortex_hm</th><th scope=col>Cortex_nn</th><th scope=col>Cortex_Subtotal</th><th scope=col>Hippocampus_PV_Inf</th><th scope=col>Hippocampus_P_Inf</th><th scope=col>Hippocampus_hm</th><th scope=col>Hippocampus_nn</th><th scope=col>Hippocampus_Subtotal</th><th scope=col>Thalamus_Midbrain_PV_Inf</th><th scope=col>Thalamus_Midbrain_P_Inf</th><th scope=col>Thalamus_Midbrain_hm</th><th scope=col>Thalamus_Midbrain_nn</th><th scope=col>Thalamus_Midbrain_Subtotal</th><th scope=col>Cerebellum_PV_Inf</th><th scope=col>Cerebellum_P_Inf</th><th scope=col>Cerebellum_hm</th><th scope=col>Cerebellum_nn</th><th scope=col>Cerebellum_Subtotal</th><th scope=col>Meningitis</th><th scope=col>Total</th><th scope=col>Comment</th><th scope=col>GI_Lesions</th></tr></thead>\n", "<tbody>\n", "</tbody>\n", "</table>\n" ], "text/latex": [ "\\begin{tabular}{r|llllllllllllllllllllllllllllllllllll}\n", " & ID & Mating & RIX_ID & UW_Line & UWID & Virus & Tissue & Timepoint & slide_label & Lab & Data_Altered & Notes & Cortex_PV_Inf & Cortex_P_Inf & Cortex_hm & Cortex_nn & Cortex_Subtotal & Hippocampus_PV_Inf & Hippocampus_P_Inf & Hippocampus_hm & Hippocampus_nn & Hippocampus_Subtotal & Thalamus_Midbrain_PV_Inf & Thalamus_Midbrain_P_Inf & Thalamus_Midbrain_hm & Thalamus_Midbrain_nn & Thalamus_Midbrain_Subtotal & Cerebellum_PV_Inf & Cerebellum_P_Inf & Cerebellum_hm & Cerebellum_nn & Cerebellum_Subtotal & Meningitis & Total & Comment & GI_Lesions\\\\\n", "\\hline\n", "\\end{tabular}\n" ], "text/plain": [ " [1] ID Mating \n", " [3] RIX_ID UW_Line \n", " [5] UWID Virus \n", " [7] Tissue Timepoint \n", " [9] slide_label Lab \n", "[11] Data_Altered Notes \n", "[13] Cortex_PV_Inf Cortex_P_Inf \n", "[15] Cortex_hm Cortex_nn \n", "[17] Cortex_Subtotal Hippocampus_PV_Inf \n", "[19] Hippocampus_P_Inf Hippocampus_hm \n", "[21] Hippocampus_nn Hippocampus_Subtotal \n", "[23] Thalamus_Midbrain_PV_Inf Thalamus_Midbrain_P_Inf \n", "[25] Thalamus_Midbrain_hm Thalamus_Midbrain_nn \n", "[27] Thalamus_Midbrain_Subtotal Cerebellum_PV_Inf \n", "[29] Cerebellum_P_Inf Cerebellum_hm \n", "[31] Cerebellum_nn Cerebellum_Subtotal \n", "[33] Meningitis Total \n", "[35] Comment GI_Lesions \n", "<0 rows> (or 0-length row.names)" ] }, "execution_count": 24, "metadata": {}, "output_type": "execute_result" } ], "source": [ "## Check for mocks with scores > 0\n", "hist_data_v2[!is.na(hist_data_v2$Total) & hist_data_v2$Total != 0 & hist_data_v2$Virus=='Mock',]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Step 6. Combine with Previous Data" ] }, { "cell_type": "code", "execution_count": 25, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [], "text/latex": [], "text/markdown": [], "text/plain": [ "character(0)" ] }, "execution_count": 25, "metadata": {}, "output_type": "execute_result" } ], "source": [ "## Get IDs duplicated in new data\n", "dup_ids = intersect(hist_data_v2$ID, hist_prev$ID)[!is.na(intersect(hist_data_v2$ID, hist_prev$ID))]\n", "dup_ids" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "hist_data_v2[hist_data_v2$ID %in% dup_ids,]" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "hist_prev[hist_prev$ID %in% dup_ids,]" ] }, { "cell_type": "code", "execution_count": 26, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "<ol class=list-inline>\n", "\t<li>545</li>\n", "\t<li>34</li>\n", "</ol>\n" ], "text/latex": [ "\\begin{enumerate*}\n", "\\item 545\n", "\\item 34\n", "\\end{enumerate*}\n" ], "text/markdown": [ "1. 545\n", "2. 34\n", "\n", "\n" ], "text/plain": [ "[1] 545 34" ] }, "execution_count": 26, "metadata": {}, "output_type": "execute_result" } ], "source": [ "dim(hist_prev)" ] }, { "cell_type": "code", "execution_count": 27, "metadata": { "collapsed": false }, "outputs": [], "source": [ "hist_prev = hist_prev[!hist_prev$ID %in% dup_ids,]" ] }, { "cell_type": "code", "execution_count": 28, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "<ol class=list-inline>\n", "\t<li>545</li>\n", "\t<li>34</li>\n", "</ol>\n" ], "text/latex": [ "\\begin{enumerate*}\n", "\\item 545\n", "\\item 34\n", "\\end{enumerate*}\n" ], "text/markdown": [ "1. 545\n", "2. 34\n", "\n", "\n" ], "text/plain": [ "[1] 545 34" ] }, "execution_count": 28, "metadata": {}, "output_type": "execute_result" } ], "source": [ "dim(hist_prev)" ] }, { "cell_type": "code", "execution_count": 29, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[1] \"Tissue\"\n", "[1] \"GI_Lesions\"\n" ] } ], "source": [ "for (cn in setdiff(colnames(hist_data_v2), colnames(hist_prev))) {\n", " print(cn)\n", " hist_prev[,cn] = NA\n", "}" ] }, { "cell_type": "code", "execution_count": 30, "metadata": { "collapsed": true }, "outputs": [], "source": [ "gale_hist = rbind(hist_prev[,colnames(hist_data_v2)], hist_data_v2[,colnames(hist_data_v2)])" ] }, { "cell_type": "code", "execution_count": 31, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "<ol class=list-inline>\n", "\t<li>1161</li>\n", "\t<li>36</li>\n", "</ol>\n" ], "text/latex": [ "\\begin{enumerate*}\n", "\\item 1161\n", "\\item 36\n", "\\end{enumerate*}\n" ], "text/markdown": [ "1. 1161\n", "2. 36\n", "\n", "\n" ], "text/plain": [ "[1] 1161 36" ] }, "execution_count": 31, "metadata": {}, "output_type": "execute_result" } ], "source": [ "dim(gale_hist)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "write.table(gale_hist, file=file.path(hist_dir, 'Gale_Histology_21-Mar-2016_final.txt'), \n", " col.names=T, row.names=F, quote=F, sep='\\t', na=\"\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Last Updated: 24-May-2015" ] } ], "metadata": { "kernelspec": { "display_name": "R", "language": "R", "name": "ir" }, "language_info": { "codemirror_mode": "r", "file_extension": ".r", "mimetype": "text/x-r-source", "name": "R", "pygments_lexer": "r", "version": "3.2.2" } }, "nbformat": 4, "nbformat_minor": 0 }