--- title: "Introduction To Markdown" author: "Malachi Griffith" date: "September 12, 2017" output: html_document --- ```{r setup, include=FALSE} knitr::opts_chunk$set(echo = TRUE) ``` # Analysis of lymphoma sequence variant data ## Introduction The following analysis explores the tumor variant allele frequencies (VAFs) for a cohort of follicular lymphoma patients. After looking at the overall distribution of VAFs for the "discovery" and "extension" subsets of this cohort, we determine the average VAF for these two patient groups. ## Lymphoma data description and input The data used in this section is from the manuscript entitled "Recurrent somatic mutations affecting B-cell receptor signaling pathway genes in follicular lymphoma" (PMID: [28064239](https://www.ncbi.nlm.nih.gov/pubmed/28064239). ```{r} fl_data <- read.delim("~/Desktop/ggplot2ExampleData.tsv") ``` ## Lymphoma VAF distribution Using geom_density() from the ggplot2 package we can see that the discovery cohort has an average tumor variant allele fraction (tumor_VAF) somewhere around 20%. The extension cohort has a wider distribution with an average variant allele fraction somewhere around 30-45%. ```{r} library(ggplot2) ggplot(fl_data, aes(x=tumor_VAF, fill=dataset)) + geom_density(alpha=.4) ``` ## Lymphoma mean VAF for discovery and extension cohorts More precisely we can see that the mean variant allele fractions for these cohorts are as follows ```{r echo=FALSE} message("average tumor variant allele fraction in the discovery cohort") mean(fl_data[fl_data$dataset=="discovery","tumor_VAF"]) message("average tumor variant allele fraction in the extension cohort") mean(fl_data[fl_data$dataset=="extension","tumor_VAF"]) ```