--- title: "RiboLausanne Report" author: "Dermot Harnett" output: github_document toc: yes --- ```{r setup, include=FALSE} #output: rmarkdown::github_document library(knitr) library(svglite) ``` ```{r timestamp, echo = TRUE} Sys.time() ``` ## Background This report will show my analyses of the Riboseq data from the 3 Lausanne cancer cell lines. The cell lines from the lab of Michal Bassini. Rnaseq is available on them, as well as exome-seq data. ```we are focusing on three samples (OMM4.75, OD5P and ONVC) which are melanoma samples. ``` ```We have identified EREs and LncRNAs derived HLA-I peptides and we will screen them this summer for T cell responses.``` ```I have now attached for one cell line, OD5P, the coordinates and ENSG from the genes encoding our hits, a few retroviral elements and quite a few ncRNAs. ``` ## results ### Executive Summary of Selected lncRNA loci show evidence for translation. In _ cases, these ORFs contain the associated peptides. Our samples are as follows: In general we see ### Todos 1. Scatterplot with the sscRNAseq for ours. 2. Update methods with annotation versions. 3. Combine methods with Ilija's 4. get it out. #### Data Quality Some text below ```{r f1 nap, out.width ='900px', out.height ='900px' ,fig.cap = "", dev='svg',echo=FALSE,include=TRUE,eval=TRUE} dir.create('plots',showWarnings = FALSE) #create placeholder svg('plots/tmp.svg');plot(1,type='n');text(1,1,"This is a placeholder plot",cex=2);dev.off() include_graphics("plots/tmp.svg") ``` Some text above This is a link to the full RNAseq report - [save it](report_Lausanne_lab.html) This is a link to the full SataNN report - [save it](report_Lausanne_lab.html) ### Specific Loci Plots showing the 3bp Periodicity for OD5P for one cell line ```{r specplots, out.width ='900px', out.height ='900px' ,fig.cap = "", dev='svg',echo=FALSE,include=TRUE,eval=TRUE} plot(1) ``` ##Notes for discussion A distinction needs to be made between prediction by Riboseq per se (all 60 nc peptides detected in phase 1 had at least 5 reads in the pooled Riboseq) and detection by Periodicity analysis in our samples. When considering the scatterplots below, The picture which emerges is that peptides not detected by Riboseq typically have low expression, or are expressed in a low fraction of cells. A few exceptions, e.g. ENSG00000269821, likely exist because of instances in which e.g. low mappability at the relatively short size of a Ribosomal footprint prevented accurate psite determination. In particular, I would change "This relatively small overlap of ncHLAIp could be rationalized by the bias of RiboSeq to detect the higher expressed source genes (Supplementary Figure 6a). Specifically, when examining the expression of pc source genes identified in both approaches, the common genes were found to have significantly higher source RNA expression (Supplementary Figure 6b)." to "This likely reflects necessary limits to the detection of periodic RiboSeq reads in transcripts with low expression, or low mappability (Supplementary Figure 6a,b)" Where Figures S6a and b have been changed to: ## Methods RiboSeq: Experimental Protocol Riboseq was performed according to Calviello et al 2016, with some modifications. 400μl of Lysis Buffer was used per 80% confluent plate. Ribosome footprinting was performed by adding 1000U of RNase I (Life Tech. #AM2295) per 400μl of the lysate, and the reaction was stopped by adding 13 µl SUPERASE-In (Thermo AM2694, 20U/µl) per 400µl of lysate. Ribosomes were recovered using two equilibrated MicroSpin S-400 HR columns (GE Healthcare #27-5140-01) per sample. The filtered halves were then combined and three volumes of Trizol LS (Life Tech. #10296010) were added and RNA was extracted using the Direct-zol RNA Mini-Prep kit (Zymo Research #R2052) following the manufacturer’s instructions (including DNase I digestion). Following rRNA depletion and isolation of short fragments, sequencing libraries were prepared using the NEXTflex Small RNA-Seq Kit v3 (BiooScientific #5132-06) following the supplied protocol. All adapters were freshly diluted with nuclease-free water at 1:2 ratio prior to ligations. 3’-Adapter ligation was performed at 25°C for 2h, and 5’-Adapter ligation for 1h at 20°C. These were pooled at 1.6pM and sequenced on an Illumina NextSeq500 machine using High Output Kits v2 (Illumina #FC-404-2005) with 75 cycles single-end. Riboseq: Analysis Riboseq reads were stripped of adaptor sequences using cutdapt, and contaminants such as tRNAs and rRNA were removed by alignment to a contaminants index via Bowtie (PMID 19261174). Unaligned reads from this analysis were then aligned to human genome version hg19 with the STAR (PMID 23104886) splice-aware alignment tool allowing for up to 1 mismatch.The star genome index was built using gencode version 24 (lift 37). Reads with up to 20 multi-mapping positions were included, with multi mapping reads beings separately treated in subsequent peridicity analysis. The RIboseQC pipeline (https://doi.org/10.1101/601468) was used to deduce P-site positions from Riboseq reads, and this P-site data was then used as input to the Ribotaper pipeline, with some update using custom R scripts (Calvello et al 2016), for ORF-calling via isoform-aware spectral analysis. ORFs were called in both individual libraries and in the pooled set of all libraries for OD5P, and ORFs which were fully contained within ORFs detected in another library were merged. ORFs were tested for periodicity, by a multitaper test (Calviello et al 2016) and those with a p-value of 0.05 were kept for analyses. Protein Fastas were generated from the coordinates of these ORFs, and used both for validation of peptides found using the canonical database, and as a de novo assembled database for the subsequent round of peptide detection. Peptides were considered validated by Riboseq if they matched anywhere within the translated ORF sequences. Riboseq profile plots were plotted with psite numbers per-base on a log2(n+1) scale. Figure S_ Limits of Detection by Riboseq Analysis A) Scatterplot showing Processed P-sites vs Raw Riboseq reads for the ncHLAp detected using the canonical transcriptome. Triangles indicate peptides which were contained within an ORF with periodic Riboseq signal. Blue symbol peptides originate from genes which contain at least one periodic Riboseq signal. Difficulty in determining correct p-site offsets for some read lengths, mapping quality, and other factors reduce the number of P-sites available for the detection of periodicity, with detection becoming difficult for genes with low expression, although all ncHLAp show at least some raw Riboseq signal. B) P-sites vs scRNAseq for ncHLAp detected using canonical transcriptome. With some exceptions due to imperfect mappability etc, genes with few psites tend to show little scRNAseq signal and are detected in few cells. Note that only one gene (ENSG00000247271 - labelled) shows greater than 100 p-sites across all samples and is detected in more than 10% of cells C) scRNAseq signal for HLAp detected using a Riboseq-derived translatome. ncHLAp (blue) detected using the Riboseq-translatome show a greater rate of detection in scRNAseq experiments, again indicating that Ribotaper identifies ORFs which show reproducible evidence of translation. ### Loci plots - initially made use of Lorenzo's bigwigs - These were single basepair, 5' ends, which I checked by loading R and looking at them - Once alignments of these were done I tested whether they were the same - they ____ - The plots are colored to indicate frame, show indications of start codon positions, and are also annotated with the pvalue for a test of periodicity in the best reading frame. - Made use of the saved R files Lorenzo provided - then later used my own - lorenzo's saved bigwigs are apparently not usable. - the first set of guys look like repeat associated peptides? - first one is from a gene called IQCK, in the middle of an intron ### Variant Info - Incorporated the variant info into the calls by re running with a custom genome I made from the VCF provided, and the relevant fasta file - Ran SaTaNN using this new info to produce the variant sequences ### Generation of peptide databases for maxquant ### Improvements for later Make SataNN run with bams rather than the genome files etc. Detect splice site disruptions, ask SaTaNN if intermediate space has 3bp periodicity? [comment]: # (I need a good system for stock genome files. I've already written bioconductor genomes to fasts, I"ll probably just do that.______) [comment]: # (This paper has bayesian analysis of periodicity in it! V. interesting. I should https://onlinelibrary.wiley.com/doi/pdf/10.1111/1467-9892.00145)