## Methods {.page_break_before}

In this section, PSB 2019 attendees are asked to contribute a bit about themselves and their research.
As part of the [special working group](http://web.archive.org/web/20190103203407/https://psb.stanford.edu/working%20group/), we thought this would be a helpful activity to introduce biocomputational scientists to writing with Manubot.
For inspiration, here are some prompts:

- Introduce yourself briefly.
- What do you research? Include any relevant links to your lab or personal website.
- What is your favorite study from your career or what study was your biggest discovery?
- What was your first scholarly publication?
- Add a code snippet or mathematical equation.
- Add a figure with a caption. This could be a picture of you in Hawaii or a figure from your previous work if the license is permissive enough to allow reuse.

Self-citations are explicitly encouraged, since one goal of this activity is to introduce attendees to the concept of [citation by persistent identifier](https://github.com/dhimmel/psb-manuscript/blob/master/USAGE.md#citations).
By having attendees cite their existing works, we hope to show researchers how references can be created from just persistent identifiers, and how this can assist with collaborative and transparent authoring.

The [markdown manuscript source](https://github.com/dhimmel/psb-manuscript/tree/master/content) has a section for each PSB 2019 attendee, generated from the online [attendee list](https://github.com/dhimmel/psb-manuscript/blob/master/attendees/attendees.pdf).
Names are ordered alphabetically by last name.
If you'd like to contribute, but are not already in the list, please insert your section at the appropriate alphabetical location.

For questions on how to use Manubot, see the [usage documentation](https://github.com/dhimmel/psb-manuscript/blob/master/USAGE.md).
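
As a rough illustration of citation by persistent identifier, keys of the form `@source:identifier` can be pulled out of manuscript text with a short script. The sketch below is not Manubot's implementation: the regular expression, the list of prefixes, and the function name `extract_citations` are assumptions chosen only to match the citations that appear in this manuscript.

```python
import re

# Assumed key shape: "@" + a source prefix + ":" + an identifier.
# The prefix list is an assumption based on the keys used in this manuscript.
# The identifier stops at whitespace, ";" or "]", which separate or close
# bracketed citations such as [@pmid:26158728; @doi:10.7554/eLife.26726].
CITATION_PATTERN = re.compile(r"@(doi|pmid|pmcid|arxiv|url):([^\s;\]]+)")


def extract_citations(markdown_text):
    """Return (source, identifier) pairs in order of appearance."""
    return CITATION_PATTERN.findall(markdown_text)


example = (
    "hetnets [@pmid:26158728; @doi:10.7554/eLife.26726] "
    "and Sci-Hub [@pmcid:PMC5832410]"
)
citations = extract_citations(example)
for source, identifier in citations:
    print(source, identifier)
```

Figure references such as `@fig:elevcan` are deliberately not matched. One known limitation of this sketch is that a bare, unbracketed key at the end of a sentence would keep its trailing period.
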
More information on the tool and its inception is available in the project manuscript [@url:https://greenelab.github.io/meta-review/].

## Attendees {.page_break_before}

### J. Brian Byrd

I'm a physician-scientist at the University of Michigan.
My laboratory focuses on identifying novel biomarkers for a clinically important subtype of high blood pressure, called primary aldosteronism.
Our principal interest is in detecting the transcriptional activity of the mineralocorticoid receptor [@pmid:30354328].

### Weixuan Fu

Aloha, I'm in the [Institute for Biomedical Informatics (IBI)](http://upibi.org/) at the University of Pennsylvania and the developer of [TPOT](https://github.com/EpistasisLab/tpot) and PennAI [@doi:10.1007/978-3-319-90512-9_8].
My main research interest is developing automated machine learning tools for the analysis of large-scale biomedical/sequencing data.
Besides that, I am working on optimizing an analysis pipeline that uses DNA and RNA sequencing data to predict neoantigens specifically presented in tumor cells, for designing personalized neoantigen vaccines in cancer immunotherapy.

### Casey Greene

I run an integrative genomics research lab at the University of Pennsylvania, and I direct the Childhood Cancer Data Lab for Alex's Lemonade Stand Foundation.
The lab at Penn develops methods to integrate large-scale public datasets, primarily from transcriptomic assays, and applies these methods to a broad set of biological questions.
We've studied numerous systems, and we currently have active research projects in the application areas of microbial systems [@doi:10.1128/mSystems.00025-15; @doi:10.1016/j.cels.2017.06.003], cancers [@doi:10.1142/9789813235533_0008; @doi:10.1016/j.celrep.2018.03.046; @doi:10.1016/j.celrep.2018.03.076; @doi:10.1016/j.cell.2018.03.035], and rare diseases [@doi:10.1101/395947].
At this PSB, a postdoc from the group will present a paper describing the Continental Breakfast Included (CBI) effect in the final talk of the final session of this year's meeting [@doi:10.1101/385534].

I'm also interested in technologies that enhance the future of scientific communication.
Our lab has studied Sci-Hub [@doi:10.7554/eLife.32822].
We've led a large collaborative review of deep learning in biology and medicine [@doi:10.1098/rsif.2017.0387].
Members of the lab have developed tools like Manubot [@url:https://greenelab.github.io/meta-review/], which you are using now.
More publications are available on our [lab website](http://www.greenelab.com/publications).

### Daniel Himmelstein

Greetings, I'm in the [Greene Lab](http://www.greenelab.com/) at the University of Pennsylvania and am the lead developer of the Manubot project.
2019 is my first PSB and I'm excited to backpack around the Big Island following the conference.

My main area of research is integrating biomedical knowledge using hetnets [@pmid:26158728; @doi:10.7554/eLife.26726].
However, I've also studied Sci-Hub, finding that it provides access to nearly all paywalled scholarly literature [@pmcid:PMC5832410].
Perhaps my biggest discovery was observing an epidemiological association that higher elevation counties have lower rates of lung cancer, suggesting that oxygen is an inhaled carcinogen (Figure @fig:elevcan) [@doi:10.7717/peerj.705; @url:https://nyti.ms/1NuB0WJ].

![The association between elevation and lung cancer across Western U.S. counties. This figure is reused from [here](https://doi.org/10.7717/peerj.705/fig-4) under its CC BY 4.0 License.](https://github.com/dhimmel/elevcan/raw/7aed9f29d2371eb4918f337a138608e6b6d9e311/manual/figures/peerj/Figure_4.png){#fig:elevcan width="100%"}

I haven't done much text mining, but I did enjoy extracting attendee names for PSB from the online PDF.
Converting the PDF to text in Python was [as easy as](https://github.com/dhimmel/psb-manuscript/blob/15babecdf2a915f88088703e23a61e34e1294b1f/attendees/attendees.ipynb):

```python
# https://stackoverflow.com/a/48673754
import tika.parser

# from_file returns a dict; the extracted text is under the "content" key
parsed = tika.parser.from_file('attendees.pdf')
text = parsed["content"]
```

### Qiwen Hu

I'm a postdoc in the [Greene Lab](http://www.greenelab.com/) at the University of Pennsylvania.
My research focuses on integrating different types of high-throughput sequencing data to find the meaningful biological signals behind them.
I developed machine learning and statistical approaches to identify regulatory elements that affect transcription and translation.
I also developed machine learning-based methods to extract regulatory signals from the addicted brain [@doi:10.1371/journal.pcbi.1005602], developmental tissues [@doi:10.1101/361816], and cell-type signals from single-cell datasets.
This year at PSB, I will present our findings on analyzing single-cell data with deep variational autoencoders [@doi:10.1101/385534].

### Lawrence Hunter

I'm a cofounder of the [PSB conference](http://psb.stanford.edu), and a professor at the University of Colorado School of Medicine.
You can find information about my lab at .
One of my early papers is @doi:10.1142/9789814447331_0049.

### Shantanu Jain

Hi all, I am very excited to be here attending PSB.
I am a research scientist at Northeastern University.
I am broadly interested in machine learning methods.
During my Ph.D., I worked on positive-unlabeled learning.
I am most proud of my research on nonparametric estimation of class priors from positive and unlabeled data [@arxiv:1601.01944].
I have started learning about causal inference lately and I am interested in applying it to biological datasets.

### Adam Kurkiewicz

I'm interested in building a tool to do SNP calling from single-cell RNA-seq data.
This has been tried before by various groups, e.g.
check out the honeyBADGER paper [@doi:10.1101/gr.228080.117], but ultimately none of the approaches were successful.
I have a few ideas on how to make progress. Give me a shout if you'd like to discuss!

### Trang Le

Hello from the [Moore lab](http://epistasis.org/) at the University of Pennsylvania!
I'm [a mathematician](http://lelaboratoire.github.io) who's currently excited about automated machine learning.
Here go the self-citations:

- My own favorite study: Generalization of the Fermi Pseudopotential [@arxiv:1806.05726] - a piece of mathematical physics work I got to do when procrastinating writing my dissertation.
- My first (first-author) scholarly publication: Differential privacy-based evaporative cooling feature selection and classification with relief-F and random forests [@doi:10.1093/bioinformatics/btx298].
  Check out the GitHub repo for this study [here](https://github.com/insilico/privateEC).
- Code snippet I'm most proud of:

  ```
  M = dec2bin(0:2^(n*n)-1,n*n)
  ```

  I will be impressed if you could tell what the language is.
  This is my answer to a question on [Math StackExchange](https://math.stackexchange.com/questions/1943862/matlab-code-for-an-array-consisting-of-matrices/1943923#1943923).
- I have too many favorite mathematical equations, but here's one:

  $$a^p \equiv a \pmod{p}$$

  Anyone recognize this theorem?
- And a figure with a caption:

  ![Participants tended to have older brain-predicted age when given placebo.](https://s3.amazonaws.com/media-p.slid.es/uploads/909204/images/5547949/fig2.png){#fig:ibuBrainAGE width="80%"}

  This is an improved version of my main figure in this interesting study [@doi:10.1016/j.bpsc.2018.05.002].

### Binglan Li

Greetings from the [Ritchie Lab](http://ritchielab.org/) at the University of Pennsylvania.
I am a third-year graduate student in the Genomics and Computational Biology program and am interested in prioritizing drug response-related genes via data integration approaches.
I am still in the early part of my research journey, but I would love to share my latest work, published in the PSB 2019 proceedings.

- Influence of tissue context on gene prioritization for predicted transcriptome-wide association studies [@doi:10.1142/9789813279827_0027].
- Code snippet I'm most proud of:

  ```
  ############################################
  ## Menu
  ## 1. Food Preparation
  ## 1.1. Load Necessary Libraries and Scripts
  ## 1.2. Define Parameters
  ## 2. Appetizers
  ## 2.1. Data Simulation
  ## 2.2. eQTL Detection
  ## 3. Entree/Main Course
  ## 3.1. Run single-tissue TWAS
  ## 3.2. Run integrative TWAS
  ## 3.3. Evaluate Power and Type I Error Rate of TWAS Results
  ## 4. Dessert
  ## 4.1. None. Sorry this is a healthy (aka anti-sweet) restaurant.
  ############################################

  ## actual code
  set.seed(random_seed, kind = "L'Ecuyer-CMRG")
  ```

- Here is a plot of the minor allele frequency of the eQTLs in the GTEx v7 whole blood tissue.
  Please pretend that you do see a title in the figure.

  ![Minor allele frequency of eQTLs in the GTEx v7 whole blood tissue.](https://user-images.githubusercontent.com/1117703/50731107-cd16b380-1100-11e9-930c-3a23cb4a0ec9.png){#fig:binglan width="50%"}

### Jason E. Miller

Hi, I'm a postdoctoral fellow in the [Ritchie lab](https://ritchielab.org) at the University of Pennsylvania.
I'm currently focused on identifying how genetic variation leads to Alzheimer's disease through perturbation of gene regulatory mechanisms.
My favorite study from my career identified specific types of codon bias among synonymous variants, such as those related to codon optimality and frequency, that are associated with an Alzheimer's disease imaging endophenotype [@pmcid:PMC5756629].
If you are interested, you can check out my GitHub page [here](https://github.com/git-jemiller).

### Luca Pinello

Aloha from the [Pinello Lab!](http://pinellolab.org/)
I am a computational biologist studying chromatin structure and dynamics and non-coding regions, including enhancers, promoters, and insulators, and their role in gene regulation.
The mission of my lab is the integration of omics data to explore and better understand the functional mechanisms of the non-coding genome and to provide accessible tools for the community to accelerate discovery in this field.
The long-term goal of my research is to develop innovative computational approaches and to use cutting-edge experimental assays, such as single cell and genome editing, to systematically analyze sources of genetic and epigenetic variation that affect gene regulation in different human traits and diseases.
I believe this will further our understanding of disease etiology involving these poorly characterized regions and will provide a foundation for the development of new drugs and more targeted treatments.

During the workshop [Reading between the genes: Interpreting noncoding DNA in high throughput](http://lussierlab.net/pacific-symposium-2019/index.html), I am excited to share CRISPR-SURF, a new computational method we recently developed to analyze CRISPR tiling screens.
You can read more in the manuscript that was recently published in _Nature Methods_ [@pmid:30504875].

### Rashika Ramola

Hi, I am Rashika Ramola.
I am a PhD student at Northeastern University.
This is my first PSB.
I like computational biology, and I am excited to be here.
My first paper studies some performance measures (accuracy, balanced accuracy, F-measure, and Matthews correlation coefficient) in positive-unlabeled learning [@doi:10.1142/9789813279827_0012].
In this work, we demonstrate how performance measures can be inaccurate in the positive-unlabeled setting, and then we introduce corrected measures.
I am including an important formula from the aforementioned manuscript:

$$
\textrm{mcc} = \frac{1}{\beta-\alpha}\sqrt{\frac{\pi(1-\pi)}{c(1-c)}}\cdot\textrm{mcc}^{\textrm{pu}}
$$

It shows that the Matthews correlation coefficient (MCC) is directly proportional to its equivalent in the positive-unlabeled setting.
Thus, MCC is a well-behaved performance measure.

Here is a beautiful aerial shot of [Hawaii](https://www.myhawaii.com.au/wp-content/uploads/sites/13/2018/08/Hawaii-Landscape-Copy.jpg).

### Jaclyn Taroni

I'm a data scientist at the [Childhood Cancer Data Lab](https://www.ccdatalab.org) (CCDL), an initiative of [Alex's Lemonade Stand Foundation](https://www.alexslemonade.org).
I'm interested in how diverse collections of publicly available transcriptomic data can help us learn about the biology of rare diseases.
As a graduate student, I studied systemic sclerosis [@doi:10.1186/s13073-017-0417-1].
In the [PSB 2019 Text Mining and Machine Learning for Precision Medicine Workshop](https://healthlanguageprocessing.org/psb-2019-tm-ml-pm-workshop/), I'll present our MultiPLIER project [@doi:10.1101/395947].
With the CCDL, I've been working on [refine.bio](https://www.refine.bio), a project for uniformly processing transcriptomic data from multiple species.

### Yihsuan Tsai (Shannon)

This is Shannon from UNC at Chapel Hill.
I'm a bioinformatics scientist at [UNC Lineberger Cancer Center](https://lbc.unc.edu/).
My recent research project can be found at PSB poster section #69.
It's about using methylation data to predict tumor-infiltrating lymphocytes, which are highly correlated with patient survival in melanoma.
Here are some of my publications:

1. Meta-analysis of airway epithelium gene expression in asthma [@pmid:29650561].
2. Identification of a robust methylation classifier for cutaneous melanoma diagnosis [@doi:10.1016/j.jid.2018.11.024].
3. Transcriptome-wide identification and study of cancer-specific splicing events across multiple tumors [@pmid:25749525].
4. Prevalent RNA recognition motif duplication in the human genome [@pmid:24667216].

### Robin van der Lee

Hi! I'm a post-doc with Wyeth Wasserman at UBC, Vancouver, Canada.
Info about the lab can be found at and .

My PhD work was on integrative omics to discover genes involved in immunity [@pmid:26485378].
I also did some work on comparative genomics of primate genomes, finding that rapidly evolving genes are predictive of virus-human interactions [@pmid:28977405].

In my post-doc work, I am developing methods for interpreting regulatory genomic variants based on alterations to transcription factor binding motifs.
Some of that work is on **poster 71**, which I will present on Saturday 5 January 2019 at the PSB meeting.

![This is the header of the poster I'll present here under its CC BY 4.0 License.](images/Robin-van-der-Lee_Poster_Pacific_Symposium_on_Biocomputing_PSB_2019__MANTA-RAE_poster_header.jpg){#fig:vanderlee width="100%"}

### Ryan Whaley

Hi, I'm Ryan and I'm one of the technical leads for [PharmGKB](https://www.pharmgkb.org).
I'm also helping to run the A/V desk during this presentation.

I'm trained in software development and started my career as a DBA.
Over the past decade I've switched to Java and then web application development.
I've contributed to PharmGKB [@pmid:22992668], CPIC [@pmid:21270786], and other PGx consortia.

## Afterword

Thanks to everyone who contributed and helped prototype Manubot for massively collaborative, open writing.
We'd like to especially acknowledge [Anthony Gitter](https://github.com/agitter), who was not at the conference, but remotely reviewed proposed changes.
We'd also like to acknowledge the Sloan Foundation, whose [support](https://sloan.org/grant-detail/8501) made this working group possible.

![Sunset from the Western shore of the Big Island, Hawaii](images/turtles.jpg){#fig:turtles}