---
title: "Syllabus - GEN242"
linkTitle: "Syllabus"
type: docs
description: >
weight: 2
---
## Course title
Data Analysis in Genome Biology
GEN242 - Spring 2024
## Printable syllabus
A Google Doc version of this syllabus is available [here](https://rebrand.ly/vfv9sun).
## Instructor
- Name: Thomas Girke
- Email: thomas.girke@ucr.edu
- Office location: virtual via Zoom
- Office hours: Tue 4:30-5:30 PM and Fri 4:00-5:00 PM
- Zoom URL: privately shared
## Description
Introduction to algorithms, statistical methods and data analysis programming
routines relevant for genome biology. The class consists of three
main components: lectures, hands-on practicals and student course projects. The
lecture topics cover databases, sequence (NGS) analysis, phylogenetics,
comparative genomics, genome-wide profiling methods, network biology and more.
The hands-on practicals include homework assignments and course projects
focusing on data analysis programming of next generation genome data using
command-line tools on a computer cluster and the programming environment R.
Credit: 4 units (two 1.5-hour lectures and one 1-hour discussion per week)
## Objectives of course
- Acquire an understanding of algorithms used in bioinformatics.
- Obtain hands-on experience in large-scale data analysis.
## Prerequisites
The main prerequisite for this course is a strong interest in acquiring the
skills required for mastering the computational aspects of modern genome biology.
## Structure of course
Two lectures per week (1.5 hours each) plus one discussion section (1 hour).
During the first weeks the discussion section will be used for data analysis
tutorials using Linux command-line tools and R.
## Delivery mode
This class will be taught online via Zoom in synchronous mode. Upon
request, office hours can be attended in person (hybrid mode).
## Time
Lecture: Tue/Thu 2:00-3:20 PM
Discussion: Thu 3:30-4:20 PM
## Location
Online via video conferencing software
## Grading
1. Homework assignments: 40%
2. Scientific paper presentation: 20%
3. Course project presentations: 20%
4. Final project report: 20%
Additional details about the grading system are provided in this [table](https://bit.ly/3udPPMA) (see both tabs).
__Grading policy:__ Given the diverse educational background of the students in GEN242, all assignments are designed to be solvable by students from both experimental and quantitative disciplines, including those with no or only limited prior experience in programming and/or data modeling. The weight of each of the four gradable components in this class is given above in percent.
__(1)__ The homework consists of 8-10 assignments spread over the course. They cover algorithms and data analysis programming problems using the R language. The grading of these assignments is mainly based on correctness, reproducibility and reusability of the analysis code.
__(2-4)__ Students will work on a Challenge Project (individually or in a group) addressing a specific data analysis problem in genome data sciences. As part of their project, students will present a scientific paper __(2)__ closely related to their project (see reading list for details). The results of the Challenge Projects __(3)__ will be presented and discussed by each student at the end of the course. In addition, each student will write a detailed analysis report __(4)__ of the assigned course project. The report will be written in the style of a scientific publication and should include a detailed description of the results, including all analysis code needed to fully reproduce them, followed by a critical discussion of the outcomes. The grading of both the paper and project presentations __(2-3)__ includes anonymous feedback from all students as well as the instructor, where understanding of the material, clarity of the oral presentations and critical thinking are the main grading criteria. The final project reports __(4)__ will be graded by the instructor with an emphasis on scientific and coding accuracy, overall understanding of the topic, as well as reproducibility of the results.
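
As a rough illustration of how the four weights listed above combine into a single course percentage, here is a minimal Python sketch. The component names, the 0-100 score scale and the example scores are assumptions made for illustration only; the authoritative details are in the linked grading table.

```python
# Component weights in percent, taken from the grading list above.
WEIGHTS = {
    "homework": 40,
    "paper_presentation": 20,
    "project_presentation": 20,
    "project_report": 20,
}


def weighted_grade(scores):
    """Combine per-component scores (assumed 0-100) into one percentage."""
    if set(scores) != set(WEIGHTS):
        raise ValueError("scores must cover exactly the four graded components")
    total_weight = sum(WEIGHTS.values())  # the four weights add up to 100
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS) / total_weight


# Hypothetical scores, not real course data.
example_scores = {
    "homework": 90,
    "paper_presentation": 85,
    "project_presentation": 80,
    "project_report": 95,
}
print(weighted_grade(example_scores))  # prints 88.0
```

With these hypothetical scores the homework contributes 36 points and the three project components contribute 17, 16 and 19 points, for a total of 88.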
## Materials needed
Students are expected to bring to each class meeting a laptop with a functional wireless
connection and a recent internet browser version (e.g. Firefox, Chrome or
Safari) preinstalled. Tablet computers with mobile operating systems are not
suitable for running the required software. User accounts on a research
computer cluster will be provided at the beginning of the course. To log in to
the cluster, students also need to install a terminal application for their
operating system (_e.g._ [iTerm2](http://www.iterm2.com/#/section/home) on OS X,
and [PuTTY](http://www.chiark.greenend.org.uk/~sgtatham/putty/download.html) or
[MobaXterm](http://mobaxterm.mobatek.net/) on Windows) as well as file transfer software such as
[FileZilla](https://filezilla-project.org/download.php?type=client). In
addition, a recent version of [R](http://www.r-project.org) and
[RStudio](http://rstudio.org) should be installed.
If possible, students may want to attend class sessions from a setup
with either one large monitor (wide enough to display several windows) or two
separate monitors. This allows viewing presentations on one screen while
following along with the hands-on practicals on the other.
## Schedule
| Week | Topic |
|------|-------|
| Week 1 | Course Introduction |
| | Databases and Software for Genome Biology |
| | Discussion: Introduction to Linux and HPC |
| | Reading: A1, T1, T2 |
| Week 2 | Sequencing Technologies |
| | Discussion: Introduction to R |
| | Reading: A2-A4, T3 |
| Week 3 | Sequence Alignments and Searching |
| | Multiple Sequence Alignments |
| | Discussion: Programming in R and Parallel Evaluations |
| | Reading: A5-A6, T4-T5 |
| Week 4 | Short Read Alignment Algorithms |
| | Discussion: Basics of NGS Analysis |
| | Reading: A7-A10, T6 |
| Week 5 | Gene Expression Analysis, Microarrays, bulk RNA-Seq and scRNA-Seq |
| | Discussion: NGS Workflow Overview using CWL; RNA-Seq Analysis |
| | Reading: A11-A15, T7-T8 |
| Week 6 | Analysis of ChIP-Seq and VAR-Seq Experiments |
| | Discussion: ChIP-Seq and VAR-Seq Analysis Workflows |
| | Reading: A16-A18, T9-T10 |
| Week 7 | Students present publication related to their chosen course project |
| | Discussion: Q&A about papers |
| | Reading: A19-A23 |
| Week 8 | Clustering Algorithms |
| | Pathway and GO Annotation Systems |
| | Discussion: Gene Set Enrichment Analysis |
| | Reading: A24-A26, T7 (Sec 3.14-3.15), T11 |
| Week 9 | Genome and Transcriptome Assembly Algorithms |
| | Optional: Profile HMMs for Protein Family Modeling |
| | Introduction to Phylogenetics |
| | Discussion: Graphics and Data Visualization |
| | Parallel Processing |
| | Reading: A27-A29, T12 |
| Week 10 | Final presentations of student data analysis projects |
| | Discussion: Tips and tricks for efficient data analysis programming |
| | Reading: A30-A31, T3 (Sec 12,13-17) |
## Reading list
### Journal articles
### Tutorials
T1. [GitHub Introduction](https://girke.bioinformatics.ucr.edu/GEN242/tutorials/github/github/)
T2. [Introduction to Computer Clusters and Linux](https://girke.bioinformatics.ucr.edu/GEN242/tutorials/linux/linux/)
T3. [Introduction to R](https://girke.bioinformatics.ucr.edu/GEN242/tutorials/rbasics/rbasics/)
T4. [Programming in R](https://girke.bioinformatics.ucr.edu/GEN242/tutorials/rprogramming/rprogramming/)
T5. [Parallel R](https://girke.bioinformatics.ucr.edu/GEN242/tutorials/rparallel/rparallel/)
T6. [NGS Analysis Basics](https://girke.bioinformatics.ucr.edu/GEN242/tutorials/rsequences/rsequences/)
T7. [NGS Workflows](https://girke.bioinformatics.ucr.edu/GEN242/tutorials/systempiper/systempiper/)
T8. [RNA-Seq Workflow](https://girke.bioinformatics.ucr.edu/GEN242/tutorials/sprnaseq/sprnaseq/)
T9. [scRNA-Seq Embedding Methods](https://girke.bioinformatics.ucr.edu/GEN242/tutorials/scrnaseq/scrnaseq/)
T10. [ChIP-Seq Workflow](https://girke.bioinformatics.ucr.edu/GEN242/tutorials/spchipseq/spchipseq/)
T11. [R Markdown](https://girke.bioinformatics.ucr.edu/GEN242/tutorials/rmarkdown/rmarkdown/)
T12. [Functional Enrichment Analysis](https://girke.bioinformatics.ucr.edu/GEN242/tutorials/rfea/rfea/)
T13. [Clustering and Network Analysis](https://girke.bioinformatics.ucr.edu/GEN242/tutorials/rclustering/rclustering/)
T14. [Project Data](https://girke.bioinformatics.ucr.edu/GEN242/assignments/projects/project_data/)
T15. [Data Visualization](https://girke.bioinformatics.ucr.edu/GEN242/tutorials/rgraphics/rgraphics/)
T16. [Shiny Apps](https://girke.bioinformatics.ucr.edu/GEN242/tutorials/shinyapps/shinyapps/)
T17. [Building R Packages](https://girke.bioinformatics.ucr.edu/GEN242/tutorials/rpackages/rpackages/)
T18. [dplyr, tidyr and some SQLite](https://girke.bioinformatics.ucr.edu/GEN242/tutorials/dplyr/dplyr/)
T19. [Advanced: Common Workflow Language (CWL)](https://girke.bioinformatics.ucr.edu/GEN242/tutorials/cmdtocwl/cmdtocwl/)
### Books
Note: there is no need to purchase any books for this course as most reading material will be based on journal articles!
Pevsner J (2009) Bioinformatics and Functional Genomics. Wiley-Blackwell; 2nd Edition, 992 pages.
Jones N and Pevzner P (2004) An Introduction to Bioinformatics Algorithms. MIT Press, Massachusetts, 435 pages.
#### Sequence Analysis
Durbin, R, Eddy, S, Krogh, A, Mitchison, G. (1998) Biological Sequence Analysis: Probabilistic Models of Proteins and Nucleic Acids. Cambridge University Press, UK, 356 pages.
Parida L (2008) Pattern Discovery in Bioinformatics: Theory & Algorithms. CRC Press, London, 526 pages.
#### Profiling Bioinformatics
Gentleman, R, Carey, V, Dudoit, S, Irizarry, R, Huber, W (2005) Bioinformatics and Computational Biology Solutions Using R and Bioconductor. Springer, New York, 473 pages.
Felsenstein, J (2004) Inferring Phylogenies. Sinauer, Massachusetts, 664 pages.
Paradis E (2006) Analysis of Phylogenetics and Evolution with R. Springer, New York, 211 pages.