--- title: "Course Introduction" subtitle: Biostat 200C author: "Dr. Jin Zhou @ UCLA" date: today format: html: theme: cosmo embed-resources: true number-sections: true toc: true toc-depth: 4 toc-location: left code-fold: false engine: knitr knitr: opts_chunk: fig.align: 'center' # fig.width: 6 # fig.height: 4 message: FALSE cache: false --- ## Brief intro ### Myself #### Before 2021 - PhD in Biomathematics, UCLA (the department was renamed to ["UCLA Computational Medicine"](https://compmed.ucla.edu/)) - Postdoc \& Statistician, Harvard University + Department of Biostatistics + Channing's Lab, Brigham and Women's Hospital - Assistant \& Associate professor, University of Arizona (UofA) + Department of Epidemiology and Biostatistics + Statistics and Genetics Graduate Interdisciplinary Program (GIDP) - Adjunct Associate Professor, UCLA + Department of Medicine Statistics Core (DOMStat) #### 2021 - Present - Research principal investigator, Phoenix VA Health Care System - Research principal investigator, Greater Los Angeles VA Health Care System - Professor-in-Residence of Biostatistics, UCLA + Department of Biostatistics ### TA - Zian Zhuang - Email: ### You? ## Course webpages - Github site: + slides, hw, announcements - Burinlearn: + announcements and hw submissions ## What's this course about? - In 200B, we learn the linear models in the form $$ y = \beta_0 + \beta_1 x_1 + \cdots + \beta_p x_p + \epsilon, $$ where 1. $y$ is a continuous *response variable* (or *dependent variable*),\ 2. $x_1, \ldots, x_p$ are *covariates* (or *predictors*, or *independent variables*), and\ 3. $\epsilon$ is the *error term* and assumed to be normally distributed and independent among observations. - In 200C, we generalize the linear models in three directions. 1. **Generalized linear models (GLMs)** handles nonnormal responses, $y$. \ * binary response (disease or not) * proportions * counts 2. **Mixed effects models** relaxes the independence assumption of the error term and allows correlation structure in $\epsilon$.\ * Some data has a grouped, nested or hierarchical structure. * Repeated measures, longitudinal and multilevel data 3. **Nonparametric regression models** (GAM, trees, neural networks) allow nonlinearity in the effects of predictors $x$ on responses. ## Course description - Read [syllabus](https://ucla-biostat-200c.github.io/2025spring/syllabus/syllabus.html) and [schedule](https://ucla-biostat-200c.github.io/2025spring/schedule/schedule.html) for a tentative list of topics and course logistics. - Teaching philosophy. Usually a GLM course is taught on blackboard/whiteboard with mostly math derivations. In this course, I will start from R code and then explain the theory behind it.