---
title: "Course Introduction"
subtitle: Biostat 200C
author: "Dr. Jin Zhou @ UCLA"
date: today
format:
  html:
    theme: cosmo
    embed-resources: true
    number-sections: true
    toc: true
    toc-depth: 4
    toc-location: left
    code-fold: false
engine: knitr
knitr:
  opts_chunk: 
    fig.align: 'center'
    # fig.width: 6
    # fig.height: 4
    message: FALSE
    cache: false
---

## Brief intro

### Myself
#### Before 2021
- PhD in Biomathematics, UCLA (the department was renamed to ["UCLA Computational Medicine"](https://compmed.ucla.edu/))
- Postdoc \& Statistician, Harvard University
  + Department of Biostatistics
  + Channing's Lab, Brigham and Women's Hospital
- Assistant \& Associate professor,  University of Arizona (UofA)
  + Department of Epidemiology and Biostatistics 
  + Statistics and Genetics Graduate Interdisciplinary Program (GIDP)
- Adjunct Associate Professor, UCLA
  + Department of Medicine Statistics Core (DOMStat)

#### 2021 - Present
- Research principal investigator, Phoenix VA Health Care System
- Research principal investigator, Greater Los Angeles VA Health Care System
- Professor-in-Residence of Biostatistics, UCLA
  + Department of Biostatistics 
  
### TA

- Zian Zhuang
- Email: <zianzhuang@ucla.edu>

### You?

## Course webpages

- Github site: <https://ucla-biostat-200c.github.io/2026spring/>
  + slides, hw, announcements 

- Burinlearn: <https://bruinlearn.ucla.edu/courses/205219>
  + announcements and hw submissions


## What's this course about?

-   In 200B, we learn the linear models in the form $$
    y = \beta_0 + \beta_1 x_1 + \cdots + \beta_p x_p + \epsilon, 
    $$ where
    1.  $y$ is a continuous *response variable* (or *dependent variable*),\
    2.  $x_1, \ldots, x_p$ are *covariates* (or *predictors*, or *independent variables*), and\
    3.  $\epsilon$ is the *error term* and assumed to be normally distributed and independent among observations.
-   In 200C, we generalize the linear models in three directions.
    1.  **Generalized linear models (GLMs)** handles nonnormal responses, $y$. \
        * binary response (disease or not)
        * proportions
        * counts
    2.  **Mixed effects models** relaxes the independence assumption of the error term and allows correlation structure in $\epsilon$.\
        * Some data has a grouped, nested or hierarchical structure. 
        * Repeated measures, longitudinal and multilevel data 
    3.  **Nonparametric regression models** (GAM, trees, neural networks) allow nonlinearity in the effects of predictors $x$ on responses.

## Course description

-   Read [syllabus](https://ucla-biostat-200c.github.io/2025spring/syllabus/syllabus.html) and [schedule](https://ucla-biostat-200c.github.io/2025spring/schedule/schedule.html) for a tentative list of topics and course logistics.

-   Teaching philosophy. Usually a GLM course is taught on blackboard/whiteboard with mostly math derivations. In this course, I will start from R code and then explain the theory behind it.