What’s coming
Topics traditionally covered in STAT 545A, with light updating:
Introduction to R and the RStudio IDE done
R scripts and workspaces, RStudio Projects; how to get your work done done
Creating reports from R scripts and R Markdown, using knitr done
Deep thoughts about data analytic work done
Care and feeding of data in R; data frames done
R objects – beyond data frames done
Indexing, subsetting done
- Data aggregation; “apply” functions, plyr, dplyr done
Making figures with ggplot2 (previously lattice) done
How to help yourself, how to ask questions to get useful answers done
How to get data in and out of R, staying as “open” as possible done
How to get figures out of R done
Be the boss of your factors, i.e. categorical variables done
Use of color in R done
- Single quantitative variable: visualizations and descriptive statistics
- Two quantitative variables: visualizations and descriptive statistics
- Categorical variables: visualizations and descriptive statistics
- Multivariate visualizations
Visualizing and summarizing data when “grouped” done
- Coding style and project organization
New topics for STAT 545A and/or STAT 547M will be selected from here:
- Bash shell / Unix basics, personal system administration
Version control with Git, collaboration via GitHub done
The tabular data mentality, “tidy” data, data reshaping done
Regular expressions, programmatic transformation and searching of character data done
Writing R functions done
Creating interactive pages, apps, and graphics via Shiny (done) and (maybe) ggvis
Unit testing, at least as a mentality; may also cover formal unit testing, e.g. testthat done
- Stats particularly useful in exploration (and often neglected in standard intro stats courses)
  - robust summary statistics
  - robust regression
  - smoothing
  - density estimation
  - cluster analysis, PCA, SVD, MDS
Embrace the web: done
  getting data from the web, e.g. using an API or via scraping
  exposing your hard work on the web (data, code, results)
Distributing data and code to the world via an R package done
Automating an analytical pipeline, e.g. via Make done