---
title: "Biostat M280 Homework 3"
subtitle: Due Mar 1 @ 11:59PM
output: html_document
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```

Use tidyverse and bash to explore the following two data sets.

## Q1 LA City Employee Payroll

The `/home/m280data/la_payroll/City_Employee_Payroll.csv` file on the teaching server contains payroll information of LA City employees in years 2013-2018. It was downloaded from [LA City Controller's Office](https://controllerdata.lacity.org/Payroll/City-Employee-Payroll/pazn-qyym). Make a Shiny app to facilitate visualization of this data.

1. For efficiency of the Shiny app, you should first pre-process, pare down, tidy, and save the data, e.g., as a compressed RDS file, to be used in the app.

0. **Total payroll by LA City**. Visualize the total LA City payroll of each year, with breakdown into base pay, overtime pay, and other pay.

0. **Who earned most?** Visualize the payroll information (total payment with breakdown into base pay, overtime pay, and other pay, Department, Job Title) of the top $n$ highest paid LA City employees in a specific year. User specifies $n$ (default 10) and year (default 2017).

0. **Which departments earn most?** Visualize the mean or median payroll, with breakdown into base pay, overtime pay, and other pay, of the top $n$ earning departments. User specifies $n$ (default 5), year (default 2017), and method (mean or median, default median).

0. **Which departments cost most?** Visualize the total payroll, with breakdown into base pay, overtime pay, and other pay, of the top $n$ most expensive departments. User specifies $n$ (default 5) and year (default 2017).

0. Visualize any other information you are interested in.

0. Publish your Shiny app and share the link.

## Q2 LA City Parking War

The SQLite database `/home/m280data/la_parking/LA_Parking_Citations.sqlite` on the teaching server contains information about parking tickets in LA City.
It was downloaded from [LA Open Data Portal](https://data.lacity.org/A-Well-Run-City/Parking-Citations/wjz9-h9np). Connect to the database and answer the following questions using plots and summary statistics. In this exercise, you are **not** allowed to load the whole data set into memory. Use the _transform in database, plot in R_ strategy.

1. How many tickets are in this data set? Which time period do these tickets span? Which years have the most data?

0. When (which hour, weekday, month day, and month) are you most likely to get a ticket and when are you least likely to get a ticket?

0. Which car makes received the most citations?

0. How many different colors of cars were ticketed? Which color attracted the most tickets?

0. What are the most common ticket types?

0. How much money was collected on parking tickets in 2016, 2017 and 2018?

0. If you've been ticketed in LA County, did you find your ticket in this data set?

0. Read the blog and try to reproduce plots using your data. [pdf](./What\ 9\ Million\ Parking\ Tickets\ Says\ About\ Los\ Angeles\ ·\ Brettrics.pdf)
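
The _transform in database, plot in R_ strategy mentioned in Q2 can be illustrated with a minimal sketch. It assumes the `DBI`, `RSQLite`, `dplyr`, `dbplyr`, and `ggplot2` packages are installed. To stay self-contained it uses a toy in-memory SQLite table instead of the real citation database; the table name `citations` and the columns `ticket_id` and `issue_hour` are hypothetical and are not taken from the actual schema.

```r
library(DBI)
library(dplyr)
library(ggplot2)

# Toy in-memory database standing in for the real SQLite file.
# Table and column names here are hypothetical, for illustration only.
con <- dbConnect(RSQLite::SQLite(), ":memory:")
toy <- data.frame(
  ticket_id  = 1:6,
  issue_hour = c(8L, 8L, 10L, 12L, 12L, 12L)
)
dbWriteTable(con, "citations", toy)

# Lazy reference to the table: no rows are pulled into R at this point.
citations <- tbl(con, "citations")

# The aggregation is translated to SQL and executed inside SQLite;
# collect() brings back only the small summary table.
hourly <- citations %>%
  count(issue_hour) %>%
  arrange(issue_hour) %>%
  collect()

# Plot the summary in R.
p <- ggplot(hourly, aes(x = issue_hour, y = n)) +
  geom_col() +
  labs(x = "Hour of day", y = "Number of tickets")
print(p)

dbDisconnect(con)
```

With the real database, the same pattern would apply after replacing `":memory:"` with the path to the SQLite file and using the table and column names actually found there; calling `show_query()` on the pipeline before `collect()` is one way to check that the work stays in the database.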