---
title: "Beating the Odds"
author: "Aaron Butler & Hannah Poquette"
date: "December 8, 2020"
output:
  html_document:
    theme: simplex
    css: ../includes/styles.css
    highlight: null
    keep_md: yes
    toc: yes
    toc_depth: 3
    toc_float: yes
    number_sections: no
---

# Implementing a BTO Analysis

*A step-by-step guide to implementing a beating-the-odds (BTO) analysis using a multilevel framework. Programmed in R.*

```{r knitrSetup, echo=FALSE, error=FALSE, message=FALSE, warning=FALSE, comment=NA}
# Set options for knitr
library(knitr)
knitr::opts_chunk$set(comment=NA, warning=FALSE, echo=TRUE,
                      root.dir = normalizePath("../"),
                      error=FALSE, message=FALSE,
                      fig.align='center', fig.width=8, fig.height=6,
                      dpi = 144, fig.path = "../figure/E_",
                      cache.path = "../cache/E_")
options(width=80)
```

## Getting Started

### Objective

In this guide, you will use statistical models to predict school performance based on the demographic makeup of schools' student populations and compare these predictions with actual school performance.

### Purpose and Overview of Analyses

School leaders often want to identify promising practices that distinguish high-performing schools from their counterparts and facilitate the transfer of some of those practices to struggling schools. A BTO analysis is one approach school leaders can take to identify schools that perform better or worse than expected, given the unique student populations they serve. In general, BTO analyses predict school performance based on the demographic makeup of schools' student populations and then compare these predictions with actual school performance. Schools whose observed performance is statistically significantly greater than their predicted performance are typically considered to be performing better than expected, or beating the odds. Schools whose observed performance is statistically significantly less than their predicted performance are considered to be performing worse than expected.

There are a number of examples of BTO analyses being used to identify schools exceeding expectations in achievement gap closure, reading, English language arts, math, graduation rates, and state-determined performance measures. Many focus on specific educational contexts, such as rural districts, high-poverty high schools, and charter schools. The approach can be used to guide decision making by providing leadership with objective information about schools that may warrant a closer look, whether because of promising results or out of concern.
| State | Performance Metric | Population |
|:-----:|:------------------:|:----------:|
| [Colorado](https://ies.ed.gov/ncee/edlabs/projects/project.asp?projectID=4498) | Achievement gap closure | Rural districts |
| [Georgia](https://gosa.georgia.gov/accountability-0/beating-odds-analysis) | College and career readiness | All K-12 schools |
| [Florida](https://ies.ed.gov/ncee/edlabs/projects/project.asp?projectID=363) | Grade 3 reading | Public elementary schools |
| [Michigan](https://www.michigan.gov/documents/mde/Beating_the_Odds_Business_Rules_548870_7.pdf) | State-defined performance measures | All K-12 schools |
| [Mississippi](https://ies.ed.gov/ncee/edlabs/projects/project.asp?projectID=4465) | English language arts; math | Grade 3-8 public schools |
| [Nebraska](https://ies.ed.gov/ncee/edlabs/projects/project.asp?projectID=4498) | Achievement gap closure | Rural districts |
| [Puerto Rico](https://ies.ed.gov/ncee/edlabs/projects/project.asp?projectID=4468) | Graduation rate; reading; math | High poverty high schools |
| [South Carolina](https://ies.ed.gov/ncee/edlabs/projects/project.asp?projectID=418) | English language arts; math | Charter schools |

The purpose of this guide is to present a data-driven approach to identifying BTO schools. We use multilevel models to predict student performance and school-level effects on that performance. Next, we compare each school's predicted performance to its actual performance. A school is identified as beating the odds if its actual performance is higher than predicted by a statistically significant margin, or as performing worse than expected if it is lower. The procedures presented in this guide are based on those used in a collaborative study by the [Kentucky Department of Education](https://education.ky.gov/) and [REL Appalachia](https://ies.ed.gov/ncee/edlabs/regions/appalachia/).

### Using this Guide

The analysis in this guide draws on SDP's ["Faketucky"](https://github.com/opensdp/faketucky) dataset, a synthetic dataset based on real student data. While the data are synthetic, the code is not. Schools and districts wanting to conduct their own BTO analysis can easily adapt the code provided in this guide to their own data.

To replicate or modify the analysis described in this guide, click the "Download" buttons to download R code and sample data. You can make changes to the charts using the code and sample data, or modify the code to work with your own data. If you are familiar with GitHub, you can click "Go to Repository" and clone the entire repository to your own computer.

We encourage you to go to our [Participate page](https://opensdp.github.io/participate/) to read about more ways to engage with the OpenSDP community or reach out for assistance in adapting this code for your specific context.

### Installing and Loading R Packages

To complete this tutorial, you will need R, RStudio, and the following R packages installed on your machine:

- `tidyverse`: For convenient data and output manipulation
- `lme4`: To fit multilevel models
- `merTools`: To extract expected ranks from fitted models
- `glue`: To format and interpolate strings

To install a package, such as `lme4`, run the following command in the R console: `install.packages("lme4")`
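If you would rather install everything this guide uses in one step, a single call covers all four packages. This chunk is a convenience sketch rather than part of the original analysis code, and it is set to `eval=FALSE` so it does not run when the guide is knit:

```{r install-packages, eval=FALSE}
# Install all packages used in this guide (only needs to be run once)
install.packages(c("tidyverse", "lme4", "merTools", "glue"))
```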
In addition, we use a custom ggplot theme -- `sdp_theme()` -- to set text size, font and color, axis lines, axis text, and other standard chart components to an OpenSDP style. Custom themes can help you establish a professional brand for your data visualizations. Organizations such as [The Urban Institute](https://medium.com/@urban_institute/why-the-urban-institute-visualizes-data-with-ggplot2-46d8cfc7ee67) and [BBC News](https://medium.com/bbc-visual-and-data-journalism/how-the-bbc-visual-and-data-journalism-team-works-with-graphics-in-r-ed0b35693535) make use of custom ggplot themes to create publication-ready charts that are consistent with their organizations' branding. While you may not have a dedicated marketing team, making an effort to match chart style and formatting to your district's brand can help your results stand out to stakeholders.

After installing your R packages and downloading this guide's GitHub repository, run the chunk of code below to load your packages and custom theme onto your computer.

```{r load-packages, echo=TRUE}
# Load packages
# Note: We do not load the `merTools` package because it masks
# the `select` function from `dplyr` (found in `tidyverse`).
library(tidyverse)
library(lme4)
library(glue)

# Set custom ggplot2 theme for BTO guide
sdp_theme <- function() {
  theme_minimal() +
    theme(
      panel.grid = element_blank(),
      plot.title = element_text(size = 16, face = "bold"),
      plot.title.position = "plot",
      plot.subtitle = element_text(size = 14),
      axis.title = element_text(size = 12),
      axis.text = element_text(size = 12),
      legend.position = "top",
      legend.text = element_text(size = 12),
      strip.text = element_text(size = 12),
      strip.background = element_rect(fill = "gray80", color = "gray80")
    )
}
```

### About the Data

This guide uses SDP's ["Faketucky"](https://github.com/opensdp/faketucky) dataset. The Faketucky synthetic dataset contains high school and college outcome data for two graduating cohorts of approximately 40,000 students. There are no real students in the dataset, but it mirrors the relationships between variables present in real data. The dataset was developed as an offshoot of [SDP's College-Going Diagnostic for Kentucky](https://cepr.harvard.edu/publications/sdp-college-going-diagnostic-kentucky), using the R [synthpop](https://cran.r-project.org/web/packages/synthpop/index.html) package. In addition, we created school-level aggregates for each variable. The first step of the analysis, "Prepare data for analysis," describes the steps to create the school-level variables.
Below is a list of the variables used in the analyses and their descriptions:

| Variable Name | Variable Description |
|:-----------|:------------------|
| `first_dist_code` | Code of first district attended in high school |
| `first_dist_name` | Name of first district attended in high school |
| `first_hs_code` | Code of first high school attended |
| `first_hs_name` | Name of first high school attended |
| `chrt_ninth` | Student 9th grade cohort |
| `male` | Student male indicator |
| `race_ethnicity` | Student race/ethnicity |
| `frpl_ever_in_hs` | Student ever received free or reduced price lunch in high school |
| `sped_ever_in_hs` | Student ever classified as special ed in high school |
| `lep_ever_in_hs` | Student ever classified as limited English proficiency in high school |
| `gifted_ever_in_hs` | Student ever classified as gifted in high school |
| `scale_score_8_math` | Scaled score of 8th grade math test |
| `scale_score_8_read` | Scaled score of 8th grade reading test |
| `scale_score_11_math` | Scaled score of highest math ACT |
| `scale_score_11_read` | Scaled score of highest reading ACT |

#### Loading the Dataset

```{r load-data, echo=TRUE}
# Load "Faketucky" data file
load("../data/faketucky.rda")

# Select variables of interest
my_vars <- c("first_dist_code", "first_dist_name", "first_hs_code",
             "first_hs_name", "chrt_ninth", "male", "race_ethnicity",
             "frpl_ever_in_hs", "sped_ever_in_hs", "lep_ever_in_hs",
             "gifted_ever_in_hs", "scale_score_8_math", "scale_score_8_read",
             "scale_score_11_math", "scale_score_11_read")

faketucky <- faketucky_20160923[my_vars]
```

### Giving Feedback on this Guide

This guide is an open-source document hosted on GitHub and generated using R Markdown. We welcome feedback, corrections, additions, and updates. Please visit the OpenSDP [participate repository](https://opensdp.github.io/participate/) to read our contributor guidelines.

## Analyses

### Identify BTO Schools

**Purpose:** This analysis illustrates how to identify schools that perform better or worse than expected, given the unique student populations they serve, using a BTO approach.

**Required Analysis File Variables:**

- `first_hs_code`
- `chrt_ninth`
- `male`
- `race_ethnicity`
- `frpl_ever_in_hs`
- `sped_ever_in_hs`
- `lep_ever_in_hs`
- `gifted_ever_in_hs`
- `scale_score_8_math`
- `scale_score_8_read`
- `scale_score_11_math`
- `scale_score_11_read`

**Analytic Technique:** We use multilevel models to predict student performance and school-level effects on that performance. There are a number of benefits to using a multilevel approach over a more traditional approach like ordinary least squares. In particular, a multilevel approach allows us to account for the hierarchical or nested structure of the data. In this case, student observations and school observations from different years are nested within schools. Additionally, recent BTO studies have used a multilevel approach with success (e.g., [Bowers, 2015](https://www.tandfonline.com/doi/full/10.1080/13632434.2014.962500); [Partridge, Rudo, & Herrera, 2017](https://eric.ed.gov/?id=ED572602)). We encourage you to check out Gelman and Hill's book [Data Analysis Using Regression and Multilevel/Hierarchical Models](http://www.stat.columbia.edu/~gelman/arm/) if you're interested in learning more about multilevel models and their applications.

**Ask Yourself:**

- What is the structure of my data? How might it influence the model?
- What variables do I want to include in the model?
- What students will be included in or excluded from the model?
- What metrics will I use to determine appropriate model fit?

**A Note on Missing Data:** It is important to determine how you want to address missing data before you begin your analysis. For the purposes of this guide, we chose, for simplicity, to exclude students with missing data from the analyses. We recommend that you conduct a missing data analysis to determine whether your data are missing completely at random, missing at random, or missing not at random, and apply the appropriate strategy to address your missing data. Andrew Gelman's chapter on [Missing-data Imputation in R](http://www.stat.columbia.edu/~gelman/arm/missing.pdf) is a great resource to help you think about your options.
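If you are not sure how much missingness you are dealing with, a quick tabulation can help you scope the problem before settling on a strategy. The sketch below is illustrative rather than part of the original analysis (it is set to `eval=FALSE`); it simply counts missing values in each Faketucky variable loaded above:

```{r check-missing, eval=FALSE}
# Count missing values for each variable (illustrative check)
faketucky %>%
  summarise(across(everything(), ~ sum(is.na(.x)))) %>%
  pivot_longer(everything(), names_to = "variable", values_to = "n_missing") %>%
  arrange(desc(n_missing))
```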
#### Step 1: Prepare data for analysis

This BTO analysis uses a multilevel framework that incorporates school-level information into the modeling process. To prepare the analytical dataset, we calculated school averages within each cohort year. These variables were then centered using the grand mean of each variable. We centered variables to aid in the interpretation of the school intercepts.

*Note:* We recognize that grand-mean centering may not be an appropriate option for your analysis. We encourage you to explore other centering techniques for multilevel modeling. Gelman and Hill's [Data Analysis Using Regression and Multilevel/Hierarchical Models](http://www.stat.columbia.edu/~gelman/arm/) and Raudenbush and Bryk's [Hierarchical Linear Models](https://us.sagepub.com/en-us/nam/hierarchical-linear-models/book9230) provide further details on the different centering techniques as well as the advantages and disadvantages of their use in multilevel modeling.

#### Calculate school averages by cohort year

```{r calc-avgs, echo=TRUE}
# Calculate school averages by cohort year
sch_avg_by_cohort <- faketucky %>%
  # Create indicator for students identifying as white
  mutate(race_white = ifelse(race_ethnicity == "White", 1, 0)) %>%
  # Group data by high school and cohort year
  group_by(first_hs_code, chrt_ninth) %>%
  # Calculate school averages
  mutate(sch_male = mean(male, na.rm = TRUE),
         sch_white = mean(race_white, na.rm = TRUE),
         sch_frpl = mean(frpl_ever_in_hs, na.rm = TRUE),
         sch_sped = mean(sped_ever_in_hs, na.rm = TRUE),
         sch_lep = mean(lep_ever_in_hs, na.rm = TRUE),
         sch_gifted = mean(gifted_ever_in_hs, na.rm = TRUE),
         sch_8_math = mean(scale_score_8_math, na.rm = TRUE),
         sch_8_read = mean(scale_score_8_read, na.rm = TRUE)) %>%
  # Remember to ungroup
  ungroup()
```

#### Center variables around grand mean

```{r center-vars, echo=TRUE}
# Calculate across-year cohort averages
cohort_avg <- sch_avg_by_cohort %>%
  # Flag 2010 cohort (for modeling)
  mutate(flag_2010_cohort = ifelse(chrt_ninth == 2010, 1, 0)) %>%
  # Group by cohort year
  group_by(chrt_ninth) %>%
  # Center prior achievement test scores
  mutate(math_8_center = scale_score_8_math - mean(scale_score_8_math, na.rm = TRUE),
         read_8_center = scale_score_8_read - mean(scale_score_8_read, na.rm = TRUE)) %>%
  # Calculate averages
  mutate(sch_male_mean_year = mean(sch_male, na.rm = TRUE),
         sch_white_mean_year = mean(sch_white, na.rm = TRUE),
         sch_frpl_mean_year = mean(sch_frpl, na.rm = TRUE),
         sch_sped_mean_year = mean(sch_sped, na.rm = TRUE),
         sch_lep_mean_year = mean(sch_lep, na.rm = TRUE),
         sch_gifted_mean_year = mean(sch_gifted, na.rm = TRUE),
         sch_8_math_mean_year = mean(sch_8_math, na.rm = TRUE),
         sch_8_read_mean_year = mean(sch_8_read, na.rm = TRUE)) %>%
  # Ungroup data frame
  ungroup()

# Center school-level variables
sch_avg_center <- cohort_avg %>%
  # Subtract across-year averages from school averages
  mutate(sch_male_center = sch_male - sch_male_mean_year,
         sch_white_center = sch_white - sch_white_mean_year,
         sch_frpl_center = sch_frpl - sch_frpl_mean_year,
         sch_sped_center = sch_sped - sch_sped_mean_year,
         sch_lep_center = sch_lep - sch_lep_mean_year,
         sch_gifted_center = sch_gifted - sch_gifted_mean_year,
         sch_8_math_center = sch_8_math - sch_8_math_mean_year,
         sch_8_read_center = sch_8_read - sch_8_read_mean_year)
```
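Before fitting any models, it can be worth a quick sanity check that the centering behaved as intended: each centered variable should average approximately zero within a cohort year. One way to check, shown as an illustrative sketch (not part of the original analysis, and set to `eval=FALSE`):

```{r check-centering, eval=FALSE}
# Centered variables should have a mean of (approximately) zero within each cohort year
sch_avg_center %>%
  group_by(chrt_ninth) %>%
  summarise(across(ends_with("_center"), ~ round(mean(.x, na.rm = TRUE), 3)))
```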
#### Step 2: Fit multilevel models

Multilevel models are a powerful and flexible extension of conventional regression frameworks, which is one of the many reasons they are attractive to education researchers. However, this added flexibility can make fitting and interpreting such models a challenge. Here, we present a relatively simple multilevel model that takes into account school-level variation. We encourage you to read more on the topic of multilevel modeling and its application in BTO analyses.

We fit a two-level multilevel model for each subject area outcome of interest -- ACT math and reading -- using the `lmer` function in the `lme4` package. Specifically, these models use a random intercept framework, which allows the school-level intercept to vary randomly around a cross-school mean. We recommend you read the [lme4 Reference Manual](https://cran.r-project.org/web/packages/lme4/lme4.pdf) and the vignette [Fitting Linear Mixed-Effects Models Using lme4](https://cran.r-project.org/web/packages/lme4/vignettes/lmer.pdf) before you fit your models. These resources contain a wealth of information.

```{r fit-models, echo=TRUE}
# Fit multilevel models for each subject area
m_math <- lmer(
  # Define model formula
  formula = scale_score_11_math ~ male + race_white + frpl_ever_in_hs +
    sped_ever_in_hs + lep_ever_in_hs + gifted_ever_in_hs + math_8_center +
    sch_male_center + sch_white_center + sch_frpl_center + sch_sped_center +
    sch_lep_center + sch_gifted_center + sch_8_math_center +
    flag_2010_cohort + (1|first_hs_code),
  # Call dataframe containing the variables named in formula
  data = sch_avg_center
)

m_read <- lmer(
  formula = scale_score_11_read ~ male + race_white + frpl_ever_in_hs +
    sped_ever_in_hs + lep_ever_in_hs + gifted_ever_in_hs + read_8_center +
    sch_male_center + sch_white_center + sch_frpl_center + sch_sped_center +
    sch_lep_center + sch_gifted_center + sch_8_read_center +
    flag_2010_cohort + (1|first_hs_code),
  data = sch_avg_center
)
```

#### Step 3: Inspect Summary Statistics for Model Fit

Examine the coefficients and standard errors for each variable. Ask yourself whether the estimates are within the range of reasonable possibility. If they are not, go back and inspect your dataset to make sure there are no errors in data processing. Also inspect your dataset to make sure that the assumptions of multilevel modeling hold. For information on other model checking and sensitivity analyses for multilevel models, see Snijders and Berkhof's chapter on [Diagnostic Checks for Multilevel Models](https://www.stats.ox.ac.uk/~snijders/handbook_ml_ch3.pdf). Below we print a summary of the math model for illustrative purposes.

```{r inspect-fit, echo=TRUE}
# Call summary statistics for the math model
summary(m_math)

# Remove the comment (#) below to print a summary of the reading model
# summary(m_read)
```
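If you would rather pull out specific pieces of the fit than scan the full summary, `lme4` also provides extractor functions for the fixed effects, variance components, and confidence intervals. The chunk below is an illustrative sketch, not part of the original analysis; Wald intervals are used here only because they are quick to compute:

```{r inspect-pieces, eval=FALSE}
# Fixed-effect estimates
fixef(m_math)

# Variance components (school-level intercept and residual variance)
VarCorr(m_math)

# Quick Wald confidence intervals for the fixed effects
confint(m_math, method = "Wald")
```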
Visual diagnostic plots are another way to inspect the quality of your model. Using a helpful plotting function from the [merTools package](https://cran.r-project.org/web/packages/merTools/index.html), we plot the results of a simulation of random effects for each model. Look for variation and for confidence bands (dark bars) that do not overlap the red line at zero. In this example, a number of school effects are meaningfully different from zero.

*Note:* We do not load the `merTools` package because it masks the `select` function from `dplyr` (found in `tidyverse`). Instead, we call individual functions directly from `merTools` using the `::` operator (e.g., `merTools::plotREsim`). This is a handy workaround when dealing with packages that have conflicting functions.

```{r plot-RE, echo=TRUE}
# Plot random effects for the math model
merTools::plotREsim(merTools::REsim(m_math, n.sims = 100), stat = "median", sd = TRUE)

# Remove comment (#) below to plot random effects for the reading model
# merTools::plotREsim(merTools::REsim(m_read, n.sims = 100), stat = "median", sd = TRUE)
```

#### Step 4: Calculate statistics for BTO schools

We used a measure called "expected rank" to identify BTO schools. Expected rank provides the percentile rank of each observed group (i.e., school) in the random effect distribution, taking into account both the magnitude and the uncertainty of the estimated effect for each group. Incorporating magnitude and uncertainty in the BTO process is a key advantage of this technique when assessing the performance of schools with small student populations. Estimates for small schools are more uncertain because they are based on few student observations. A BTO analysis that relies only on point estimates and confidence intervals biases the results toward small schools with uncertain but very large positive values. Using expected ranks mitigates this bias.

We extracted the expected rank and more reliable confidence intervals using the `merTools` package's `REsim` and `expectedRank` functions. Next, we identified schools as beating the odds if they were at or above the 70th percentile of all schools. Conversely, schools were identified as performing worse than expected if they were below the 30th percentile.

#### Calculate school-level expected ranks

```{r pull-resids, echo=TRUE}
ranks_math <- merTools::expectedRank(m_math, groupFctr = "first_hs_code")
ranks_read <- merTools::expectedRank(m_read, groupFctr = "first_hs_code")
```

#### Create function to flag BTO schools

```{r calc-bto, echo=TRUE}
calc_bto <- function(.ranks, .var) {
  # Input model expected ranks
  .ranks %>%
    # Flag schools that perform above/below benchmark (i.e., BTOs)
    # assume math/read have same cut offs
    mutate(bto = ifelse(pctER >= 70 | pctER < 30, "yes", "no")) %>%
    # Select and name variables of interest
    select(first_hs_code = groupLevel,
           "estimate_{{ .var }}" := estimate,
           "pctER_{{ .var }}" := pctER,
           "bto_{{ .var }}" := bto)
}
```

#### Execute function for each subject area

```{r execute, echo=TRUE}
bto_math <- calc_bto(ranks_math, math)
bto_read <- calc_bto(ranks_read, read)
```

#### Merge datasets for plotting

```{r merge, echo=TRUE}
# Merge BTO datasets
bto_read_math <- left_join(bto_read, bto_math, by = "first_hs_code")

# Pull school and district info from original dataset
sch_names <- faketucky %>%
  select(first_dist_code, first_hs_code, first_dist_name, first_hs_name) %>%
  distinct() %>%
  # Convert high school code to factor for merge
  mutate(first_hs_code = factor(first_hs_code))

# Merge BTO and school info datasets
sch_bto_data <- left_join(sch_names, bto_read_math, by = "first_hs_code")
```
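Before plotting, a quick tally can confirm how many schools were flagged as performing better or worse than expected in each subject. This check is illustrative and not part of the original analysis:

```{r count-bto, eval=FALSE}
# Count schools flagged in each subject (yes = above the 70th or below the 30th percentile)
sch_bto_data %>%
  count(bto_math, bto_read)
```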
#### Step 5: Plot BTO schools

It's always helpful to plot the results of your analysis. We've found that a scatter plot can be an effective visualization for communicating the overall results of the analysis.

```{r plot-bto, echo=TRUE}
# Create scatter plot of math/read residuals
sch_bto_data %>%
  mutate(bto_type = case_when(
    bto_math == "yes" & bto_read == "yes" ~ "Math AND Reading",
    bto_math == "yes" | bto_read == "yes" ~ "Math OR Reading",
    TRUE ~ "Neither/Not BTO"
  )) %>%
  ggplot(aes(estimate_math, estimate_read, color = bto_type)) +
  geom_point(size = 2, alpha = .6) +
  geom_vline(xintercept = 0, linetype = "dashed") +
  geom_hline(yintercept = 0, linetype = "dashed") +
  scale_color_manual(values = c("#E69F00", "#0049E6", "#999999")) +
  sdp_theme() +
  theme(panel.grid = element_line(color = "grey92")) +
  labs(x = "Math Residual",
       y = "Reading Residual",
       color = "",
       title = "Schools that Performed Better/Worse than Expected in Math and Reading")
```

### BTO schools by performance level

**Purpose:** This analysis examines the distribution of BTO schools in math and reading and establishes performance levels (e.g., small, medium, or large differences between predicted and actual school performance). Results provide an overall summary of a BTO analysis.

**Required Analysis File Variables:**

- `first_hs_code`
- `pctER_math` (calculated field from BTO analysis)
- `pctER_read` (calculated field from BTO analysis)

**Ask Yourself**

- How many schools are performing better than expected given their student characteristics? How many are performing worse than expected?
- How large or small are the differences between schools' predicted and actual performance?
- How does school performance vary by subject area?

**Analytic Technique:** Determine reasonable cut points for performance categories (such as small, medium, or large), count the number of schools at each performance level, and plot a matrix to compare school performance across subject areas.

#### Determine performance levels

There are a number of ways to establish performance levels. Examples include referencing academic literature, technical documents, pilot studies, statistical analyses, etc. For illustrative purposes, we aligned performance levels to percentile ranks where:

- Large performance increase = 90th percentile or above
- Medium performance increase = 80th to 89th percentile
- Small performance increase = 70th to 79th percentile
- No performance increase/decrease (i.e., not BTO) = 30th to 69th percentile
- Small performance decrease = 20th to 29th percentile
- Medium performance decrease = 10th to 19th percentile
- Large performance decrease = below the 10th percentile

**Possible Next Steps or Action Plans:** Identify which schools are performing at each level. Develop academic plans for schools at each performance level.
#### Create function to add performance levels

```{r set-perf-lvl, echo=TRUE}
# Create function to add performance levels
add_performance_levels <- function(.bto_data, .pctER.subject, .var) {
  # Input BTO data
  .bto_data %>%
    # Define performance levels
    # (conditions are ordered so ties at the cut points fall in the higher
    #  category, matching the percentile ranges listed above)
    mutate(perform_lvl = case_when(
      # Positive
      {{ .pctER.subject }} >= 90 ~ 3,
      {{ .pctER.subject }} >= 80 ~ 2,
      {{ .pctER.subject }} >= 70 ~ 1,
      # Negative
      {{ .pctER.subject }} < 10 ~ -3,
      {{ .pctER.subject }} < 20 ~ -2,
      {{ .pctER.subject }} < 30 ~ -1,
      # Other (30th to 69th percentile = not BTO)
      TRUE ~ 0
    )) %>%
    # Label performance levels
    mutate(perform_text = case_when(
      perform_lvl == -3 ~ "Large\ndecrease",
      perform_lvl == -2 ~ "Medium\ndecrease",
      perform_lvl == -1 ~ "Small\ndecrease",
      perform_lvl == 0 ~ "Not\nBTO",
      perform_lvl == 1 ~ "Small\nincrease",
      perform_lvl == 2 ~ "Medium\nincrease",
      perform_lvl == 3 ~ "Large\nincrease"
    )) %>%
    mutate(perform_text = fct_relevel(
      perform_text,
      "Large\ndecrease", "Medium\ndecrease", "Small\ndecrease",
      "Not\nBTO",
      "Small\nincrease", "Medium\nincrease", "Large\nincrease")) %>%
    # Rename variables of interest
    rename("perform_lvl_{{ .var }}" := perform_lvl,
           "perform_text_{{ .var }}" := perform_text)
}
```

#### Execute function for each subject

```{r}
perf_lvl_math <- add_performance_levels(sch_bto_data, pctER_math, math) %>%
  # Drop reading variables for following merge
  select(-contains("read"))

perf_lvl_read <- add_performance_levels(sch_bto_data, pctER_read, read) %>%
  select(-contains("math"))
```

#### Merge datasets for plotting

```{r plot-perf-lvl, echo=TRUE}
bto_perform_lvl <- left_join(perf_lvl_math, perf_lvl_read)
```

#### Plot performance levels

```{r}
bto_perform_lvl %>%
  select(first_hs_code, starts_with("perform_text")) %>%
  pivot_longer(-first_hs_code, names_to = "subject", values_to = "perform_lvl") %>%
  count(subject, perform_lvl) %>%
  mutate(subject = ifelse(str_detect(subject, "math"), "Math", "Reading")) %>%
  ggplot(aes(perform_lvl, n, fill = subject)) +
  geom_col(position = position_dodge(.85), width = .8) +
  geom_text(aes(label = n), position = position_dodge(.85), vjust = -.45) +
  scale_fill_manual(values = c("#E69F00", "#999999")) +
  sdp_theme() +
  theme(panel.grid.major.y = element_line(color = "grey92")) +
  labs(x = "Performance Level",
       y = "Number of Schools",
       fill = "",
       title = "Distribution of School Performance Categories for Math and Reading")
```

#### Create a table of schools by performance levels

Creating tables or plots that compare school performance levels in math and reading allows us to see whether schools perform better or worse in one or both subject areas. In this example, we see that school performance in math and reading is correlated. This information can be used to inform the way we think about best practices.

```{r bto-cuts-table, echo=TRUE}
bto_perform_lvl %>%
  select(first_hs_code, starts_with("perform_text")) %>%
  count(perform_text_math, perform_text_read) %>%
  ggplot(aes(perform_text_math, perform_text_read)) +
  geom_tile(aes(width = .95, height = .95), fill = "white", color = "black") +
  geom_text(aes(label = n)) +
  sdp_theme() +
  labs(x = "Math Performance",
       y = "Reading Performance",
       title = "School Counts by Performance Categories for Math and Reading")
```
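To put a number on the relationship suggested by the table, you could also correlate the math and reading expected-rank percentiles directly. This check is illustrative and not part of the original analysis:

```{r check-corr, eval=FALSE}
# Correlation between math and reading expected-rank percentiles
cor(sch_bto_data$pctER_math, sch_bto_data$pctER_read, use = "complete.obs")
```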
### BTO schools by prior performance

**Purpose:** This analysis compares prior achievement across BTO performance levels to better understand the progress made by schools in the different performance groups.

**Required Analysis File Variables:**

- `first_hs_code`
- `chrt_ninth`
- `scale_score_11_math`
- `scale_score_11_read`
- `bto_math` (calculated field from BTO analysis)
- `bto_read` (calculated field from BTO analysis)

**Ask Yourself**

- To what extent is prior achievement related to BTO performance levels? What can we infer from these findings?
- How do prior achievement and BTO performance levels differ by subject area?

**Analytic Technique:** Calculate the average prior achievement for each school in the BTO analysis, then rank schools by their scores.

#### Calculate average prior achievement for each school

```{r calc-prior-perf, echo=TRUE}
prior_achieve <- faketucky %>%
  filter(chrt_ninth == "2009") %>%
  select(first_hs_code, scale_score_11_math, scale_score_11_read) %>%
  mutate(first_hs_code = as.character(first_hs_code)) %>%
  # Calculate average scores by subject
  group_by(first_hs_code) %>%
  summarise(avg_11_read = mean(scale_score_11_read, na.rm = TRUE),
            avg_11_math = mean(scale_score_11_math, na.rm = TRUE)) %>%
  # Rank schools by subject
  mutate(rank_11_read = rank(avg_11_read) / length(avg_11_read) * 100,
         rank_11_math = rank(avg_11_math) / length(avg_11_math) * 100)
```

#### Merge datasets for plotting

```{r merge-prior-perf, echo=TRUE}
prior_achieve_bto <- bto_perform_lvl %>%
  select(first_hs_code, starts_with("bto")) %>%
  left_join(prior_achieve)
```

#### Plot prior performance

```{r plot-prior-perf, echo=TRUE, fig.height=8}
prior_achieve_bto %>%
  mutate(bto_type = case_when(
    bto_math == "yes" & bto_read == "yes" ~ "Math AND Reading",
    bto_math == "yes" | bto_read == "yes" ~ "Math OR Reading",
    TRUE ~ "Neither/Not BTO"
  )) %>%
  ggplot(aes(rank_11_math, rank_11_read)) +
  geom_point(aes(color = bto_type), size = 2, alpha = .6) +
  geom_vline(xintercept = 50, linetype = "dashed") +
  geom_hline(yintercept = 50, linetype = "dashed") +
  scale_color_manual(values = c("#E69F00", "#0049E6", "#999999")) +
  sdp_theme() +
  labs(x = "Prior Performance - Math (Percentile Rank)",
       y = "Prior Performance - Reading (Percentile Rank)",
       color = "",
       title = "Prior Performance by BTO Status")
```

### BTO schools by student demographics

**Purpose:** This analysis explores the student demographics of schools at different performance levels.

**Required Analysis File Variables:**

- `first_hs_code`
- `chrt_ninth`
- `sch_white`
- `sch_frpl`
- `sch_sped`
- `sch_lep`
- `perform_lvl_math` (calculated field from BTO analysis)
- `perform_lvl_read` (calculated field from BTO analysis)

**Ask Yourself**

- Do student populations differ for high- and low-performing schools?
- How do high- and low-performing schools compare to all schools?

**Analytic Technique:** Calculate the percentage of white (or minority) students, low income students, and special education students for schools at each performance level.
#### Calculate the proportion of students by performance level

```{r bto-demos-data, echo=TRUE}
# Create function to calculate proportions by subject area
calc_props <- function(.perform_lvl_subject, .subject_lbl) {
  # Call BTO performance levels
  bto_perform_lvl %>%
    # Label BTO schools
    mutate(perform_high_low = case_when(
      {{ .perform_lvl_subject }} > 0 ~ glue("{.subject_lbl}_High Performing"),
      {{ .perform_lvl_subject }} < 0 ~ glue("{.subject_lbl}_Low Performing"),
      {{ .perform_lvl_subject }} == 0 ~ glue("{.subject_lbl}_Neither High/Low"),
      TRUE ~ NA_character_
    )) %>%
    # Merge with school demographics
    left_join(sch_avg_by_cohort %>%
                filter(chrt_ninth == 2010) %>%
                mutate(first_hs_code = factor(first_hs_code))) %>%
    drop_na(perform_high_low) %>%
    # Calculate proportions by performance level
    group_by(perform_high_low) %>%
    summarise(prop_white = mean(sch_white, na.rm = TRUE),
              prop_frpl = mean(sch_frpl, na.rm = TRUE),
              prop_sped = mean(sch_sped, na.rm = TRUE),
              prop_lep = mean(sch_lep, na.rm = TRUE)) %>%
    # Convert to long format for plotting
    pivot_longer(cols = starts_with("prop")) %>%
    rename(group = perform_high_low)
}
```

#### Execute function for each subject

```{r}
prop_math <- calc_props(perform_lvl_math, "Math")
prop_read <- calc_props(perform_lvl_read, "Reading")
```

#### Merge datasets for plotting

```{r}
prop_stn_perform_lvl <- bind_rows(prop_math, prop_read)
```

#### Plot school demographics by performance level

```{r bto-demo-plot, echo=TRUE}
prop_stn_perform_lvl %>%
  separate(group, into = c("subject", "group"), sep = "_") %>%
  mutate(group = fct_relevel(group, "Low Performing", "High Performing", "Neither High/Low")) %>%
  mutate(value = round(value * 100, 1)) %>%
  mutate(name = case_when(
    name == "prop_white" ~ "White",
    name == "prop_frpl" ~ "Low Income",
    name == "prop_sped" ~ "Special Education",
    name == "prop_lep" ~ "Limited English"
  )) %>%
  ggplot(aes(name, value, fill = group)) +
  geom_col(position = position_dodge(.85), width = .8) +
  geom_text(aes(label = round(value, 1)), position = position_dodge(.85), vjust = -.45) +
  expand_limits(y = c(0, 100)) +
  facet_wrap(~ subject, nrow = 2) +
  scale_fill_manual(values = c("#E69F00", "#0049E6", "#999999")) +
  sdp_theme() +
  theme(panel.grid.major.y = element_line(color = "grey92")) +
  labs(x = "Student Group",
       y = "Percent of Students",
       fill = "",
       title = "School Demographics by Performance Status in Math and Reading")
```

---

##### *This guide was originally created by [Aaron Butler](https://www.aaronjbutler.com/) and Hannah Poquette in partnership with the Strategic Data Project.*