--- title: "Bechdel" author: "YOU" date: "`r Sys.Date()`" output: html_document: fig_height: 4 fig_width: 9 --- In this mini analysis we work with the data used in the FiveThirtyEight story titled ["The Dollar-And-Cents Case Against Hollywood’s Exclusion of Women"](https://fivethirtyeight.com/features/the-dollar-and-cents-case-against-hollywoods-exclusion-of-women/). ## Data and packages We start with loading the packages we'll use. ```{r load-packages, message=FALSE} library(fivethirtyeight) library(tidyverse) ``` The dataset contains information on `r nrow(bechdel)` movies released between `r min(bechdel$year)` and `r max(bechdel$year)`. However we'll focus our analysis on movies released between 1990 and 2013. ```{r} bechdel90_13 <- bechdel %>% filter(between(year, 1990, 2013)) ``` There are ---- such movies. The financial variables we'll focus on are the following: - `budget_2013`: Budget in 2013 inflation adjusted dollars - `domgross_2013`: Domestic gross (US) in 2013 inflation adjusted dollars - `intgross_2013`: Total International (i.e., worldwide) gross in 2013 inflation adjusted dollars And we'll also use the `binary` and `test_clean` variables for grouping. ## Analysis Let's take a look at how median budget and gross vary by whether the movie passed the Bechdel test. ```{r} bechdel90_13 %>% group_by(binary) %>% summarise(med_budget = median(budget_2013), med_domgross = median(domgross_2013, na.rm = TRUE), med_intgross = median(intgross_2013, na.rm = TRUE)) ``` Next, let's take a look at how median budget and gross vary by a more detailed indicator of the Bechdel test result (`ok` = passes test, `dubious`, `men` = women only talk about men, `notalk` = women don't talk to each other, `nowomen` = fewer than two women). ```{r} bechdel90_13 %>% # ____ %>% summarise(med_budget = median(budget_2013), med_domgross = median(domgross_2013, na.rm = TRUE), med_intgross = median(intgross_2013, na.rm = TRUE)) ``` In order to evaluate how return on investment varies among movies that pass and fail the Bechdel test, we'll first create a new variable called `roi` as the ratio of the gross to budget. ```{r} bechdel90_13 <- bechdel90_13 %>% mutate(roi = intgross_2013 / domgross_2013) ``` Let's see which movies have the highest return on investment. ```{r} bechdel90_13 %>% arrange(desc(roi)) %>% select(title, clean_test, binary, roi, budget_2013, intgross_2013) ``` Below is a visualization of the return on investment by test result, however it's difficult to see the distributions due to a few extreme observations. ```{r} ggplot(data = bechdel90_13, mapping = aes(x = clean_test, y = roi, color = binary)) + geom_boxplot() + labs(title = "Return on investment vs. Bechdel test result", x = "Detailed Bechdel result", y = "___", color = "Binary Bechdel result") ``` Zooming in on the movies with `roi < 10` provides a better view of how the medians across the categories compare: ```{r} ggplot(data = bechdel90_13, mapping = aes(x = clean_test, y = roi, color = binary)) + geom_boxplot() + ylim(0, 10) + labs(title = "Return on investment vs. Bechdel test result", subtitle = "___", x = "Detailed Bechdel result", y = "Return on investment", color = "Binary Bechdel result") ``` ## References {#references} 1. Assignment Adapted from [Mine Cetinkaya-Rundel's Data Science in a Box](https://github.com/rstudio-education/datascience-box)