---
title: "Bechdel"
author: "YOU"
date: "`r Sys.Date()`"
output: 
  html_document: 
    fig_height: 4
    fig_width: 9
---

In this mini analysis we work with the data used in the FiveThirtyEight story titled ["The Dollar-And-Cents Case Against Hollywood’s Exclusion of Women"](https://fivethirtyeight.com/features/the-dollar-and-cents-case-against-hollywoods-exclusion-of-women/).

## Data and packages

We start with loading the packages we'll use.

```{r load-packages, message=FALSE}
library(fivethirtyeight)
library(tidyverse)
```

The dataset contains information on `r nrow(bechdel)` movies released between `r min(bechdel$year)` and `r max(bechdel$year)`. However we'll focus our analysis on movies released between 1990 and 2013.

```{r}
bechdel90_13 <- bechdel %>% 
  filter(between(year, 1990, 2013))
```

There are ---- such movies.

The financial variables we'll focus on are the following:

- `budget_2013`: Budget in 2013 inflation adjusted dollars
- `domgross_2013`: Domestic gross (US) in 2013 inflation adjusted dollars
- `intgross_2013`: Total International (i.e., worldwide) gross in 2013 inflation adjusted dollars

And we'll also use the `binary` and `test_clean` variables for grouping.

## Analysis

Let's take a look at how median budget and gross vary by whether the movie passed the Bechdel test.

```{r}
bechdel90_13 %>%
  group_by(binary) %>%
  summarise(med_budget = median(budget_2013),
            med_domgross = median(domgross_2013, na.rm = TRUE),
            med_intgross = median(intgross_2013, na.rm = TRUE))
```

Next, let's take a look at how median budget and gross vary by a more detailed indicator of the Bechdel test result (`ok` = passes test, `dubious`, `men` = women only talk about men, `notalk` = women don't talk to each other, `nowomen` = fewer than two women).

```{r}
bechdel90_13 %>%
  # ____ %>%
  summarise(med_budget = median(budget_2013),
            med_domgross = median(domgross_2013, na.rm = TRUE),
            med_intgross = median(intgross_2013, na.rm = TRUE))
```

In order to evaluate how return on investment varies among movies that pass and fail the Bechdel test, we'll  first create a new variable called `roi` as the ratio of the gross to budget.

```{r}
bechdel90_13 <- bechdel90_13 %>%
  mutate(roi = intgross_2013 / domgross_2013)
```

Let's see which movies have the highest return on investment.

```{r}
bechdel90_13 %>%
  arrange(desc(roi)) %>% 
  select(title, clean_test, binary, roi, budget_2013, intgross_2013)
```

Below is a visualization of the return on investment by test result, however it's difficult to see the distributions due to a few extreme observations.

```{r}
ggplot(data = bechdel90_13, mapping = aes(x = clean_test, y = roi, color = binary)) +
  geom_boxplot() +
  labs(title = "Return on investment vs. Bechdel test result",
       x = "Detailed Bechdel result",
       y = "___",
       color = "Binary Bechdel result")
```

Zooming in on the movies with `roi < 10` provides a better view of how the medians across the categories compare:

```{r}
ggplot(data = bechdel90_13, mapping = aes(x = clean_test, y = roi, color = binary)) +
  geom_boxplot() +
  ylim(0, 10) +
  labs(title = "Return on investment vs. Bechdel test result",
       subtitle = "___",
       x = "Detailed Bechdel result",
       y = "Return on investment",
       color = "Binary Bechdel result")
```

## References {#references}
1. Assignment Adapted from [Mine Cetinkaya-Rundel's Data Science in a Box](https://github.com/rstudio-education/datascience-box)