--- title: "Solutions to 'Describing data' exercises" author: "Your name here" format: html --- # Discussion of source file Below (in the source `.qmd` file) is code that is extracted from the materials in the book that is relevant to the exercises below. We use `#| include: false` to hide this code from the compiled output. (Your *answers* should not have this in general.) In this template, we have put the questions from the exercises in `**` to make them bold (to distinguish the questions from your answers). ```{r} #| message: false #| include: false library(dplyr) library(ggplot2) library(farr) library(forcats) # For fct_indorder() library(tidyr) # For pivot_wider() library(lubridate) # For ceiling_date() ``` ```{r} #| include: false rets_nab_cba <- aus_banks %>% inner_join(aus_bank_rets, by = "gvkey") %>% filter(ticker %in% c("CBA", "NAB")) %>% select(ticker, datadate, ret) ``` ```{r} #| include: false rets_nab_cba_wide <- rets_nab_cba %>% pivot_wider(id_cols = datadate, names_from = ticker, values_from = ret) %>% drop_na() ``` ```{r} #| include: false var_alt <- function(x) { sum((x - mean(x))^2)/(length(x) - 1) } ``` ```{r} #| include: false cov_alt <- function(x, y) { sum((x - mean(x)) * (y - mean(y)))/(length(x) - 1) } ``` ```{r} #| include: false latest_mkt_cap <- aus_banks %>% inner_join(aus_bank_rets, by = "gvkey") %>% filter(datadate == max(datadate)) %>% select(ticker, co_name, mkt_cap) %>% arrange(desc(mkt_cap)) %>% mutate(ticker = fct_inorder(ticker)) ``` # Exercises 1. **Create a function `cor_alt(x, y)` that uses the `cov_alt` and `var_alt` to calculate the correlation between `x` and `y`. Check that it gives the same value as the built-in function `cor` for the correlation between `ret_nab` and `ret_cba` from `rets_nab_cba_wide`.** ```{r} cor_alt <- function(x, y) { # This function is not complete! cov_alt(x, y) } ``` 2. **If we remove the `drop_na()` line used in creating `rets_nab_cba_wide`, we see missing values for `CBA`. There are two reasons for these missing values. One reason is explained [here](https://en.wikipedia.org/wiki/Commonwealth_Bank), but the other reason is more subtle and relates to how values are presented in `datadate`. What is the first reason? (*Hint:* What happened to CBA in 1991?) What is the second reason? How might we use `lubridate::ceiling_date(x, unit = "month")` to address the second reason? Does this second issue have implications for other plots?** 3. **Adapt the code used above to calculate the *correlation* matrix for the returns of Australian banks to instead calculate the *covariance* matrix. What is the calculated value of the variance of the returns for NAB?** 4. **From the output above, what is the value for the variance of NAB's returns given by the `cov` function applied to `rets_nab_cba_wide`? Why does this value differ from that you calculated in the previous question?** 5. **What do the two-table verbs `semi_join()` and `anti_join()` do? In what way do they differ from the two-table verbs listed above? How could we replace `filter(ticker %in% latest_mkt_cap$ticker)` (see above) with one of these two verbs?** 6. **In calculating ROE above, we used `ib` rather than a measure of "net income". According to [WRDS](https://wrds-www.wharton.upenn.edu/pages/support/support-articles/compustat/global/ni-net-income-variable/), "`ni` [net income] only applies to Compustat North America. Instead use: `ni = ib + xi + do`." Looking at the data in `aus_bank_funds`, does this advice seem correct? How would you check this claim? (*Hint:* You should probably focus on cases where both `xi` and `do` are non-missing and checking more recent years may be easier.)** 7. **Above we created a plot of market-to-book ratios. Another measure linking stock prices to fundamentals is the *price-to-earnings ratio* (also known as the *PE ratio*). Typically, PE ratios are calculated as** $$ \textrm{PE} = \frac{\textrm{Stock price}}{\textrm{Earnings per share}} $$