--- title: "Describing data" author: "Your name here" format: html --- In this template, we have put the questions from the exercises in `**` to make them bold (to distinguish the questions from your answers). ```{r} #| message: false #| include: false library(dplyr) library(ggplot2) library(farr) library(forcats) # fct_inorder() library(tidyr) # pivot_wider() library(lubridate) # ceiling_date() ``` ```{r} #| include: false rets_nab_cba <- aus_banks |> inner_join(aus_bank_rets, by = "gvkey") |> filter(ticker %in% c("CBA", "NAB")) |> select(ticker, datadate, ret) ``` ```{r} #| include: false rets_nab_cba_wide <- rets_nab_cba |> pivot_wider(id_cols = datadate, names_from = ticker, values_from = ret) |> drop_na() ``` ```{r} #| include: false var_alt <- function(x) { sum((x - mean(x))^2)/(length(x) - 1) } ``` ```{r} #| include: false cov_alt <- function(x, y) { sum((x - mean(x)) * (y - mean(y)))/(length(x) - 1) } ``` ```{r} #| include: false latest_mkt_cap <- aus_banks |> inner_join(aus_bank_rets, by = "gvkey") |> filter(datadate == max(datadate)) |> select(ticker, co_name, mkt_cap) |> arrange(desc(mkt_cap)) |> mutate(ticker = fct_inorder(ticker)) ``` # Exercises ```{r} cor_alt <- function(x, y) { # This function is not complete! cov_alt(x, y) } ``` 1. Create a function `cor_alt(x, y)` that uses `cov_alt()` and `var_alt()` to calculate the correlation between `x` and `y`. Check that it gives the same value as the built-in function `cor()` for the correlation between `ret_nab` and `ret_cba` from `rets_nab_cba_wide`. 2. If we remove the `drop_na()` line used in creating `rets_nab_cba_wide`, we see missing values for `CBA`. There are two reasons for these missing values. One reason is explained [here](https://en.wikipedia.org/wiki/Commonwealth_Bank), but the other reason is more subtle and relates to how values are presented in `datadate`. What is the first reason? (*Hint:* What happened to CBA in 1991?) What is the second reason? How might we use `lubridate::ceiling_date(x, unit = "month")` to address the second reason? Does this second issue have implications for other plots? 3. Adapt the code used above to calculate the *correlation* matrix for the returns of Australian banks to instead calculate the *covariance* matrix. What is the calculated value of the variance of the returns for NAB? 4. From the output above, what is the value for the variance of NAB's returns given by the `cov()` function applied to `rets_nab_cba_wide`? Why does this value differ from that you calculated in the previous question? 5. What do the two-table verbs `semi_join()` and `anti_join()` do? In what way do they differ from the two-table verbs listed above? How could we replace `filter(ticker %in% latest_mkt_cap$ticker)` (see above) with one of these two verbs? 6. In calculating ROE above, we used `ib` rather than a measure of "net income". According to [WRDS](https://wrds-www.wharton.upenn.edu/pages/support/support-articles/compustat/global/ni-net-income-variable/), "`ni` [net income] only applies to Compustat North America. Instead use: `ni = ib + xi + do`." Looking at the data in `aus_bank_funds`, does this advice seem correct? How would you check this claim? (*Hint:* You should probably focus on cases where both `xi` and `do` are non-missing and checking more recent years may be easier if you need to look at banks' financial statements.) 7. Above we created a plot of market-to-book ratios. Another measure linking stock prices to fundamentals is the **price-to-earnings ratio** (also known as the **PE ratio**). Typically, PE ratios are calculated as $$ \textrm{PE} = \frac{\textrm{Stock price}}{\textrm{Earnings per share}} $$