---
title: "Short Lab 3"
author: "INSERT YOUR NAME HERE"
date: "Due Date Here"
output: html_document
---

As usual, all code below should follow the style guidelines from the lecture slides.

## Part 1. Read in text data

For this short lab, we will be using Project Gutenberg's *The Complete Works of William Shakespeare*. Use `read_lines()` from the `readr` package to read the text available at "https://www.gutenberg.org/files/100/100-0.txt", and store the result in a variable. Use the `skip` argument to discard the first 23 lines, which contain extra information rather than Shakespeare's text.

**1a.** Print the first 5 lines.

**1b.** Print the total number of lines.

**1c.** Remove all empty lines, then print the total number of lines. (*Hint: to remove empty elements from a string vector `x`, you could use* `x <- x[x != ""]`.)

## Part 2. String Manipulation

**2a.** Use `str_c()` with its `collapse` argument to collapse the Shakespeare string vector into one large string. (Don't try to print it!)

**2b.** Use `str_split()` to separate your string into words. (*Hint: `str_split()` returns a list, here of length 1, that you have to convert to a vector. You could do this with something like* `x <- unlist(x)` *or* `x <- x[[1]]`.)

**2c.** Use `table()` together with `sort(..., decreasing = TRUE)` to count the unique words in Shakespeare's complete works, then print the 10 most common words.

## Part 3. Factors

**3a.** Use the code below to load the `movies` data, courtesy of `the-numbers.com`. Turn the `genre` and `mpaa_rating` variables into factors.

```{r, message = FALSE, warning = FALSE}
movies <- readr::read_csv("https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2018/2018-10-23/movie_profit.csv")
```

**3b.** Collapse the `Drama` and `Horror` levels of `genre` into a single `Drama_Horror` level.

**3c.** Create a new factor variable in the `movies` tibble, `audience`, that takes the value `"all ages"` for G and PG movies, `"Teens and adults"` for PG-13 movies, and `"Adults only"` for R movies.

## Part 4. Dates

**4a.** Convert the `release_date` variable into a column of `Date` objects using an appropriate function.

**4b.** Create a new column, `year`, that contains the year of release for each movie.
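As an illustration of the kind of date functions Part 4 calls for, here is a minimal sketch using the `lubridate` package on a made-up vector of date strings (not the `movies` data). Using `lubridate` is an assumption; it is one option, not necessarily the approach from lecture. The sketch also assumes the dates are stored as month/day/year text such as `"6/22/2007"`, so check the actual format of `release_date` before choosing a parsing function.

```{r}
# Illustrative sketch only: a made-up vector, not the movies data.
# Assumes dates are stored as "month/day/year" strings.
library(lubridate)

release_strings <- c("6/22/2007", "12/25/1998", "1/1/2010")

# mdy() parses month/day/year strings into Date objects
release_dates <- mdy(release_strings)
class(release_dates)

# year() pulls out the year component of each date as a number
release_years <- year(release_dates)
release_years
```

Here `mdy()` returns a `Date` vector and `year()` returns the years 2007, 1998, and 2010. If the strings were stored in a different order, the matching parser (`ymd()`, `dmy()`, and so on) would be needed instead.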