---
title: "Homework 2"
output:
  html_document:
    toc: yes
  bookdown::html_chapter:
    toc: no
layout: default_with_disqus
---
```{r setup, echo=FALSE, include=FALSE}
# PLEASE DO NOT EDIT THIS CODE BLOCK
library(knitr)
library(rrhw)
# tell knitr where to find the inserted file in case
# jekyll is building this in the top directory of the repo
opts_knit$set(child.path = paste(prj_dir_containing("rep-res-course.Rproj"), "extras/knitr_children/", sep=""))
rr_github_name <- NA
rr_commit <- NA
```
```{r insert-ids, child="homework-2-control.Rmd"}
```
## Problems to be done for "`r rr_homework_name`" {#hw2-start}
These are a selection of exercises on coercion, recycling, and indexing, including
indexing with names. For each problem, evaluate all the code in the code chunk
(highlight it and hit Cmd-Enter on a Mac, or Ctrl-Enter on a PC) and then look
at each of the variables involved before writing your answer.
Make sure your document still knits successfully before submitting.
```{r instruct-link, child="link-to-homework-instructions.Rmd"}
```
```{r include=FALSE, eval=FALSE}
################## DON'T MODIFY ANYTHING ABOVE THIS LINE ##########################
```
```{r coerce-and-multiply, rr.question=TRUE}
# Joe R. Newbie is trying to compute the componentwise product of two
# vectors x and y, but is running into trouble. Here is what he has
# done so far:
x <- c(3, 9, 12, "16", 11.4)
y <- c(2, 15, 10, 7, 5)
# When he tries to multiply these, he gets an error. Use one of the `as.`
# functions to coerce x appropriately, and then return the product of x and y.
submit_answer({
})
```
For the following, recall from [this lecture](#missing-data) how to test
for missing data.
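As a quick reminder (a small sketch on a made-up vector `v`, not part of the graded problems), `is.na()` is the reliable test for missing values. Comparing with `== NA` does not work, because any comparison involving `NA` is itself `NA`:

```{r}
v <- c(4, NA, 7)  # hypothetical toy vector with one missing value
is.na(v)          # FALSE TRUE FALSE
v == NA           # NA NA NA: a comparison with NA is never TRUE or FALSE
```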
```{r do-stuff-with-NAs, rr.question=TRUE}
# z is a vector with some missing data values, and w is
# a vector of the same length with no missing data:
set.seed(5)
w <- sample(1:20, 10)
z <- sample(1:20, 10)
z[sample(1:length(z), 4)] <- NA
# return a vector that has all the non-NA values in z in the
# order in which they occur in z.
submit_answer({
# <- put your answer to the left of the #.
}, subprob = "-a")
# In the above, don't worry about the "subprob" argument. That is just
# part of the problem naming and numbering system.
# Another exercise: Return all the values in w that
# occur at the same position as the NAs in z.
submit_answer({
}, subprob = "-b")
# Another exercise: Return a vector which is like z, but in which all
# the non-missing values have been multiplied by 2.5 and all the missing
# values (NAs) have been turned into -1's
submit_answer({
}, subprob = "-c")
# Last subproblem: Modify z so that every NA is replaced by the value
# in the same position in the vector w.
submit_answer({
}, subprob = "-d")
```
## About Euclidean distance {#about-euclid-dist}
If you have two vectors $p=(p_1,\ldots,p_n)$ and $q=(q_1,\ldots,q_n)$ that describe
two points in an $n$-dimensional space, the Euclidean distance between the points
is defined as:
$$
d(p,q) = \biggl( \sum_{i=1}^n (p_i - q_i)^2 \biggr)^{\frac{1}{2}}
$$
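As a small worked example (the numbers are made up for illustration), take $n = 3$, $p = (1, 2, 3)$ and $q = (4, 6, 3)$. Squaring each componentwise difference, summing, and taking the square root gives:

$$
d(p,q) = \bigl( (1-4)^2 + (2-6)^2 + (3-3)^2 \bigr)^{\frac{1}{2}} = (9 + 16 + 0)^{\frac{1}{2}} = 25^{\frac{1}{2}} = 5
$$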
The next problem asks you to compute Euclidean distance between two vectors.
```{r euclidean-distance, rr.question=TRUE}
# Let p and q be two vectors defining points in a 20-dimensional space:
set.seed(10)
p <- c(-1,1) * rnorm(20, mean=6, sd=2)
q <- c(-1,1) * rnorm(20, mean=6, sd=2)
# return the Euclidean distance between p and q. Note that if you are
# not familiar with the sum() function you should read about it in the
# help files by typing "?sum" at your R prompt.
submit_answer({
})
```
```{r bin-comp-combo, rr.question=TRUE}
# let a, b, and c be the following vectors:
set.seed(1)
a <- sample(letters, 100, replace = TRUE)
b <- rnorm(100)
c <- sample(1:1000, 100)
# return all the values in c that correspond to positions in
# the vectors where:
# values in a are between "g" and "m", inclusive, alphabetically
# AND
# values in b are less than -1.5 or greater than 1.0
# For checking, your result should have length 6.
submit_answer({
})
```
```{r indexing-and-recycling, rr.question=TRUE}
# f holds the 26 capital letters of the alphabet
f <- LETTERS
# Index f with a logical vector (using recycling) to return every
# third element in f (i.e. elements 3, 6, 9,...)
submit_answer({
}, subprob = "-a")
# Use recycling with a logical vector
# to return every 3rd element in f, starting on element number 2 (i.e.
# get elements 2, 5, 8, ...)
submit_answer({
}, subprob = "-b")
# A new problem: Given the vector:
g <- 10:21
# Multiply every odd number in g by 2 and every even number
# in g by 3. Use recycling. Write as short an expression as
# possible
submit_answer({
}, subprob = "-c")
```
```{r using-names, rr.question=TRUE}
# here are some names of salmon populations in CA and OR:
pops <- c("Eel_R", "Russian_R", "Klamath_IGH_fa", "Trinity_H_sp", "Smith_R", "Chetco_R", "Cole_Rivers_H", "Applegate_Cr", "Coquille_R", "Umpqua_sp", "Siuslaw_R")
# each one of these populations belongs to a so-called
# "reporting-unit" which may include multiple populations.
# Here are the reporting units corresponding to the populations in pops:
repunits <- c("CaliforniaCoast", "CaliforniaCoast", "KlamathR", "KlamathR", "NCaliforniaSOregonCoast", "NCaliforniaSOregonCoast", "RogueR", "RogueR", "MidOregonCoast", "MidOregonCoast", "MidOregonCoast")
# Here are the populations-of-origin for 25 fish caught
# in a fishery off the coast of California:
set.seed(12)
fish_seq <- sample(pops, 25, replace = TRUE)
# Problem (a): Instead of knowing the sequence of salmon populations, some
# fishery managers want you to give them the sequence of *reporting units*.
# Return a vector of length 25 (same length as fish_seq) that gives the sequence of reporting units
# of the fish in fish_seq. Do this by setting the names attribute of
# repunits to be the pops and then indexing that vector with fish_seq.
submit_answer({
}, subprob = "-a")
# Now, 20 more fish were caught and their lengths measured in mm. Those
# lengths are recorded in fish_len, and the populations that those
# fish came from are recorded in the names attribute of fish_len.
set.seed(2)
fish_len <- floor(rnorm(20, mean = 700, sd = 90))
names(fish_len) <- sample(pops, 20, replace = TRUE)
# Problem (b): Create a new vector equal to fish_len, but give it
# names that are the reporting units corresponding to the
# fish_len populations. Call it fish_lr, and, after creating it
# return it.
submit_answer({
}, subprob = "-b")
# Problem (c): Extract the lengths of the 9 fish from the MidOregonCoast
# reporting unit. Don't do this by hand! Use a tidy expression (like indexing
# on the basis of a comparison of the names attribute of fish_lr)
submit_answer({
}, subprob = "-c")
# Bonus question: Why can't you get those 9 fish lengths by doing this: fish_len["MidOregonCoast"] ?
```
## Sorting in R {#sorting-in-r}
We are going to talk briefly about sorting in R. There are two main
functions used for sorting: `sort` and `order`.
The `sort` function
returns a sorted version of its input vector. For example:
```{r}
r <- c(4, 7, 1, 3, 12) # not sorted
sort(r) # returns all the elements of r in sorted order
```
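A couple of details about `sort` are worth knowing, shown here on small made-up vectors: it orders character vectors alphabetically, and by default it silently drops `NA` values (see the `na.last` argument in `?sort`):

```{r}
sort(c("pear", "apple", "fig"))    # "apple" "fig" "pear"
sort(c(4, NA, 1))                  # 1 4  (the NA is dropped by default)
sort(c(4, NA, 1), na.last = TRUE)  # 1 4 NA  (NA kept and put last)
```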
This is useful when all you want to do is sort a single
vector on the basis of its own elements. However, much of the
time when you are sorting data, you will be sorting one vector
_on the basis of a different vector_. The `sort` function
is not useful for that. Instead, you can use the `order`
function.
The `order` function returns the indices which, if used to
index its argument, would put it in sorted order. So,
for example:
```{r}
r <- c(4, 7, 1, 3, 12) # not sorted (same vector as above)
order(r) # indices that would extract elements from r in sorted order
# note that you can achieve the same thing as sort(r) with
# r[order(r)]:
sort(r)
r[order(r)]
```
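Here is a small sketch of sorting one vector on the basis of another, using hypothetical toy vectors (`id_demo`, `age_demo`, `grp_demo`, and `val_demo` are illustrative names, not part of the homework). `order` also accepts several vectors, in which case later vectors break ties in earlier ones:

```{r}
# hypothetical toy data: four individuals and their ages
id_demo  <- c("a", "b", "c", "d")
age_demo <- c(3, 1, 4, 2)
order(age_demo)           # 2 4 1 3
id_demo[order(age_demo)]  # ids sorted by age: "b" "d" "a" "c"

# two sort keys: sort by group first, then by value within each group
grp_demo <- c("b", "a", "b", "a")
val_demo <- c(2, 9, 1, 5)
order(grp_demo, val_demo)  # 4 2 3 1
```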
Because it returns indices rather than values, `order` is considerably more
versatile than `sort`. We'll do a quick problem on it.
```{r using-order, rr.question=TRUE}
# Imagine you have measured the weights (in kg) and lengths (in mm) of
# 20 fish and recorded them in the variables wt and len.
set.seed(3)
wt <- round(rnorm(20, mean = 15, sd = 3), digits = 1)
len <- wt * 53 + floor(rnorm(20, mean = 0, sd = 50))
# and let the population that each fish came from be recorded in
# the variable wpop
wpop <- sample(c("Eel_R", "Russian_R", "Klamath_IGH_fa", "Trinity_H_sp", "Smith_R", "Chetco_R", "Cole_Rivers_H", "Applegate_Cr", "Coquille_R", "Umpqua_sp", "Siuslaw_R"), 20, replace = TRUE)
# Problem (a): Return the vector wt sorted alphabetically
# on the population that each fish came from.
submit_answer({
}, subprob = "-a")
# Problem (b): Return len sorted in DECREASING order of the
# weight of each fish. (do ?order to learn about sorting in increasing
# vs decreasing order.)
submit_answer({
}, subprob = "-b")
```