---
title: "Homework 2"
output:
  html_document:
    toc: yes
  bookdown::html_chapter:
    toc: no
layout: default_with_disqus
---


```{r setup, echo=FALSE, include=FALSE}
# PLEASE DO NOT EDIT THIS CODE BLOCK
library(knitr)
library(rrhw)
# tell knitr where to find the inserted file in case
# jekyll is building this in the top directory of the repo
opts_knit$set(child.path = paste(prj_dir_containing("rep-res-course.Rproj"), "extras/knitr_children/", sep=""))
rr_github_name <- NA
rr_commit <- NA

```
```{r insert-ids, child="homework-2-control.Rmd"}
```

## Problems to be done for "`r rr_homework_name`" {#hw2-start}
These are a selection of exercises on coercion, recycling, and indexing, including
indexing with names.  For each problem, evaluate all the code in the code chunk
(highlight it and hit CMD-Enter (or cntrl-Enter on a PC)) and then have a look
at each of the variables involved before writing your answer.

Make sure your document still knits successfully before submitting.


```{r instruct-link, child="link-to-homework-instructions.Rmd"}
```
```{r include=FALSE, eval=FALSE}
################## DON'T MODIFY ANYTHING ABOVE THIS LINE ##########################
```


```{r coerce-and-multiply, rr.question=TRUE}
# Joe R. Newbie is trying to compute the componentwise product of two 
# vectors x and y,  but is running into trouble.  Here is what he has
# done so far:
x <- c(3, 9, 12, "16", 11.4)
y <- c(2, 15, 10, 7, 5)

# when he tries to multiply these he gets an error.  Use an `as.` function
# to coerce x appropriately and then return the product of x and y.
submit_answer({
  
})

```

For the following, recall from [this lecture]({#missing-data}) how to test
for missing data.
```{r do-stuff-with-NAs, rr.question=TRUE}
# z is a vector with some missing data values, and w is 
# a vector of the same length with no missing data:
set.seed(5)
w <- sample(1:20, 10)
z <- sample(1:20, 10)
z[sample(1:length(z), 4)] <- NA


# return a vector that has all the non-NA values in z in the 
# order in which they occur in z.
submit_answer({
              #  <- put your answer to the left of the #. 
}, subprob = "-a")

# In the above, don't worry about the "subprob" argument.  That is just
# part of the problem naming and numbering system.

# Another exercise:  Return all the values in w that
# occur at the same position as the NAs in z.
submit_answer({
             
}, subprob = "-b")

# Another exercise: Return a vector which is like z, but in which all
# the non-missing values have been multiplied by 2.5 and all the missing
# values (NAs) have been turned into -1's
submit_answer({
             
}, subprob = "-c")

# Last subproblem: Modify z so that every NA gets replaced by the value
# in the same position in the vector w
submit_answer({
             
}, subprob = "-d")

```


## About Euclidean distance {#about-euclid-dist}
If you have two vectors $p=(p_1,\ldots,p_n)$ and $q=(q_1,\ldots,q_n)$ that describe
two points in an $n$-dimensional space, the Euclidean Distance between the points
is defined as:
$$
d(p,q) = \biggl( \sum_{i=1}^n (p_i - q_i)^2 \biggr)^{\frac{1}{2}}
$$
The next problem asks you to compute Euclidean distance between two vectors. 
```{r euclidean-distance, rr.question=TRUE}
# Let p and q be two vectors defining points in a 20-dimensional space:
set.seed(10)
p <- c(-1,1) * rnorm(20, mean=6, sd=2)
q <- c(-1,1) * rnorm(20, mean=6, sd=2)

# return the Euclidean distance between p and q.  Note that if you are
# not familiar with the sum() function you should read about it in the 
# help files by typing "?sum" at your R prompt.
submit_answer({
             
})

```


```{r bin-comp-combo, rr.question=TRUE}
# let a, b, and c be the following vectors:
set.seed(1)
a <- sample(letters, 100, replace = TRUE)
b <- rnorm(100)
c <- sample(1:1000, 100)

# return all the values in c that correspond to positions in 
# the vectors where:  
#   values in a are between "g" and "m", inclusive, alphabetically
# AND
#   values in b are less than -1.5 or greater than 1.0

# For checking, your result should have length 6.
submit_answer({
             
})
```


```{r indexing-and-recycling, rr.question=TRUE}
# f is capital letters of the alphabet
f <- LETTERS

# Index f with a logical vector (using recycling) to return every
# third element in f (i.e. elements 3, 6, 9,...)
submit_answer({

}, subprob = "-a")

# Use recycling with a logical vector
# to return every 3rd element in f, starting on element number 2 (i.e.
# get elements 2, 5, 8, ...)
submit_answer({

}, subprob = "-b")


# A new problem: Given the vector:
g <- 10:21

# Multiply every odd number in g by 2 and every even number 
# in g by 3.  Use recycling.  Write as short an expression as
# possible
submit_answer({

}, subprob = "-c")
```


```{r using-names, rr.question=TRUE}
# here are some names of salmon populations in CA and OR:
pops <- c("Eel_R", "Russian_R", "Klamath_IGH_fa", "Trinity_H_sp", "Smith_R", "Chetco_R", "Cole_Rivers_H", "Applegate_Cr", "Coquille_R", "Umpqua_sp", "Siuslaw_R")

# each one of these populations belongs to a so-called 
# "reporting-unit" which may include multiple populations.
# Here are the reporting units corrsponding to the populations in pops:
repunits <- c("CaliforniaCoast", "CaliforniaCoast", "KlamathR", "KlamathR", "NCaliforniaSOregonCoast", "NCaliforniaSOregonCoast", "RogueR", "RogueR", "MidOregonCoast", "MidOregonCoast", "MidOregonCoast")

# here are the populations-of-origin for 25 fish caught
# in a fishery off the coast of california:
set.seed(12)
fish_seq <- sample(pops, 25, replace = TRUE)

# Problem (a): Instead of knowing the sequence of salmon populations, some
# fishery managers want you to give them the sequence of *reporting units*.
# Return a vector of length 25 (same length as fish_seq) that gives the sequence of reporting units
# of the fish in fish_seq.  Do this by setting the names attribute of 
# repunits to be the pops and then indexing that vector with fish_seq.
submit_answer({

}, subprob = "-a")


# Now, 20 more fish were caught and their lengths measured in mm.  Those
# lengths are recorded in fish_len, and the populations from which those
# fish came from are recorded in the names attribute of fish_len
set.seed(2)
fish_len <- floor(rnorm(20, mean = 700, sd = 90))
names(fish_len) <- sample(pops, 20, replace = TRUE)

# Problem (b): Create a new vector equal to fish_len, but give it
# names that are the reporting units corresponding to the
# fish_len populations. Call it fish_lr, and, after creating it
# return it.
submit_answer({

}, subprob = "-b")


# Problem (c): Extract the lengths of the 9 fish from the MidOregonCoast
# reporting unit.  Don't do this by hand! Use a tidy expression (like indexing
# on the basis of a comparison of the names attribute of fish_lr)
submit_answer({

}, subprob = "-c")

# Bonus question: Why can't you get those 9 fish lengths by doing this: fish_len["MidOregonCoast"] ?
```


## Sorting in R  {#sorting-in-r}
We are going to talk briefly about sorting in R.  There are two main 
functions used for sorting: `sort` and `order`. 

The `sort` function 
returns a sorted version of its input vector.  For example:
```{r}
r <- c(4, 7, 1, 3, 12) # not sorted

sort(r)  # returns all the elements of r in sorted order
```
This is useful when all you want to do is sort a single
vector on the basis of its elements.  However, much of the
time when one is sorting data, you will be sorting one vector
_on the basis of a different vector_.  The `sort` function
is not useful for that.  Instead you can use the `order`
function.

The `order` function returns the indices which, if used to
index its argument, would put it in sorted order.  So,
for example:
```{r}
r <- c(4, 7, 1, 3, 12) # not sorted (same vector as above)

order(r) # indices that would extract elements from r in sorted order

# note that you can achieve the same things as sort(r) with
# r[order(r)]:
sort(r)

r[order(r)]
```

`order` is considerably more versatile.  We'll do a quick problem
on it.

```{r using-order, rr.question=TRUE}
# Imagine you have measured the weights (in kg) and lengths (in mm) of
# 20 fish and recorded them in the variables wt and len.
set.seed(3)
wt <- round(rnorm(20, mean = 15, sd = 3), digits = 1)
len <- wt * 53 + floor(rnorm(20, mean = 0, sd = 50))

# and let the population from which the fish arrive come be recorded in
# the variable wpop
wpop <- sample(c("Eel_R", "Russian_R", "Klamath_IGH_fa", "Trinity_H_sp", "Smith_R", "Chetco_R", "Cole_Rivers_H", "Applegate_Cr", "Coquille_R", "Umpqua_sp", "Siuslaw_R"), 20, replace = TRUE)

# Problem (a): Return the vector wt sorted alphabetically
# on the population that each fish came from.
submit_answer({

}, subprob = "-a")

# Problem (b): Return len sorted in DECREASING order of the
# weight of each fish.  (do ?order to learn about sorting in increasing
# vs decreasing order.)
submit_answer({

}, subprob = "-b")


```