---
title: "Lab 3"
author: "STAT 302"
date: "Due Date Here"
output: html_document
---

<!--- Begin styling code. --->
<style type="text/css">
/* Whole document: */
body{
  font-family: "Palatino Linotype", "Book Antiqua", Palatino, serif;
  font-size: 12pt;
}
h1.title {
  font-size: 38px;
  text-align: center;
}
h4.author {
  font-size: 18px;
  text-align: center;
}
h4.date {
  font-size: 18px;
  text-align: center;
}
</style>
<!--- End styling code. --->


```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```

*If you collaborated with anyone, you must include "Collaborated with: FIRSTNAME LASTNAME" at the top of your lab!*

## Part 1. t-test (10 points)

Write a function `my_t.test()` that performs a one sample t-test in R. To prove it works, use or simulate any data. Use this data for a two-sided t-test using both `my_t.test()` and `t.test()`. The results should match. Do the same for a one-sided t-test.

Your function should have the following parameters:

* `x` a numeric vector of data.
* `alternative` a character string specifying the alternative hypothesis. This should only accept `"two.sided"`, `"less"`, or `"greater"`. Otherwise, your function should throw an informative error.
* `mu` a number indicating the null hypothesis value of the mean.

Your function should return a `list` with elements:

* `test_stat`: the numeric test statistic.
* `df`: the degrees of freedom.
* `alternative`: the value of the parameter `alternative`.
* `p_val`: the numeric p-value.

You should use the following information:

* To get the standard error in a one sample t-test, take the standard deviation (`sd()`) of your input and divide it by the square root of the sample size.
* Use `pt()` to get the area under the curve for a t-distribution. Be sure to use the parameter `lower.tail`!
* The degrees of freedom for this test (`df` within `pt()`) is equal to the sample size - 1.
* The p-value should never be less than 0 or greater than 1.

(*Hint: Be careful about whether you use `lower.tail = TRUE` or `lower.tail = FALSE` in the two-sided test! One safe option is to use `lower.tail = FALSE` with the absolute value (`abs()`) of your test statistic.*)

## Part 2. Linear model (10 points)

Write a function `my_lm()` that fits a linear model in R. To prove it works, use or simulate any data. Use this data to fit a linear model using both `my_lm()` and `lm()`. The results of `my_lm()` should match the coefficient table from the `summary` of your `lm()` output.

Your function should have the following parameters:

* `formula`: a `formula` class object, similar to `lm()`.
* `data`: input data frame.

Your function should return a `table` similar to the coefficent table from `summary()` with rows for each coefficient (including the `(Intercept)`!) and columns for the `Estimate`, `Std. Error`, `t value`, and `Pr(>|t|)`. There should be row and column names.

You should use the following information:

* Use `model.matrix()` to extract the model matrix $\mathbf{X}$. It takes as input parameters a formula and data.
* Use `model.response()` to extract the model response $\mathbf{Y}$. It takes as input a model frame object.
* Use `model.frame()` to extract a model frame object. It takes as input parameters a formula and data.
* You can solve for linear regression coefficients using the formula 
$$
\hat{\beta} = (\mathbf{X}^T\mathbf{X})^{-1} \mathbf{X}^T\mathbf{Y}.
$$
* Recall matrix operations `solve()`, `t()`, and `%*%`.
* The degrees of freedom (df) you should use for your tests are equal to your sample size minus the number of covariates (including intercept!).
* You can estimate $\sigma^2$ using the formula
$$
\hat{\sigma}^2 = \sum_i \dfrac{(Y_i - \mathbf{X}_i \beta)^2}{\text{df}}.
$$
* You can estimate the standard error for your coefficients using the formula
$$
se(\hat{\beta}_j) = \left[\sqrt{\hat{\sigma}^2(\mathbf{X}^T\mathbf{X})^{-1}}\right]_{[j,j]}.
$$
Note that within the brackets of the right-hand side is a matrix and $[j,j]$ represents the element of the matrix located at the $j$<sup>th</sup> row and $j$<sup>th</sup> column indices. 
* Use `diag()` to extract diagonal components from a matrix.
* `Pr(>|t|)` comes from the two-sided t test.
$$
\begin{align}
H_0: \beta_j &= 0\\
H_a: \beta_j &\neq 0
\end{align}
$$
* Use `pt()` to get the area under the curve for a t-distribution. Because the distribution is symmetric, you can multiply this value by $2$ to get the two-sided test output. 

(*Hint: As before, be careful about whether you use `lower.tail = TRUE` or `lower.tail = FALSE`! One safe option is to use `lower.tail = FALSE` with the absolute value (`abs()`) of your test statistic. Make sure you never end up with a p-value greater than 1!*)