# Counting Model Vectors
```{r include=FALSE}
require(mosaic) # Leave this chunk alone
```
### Group Members
add your names here!
### Introduction
We're going to pull out a small subset of the `CPS85` data in order to explore model vectors and their relationship to model terms.
First, to construct the `small` data set, follow these instructions **exactly**.
```{r}
set.seed(1234)
CPS85 <- transform(CPS85, ageGrp=as.character(ntiles(age,3)))
small <- droplevels(sample(CPS85, size=10,orig.ids=FALSE))
rownames(small) <- NULL
```
### Levels of the Categorical Variables
In this small sample, you can see how many levels there are of each categorical variable with the command `with( small, levels( variableName ) )`.
```{r}
with( small, levels( sector ) )
with( small, levels( ageGrp ) )
with( small, levels( union ) )
```
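To see what `droplevels()` did when `small` was constructed, here is a small sketch on a made-up factor (the names `f` and `sub` are hypothetical, not part of `CPS85`). It also shows that `levels()` returns `NULL` for a character vector, which is worth keeping in mind if, depending on your version of R, `ageGrp` ends up stored as character rather than as a factor.

```{r}
# Toy factor with three levels (made-up data, not CPS85)
f <- factor(c("a", "b", "a", "c"))
sub <- f[1:3]              # this subset no longer contains "c"
levels(sub)                # "a" "b" "c": the unused level is still listed
levels(droplevels(sub))    # "a" "b": only the levels present in the subset
nlevels(droplevels(sub))   # 2

# A character vector has no levels attribute
levels(c("1", "2"))        # NULL
levels(factor(c("1", "2"))) # "1" "2" once converted to a factor
```

If `levels()` gives `NULL` for one of your variables, `unique()` or wrapping the variable in `factor()` is one way to see the distinct values.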
## Activity
For each of the following models, your job is:
* **FIRST**, to predict how many explanatory model vectors there will be in the model for this `small` data set.
* Second, confirm or refute your prediction by using the `model.matrix()` command.
* Third, explain any discrepancy between your prediction and the results of `model.matrix()`.
* Fourth, find the standard deviation of the residuals. If the model is a "perfect" fit, explain why.
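As a hint for the fourth step, here is a minimal sketch of what a "perfect" fit can look like. It uses a made-up data frame called `tiny` (hypothetical, not drawn from `CPS85`) with only three cases and a three-level factor, so the model has as many model vectors as there are cases.

```{r}
# Made-up data: 3 cases, one factor with 3 levels
tiny <- data.frame(y = c(2, 5, 9), g = factor(c("a", "b", "c")))
tiny_mod <- lm(y ~ g, data = tiny)

ncol(model.matrix(tiny_mod))  # 3 model vectors (intercept, gb, gc)
nrow(tiny)                    # 3 cases
max(abs(resid(tiny_mod)))     # essentially zero: every case is fitted exactly
```

Whether something similar happens in `small` depends on how many model vectors each model has compared with the 10 cases.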
### EXAMPLE:
How many explanatory model vectors are there in the model `wage ~ 1`?
* Prediction: There's just one, the intercept. A vector of all 1s.
* Confirming
```{r}
mod = lm( wage~1, data=small )
model.matrix(mod)
```
The prediction was right.
The standard deviation of residuals is
```{r}
sd( resid(mod) )
```
Not at all a perfect fit. Indeed, the model "explains" nothing, since the residuals have a standard deviation that is every bit as large as that of the response variable itself:
```{r}
sd( ~wage, data=small ) # formula interface from the mosaic package
```
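For the models below, counting columns by eye gets tedious. One possible shortcut, sketched here on a made-up data frame `toy` (hypothetical, not `small`), is to ask for the column names and the column count of the model matrix directly.

```{r}
# Made-up data: a 3-level factor and a quantitative variable
toy <- data.frame(
  y = c(3.1, 4.0, 5.2, 6.1, 7.3, 8.0),
  g = factor(c("a", "b", "c", "a", "b", "c")),
  x = c(1, 2, 3, 4, 5, 6)
)
toy_mod <- lm(y ~ g + x, data = toy)
X_toy <- model.matrix(toy_mod)

colnames(X_toy)      # "(Intercept)" "gb" "gc" "x"
ncol(X_toy)          # 4 model vectors, counting the intercept
sd(resid(toy_mod))   # compare this ...
sd(toy$y)            # ... with the spread of the response itself
```

Note that the three-level factor contributes only two columns here, because one level is absorbed into the intercept.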
### Model 1: `wage ~ sex`
### Model 2: `wage ~ sector`
### Model 3: `wage ~ union`
### Model 4: `wage ~ ageGrp`
### Model 5: `wage ~ educ + sex`
### Model 6: `wage ~ age + ageGrp`
### Model 7: `wage ~ educ * sex`
### Model 8: `wage ~ age * sector`
### Model 9: `wage ~ age * sector + age * sex`
In this model, some of the coefficients are `NA`.
```{r}
mod9 = lm( wage ~ age * sector + age * sex, data=small )
coef(mod9)
```
Look at the model matrix and try to figure out why these are `NA`.
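If you are not sure what to look for, here is a sketch of one common cause of `NA` coefficients: a model vector that is an exact combination of other model vectors. The data frame `dup` is made up for illustration (hypothetical, not `small`), and it may or may not match what is going on in Model 9.

```{r}
# Made-up data in which h is "v" exactly when g is "b" or "c"
dup <- data.frame(
  y = c(1.0, 2.5, 2.0, 4.5, 5.0, 5.5),
  g = factor(c("a", "a", "b", "b", "c", "c")),
  h = factor(c("u", "u", "v", "v", "v", "v"))
)
dup_mod <- lm(y ~ g + h, data = dup)
X_dup <- model.matrix(dup_mod)

ncol(X_dup)     # 4 columns in the model matrix
dup_mod$rank    # 3: only three of them carry independent information
coef(dup_mod)   # the coefficient on hv is NA

# The redundant column is the sum of two others
all(X_dup[, "hv"] == X_dup[, "gb"] + X_dup[, "gc"])  # TRUE
```

Comparing `ncol(model.matrix(mod))` with `mod$rank` is a quick way to check whether a model has redundant vectors.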
### Model 10: `wage ~ ageGrp*sector + ageGrp*sex + age*sector`
Again, some of the coefficients in the model are `NA`.
```{r}
mod10 = lm( wage ~ ageGrp*sector + ageGrp*sex + age*sector, data=small )
coef(mod10)
```
Explain why.
Are those coefficients also `NA` in the model fitted to the whole of `CPS85`?
### Finally ...
Try fitting the last model to `CPS85` (rather than `small`) and look at the standard deviation of the residuals. Comment on how it differs from the standard deviation of the residuals for the model fitted to `small`.