18.8 Chapter 15: Regression
The following questions apply to the auction dataset in the yarrr package. This dataset contains information about 1,000 ships sold at a pirate auction.
- The column jbb is the “Jack’s Blue Book” value of a ship. Create a regression object called jbb.cannon.lm predicting the JBB value of ships based on the number of cannons they have. Based on your result, how much value does each additional cannon bring to a ship?
library(yarrr)
# jbb.cannon.lm model
# DV = jbb, IV = cannons
jbb.cannon.lm <- lm(formula = jbb ~ cannons,
data = auction)
# Print jbb.cannon.lm coefficients
summary(jbb.cannon.lm)$coefficients
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1396 61 23 1.3e-94
## cannons 101 3 34 1.9e-169
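The cannons row of the table answers the question directly: each additional cannon is associated with roughly 101 more in JBB value. As a minimal sketch, the slope and a 95% confidence interval can also be pulled out of the model with the base R functions coef() and confint(). The object names cannon.slope and cannon.ci are illustrative and not part of the original exercise.

```r
library(yarrr)

# Refit the model from the exercise
jbb.cannon.lm <- lm(formula = jbb ~ cannons,
                    data = auction)

# Slope for cannons: estimated change in JBB per additional cannon
cannon.slope <- coef(jbb.cannon.lm)["cannons"]

# 95% confidence interval for that slope (illustrative addition)
cannon.ci <- confint(jbb.cannon.lm, parm = "cannons", level = 0.95)

cannon.slope
cannon.ci
```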
- Repeat your previous regression, but do two separate regressions: one on modern ships and one on classic ships. Is the relationship between cannons and JBB the same for both types of ships?
# jbb.cannon.modern.lm model
# DV = jbb, IV = cannons. Only include modern ships
jbb.cannon.modern.lm <- lm(formula = jbb ~ cannons,
data = subset(auction, style == "modern"))
# jbb.cannon.classic.lm model
# DV = jbb, IV = cannons. Only include classic ships
jbb.cannon.classic.lm <- lm(formula = jbb ~ cannons,
data = subset(auction, style == "classic"))
# Print jbb.cannon.modern.lm coefficients
summary(jbb.cannon.modern.lm)$coefficients
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1217 71.8 17 3.5e-51
## cannons 100 3.5 29 3.1e-107
# Print jbb.cannon.classic.lm coefficients
summary(jbb.cannon.classic.lm)$coefficients
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1537 75.9 20 5.9e-67
## cannons 104 3.7 28 2.0e-103
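The two slopes (about 100 for modern ships and 104 for classic ships) are close relative to their standard errors, so the relationship looks much the same for both styles. One way to check this more formally is a single model with a cannons by style interaction. This is an alternative to the two separate regressions, not part of the original solution, and the object name jbb.cannon.style.lm is hypothetical.

```r
library(yarrr)

# One model for all ships, letting the cannons slope differ by style
# (alternative to the two subset regressions; illustrative only)
jbb.cannon.style.lm <- lm(formula = jbb ~ cannons * style,
                          data = auction)

# The cannons:stylemodern row estimates how much the cannons slope
# for modern ships differs from the slope for classic ships
summary(jbb.cannon.style.lm)$coefficients
```

A small estimate with a large p-value in the cannons:stylemodern row would be consistent with the two slopes being the same.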
- Is there a significant interaction between a ship’s style and its age on its JBB value? If so, how do you interpret the interaction?
# int.lm model
# DV = jbb, IV = interaction between style and age
int.lm <- lm(formula = jbb ~ style * age,
data = auction
)
# Print int.lm coefficients
summary(int.lm)$coefficients
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 3414.2 79.20 43.11 6.0e-230
## stylemodern -15.7 111.74 -0.14 8.9e-01
## age 1.9 0.76 2.57 1.0e-02
## stylemodern:age -3.7 1.07 -3.43 6.2e-04
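The stylemodern:age row (p = 6.2e-04) indicates a significant interaction. One way to read it is to work out the age slope for each style: the age coefficient is the slope for classic ships (the reference level), and adding the interaction coefficient gives the slope for modern ships. The sketch below uses the rounded estimates printed above; the variable names are illustrative.

```r
# Rounded estimates copied from the int.lm coefficient table
b.age       <- 1.9    # age slope for classic ships (reference level)
b.style.age <- -3.7   # change in the age slope for modern ships

# Age slope within each style
age.slope.classic <- b.age
age.slope.modern  <- b.age + b.style.age

c(classic = age.slope.classic, modern = age.slope.modern)
```

Under this model, classic ships gain roughly 1.9 in JBB value per year of age, while modern ships lose roughly 1.8 per year.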
- Create a regression object called jbb.all.lm predicting the JBB value of ships based on cannons, rooms, age, condition, color, and style. Which aspects of a ship significantly affect its JBB value?
# jbb.all.lm model
# DV = jbb, IV = everything (except price)
jbb.all.lm <- lm(jbb ~ cannons + rooms + age + condition + color + style,
data = auction
)
# Print jbb.all.lm coefficients
summary(jbb.all.lm)$coefficients
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 134.4 52.9 2.54 1.1e-02
## cannons 100.7 1.6 64.92 0.0e+00
## rooms 50.5 1.6 30.80 1.2e-146
## age 1.1 0.2 5.58 3.1e-08
## condition 107.6 3.9 27.51 3.4e-124
## colorbrown 4.9 16.6 0.30 7.7e-01
## colorplum -29.8 31.3 -0.95 3.4e-01
## colorred 15.1 18.3 0.82 4.1e-01
## colorsalmon -19.4 20.7 -0.94 3.5e-01
## stylemodern -397.8 12.8 -30.98 6.7e-148
- Create a regression object called price.all.lm predicting the actual selling price of ships based on cannons, rooms, age, condition, color, and style. Based on the results, does the JBB do a good job of capturing the effect of each variable on a ship’s selling price?
# price.all.lm model
# DV = price, IV = everything (except jbb)
price.all.lm <- lm(price ~ cannons + rooms + age + condition + color + style,
data = auction
)
# Print price.all.lm coefficients
summary(price.all.lm)$coefficients
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 302.5 73.81 4.10 4.5e-05
## cannons 100.0 2.17 46.17 4.0e-249
## rooms 48.8 2.29 21.34 2.0e-83
## age 1.2 0.29 4.28 2.0e-05
## condition 104.1 5.46 19.05 3.4e-69
## colorbrown -119.2 23.19 -5.14 3.3e-07
## colorplum 15.6 43.74 0.36 7.2e-01
## colorred -603.6 25.59 -23.59 5.4e-98
## colorsalmon 70.4 28.97 2.43 1.5e-02
## stylemodern -419.2 17.93 -23.38 1.3e-96
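Putting the two sets of estimates side by side makes the comparison easier. In the tables above, the coefficients for cannons, rooms, age, condition, and style are very similar in both models, but the color terms differ: the JBB model shows no reliable color effects, while the selling-price model shows large ones (for example, about -604 for red ships). A sketch of that comparison follows; coef.compare is an illustrative name.

```r
library(yarrr)

# Refit both models from the exercises
jbb.all.lm <- lm(jbb ~ cannons + rooms + age + condition + color + style,
                 data = auction)
price.all.lm <- lm(price ~ cannons + rooms + age + condition + color + style,
                   data = auction)

# Both models share the same predictors, so their coefficient
# vectors line up row by row
coef.compare <- cbind(jbb = coef(jbb.all.lm),
                      price = coef(price.all.lm))

round(coef.compare, 1)
```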
- Repeat your previous regression analysis, but instead of using price as the dependent variable, use the binary variable price.gt.3500, which indicates whether or not the ship had a selling price greater than 3500. Call the new regression object price.all.blr. Make sure to use the appropriate regression function!
# Create new binary variable indicating whether
# a ship sold for more than 3500
auction$price.gt.3500 <- auction$price > 3500
# price.all.blr model
# DV = price.gt.3500, IV = everything (except jbb)
price.all.blr <- glm(price.gt.3500 ~ cannons + rooms + age + condition + color + style,
data = auction,
family = binomial # Logistic regression
)
# price.all.blr coefficients
summary(price.all.blr)$coefficients
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -19.7401 1.4240 -13.86 1.1e-43
## cannons 0.6251 0.0442 14.14 2.2e-45
## rooms 0.2688 0.0296 9.07 1.2e-19
## age 0.0097 0.0033 2.93 3.4e-03
## condition 0.6825 0.0745 9.16 5.4e-20
## colorbrown -0.8924 0.2549 -3.50 4.6e-04
## colorplum -0.1291 0.5090 -0.25 8.0e-01
## colorred -4.0764 0.4107 -9.93 3.2e-23
## colorsalmon 0.2479 0.3172 0.78 4.3e-01
## stylemodern -2.4037 0.2432 -9.88 4.9e-23
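The estimates in a logistic regression are on the log-odds scale, which makes them hard to read directly. Exponentiating a coefficient gives an odds ratio. The sketch below does this for two of the rounded estimates printed above; the variable names are illustrative.

```r
# Rounded log-odds estimates copied from the price.all.blr table
b.cannons <- 0.6251
b.red     <- -4.0764

# Odds ratios: multiplicative change in the odds of selling for
# more than 3500, holding the other predictors constant
or.cannons <- exp(b.cannons)
or.red     <- exp(b.red)

round(c(cannons = or.cannons, colorred = or.red), 3)
```

Each additional cannon multiplies the odds of selling above 3500 by roughly 1.87, while a red ship has odds that are a small fraction of those of an otherwise identical ship in the reference color. With the fitted object available, exp(coef(price.all.blr)) converts all coefficients at once.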
- Using price.all.lm, predict the selling price of the following three new ships:
cannons | rooms | age | condition | color | style |
---|---|---|---|---|---|
12 | 34 | 43 | 7 | black | classic |
8 | 26 | 54 | 3 | black | modern |
32 | 65 | 100 | 5 | red | modern |
# Create a dataframe with new ship data
new.ships <- data.frame(cannons = c(12, 8, 32),
rooms = c(34, 26, 65),
age = c(43, 54, 100),
condition = c(7, 3, 5),
color = c("black", "black", "red"),
style = c("classic", "modern", "modern"),
stringsAsFactors = FALSE)
# Predict new ship data based on price.all.lm model
predict(object = price.all.lm,
newdata = new.ships
)
## 1 2 3
## 3944 2331 6296
- Using price.all.blr, predict the probability that each of the three new ships will have a selling price greater than 3500.
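No solution code is shown for this last question. A minimal sketch, assuming price.all.blr and new.ships are built exactly as in the earlier answers: predict() on a glm object returns log-odds by default, so type = "response" is needed to get probabilities. The name new.ships.prob is illustrative.

```r
library(yarrr)

# Binary outcome and logistic regression, as in the earlier answer
auction$price.gt.3500 <- auction$price > 3500

price.all.blr <- glm(price.gt.3500 ~ cannons + rooms + age + condition + color + style,
                     data = auction,
                     family = binomial)

# The three new ships from the table above
new.ships <- data.frame(cannons = c(12, 8, 32),
                        rooms = c(34, 26, 65),
                        age = c(43, 54, 100),
                        condition = c(7, 3, 5),
                        color = c("black", "black", "red"),
                        style = c("classic", "modern", "modern"),
                        stringsAsFactors = FALSE)

# type = "response" converts the predicted log-odds to probabilities
new.ships.prob <- predict(object = price.all.blr,
                          newdata = new.ships,
                          type = "response")

new.ships.prob
```

The same probabilities can be obtained by applying plogis() to the default (log-odds) predictions.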