--- title: "西南大学 2022: STATS 201 Assignment 2" author: "RUNZE LIAO 22202032102007" date: '22.5.18' output: word_document: default pdf_document: default html_document: fig_caption: yes number_sections: yes --- ```{r global_options, include=FALSE} # Do not edit this code knitr::opts_chunk$set(fig.height=3) ``` ## Enter your name here ```{r} # Replace "Model Student" with your name in quotes, # E.g., myname="Ruoxi Xu" myname="runze liao" ``` # Question 1 ## Background A manufacturer of autonomous vehicles wanted to predict the stopping distance as a function of vehicle speed (km/h). Stopping distance (m) was measured as the distance traveled after emergency braking was activated. You may assume that the observations are independent. The code below automatically generates the data for a randomly chosen student. The data are in the dataframe **Stop.df**. Variable **Speed** is the explanatory variable, and **Dist** is the response variable. ## Questions of interest Make inference about the stopping distance for a given speed. In particular, - What is the effect of a doubling in speed? - Estimate the typical stopping distance for emergency braking at speeds of 50 km/h and 100 km/h. - The manufacturer desires that the vehicle stops in less than 12 m at 50 km/h, at least 999 times out of 1000. Has this been achieved? [Hint, calculate the 0.998 level prediction interval.] ```{r echo=FALSE, message=FALSE} ## Do not delete this! ## It loads the s20x library for you. If you delete it your document may not compile require(s20x) ``` ```{r, echo=F} #Do not alter this code. It generates your data. seed=sum( as.numeric( charToRaw(myname) )^2 ) set.seed(seed) UniqSpeeds=seq(30,160,10); m=length(UniqSpeeds) Speed=rep(UniqSpeeds,rep(2,m)); n=length(Speed) mps=1000*Speed/3600 #metres per second decel=runif(1,7.5,9.5) #deceleration rate metres per second^2 Dist=round(mps^2/(2*decel)*rlnorm(n,0,0.02),2) Stop.df=data.frame(Speed,Dist) ``` ## Plot data ```{r} # Add R code below to draw appropriate scatter plot(s) plot(Dist~Speed, data = Stop.df) ``` ```{r} plot(log(Dist)~Speed, data = Stop.df) plot(log(Dist)~log(Speed), data = Stop.df) trendscatter(log(Dist)~log(Speed),data = Stop.df) ``` In this graph, we can see a apparent looks reasonably linear on the log scale and after we fit the log(Dist), it seems that it has a Power law relation. So we will fit the model with log(Dist) and log(Speed) ## Fit a power-law model and do assumption checks ```{r} Stop.lm = lm(log(Dist)~log(Speed), data = Stop.df) plot(Stop.lm, which = 1) normcheck(Stop.lm) cooks20x(Stop.lm) ``` The EOV check is good, the residuals satisfy the normal distributions. The point 3 seems strange, however, we will keep it as it does not > 0.4. ```{r} summary(Stop.lm) exp(confint(Stop.lm)) ``` The p-value of intercept and log(Speed) are all < 0.05, we have strong evidence to believe the assumption. The estimate of intercept is from 0.0040 to 0.0045 and the estimate of Speed is from 7.28 to 7.50. ## Inference, i.e, answer questions of interest ```{r} # Add R code below pred.df = data.frame(Speed = c(50,100)) exp(predict(Stop.lm,pred.df,level = 0.998, interval = "confidence")) ``` If the Speed = 50, the Distance will be the 10.37 to 10.74, while when the Speed = 100, then the distance will be 41.69 to 42.76. ## Method and Assumption Checks By plotting the original data, we cannot find a linear regression, and we use the Eov check and Normchenck, however, it does not satisfy by fitting the simple linear regression. 
## Method and Assumption Checks

The scatter plot of the raw data showed a clearly curved relationship between stopping distance and speed, so a simple linear regression of Dist on Speed was not appropriate. After logging both variables, the scatter plot showed the desired linear relationship, so we fitted a power-law model relating log(Dist) to log(Speed). The trendscatter plot on the log-log scale is very close to a straight line. The underlying model assumptions appear valid: the residuals show constant variance and look approximately normally distributed. Observation 3 had the largest Cook's distance, but as it was well below 0.4 we retained it.

Our final model is:
$$\log(Dist)_i = \beta_0 + \beta_1 * \log(Speed)_i + \epsilon_i$$
where $\epsilon_i \sim N(0,\sigma^2)$.

Our model explains 99.97% of the variability in log stopping distance.

## Executive Summary

We wanted to describe how the emergency stopping distance of an autonomous vehicle depends on its speed, and to estimate typical stopping distances at particular speeds. We found that stopping distance follows a power-law relationship with speed, with an estimated power very close to 2: stopping distance increases with approximately the square of the speed.

- What is the effect of a doubling in speed? Because the estimated power is essentially 2, doubling the speed multiplies the stopping distance by approximately $2^2 = 4$.
- Estimate the typical stopping distance for emergency braking at speeds of 50 km/h and 100 km/h. We estimate the typical stopping distance to be between 10.37 m and 10.74 m at 50 km/h, and between 41.69 m and 42.76 m at 100 km/h.
- The manufacturer desires that the vehicle stops in less than 12 m at 50 km/h, at least 999 times out of 1000. Has this been achieved? This is judged from the 99.8% prediction interval for an individual stop at 50 km/h (computed above): the requirement is met only if the upper limit of that interval is below 12 m. The typical stopping distance at 50 km/h (at most about 10.74 m) is comfortably below 12 m, but individual stops vary about the typical value, so the prediction interval upper limit is the relevant comparison.

\newpage

# Question 2

## Background and questions of interest

In New Zealand, the police are usually tolerant of speeding up to 10 km/h over the posted speed limit. However, this tolerance is reduced to 5 km/h during public holidays.

We investigated the effect of the posted speed limit on the actual speed of vehicles, and also the effect of the reduced police tolerance for speeding that applies on holidays. Also, we wished to see if the effect of the reduced tolerance during holidays depended on the posted speed limit.

A total of 445 independent observations were made for posted speed limits between 50 and 100 km/h. All measurements were made during periods of unrestricted traffic flow on straight stretches of road.

## Read in and inspect the data:

```{r,fig.height=3.5}
Speed.df=read.table("Speed.txt", header=T)
plot(Speed~Posted, main="Actual Speed by Speed Limit and Holiday",
     col=ifelse(Holiday=="N","red","blue"), pch=ifelse(Holiday=="N",1,2), data=Speed.df)
legend(50,110, pch=c(1,2), col=c("red","blue"), legend=c("Non-holiday","Holiday"))
```

## Comment on plot

There are fewer holiday observations than non-holiday observations. For both groups, actual speed increases roughly linearly with the posted limit. At each posted limit, speeds on holidays tend to be a few km/h lower than speeds on non-holidays, which is consistent with the reduced police tolerance for speeding during holidays.

## Fit model and check assumptions

```{r,fig.height=3.25}
# Add R code below
library(s20x)
Speed.fit = lm(Speed~Posted*Holiday, data = Speed.df)
plot(Speed.fit, which = 1)
normcheck(Speed.fit)
cooks20x(Speed.fit)
summary(Speed.fit)
```

The residual plot shows constant scatter about zero, and the normality check indicates the residuals are approximately normally distributed. No observation has a large Cook's distance, so there are no unduly influential points. However, the summary shows that the p-value for the interaction term Posted:HolidayY (0.725) is much larger than 0.05, so there is no evidence that the effect of a holiday depends on the posted limit, and we consider removing the interaction between Posted and Holiday.
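As an additional check before dropping the interaction, a sequential ANOVA table can be used to test the interaction term after the two main effects; this is a minimal sketch using the `Speed.fit` model fitted above, and its last row should reproduce the non-significant interaction p-value seen in the summary output.

```{r}
# Sequential ANOVA: the final row tests the Posted:Holiday interaction
# after allowing for the main effects of Posted and Holiday.
anova(Speed.fit)
```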
## Reproduce plot with fitted lines superimposed.

```{r,fig.height=3.5}
# Add R code below
trendscatter(Speed~Posted, data = Speed.df)
```

The trendscatter plot confirms that actual speed increases approximately linearly with the posted limit. Since the interaction was not significant, we fitted the model without the interaction term; the plot further below superimposes the two parallel fitted lines on the data.

```{r}
Speed.fit2 = lm(Speed ~ Posted + Holiday, data = Speed.df)
```

We then check that the simplified model satisfies all the assumptions.

```{r}
plot(Speed.fit2, which = 1)
normcheck(Speed.fit2)
cooks20x(Speed.fit2)
```

The residual plot shows constant scatter about zero, the residuals appear approximately normally distributed, and no point has a large Cook's distance, so there are no unduly influential points. All the assumptions are satisfied.

```{r}
summary(Speed.fit2)
```

```{r}
confint(Speed.fit2)
pred.df1 = data.frame(Posted = c(50,60,70,80,90,100), Holiday = "Y")
pred.df2 = data.frame(Posted = c(50,60,70,80,90,100), Holiday = "N")
predict(Speed.fit2, pred.df1, interval = "confidence")
predict(Speed.fit2, pred.df2, interval = "confidence")
```

```{r}
plot(Speed ~ Posted, data = Speed.df, pch = substr(Holiday,1,1), cex = 0.7,
     col = ifelse(Holiday == "Y","blue","red"))
lines(x = c(50,60,70,80,90,100), y = predict(Speed.fit2, pred.df1), col = "blue", lty = 2)
lines(x = c(50,60,70,80,90,100), y = predict(Speed.fit2, pred.df2), col = "red", lty = 2)
```

The two fitted lines are parallel, as expected for a model with no interaction: the holiday line sits a constant amount below the non-holiday line at every posted limit.

## Methods and assumption checks

We wanted to explain the effect of the posted speed limit on the actual speed of vehicles, the effect of the reduced police tolerance for speeding on holidays, and whether the holiday effect depends on the posted limit. We first fitted an interaction model with the numeric variable Posted and the factor Holiday. The interaction term had a p-value of 0.725, giving no evidence that the holiday effect depends on the posted limit, and the fitted lines for holidays and non-holidays are essentially parallel. We therefore refitted the model without the interaction. The final model satisfies all the assumption checks: the residuals show constant variance, appear normally distributed, and no point is unduly influential.

Our final model is:
$$Speed_i = \beta_0 + \beta_1 * Posted_i + \beta_2 * HolidayY_i + \epsilon_i$$
where $\epsilon_i \sim N(0,\sigma^2)$ and $HolidayY_i$ is 1 if observation $i$ was made on a public holiday and 0 otherwise.

Our model explains 98.25% of the variability in actual vehicle speed.

## Executive Summary

We modelled the actual speed of vehicles in terms of the posted speed limit and whether or not the observation was made on a public holiday. There was no evidence that the holiday effect depends on the posted limit, so the final model contains no interaction.

- For each 1 km/h increase in the posted limit, the mean actual speed is estimated to increase by between 0.99 and 1.02 km/h.
- At the same posted limit, mean speeds on holidays are estimated to be between 4.17 and 5.53 km/h lower than on non-holidays, consistent with the reduced police tolerance during holidays.

We estimate the expected speed on a holiday to be 50.16, 60.19, 70.22, 80.25, 90.29 and 100.32 km/h at posted limits of 50, 60, 70, 80, 90 and 100 km/h respectively. On a non-holiday, the corresponding estimates are 55.01, 65.04, 75.07 and 85.10 km/h at posted limits of 50 to 80 km/h, and 105.17 km/h at a posted limit of 100 km/h.
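As a supplementary calculation, and because the posted limits in this study change in 10 km/h steps, the Posted effect may be easier to communicate per 10 km/h. This is a minimal sketch that simply rescales the confidence interval for the Posted coefficient from the final model `Speed.fit2`; the factor of 10 is a presentational choice, not something requested in the assignment.

```{r}
# Estimated change in mean actual speed for a 10 km/h increase in the
# posted limit: 10 times the per-km/h Posted coefficient interval.
10 * confint(Speed.fit2)["Posted", ]
```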