Table 8: Dataset for building a weighted linear regression model
| ID | \(x_1\) | \(x_2\) | \(y\) | \(w\) |
|---|---|---|---|---|
| \(1\) | \(-0.15\) | \(-0.48\) | \(0.46\) | \(1\) |
| \(2\) | \(-0.72\) | \(-0.54\) | \(-0.37\) | \(2\) |
| \(3\) | \(1.36\) | \(-0.91\) | \(-0.27\) | \(2\) |
| \(4\) | \(0.61\) | \(1.59\) | \(1.35\) | \(1\) |
| \(5\) | \(-1.11\) | \(0.34\) | \(-0.11\) | \(3\) |
Q1. We still want to estimate the regression parameters in the least-squares framework. Following the derivation of the least-squares estimator shown in Chapter 2, propose a new estimator of the regression parameters that accounts for the weights.
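A sketch of where the derivation leads (notation assumed here: \(\boldsymbol{X}\) is the design matrix with a leading column of ones, \(\boldsymbol{W} = \mathrm{diag}(w_1, \ldots, w_n)\), and \(\boldsymbol{x}_i\) the \(i\)-th row of \(\boldsymbol{X}\)). The weighted criterion is

\[ \hat{\boldsymbol{\beta}}_{\mathrm{WLS}} = \arg\min_{\boldsymbol{\beta}} \sum_{i=1}^{n} w_i \left( y_i - \boldsymbol{x}_i^{\top} \boldsymbol{\beta} \right)^2 , \]

and setting the gradient with respect to \(\boldsymbol{\beta}\) to zero yields the weighted normal equations \(\boldsymbol{X}^{\top} \boldsymbol{W} \boldsymbol{X} \, \boldsymbol{\beta} = \boldsymbol{X}^{\top} \boldsymbol{W} \boldsymbol{y}\), so

\[ \hat{\boldsymbol{\beta}}_{\mathrm{WLS}} = \left( \boldsymbol{X}^{\top} \boldsymbol{W} \boldsymbol{X} \right)^{-1} \boldsymbol{X}^{\top} \boldsymbol{W} \boldsymbol{y} . \]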
Q2. Following up on the weighted least squares estimator derived in Q1, calculate the regression parameters of the regression model using the data shown in Table 8.
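The calculation can be checked numerically. A minimal sketch in R using the Table 8 data (variable names here are my own, not from the text):

```r
# Manual weighted least squares on the Table 8 data.
x1 <- c(-0.15, -0.72, 1.36, 0.61, -1.11)
x2 <- c(-0.48, -0.54, -0.91, 1.59, 0.34)
y  <- c(0.46, -0.37, -0.27, 1.35, -0.11)
w  <- c(1, 2, 2, 1, 3)

X <- cbind(1, x1, x2)   # design matrix with an intercept column
W <- diag(w)            # diagonal weight matrix

# Solve the weighted normal equations (X^T W X) beta = X^T W y
beta_hat <- solve(t(X) %*% W %*% X, t(X) %*% W %*% y)
beta_hat
```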
Q3. Following up on the dataset in Q1, use the R pipeline for linear regression on this data (set the `weights` argument in the `lm()` function). Compare the result from R with the result of your manual calculation.
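For the R pipeline, the `weights` argument of `lm()` implements exactly this weighted criterion. A sketch with the Table 8 data:

```r
# Weighted linear regression via lm(); data from Table 8.
d <- data.frame(
  x1 = c(-0.15, -0.72, 1.36, 0.61, -1.11),
  x2 = c(-0.48, -0.54, -0.91, 1.59, 0.34),
  y  = c(0.46, -0.37, -0.27, 1.35, -0.11),
  w  = c(1, 2, 2, 1, 3)
)
fit <- lm(y ~ x1 + x2, data = d, weights = w)
coef(fit)   # should agree with the manual (X^T W X)^{-1} X^T W y result
```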
Q4. Consider the dataset in Table 5. Use the R pipeline to build a logistic regression model on this data.
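Table 5 is not reproduced in this excerpt, so the pipeline is sketched with a placeholder data frame name (`data5` is hypothetical; substitute the actual Table 5 data):

```r
# Sketch of the logistic regression pipeline; `data5` is a placeholder
# data frame assumed to hold the Table 5 columns x1, x2, and a binary y.
fit5 <- glm(y ~ x1 + x2, data = data5, family = binomial(link = "logit"))
summary(fit5)
```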
Q5. Consider the model fitted in Q4. Suppose there are now two new data points, as shown in Table 9. Use the fitted model to predict the outcomes for these two data points and fill in the table.
Table 9: Two test data points
| ID | \(x_1\) | \(x_2\) | \(y\) |
|---|---|---|---|
| \(9\) | \(0.25\) | \(0.18\) | |
| \(10\) | \(0.08\) | \(1.12\) | |
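The prediction step can be sketched as follows (`fit5` is assumed to be the logistic regression model fitted in Q4; `type = "response"` returns predicted probabilities rather than log-odds):

```r
# Predict on the two test points from Table 9 using the Q4 model.
newdata <- data.frame(x1 = c(0.25, 0.08), x2 = c(0.18, 1.12))
predict(fit5, newdata = newdata, type = "response")
```

A threshold (commonly 0.5) on these probabilities would then give the predicted class labels to enter in the table.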
Q6. Use the dataset `PimaIndiansDiabetes2` in the `mlbench` R package, run the R pipeline for logistic regression on it, and summarize your findings.
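A minimal sketch of this pipeline (note that `PimaIndiansDiabetes2` contains missing values, so this sketch simply drops incomplete rows; imputation is an alternative):

```r
# Logistic regression on the Pima Indians diabetes data (mlbench package).
library(mlbench)
data("PimaIndiansDiabetes2")
pima <- na.omit(PimaIndiansDiabetes2)        # drop rows with missing values
fit_pima <- glm(diabetes ~ ., data = pima, family = binomial)
summary(fit_pima)                            # inspect coefficients, p-values
```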
Q7. Follow up on the simulation experiment in Q12 of Chapter 2. Apply `glm()` to the simulated data to build a logistic regression model, and comment on the result.
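The Chapter 2 Q12 setup is not reproduced here, so this sketch simulates data from a logistic model with known coefficients (values of my own choosing) and refits it with `glm()`, which is the kind of comparison the question asks for:

```r
# Simulate from a known logistic model, then recover it with glm().
set.seed(1)
n <- 500
x <- rnorm(n)
p <- 1 / (1 + exp(-(0.5 + 1.5 * x)))   # true intercept 0.5, true slope 1.5
y <- rbinom(n, size = 1, prob = p)
fit_sim <- glm(y ~ x, family = binomial)
coef(fit_sim)   # estimates should be close to the true (0.5, 1.5)
```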