Key notes on solutions to assignment 3:

·  Problem 1.

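You need to simulate the data sets. Example code for fitting the non-parametric models is given below. It assumes the simulated data are stored in x and y, with x sorted in increasing order; the legend coordinates (40, -100) should be adjusted to the range of your own data: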

library(splines)   # provides ns() in R

par(mfrow = c(3,2))

plot(x, y, main = "Polynomial Regression")

lines(x, fitted(lm(y ~ poly(x,3))))

lines(x, fitted(lm(y ~ poly(x,6))), lty = 3)

legend(40, -100, c("degree = 3", "degree = 6"), lty = c(1,3), bty ="n")

plot(x, y, main = "Natural/general splines")

lines(x, fitted(lm(y ~ ns(x, df = 5))))

lines(x, fitted(lm(y ~ ns(x, df = 10))), lty = 3)

lines(x, fitted(lm(y ~ ns(x, df = 20))), lty = 4)

legend(40, -100, c("df = 5", "df = 10", "df = 20"), lty = c(1,3,4), bty= "n")

plot(x, y, main = "Smoothing splines")

lines(smooth.spline(x,y))

plot(x, y, main = "Lowess")

lines(lowess(x,y))

lines(lowess(x,y, f = 0.2), lty = 3)

legend(40, -100, c("default span", "f = 0.2"), lty = c(1,3), bty = "n")

plot(x, y, main = "ksmooth(kernel)")

lines(ksmooth(x,y, kernel = "normal", bandwidth = 5))

lines(ksmooth(x,y, kernel = "normal", bandwidth = 2), lty = 3)

legend(40, -100, c("bandwidth = 5", "bandwidth = 2"), lty = c(1,3), bty = "n")

plot(x, y, main = "supsmu/k-NN")

lines(supsmu(x,y))

lines(supsmu(x,y, bass = 8.2), lty = 3)

legend(40, -100, c("default", "bass = 8.2"), lty = c(1,3), bty = "n")

Most of you did a pretty good job on this problem. The only issue that caught my attention is that some of you did not try changing the tuning parameters (i.e., bandwidth, spline degree, etc.) to get a better fit when the defaults gave a poor fitted curve.
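To illustrate why the tuning parameters matter, here is a minimal sketch on simulated data. The true curve sin(2x), the noise level, and the span value f = 0.1 are assumptions chosen for illustration only; they are not the assignment's settings.

```r
# Simulated data: the true curve and noise level are assumptions for illustration.
set.seed(1)
x <- sort(runif(200, 0, 10))
truth <- sin(2 * x)
y <- truth + rnorm(200, sd = 0.3)

# Mean squared distance between a fitted curve and the true curve.
mse <- function(fit) mean((fit - truth)^2)

# lowess() returns fitted values at the sorted x, which matches x here.
fit_default <- lowess(x, y)$y            # default span f = 2/3
fit_small   <- lowess(x, y, f = 0.1)$y   # smaller span follows the curve more closely

# smooth.spline() chooses its smoothing parameter by GCV unless told otherwise.
fit_ss <- predict(smooth.spline(x, y), x)$y

c(lowess_default = mse(fit_default),
  lowess_f0.1    = mse(fit_small),
  smooth_spline  = mse(fit_ss))
```

With a curve that wiggles this quickly, the default lowess span averages over more than one full cycle and flattens the fit, so a smaller span should track the truth much better. On real data the true curve is unknown, so the comparison has to be made by eye or by cross-validation.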

For either set of simulated data, we should get good fitted curves using any of the smoothing methods. In the second set, the variance is bigger. Pay attention to the appearance of the fitted curves and to any differences between the two sets.
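A small sketch of this comparison follows. The true curve f, the sample size, and the two noise levels are hypothetical stand-ins, since the assignment's exact simulation settings are not restated here.

```r
set.seed(2)
n <- 200
x <- sort(runif(n, 0, 10))
f <- function(x) sin(x) + 0.5 * x          # hypothetical true curve
y_low  <- f(x) + rnorm(n, sd = 0.3)        # first set: small variance
y_high <- f(x) + rnorm(n, sd = 1.0)        # second set: larger variance

fit_low  <- smooth.spline(x, y_low)
fit_high <- smooth.spline(x, y_high)

# Solid line: fitted curve; dotted line: true curve.
par(mfrow = c(1, 2))
plot(x, y_low,  main = "sd = 0.3"); lines(fit_low);  lines(x, f(x), lty = 3)
plot(x, y_high, main = "sd = 1.0"); lines(fit_high); lines(x, f(x), lty = 3)

# Average squared error of each fit relative to the true curve.
err_low  <- mean((predict(fit_low,  x)$y - f(x))^2)
err_high <- mean((predict(fit_high, x)$y - f(x))^2)
c(err_low = err_low, err_high = err_high)
```

Both fits should follow the true curve, but the fit to the noisier set is typically less accurate and may look either rougher or over-smoothed, depending on the smoothing parameter selected.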

·  Problem 2.

There is no single correct model for this problem. Reasonable linear models include (partial list):

Log(GAG) ~ a + b * AGE + e;

Log(GAG) ~ a + b * sqrt(AGE) + e;

Log(GAG) ~ polynomial regression on AGE [note that polynomial regression is usually regarded as linear regression, since it is linear in the regression parameters]
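These models can be fitted along the following lines. This sketch assumes the assignment data match the GAGurine data frame in the MASS library (columns Age and GAG); the polynomial degree of 3 is an arbitrary choice for illustration.

```r
library(MASS)  # assumption: the assignment data match MASS::GAGurine (columns Age, GAG)

fit_lin  <- lm(log(GAG) ~ Age,          data = GAGurine)
fit_sqrt <- lm(log(GAG) ~ sqrt(Age),    data = GAGurine)
fit_poly <- lm(log(GAG) ~ poly(Age, 3), data = GAGurine)  # degree 3 is arbitrary

# All three models share the same response, so their AIC values are comparable.
AIC(fit_lin, fit_sqrt, fit_poly)

# poly() only adds columns to the design matrix, so this is still a linear model:
# one intercept column plus three polynomial columns.
dim(model.matrix(fit_poly))
```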

A lot of you used the non-linear regression model:

Log(GAG) ~ a * exp(b + c* AGE) + e

which seems to give a reasonable fitted curve.
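A sketch of fitting such a model with nls() is shown below. As written above, a and b are not separately identifiable, because a * exp(b + c * AGE) = (a * exp(b)) * exp(c * AGE), so the sketch fits the equivalent two-parameter form A * exp(C * AGE) with A = a * exp(b). The data set (GAGurine from MASS), the parameter names A and C, and the starting values are assumptions made for illustration.

```r
library(MASS)  # assumption: the assignment data match MASS::GAGurine (columns Age, GAG)

# Two-parameter form of the model: A stands for a * exp(b), C for c.
# The starting values are rough guesses read off a scatterplot of log(GAG) vs Age.
fit_nls <- nls(log(GAG) ~ A * exp(C * Age),
               data  = GAGurine,
               start = list(A = 3, C = -0.05))

coef(fit_nls)

# Overlay the fitted curve on the data.
age_grid <- seq(min(GAGurine$Age), max(GAGurine$Age), length.out = 100)
plot(log(GAG) ~ Age, data = GAGurine, main = "Non-linear fit")
lines(age_grid, predict(fit_nls, newdata = data.frame(Age = age_grid)))
```

Fitting the three-parameter form directly would most likely stop with a singular-gradient error, which is a symptom of the redundancy between a and b.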

You can use any smoothing method to fit a non-parametric curve.

I was hoping that, in the prediction chart, you would produce a prediction line together with pointwise prediction intervals (bands). Something like the following (in the case of the linear model Log(GAG) ~ AGE):

[Figure: prediction line with pointwise prediction bands for Log(GAG) ~ AGE]
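A chart of this kind can be produced roughly as follows for the linear model. The sketch again assumes the data match GAGurine in the MASS library; the 95% level and the 100-point grid are illustrative choices.

```r
library(MASS)  # assumption: the assignment data match MASS::GAGurine (columns Age, GAG)

fit <- lm(log(GAG) ~ Age, data = GAGurine)

# Grid of ages at which to predict.
new <- data.frame(Age = seq(min(GAGurine$Age), max(GAGurine$Age), length.out = 100))

# interval = "prediction" returns columns fit, lwr, upr (pointwise 95% limits).
pred <- predict(fit, newdata = new, interval = "prediction", level = 0.95)

plot(log(GAG) ~ Age, data = GAGurine,
     main = "Log(GAG) ~ Age with 95% prediction bands")
lines(new$Age, pred[, "fit"])            # prediction line
lines(new$Age, pred[, "lwr"], lty = 3)   # lower band
lines(new$Age, pred[, "upr"], lty = 3)   # upper band
```

Using interval = "confidence" instead would give bands for the mean curve, which are narrower than prediction bands for a new observation.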

However, only a handful of people tried to do that. For the linear regression model, prediction and prediction intervals should have been covered in the Regression class (STAT 563). Although we didn’t teach them for non-linear and non-parametric models, the concept of prediction and prediction intervals is the same there, only more complicated to compute. Some R/Splus functions (some in the MASS library), e.g., “predict”, “predict.loess”, “predict.gam”, “predict.nls”, etc., can provide prediction standard errors (se). These can help you to get the pointwise prediction intervals.
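For a non-parametric fit, the standard errors can be turned into approximate pointwise prediction intervals along these lines. This is a sketch, assuming the GAGurine data from MASS, the default loess span, and approximately normal errors with constant variance.

```r
library(MASS)  # assumption: the assignment data match MASS::GAGurine (columns Age, GAG)

fit_lo <- loess(log(GAG) ~ Age, data = GAGurine)   # default span; tune as needed
new <- data.frame(Age = seq(min(GAGurine$Age), max(GAGurine$Age), length.out = 100))

# With se = TRUE, predict() returns fit, se.fit, residual.scale and df.
p <- predict(fit_lo, newdata = new, se = TRUE)

# se.fit measures uncertainty in the fitted curve only; a prediction interval
# for a new observation also needs the residual variance.
se_pred <- sqrt(p$se.fit^2 + p$residual.scale^2)
tq <- qt(0.975, p$df)
lower <- p$fit - tq * se_pred
upper <- p$fit + tq * se_pred

plot(log(GAG) ~ Age, data = GAGurine, main = "loess with approximate 95% prediction bands")
lines(new$Age, p$fit)
lines(new$Age, lower, lty = 3)
lines(new$Age, upper, lty = 3)
```

Note that in current versions of R, predict.nls accepts an se.fit argument but ignores it, so for the non-linear model the standard errors would have to be obtained another way, for example by the delta method or a bootstrap.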