
3.8 Building ARIMA Models

The goal of this section is to show how one can go through the entire process of choosing the best model for a time series data set.

Model building process

1) Construct plots of x_t vs. t and the estimated ACF to determine if the time series data is stationary.

a) If it is not stationary in the variance, make the appropriate transformation.

b) If it is not stationary in the mean, examine differences of x_t.

c) Use plots of x_t vs. t and the estimated ACF to determine if what you did worked.

2) Construct plots of the estimated ACF and PACF of the stationary series (call it x_t).

a) Match patterns in these plots with those of ARMA models.

b) Determine a few models to investigate further.

3) Find the estimated models using maximum likelihood estimation.

Example: AR(1) with φ₁ = 0.7, μ = 0, and σ²_w = 1 (ar1_model_build.R, AR1.0.7.txt)

Step #1

> ar1<-read.table(file = "C:\\chris\\UNL\\STAT_time_series\\chapter1\\AR1.0.7.txt",
    header = TRUE, sep = "")
> head(ar1)
> x<-ar1$x

> #########################################################
> # Step #1 and #2
> win.graph(width = 8, height = 6, pointsize = 10)
> plot(x = x, ylab = expression(x[t]), xlab = "t", type = "l", col = "red",
    lwd = 1, main = "Plot of AR1.0.7.txt data",
    panel.first = grid(col = "gray", lty = "dotted"))
> points(x = x, pch = 20, col = "blue")
> par(mfrow = c(1,2))
> acf(x = x, type = "correlation", main = "Estimated ACF for AR1.0.7.txt data",
    xlim = c(1,20), ylim = c(-1,1))
> pacf(x = x, lag.max = 20, main = "Estimated PACF for AR1.0.7.txt data",
    xlim = c(1,20), ylim = c(-1,1))

While the PACF plot is not needed here, it is often nice to look at the ACF and PACF simultaneously in the next step, so I plotted both here. Notice the options that I used in the acf() and pacf() functions. Both plots start at lag = 1.

From examining the plot of the data over time and the estimated ACF plot, there is no evidence against the stationary assumption.

Step #2

From examining the ACF and PACF plots above, it appears that an AR(1) model would be appropriate for the data. I am also going to examine an MA(1) model just to show what could happen if the “wrong” model is chosen.

Step #3

> #ARIMA(1,0,0)
> mod.fit1<-arima(x = x, order = c(1, 0, 0), include.mean = TRUE)
> mod.fit1

Call:
arima(x = x, order = c(1, 0, 0), include.mean = TRUE)

Coefficients:
         ar1  intercept
      0.6854    -0.4322
s.e.  0.0730     0.3602

sigma^2 estimated as 1.336:  log likelihood = -156.68,  aic = 319.36

> z<-mod.fit1$coef/sqrt(diag(mod.fit1$var.coef))
> p.value<-2*(1-pnorm(q = abs(z), mean = 0, sd = 1))
> data.frame(z, p.value)
                  z   p.value
ar1        9.392902 0.0000000
intercept -1.200045 0.2301219

> #Could use confint() function too
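As a quick hedged sketch (my addition; confint() computes approximate Wald intervals from the estimated covariance matrix), confidence intervals for φ₁ and the mean could be found with:

> #Approximate 95% confidence intervals for the AR(1) fit
> confint(object = mod.fit1, level = 0.95)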

> #ARIMA(0,0,1)
> mod.fit2<-arima(x = x, order = c(0, 0, 1), include.mean = TRUE)
> mod.fit2

Call:
arima(x = x, order = c(0, 0, 1), include.mean = TRUE)

Coefficients:
         ma1  intercept
      0.5801    -0.4794
s.e.  0.0661     0.1979

sigma^2 estimated as 1.581:  log likelihood = -164.99,  aic = 335.98

> z<-mod.fit2$coef/sqrt(diag(mod.fit2$var.coef))
> p.value<-2*(1-pnorm(q = abs(z), mean = 0, sd = 1))
> data.frame(z, p.value)
                  z    p.value
ma1        8.770337 0.00000000
intercept -2.422093 0.01543140

Model building process (continued)

4) For each model chosen, investigate diagnostic measures.

  • Examine the residuals to determine if they appear to be “white noise”.

Remember that w_t ~ independent (0, σ²_w). Also, an ARIMA model can be rewritten as

w_t = x_t - φ₁x_{t-1} - … - φ_p x_{t-p} - θ₁w_{t-1} - … - θ_q w_{t-q}

Then

ŵ_t = x_t - φ̂₁x_{t-1} - … - φ̂_p x_{t-p} - θ̂₁ŵ_{t-1} - … - θ̂_q ŵ_{t-q}

are the residuals (of course, the mean can be included in this equation as well).
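For the AR(1) fit from earlier, a minimal sketch (my own illustration, not code from the notes) of this calculation is below. Because arima() computes its residuals with the Kalman filter, the hand-computed values agree with mod.fit1$residuals except at the very start of the series.

> #Residuals by hand for the AR(1) model. Note that the "intercept"
> #  reported by arima() is really the estimated mean.
> phi1.hat<-mod.fit1$coef["ar1"]
> mu.hat<-mod.fit1$coef["intercept"]
> w.hat<-(x[-1] - mu.hat) - phi1.hat*(x[-length(x)] - mu.hat)
> head(cbind(by.hand = w.hat, from.arima = mod.fit1$residuals[-1]))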

Examine an ACF plot of the residuals to determine if they appear to come from a white noise process.

If they do, then the model assumption about w_t ~ independent (0, σ²_w) is satisfied.

If they do not, then changes need to be made to the model. There is an autocorrelation pattern in the xt that has not been accounted for by the AR or MA terms. The changes that need to be made correspond to the autocorrelations on the ACF plot. In other words, the residual ACF plot can be used just like the ACF of the original data!

Why can this be done? The answer is coming up!

  • Examine a PACF plot of the residuals to determine if they appear to be “white noise”.

If they do, then the model assumption about w_t ~ independent (0, σ²_w) is satisfied.

If they do not, then changes need to be made to the model. The changes that need to be made correspond to the partial autocorrelations on the PACF plot. In other words, the residual PACF plot can be used just like the PACF of the original data!

Why can this be done?

Example: Examining the residuals

Suppose the residuals b_t for the ARIMA(p,d,q) model

φ(B)(1 - B)^d x_t = θ(B)b_t     (1)

do not correspond to a white noise process. Note that the p, q, and d correspond to the AR operator order, MA operator order, and differencing order, respectively.

Using the residual ACF and PACF, a new model for the b_t can be identified to be

φ̄(B)b_t = θ̄(B)w_t     (2)

Note that the bar over φ and θ is just to differentiate between the parameters in (1) and (2).

Solving for b_t in (2) produces b_t = φ̄(B)⁻¹θ̄(B)w_t, and substituting this into (1) produces

φ̄(B)φ(B)(1 - B)^d x_t = θ(B)θ̄(B)w_t

Suppose that a series is wrongly identified as ARIMA(0,1,1):

(1 - B)x_t = (1 + θ₁B)b_t

Also, suppose examining the residuals identifies the model ARIMA(0,0,1) for the residuals: b_t = (1 + θ̄₁B)w_t.

The “identification” could be done by examining the residual ACF and PACF. In this case, the residual ACF has a spike at lag 1 and the PACF tails off to 0. Substituting this model for b_t into the wrongly identified model produces

(1 - B)x_t = (1 + θ₁B)(1 + θ̄₁B)w_t

Thus the new model is ARIMA(0,1,2).

  • Examine the standardized residuals for outliers.

Standardized residuals (innovations):

e_t = (x_t - x̂_t^{t-1})/√(P̂_t^{t-1})

where x̂_t^{t-1} is the forecast for time t using information up to time t-1 and P̂_t^{t-1} is its mean square prediction error with parameter estimates replacing the corresponding parameters (see Section 3.5).

If the correct model is chosen, these standardized residuals should approximately follow a N(0,1) distribution. Thus, we would expect about 95% of them to fall within ±2 and about 99% to fall within ±3. Standardized residuals falling outside of this range indicate the correct model may not have been chosen.

If needed, special model terms can be introduced into a time series model to account for these outliers. For example, special cases of transfer function models can be used (Section 5.6).
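Below is a minimal sketch (my own addition, not code from the notes) of one way to flag large standardized residuals for the AR(1) fit. Like tsdiag(), it approximates the standardized residuals by scaling the innovations with σ̂²_w rather than using the exact prediction variances.

> #Approximate standardized residuals and possible outliers
> e<-mod.fit1$residuals/sqrt(mod.fit1$sigma2)
> which(abs(e) > 3)
> plot(x = e, ylab = "Standardized residuals", xlab = "t", type = "h")
> abline(h = c(-3, 3), col = "red", lty = "dotted")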

  • Plot the standardized residuals (or residuals) vs. time.

This is another method to determine if there is autocorrelation among the standardized residuals.

  • Perform the Ljung-Box-Pierce test to determine if there is evidence against the w_t independence assumption.

This performs a hypothesis test on a group of residual autocorrelations.

Ho: ρ_e(1) = ρ_e(2) = … = ρ_e(H) = 0

Ha: Not all 0

where ρ_e(h) for h = 1, …, H is the autocorrelation for the e_t’s.

Many different values of H are chosen to provide more than one hypothesis test. Shumway and Stoffer suggest performing a test with H = 20.

The test statistic for the Ljung-Box-Pierce hypothesis test is

Q = n(n + 2) Σ_{h=1}^{H} ρ̂_e(h)²/(n - h)

Under the assumption that the null hypothesis is true, Q ~ χ²_{H-p-q} for large n. Be careful! Shumway and Stoffer will use the “Q” notation to mean something else in Section 3.9!

If Q > χ²_{H-p-q, 1-α}, then the null hypothesis is rejected and other models should be examined.

If Q ≤ χ²_{H-p-q, 1-α}, then there is not sufficient evidence against the null hypothesis. There is not evidence that the model should be changed. This is what you want to happen if you are satisfied with your model!

The direct function in R to perform the test is Box.test(). By specifying the option type = "Ljung-Box", the Ljung-Box-Pierce hypothesis test is performed for a specified number of lags using the lag option. By default, the Box-Pierce test is performed (type = "Box-Pierce"). Its test statistic is

Q = n Σ_{h=1}^{H} ρ̂_e(h)²

Under the assumption that the null hypothesis is true, Q ~ χ²_{H-p-q} for large n. Box, Jenkins, and Reinsel (1994, p. 314-5) suggest the Ljung-Box-Pierce test statistic is better approximated by the χ² distribution than the Box-Pierce test statistic. An example call to Box.test() is sketched below.

Note that the tsdiag() function to be discussed soon will perform the Ljung-Box-Pierce hypothesis test automatically along with calculating other items.
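As a hedged sketch (Box.test() is the standard R function; choosing fitdf = p + q is my addition to match the χ²_{H-p-q} degrees of freedom above), the test for the AR(1) fit could be run as:

> #Ljung-Box-Pierce test with H = 20; fitdf = 1 because p + q = 1
> Box.test(x = mod.fit1$residuals, lag = 20, type = "Ljung-Box", fitdf = 1)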

  • Examine the normality assumption for the w_t by constructing histograms and Q-Q plots of the standardized residuals.

The Q-Q plot is a plot of the ordered residuals versus the corresponding quantiles from a normal distribution. If the points fall on a straight line, this suggests the normality assumption is satisfied. If they do not, then the normality assumption may be violated. Please see p. 3.43 of my STAT 870 lecture notes for information about Q-Q plots if needed.

The histogram can be plotted with a normal distribution overlay to assess normality. If there is evidence against the normality assumption (for example, the histogram has large deviations from the normal distribution overlay), investigate transformations similar to those previously used to solve nonstationary variance problems.

  • Perform hypothesis tests for the φ_i = 0 and θ_j = 0 parameters.

If the test does not reject these hypotheses, then a different model may be needed. See Section 3.6 for a review of this hypothesis test.

Examine the various diagnostic measures in an iterative manner until at least one model satisfies all of the model assumptions!

Example: AR(1) with φ₁ = 0.7, μ = 0, and σ²_w = 1 (ar1_model_build.R, AR1.0.7.txt)

Step #4

> #########################################################
> # Step #4
> tsdiag(object = mod.fit1, gof.lag = 20)
> #Could also get the Ljung-Box-Pierce test this way:
> #Box.test(mod.fit1$residuals, lag = 20, type = "Ljung-Box")
> #Box-Pierce test
> #Box.test(mod.fit1$residuals, lag = 20)
> par(mfrow = c(2,2))
> pacf(x = mod.fit1$residuals, lag.max = 20, main = "Estimated PACF for
    residuals using ARIMA(1,0,0) fit", xlim = c(1,20), ylim = c(-1,1))
> hist(x = mod.fit1$residuals, main = "Histogram of residuals", xlab =
    "Residuals", freq = FALSE)
> curve(expr = dnorm(x, mean = mean(mod.fit1$residuals), sd =
    sd(mod.fit1$residuals)), col = "red", add = TRUE)
> qqnorm(y = mod.fit1$residuals, ylab = "Residuals", panel.first =
    grid(col = "gray", lty = "dotted"))
> qqline(y = mod.fit1$residuals, col = "red")
> par(mfrow = c(1,1))

Below are the results from the MA(1) model.

> #MA(1)
> tsdiag(object = mod.fit2, gof.lag = 20)
> par(mfrow = c(2,2))
> pacf(x = mod.fit2$residuals, lag.max = 20, main = "Estimated PACF for
    residuals using ARIMA(0,0,1) fit", xlim = c(1,20), ylim = c(-1,1))
> hist(x = mod.fit2$residuals, main = "Histogram of residuals", xlab =
    "Residuals", freq = FALSE)
> curve(expr = dnorm(x, mean = mean(mod.fit2$residuals), sd =
    sd(mod.fit2$residuals)), col = "red", add = TRUE)
> qqnorm(y = mod.fit2$residuals, ylab = "Residuals", panel.first =
    grid(col = "gray", lty = "dotted"))
> qqline(y = mod.fit2$residuals, col = "red")
> par(mfrow = c(1,1))

Notes:

  • Everything appears to be fine with the AR(1) model. There is a (barely) significant partial autocorrelation at lag 5. There are some people who will fit a model of the form

x_t = φ₁x_{t-1} + φ₅x_{t-5} + w_t.

Notice that φ₂, φ₃, and φ₄ are assumed to be 0 here. I prefer not to do this unless there is a known “seasonality” to the data (see the sketch after these notes for how such a model could be fit). More on seasonal ARIMA (SARIMA) models in Section 3.9.

  • Problems with the MA(1) model were detected. Some autocorrelations and partial autocorrelations were found to be different from 0. The Ljung-Box-Pierce hypothesis test also detects that some autocorrelations are not 0.
  • On p. 3.8.4 in the notes, I performed a test of Ho: φ₁ = 0 vs. Ha: φ₁ ≠ 0, and the test rejects Ho. The test statistic was z = 9.39.
  • On p. 3.8.5 in the notes, I performed a test of Ho: θ₁ = 0 vs. Ha: θ₁ ≠ 0, and the test rejects Ho. The test statistic was z = 8.77.
  • What if you do not reject?
  • The number of lags displayed on the plot of the Ljung-Box-Pierce test p-values can be changed using the gof.lag option in tsdiag(). Unfortunately, there is not an option to control the lags displayed for the ACF plot produced by tsdiag().
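As a hedged sketch (my own addition; the fixed argument is part of arima(), but this model is not fit in the notes), the subset AR model above with φ₂ = φ₃ = φ₄ = 0 could be estimated as follows. The NA entries of fixed are estimated, the 0 entries are held fixed, and the entries are ordered (ar1, …, ar5, intercept).

> #Subset AR(5) model with phi2 = phi3 = phi4 = 0; transform.pars =
> #  FALSE is needed when AR parameters are fixed
> mod.fit.subset<-arima(x = x, order = c(5, 0, 0), include.mean = TRUE,
    fixed = c(NA, 0, 0, 0, NA, NA), transform.pars = FALSE)
> mod.fit.subset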

5) Using each model that gets through 4), pick the best model based upon model parsimony and the AIC. The best model corresponds to the one with the smallest number of AR and MA parameters and the smallest AIC.

  • Principle of parsimony – choose the model with the smallest number of parameters
  • Akaike’s information criterion (AIC)

The AIC is

-2log(L) + 2(# of parameters estimated)

where log(L) is the log-likelihood evaluated at the parameter estimates (the likelihood is a function of the estimated w’s, i.e., the residuals).

Since the likelihood function is a measurement of “how plausible” a set of parameter estimates is, a criterion of model fit is to have log(L) be as large as possible. Multiplying by -2 means we want -2log(L) to be as small as possible.

The 2(# of parameters estimated) part serves as a “penalty” for the number of parameters in the model. This is needed because log(L) generally can be made larger by adding more parameters (more AR or MA terms).

In summary, the model with the smallest AIC is the “best” model to use (according to this criterion).

Since the models are fit to the same data set, some parts of the likelihood function will be the same for each model. This causes some software packages to drop these parts of the likelihood function in the calculation of the AIC. Thus, different software packages can produce different AIC values, so please be careful when comparing them across software packages!

  • Forecast error – choose the model that does the best with forecasting

Choose the model that has the smallest forecast error. How can you do this if the future values are not known?

Estimate the model without the last c observations, where c < n. Forecast the c held-out observations and find the mean square error (MSE). The smallest MSE corresponds to the best model using this criterion. A sketch of this calculation is given below.

Problem: You are no longer using all of your observations to estimate the model.
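Below is a minimal sketch (my own addition; the holdout size c = 10 and the object names are arbitrary choices) of how this comparison could be carried out for the AR(1) and MA(1) models from earlier:

> #Hold out the last c observations, refit, forecast, and compute the MSE
> c.obs<-10
> n<-length(x)
> x.train<-x[1:(n - c.obs)]
> x.test<-x[(n - c.obs + 1):n]
> fit.ar1<-arima(x = x.train, order = c(1, 0, 0), include.mean = TRUE)
> fit.ma1<-arima(x = x.train, order = c(0, 0, 1), include.mean = TRUE)
> mean((x.test - predict(object = fit.ar1, n.ahead = c.obs)$pred)^2)
> mean((x.test - predict(object = fit.ma1, n.ahead = c.obs)$pred)^2)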

Example: AR(1) with φ₁ = 0.7, μ = 0, and σ²_w = 1 (ar1_model_build.R, AR1.0.7.txt)

Step #5

Since the MA(1) model did not “pass” the various diagnostic tests, the model should not be considered further. For illustrative purposes, I am considering it further here.

Both models have the same number of AR and MA parameters, so parsimony cannot be used to choose between the models.

The AIC was shown previously in the R output for the arima() function. Below are the AIC values again, obtained by extracting parts of the model fit objects.

> #The AIC values were already given in the arima() output,
> #  but here is another way to get them.
> mod.fit1$aic
[1] 319.363
> -2*mod.fit1$loglik+2*3
[1] 319.363
> mod.fit2$aic
[1] 335.9806

According to this criterion, the AR(1) model is the best since it has the lowest value.

6) Begin forecasting!

The estimated model of (1 - 0.6854B)x_t = -0.1360 + w_t, where μ̂ = -0.4322, σ̂²_w = 1.336, and -0.1360 = μ̂(1 - 0.6854), will be used for forecasting.
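As a minimal sketch (my own addition; forecasting is discussed in detail in its own section), the fitted model object can produce forecasts and their estimated standard errors directly:

> #Forecasts for times n+1, ..., n+5 from the AR(1) fit
> fore.mod<-predict(object = mod.fit1, n.ahead = 5, se.fit = TRUE)
> fore.mod$pred
> fore.mod$se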

Example: ARIMA(1,1,1) with φ₁ = 0.7, θ₁ = 0.4, σ²_w = 9, n = 200 (arima111_model_build.R, arima111.xls, examine.mod.R)

1) Construct plots of x_t vs. t and the estimated ACF to determine if the time series data is stationary.

> library(RODBC)
> z<-odbcConnectExcel("C:\\chris\\UNL\\STAT_time_series\\chapter3\\arima111.xls")
> arima111<-sqlFetch(z, "Sheet1")
> close(z)
> head(arima111)
  time         x
1    1 -143.2118
2    2 -142.8908
3    3 -138.0634
4    4 -133.5038
5    5 -132.7496
6    6 -132.2910
> x<-arima111$x

> #########################################################
> # Step #1 and #2
> #Plot of the data
> win.graph(width = 8, height = 6, pointsize = 10)
> par(mfrow = c(1,1))
> plot(x = x, ylab = expression(x[t]), xlab = "t", type = "l", col =
    c("red"), main = "Plot of the arima111.xls data",
    panel.first = grid(col = "gray", lty = "dotted"))
> points(x = x, pch = 20, col = "blue")
> #ACF and PACF of x_t
> par(mfrow = c(1,2))
> acf(x = x, type = "correlation", lag.max = 20, xlim = c(1,20), ylim =
    c(-1,1), main = "Estimated ACF of the arima111.xls data")
> pacf(x = x, lag.max = 20, xlim = c(1,20), ylim = c(-1,1), xlab = "h",
    main = "Estimated PACF of the arima111.xls data")
> par(mfrow = c(1,1))

The data appears to be nonstationary in the mean since there is a linear trend in the x_t vs. t plot and the autocorrelations go to 0 very slowly. Below is what happens after the first differences are taken.

> #Examine the first differences
> plot(x = diff(x = x, lag = 1, differences = 1), ylab =
    expression(x[t]-x[t-1]), xlab = "t", type = "l", col = c("red"),
    main = "Plot of the 1st differences for arima111.xls data",
    panel.first = grid(col = "gray", lty = "dotted"))
> points(x = diff(x = x, lag = 1, differences = 1), pch = 20, col = "blue")
> #ACF and PACF of x_t - x_t-1
> par(mfrow = c(1,2))
> acf(x = diff(x = x, lag = 1, differences = 1), type = "correlation",
    lag.max = 20, xlim = c(1,20), ylim = c(-1,1), main = "Est. ACF 1st
    diff. for arima111.xls data")
> pacf(x = diff(x = x, lag = 1, differences = 1), lag.max = 20, xlim =
    c(1,20), ylim = c(-1,1), xlab = "h", main = "Est. PACF 1st diff. for
    arima111.xls data")
> par(mfrow = c(1,1))

The data now appears to be stationary in the mean. Also, there appears to be no evidence against constant variance.

2) Construct plots of the estimated ACF and PACF of the stationary series.

The plots are above. The ACF is tailing off toward 0 after q = 1 lags. The PACF may also be tailing off toward 0 after p = 1 lags. This would be indicative of an ARIMA(1,1,1) model.

Other possible models include:

  • ARIMA(2,1,0) or ARIMA(3,1,0) since the PACF possibly cuts off to 0 after lag 2 or 3.
  • Try simple models such as ARIMA(1,1,0) or ARIMA(0,1,1) to see if ACF and PACF plots of the model residuals help determine what types of changes should be made to the model. This is often a very good strategy to follow when it is not obvious from an ACF or PACF what the model should be.

3) Find the estimated models using maximum likelihood estimation AND

4) For each model chosen, investigate diagnostic measures.

To help perform #4 correctly, I have written a function called examine.mod() which automatically does most of the items needed. The function puts together the code that was used in the AR(1) data example. You will need to run the entire function first before using it! (A screen capture of the function as it appears in WinEdt is not reproduced here; a hedged sketch of what the function does is given below.)

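Since the screen capture is not reproduced, here is a hedged sketch of what examine.mod() plausibly contains, assembled from the AR(1) diagnostic code above and from the output it returns below; the actual function is in examine.mod.R.

> examine.mod<-function(mod.fit.obj, mod.name, max.lag = 20) {
    #Ljung-Box-Pierce test, residual ACF, and standardized residuals
    tsdiag(object = mod.fit.obj, gof.lag = max.lag)

    #Residual PACF, histogram with normal overlay, and Q-Q plot
    win.graph(width = 8, height = 6, pointsize = 10)
    par(mfrow = c(2,2))
    pacf(x = mod.fit.obj$residuals, lag.max = max.lag, main =
      paste("Estimated PACF for residuals using", mod.name, "fit"),
      xlim = c(1, max.lag), ylim = c(-1,1))
    hist(x = mod.fit.obj$residuals, main = "Histogram of residuals",
      xlab = "Residuals", freq = FALSE)
    curve(expr = dnorm(x, mean = mean(mod.fit.obj$residuals), sd =
      sd(mod.fit.obj$residuals)), col = "red", add = TRUE)
    qqnorm(y = mod.fit.obj$residuals, ylab = "Residuals", panel.first =
      grid(col = "gray", lty = "dotted"))
    qqline(y = mod.fit.obj$residuals, col = "red")
    par(mfrow = c(1,1))

    #Wald-type tests for the model parameters
    z<-mod.fit.obj$coef/sqrt(diag(mod.fit.obj$var.coef))
    p.value<-2*(1 - pnorm(q = abs(z), mean = 0, sd = 1))
    list(z = z, p.value = p.value)
  }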

ARIMA(1,1,1):

> source("C:/chris/UNL/STAT_time_series/chapter3/examine.mod.R")
> mod.fit111<-arima(x = x, order = c(1, 1, 1))
> mod.fit111

Call:
arima(x = x, order = c(1, 1, 1))

Coefficients:
         ar1     ma1
      0.6720  0.4681
s.e.  0.0637  0.0904

sigma^2 estimated as 9.558:  log likelihood = -507.68,  aic = 1021.36

> save.it<-examine.mod(mod.fit.obj = mod.fit111, mod.name = "ARIMA(1,1,1)",
    max.lag = 20)
> save.it
$z
      ar1       ma1
10.545076  5.177294

$p.value
         ar1          ma1
0.000000e+00 2.251268e-07

ARIMA(2,1,0):

> #ARIMA(2,1,0)
> mod.fit210<-arima(x = x, order = c(2, 1, 0))
> mod.fit210

Call:
arima(x = x, order = c(2, 1, 0))

Coefficients:
         ar1      ar2
      1.0137  -0.2371
s.e.  0.0690   0.0691

sigma^2 estimated as 9.97:  log likelihood = -511.8,  aic = 1029.59

> #Need to run function in examine.mod.R file first
> examine.mod(mod.fit.obj = mod.fit210, mod.name = "ARIMA(2,1,0)",
    max.lag = 20)
$z
      ar1       ar2
14.701106 -3.431004

$p.value
         ar1          ar2
0.0000000000 0.0006013512


ARIMA(3,1,0):