
1. Characteristics of Time Series

1.1 Introduction

We are going to examine data that has been observed over time. Typically, there is correlation among the observations, which limits our ability to use "conventional" statistical analysis methods.

Remember that many statistical applications rely on having observations that are independent.

In this class, we are going to learn how to identify this correlation and use it to help construct models. The models are then used to forecast "future" observations. These types of analyses fall under the title "time series analysis". For most of our course, we will focus on modeling ONE series of observations without any explanatory variables. A few sections in Chapter 5 are exceptions, where we will incorporate explanatory variables.

Read Shumway and Stoffer’s introduction to the methods discussed in this book.

1.2 The Nature of Time Series Data

Example: OSU enrollment data (osu_enroll.R, osu_enroll.xls)

Partial listing of the data:

t / Semester / Year / Enrollment
1 / Fall / 1989 / 20,110
2 / Spring / 1990 / 19,128
3 / Summer / 1990 / 7,553
4 / Fall / 1990 / 19,591
5 / Spring / 1991 / 18,361
6 / Summer / 1991 / 6,702
 /  /  / 
32 / Spring / 2000 / 19,835
33 / Summer / 2000 / 7,202
34 / Fall / 2000 / 21,252
35 / Spring / 2001 / 20,004
36 / Summer / 2001 / 7,558
37 / Fall / 2001 / 21,872
38 / Spring / 2002 / 20,992
39 / Summer / 2002 / 7,868
40 / Fall / 2002 / 22,992

> library(RODBC)
> z<-odbcConnectExcel("C:\\chris\\UNL\\STAT_time_series\\chapter1\\osu_enroll.xls")
> osu.enroll<-sqlFetch(z, "Sheet1")
> close(z)
> head(osu.enroll)

  t Semester Year Enrollment       date
1 1     Fall 1989      20110 1989-08-31
2 2   Spring 1990      19128 1990-02-01
3 3   Summer 1990       7553 1990-06-01
4 4     Fall 1990      19591 1990-08-31
5 5   Spring 1991      18361 1991-02-01
6 6   Summer 1991       6702 1991-06-01

> tail(osu.enroll)
    t Semester Year Enrollment       date
35 35   Spring 2001      20004 2001-02-01
36 36   Summer 2001       7558 2001-06-01
37 37     Fall 2001      21872 2001-08-31
38 38   Spring 2002      20922 2002-02-01
39 39   Summer 2002       7868 2002-06-01
40 40     Fall 2002      22992 2002-08-31

> #One way to do plot
> win.graph(width = 8, height = 6, pointsize = 10)
> plot(x = osu.enroll$Enrollment, ylab = "OSU Enrollment",
    xlab = "t (time)", type = "l", col = "red",
    main = "OSU Enrollment from Fall 1989 to Fall 2002",
    panel.first = grid(col = "gray", lty = "dotted"))
> points(x = osu.enroll$Enrollment, pch = 20, col = "blue")

We will often use “t” to represent time so that we can say x1 = 20,110, x2 = 19,128, …, x40 = 22,992.

When only “x” is specified in the plot() function, R puts this on the y-axis and uses the observation number on the x-axis. Compare this to the next plot below where both “x” and “y” options are specified.

> #More complicated plot
> plot(y = osu.enroll[osu.enroll$Semester == "Fall",]$Enrollment,
    x = osu.enroll[osu.enroll$Semester == "Fall",]$t,
    ylab = "OSU Enrollment", xlab = "t (time)", col = "blue",
    main = "OSU Enrollment from Fall 1989 to Fall 2002",
    panel.first = grid(col = "gray", lty = "dotted"),
    pch = 1, type = "o", ylim = c(0, max(osu.enroll$Enrollment)))
> lines(y = osu.enroll[osu.enroll$Semester == "Spring",]$Enrollment,
    x = osu.enroll[osu.enroll$Semester == "Spring",]$t,
    col = "red", type = "o", pch = 2)
> lines(y = osu.enroll[osu.enroll$Semester == "Summer",]$Enrollment,
    x = osu.enroll[osu.enroll$Semester == "Summer",]$t,
    col = "darkgreen", type = "o", pch = 3)
> legend(x = locator(1), legend = c("Fall", "Spring", "Summer"),
    pch = c(1,2,3), lty = c(1,1,1),
    col = c("blue", "red", "darkgreen"), bty = "n")

> #Another way to do plot with actual dates
> plot(y = osu.enroll$Enrollment, x = as.Date(osu.enroll$date),
    xlab = "Time", type = "l", col = "red",
    main = "OSU Enrollment from Fall 1989 to Fall 2002",
    ylab = "OSU Enrollment")
> points(y = osu.enroll$Enrollment, x = as.Date(osu.enroll$date),
    pch = 20, col = "blue")
> #Create own gridlines
> abline(v = as.Date(c("1990/1/1", "1992/1/1", "1994/1/1",
    "1996/1/1", "1998/1/1", "2000/1/1", "2002/1/1")),
    lty = "dotted", col = "lightgray")
> abline(h = c(10000, 15000, 20000), lty = "dotted", col = "lightgray")
> #There may be better ways to work with actual dates.

Questions of interest:

1) What patterns are there over time?
2) How can the correlation between observations be used to help model the data?
3) Can future enrollment be predicted using this data?
4) Most of the time, we will only use past values in the series to predict future values. However, in this case, what explanatory variables (independent variables, covariates) may be useful for predicting enrollment?
5) Why is modeling enrollment and predicting future enrollment important?

5-8-01 O’Collegian article: “$1.8 million loss attributed to slight enrollment decline”

Example: Russell 3000 Index (russell_3000.R, russell.xls)

Source:

The index “measures the performance of the 3,000 largest United States companies based on total market capitalization, which represents approximately 98% of the investable United States equity market.”

> library(RODBC)
> z<-odbcConnectExcel("C:\\chris\\UNL\\STAT_time_series\\chapter1\\russell.xls")
> russell<-sqlFetch(z, "Sheet1")
> close(z)
> head(russell)

Index Name Date Value Without Dividends Value With Dividends
1 Russell 3000® Index 1995-06-01 555.15 1034.42
2 Russell 3000® Index 1995-06-02 555.15 1034.56
3 Russell 3000® Index 1995-06-05 558.72 1041.21
4 Russell 3000® Index 1995-06-06 558.50 1041.04
5 Russell 3000® Index 1995-06-07 556.45 1037.21
6 Russell 3000® Index 1995-06-08 555.83 1036.18

> tail(russell)
Index Name Date Value Without Dividends Value With Dividends
674 Russell 3000® Index 1997-12-23 965.71 1891.83
675 Russell 3000® Index 1997-12-24 960.20 1881.05
676 Russell 3000® Index 1997-12-26 963.43 1887.42
677 Russell 3000® Index 1997-12-29 979.26 1919.07
678 Russell 3000® Index 1997-12-30 996.66 1953.32
679 Russell 3000® Index 1997-12-31 998.26 1956.51

> #One way to do plot
> win.graph(width = 8, height = 6, pointsize = 10)
> plot(x = russell$"Value Without Dividends", ylab = "Russell 3000 Index",
    xlab = "t (time)", type = "l", col = "red",
    main = "Russell 3000 Index from 6/1/1995 to 12/31/1997",
    panel.first = grid(col = "gray", lty = "dotted"))
> points(x = russell$"Value Without Dividends", pch = 20, col = "blue")

> #Another way to do plot with actual dates
> plot(y = russell$"Value Without Dividends", x = as.Date(russell$Date),
    xlab = "Time", type = "l", col = "red",
    main = "Russell 3000 Index from 6/1/1995 to 12/31/1997",
    ylab = "Russell 3000 Index", xaxt = "n")
> axis.Date(side = 1, at = seq(from = as.Date("1995/6/1"),
    to = as.Date("1997/12/31"), by = "months"),
    labels = format(x = seq(from = as.Date("1995/6/1"),
      to = as.Date("1997/12/31"), by = "months"), format = "%b%y"),
    las = 2) #las changes orientation of labels
> points(y = russell$"Value Without Dividends", x = as.Date(russell$Date),
    pch = 20, col = "blue")
> #Create own gridlines
> abline(v = as.Date(c("1995/7/1", "1996/1/1", "1996/7/1",
    "1997/1/1", "1997/7/1")), lty = "dotted", col = "lightgray")
> abline(h = seq(from = 600, to = 1000, by = 100), lty = "dotted",
    col = "lightgray")

Questions of interest:

1) What patterns are there over time?
2) How can the correlation between observations be used to help model the data?
3) Can future index values be predicted using this data?
4) Why would modeling the Russell 3000 Index and predicting future values be important?

Example: Sunspots (sunspots.R, sunspots.csv)

Number of sunspots observed per year from 1784 to 1983.

> sunspots.data<-read.table(file = "C:\\chris\\UNL\\STAT_time_series\\chapter1\\sunspots.csv",
    header = TRUE, sep = ",")

> head(sunspots.data)
  Year Sunspots
1 1784     10.2
2 1785     24.1
3 1786     82.9
4 1787    132.0
5 1788    130.9
6 1789    118.1
> tail(sunspots.data)
    Year Sunspots
195 1978    92.50
196 1979   155.40
197 1980    32.27
198 1981    54.25
199 1982    59.65
200 1983    63.62

> win.graph(width = 8, height = 6, pointsize = 10)
> plot(x = sunspots.data$Sunspots, ylab = "Number of sunspots",
    xlab = "t (time)", type = "l", col = "red",
    main = "Sunspots per year from 1784 to 1983",
    panel.first = grid(col = "gray", lty = "dotted"))
> points(x = sunspots.data$Sunspots, pch = 20, col = "blue")
> plot(y = sunspots.data$Sunspots, x = sunspots.data$Year,
    ylab = "Number of sunspots", xlab = "Year", type = "l", col = "red",
    main = "Sunspots per year from 1784 to 1983",
    panel.first = grid(col = "gray", lty = "dotted"))
> points(y = sunspots.data$Sunspots, x = sunspots.data$Year,
    pch = 20, col = "blue")

> #Convert to an object of class "ts"
> x<-ts(sunspots.data$Sunspots, start = 1784, frequency = 1)
> class(x)
[1] "ts"
> class(sunspots.data$Sunspots)
[1] "numeric"
> x
Time Series:
Start = 1784
End = 1983
Frequency = 1
  [1]  10.20  24.10  82.90 132.00 130.90 118.10  89.90  66.60  60.00  46.90  41.00  21.30  16.00   6.40   4.10
 [16]   6.80  14.50  34.00  45.00  43.10  47.50  42.20  28.10  10.10   8.10   2.50   0.00   1.40   5.00  12.20

EDITED

[196] 155.40  32.27  54.25  59.65  63.62
> plot.ts(x = x, ylab = expression(paste(x[t], " (Number of sunspots)")),
    xlab = "t (year)", type = "o", col = "red",
    main = "Sunspots per year from 1784 to 1983")

Notes:

  • The sunspot values are not necessarily integers.
  • Every object in R has a class. For time series data, it is sometimes useful to use a “ts” class type with it.

Questions of interest:

1) What patterns are there over time?
2) How can the correlation between observations be used to help model the data?
3) Can future sunspot counts be predicted using this data?
4) Why would modeling the number of sunspots and predicting future values be important?

See Shumway and Stoffer for more examples!

1.3 Time Series Statistical Models

Stochastic process – a collection of random variables {Xt} indexed by t

Time series – a collection of random variables indexed according to the order in which they are obtained in time.

Let Xt be the random variable at time t

Then

X1 = random variable at time 1

X2 = random variable at time 2

A realization of the stochastic process is the set of observed values. The observed values are denoted by x1, x2, … . Notice that lowercase letters are used to denote the observed values of the random variables.

NOTE: Shumway and Stoffer say the following:

Because it will be clear from the context of our discussions, we will use the term time series whether we are referring to the process or to a particular realization and make no notational distinction between the two concepts.

What does this mean? There will be no notational differentiation made between the random variables and their observed values. Shumway and Stoffer will typically use a lowercase letter – xt.

Example: White noise (white_noise.R)

The simplest kind of time series is a collection of independent and identically distributed random variables with mean 0 and constant variance.

This can be written as wt ~ independent (0, σw2) for t = 1, …, n.

Most often, the probability distribution is assumed to be a normal probability distribution.

This can be written as wt ~ independent N(0, σw2) for t = 1, …, n.

What does this mean?

  • Each wt has a normal distribution with mean 0 and constant variance σw2.
  • w1, w2, …, wn are independent of each other.

Given this setup, answer the following questions:

  • What patterns are there over time (t)?
  • How can the correlation between observations be used to help model the data?
  • How can we “simulate” a white noise process using R?

Since the random variables are independent, we can simulate 100 observations from a normal distribution. I am going to use σw = 1 here.

> set.seed(8128)
> w<-rnorm(n = 100, mean = 0, sd = 1)
> head(w)
[1] -0.10528941  0.25548490  0.82065388  0.04070997 -0.66722880 -1.54502793

> #Using plot.ts(), which is set up for time series plots
> win.graph(width = 6, height = 6, pointsize = 10)
> plot.ts(x = w, ylab = expression(w[t]), xlab = "t", type = "o",
    col = "red", main = expression(paste("White noise where ", w[t],
    " ~ ind. N(0, 1)")), panel.first = grid(col = "gray", lty = "dotted"))

> #Advantage of second plot is separate control over color of points
> plot(x = w, ylab = expression(w[t]), xlab = "t", type = "l",
    col = "red", main = expression(paste("White noise where ", w[t],
    " ~ ind. N(0, 1)")), panel.first = grid(col = "gray", lty = "dotted"))
> points(x = w, pch = 20, col = "blue")

Given this data set, answer the following questions:

  • What patterns are there over time (t)?
  • How can the correlation between observations be used to help model the data?

Suppose another white noise process is simulated. To create a plot overlaying the two time series, use the code below.

> set.seed(1298)
> w.new<-rnorm(n = 100, mean = 0, sd = 1)
> head(w.new)
[1]  1.08820292 -1.46217413 -1.10887422  0.55156914  0.70582813  0.05079594

> plot(x = w, ylab = expression(w[t]), xlab = "t", type = "l",
    col = "red", main = expression(paste("White noise where ", w[t],
    " ~ ind. N(0, 1)")), panel.first = grid(col = "gray", lty = "dotted"),
    ylim = c(min(w.new, w), max(w.new, w)))
> points(x = w, pch = 20, col = "blue")
> lines(x = w.new, col = "green")
> points(x = w.new, pch = 20, col = "orange")
> legend(x = locator(1), legend = c("Time series 1", "Time series 2"),
    lty = c(1,1), col = c("red", "green"), bty = "n")

> win.graph(width = 8, height = 6, pointsize = 10)
> par(mfrow = c(2,1))
> plot(x = w, ylab = expression(w[t]), xlab = "t", type = "l",
    col = "red", main = expression(paste("White noise where ", w[t],
    " ~ ind. N(0, 1)")), panel.first = grid(col = "gray", lty = "dotted"))
> points(x = w, pch = 20, col = "blue")
> plot(x = w.new, ylab = expression(w.new[t]), xlab = "t", type = "l",
    col = "green", main = expression(paste("White noise where ", w[t],
    " ~ ind. N(0, 1)")), panel.first = grid(col = "gray", lty = "dotted"))
> points(x = w.new, pch = 20, col = "orange")

Example: Moving average of white noise (moving_average.R)

The previous time series had no correlation between the observations. One way to induce correlation is to create a "moving average" of the observations. This has the effect of "smoothing" the series.

Let mt = (wt + wt-1 + wt-2)/3

Note: This is different from the example given in Shumway and Stoffer p. 13 where they find (wt+1 + wt + wt-1)/3.

This can be done in R using the following code:

> set.seed(8128)
> w<-rnorm(n = 100, mean = 0, sd = 1)
> head(w)
[1] -0.10528941  0.25548490  0.82065388  0.04070997 -0.66722880 -1.54502793
> m<-filter(x = w, filter = rep(x = 1/3, times = 3),
    method = "convolution", sides = 1)
> head(m)
[1] NA NA 0.32361646 0.37228292 0.06471168 -0.72384892
> tail(m)
[1]  0.3158762 -0.1803096  0.2598066 -0.6450531 -0.5879723 -0.9120182
> (w[1]+w[2]+w[3])/3
[1] 0.3236165
> (w[98]+w[99]+w[100])/3
[1] -0.9120182
> #This is what the book does:
> #m<-filter(x = w, filter = rep(x = 1/3, times = 3),
> #  method = "convolution", sides = 2)

> par(mfrow = c(1,1))
> plot(x = m, ylab = expression(m[t]), xlab = "t", type = "l",
    col = "brown", lwd = 1, main = expression(paste("Moving average where ",
    m[t] == (w[t] + w[t-1] + w[t-2])/3)),
    panel.first = grid(col = "gray", lty = "dotted"))
> points(x = m, pch = 20, col = "orange")
> #NOTE: The gridlines are not located in the correct locations

Comparing mt to wt

> par(mfrow = c(1,1))
> plot(x = m, ylab = expression(paste(m[t], " or ", w[t])), xlab = "t",
    type = "l", col = "brown", lwd = 4, ylim = c(min(w), max(w)),
    main = expression(paste("Moving average where ", m[t] ==
    (w[t] + w[t-1] + w[t-2])/3)),
    panel.first = grid(col = "gray", lty = "dotted"))
> points(x = m, pch = 20, col = "orange")
> lines(x = w, col = "red", lty = "dotted")
> points(x = w, pch = 20, col = "blue")
> legend(x = locator(1), legend = c("Moving average", "White noise"),
    lty = c("solid", "dotted"), col = c("brown", "red"),
    lwd = c(4,1), bty = "n")

Given these observed values of mt, answer the following questions:

  • What patterns are there over time (t)?
  • How can the correlation between observations be used to help model the data?

The plot below shows a 7-point moving average (see program for code).
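
Such a moving average can be computed with filter() just as before; a minimal sketch (assumed here, not copied from moving_average.R), where only the filter length changes:

> #7-point moving average: m7[t] = (w[t] + w[t-1] + ... + w[t-6])/7
> m7<-filter(x = w, filter = rep(x = 1/7, times = 7),
    method = "convolution", sides = 1)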

Example: Autoregressions (ar1.R)

An “autoregression” model uses past observations to predict future observations in a regression model.

Suppose the autoregression model is

xt = 0.7xt-1 + wt where wt~independent N(0,1) for t=1,…,n.

Notice how similar this is to a regression model from STAT 870! Since there is one past period on the right hand side, this is often denoted as an AR(1) model where AR stands for “autoregressive”.

Therefore,

x2 = 0.7x1 + w2

x3 = 0.7x2 + w3

Obviously, there will be a correlation between the random variables.

Below is one way in R to simulate observations from this model.

> set.seed(6381) #Different seed from white_noise.R
> w<-rnorm(n = 200, mean = 0, sd = 1)
> head(w)
[1]  0.06737166 -0.68095839  0.78930605  0.60049855 -1.21297680 -1.14082872
> #######################################################
> # autoregression
> #Simple way to simulate AR(1) data
> x<-numeric(length = 200)
> x.1<-0
> for (i in 1:length(x)) {
    x[i]<-0.7*x.1 + w[i]
    x.1<-x[i]
  }
> head(cbind(x, w))

               x           w
[1,]  0.06737166  0.06737166
[2,] -0.63379823 -0.68095839
[3,]  0.34564730  0.78930605
[4,]  0.84245166  0.60049855
[5,] -0.62326064 -1.21297680
[6,] -1.57711117 -1.14082872

> #Do not use first 100
> x<-x[101:200]
> win.graph(width = 8, height = 6, pointsize = 10)  #Opens a wider plot
    #window than the default (good for time series plots)
> plot(x = x, ylab = expression(x[t]), xlab = "t", type = "l",
    col = "red", lwd = 1, main = expression(paste("AR(1): ",
    x[t] == 0.7*x[t-1] + w[t])),
    panel.first = grid(col = "gray", lty = "dotted"))
> points(x = x, pch = 20, col = "blue")

Notes:

  • Notice the syntax of the for loop.
  • See the first 6 rows of x and w right after the for loop. Make sure you understand how the data was simulated!!!

The 1st value of x is

0.06737166 = 0.7(0) + 0.06737166

The 2nd value of x is

-0.6337982 = 0.7(0.06737166) – 0.68095839

The 3rd value of x is

0.3456473 = 0.7(-0.6337982) + 0.78930605

  • Why are the first 100 observations discarded?
  • For those of you who have taken a course where AR(1) structures of a covariance matrix are discussed, what do you think the approximate correlation between xt and xt-1 is? One way to check empirically is sketched below.
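
A minimal sketch of such an empirical check (my own, not part of ar1.R), using the 100 retained observations in x:

> #Sample correlation between x[t] and x[t-1]
> cor(x[-1], x[-length(x)])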

Here is an easier way to simulate observations from an AR(1). Note that this uses an Autoregressive Integrated Moving Average (ARIMA) structure that we will discuss in Chapter 3. In this case, I use σw = 10.

> set.seed(7181)
> x<-arima.sim(model = list(ar = c(0.7)), n = 100,
    rand.gen = rnorm, sd = 10)
> plot(x = x, ylab = expression(x[t]), xlab = "t", type = "l",
    col = "red", lwd = 1, main = expression(paste("AR(1): ",
    x[t] == 0.7*x[t-1] + w[t])),
    panel.first = grid(col = "gray", lty = "dotted"))
> points(x = x, pch = 20, col = "blue")

More notes:

  • Both the moving average and autoregressive models will be discussed extensively in Chapter 3.
  • See how Shumway and Stoffer are kind of trying to match up their simulated data plots in Section 1.3 to actual data plots in Section 1.2. They are doing this because we want to develop an equation that reasonably mimics or “models” real data.

1.4 Measures of Dependence: Autocorrelation and Cross-Correlation

We would like to understand the relationship between all random variables in a time series. In order to do that, we would need to look at the joint distribution function.

Suppose the time series consists of the random variables xt1, xt2, …, xtn. Then the joint cumulative distribution function is

F(c1, c2, …, cn) = P(xt1 ≤ c1, xt2 ≤ c2, …, xtn ≤ cn)

This can be VERY difficult to examine over MULTIPLE dimensions. Note that the t1, …, tn subscripts are used just to denote a "general" set of times – not necessarily 1, 2, …, n.

Instead, it is often easier to look at the one- or two-dimensional distribution functions. The one-dimensional cumulative distribution function is denoted by Ft(x) = P(xt ≤ x) for a random variable xt at time t. The corresponding probability distribution function is ft(x) = ∂Ft(x)/∂x.

The mean value function is μxt = E(xt) = ∫ x ft(x) dx, where the integral is taken over all possible values of x.

Shumway and Stoffer will drop the subscript x from μxt when there is no confusion about which random variable is being discussed.

Important: The interpretation of t is that it represents the mean taken over ALL possible events that could have produced xt. Another way to think about it is suppose that is observed an infinite number of times. Then represents the average value at time t1, represents the average value at time t2, …

Example: Moving Average

Let mt = (wt + wt-1 + wt-2)/3 where wt~ind. N(0,1) for t=1,…,n.

Then

μt = E(mt)
= E[(wt + wt-1 + wt-2)/3]
= (1/3)E(wt + wt-1 + wt-2)
= (1/3)[E(wt) + E(wt-1) + E(wt-2)]
= (1/3)[0 + 0 + 0]
= 0
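
As a quick numerical check (my own sketch, not from the course programs), the sample mean of the m series simulated in moving_average.R should be close to this value of 0:

> #Sample mean of the simulated moving average
> mean(m, na.rm = TRUE)  #na.rm = TRUE because m[1] and m[2] are NA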

Example: Autoregressions

Let xt = 0.7xt-1 + wt where wt~ind. N(0,1) for t = 1, …, n.

Then

μt = E(xt)
= E(0.7xt-1 + wt)
= 0.7E(xt-1) + E(wt)
= 0.7E(0.7xt-2 + wt-1) + 0
= 0.72E(xt-2)
= …
= 0

since repeating the substitution leaves only expected values of the wt's, which are all 0.

Autocovariance function

To assess the dependence between two random variables, we need to examine the two-dimensional cumulative distribution function. This can be denoted as F(cs, ct) = P(xs ≤ cs, xt ≤ ct) for two different time points s and t.

In STAT 870, you learned about the covariance function which measured the linear dependence between two random variables (see Chapter 5 of Kutner, Nachtsheim, and Neter (2004)). Since we are interested in linear dependence between two random variables in the same time series, we will examine the autocovariance function:

γx(s,t) = Cov(xs, xt) = E[(xs – μs)(xt – μt)] for all s and t,

where μs = E(xs) and μt = E(xt), assuming continuous random variables.

Notes:

  • If the autocovariance is 0, there is no linear dependence.
  • If s = t, the autocovariance is the variance: γx(t,t) = E[(xt – μt)2] = Var(xt).
  • Shumway and Stoffer will drop the subscript x on γ when there is no confusion about which time series is being discussed.

Example: White noise

Suppose wt ~ ind. N(0, σw2) for t = 1, …, n. What is γ(s,t) for s = t and s ≠ t?

Example: Moving Average

Let mt = (wt + wt-1 + wt-2)/3 where wt ~ ind. N(0,1) for t = 1, …, n.

γ(s,t) = E[(ms – μs)(mt – μt)] = E[msmt] since μs = μt = 0

Then

E[msmt] = E[{(ws + ws-1 + ws-2)/3} × {(wt + wt-1 + wt-2)/3}]
= (1/9)E[(ws + ws-1 + ws-2)(wt + wt-1 + wt-2)]

To find this, we need to examine a few different cases:

  • s = t

E[mtmt] = E[mt2] = Var(mt) + E(mt)2, since Var(mt) = E[mt2] – E(mt)2
= (1/9)Var(wt + wt-1 + wt-2) + 02
= (1/9){Var(wt) + Var(wt-1) + Var(wt-2)} since the wt's are independent
= (1/9)(1 + 1 + 1) = 3/9

  • s = t – 1

E[mt-1mt] = (1/9)E[(wt-1 + wt-2 + wt-3)(wt + wt-1 + wt-2)]
= (1/9)E[wt-1wt + wt-1wt-1 + wt-1wt-2 + wt-2wt + wt-2wt-1 + wt-2wt-2 + wt-3wt + wt-3wt-1 + wt-3wt-2]
= (1/9)[E(wt-1wt) + E(wt-1wt-1) + E(wt-1wt-2) + E(wt-2wt) + E(wt-2wt-1) + E(wt-2wt-2) + E(wt-3wt) + E(wt-3wt-1) + E(wt-3wt-2)]
= (1/9)[E(wt-1)E(wt) + E(wt-12) + E(wt-1)E(wt-2) + E(wt-2)E(wt) + E(wt-2)E(wt-1) + E(wt-22) + E(wt-3)E(wt) + E(wt-3)E(wt-1) + E(wt-3)E(wt-2)] by independence
= (1/9)[0 + 1 + 0 + 0 + 0 + 1 + 0 + 0 + 0] = 2/9
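
Both autocovariances, γ(t,t) = 3/9 and γ(t-1,t) = 2/9, can be checked by simulation. Below is a minimal sketch of my own (not from the course programs); the seed and series length are arbitrary choices made so the estimates are stable:

> #Simulate a long white noise series and its 3-point moving average
> set.seed(9811)
> w<-rnorm(n = 100000, mean = 0, sd = 1)
> m<-filter(x = w, filter = rep(x = 1/3, times = 3),
    method = "convolution", sides = 1)
> var(m, na.rm = TRUE)  #Should be near 3/9 = 0.333
> cov(m[-1], m[-length(m)], use = "complete.obs")  #Should be near 2/9 = 0.222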