For a time series, the autoregressive integrated moving average (ARIMA) model assumes that the future value is a linear function of several past observations and random errors. It can be written as ARIMA(p,d,q)(P,D,Q), where (p,d,q) is the non-seasonal part and (P,D,Q) is the seasonal part: p is the order of non-seasonal autoregression (AR), d is the number of regular differences, q is the order of the non-seasonal moving average (MA), P is the order of seasonal autoregression, D is the number of seasonal differences, and Q is the order of the seasonal MA. In practical terms, MA processes are more useful for modeling short-term fluctuations, while AR processes are more useful for modeling longer-term effects. For example, with monthly data, a non-seasonal AR process of order 1 would model February’s value based on January’s value, while a seasonal AR process of order 1 would model February’s value based on the previous February’s value.
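The monthly example can be made concrete with a minimal sketch. The function name ar1_forecast, the coefficient 0.5, and the twelve data values are all hypothetical and chosen only for illustration; the series is assumed to be mean-adjusted so that no intercept is needed.

```python
def ar1_forecast(series, phi, lag=1):
    """One-step forecast from an order-1 AR relation at the given lag.

    lag=1  : non-seasonal AR(1), uses the immediately preceding value.
    lag=12 : seasonal AR(1) for monthly data, uses the same month one year ago.
    Assumes a mean-adjusted series (no intercept term).
    """
    if len(series) < lag:
        raise ValueError("series is shorter than the requested lag")
    return phi * series[-lag]


# Hypothetical mean-adjusted monthly values, ordered from last February
# through this January (12 observations).
monthly = [5.0, 3.0, 1.0, -1.0, -3.0, -5.0, -4.0, -2.0, 0.0, 2.0, 4.0, 6.0]
phi = 0.5  # made-up coefficient, not an estimate

# Forecast for the coming February under each model.
february_nonseasonal = ar1_forecast(monthly, phi, lag=1)   # phi * January
february_seasonal = ar1_forecast(monthly, phi, lag=12)     # phi * last February
print(february_nonseasonal, february_seasonal)
```

The only difference between the two forecasts is which past observation is used: the most recent one, or the one a full seasonal period earlier.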
Generally, the ARIMA model-building procedure consists of three steps:
Step 1: Model Identification
a) Performing appropriate differencing of the series to achieve stationarity, if necessary.
b) Examining the autocorrelation function (ACF) and partial autocorrelation function (PACF) to identify the temporal correlation structure of the transformed data.
c) Selecting the model with the minimum Akaike Information Criterion (AIC) as the best-fitting model. The AIC is computed as:
AIC = n[ln(2πRSS/n) + 1] + 2m (1)
where m = p + q + P + Q is the number of terms estimated in the model, n is the number of observations, and RSS denotes the sum of squared residuals.
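The identification sub-steps above can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not a full identification routine: the helper names (difference, acf, aic), the candidate model labels, and their RSS values are hypothetical, and the PACF is omitted for brevity.

```python
import math


def difference(series, lag=1):
    """Differenced series y[t] - y[t - lag].

    lag=1 is regular differencing; lag=s (e.g. 12 for monthly data)
    is seasonal differencing.
    """
    return [series[t] - series[t - lag] for t in range(lag, len(series))]


def acf(series, max_lag):
    """Sample autocorrelations r_0 .. r_max_lag of a non-constant series."""
    n = len(series)
    mean = sum(series) / n
    dev = [x - mean for x in series]
    c0 = sum(d * d for d in dev)
    return [
        sum(dev[t] * dev[t - k] for t in range(k, n)) / c0
        for k in range(max_lag + 1)
    ]


def aic(rss, n, m):
    """Equation (1): AIC = n[ln(2*pi*RSS/n) + 1] + 2m."""
    return n * (math.log(2 * math.pi * rss / n) + 1) + 2 * m


# a) A linear trend becomes constant after one regular difference.
trend = [1, 3, 5, 7, 9, 11]
print(difference(trend))  # [2, 2, 2, 2, 2]

# b) An alternating series has strong negative lag-1 autocorrelation.
alternating = [1, -1, 1, -1, 1, -1, 1, -1]
print(acf(alternating, 2))

# c) Hypothetical candidates: label -> (RSS, m), all fitted to n = 100 points.
candidates = {
    "ARIMA(1,1,0)": (120.0, 1),
    "ARIMA(1,1,1)": (110.0, 2),
    "ARIMA(2,1,2)": (109.5, 4),
}
n = 100
scores = {name: aic(rss, n, m) for name, (rss, m) in candidates.items()}
best = min(scores, key=scores.get)
print(best)
```

In this made-up comparison the largest model has the smallest RSS, but the 2m penalty outweighs the small improvement in fit, so the intermediate model has the lowest AIC.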
Step 2: Parameter Estimation
Estimating the coefficients of the identified model.
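As a rough illustration of what estimating a coefficient means, the sketch below fits the simplest possible case, a mean-adjusted non-seasonal AR(1), by ordinary least squares. This is a simplifying assumption for illustration only; statistical software typically estimates full ARIMA models by maximum likelihood or related methods. The function name fit_ar1 and the data are hypothetical.

```python
def fit_ar1(series):
    """Least-squares estimate of phi in y[t] = phi * y[t-1] + e[t].

    Assumes a mean-adjusted series. Returns the estimate, the residuals,
    and the residual sum of squares (the RSS used in the AIC).
    """
    pairs = [(series[t], series[t - 1]) for t in range(1, len(series))]
    num = sum(y * y_prev for y, y_prev in pairs)
    den = sum(y_prev * y_prev for _, y_prev in pairs)
    phi = num / den
    residuals = [y - phi * y_prev for y, y_prev in pairs]
    rss = sum(e * e for e in residuals)
    return phi, residuals, rss


# Noise-free hypothetical data in which each value is half the previous one,
# so the estimate recovers the coefficient exactly.
halving = [8.0, 4.0, 2.0, 1.0, 0.5]
phi_hat, resid, rss = fit_ar1(halving)
print(phi_hat, rss)
```

With real data the residuals are not zero, and it is these residuals that the diagnostic step examines.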
Step 3: Diagnosis
Analyzing the model residuals. The residuals of a good forecasting model must satisfy the requirements of a white noise process (i.e., uncorrelated and normally distributed around a mean of zero).
a) The ACF and PACF of the residual series should not be significantly different from 0.
b) The residuals should be without pattern. A common test for this is the Box-Ljung Q statistic. The Q value at a lag of about one-quarter of the sample size (but no more than 50) should be checked, and this statistic should not be significant.
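The Q statistic in the diagnostic check can be sketched as follows, using the standard Ljung-Box form Q = n(n+2) Σ r_k²/(n−k) summed over lags k = 1..h, with the default lag h taken from the rule of thumb above. The function name ljung_box_q and the example residuals are hypothetical. The sketch returns only the statistic: judging significance requires comparing Q with a chi-square distribution (commonly with h minus the number of estimated terms as degrees of freedom), which is left to a statistics library.

```python
def ljung_box_q(residuals, max_lag=None):
    """Ljung-Box Q statistic of a non-constant residual series.

    If max_lag is not given, use about one-quarter of the sample size,
    capped at 50. Returns (Q, max_lag).
    """
    n = len(residuals)
    if max_lag is None:
        max_lag = min(n // 4, 50)
    mean = sum(residuals) / n
    dev = [e - mean for e in residuals]
    c0 = sum(d * d for d in dev)
    total = 0.0
    for k in range(1, max_lag + 1):
        # Sample autocorrelation of the residuals at lag k.
        r_k = sum(dev[t] * dev[t - k] for t in range(k, n)) / c0
        total += r_k ** 2 / (n - k)
    return n * (n + 2) * total, max_lag


# Hypothetical residuals with an obvious alternating pattern (n = 20, so h = 5).
patterned = [1.0, -1.0] * 10
q, h = ljung_box_q(patterned)
print(q, h)  # Q is about 93.5 at h = 5, far too large for white noise
```

For residuals from an adequate model, Q at the chosen lag should be small enough that the test is not significant.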