JD+
Towards a single RegArima modelling (Draft)
Jean Palate
7/2/2015
Introduction
Tramo-Seats and X12-ARIMA provide different RegArima modelling facilities. This can be confusing for many users, especially when the two algorithms are integrated in a single software package, like JD+. For many reasons (transparency, coherence, maintenance…), it seems desirable to offer a single pre-processing module for the two methods.
X12-ARIMA is based on an old version of Tramo; consequently, it lacks many recent improvements of Tramo (especially those related to calendar effects, seasonality tests and over/under-differencing). Moreover, although the two routines follow roughly the same logic, they differ significantly in most details.
Tramo is significantly faster (up to 10 times) and more stable than X12-ARIMA (see the tests of the SACE). On the other hand, X12-ARIMA offers some facilities that are not included in Tramo (handling of the leap year effect, changes of regime, automatic detection of the length of the Easter effect…).
A common RegArima modelling should combine the best aspects of both solutions: it should be mainly based on Tramo (logic, algorithms…), with the extensions provided by X12-ARIMA. Going in that direction clearly implies that such software will move away from the original programs.
Below, we briefly compare some aspects of the current implementations: the regression variables, the estimation methods and the main steps of the automatic model identification.
Finally, we propose a road map towards a common implementation.
Regression variables
Variable / Tramo / X12-Arima / JD+
Trading days / X / X / X
Working days / X / X / X
Leap year / X / Special treatment in multiplicative models (optional) / Tramo-like and X12-like
Stock trading days / X / X / X
Easter effect / X (several definitions for the last day of the Easter period) / X (different mean correction) / Tramo-like and X12-like; Julian Easter (no GUI[1])
Labor Day / X
Thanksgiving / X
Outliers / AO, TC, LS, SLS (seasonal level shift); LS and SLS outliers are 0-ending / AO, TC, LS, SO; LS and SO outliers are 1-ending / AO, TC, LS, SO; outliers are 0- or 1-ending
Ramps / X / X / X
Mean / X / X / X (no test)
Fixed seasonal / X / X (no GUI)
User-defined calendar effects / X (no test) / X / X
User-defined variables / X (no test) / X / X (no test)
Change of regime / X / X (no GUI)
Fixed coefficients / X / X
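To make the outlier and ramp rows of the table more concrete, here is a minimal Python sketch of the corresponding regressors for a series of length n with an event at position t0. It is only an illustration: the helper names (ao, tc, ls, ramp) are hypothetical, the TC decay rate 0.7 is a value commonly used as a default, seasonal outliers (SO, SLS) are left out, and reading "0-ending"/"1-ending" as the value that a permanent regressor takes after the event is our assumption.

```python
def _check_ending(ending):
    # Assumed reading of the table: the value taken after the event (0 or 1).
    if ending not in (0, 1):
        raise ValueError("ending must be 0 or 1")


def ao(n, t0):
    """Additive outlier: a single pulse at t0."""
    return [1.0 if t == t0 else 0.0 for t in range(n)]


def tc(n, t0, delta=0.7):
    """Transitory change: 1 at t0, then decaying geometrically with rate delta."""
    return [delta ** (t - t0) if t >= t0 else 0.0 for t in range(n)]


def ls(n, t0, ending=1):
    """Level shift: a step at t0.

    ending=1 gives 0 before t0 and 1 from t0 on; ending=0 gives -1 before
    t0 and 0 from t0 on (the same step shifted down by one).
    """
    _check_ending(ending)
    return [(1.0 if t >= t0 else 0.0) - (1 - ending) for t in range(n)]


def ramp(n, t0, t1, ending=1):
    """Linear transition between the level up to t0 and the level from t1 on."""
    _check_ending(ending)
    if t1 <= t0:
        raise ValueError("t1 must be greater than t0")
    values = []
    for t in range(n):
        if t <= t0:
            v = 0.0
        elif t >= t1:
            v = 1.0
        else:
            v = (t - t0) / (t1 - t0)
        values.append(v - (1 - ending))
    return values


print(ls(6, 3))            # [0.0, 0.0, 0.0, 1.0, 1.0, 1.0]
print(ls(6, 3, ending=0))  # [-1.0, -1.0, -1.0, 0.0, 0.0, 0.0]
```

The two conventions differ by a constant vector: in a model with a mean, only the estimated mean changes (the outlier coefficient is the same), and after differencing the two regressors are identical.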
Estimation methods
Tramo
The estimation is based on the Kalman filter, and the residuals are the one-step-ahead forecast errors.
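As an illustration of these one-step-ahead forecast errors, the sketch below runs a Kalman filter on a zero-mean ARMA(1,1) model written in a simple state-space form. It is not the Tramo or JD+ code: the state-space form, the stationary initialization and the function names are assumptions made for the example, and the regression part and the differencing (which require a diffuse initialization) are left out.

```python
import math


def arma11_kalman(y, phi, theta):
    """Kalman filter for the zero-mean ARMA(1,1) model
    (1 - phi*B) y_t = (1 + theta*B) e_t, with Var(e_t) scaled to 1.

    State x_t = (y_t, theta*e_t), transition T = [[phi, 1], [0, 0]],
    disturbance loading R = (1, theta), observation y_t = x_t[0].
    Returns the one-step-ahead forecast errors v_t and their variances f_t
    (both relative to sigma^2).
    """
    if abs(phi) >= 1.0:
        raise ValueError("a stationary model is required (|phi| < 1)")
    # Stationary covariance of the initial state.
    p = [[(1.0 + 2.0 * phi * theta + theta * theta) / (1.0 - phi * phi), theta],
         [theta, theta * theta]]
    a = [0.0, 0.0]
    errors, variances = [], []
    for obs in y:
        v = obs - a[0]            # one-step-ahead forecast error
        f = p[0][0]               # its variance
        errors.append(v)
        variances.append(f)
        k0, k1 = p[0][0] / f, p[1][0] / f        # Kalman gain
        af0, af1 = a[0] + k0 * v, a[1] + k1 * v  # filtered state
        pf00 = p[0][0] - k0 * p[0][0]
        pf01 = p[0][1] - k0 * p[0][1]
        pf10 = p[1][0] - k1 * p[0][0]
        pf11 = p[1][1] - k1 * p[0][1]
        # Prediction step: a = T af, P = T Pf T' + R R'
        a = [phi * af0 + af1, 0.0]
        p = [[phi * phi * pf00 + phi * (pf01 + pf10) + pf11 + 1.0, theta],
             [theta, theta * theta]]
    return errors, variances


def concentrated_loglik(errors, variances):
    """Gaussian log-likelihood with sigma^2 replaced by its ML estimate."""
    n = len(errors)
    sigma2 = sum(v * v / f for v, f in zip(errors, variances)) / n
    return -0.5 * (n * (math.log(2.0 * math.pi * sigma2) + 1.0)
                   + sum(math.log(f) for f in variances))


y = [1.0, 2.0, 0.5, -1.0, 0.3]
v, f = arma11_kalman(y, phi=0.6, theta=0.0)
residuals = [vi / math.sqrt(fi) for vi, fi in zip(v, f)]
print(concentrated_loglik(v, f))
```

The standardized residuals are v_t / sqrt(f_t); for an AR(1) (theta = 0) they reduce to y_t - phi*y_{t-1} after the first observation, which is a convenient sanity check.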
The optimization procedure is a specific version of the Levenberg-Marquardt algorithm; it uses the Hannan-Rissanen algorithm to compute initial values of the parameters.
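The Hannan-Rissanen idea can be sketched in a few lines for an ARMA(1,1): a long autoregression provides estimates of the innovations, which are then used as regressors in a linear least-squares fit. This is the classic two-step version written in plain Python for a zero-mean series; the AR order m=10, the sign convention (1 - phi B) y_t = (1 + theta B) e_t and all the function names are choices made for the example, not Tramo's actual settings.

```python
import random


def solve(a, b):
    """Solve the linear system a x = b (Gaussian elimination, partial pivoting)."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def ols(rows, target):
    """Least-squares coefficients via the normal equations (adequate for a sketch)."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * t for r, t in zip(rows, target)) for i in range(k)]
    return solve(xtx, xty)


def hannan_rissanen_arma11(y, m=10):
    """Two-step Hannan-Rissanen estimates (phi, theta) for a zero-mean series
    following (1 - phi*B) y_t = (1 + theta*B) e_t."""
    n = len(y)
    # Step 1: long AR(m) by least squares; its residuals estimate the innovations.
    rows = [[y[t - j] for j in range(1, m + 1)] for t in range(m, n)]
    ar = ols(rows, y[m:])
    e = [0.0] * n
    for t in range(m, n):
        e[t] = y[t] - sum(c * y[t - j] for j, c in enumerate(ar, start=1))
    # Step 2: regress y_t on y_{t-1} and the estimated e_{t-1}.
    rows2 = [[y[t - 1], e[t - 1]] for t in range(m + 1, n)]
    phi, theta = ols(rows2, y[m + 1:])
    return phi, theta


def simulate_arma11(n, phi, theta, seed=0, burn_in=200):
    """Simulate the ARMA(1,1) above with unit-variance Gaussian innovations."""
    rng = random.Random(seed)
    out, y_prev, e_prev = [], 0.0, 0.0
    for t in range(n + burn_in):
        e = rng.gauss(0.0, 1.0)
        y_t = phi * y_prev + e + theta * e_prev
        y_prev, e_prev = y_t, e
        if t >= burn_in:
            out.append(y_t)
    return out


y = simulate_arma11(8000, phi=0.6, theta=0.3, seed=1)
print(hannan_rissanen_arma11(y))   # estimates of (0.6, 0.3)
```

Because both steps are linear regressions, the procedure is cheap, which is why it is attractive for producing starting values; its weaknesses for more complex MA structures are one of the points raised in the road map.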
X12
The estimation of the RegArima model is based on a modified version of the Ljung-Box algorithm. That solution is significantly slower than the Kalman filter. Moreover, it provides residuals that cannot always be easily interpreted[2].
The optimization procedure is a slightly modified version of the Minpack routines, also based on the Levenberg-Marquardt algorithm (but a different implementation); it uses pre-defined initial values of the parameters.
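For comparison, the basic Levenberg-Marquardt iteration can be sketched on a one-parameter problem: the conditional sum of squares of an MA(1) model, started from a fixed, pre-defined value. This is neither Minpack nor the X12 code; the conditional residuals (pre-sample innovation set to 0), the Marquardt scaling of the step, the starting value 0.1 and the damping schedule are assumptions chosen for illustration.

```python
import random


def ma1_css(y, theta):
    """Conditional residuals of y_t = e_t + theta*e_{t-1} (pre-sample e = 0)
    and their derivatives with respect to theta."""
    res, jac = [], []
    e_prev, d_prev = 0.0, 0.0
    for obs in y:
        e = obs - theta * e_prev
        d = -e_prev - theta * d_prev   # d e_t / d theta
        res.append(e)
        jac.append(d)
        e_prev, d_prev = e, d
    return res, jac


def levenberg_marquardt_ma1(y, theta0=0.1, lam=1e-2, tol=1e-8, max_iter=100):
    """Minimize the conditional sum of squares with Levenberg-Marquardt steps."""
    theta = theta0
    res, jac = ma1_css(y, theta)
    ssr = sum(r * r for r in res)
    for _ in range(max_iter):
        g = sum(j * r for j, r in zip(jac, res))   # J'r
        h = sum(j * j for j in jac)                # J'J (Gauss-Newton curvature)
        if h == 0.0:
            break
        step = -g / (h * (1.0 + lam))              # damped Gauss-Newton step
        trial = theta + step
        if abs(trial) < 1.0:                       # stay in the invertible region
            res_t, jac_t = ma1_css(y, trial)
            ssr_t = sum(r * r for r in res_t)
            if ssr_t < ssr:                        # accept and reduce damping
                theta, res, jac, ssr = trial, res_t, jac_t, ssr_t
                lam /= 10.0
                if abs(step) < tol:
                    break
                continue
        lam *= 10.0                                # reject and increase damping
        if lam > 1e10:
            break
    return theta, ssr


def simulate_ma1(n, theta, seed=0):
    rng = random.Random(seed)
    e_prev, y = rng.gauss(0.0, 1.0), []
    for _ in range(n):
        e = rng.gauss(0.0, 1.0)
        y.append(e + theta * e_prev)
        e_prev = e
    return y


y = simulate_ma1(5000, theta=0.4, seed=2)
theta_hat, ssr = levenberg_marquardt_ma1(y, theta0=0.1)
```

The damping parameter moves the iteration between a Gauss-Newton step (small lam) and a short gradient-descent step (large lam); with fixed starting values, the number of iterations depends more on the distance to the optimum than with data-driven (e.g. Hannan-Rissanen) starting values.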
JD+
JD+ is very similar to Tramo, although its optimization procedure differs slightly in some details. Note that, for comparability reasons, JD+ uses the same algorithm as X12 in a few cases (computation of the residuals…).
Automatic model identification (AMI)
JD+ offers both implementations.
Main steps / Tramo / X12-Arima
Preliminary seasonality test / X
Log/level / BIC-based / AICc-based
Calendar effects / Automatic choice between WD and TD (F-test); no test for holidays or user-defined calendars / AIC test (pre-specified variables, holidays, user-defined variables)
Easter effect / t-test / AIC test; possible automatic choice between different lengths
Outlier detection / Fast detection based on approximate estimations / Slow detection based on exact estimations
Other regression variables / No test / AIC test
Differencing / = / =
ARMA / Fast detection based on approximate estimations (Hannan-Rissanen) / Slow detection based on exact estimations
Over/under differencing, residual seasonality, other final tests / Rich / Very limited
Comparison with default model / Optional
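The log/level rows can be illustrated with the Jacobian adjustment that makes the two likelihoods comparable: the likelihood of the model for log(y) is expressed on the scale of y by subtracting the sum of log(y_t). The sketch below uses a random walk with drift as a stand-in for the ARIMA model actually identified; the model, the function names and the tie-breaking rule are assumptions, and the real programs may apply additional thresholds before choosing logs.

```python
import math


def bic(loglik, n, k):
    return -2.0 * loglik + k * math.log(n)


def aicc(loglik, n, k):
    return -2.0 * loglik + 2.0 * k * n / (n - k - 1)


def rw_drift_loglik(z):
    """Log-likelihood of (1 - B) z_t = mu + e_t, conditional on z_0,
    at the ML estimates of mu and sigma^2. Returns (loglik, n_effective)."""
    d = [b - a for a, b in zip(z, z[1:])]
    n = len(d)
    mu = sum(d) / n
    sigma2 = sum((x - mu) ** 2 for x in d) / n
    return -0.5 * n * (math.log(2.0 * math.pi * sigma2) + 1.0), n


def log_level_choice(y, criterion=bic):
    """Return 'log' or 'level' by comparing the criterion for the same model
    fitted to y and to log(y), both expressed on the scale of y."""
    if min(y) <= 0.0:
        return "level"                              # logs are not defined
    ll_level, n = rw_drift_loglik(y)
    ll_log, _ = rw_drift_loglik([math.log(v) for v in y])
    ll_log -= sum(math.log(v) for v in y[1:])       # Jacobian of y -> log(y)
    k = 2                                           # mu and sigma^2 in both models
    return "log" if criterion(ll_log, n, k) < criterion(ll_level, n, k) else "level"


multiplicative = [math.exp(0.05 * t + 0.1 * (-1) ** t) for t in range(100)]
additive = [10.0 + t + (-1) ** t for t in range(100)]
print(log_level_choice(multiplicative), log_level_choice(additive, aicc))  # log level
```

Since both models have the same number of parameters here, BIC and AICc lead to the same decision; the choice of criterion matters when the compared models differ in size, which is one source of discrepancies between the two programs.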
Road map
Below, we list the tasks needed to arrive at a common pre-processing module.
Step 1
Common implementation of the regression variables.
The regression model should encompass all the options of each program. A single definition should be adopted for each variable (Easter, outliers…). For calendar effects, additional definitions could be considered (for instance, weekdays + Saturdays + Sundays). Light development (1 month) and testing.
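Trading-day and working-day regressors are usually built as contrasts between day counts. The sketch below computes them with the Python calendar module, together with the leap-year regressor and one possible version of the weekdays + Saturdays + Sundays grouping mentioned above. The function names and the exact contrast used for the new grouping are hypothetical, and national holidays are ignored.

```python
import calendar


def day_counts(year, month):
    """Numbers of Mondays, Tuesdays, ..., Sundays in the month (index 0 = Monday)."""
    counts = [0] * 7
    for day in range(1, calendar.monthrange(year, month)[1] + 1):
        counts[calendar.weekday(year, month, day)] += 1
    return counts


def trading_days(year, month):
    """Six contrasts: number of Mondays, ..., Saturdays minus number of Sundays."""
    c = day_counts(year, month)
    return [c[i] - c[6] for i in range(6)]


def working_days(year, month):
    """One contrast: weekdays minus 5/2 times weekend days."""
    c = day_counts(year, month)
    return sum(c[:5]) - 2.5 * (c[5] + c[6])


def weekdays_saturdays_sundays(year, month):
    """Hypothetical three-group version: weekdays and Saturdays, each
    contrasted with Sundays (weighted by group size)."""
    c = day_counts(year, month)
    return [sum(c[:5]) - 5 * c[6], c[5] - c[6]]


def leap_year(year, month):
    """+0.75 in February of leap years, -0.25 in other Februaries, 0 otherwise."""
    if month != 2:
        return 0.0
    return 0.75 if calendar.isleap(year) else -0.25


print(trading_days(2015, 1), working_days(2015, 1))   # [0, 0, 0, 1, 1, 1] -0.5
```

Each contrast is zero in a month where every day type occurs equally often (for instance February 2015), so the regressors only capture the composition of the month, not its length; the length effect is handled separately by the leap-year regressor.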
Step 2
Common estimation procedure.
For the estimation of RegArima models, the current choice of JD+ (Kalman filter + optimization procedure) must be checked. The comparison must be based on several criteria: precision, robustness and speed. Few new developments, more testing (1 month).
Step 3
Extension of Tramo with features of X12
Modification of the current implementation of Tramo to take into account the additional features of X12 (preliminary leap year correction, automatic detection of the length of the Easter effect, tests on any regression variable). Light developments (2 months).
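The automatic choice of the length of the Easter effect needs an Easter regressor parameterized by its length w. The sketch below computes Gregorian Easter with the anonymous (Meeus/Jones/Butcher) algorithm and, following the X12-style definition assumed here, spreads the effect uniformly over the w days before Easter Sunday. The mean correction mentioned in the regression table is omitted, and the candidate lengths 1, 8 and 15 are only an example.

```python
import datetime


def easter_date(year):
    """Gregorian Easter Sunday (anonymous Gregorian / Meeus-Jones-Butcher algorithm)."""
    a = year % 19
    b, c = divmod(year, 100)
    d, e = divmod(b, 4)
    f = (b + 8) // 25
    g = (b - f + 1) // 3
    h = (19 * a + b - d - g + 15) % 30
    i, k = divmod(c, 4)
    el = (32 + 2 * e + 2 * i - h - k) % 7
    m = (a + 11 * h + 22 * el) // 451
    month, day = divmod(h + el - 7 * m + 114, 31)
    return datetime.date(year, month, day + 1)


def easter_regressor(year, month, w):
    """Share of the w days before Easter Sunday (Easter-w .. Easter-1)
    falling in the given month; no long-run mean correction."""
    easter = easter_date(year)
    days = [easter - datetime.timedelta(days=j) for j in range(1, w + 1)]
    return sum(1 for d in days if d.month == month) / w


def easter_candidates(year, month, lengths=(1, 8, 15)):
    """Regressor values for several candidate lengths of the Easter effect."""
    return {w: easter_regressor(year, month, w) for w in lengths}


print(easter_date(2015))             # 2015-04-05
print(easter_candidates(2015, 3))
```

In an automatic procedure, each candidate regressor would be added to the model in turn and compared (for instance with the AICc) against the others and against a model without Easter effect.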
Step 4
Possible improvements of some sub-modules of Tramo
Possible improvements of Tramo. Any automatic routine can be improved: even though Tramo has been fine-tuned by A. Maravall, some improvements remain possible (comparison with the current X12 solutions…). Such research requires:
- The definition of criteria to compare models (see for instance what is currently used in Tramo: BIC, Ljung-Box of the residuals, number of outliers, stability tests…)
- The comparison of the current implementation against new modules (with simulated series and with real series); the impact should be measured for the sub-module and for the global algorithm (using the criteria mentioned above).
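One of the criteria listed, the Ljung-Box statistic of the residuals, is simple to compute. A minimal sketch, assuming the residuals are given as a plain list; under the null hypothesis of no autocorrelation, Q is approximately chi-square with h degrees of freedom minus the number of estimated ARMA parameters.

```python
def ljung_box(residuals, h):
    """Ljung-Box statistic Q = n(n+2) * sum_{k=1..h} r_k^2 / (n - k),
    where r_k is the lag-k sample autocorrelation of the residuals."""
    n = len(residuals)
    mean = sum(residuals) / n
    dev = [x - mean for x in residuals]
    denom = sum(d * d for d in dev)
    q = 0.0
    for k in range(1, h + 1):
        r_k = sum(dev[t] * dev[t - k] for t in range(k, n)) / denom
        q += r_k * r_k / (n - k)
    return n * (n + 2) * q


# A perfectly alternating series has r_1 = -0.99 here and gives Q close to 101,
# far above 3.84 (the 5% critical value of a chi-square with 1 degree of freedom).
print(ljung_box([1.0, -1.0] * 50, h=1))
```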
Some examples are given below:
- Some current seasonality tests seem too strict (QS significance level…) or not robust enough (spectral diagnostics); they could perhaps be improved.
- The current log/level algorithms are not robust against additive outliers: they systematically lead to log transformations.
- The choice of the ARMA model in Tramo is based on Hannan-Rissanen; the robustness of that solution is not clear for complex models (especially with MA polynomials); moreover, the current algorithm sometimes seems to lead to unnecessarily complex models.
- The current implementation of the outlier detection in Tramo is extremely efficient because it is based on simple approximations; however, the robustness of the method should be checked, especially at the beginning of the period.
- The calendar effects are sometimes removed too early in the processing; they could be re-introduced after the outlier detection (as in X12).
- More generally, the coherence of some tests (Easter…) should be improved.
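Several of these points (robustness of the log/level test and of the outlier detection) revolve around robust scale estimates and outlier t-values. The toy sketch below shows the simplest case: in a white-noise model, the t-value of an additive outlier at t is e_t / sigma, with sigma estimated robustly from the median absolute deviation. The critical value 3.5 and the function names are illustrative assumptions; the actual routines filter the outlier regressors through the ARIMA model and re-estimate iteratively.

```python
import statistics


def robust_sigma(residuals):
    """Scale estimate 1.4826 * MAD, consistent for Gaussian noise."""
    med = statistics.median(residuals)
    return 1.4826 * statistics.median([abs(e - med) for e in residuals])


def ao_scan(residuals, critical_value=3.5):
    """Toy AO scan for a white-noise model: regressing e on a pulse at t
    gives the coefficient e_t with standard error sigma, so t = e_t / sigma."""
    sigma = robust_sigma(residuals)
    tvals = [e / sigma for e in residuals]
    flagged = [t for t, tv in enumerate(tvals) if abs(tv) > critical_value]
    return flagged, tvals


e = [0.1, -0.1] * 20
e[25] = 5.0
print(ao_scan(e)[0])   # [25]
```

Using the MAD rather than the usual standard deviation keeps the outlier itself from inflating the scale; with the ordinary standard deviation, large outliers partly mask themselves.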
Other remarks:
- Some steps may be processed in parallel (1 and 2 for instance)
- The training for hobby developers (September) could take some of the points discussed above as examples
- The proposed investigations could greatly improve the understanding of the routines and the sharing of the knowledge amongst the community
- The tests/improvements may be a long process, which can be spread over several years.
Conclusions and final remarks
Main questions:
Developing a single RegArima module automatically implies changes in the current core engines and larger discrepancies with respect to them. Do we accept such implications?
The cost of such a development is not negligible (but manageable with the current resources). Are the benefits sufficient to undertake it? What is the priority of the project?
Remarks
In any case, the development of a new pre-processing module will constitute a major release of the tool. It should be combined with other major modifications, like the move to Java 8. It could not be planned before the end of 2016.
The current versions of Tramo and X12 should not disappear from the software; however, they should no longer evolve.
Bibliography
Gomez V., Maravall A. (1994), "Estimation, Prediction, and Interpolation for Nonstationary Series with the Kalman Filter", Journal of the American Statistical Association, 89, 426, 611-624.
Ljung G.M., Box G.E.P. (1979), "The Likelihood Function of Stationary Autoregressive-Moving Average Models", Biometrika, 66, 2, 265-270.
Otto M.C., Bell W.R., Burman J.P. (1987), "An Iterative GLS Approach to Maximum Likelihood Estimation of Regression Models with Arima Errors", Bureau of the Census, SRD Research Report CENSUS/SRD/RR_87/34.
[1] Graphical User Interface
[2] T. McElroy, from the US Census Bureau, also thinks that the current residuals of X12 may sometimes be strange and that they should not be used for testing (they are not NIID).