JBench

JBench is a Java console application for multivariate benchmarking. It can benchmark small to medium-sized sets of data subject to temporal and/or contemporaneous constraints. The constraints may be binding (fixed) or not. They may represent the margins of multi-way tables, hierarchical structures or any other kind of relationship.

The underlying method is an extension of Cholette's method, which generalizes, amongst others, the additive and the multiplicative Denton procedures as well as simple proportional benchmarking.

The primary aim of the software is to provide a reconciliation method when direct seasonal adjustment is applied to a set of related time series or when annual benchmarking is needed. That is why it works on outputs generated by Demetra+. However, the software can be used for a wider range of benchmarking problems.

Finally, it should be mentioned that the actual algorithms used by JBench are included in the package jtstoolkit.jar, which is also the basic library of JDemetra+. The method can be executed by direct function calls instead of by means of the command line described below.

Brief description of the algorithm

Contrary to usual implementations, which are based on expensive matrix computations, JBench uses an approach based on state space forms (ssf) and on the related Kalman smoother. That solution dramatically improves performance and allows the exact handling of complex relationships in medium-sized data sets (up to several hundred monthly series). The key points of the implementation are briefly listed below, without technical details (see the technical document of jtstoolkit for further information).

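  • Cholette's method is put in state space form by considering that the weighted discrepancy between the benchmarked series x and the original series s, (x_t - s_t) / |s_t|^lambda, follows an auto-regressive model of order 1 with parameter rho. With rho = 1, lambda = 0 corresponds to an additive Denton and lambda = 1 to a multiplicative Denton; lambda = 0.5 corresponds to a proportional method.
  • The ssf-form of the multivariate problem is achieved by "stacking" the individual ssf-forms.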
  • The constraints are handled as "pseudo-observations"; so, the multivariate ssf-form has one measurement equation for each constraint.
  • The multivariate ssf problem is handled in its univariate form (see Durbin and Koopman (DK) for further details).
  • When the problem is diffuse (rho = 1), the solution proposed by DK for an exact initialization is followed. However, the usual implementation is slightly modified to avoid numerical instabilities that are often encountered: the diffuse part is handled by means of the array filter approach (see Kailath, Sayed...), which corresponds to a square root filter.
  • The disturbance smoother of DK is used instead of the usual smoother; such a smoother needs much less memory, at the price of a (usually) small loss of stability.
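
To make the role of the two parameters concrete, the following minimal Python sketch evaluates the least-squares criterion implied by an AR(1) model on weighted discrepancies. It assumes the textbook formulation of Cholette's method (the discrepancy between a candidate benchmarked series x and the original series s is divided by |s|^lambda); the function name and the data are illustrative and are not part of jtstoolkit, which relies on the state space approach described above rather than on this direct computation.

```python
def cholette_criterion(x, s, rho, lam):
    """Sum of squared AR(1) innovations of the weighted discrepancies.

    x: candidate benchmarked series, s: original series (no zero values),
    rho: auto-regressive parameter, lam: power applied to |s| in the weights.
    """
    # Weighted discrepancies e_t = (x_t - s_t) / |s_t|**lam
    e = [(xt - st) / abs(st) ** lam for xt, st in zip(x, s)]
    # The term for the first period vanishes when rho = 1 (diffuse start)
    total = (1.0 - rho * rho) * e[0] ** 2
    # Innovations e_t - rho * e_(t-1) for the following periods
    total += sum((e[t] - rho * e[t - 1]) ** 2 for t in range(1, len(e)))
    return total


s = [10.0, 20.0, 30.0, 40.0]
shifted = [v + 2.0 for v in s]   # constant additive correction
scaled = [v * 1.1 for v in s]    # constant proportional correction

additive_denton = cholette_criterion(shifted, s, rho=1.0, lam=0.0)
multiplicative_denton = cholette_criterion(scaled, s, rho=1.0, lam=1.0)
```

Both values are numerically zero: with rho = 1, a constant additive correction costs nothing when lambda = 0, and a constant proportional correction costs nothing when lambda = 1. This is the movement-preservation idea behind the two Denton variants.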

Description of the parameters

The program is launched by means of the following command line:

java [-XmxZZZZm] -jar [xxx/]jbench.jar -i inputFile [-r rho] [-l lambda] [-d dconstraintsFile] [-t tconstraintsFile] [-c cconstraintsFile] [-o outputFile]

The different parameters are described below.

Parameters

-XmxZZZZm

The optional -XmxZZZZm parameter (where ZZZZ stands for the actual size, in megabytes) defines the maximum memory allocated to the Java runtime. For large sets of data, -Xmx1024m is usually a good option. For small sets of data (or if the global parameters of Java are already set in that way), the parameter can be omitted.

-i inputFile

The -i inputFile parameter is the only mandatory one (apart from the -jar option, of course). It identifies the file that provides all the input time series for the processing (except the series included in the -d option; see below).

The format of the input file corresponds to the default csv output (list presentation) produced by Demetra+. It is recalled in the annex.

-r rho

Auto-regressive parameter of the model (1 for "Denton"). The default value is 1.

-l lambda

Power of the weighted observations (0 for additive, .5 for proportional, 1 for multiplicative). The default value is 1.
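
As an illustration of the proportional case, the sketch below scales each complete year of a series so that it matches its annual total (pro-rata). For positive series and an annual sum constraint, this is what the usual formulation of Cholette's criterion reduces to when lambda = 0.5 and the discrepancies are uncorrelated (rho = 0, which is not the default). The function name and the data are hypothetical, and the code is not taken from JBench.

```python
def prorate(series, totals, periods_per_year):
    """Scale each complete year of `series` so that it sums to its total."""
    result = []
    for year, total in enumerate(totals):
        start = year * periods_per_year
        block = series[start:start + periods_per_year]
        factor = total / sum(block)  # same ratio for every period of the year
        result.extend(value * factor for value in block)
    return result


quarterly = [10.0, 20.0, 30.0, 40.0, 20.0, 20.0, 30.0, 30.0]
annual = [110.0, 90.0]
benchmarked = prorate(quarterly, annual, periods_per_year=4)
```

Pro-rata scaling creates a step between the last period of one year and the first period of the next (the factor jumps from 1.1 to 0.9 here), which the Denton variants (rho = 1) are designed to avoid.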

-d dconstraintsFile

The -d option identifies temporal constraints corresponding to (some) input series. The temporal constraints are provided in the same format as the input file. Input series and temporal constraints are associated using their identifiers, which must be exactly the same. The temporal constraints may be expressed in the aggregation frequency or in the same frequency as the original series. In the latter case, annual constraints are applied by default. A typical use of that option corresponds to the following scenario:

  • Use Demetra+ to seasonally adjust a set of series.
  • Generate csv files (you get, for instance, demetra_y.csv, demetra_ycal.csv, demetra_sa.csv...).
  • Use JBench with the command ".... -i demetra_sa.csv -d demetra_ycal.csv" (or demetra_y.csv).
  • The results contain the usual univariate benchmarked series.

-t tconstraintsFile

The optional -t file expresses the temporal constraints in another (more flexible) way: it defines the mapping between an aggregated series and its disaggregated counterpart. Both series must belong to the input file. The mapping is defined in a csv file as follows (the identifiers are those of the input file):

aggregate1,details1

aggregate2,details2

...
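
The mapping file can be read with a few lines of Python, as in the rough sketch below. It also shows how a detailed series could be summed by year to compare it with its aggregate. Two assumptions are made here and are not confirmed by this document: the aggregation is a plain sum, and the detailed series starts in the first period of a year. The function names are hypothetical.

```python
import csv
import io


def read_mapping(text):
    """Return (aggregate id, detail id) pairs from the mapping file content."""
    rows = csv.reader(io.StringIO(text))
    return [(row[0].strip(), row[1].strip()) for row in rows if len(row) >= 2]


def annual_sums(values, periods_per_year):
    """Sum the complete years of a series that starts in the first period."""
    n_years = len(values) // periods_per_year
    return [sum(values[y * periods_per_year:(y + 1) * periods_per_year])
            for y in range(n_years)]


mapping = read_mapping("aggregate1,details1\naggregate2,details2\n")
totals = annual_sums([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0], 4)
```

After benchmarking, the annual sums of each detailed series are expected to match the values of the aggregate it is mapped to.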

-c cconstraintsFile

The optional -c file (csv format) defines the contemporaneous constraints. Contemporaneous constraints may be binding (the constraint is fixed) or not. Each constraint corresponds to a line in the csv file, using the following conventions:

Binding case

Equation:

csv format:

Unbinding case

Equation:

csv format:

The identifiers of the variables may contain the usual wildcards (? or *). In such a case, the same coefficient is applied to each series of the input file that matches the pattern (except the series of the binding constraint itself).

Example:

Equation:

csv format: (binding) or "(unbinding)
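
The wildcard expansion can be sketched in Python with the standard fnmatch module. The line layout used here (constraint identifier, an empty second field, then pattern/coefficient pairs) is inferred from the binding examples given at the end of this document; how JBench itself parses the file is not shown, the matching is assumed to be case-sensitive, and the function name is hypothetical.

```python
from fnmatch import fnmatchcase


def expand_constraint(line, series_ids):
    """Expand the wildcards of one constraint line.

    Returns (constraint id, {series id: coefficient}).
    """
    fields = [f.strip() for f in line.split(",")]
    target = fields[0]
    coefficients = {}
    # Fields from index 2 onwards are read as (pattern, coefficient) pairs
    for pattern, coefficient in zip(fields[2::2], fields[3::2]):
        for sid in series_ids:
            # The constrained series itself is never part of the sum
            if sid != target and fnmatchcase(sid, pattern):
                coefficients[sid] = float(coefficient)
    return target, coefficients


ids = ["s11", "s12", "s21", "s22", "r1", "r2", "c1", "c2"]
row = expand_constraint("r1,,s1?,1", ids)
col = expand_constraint("c1,,s?1,1", ids)
```

Here "s1?" selects s11 and s12, while "s?1" selects s11 and s21; a pattern such as "*" would select every series except the constrained one.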

-o outputFile

By default, the results are stored in the bench.csv file. However, the user can specify another file by means of this option. The output file only contains the endogenous series (binding constraints are not included in the output).
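
When JBench is driven from a script, the command line can be assembled programmatically. The helper below is a hypothetical convenience wrapper, not part of JBench: it only uses the options documented above and assumes that each option and its value are passed as separate arguments.

```python
def jbench_command(input_file, rho=None, lam=None, dconstraints=None,
                   tconstraints=None, cconstraints=None, output=None,
                   jar="jbench.jar", max_heap_mb=None):
    """Build the argument list for the JBench command line."""
    cmd = ["java"]
    if max_heap_mb is not None:
        cmd.append(f"-Xmx{max_heap_mb}m")
    cmd += ["-jar", jar, "-i", input_file]
    optional = [("-r", rho), ("-l", lam), ("-d", dconstraints),
                ("-t", tconstraints), ("-c", cconstraints), ("-o", output)]
    # Only the options that were actually set are added
    for flag, value in optional:
        if value is not None:
            cmd += [flag, str(value)]
    return cmd


cmd = jbench_command("demetra_sa.csv", dconstraints="demetra_ycal.csv",
                     max_heap_mb=1024)
```

The resulting list can be passed to subprocess.run(cmd, check=True) on a machine where Java and jbench.jar are available.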

Remarks:

When specifying constraints, the user has to verify that:

  • The constraints are coherent.
  • The constraints are not redundant.

Those points are essential for the success of the processing.

The current version of the software doesn't check the compatibility of the constraints. Moreover, it doesn't remove unnecessary contemporaneous constraints (redundant temporal constraints are automatically removed). Future releases will improve that point.

For a good understanding of some results, reading the reference book of Dagum and Cholette ("Benchmarking, Temporal Distribution, and Reconciliation Methods for Time Series") is strongly recommended (see especially the chapters on the reconciliation of one-way and two-way tables without temporal aggregation constraints).

Examples

1. Univariate multiplicative Denton benchmarking, using outputs of Demetra+

java -jar jbench.jar -i demetra_sa.csv -d demetra_ycal.csv

2. Additive two-way Denton-like benchmarking

java -jar jbench.jar -i test.csv -c ctest.csv -l 0

with the following files:

test.csv

s11 ...

s12 ...

s21 ...

s22 ...

r1 ...

r2 ...

c1 ...

c2 ... [c2=r1+r2-c1]

ctest.csv

r1,,s1?,1 [r1=s11+s12]

r2,,s2?,1 [r2=s21+s22]

c1,,s?1,1 [c1=s11+s21]

[c2,,s?2,1 c2=s12+s22 is omitted]
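
The reason why the constraint on c2 is omitted can be checked numerically: the four constraints of the two-way table are linearly dependent. The sketch below computes the rank of the coefficient matrix by exact Gaussian elimination in plain Python. It is an external check written for this example, not a feature of JBench.

```python
from fractions import Fraction


def rank(matrix):
    """Rank of a small matrix by exact Gaussian elimination."""
    rows = [[Fraction(v) for v in row] for row in matrix]
    rank_count = 0
    for col in range(len(rows[0])):
        # Look for a non-zero pivot among the rows not yet used
        pivot = next((r for r in range(rank_count, len(rows))
                      if rows[r][col] != 0), None)
        if pivot is None:
            continue
        rows[rank_count], rows[pivot] = rows[pivot], rows[rank_count]
        for r in range(rank_count + 1, len(rows)):
            factor = rows[r][col] / rows[rank_count][col]
            rows[r] = [a - factor * b for a, b in zip(rows[r], rows[rank_count])]
        rank_count += 1
    return rank_count


# Columns: s11, s12, s21, s22
constraints = [
    [1, 1, 0, 0],  # r1 = s11 + s12
    [0, 0, 1, 1],  # r2 = s21 + s22
    [1, 0, 1, 0],  # c1 = s11 + s21
    [0, 1, 0, 1],  # c2 = s12 + s22
]
full_rank = rank(constraints)         # all four constraints
reduced_rank = rank(constraints[:3])  # c2 omitted
```

The rank is 3 in both cases, so the fourth constraint adds no information and must be left out of ctest.csv.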

3. Denton benchmarking, using outputs of Demetra+, with an additional constraint on the totals (identified by the series all, which must be defined in the input file)

java -jar jbench.jar -i demetra_sa.csv -d demetra_ycal.csv -c all.csv

all.csv

all,,*,1

Bibliography

DAGUM, E.B. and CHOLETTE, P.A. (2006). Benchmarking, Temporal Distribution, and Reconciliation Methods for Time Series. Springer.

DURBIN, J. and KOOPMAN, S.J. (2001). Time Series Analysis by State Space Methods. Oxford Statistical Science Series.

HARVEY, A.C. (1989). Forecasting, Structural Time Series Models and the Kalman Filter. Cambridge University Press.

PIZZINGA, A. (2009). Diffuse Restricted Kalman Filtering. 31st Meeting of the Brazilian Econometric Society.

Annex

csv data format (options -i, -d and output)

Each series is described in a single row, composed of:

Identifier, frequency, first year, first period, number of observations, observations.

Example for the monthly series "Test", starting in January 2012 and containing 5 observations:

Id / Freq / Year0 / Period0 / N. data / d1 / d2 / d3 / d4 / d5
Test / 12 / 2012 / 1 / 5 / 1.0 / 2.0 / 3.0 / 4.0 / 5.0

Content of the csv file:

"Test, 12, 2012, 1, 5, 1.0, 2.0, 3.0, 4.0, 5.0"

Be aware that the csv format depends on the regional settings.
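
A row in this format can be read back with a short Python function, sketched below. It assumes a comma as field separator and a dot as decimal separator (other regional settings would need a different separator and number parsing), and it does not handle missing observations. The class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Series:
    identifier: str
    frequency: int
    first_year: int
    first_period: int
    values: list


def parse_series(line, separator=","):
    """Parse one row: id, freq, first year, first period, n, observations."""
    fields = [f.strip() for f in line.strip().strip('"').split(separator)]
    count = int(fields[4])
    values = [float(v) for v in fields[5:5 + count]]
    # The announced number of observations must match the data found
    if len(values) != count:
        raise ValueError(f"expected {count} observations, got {len(values)}")
    return Series(fields[0], int(fields[1]), int(fields[2]), int(fields[3]), values)


test = parse_series("Test, 12, 2012, 1, 5, 1.0, 2.0, 3.0, 4.0, 5.0")
```

The same function applies to the files given with -i and -d and to the output file, since they share this layout.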