EUROPEAN COMMISSION
EUROSTAT
Directorate B: Methodology; corporate statistical and IT services
Unit B-3: IT for statistical production

Quick guide to validation procedure

using

Data validation tool EDIT (EBB) for

SBS and IFATS

Commission européenne, 2920 Luxembourg, LUXEMBOURG – Tel. +352 43011


Table of Contents

Introduction

Purpose of the Document

Scope of the Document

1. Starting EBB2012

1.1 Starting the EBB 2012 SBS Server

1.2. Logging into EBB2012

2. Data loading

2.1 Importing a dataset main page

2.2. Importing a dataset – Selecting the data source

2.3. Importing a dataset – Selecting the data format

2.4 Default template selected (example)

2.5. Naming the dataset

2.6 Loading new Lists of Codes (Refresh CFLAG lookup)

3. Validating data

3.1 Single Series Validation

3.2. Inter Series Validation

3.3. Year to Year Validation

4. Reporting

4.1. Automatic Generation of Error Reports

4.2. Data Overview

4.3. Confidentiality Audit Report

5. EBB for IFATS

5.1. Data loading

5.2. Importing a dataset - main page

5.3. Importing a dataset – Selecting the data source

5.4. Importing a dataset – Selecting the data format

5.5. Naming the datasets

5.6. Validation procedure

5.7. Single Series Validation

5.8. Inter Series validation 1G-1G2

5.9. Inter Series validation 1G and Series 1A, 2A, 3A, 4A

5.10 Reporting

5.11 Confidentiality Audit Report

6. EBB for BD

Introduction

Purpose of the Document

The purpose of this document is to provide quick guidelines for using the 2012 version of the so-called “Editing Building Block” (EBB) or “Data validation tool” (EDIT) application for the SBS (Structural Business Statistics) domain and for Inward FATS (Foreign Affiliates Statistics). The complete description of the EBB_SBS application functions is available in the User manual.

Scope of the Document

This document provides the users with a short description of the 4 steps towards the SBS Data validation:

  1. Starting EBB2012
  2. Data loading
  3. Processing validation jobs (single series, inter series and year to year validation)
  4. Reporting

1. Starting EBB2012

1.1 Starting the EBB 2012 SBS Server

By default, the application is installed under Start > All Programs > EBB2012 version XX.X.X.XX.X

(Example)

To use the application, you first have to launch the EBB2012 server from the task bar: Start > All Programs > EBB2012 version XX.X.X.XX.X > Start Server. This starts a Tomcat console window in which the system logs all the actions made in the application. Alternatively, you can run the app_start.vbs script located in the installation directory. Once the server has started, the application client can be opened.

You should not close this window, as this would mean stopping the Server and the application.

1.2. Logging into EBB2012

To access the web interface, launch the client from the task bar: Start > All Programs > EBB2012 version XX.X.X.XX.X > Login

After you click “Login”, the default browser opens on a page like the one in the example below:

To log in, enter your “User name” and “Password” and click the “Log In” button.

Specific user types are available within the EBB SBS domain, depending on the annexes you would like to work with. Please select the one that fits your needs, as shown below.

a. For Annexes 1,2,3,4,8 please login as:

User name = SBSUser

Password = SBSUser1234

b. For IFATS please login as:

User name = FATSUser

Password = FATSUser1234

c. For Annexes 5,6,7 please login as:

User name = FINUser

Password = FINUser1234

d. For Annex 9 please login as:

User name = BDUser

Password = BDUser1234

e. For all annexes please log in as (a new feature):

User name = SBSUser_ALL

Password = SBSUser_ALL1234

Note: The username and password are case sensitive.

Once you are connected, the following page is displayed, from which the validation process can be started.

2. Data loading

This is the first action to be done when starting a validation procedure. The data to be validated must be loaded into EBB_SBS to become a dataset.

A dataset is based on a format. For example, the data to be validated will have the SBS_DATA format, whereas the growth and inflation rate dataset will use the SBSRATES format.

2.1 Importing a dataset main page

From the main menu, select “Dataset / Import dataset”

The following dialog will be displayed:

The Browse button enables the file selection. The other criteria in the CSV field part (header, delimiters, etc.) have been set to the default format. These parameters can be changed if needed. For example, when the delimiter in the CSV file to be imported is a comma (,) instead of a semicolon (;), enter the comma in the Field Delimiter box.

2.2. Importing a dataset – Selecting the data source

To select the data file to be validated, use the following part of the dialog:

Note1: To simplify the file properties selection, when importing the data file, a default template (inputData) has been created, and will select the expected file properties. This template is also selected by default.

Note2: If the file to be imported has other properties than the ones selected by default, the user must change them manually.

Note3: If a file with other properties is imported, the application will generate an error log:

“Error on line x; Details: expected to read from the current line y values but read only 1”.

In this case, open the CSV file in Notepad and check which separator it uses; then select that same separator when importing the file. There must be no separator (e.g. ; or ,) after the last column.
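The Notepad check described above can also be scripted. The following is a minimal sketch and not part of EBB: the function name check_csv_structure is hypothetical, the default of 24 values per line is taken from the number of data file fields in the SBS_DATA format, and the file is assumed to contain no header lines.

```python
import csv
import io


def check_csv_structure(text, delimiter=";", expected_fields=24):
    """Return (line_number, message) pairs for lines that look wrongly structured."""
    problems = []
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    for line_number, row in enumerate(reader, start=1):
        if not row:
            continue  # ignore blank lines
        if len(row) == expected_fields + 1 and row[-1] == "":
            # One extra empty value means the line ends with a separator.
            problems.append((line_number, "separator after the last column"))
        elif len(row) != expected_fields:
            problems.append(
                (line_number, f"expected {expected_fields} values but read {len(row)}")
            )
    return problems


if __name__ == "__main__":
    # Second line uses commas although semicolons are expected.
    sample = ";".join(["x"] * 24) + "\n" + ",".join(["x"] * 24) + "\n"
    for line_number, message in check_csv_structure(sample):
        print(f"line {line_number}: {message}")
```

A line reported with only 1 value read usually means that the delimiter passed to the function (or selected in the import dialog) is not the one used in the file.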

2.3. Importing a dataset – Selecting the data format

When importing a dataset, the user needs to select the correct Dataset format (SBS_DATA or SBSRATES).

Note: By default, the selected format is “SBS_DATA” which corresponds to the official SBS format.

The SBS_DATA format contains 24 fields (from the data file) plus 6 fields (GROWTH, INFLATION, WEIGHT, FLAG, AUX_VAL and AUX_PART) which have been added for internal processes.

To simplify the field selection when importing the dataset file, a default template (INPUT_DATA) has been created; it selects the expected fields. This template is also selected by default.

Important notice for SBS domain for Year to Year validation:

The dataset containing the Growth and Inflation rates MUST be imported into EBB before the execution of the validation jobs.

When importing a data file containing the Inflation and Growth rates, the format to be selected is the SBSRATES format. All the fields need to be transferred to the Selected Fields section:

2.4 Default template selected (example)

2.5. Naming the dataset

The last action before launching the import is to name the dataset which will be created.

It is possible to create a new dataset or to add the data to an existing dataset.

Then click on the icon to launch the import procedure. You will be redirected to the IMPORT / EXPORT page, which displays all the import and export tasks done under SBS.

The import task can fail because of technical problems, wrong codes or wrong separators in the dataset. A failure is signalled under Status, and clicking on the icon lists the problems. The initial file (CSV file) has to be corrected and the loading process restarted.

Once the importation is done, the status COMPLETED will be displayed as follows:

You are now ready to proceed to the validation procedure.

2.6 Loading new Lists of Codes (Refresh CFLAG lookup)

If you want to upload a new list of codes, you need to follow the steps below.

This process is similar to the import of a dataset, with only a few differences.

To explain the process we’re going to re-populate the “Confidentiality Flag (CFLAG)” lookup.

Step 1: Access the import dataset page.

From the main menu, select the “Datasets / Import dataset” menu option; the following page will be shown.

Step 2: Select the new lookup file and import properties.

From “File Import” click on the “Browse” button.

The file upload window will be opened.

Choose the file where the new lists of codes are and press the “Open” button.

The file name field will show the name of the file selected by the user.

In this example the File Import section looks like:

In the “File Properties” block, the “File Properties” template must be set to inputData.

The “CSV” block doesn’t need to be modified.

Step 3: Set up File Fields import properties.

The picture above shows how the File Fields section should look.

For this example, the Dataset Format value is “CFlagCodes”.

All available fields (“CODE”, “CODE_DETAILS”) need to be in Selected Fields.

Step 4: Dataset Section.

In this section, select the “Repopulate Existing Dataset” option and, in the combo box, select the lookup into which you want to load the new lists of codes. In this example, the CFLAG code list has been selected. When all the steps above have been completed, click the Import link.

Once all the steps above have been followed, an import task is created and, if everything is correct, the new list of codes is loaded.

Note: Be careful which lists of codes you refresh; for some of them, the program implementation also needs to be adapted.

3. Validating data

The validation procedure is launched through the Jobs menu.

Selecting the corresponding entry opens the job selection. The list of jobs displayed lets you select a dedicated validation procedure.

Three kinds of jobs can be launched (Single Series, Inter Series and Year to Year). Preliminary datasets can also be checked with EDIT. The datasets 1E and 1F are not included in the single series or inter series validation; they are checked separately (see above).

The selection of the appropriate job will be done via the button.

The two other icons are used respectively to:

- Visualise the program details

- Export the program

3.1 Single Series Validation

When selecting a Single Series validation, the following dialog will be displayed:

This dialog enables you to change the job information (in the General Information panel) by entering a job name (by default the system provides one) and to select the dataset which will be validated:

It is also possible to change the default names (given by the system) of the datasets which will contain the errors detected (e.g. 2A_2011_data_err_stat_20140523_1152.csv), the confidentiality audit errors (2A_2011_conf_err_stat_20140523_1152.csv) and the confidentiality audit report (2A_2011_conf_output_stat_20140523_1152.csv).

Example:

Then click on the button to launch the job.

3.2. Inter Series Validation

When selecting an Inter Series validation, the following dialog will be displayed:

For the Inter Series validation Job, you will have to select, from the Parameters Set panel, the series to be compared.

A drop down list will display all the possible options:

Selecting one of the options will prefill the parameters selection.

You will then have to select the datasets to be checked.

Note: The selection can be changed by the user.

The renaming facilities for the job, error dataset and output datasets are also available.

The output error dataset can also be renamed.

Click on the button to launch the job.

3.3. Year to Year Validation

When selecting a Year to Year validation, the following dialog will be displayed:

For the Year to Year validation Jobs, you will have to select, from the Parameters Set panel, the year concerned by the validation.

Preliminary datasets can also be checked with the EDIT tool, following the same steps as for the year to year checks.

A drop down list will display all the possible options:

Selecting one of the options will prefill the year parameters.

The renaming facilities for the Job, error dataset and output datasets are also available.

You will then have to select the datasets to be checked (one for each year, or the same if the data for the two years are present in the same file) from the Reference dataset panel.

Then click on the button to launch the job.

Note: Once a job has been launched, EBB_SBS displays the Job list:

From the Job list, you can see the list of jobs and their status. When a job is complete, you can check the results of the execution using the icon. A job can be deleted via the icon, or copied for relaunching purposes via the icon.

Note: By default, the User field is prefilled with the current user and only the jobs created by that user are displayed. The user can change the selection and view all jobs or a specific user’s jobs.

4. Reporting

As mentioned above, the results of a validation job will be accessible via the icon. The following dialog will be displayed:

This dialog will provide access to the job reports and will enable the visualisation of the dataset used for validation:

From the Error reports, the following actions are available through the icons located under the Action section:

- View the error dataset within EBB2012

- View error statistics within EBB2012

- View detailed statistics report

- Export the error dataset (CSV or FLR)

The results of the validation checks are displayed under View detailed statistics report of ERRORLOG_2a_2010 from the Error Reports (ex. below).

4.1. Automatic Generation of Error Reports

When a job is finished, error reports are created automatically and published, by default, in the following folder on the local machine:

Machines running under Windows 7

C:\ProgramData\EBB 2012_12.0.5.68.1\edit\REPORTS\

Machines running under Windows XP have two different folder paths depending on how the user installed the application:

- For all Users:

C:\Documents and Settings\All Users\Application Data\EBB 2012 12.0.5.68.1\edit\REPORTS\

- Only for current user:

C:\Documents and Settings\<your account name>\Application Data\EBB 2012 12.0.5.68.1\edit\REPORTS\

The following reports are automatically published by the system:

Dataset name = Error Log

- Statistic Error Report (View detailed statistics report icon)

Pattern: Series_Year_data_err_stat_Timestamp.csv

e.g.: 9A_2011_data_err_stat_20140422_1241.csv

Dataset name = ConfAuditErrorLog

- Statistic Error Report (View detailed statistics report icon)

Pattern: Series_Year_conf_err_stat_Timestamp.csv

e.g.: 9A_2011_conf_err_stat_20140422_1241.csv

Dataset name = ConfAuditOutput

- Statistic Error Report (View detailed statistics report icon)

Pattern: Series_Year_conf_output_stat_Timestamp.csv

e.g.: 9A_2011_conf_output_stat_20140422_1241.csv
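The three file name patterns above can be recognised with a short script, for example to sort the contents of the REPORTS folder. This is only a sketch and not an EBB function: the name parse_report_name is hypothetical, and the timestamp layout YYYYMMDD_HHMM is an assumption inferred from the examples (20140422_1241).

```python
import re
from datetime import datetime

# Pattern: Series_Year_<kind>_stat_Timestamp.csv
REPORT_NAME = re.compile(
    r"^(?P<series>[0-9A-Za-z]+)_(?P<year>\d{4})_"
    r"(?P<kind>data_err|conf_err|conf_output)_stat_"
    r"(?P<timestamp>\d{8}_\d{4})\.csv$"
)

# Dataset name that each kind of report belongs to, as listed above.
DATASET_BY_KIND = {
    "data_err": "Error Log",
    "conf_err": "ConfAuditErrorLog",
    "conf_output": "ConfAuditOutput",
}


def parse_report_name(file_name):
    """Split a report file name into its parts; return None if it does not match."""
    match = REPORT_NAME.match(file_name)
    if match is None:
        return None
    info = match.groupdict()
    info["year"] = int(info["year"])
    # Assumed timestamp layout: YYYYMMDD_HHMM
    info["timestamp"] = datetime.strptime(info["timestamp"], "%Y%m%d_%H%M")
    info["dataset"] = DATASET_BY_KIND[info["kind"]]
    return info


if __name__ == "__main__":
    print(parse_report_name("9A_2011_conf_output_stat_20140422_1241.csv"))
```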

Note: The automatic generation of error reports needs two templates:

- File properties template called CSV_EXP.

- File fields template called REPORTS.

4.2. Data Overview

A new section has been added to the error report; it is located under the A_OVERVIEW label. This section contains the following information:

  • Number of observations in the data
  • Number of zeros in the data
  • Number of N/A values in the data
  • Confidentiality Flags found in the data
  • Quality Flags found in the data
  • Total number of confidentiality flags found in the data
  • Total number of quality flags found in the data
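To make the meaning of these counters concrete, the sketch below computes the same figures for a small in-memory dataset. It is an illustration only and does not reproduce the EBB implementation: the record keys value, cflag and qflag, the "NA" marker for unavailable values and the flag letters in the sample are all hypothetical.

```python
from collections import Counter


def data_overview(records):
    """Compute overview figures for records with 'value', 'cflag' and 'qflag' keys."""
    values = [record["value"] for record in records]
    # Empty strings mean "no flag" in this sketch.
    cflags = Counter(record["cflag"] for record in records if record["cflag"])
    qflags = Counter(record["qflag"] for record in records if record["qflag"])
    return {
        "observations": len(records),
        "zeros": sum(1 for value in values if value == "0"),
        "not_available": sum(1 for value in values if value == "NA"),
        "confidentiality_flags": sorted(cflags),
        "quality_flags": sorted(qflags),
        "total_confidentiality_flags": sum(cflags.values()),
        "total_quality_flags": sum(qflags.values()),
    }


if __name__ == "__main__":
    sample = [
        {"value": "0", "cflag": "A", "qflag": ""},
        {"value": "NA", "cflag": "", "qflag": "P"},
        {"value": "12", "cflag": "A", "qflag": ""},
    ]
    for label, figure in data_overview(sample).items():
        print(label, figure)
```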

The following image shows a data overview example from the error reports:

4.3. Confidentiality Audit Report

In order to obtain the Confidentiality Audit Report, which gives an overview of the flags in the data and of the confidentiality audit errors encountered, you must select the View detailed statistics report option corresponding to the CONFAUDITOUTPUT_ reference.

The report will be downloaded as a CSV file and can be opened in Excel.

Legend:

When a cell is marked ***T1_L2*** ***T2_T30***, a secondary confidentiality flag is missing for the NACE dimension, between NACE 2- and 3-digit level (T1_L2), and for the size class dimension (T2_T30).

There are three types of confidentiality checks:

  • Type 1: only the NACE dimension is checked
      - Type of error:
          error level 1 (between NACE 1- and 2-digit level)
          error level 2 (between NACE 2- and 3-digit level)
          error level 3 (between NACE 3- and 4-digit level)
      - Concerned tables: 1A, 2A, 3A, 4A, 2D, 3H, 4D, 2E, 3I, 4E, 2F, 3J, 4F, 2G, 4G, 4H etc.
  • Type 2: an additional dimension to the NACE is checked (size class breakdowns, NUTS, environmental domains etc.)
      - Concerned tables: 1B, 2C, 3B, 4B, 1C, 2H, 3C, 4C, 2B, 2I, 3D, 2J, 3E, 2K, 3F, 3G, 3K, etc.
  • Type 3: checks on consistency of linked series
      - 1B/1A, 2B/2A, 3B/3A, 4B/4A, 1C/1A, 2C/2A, 3C/3A, 4C/4A etc.
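When the report is processed outside Excel, the markers described in the legend can be extracted from a cell with a small script. The sketch below is not part of EBB. It assumes that markers always have the form ***Tn_Xm*** and that, for type 1, the suffixes L1 to L3 correspond to the three error levels listed above; the function name parse_markers is hypothetical.

```python
import re

# A marker looks like ***T1_L2*** or ***T2_T30*** (assumed general form).
MARKER = re.compile(r"\*\*\*(T\d+)_([A-Z]+\d+)\*\*\*")

# Type 1 error levels, as listed in the legend.
NACE_LEVELS = {
    "L1": "between NACE 1 and 2 digit level",
    "L2": "between NACE 2 and 3 digit level",
    "L3": "between NACE 3 and 4 digit level",
}


def parse_markers(cell):
    """Return (check_type, detail, description) for every marker found in a cell."""
    results = []
    for check_type, detail in MARKER.findall(cell):
        if check_type == "T1":
            description = NACE_LEVELS.get(detail, "unknown NACE level")
        else:
            description = "additional dimension or linked series (see legend)"
        results.append((check_type, detail, description))
    return results


if __name__ == "__main__":
    for marker in parse_markers("***T1_L2*** ***T2_T30***"):
        print(marker)
```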

5. EBB for IFATS

Because of some small differences between the data validation process for the SBS annexes and for IFATS, this separate chapter describes the data validation process for IFATS data.

The information given in the Introduction and Chapter 1 is common to all SBS annexes and IFATS.

5.1. Data loading

This is the first action to be done when starting a validation procedure. The data to be validated must be loaded into EBB_SBS to become a dataset.

A dataset is based on a format. For example, the data to be validated will have the FATS_DATA format.

5.2. Importing a dataset - main page

From the main menu, select “Dataset / Import datasets”

The following dialog will be displayed:

5.3. Importing a dataset – Selecting the data source

To select the data file to be validated, use the following part of the dialog:

For IFATS datasets, please note that it is mandatory to have a comma or a semicolon after the 16th column; otherwise your file will be rejected.

Example:

1G;2008;IT;30;B-N_S95_X_K;30;V1;12110;45189;;;;;;KEUR;;
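Before importing, each line of an IFATS file can be checked for this trailing separator. The following is a minimal sketch and not an EBB function: the name has_trailing_separator is hypothetical, and it assumes that the separator character never occurs inside a field value.

```python
def has_trailing_separator(line, expected_columns=16, separators=(";", ",")):
    """Return True if the line has the expected columns followed by one separator."""
    line = line.rstrip("\r\n")
    for separator in separators:
        parts = line.split(separator)
        # A separator after the last column produces one extra, empty part.
        if len(parts) == expected_columns + 1 and parts[-1] == "":
            return True
    return False


if __name__ == "__main__":
    example = "1G;2008;IT;30;B-N_S95_X_K;30;V1;12110;45189;;;;;;KEUR;;"
    print(has_trailing_separator(example))
```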

The button will enable the file selection.

When selecting the “CSV” part in the dialog window, please check all the other criteria:

- Skip Header Lines

- Field Delimiter

- Decimal Point Character

- Text Qualifier

- Thousand Delimiter

Normally, they have been set to the default format (as described in the user guide). Any of these parameters can be changed if necessary: for example, if the dataset you want to load uses a comma (,) as delimiter, replace the semicolon (;) in the “Field Delimiter” with a comma (,).