NSI / Statistical Office of the Slovak Republic
Name contact person / Róbert Grác
E-mail contact person /
1. What is the current status of your S-DWH?
(planned /
in development /
in implementation or operational) / -Operational
-Some parts of the system are in implementation phase (time series; specific surveys, etc…)
General Description
2.Please give a (short) general description of the nature of your
S-DWH bydescribing topics as:
Which problems do/did you want to solve by implementing a S-DWH?
What is the goal of the
S-DWH?
What are the most important solutions desired by the NSI, to be solved by implementing a S-DWH?
What functionalities are (to be) implemented in the S-DWH? / -The goal is to have one compact data warehouse
-Store data from different surveys in one database
-Eliminate of using several databases
-To have integrated system and harmonized data
-Standard functionalities of BI systems are implemented
3. What is the scope of your S-DWH?
(ETL, Data Warehousing, metadata, integrations between metadata system and ETL) / -All layers are managed by metadata
-Collection of micro data in source layer
-Processed data are transforming into the integration layer
-From processed micro data are created production data (interpretation and data analyst layer)
4. How is the S-DWH organised?
(activities, responsibilities, roles etc)
If you have an architectural diagram, are you willing to share it with us (yes/no)? / -Identity and Access Management
-Authentication and privileges within system
5. Which problems did you encounter so far? / -Frequent changes of statistical statements resulting from legislative changes
-Changes of data structure
-Complexity of data processing from different surveys has impact on performance
Characteristic
6. Does your S-DWH contains:
- micro data,
- macro data or
- both? / -Micro data
-Macro data
-Virtual macro data – using by BI tools
7. In your S-DWH, do you save:
- initial data and weights,
- weighted data or
- both? / -Initial data and weights
Statistical Registers
8. How do S-DWH and SR interact?
For example:
- completely independent
- fully integrated
- other (please specify) / -Statistical Registers are integrated into the Integrated Statistical Information System within subsystem REGIS.
-For example – codelists used in Register are connected to the metadata management (METIS).
-It is possible to use surveys for updating Registers, too.
9. If S-DWH and SR are not fully integrated then is the S-DWH used for updating the SR? / -Registers are integrated into the S-DWH
Metadata management
10. How do you manage metadata in your S-DWH?
For example:
- centralised,
- per domain,
- other (please specify) / -Metadata management is centralised within subsystem METIS which is the part of the Integrated Statistical Information System. Metadata is shared within domains.
-Metadata for household surveys are maintaned within Blaise system per domain. Minimum metadata is mantained within METIS to allow integration of microdata (coming from household surveys) into the Integrated Statistical Information System.
11. What metadata model(s)/standards
do you use in your
S-DWH?
For example:
- GSIM,
- SDMX,
- MMX,
- etc / -Own metadata model which could be comparable with GSIM model.
12. How do you manage the quality of your metadata?
For example:
- minimal quality requirements,
- guidelines,
- standards / -Via GUI, user guidelines, standards (code lists, classifications, business register), internal directives
Tools/Technology Used (please read Annex 1 for a summary of tools in NSIs)
Please give a general description of the technology used in your SDWH by describing tools used for:
(Indicate whether these tools were bought or built in house)
13. Capturing data from different sources into the SDWH (Source layer) / -Custom system for collecting electronical statements and administrative sources (using Oracle database)
-Blaise
14. Cleaning and integrating the data sources into the SDWH (Integration layer) / -Custom system, which contains tools for this purpose
15. Analysing and interpreting the data (Interpretation and Data Analysis layer) / -Cognos BI tools
-SAS
16. Disseminating and providing access to Macro Data (Access layer) / -Cognos BI tools
-SDMX-RI
-SDMX Converter
-Web service tools - Apache
Annex 1: Tool Inventory
This annex provides a brief description of software packages existing on the market or developed on request in NSIs in order to describe the solutions that would meet NSI needs, implement SDWH concept and provide the necessary functionality for each SDWH level.
Access layer
The principal purpose of data warehouse is to provide information to its users for strategic decision-making. These users interact with the warehouse throughout Access layer using end user access tools. The examples of some of the end user access tools can be:
- Specialised Business Intelligence Tools for data access
Business intelligence tools are a type of software that is designed to retrieve, analyse and report data. This broad definition includes everything from Reporting and Query Tools, Application Development Tools to Visual Analytics Software, OLAP viewers. The main makers of business intelligence tools are:
- Oracle
- Microsoft
- SAS Institute
- SAP
- SPSS
- R
- Office Automation Tools for regular productivity and collaboration instruments
By Office automation tools we understand all software programs which make it possible to meet office needs. In particular, an office suite therefore usually contains following software programs: word processing, a spreadsheet, a presentation tool, a database, a scheduler. One of the most common office automation tools around:
- Microsoft Office
- IBM‘s Lotus SmartSuite
- OpenOffice
- Graphics and Publishing tools
Graphics and publishing tools are tools with ability to create one or more infographics from a provided data set or to visualize information. There are a vast variety of tools and software to create any kind of information graphics, depending on the organizations needs:
- SAS
- SPSS
- Stata
- Web services tools (M2M) like:
- Microsoft Visual Studio
- Apache Axis
- SoapUI
- JBoss
- SDMX tools like:
- SDMX Java Suite developed by European Central Bank
- Flex-CB Visualization
- DSW (Data Structure Wizard) developed by Eurostat
- SDMX Converter developed by Eurostat
- SDMX-RI (Reference Infrastructure) developed by Eurostat
- SDMX Registry developed by Eurostat
- SDMX.NET developed by UNESCO Institute for Statistics
Interpretation and Data Analysis layer
The interpretation and data analysis layer is specifically for statisticians and would enable any data manipulation or unstructured activities. In this layer expert users can carry out data mining or design new statistical strategies.
- Statistical Data Mining Tools
The overall goal of the data mining tools is to extract information from a data set and transform it into an understandable structure for further use. They also are capable to visualise data, which was extracted in data mining process:
- IBM SPSS Modeler
- SAS Enterprise Miner
- Microsoft Analysis Services
- R
- Business Intelligence Tools for data analyse in a direct connection with data base
Business Intelligence tools allow users to create visual reports for data analysis needs. These tools often come as packages that include tools for ETL operations, and for designing specialised OLAP cubes, and finally presentational tools for displaying tabular data from specialised reporting views for end users:
- SAS OLAP Cube Studio
- SQL Server Analysis Services (SSAS)
- Analytic Workspace Manager 11g
- SuperCross
- Tools for designing specialised reports like:
- Eclipse BIRT Project
- JasperReports
- OpenOffice Base
- Oracle Reports
- SAS Web Report Studio
- SQL Server Reporting Services (SSRS)
- Crystal Reports
- SPSS
Integration layer
The integration layer is where all operational activities needed for all statistical elaboration processes are carried out. This means operations carried out automatically or manually by operators to produce statistical information in an IT infrastructure. With this aim, different subprocesses are predefined and preconfigured by statisticians as a consequence of the statistical survey design in order to support the operational activities.
In general, for the Integration layer there are mostly dedicated software applications usually defined as Data Integration tools. This kind of software is used for metadata management and usually is developed and implemented on NSI request. This is because of specific needs and requirements from customer. It has a user friendly graphic interface to help the integration of different input sources and their manipulation.
Source Layer
The Source Layer is the level in which all the activities related to storing and managing internal or external data sources are located. Internal data are from direct data capturing carried out by CAWI, CAPI or CATI while external data are from administrative archives. Essential component used to load data into data warehouses from the external sources is ETL. The most popular commercial ETL Tools are:
- Oracle Warehouse Builder (OWB)
- SAS Data Integration Studio
- SAP Business Objects Data Services (SAP BODS)
- Microsoft SQL Server Integration Services (SSIS)
- Pentaho Data Integration (Kettle)
- Blaise