NSI / Statistics Finland
Name contact person / Antti Santaharju
E-mail contact person /
1. What is the current status of your S-DWH?
(planned / in development / in implementation or o0070erational) / Operational/implementation
In November 2014 Statistics Finland implemented an integrated production system for business statistics (YTY). YTY is a Statistical data warehouse system which covers the whole statistical production process from gathering the data to the dissemination and the delivery of the final results. YTY integrates the production processes of the business register and 13 different business statistics into a one common system.The first productioncycle for reference year 2013 comes to an endin October 2014
General Description
2.Please give a (short) general description of the nature of your S-DWH bydescribing topics as:
Which problems do/did you want to solve by implementing a S-DWH?
What is the goal of the S-DWH?
What are the most important solutions desired by the NSI, to be solved by implementing a S-DWH?
What functionalities are (to be) implemented in the S-DWH? / There was no single conceptual approach for processing data in the production of business statistics.
Goal of the YTY integrated production system for business statistics is to integrate all our production databases of business statistics into one operational database. So far we have integrated production processes of 13 different statistics:
- Structural business and financial statement statistics
- Regional statistics on entrepreneurial activity
- Industrial output
- Enterprise openings and closures
- Index of turnover in industry
- Index of turnover of construction
- Turnover of trade
- Turnover of service industries
- Wage and salary indices
- International trade in services
- Foreign affiliates in Finland (FATS)
- Finnish affiliates abroad (FATS)
- Business Services Statistics
We have also established a centralized data warehouse on enterprises.
Main motivation to start DWH in our business statistics systems were:
-More ways to (re)use data
-To improve proces integration in business statistics production
-To improve efficiency in business statistics production
-Cost savings
Main functionalities that are implemented in the YTY S-DWH system are:
- All business data are in common place (basis for information services quering and analyzing of data)
- Use of same classification variables and unit structures (enterprise group, enterprise, LKAU, KAU, local unit, legal unit)
- Coherence of statistics (crosschecking between different statistical systems at the beginning of process)
3. What is the scope of your S-DWH?
(ETL, Data Warehousing, metadata, integrations between metadata system and ETL) / YTY S-DWH covers the whole production process from data collection to data dissemination.Functionality is described in the picture above and it covers all the 4 layers in Centre of competence on data warehousing S-DWH business architecture.
1. Source layer
- Direct surveys are carried out by web application
- Administrative datasets (sequential text files) are transmitted from other administrative institutes to Statistics Finland as sequential textfiles
- Data sources (and every variable) are described in the Statistics Finland’s metadata database.
- Received administrative datasets are converted into SAS datasets and are technically validated using the metadata
- Data sources (external and internal) are stored into YTY Source layer as SAS datasets (raw data warehouse)
-Processing is controlled by Statistics Finland’s tailor made process control system process engine X
2. Integration layer
- Input datasets are uploaded (by SAS programs) to integration layer (YTY Production database Microsoft SQL2012 database).
- Data update launches modular data integration, coding, validating, editing and imputing procedures (SAS, .Net and SQL procedures) that
- Erroneous observations are flagged and analyzed/edited manually by Statistics Finland’s tailor made .Net application
-All database tables are described in the Statistics Finland’s metadata database
- All the processing is controlled by Statistics Finland’s tailor made process control system procX
- All ETL procedures are performed by process control system
- Process control information is stored into BPMN descriptions and database table
- Correct observations are transferred to the YTY S-DWH database for validated micro data
3. Interpretation and data analysis layer
- YTY S-DWH has data mining tools in both databases (YTY production database and YTY S-DWH database for validated micro data)
-Data analysis and data mining is done by
-SSAS Database cubes (MS Excel & SAS EG)
-Microsoft SQL2012 report builder reports
4. Access layer
- All validated micro data are loaded to YTY S-DWH database for validated micro data
-Microsoft SQL2012 server database for validated micro data
Data analysis, data mining, dissemination and delivery is based on SSAS Database cubes
-Daily updated MOLAP cubes
-One data cube for each statistics
- Publication process creates frozen micro data version into database
- Publication process tools:
-SAS, Tau-Argus, PC-Axis, PX Web..
- In future Some validated information is loaded directly from other production databases to YTY S-DWH for validated micro data
4. How is the S-DWH organised?
(activities, responsibilities, roles etc) / YTY S-DWH has about 300 users
- roughly150 users edit the database
- roughly 150 users have read-only rights
Users cover all different activies, responsibilities and roles in statistical production process.
5. Which problems did you encounter so far? / Main methodological barriers to implementing an integrated system are:
-Requires significant investment
-Too disruptive to regular work processes
-Statistical requirements are too different
-Collecting needed requirements of desired state (individual and project managing point of view)
-Metadata model
-Complexity
-insufficient development resources & lack of professional experience on large-scale IT systems building
Problems encountered in data integration you desire to solve with an integrated DWH-system are:
- overlapping work
- possible quality problems due to lack of common standards and methods
- lack of coherence of statistics
Problems encountered in process integration you desire to solve with an integrated DWH-system
- lack of efficiency
- overlapping work
- possible quality problems due to lack of common standards and methods
- lack of coherence statistics
Characteristic
6. Please check the characteristics as indicated in the questionnaire and correct / complete if necessary? / Indicated in the questionnaire:questions 3 + 4
There is not a one-to-one correspondences between input data and outputs.The data warehouse is passive.
Changes have to made via operational database and its applications. Some changes are also made in ETL to DWH.
7. Is the data in the S-DWH unit based (micro data)? / Yes. In a long run, also maybe indexes and some aggregated data.
8. Do you have weighted data in the
S-DWH? / Not yet, maybe in future.
Statistical Registers
9. Is the Business Register (BR) or other Statistical Register (SR) managed in the
S-DWH? / Yes. It’s similar (almost) than any other business statistics
10.Which BR / SR data are stored in the
S-DWH? / Almost all variables (employees, sector, turnover and yearly/monthly administrative and direct data).
All unit structures (enterprise group, enterprise, LKAU, KAU, local unit, legal unit)
11. Is the S-DWH used for updating the BR? / BR is integrated into the S-DWH
12. Do you have a snapshot of theBR in the S-DWH? / Yes, partly (from annual data). We also collect history of some critical/important variables in operational database.
BR is integrated into the S-DWH
13. Do you keep different versions of BR snapshot in the
S-DWH? / We have snapshot of the annual data at the time of release. All versions are stored.
BR is integrated into the S-DWH
Metadata (please read Annex 1 for an explanation of the underlined Metadata concepts!)
A. What kind of reference (business) metadata is used? / B. What kind of structural (technical) metadata is used
14. What kind of statistical metadata is used? / Descriptions of variables
15. What kind of process metadata is used? / - BPMN descriptions
- Process state on different conceptual levels (variable groups)
16. What kind of technical metadata is used? / SAS and SQLServer metadata
17. What kind of quality metadata is used? / Methods used
18. What kind of authorisation metadata is used? / Users are divided into different groups (roles) based on activities. User rights management is based on these roles.
17. Specify the key relations between the metadata classes according to your metamodel(s)?
(eg statistical activity, classifier, variable)
18. Specify the key relations between the different kinds of metadata
(reference, structural,
quality)
19. How is the metadata maintenance organised and stored? / XML data model, relational database (partly under construction)
Satistics Finland’s tailor made user interface
20. What are your metadata quality requirements? / All classifications are described in metadata (classification database)
All input, output and database tables are described in metadata (metadata database)
Annex 1: Metadata concepts[1]
Reference metadata are metadata that describe the contents and quality of the data in order to help the user understand and evaluate them (conceptually)
Examples: Quality information on survey, register and variable levels; variable definitions; reference dates; confidentiality information; contact information; relations between metadata items
Structural metadata are metadata that help the user find, identify, access and utilise the data (physically)
Examples:Classification codes; parameter lists
Statistical metadataare data about statistical data
This definition will obviously cover all kinds of documentation with some reference to any type of statistical data and is applicable to metadata that refer to data stored in a S-DWH as well as any other type of data store
Examples:Variable definition; register description; code list.
Process metadata are metadata that describe the expected or actual outcome of one or more processes using evaluable and operational metrics
Examples: Operator’s manual (active, structured, reference); parameter list (active, structured, reference); log file (passive, structured, reference/structural)
Technical metadata are metadata that describe or define the physical storage or location of data.
Examples:Server, database, table and column names and/or identifiers; server, directory and file names and/or identifiers
Quality metadata are any kind of metadata that contribute to the description or interpretation of the quality of data.
Examples: Quality declarations for a survey or register (passive, free-form, reference); documentation of methods that were used during a survey (passive, free-form, reference); most log lists (passive, structured, reference/structural)
Authorisation metadata are administrative data that are used by programmes, systems or subsystems to manage users’ access to data.
Examples:User lists with privileges; cross references between resources and users
[1]Metadata Framework for Statistical Data Warehousingv0.9; ESSnet on Data Warehousing