Computing and Information Systems © University of Paisley 2007
Version-Manager:
For vital simulating advantages in data warehouse
M. Khurram Shahzad
1
Computing and Information Systems © University of Paisley 2007
Abstract--To achieve vital advantages of simulating business scenarios, extensions are made to versioning concept by dividing alternative versions into two types, 1) Materialized alternative versions and 2) Virtual alternative versions. This study presents transparent querying method from multi-version data warehouse in the presence of real versions, materialized and virtual alternative versions. This is done by making extensions to our Synthetic Warehouse Builder with new component called Version Manager.
Version manager with its confinement process divide query into smaller parts called mono-version queries; execute it on relevant and heterogeneous version/s and merging the result to present it for analysis. It has been found that Version Manager is very useful for short term data requirements as well as rapidly changing requirements of simulating business alternatives. Complex and multi-facet metadata is required to be maintained by Version manager for processing queries. So, the need and contents of metadata are also defined, classified and explained in the article. ANSI/SPARC three-tier architecture has been re-worked for multi-version data warehouse.
Keywords--Data warehouse, evolution, versioning, business alternatives, simulations, transparency, query processing architectures, integrated catalog, metadata.
I. INTRODUCTION
Data Warehouse (DW) provides integrated access to transactional data source/s. But dynamics of this source/s has resulted in derivation of inconsistent analytical results from DW (Bebel et al. 2004). Various approaches have been proposed to handle business dynamics. These approaches have be categorized into two types (Hurtado et al. 1999, Blaschka et al. 1999) a) Schema Evolution approach b) Versioning approach.
For better what-if-analysis and to improve the quality of decision-making, concept of simulating business scenarios has been proposed (Bebel et al. 2004, Wrembel et al. 2005), but conventional DW does not support simulating business scenarios. Hence, two types of versions are used (Bebel et al. 2004, Wrembel et al. 2005) a) real versions, for handling business changes b) alternative versions, for simulating business scenarios. The DW which maintains these versions is called multi-version data warehouse (MV-DW).
Retrieval from MV-DW has been made possible by multi-version query language by (Morzy et al. 2004). It has been found that query writing becomes impossible with increased number of versions, for this reason, Synthetic Warehouse Builder (SWB) has been proposed (Nasir et al. 2006).
SWB provides transparent access to users for querying multiple versions of DW without caring about implementation details of versions. Recent study (Shahzad et al. 2006) has attempted to eliminate limitations of Synthetic Warehouse Builder (SWB), this attempt has resulted in following observations:
Observations-- a) SWB do not provide flexibility of simulating business scenarios in MV-DW b) Vital advantages of, simulating business scenarios, cannot be achieved by exploiting alternative versions c) The conception of handling simulating business scenarios by alternative versions is required to be re-worked d) To extend the functionality of SWB, Version-Manager is required.
Contribution-- Above mentioned observations are source of motivation for this study. So, the contributions are: 1) to present the architecture of Version-Manager (an added component of SWB) to provide flexible environment for what-if-analysis and simulating business scenarios in order to improve the quality of decision-making. 2) Querying method for three types of versions, without letting the users to know about location of data in any version. 3) Identify and classify multi-facet metadata, maintained by Version-Manager for querying versions. 4) Re-work the ANSI/SPARC’s three tier architecture for multi-version data warehouse.
Rest of paper is organized as follows; section no 2 gives the motivation for extending the SWB functionality, section 3 overviews existing approaches to address evolutionary problems of DW and simulating business alternatives. Introduction to SWB, its properties along with functionalities are given in section 4 while section 5 presents the proposed architecture, and query processing using Version Manager is given in section 6 of the paper. Finally, paper is concluded in section 7 with small discussion.
II. MOTIVATION
In today’s knowledge oriented society, success of organization depends upon quality of decision-making. (Pasha et al. 2004). Simulating business scenarios for better what-if analysis has recently been proposed. (Wrembel et al. 2005, Bebel et al. 2004) by maintaining real and alternative versions, but various businesses are found to have vibrant simulating scenarios i.e. they can be changed at runtime. Since alternative versions have instances attached with them so changes are absolutely not possible to be made, hence restricts the scope of the user’s what-if-analysis.
For achieving vital advantages of alternative business scenarios and providing users with a flexible simulating environment, alternative versions are divided into two types (Shahzad et al. 2006).
a) Materialized Alternative Versions (MAV), have instances attached with them and can also share data with its child versions. MAV’s are used for simulations, whose structure are static and are used most of the time. These are also called static simulations.
b) Virtual Alternative Versions (VAV), have no instance attached with them instead they are stored like virtual tables in multi-version catalog and gets data on-the-fly from mapped set of real and materialized alternative versions. These types of alternative versions are used for vibrant (dynamic) simulations.
Storing and mapping of versions is not the real task instead it is required to retrieve data from various versions and make it useful for malicious decision-making, which has not been done.
SWB provides transparent access to real and alternative versions, but does not provide flexible simulating environment, hence raises the need for addition of component called Version-Manager (VM). VM will not only provided interface for simulating business scenarios and run-time addition of virtual dimension and fact tables but also provide facility for querying a) Real Versions b) Materialized Alternative Versions c) Virtual Alternative Versions. The only extra effort required is one-time mapping of VAV’s to respective MAV/s and RV/s.
III. RELATED WORKS
Transactional systems act as a source of data for hDW. It is found that changes to these sources may result in derivation of inconsistent results. (Bebel et al. 2004). Semi-Star (Pasha et al. 2004), can handle these changes but are suitable only for small and medium sized dynamic organization. Other methods proposed to handle these changes are:
· Schema and data evolution method, (Blaschka. et al. 1999, Hurtado. et al. 1999), according to this approach changes to schema are applied to new schema and data is transported to new schema. But this approach gives unsatisfactory response time in the presence of increased number of changes (Nasir et al. 2006).
· Temporal versioning method (Pasha et al. 2004, Chamoni et al. 1999, Eder et al. 2001, Eder et al. 2002), in which changes new DW versions are created with a time stamp but its disadvantages has already been discussed (Chamoni et al. 1999, Mendelzon et al. 2000) and quoted by (Bebel et al. 2004).
Fig.1. MV-ILS Builder (Nasir et al. 2006)
For achieving the objective of better what-if-analysis, concept of simulating business alternatives has been reported (Bebel et al. 2004, Shahzad et al. 2006), which can be handled along with source dynamics by using two types of versions i.e. real and alternative versions. For querying purpose multi-version query language has been proposed (Morzy et al. 2004) by extensions to standard query language. But we have reported versioning algebra for evolution and generation of versions in the presence of source changes (Shahzad et al. 2006), also we have already build SWB for transparent querying method (Nasir et al. 2006). But, a) Vital advantages of simulating alternative business scenarios has not been achieved b) It is required to provide users with a flexible environment for using simulations to improve quality of decision-making c) SWB has restricted querying scope i.e. only the set of three querying possibilities exists, that are: i) Querying only current DW version ii) Query the set of real DW versions iii) Or querying set of selected alternative DW versions.
These issues can be addressed by adding a component to SWB called ‘Version-Manager’ (Shahzad et al. 2006) whose architecture has been presented here.
IV. SYNTHETIC WAREHOUSE BUILDER
SWB provides transparent access to set real and alternative versions. While using SWB users are not required to keep track of set of versions, their derivation hierarchy as well as instances attached with the versions, instead only the knowledge of logically integrated schema will work.
a) Transparency in SWB has been defined (Nasir et al. 2006) at three levels a) Source transparency, data stored in different versions are made transparent to users. i.e. user will retrieve data from multiple versions independent of the existence of source table/s in version/s, but it should be present in at least one of the versions.
Fig.2. Synthetic Query Analyzer (Nasir et al. 2006)
b) Projection transparency, user should write the query independent of existence of attribute in one or more tables of different versions.
c) Analysis transparency, in the presence of this type of transparency, user will write query independent of predicate-attribute present in one or more versions. But it should be available in at least one version. The advantage of fully transparent access to DW is high level support, which it provides for development of complex relational multi-version DW.
Three level of transparency is provided with the help of two major components: a) MV-ILS builder, acts as a core of providing integrated access to the users by provides logically integrated schema that contains the union of all the participating set of versions in (Nasir et al. 2006), shown in fig. 1.
Synthetic Query Analyzer (S-analyzer), shown in fig. 2 is produced to handle querying issues of real and alternative versions of DW. Input query to it is called multi-version query or transparent query (executable on multiple-versions), which is further transformed to mono-version queries by confinement process.
Although, SWB facilitate users by providing transparent access to versions but do not provide flexible environment for simulating business scenarios. So, need for extensions to SWB is proposed in (Shahzad et al. 2006) which can be met by adding small component named Version-Manager with the abilities of addressing issues related to three types of versions.
V. VERSION-MANAGER
Version manager extends the functionality of transparent querying from multiple real and alternative versions to: a) Set of real, materialized and virtual alternative versions b) Flexible environment for simulating business scenarios.
VAVs have no instance of its own, instead selected real and alternative versions acts as a data source for such versions (Shahzad et al. 2006). All this is done on-the-fly, although some of the performance issues are evolved but it can be handle by using the strengths of distributed and grid computing, (Bernadrino et al. 2000). Moreover, the performance can be maintained by using appropriate set of materialized and alternative versions.
Architecture of Version Manager is shown in fig. 3. (a), it provides the facility of breaking a query say TSQL, into possible set of mono-version queries i.e. RV-SQL, AV-SQL and VSQL. Each query will then be executed on relevant version/s.
1
Computing and Information Systems © University of Paisley 2007
Fig. 3. (a) Architecture of Version-Manager
Fig. 3. (b) Query processing components of Version-Manager
1
Computing and Information Systems © University of Paisley 2007
Since virtual alternative versions have no instance attached with them and they are mapped to real and materialized alternative versions so VSQL may be further confined to AV-SQL and RV-SQL. Finally, these queries are executed and their results are merged by queuing them in temporary store. For optimization purposes Inlining technique by (Pedersen et al. 2002) has been use with few modifications.
SWB don’t have the facility of adding, modifying and deleting dimension, dimension table, fact and fact tables at querying time. But, VM interface provides facility of creating, modifying and deleting virtual dimensions, virtual facts and their relationship in virtual alternative version; the only requirement is definition of one-time mapping and transformation-rules of attributes.
To compute final results of various mono-version queries over heterogeneous versions, it is required to merge the result set of RV-SQL, AV-SQL and V-SQL by using temporary store. The conception of temporary storage has already been used to merge results of various queries, and it has been found efficient and in good condition in (Pedersen et al. 2004). The only inefficiency found for such purpose is excessive transfer to data, for which number of optimization techniques has been proposed (Pedersen et al. 2002).
It is to be made clear that results of the real versions will be merged while the simulation result sets will be kept separate, which will help the user in identifying real world and simulated data. The manager have the ability to store metadata in catalog and keeping contact with smaller component called ‘metadata manager’ whose responsibility is to find and deliver appropriate set of metadata from MV catalog. With these abilities of processing queries Version Manager will provide integrated and solo access along with flexible environment for simulating business scenarios for quality decision making.
VI. QUERY PROCESSING IN VM
Given the above architecture, in order to process the query transparent to the users, VM will retrieve data from various version/s and merge the related results. Fig.3 (b) whose query processing components. For this purpose following steps are performed:
a) Splitting multi-version query into mono-version queries, user will input query in TSQL form, independent of the existence of various versions. VM will divide this multi-version query on MV-ILS to smaller queries executable on versions. In case of VSQL, it is further mapped to real and materialized alternative versions.
b) Evaluation, of mono version queries is done in which each query is evaluated for its expected results i.e. the queries for which execution is either not possible or useless will be rejected, (Nasir et al. 2006).
c) Execution, after splitting multi-version query, mono-versions queries are send to respective version by using versioning metadata from MV-Catalog.
d) Monitoring, has defined monitoring algorithm, whose responsibility is to monitor the execution of queries in (Nasir et al. 2006).