Multidimensional Modeling Approaches for OLAP Based on Extended Relational Concepts1

Multidimensional Modeling Approaches for OLAP Based on Extended Relational Concepts

O. Mangisengi, A M. Tjoa

Institute of Software Technology, Technical University of Vienna, Austria
Tel: 58801-18800, FAX: 58801-18899, E-mail: {oscar, tjoa}@ifs.tuwien.ac.at

R.R. Wagner

Institute of Applied Knowledge Processing, University of Linz, Austria

Tel: 43-732-2468-791,E-mail:

Abstract

The paper introduces different extended relational concepts to model and query OLAP data. In the first approach we will use the concept of nested relations to model multi-dimensional OLAP-data. In the second approach we will model OLAP by using the extended relational model which was introduced by Codd for use in OLAP-modeling. The general requirements for all two extended relational data models which could serve as a foundation for multidimensional database systems, are similar to those that made the relational model successful, namely the existence of an implementation independent formalism, the separation of structure and contents, and the existence of declarative query language. All two approaches will be compared with the modeling based on the traditional flat relations, which is widely used in the OLAP-community, i.e. the modeling of OLAP star and snow-flake schemas.

1. Introduction

The On-line Analytical Processing (OLAP) is emerging as the most important approach in Data Warehousing. OLAP allows to model data in a multidimensional way as a cube and to query and analyze data from many different perspectives. Independent from the different implementation aspects, OLAP data are presented to the user in a multidimensional data model.

There are several ways to formally define multidimensional models and their query languages. However until now there do not exist a commonly accepted formal multidimensional data model. Such a model is necessary as a basis for an accepted standardized logical data model for OLAP data. This would allow practitioners and researchers to specify their data warehouses in a unified way.

The aim of this paper is to propose an approach for the conceptual modeling of multidimensional data, which is entirely based on a relational data model. It seems to the authors that such a model would very much correspond with the original intuition of Codd when he introduced the concept of OLAP in his pioneering white paper [CCS93].

Among the different ways to define a data cube the star schema approach could be regarded as the most dominant one. A data cube is defined as a collection of at least one fact table and a set of dimension tables.

In this paper we will introduce a meta model with the capability to describe these tables and the OLAP queries based on two extended relational approaches, namely nested relations, and the extended relational model of Codd. The general requirements for such a formal data model that can serve as a foundation for multidimensional database systems, are similar to those that made the relational model successful, namely the existence of an implementation independent formalism, the separation of structure and contents, and the existence of declarative query language [BSHD98].

Recently a number of approaches are proposed in the literature for the formal foundation of multidimensional modeling. [BSHD98] compares and describes the most important modeling approaches in this area, namely the approaches [AGS97] of Agrawal, Gupta, and Sarawagi, [CT97a,CT97b] of Cabbibo and Torlone, [LW96] of Li and Wang, and [GL97] of Gyssens and Lakshmanan.

The remainder of this paper is structured as follows. Section 2 briefly the theory of nested relation and of extended relational. In section 3 we present the modeling of OLAP-data by means of nested relation and extended relational. Nest and unnest operators of the nested relation are well suited for drilling-down and rolling-up of OLAP operation, meanwhile PATT and STEP operators of the extended relational model are useful for OLAP data manipulations. Section 4 presents an extended example for the use of two modeling approaches in a Data Warehouse. Our conclusion and the comparison of modeling approaches are given in section 5.

  1. The Theory of Nested Relation and Extended Relational

In this section we first review the concept of the nested relations given in [RKS88], which is used to model the multidimensional of OLAP data. The special point for the nested relation is the use of nest and unnest operator. The second we review the concept of extended relational model proposed by Codd as the basic of the object oriented model.

2.1. The Nested Relation

2.1.1. Basic Concept

A nested relations (Non First Normal Form relations) is a relation with tuple components which are atomic values or repeating group instances. A nested relational database scheme is a collection of relational schemes of the form , where all the Rji are zero order. A nested relations scheme contains any combination of zero- or higher-order names on the right-hand side of the specification as long as the scheme remains nonrecursive. A nested relation is represented simply as a higher-order name on the right-hand side of the specification.

Let R be a name in a database scheme S. An instance of R, written r, is an ordered pair of the form , where VR is a value for name R. If R is a zero-order name, then VRis just any value from the domain of R. If R is a higher-order name, then VR must be expanded in terms of the names on the right-hand side of rule R. A relation structure denotes a relation scheme R and instance r.

2.1.2. Operator of Nested Relation

Besides the extended algebra operations of the nested relation, such as union, intersection, Cartesian product, projection, and selection operators, there are two important operators of the nested relation, namely the nestand unnest operator, which are usefull for OLAP operations. The detailed extended algebra of the nested relation is given in [RKS88]. We review the definition of nest and of unnest operators proposed by [RKS88].

  1. Nest

Nest takes a relation structure and aggregates over equal data values in some subset of the names in R. Formally, let R be a relation scheme, in a database scheme S, which contains for the external name R. Let and . Assume that either is in S or that B does not appear on the left-hand side in S and does not appear on the right-hand side S. Then, where:

and is appended in S if it is not already in S, and there exists a tuple such that

  1. Unnest

Unnest takes a relation structure nested on some set of attributes and disaggregates the structure to make it a flatter structure. Formally, let R be a relation scheme, in database scheme S, which contains for external name R. Assume B is some higher-order name in ER with an associated . Let Then where:

and is removed from S if it does not appear in any other relation scheme, and there exists a tuple such that

.

The nested relations scheme is illustrated by the following example [RKS88]. The relation with scheme is given by the following table 2.1. The nested relation operation , then the result of this operation produces the following table 2.2.

2.2.The Extended Relational Model

2.2.1. The Concept of Extended Relational Model

The extended relational model proposed by Codd in [Codd79] is a suitable concept of an object oriented model for OLAP data. Since it inherently uses the concept of object identifiers, a typology of different object types, such as associations, characteristics, and kernel object types, and a meta model description on the relationship between the different types. This model defines different kind of relations, namely the object relation, property relation, association relation, characteristic relation, association-graph relation, and characteristic-graph relation. The detailed definition of each relation is given in [Codd79]. These relations are used as the base relations of the object oriented OLAP data modeling. The model of OLAP data is given in detail in the next section.

Multidimensional Modeling Approaches for OLAP Based on Extended Relational Concepts1

A / C / D / E
A1 / C1 / D1 / E1
A1 / C1 / D2 / E1
A1 / C1 / D2 / E2
A2 / C2 / D1 / E1
A2 / C2 / D1 / E2

Table 2.1 The relation r

A / B / E1
C / D / E
A1 / C1 / D1 / E1
A1 / C1 / D2 / E1
E2
A2 / C2 / D1 / E1
E2

Table 2.2 The nested relation operation

Multidimensional Modeling Approaches for OLAP Based on Extended Relational Concepts1

2.2.2. OLAP data manipulation

An n-dimensional cube can be obtained by using the PATT (Partitioning by attribute) as defined in [Codd79] on the fact relation. PATT is introduced in the original paper as follows.

PARTITION BY ATTRIBUTE: PATT

Let R be a relation with attribute A (possibly compound). R may have attributes other than A. PATT(R,A) delivers the set of relations obtained by partitioning R per all the distinct values of A. For all relations R having an attribute A:R = UNION / PATT(R,A). The “/” operator here denotes the COMPRESS-operator.

COMPRESS

Let f be an associative and commutative operator that maps a pair of relations into a relation (for example, a join). Let Z be a set of relations such that f can be validly applied to every pair of relations in Z. Then COMPRESS(f,Z) is the relation obtained by repeated pairwise application of Z. An alternative notation for COMPRESS(f,Z) is f/Z.

The partition operator has the same functionality as the partitioning operator of the quotient relation (“/”) described in [FK77]. The use of partitioning operator of the quotient relation for OLAP operation in detail is given in [MT98]. Relevant roll-ups can be obtained by using the STEP-operator to find the next refinement level. The STEP-operator can be defined as follows:

STEP

Let R be an unlabeled digraph relation that does not have an attribute SEP (which stands for separation). STEP(R) is the set of all tuples of the form (SUB:x,SUP:y,SEP:n), where (SUB:x,SUP:y) belongs to R and n is the least number of edges of the graph which separate node x from node y.

3. The modeling of OLAP-data by means of nested relation, and extended relational

In this section we will show that the mentioned established extended relational models, which have the capability of grouping tuples of a relation and the ability to ‘drill-down’ and ‘roll-up’ between the different levels of hierarchies, can be used as a conceptual model for OLAP data. This has the huge advantage of an existing sound theoretical foundation, and that these models have been studied for a long time. For the object-oriented implementation of these models data-blades with the defined operators of these models are serious candidates for an alternative approach of OLAP databases.

3.1. The Nested Relation approach

In this section we propose the use of nest operator and of unnest operator, which are applied in the OLAP operation, namely rolling-up and drilling-down operation.

1. Rolling-Up

The rolling-up operation in the nested relation can be applied using the nest operator . We use the sample of sales data based on the store dimension, which is given in table 3.1.

State
ST / City
Ci / Store
STO / Sales
Sa
CA / SF / 4 / 100
CA / SF / 5 / 300
CA / SF / 6 / 500
CA / LA / 2 / 200
CA / LA / 3 / 400
CA / LA / 7 / 600
NY / NYC / 1 / 500

Table 3.1 The sample of sales data based on the dimension Store

To illustrate the rolling-up operation in the nested relation, we assume that the lowest level of aggregation of the table 3.1 is the attribute Store. We introduce the nested relation with the aggregation hierarchy Store City State. To roll-up the sales data from Store to City, the nested operation can be specified as follows:

.

The nested operation produces the following table 3.2.

Multidimensional Modeling Approaches for OLAP Based on Extended Relational Concepts1

State / City / STO1 / Sum(Sales,City)
ST / Ci / Store
STO / Sales
Sa
CA / SF / 4 / 100
5 / 300 / 900
6 / 500
CA / LA / 2 / 200
3 / 400 / 1200
7 / 600
NY / NYC / 1 / 500 / 500

Table 3.2 The nested operation

Multidimensional Modeling Approaches for OLAP Based on Extended Relational Concepts1

The Sum function could be used as a comparable OLAP-functionality.

If the higher summarization of the sales data is required, then the attributes City and Store are nested into a relation Ci1. The nested operation for rolling-up fro City to State is given as follows:

.

The result of the nested operation is given in table 3.3.

State / Ci1 / Sum(Sales,City) / Sum(Sales,State)
ST / City
Ci / Store
STO / Sales
Sa
CA / SF / 4 / 100
5 / 300 / 900
6 / 500
LA / 2 / 200 / 2100
3 / 400 / 1200
7 / 600
NY / NYC / 1 / 500 / 500 / 500

Table 3.3 The nested operation

2. Drilling-down

According to the definition of unnest, which undo the nesting of the nested structure on an attribute set, the unnest operator can be applied as the drilling-down operation in the OLAP-model. To demonstrate the drilling-down operation, we use the sales data on the table 3.3, which is the result of the rolling-up from City to State. For drilling-down from State to City, the unnested operation can be used as follows:

This unnest operation City produces the result given in table 3.2.

To obtain a more detailed view of the data, namely at the Store level, the drilling-down operation from City to Store can be realized using the following unnest operation.

.

The result of this unnest operation Store is again the table 3.1.

3.2. The Extended Relational Approach

The extended relational model proposed by Codd [Codd79] can be regarded as a forerunner of object oriented models, having a clear concept of object identifiers, and a strict specification of object types, which could easily be transformed to class-hierarchies. In this framework fact relations can be modeled as association relations with participating dimension relation types, such as the example of the association relation Sales given in table 3.4.

Sales_Oid / Time_Oid / Product_Oid / Store_Oid
1 / 1 / 1 / 1
2 / 2 / 2 / 2
: / : / : / :
n / n / n / n

Table 3.4 Association-Relation-Sales

Each object in this model is uniquely identified by its object-identifier (Oid). For every object type there exist a single attribute relation in which the object-id’s are collected (Obj-Rel-Relations). These relations could be sought as the domain for object-identifiers of an object type. The example of the object relation “Sales” is given in table 3.5.

The associative object type SALES has the subordinate characteristic object types Times, Store, and Product in the meta relation for association, i.e. the Association-Graph-Relation AG(SUP:m,SUB:n). We have the following extension of AG, which reflects this association (Note: the domain of SUP and SUB are the relation names of the object-relations). The association graph relation is shown in table 3.6.

Multidimensional Modeling Approaches for OLAP Based on Extended Relational Concepts1

Sales_Oid
1
2
:
n

Table 3.5. Object-Relation-Sales

SUP / SUB
Obj-Rel-Sales / Obj-Rel-Time
Obj-Rel-Sales / Obj-Rel-Store
Obj-Rel-Sales / Obj-Rel-Product

Table 3.6 Association-Graph-Relation

Multidimensional Modeling Approaches for OLAP Based on Extended Relational Concepts1

Each dimension can further be refined by characteristics and “characteristics of characteristics” reflecting the drill-down hierarchy. With other words this relation reflects the star-schema with the Object-Rel-Sales representing the fact-object and the Object-Rel-Time, Object-Rel-Store, and Object-Rel-Product as the dimensions.

Each characteristic entity type representing a dimension will furthermore define a tree of these types corresponding to the possible roll-ups given by the hierarchy of the dimensions. Every sub-ordinate characteristic-relation, which participates in the OLAP cube as a dimension with its refinement hierarchy, can now be presented in the corresponding characteristic relations. In the following representation we describe the hierarchy StoreCity State.

Figure 3.1. Characteristic-Relations and Object-Relations of the dimension “Store”

Multidimensional Modeling Approaches for OLAP Based on Extended Relational Concepts1

City_Oid / Store_Oid
 c1 / 1
 c2 / 2
: / :
 cn / n

Char-Rel-City-Store

State_Oid / City_Oid
 s1 /  c1
 s2 /  c2
: / :
 sx /  cn

Char-Rel-State-City

Multidimensional Modeling Approaches for OLAP Based on Extended Relational Concepts1

with the object relations:

Multidimensional Modeling Approaches for OLAP Based on Extended Relational Concepts1

Store_Oid
1
2
.
n

Obj-Rel-Store

City_Oid
 c1
 c2
.
 cn

Obj-Rel-City

State_Oid
 s1
 s2
.
 sx

Obj-Rel-State

Multidimensional Modeling Approaches for OLAP Based on Extended Relational Concepts1

In the meta relation for characteristics relations we have the following tuples in the CG-Relation (Characteristic Graph Relation) denoting the refinement hierarchy. The example of CG-Relation for dimension “Store” is given in table 3.7. Characteristic object types provide a description of a given kernel entity representing the most aggregated granularity (in case of time: State), which form a strict hierarchy, called characteristic tree. For the example given in table 3.8 we have the complete characteristic tree described by the following CG-Relation.

Multidimensional Modeling Approaches for OLAP Based on Extended Relational Concepts1

SUB / SUP
Obj-Rel-Store / Obj-Rel-City
Obj-Rel-City / Obj-Rel-State

Table 3.7 CG-Relation

SUB / SUP
Obj-Rel-Time / Obj-Rel-Month
Obj-Rel-Month / Obj-Rel-Year
Obj-Rel-Store / Obj-Rel-City
Obj-Rel-City / Obj-Rel-State
Obj-Rel-Product / Obj-Rel-Item
Obj-Rel-Item / Obj-Rel-Category

Table 3.8 CG-Relation for all dimensions

Multidimensional Modeling Approaches for OLAP Based on Extended Relational Concepts1

4. An extended example for the use of the two-modeling approach in a Data Warehouse

In this section we take a small data warehouse schema example to show the use of the nested relations and of the extended relational in a Data Warehouse. The Data Warehouse has a sales fact relation (Sa) and three dimension relations. The dimension relations are given by the relation Store (S), Product (P), and Time (T) as represented in Figure 4.1. Figure 4.1 applies the multidimensional graphical notation as it is developed by Bulos [Bulo96] using the ADAPT modeling tool developed by Totok and Jaworski [TJ98]. A sample instantiation of the model in figure 4.1 is given by the relations Sa, S, P, and T in figure 4.2.


Figure 4.1 the model of a Data Warehouse Example