We Now Introduce the Relations to Be Used in This Examples

We Now Introduce the Relations to Be Used in This Examples

A Multidimensional Modeling Approach for OLAP

within the Framework of the Relational Model based on

Quotient Relations

O. Mangisengi, A M. TJOA

Institute of Software Technology

Technical University of Vienna, Austria

{oscar,tjoa}@ifs.tuwien.ac.at

Abstract

The paper introduces the concept of quotient relations to model and query OLAP data. The use of quotient relation inherits the advantages of the original relational model as it was introduced by Codd. Drill-down and roll-up operations can be performed by the powerful partitioning and de-partitioning operators on quotient relations. The proposed approach fulfils the requirements for a formal data model, namely the existence of an implementation independent formalism, the separation of structure and content and the existence of a declarative query language.

  1. Introduction

The On-line Analytical Processing (OLAP) is emerging as the most important approach in Data Warehousing. OLAP allows to model data in a multidimensional way as a cube and to query and analyze data from many different perspectives. Independent from the different implementation aspects, OLAP data are presented to the user in a multidimensional data model.

There are several ways how to formally define multidimensional models and their query languages. However until now there do not exist a commonly accepted formal multidimensional data model. Such a model is necessary as a basis for an accepted standardized logical data model for OLAP data. This would allow practitioners and researchers to specify their data warehouses in a unified way.

The aim of this paper is to propose an approach for the conceptual modeling of multidimensional data which is entirely based on a relational data model. It seems to the authors that such a model would very much correspond with the original intuition of Codd when he introduced the concept of OLAP in his pioneering white paper [CCS93].

Among the different ways to define a data cube the star schema approach could be regarded as the most dominant one. A data cube is defined as a collection of at least one fact table and a set of dimension tables.

In this paper we will introduce a meta model with the capability to describe these tables and the OLAP queries in an elegant manner. The general requirements for such a formal data model that can serve as a foundation for multidimensional database systems, are similar to those that made the relational model successful, namely the existence of an implementation independent formalism, the separation of structure and contents, and the existence of declarative query language [BSHD98].

The remainder of this paper is structured as follows:

  • Section 2 lists related work.
  • In section 3 we briefly present the quotient relation approach.
  • The modeling of OLAP-data by means of quotient relation is given in section 4.
  • Section 5 presents an illustrative example for the use of quotient relations for multidimensional queries.
  • Section 6 presents our conclusion.
  1. Related works

Recently a number of approaches are proposed in the literature for the formal foundation of multidimensional modeling.

[BSHD98] compares and describes the most important modeling approaches in this area, namely the approaches [AGS97] of Agrawal, Gupta, and Sarawagi, [CT97a,CT97b] of Cabbibo and Torlone, [LW96] of Li and Wang, and [GL97] of Gyssens and Lakshmanan.

In [AGS97] data are organized in one or more hypercubes where the cell values are either n-tuples which represent n-measurement values or a value of the set {0,1} where “0” indicates that for the given combination of dimension values no measurement values exists and “1” indicates that the combination of dimension values exists and can be computed.

The model does not contain any information about the structure of the dimensions. Especially, there is no static construct representing dimension levels. The expressive power of the model is as powerful as the relational algebra since the relational operators can be expressed using the basic operator set of the model [BSHD98].

The approach of Li and Wang [LW96] is characterized by the concept of grouping algebra as an algebraic query language. In Li and Wang’s approach the basic concept is a multidimensional cube which is represented by a number of relations for the dimensions of the cube and for each combination of dimension tuples, an associated scalar data value representing a single fact attribute. [LW96] introduces a multidimensional cube algebra for manipulating such cubes.

Dimension hierarchies and multiple hierarchies can be expressed using the powerful grouping mechanisms. For every fact attribute a separate cube has to be specified [BSHD98].

Gyssens and Lakshmanan [GL97] introduces a model which is based on a set of tables which included restriction concerning foreign key consisting, the key property of dimension tables and the uniqueness of Tid values for keys. The approach clearly discern between parameters (dimension attributes) and measure attributes.

The algebra of [GL97] has an expressive power which is comparable to the relational algebra. Cabbibo and Torlone proposed a formal multidimensional model and a correspond descriptive queries language based on a logical calculus [CT97ab].

The multidimensional data model is defined by the notion of f-tables as basic data structure. The f-table contains the measure value for each cell of the data cube containing a value. Formally, a dimension is defined by a triple , where L is the finite set of level’s which is partially ordered by the relation, (e.g ), and R-UP is a family of function where roll-up functions can be performed (e.g. etc.).

The aim of the paper is to introduce the well known concept of quotient relations which was introduced in [FK77] for OLAP modeling of multidimensional data within the framework of the relational model.

In the next section we will show the relevance of the quotient relations for OLAP. The partitioning operator is very well suited for the construction of n-dimensional cubes. With the use of the partitioning and de-partitioning operator drill-downs and roll-ups can be performed in an easy manner.

  1. Quotient Relations

3.1.Basic Concepts

In this section we will briefly introduce the basic concepts of quotient relation as it was introduced in [FK77].

-Let be (not necessarily distinct) sets. A relation R on is a subset of the Cartesian product ; i.e. .

-Corresponding to each is an Aj called an attribute of R. The attributes of R are distinct.

-Let I denote the collection of attributes of the n-ary relation (The relation R is often represented as R(I)).

-Let R be an n-ary relation and A a subset of I. The relation is said to be an equivalence on R if for tuples

Thus if tuples t and have the same attribute values with respect to the attributes of A they are equivalent under .

is reflexive, symmetric, and transitive, hence an equivalence relation.

As such, we say R is partitioned by A into disjoint blocks of equivalent tuples, each of which is denoted by

-The collection of all blocks, written as

Example

For the given relation R(A,B,C,D) in Table 3.1 the relation is presented in Table 3.2.

Table 3.1

Relation R(A,B,C,D)

R

/

A

/

B

/

C

/

D

1 / 2 / X / X
1 / 2 / X / Y
1 / 3 / Z / W
2 / 1 / X / Y
2 / 2 / X / Z
3 / 3 / X / Y

Table 3.2

Relation

R/A

/

A

/

B

/

C

/

D

1 / 2 / X / X
1 / 2 / X / Y
1 / 3 / Z / W
2 / 1 / X / Y
2 / 2 / X / Z
3 / 3 / X / Y

-Let R(I) be an n-ary relation. For different choices of A, different quotient relations are obtained. Together they constitute the family of quotient relations over R,

.

The partitioning attribute set for every (i.e.) is denoted as .

-It can be easily shown that for a given n-ary relation R, the family is a finite lattice with universal bounds the quotient relation consisting of a single block (i.e. ), and the quotient relation in which each tuple is itself a block.

The partial order in the lattice is given by the degree of refinement. Then for any we have .

-Let R(I) be an n-ary relation with A and B denoting any two sets of attributes of R, and a member of . The partitioning operator (/) induces a partition on a quotient relation, while the de-partitioning operator (*) eliminates a particular partition. The operations must conform the following rules:

(A \ B)

and .

3.2.The Quotient Algebra

Let R and S be n-ary and m-ary relations, respectively. Consider the quotient relations and partitioned by sets A and B respectively (i.e. and ). Further let C and D be other attribute sets of R.

Projection

The projection is defined only if in which case T is partitioned by A. Blocks are treated as unit by the operators, and in case of projection each block is projected on C and possible tuples eliminated.

Restriction

The restriction is defined so that a block [t] of participates in T only if (where ), provided that the underlying domains are compatible. The resultant quotient relation T is still partitioned by A.

Cartesian Product

The Cartesian product, is defined such that for and a block [tij] of T is defined as

[tij] = [ri] [sj],

where denotes the Cartesian product of the conventional algebra; denote the blocks of ; and [s1],[s2],…,[sp], denotes the blocks of . Note that T is partitioned by the union

Set operators like union, set-difference, and intersection are obviously also defined in the quotient algebra. It can easily be shown that the quotient algebra is relationally complete.

It is beyond the scope of this paper to introduce other relevant comparison operators, e.g. comparing some elements of a set with all elements of another set. A detailed description of the quotient algebra approach will be given in [KT98a].

4. The modeling of OLAP-data by means of quotient relation

4.1.Quotient Algebra in a Multidimensional Model

Based on the theory of the quotient algebra, a relation can be partitioned by attributes due to its attributes values. The relation partitioned by these attributes can be viewed as a multidimensional representation of this relation. Furthermore, we will show how the operations of the quotient algebra could be used for handling multidimensional cubes.

As a first example, we introduce a relation R(Year,Item,City,Sales). Let relation R contains the following data:

Table 4.1.

Relation R(Year,Item,City,Sales)

Year
Y /

Item

I / City
Ci / Sales
Sa
1994 / 1 / New York City / 40
1994 / 2 / New York City / 30
1994 / 1 / Los Angeles / 30
1994 / 2 / Los Angeles / 20
1994 / 1 / San Francisco / 10
1994 / 2 / San Francisco / 20
1995 / 3 / New York City / 20
1995 / 4 / New York City / 60
1995 / 2 / Los Angeles / 10
1995 / 4 / Los Angeles / 50
1995 / 2 / San Francisco / 30
1995 / 4 / San Francisco / 40

Example 4.1

Partitioning of relation R by the Year:

This operation denotes that a quotient relation R is partitioned by attribute Year. The result of this partitioning is shown by the following table 4.2. The attributes names of those attributes which are used for partitioning are highlighted by bold letters

Table 4.2

Quotient relation derived by

partitioning of Year

Year

Y

/ City

Ci

/

Item

I

/ Sales

Sa

1994 / New York City(NYC) / 1 / 40
1994 / New York / 2 / 30
1994 / Los Angeles (LA) / 1 / 30
1994 / Los Angeles / 2 / 20
1994 / San Francisco(SF) / 1 / 10
1994 / San Francisco / 2 / 20
1995 / New York City / 3 / 20
1995 / New York City / 1 / 60
1995 / Los Angeles / 2 / 10
1995 / Los Angeles / 4 / 50
1995 / San Francisco / 2 / 30
1995 / San Francisco / 4 / 40

For simplicity reasons we will represent the quotient relation by omitting the explicit repeating of attribute values in the tuples of the quotient relation.

The next table shows a representation of relation in this manner.

Table 4.3

Simplified representation of table 4.2

Year

Y

/ City

Ci

/

Item

I

/ Sales

Sa

NYC / 1 / 40
NYC / 2 / 30
LA / 1 / 30
1994 / LA / 2 / 20
SF / 1 / 10
SF / 2 / 20
NYC / 3 / 20
NYC / 1 / 60
LA / 2 / 10
1995 / LA / 4 / 50
SF / 2 / 30
SF / 4 / 40

Example 4.2

Partitioning of relation of the previous example by City.

Table 4.4

Quotient relation derived by

partitioning of Year and City

Year

Y

/ City

Ci

/

Item

I

/ Sales

Sa

NYC / 1 / 40
2 / 30
LA / 1 / 30
1994 / 2 / 20
SF / 1 / 10
2 / 20
NYC / 3 / 20
1 / 60
LA / 2 / 10
1995 / 4 / 50
SF / 2 / 30
4 / 40

Example 4.3

Partitioning of relation by Item (i.e. ) corresponds to the following three-dimensional cube representation (Figure 4.1).

4.2. Fact & Dimension

Multidimensional model are mostly designed consisting of a fact relation and some dimension relations. Therefore, we will introduce the following definition of a join operator in the quotient algebra between a fact relation and its dimension relations.

Let A be the set and

B be the set

where

and are the attributes in common in the attribute sets A and B.

The natural join between two quotient relations Fact(A) and Dim(B), denoted as , is defined as follows:

Figure 4.1

Multidimensional View of

Example 4.4

Let Fact(A) with A={Year,Item,City,Sales} be the flat fact relation given in Table 4.1.

Let Dim(B) with B={City,State} be the following dimension quotient relation (partitioned by City) given in Table 4.5.

Table 4.5

Quotient Relation B{City,State} which is partitioned by City

City
Ci / State
ST
SF / CA
LA / CA
Dallas / TX

The join of Fact(A) with Dim(B), denoted , has as its resulting quotient relation Table 4.6. It is obvious that this result is partitioned by City.

Table 4.6

Result of

Year

Y

/ City

Ci

/

Item

I

/ Sales

Sa

/

State

ST
1994 / 1 / 10 / CA
1994 / SF / 2 / 20 / CA
1995 / 2 / 30 / CA
1995 / 4 / 40 / CA
1994 / 1 / 30 / CA
1994 / LA / 2 / 20 / CA
1995 / 2 / 10 / CA
1995 / 4 / 50 / CA

Example 4.5

Let Fact2(A) be a fact quotient relation partitioned by the attribute Year (i.e. quot(Fact2(A))={Year}) as given in Table 4.7.

Table 4.7

Quotient relation derived by consecutive partitioning by Year

Year

Y

/ City

Ci

/

Item

I

/ Sales

Sa

NYC / 1 / 40
NYC / 2 / 30
1994 / LA / 1 / 30
LA / 2 / 20
SF / 1 / 10
SF / 2 / 20
NYC / 3 / 20
NYC / 1 / 60
1995 / LA / 2 / 10
LA / 4 / 50
SF / 2 / 30
SF / 4 / 40

Let Dim(B) be a dimension quotient relation with B={City,State} which is already partitioned by the attribute State (i.e. quot(Dim(B))={State}) given in Table 4.8.

Table 4.8

Relation Dim(B){City,State} partitioned by State

City
Ci / State
ST
SF / CA
LA
Dallas / TX
NYC / NY

The join has the following result (Table 4.9):

Table 4.9

Result ofpartitioned by State

Year

Y

/

City

Ci

/

Item

I

/

Sales

Sa

/

State

ST

1994 / LA / 1 / 30
LA / 2 / 20
1995 / LA / 2 / 10
LA / 4 / 50 / CA
1994 / SF / 1 / 10
SF / 2 / 20
1995 / SF / 2 / 30
SF / 4 / 40
1994 / NYC / 1 / 40
NYC / 2 / 30 / NY
1995 / NYC / 3 / 20
NYC / 1 / 60

4.3. Rolling-up & Drilling-down

In this section, we apply the partitioning and de-partitioning operator for the drilling-down and rolling-up operations in the multidimensional model. The quotient algebra supports the partitioning operator (/) used to induce a partition on a quotient relation. On the other hand, it also supports the de-partitioning operator (*) to eliminate a particular partition. As stated above the following quotient algebra operation provides the operation of the partitioning operator.

The operation X denotes that a quotient relation R is partitioned by attribute A, and B. To eliminate the partitioning by B, the de-partitioning operator, *, can be used as follows:

In the multidimensional model, the two operations, rolling-up and drilling-down, can be applied using the partitioning and de-partitioning operator.

4.3.1. Rolling-Up

Rolling-up is an operation used to aggregate data to a higher level in the dimension hierarchy. In the quotient algebra, the partition operator can be used to denote rolling-up operations.

Let R(A) be a quotient relation with quot(R(A)) as its partitioning attributes. The rolling-up operation in the quotient relation R(A)(with quot(R(A))) by aggregating the attribute X (with ) is denoted by:

(i.e. )

with

To illustrate the rolling-up operation in the quotient algebra, we introduce the following quotient relation S(ST,Ci,STO,Sa) with the aggregation hierarchy StoreCityState. We use a sample quotient relation S given by table 4.10.

Table 4.10

Relation

State
ST / City
Ci / Store
STO / Sales
Sa
CA / SF / 4 / 100
CA / SF / 5 / 300
CA / SF / 6 / 500
CA / LA / 2 / 200
CA / LA / 3 / 400
CA / LA / 7 / 600
NY / NY / 1 / 500

Let us assume as above that the lowest level of aggregation in the quotient relation is the store attribute. To roll-up the sales data from store to city, the quotient algebra operation is given as follows:

.

The result of rolling-up the sales data from Store to City is given in the table 4.11.

Remark: It could be useful to introduce derived attributes in quotient relations (e.g. sum, average, maximum, minimum, etc.) which are functionally dependent from the partitioning attribute. Table 4.11 shows the introduction of sums for the partitioning attribute City.

Table 4.11

Rolling-up Store to City and the introduction of derived attributes Sum(Sales,City)

State
ST / City
Ci / Store
STO / Sales / Sum
(Sales,City)
4 / 100
CA / SF / 5 / 300 / 900
6 / 500
2 / 200
CA / LA / 3 / 400 / 1200
7 / 600
NY / NY / 1 / 500 / 500

If a higher summarization of data is required, then a quotient relation S has to be partitioned by this higher level attribute. In our case, a rolling-up to State requires a further partitioning by ST and the quotient algebra operation can be written as follows:

.

The result of this roll up operation is given in table 4.12.

Table 4.12

Rolling-up from City to State and the introduction of the derived attributes Sum(Sales,State)

State
ST / City
Ci / Store
STO / Sales / Sum(
Sales,City) / Sum(
Sales,State)
4 / 100
SF / 5 / 300 / 900
CA / 6 / 500 / 2100
2 / 200
LA / 3 / 400 / 1200
7 / 600
NY / NY / 1 / 500 / 500 / 500

4.3.2. Drilling-down

Drilling-down in a multidimensional model is used to obtain the data of a finer granularity level within the aggregation hierarchy of a given dimension. In the quotient algebra, the de-partitioning operator can be used for this purpose. The de-partitioning operator can be regarded as the inversion of the partitioning operator.

Let R(S) be a quotient relation partitioned by the attribute-set quot(R(S)). The de-partitioning operator (*) reduces the degree of partitioning of a quotient relation. R(S) can be de-partitioned by an attribute set .

/ (S \ X)

with

quot(R(S)*X) := quot(R(S)) \ quot(X).

It is obvious that

and

.

The drilling-down operation on a quotient relation into a relation where the details of are of interest can be performed by the following operation:

,

where is a quotient relation, and A denotes attribute set used for the original partition, and B is the partitioning attribute to be eliminated. In this operation, data is disaggregated from a higher level to a lower level.

We obtain a drill-down of the sales data from State to City, in the following way:

,

(i.e. ).

In this example, the partitioning of the attribute ST is eliminated, and the table obtained by drilling down table 4.12 is table 4.11 again.

If we want to obtain a more detailed view of the data, e.g. at store level, the required drill-down from City to Store can be realized by:

or

The result of this drilling-down operation is again table 4.10.

5. An extended example for the use of the quotient algebra in a Data Warehouse

To show the use of the quotient algebra in a Data Warehouse, we use a small data warehouse schema example. In this example, the Data Warehouse has a sales fact relation (Sa) and three dimension relations. The dimension relations are given by the relation Store (S), Product (P), and Time (T). A sample illustration of the relation Sa, S, P, and T is given in the following tables:

Table 4.13

Relation Sa(T,P,STO,SL)

Time
T / Product
P / Store
STO / Sales
SL
1 / 3 / 1 / 30
2 / 2 / 1 / 20
13 / 3 / 4 / 10
14 / 2 / 4 / 50
397 / 3 / 1 / 40
398 / 3 / 1 / 30
401 / 3 / 4 / 20
402 / 3 / 4 / 60
1 / 5 / 5 / 25
14 / 3 / 5 / 45

Relation S(STO,Ci,ST)

Store

STO

/

City

Ci

/

State

ST

1 /

New York City

/

NY

4

/

Los Angeles

/

CA

5

/

San Francisco

/

CA

7

/

Austin

/

TX

Relation P(P,I,Ct)

Product
P / Item
I / Category
Ct
2 / Lots of Nuts / Food
3 / Sweet tooth / Food
4 / Fizzy Light / Soft drinks
5 / Fizzy Classic / Soft drinks

Relation T(T,M,Y)

Time
T / Month
M / Year
Y
1 / 10 / 1994
2 / 10 / 1994
13 / 10 / 1994
14 / 10 / 1994
397 / 11 / 1995
398 / 11 / 1995
401 / 11 / 1995
402 / 11 / 1995

To demonstrate the relevance of the use of quotient algebra in a data warehouse environment, a representative multidimensional query will be described by means of quotient algebra operations.

Example

Comparison of the sales of a product `sweet tooth` for stores in state CA for the year 1994 and 1995.

The following steps used to process a multidimensional query above using the quotient algebra are: