Chapter 3: Dimensional Modeling

Importance data modelling

Visualization of the business world:Model is an abstraction and reflection of the real world. Modeling gives us the ability to visualize what we cannot yet realize. It is the same with data modeling. Traditionally, data modelers have made use of the ER diagram, developed as part of the data modeling process, as a communication media with the business end users. The ER diagram is a tool that can help in the analysis of business requirements and in the design of the resulting data structure. Dimensional modeling gives us an improved capability to visualize the very abstract questions that the business end users are required to answer. Utilizing dimensional modeling, end users can easily understand and navigate the data structure and fully exploit the data.

Data Modeling Techniques

Two data modeling techniques that are relevant in a data warehousing environment are ER modeling and dimensional modeling. ER modeling produces a data model of the specific area of interest, using two basic concepts: entities and the relationships between those entities. Detailed ER models also contain attributes, which can be properties of either the entities or the relationships. The ER model is an abstraction tool because it can be used to understand and simplify the ambiguous data relationships in the business world and complex systems environments.

Dimensional modeling is powerful in representing the requirements of the business user in the context of database tables. Both ER and dimensional modeling can be used to create an abstract model of a specific subject. However, each has its own limited set of modeling concepts and associated notation conventions. Consequently, the techniques look different, and they are indeed different in terms of semantic representation.

Dimensional modeling is a technique for conceptualizing and visualizing data models as a set of measures that are described by common aspects of the business. It is especially useful for summarizing and rearranging the data and presenting views of the data to support data analysis. Dimensional modeling focuses on numeric data, such as values, counts, weights, balances, and occurrences.

Dimensional Models

Dimensional model is a de-normalized relational model which is made up of tables with attributes. The relationships are defined by keys and foreign keys. The data is organized for ease of understanding and ease of reporting rather than update

Dimensional modeling has three basic concepts:

· Facts

· Dimensions

· Measures (variables)

Fact

A fact is a collection of related data items, consisting of measures and contextdata. Each fact typically represents a business item, a business transaction, oran event that can be used in analyzing the business or business processes.In a data warehouse, facts are implemented in the core tables in which all of thenumeric data is stored.

Dimension

A dimension is a collection of members or units of the same type of views. In adiagram, a dimension is usually represented by an axis. In a dimensionalmodel, every data point in the fact table is associated with one and only onemember from each of the multiple dimensions. That is, dimensions determinethe contextual background for the facts. Many analytical processes are used toquantify the impact of dimensions on the facts.Dimensions are the parameters over which we want to perform Online AnalyticalProcessing (OLAP). For example, in a database for analyzing all sales ofproducts, common dimensions could be:

· Time

· Location/region

· Customers

· Salesperson

· Scenarios such as actual, budgeted, or estimated numbers

Dimensions can usually be mapped to nonnumeric, informative entities such asbranch or employee.

Dimension Members: A dimension contains many dimension members. Adimension member is a distinct name or identifier used to determine a dataitem¢s position. For example, all months, quarters, and years make up a timedimension, and all cities, regions, and countries make up a geographydimension.

Dimension Hierarchies: We can arrange the members of a dimension into oneor more hierarchies. Each hierarchy can also have multiple hierarchy levels.Every member of a dimension does not locate on one hierarchy structure.A good example to consider is the time dimension hierarchy as shown inFigure 14. The reason we define two hierarchies for time dimension is becausea week can span two months, quarters, and higher levels. Therefore, weekscannot be added up to equal a month, and so forth. If there is no practicalbenefit in analyzing the data on a weekly basis, you would not need to defineanother hierarchy for week.

Measure

A measure is a numeric attribute of a fact, representing the performance orbehavior of the business relative to the dimensions. The actual numbers arecalled as variables. For example, measures are the sales in money, the salesvolume, the quantity supplied, the supply cost, the transaction amount, and soforth. A measure is determined by combinations of the members of thedimensions and is located on facts.

Practitioners of DM have approached developing a logical data model by selecting the business process to be modeled and then deciding what each individual low level record in the "fact table" (the grain of the fact table) will mean. The fact table is the focus of dimensional analysis. It is the table dimensional queries segment in the process of producing solution sets. The criteria for segmentation are contained in one or more "dimension tables" whose single part primary keys become foreign keys of the related fact table in DM designs. The foreign keys in a related fact table constitute a multi-part primary key for that fact table, which, in turn, expresses

a many-to-many relationship. In a DM further, the grain of the fact table is usually a quantitative measurement of the outcome of the business process being analyzed.

Since a dimensional model is visually represented as a fact table surrounded by dimension tables, it is frequently called a star schema. Figure below is an illustration of a DM/star schema using a student academic fact database.

A dimensional model of a retail grocery shop

Possible date attributes
SQL date / Fiscal week
Full date description / Year
Day of week / Month
Day of month / Fiscal year
Day of calendar year / Holiday ?
Day of fiscal year / Day of holiday
Month of calendar year / Weekday ?
Month of fiscal year / Selling season
Calendar Quarter / Major event
Fiscal Quarter / etc.
Possible Store Attributes
Store Name / District
Store Number / Region
Street address / Floor plan type
City / Photo processing type
County / Financial service type
State / Square footage
Zip / Selling square footage
Manager / First open date
Last remodel date / etc.
Possible Promotion Attributes
Promotion name / Begin date
Price reduction type / End date
Promotion media / Coupon type
Ad type / Promotion cost
Ad agency / etc.

Visualization of a Dimensional Model

The most popular way of visualizing a dimensional model is to draw a cube. Usually a dimensionalmodel consists of more than three dimensions and is referred to as a hypercube.In the Figure, the measurement is the volume of production, which isdetermined by the combination of three dimensions: location, product, and time.The location dimension and product dimension have their own two levels ofhierarchy. For example, the location dimension has the region level and plantlevel.