Additional Notes on Data Warehouses

Here are some additional notes to clarify/crystallize what we have discussed thus far. Please read carefully and bring questions to my office hours/class.

Two types of database loads:

OLTP:

On-Line Transaction Processing

Lots of short, read/write transactions

Small, simple queries

Frequent updates

OLAP:

On-Line Analytical Processing

Long, read-only transactions

Huge, complex queries

Rare updates

Data warehousing:Bring data from operational (OLTP) sources into a central warehouse to do OLAP

A grossly simplified example of a star schema:

Dimension tables:

Stores(StoreID, city, state)

Items(ItemID, name, description)

Custs(CustID, name, address)

Fact table:

Sales(StoreID, ItemID, CustID, price)

Example of a Star join:

SELECT *

FROM

Sales, Stores, Items, Custs

WHERE Sales.StoreID = Stores.StoreID

AND Sales.ItemID = Items.ItemID AND Sales.CustID = Custs.CustID;

A simple OLAP query: total sales for each store in California

SELECT

Sales.StoreID, SUM(price)

FROM

Sales, Stores

WHERE Sales.StoreID = Stores.StoreID

AND Stores.state = 'CA'

GROUP BY Sales.StoreID;

Idea: materialize views to speed up query

V(store,item) =see exact syntax using insert in lecture handouts

SELECT

StoreID, ItemID, SUM(price) AS total

FROM

Sales

GROUP BY StoreID, ItemID;

Rewrite the query using V(store,item)

Alternately:

V2(store) =

SELECT

StoreID, SUM(price) AS total

FROM Sales

GROUP BY StoreID;

Rewrite the query using V2(store). Which is faster/more efficient to answer the query?

Problem: which views to materialize?

Rules of Thumb:

Views with more GROUPBY attributes  Bigger, more detailed, benefit more queries

Views with fewer GROUPBY Attributes  Smaller, more summarized, benefit queries more

Additional notes on tuples, data cubes and multidimensional representation.

(color codes , meaning  tuple representation (time in quarters, product,country,Tsales)

time, product, country are dimension attributes, Tsales is total sales

White squares (basic fact table) - (q, p, c, sales)

Green squares total annual sales grouped by product and country. (*, p, c, Tsales)

Dark Green squares total annual sales grouped by product  (*, p, *, Tsales)

Orange squares total annual sales grouped by quarter and country.  (q, *, c, Tsales)

Dark orange squares total annual sales grouped by quarter.  (q, *, *, Tsales)

Grey total annual sales grouped by country.  (*, *, c, Tsales)

Other pair (quarter and product) not shown (need to pivot).  (q, *, p, Tsales)

Dark blue (all sales) (*, *, *, sales)