Data Warehouse- A Concept

UPENDRA SINGH RATHORE

DBA, SSG VLC,

O/o the Accountant General(A&E),

Rajasthan, Jaipur

Email:

Data warehouse is a concept I am trying to put it before you as I thought one day we would use this concept because our data is increasing day by day and any organization may be Government/Non Government can use this data for decision-making.

The ‘data warehouse in the client/server environment is the repository of data for decision support system.’

W.H. Inman

In todays computing environment we expect that data is to be doubled at least once a year because modernization/computerization creates data. Basically OLTP (Online Transaction Processing) creating all this data and in Our VLC system is a mix of OLTP and Batch processing.

The client/server application falls into two categories. Decision support system (DSS) and Online Transaction Processing (OLTP). These two client/server categories serve different type of Business solution.

In general all the data collected by an OLTP system is of direct use to the application. The people creating this data, understands and know how to use it to solve their immediate day-to-day problem

But what happen if somebody outside the OLTP group wants this data? How do they know what data is available? How do they access? What about security? Still there are some more questions to see. Outsiders are the people those who need data for analysis that can help them make better decision. Business people making strategic decisions – pricing, market analysis that depends on availability of timely and accurate data. If they can’t get to our data and we can’t get to theirs, then everybody loses.

DSS(Decision Support System)

Decision support system is used to analyze data and create reports. A successful DSS must provide the user with flexible access to data and the tools to manipulate and present that data in all kinds of reports formats. DSS are not generally time critical and can tolerate slower response time.

The Data Warehouse

Bill Inman is father of data warehouse concept introduced in 1981. He writes that “ one cornerstone of client/server applications is the notion of the difference between and separation of operational and decision support processing.”

Bill Inman defines a warehouse as a separate database for decision support, which typically contains vast amount of information. Richard Hackthorn defines a warehouse as “ a collection of data objects that have been intentioned for distribution to a business community.” Data warehouse gather data from multiple sources under a unified schema at a single site. In general warehouse is an intelligent store of data that can manage and aggregate information from many sources, distribute it as and when necessary.

Elements of data warehousing

1 The Data replication manager

Manages the copying and distribution of data across databases as defined by the information users. The users defines the data that needs to be copied, the source and destination platforms. Update and data transforms. Refresh involves copying over the entire data source; update only generates the changes.

  1. The informational database

Is Database that organizes and stores copies of data from multiple data sources? We can assume a decision support server that transform, aggregates and add values to data from various sources. It also stores metadata, System level and semantic level metadata.

  1. The information directory

It is an amalgam of functions of a technical directory business directory and information navigator. Its main function is to help the information users to find out what data is available on the different databases. What format it is in and how to access it. It also helps the DBAs to manage the data warehouse. The information directory gets its metadata by discovering which databases are on the network and the querying their metadata repositories. DBA use the information directory to access system level metadata keep track of data sources, data targets, cleanup rules, transformation rules and details about predefined rules and reports.

4.Dos tool support

Is provided via SQL most vendors support ODBC and some other protocol. In summary DBA must be able to assemble data from different sources, replicate it, clean it , store it, catalog it and the make it available to DSS tools. Data mining is one of them that refer loosely to finding relevant information from a large volume of data. Data mining attempts to discover pre defined/user defined rules & pattern automatically from data.

Architecture of a typical data warehouse

Though it seems easy to implement but there is the issues to be addressed in building a warehouse are the following:-

a)When and how to gather data for storing data.

The data transmit information either daily or periodically etc for new data to sources.

b)What schema to use

Data sources may likely have different schemas. The task of a warehouse is to perform schema integration and to convert data to the integrated schema.

c)How to propagate updates

Though updates at the source propagates to destination visually not a problem for DSS but we have to keep in mind when building a warehouse.

d)What to summarize

As we know raw data may be too large to store for a warehouse however we can answer many queries by managing just summary data obtained by aggregation on a relation, rather them maintaining the entire relation.

Last words -though the topic is very vast and it is quite impossible to summarize in one short topic. If you have some comments about this topic please write in thread/Paper so the debate may be continue.

Upendra Singh Rathore

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Data Warehouse – A Concept( 1 )