Data Warehousing – FAQ
1. What is a Data Warehouse?
A data warehouse is a consolidated database designed to give business decision makers instant access to information. The warehouse copies its data from existing systems like order entry, general ledger, human resources and stores it for use by executives rather than programmers. Data warehouse users use special software that allows them to create and access information when they need it, as opposed to a reporting schedule defined by the information systems (IS) department.
2. What do you mean by Data Warehousing?
Data warehousing can be defined as the Process of extracting data from an organizations OLTP database and external data sources, integrating the extracted data, aggregating, filtering, summarizing, standardizing, transforming and cleansing the data that has been extracted, before storing it in a consolidated database that is known as a data warehouse. This database will end up being the only source, which the management will access and retrieve information for decision making. In simple terms, the process of organizing information in such a way as to create databased knowledge is called Data Warehousing. The software products that present this knowledge to users are called Business Intelligence Tools.
The goal of business intelligence and data warehousing - changing data into information and knowledge.
3. Why do you need a Data Warehouse?
A Data Warehouse is a collection of corporate information, derived directly from the operational systems and some external data sources. Its specific purpose is to support business decisions, not business operations. Some of the limitations of the OLTP systems, which create the need for a data warehouse, are:
- Multiple source systems collecting similar or related information
- Difficulty in accessing this rich, filtered and dynamic information
- Unable to perform ad-hoc reporting
In a nutshell, an OLTP system supports only business operations, and not business decisions. Further, a Data Warehouse answers questions such as the following:
Your company’s Top/Bottom 10 customers – By Product line, By Region, By Gross sales, By Profitability, By Quantity of orders
Your company’s Top/Bottom 10 Salespeople – By Year/Quarter/Month, By Region, By Gross sales, By Profitability, By Volume of sales
Your company’s Top/Bottom Products – By Gross sales, By Profitability, By Season/Quarter/Month, By Number of repeat orders
4. What is the difference between a Warehouse and an OLTP system?
Typical relational databases are designed for on-line transactional processing (OLTP) and do not meet the requirements for effective on-line analytical processing (OLAP). As a result, data warehouses are designed differently than traditional relational databases.
Warehouses are Time Referenced, Subject-Oriented, Non-volatile (read only) and Integrated.
OLTP databases are designed to maintain atomicity, consistency and integrity (the "ACID" tests). Since a data warehouse is not updated, these constraints are relaxed.
5. What is the difference between a Operational Data Store and Warehouse?
An ODS (Operational Data Store) is an integrated database of operational data. Its sources include legacy systems and it contains current or near term data. An ODS may contain 30 to 90 days of information.
A warehouse typically contains years of data (Time Referenced). Data warehouses group data by subject rather than by activity (subject-oriented). Other properties are: Non-volatile (read only) and Integrated.
6. What are the distinct characteristics of a Data Warehouse?
Characteristics of a Data Warehouse
A. Subject-oriented
Focused on the subject data, to meet strategic business goals
Not an enterprise-wide store of all OLTP data
B. Integrated
Multiple source systems
Common terminology
Single domain for a given data item
Consistently updated
C. Time-variant
Data in the warehouse is associated with a point in time
Essential for time-series historical analysis and forecasting
D. Non-volatile
Once loaded, data does not change
The answer to a given historical question will not change based on the time the question is posed to the warehouse
E. Strategic Analysis Tool
Historical analysis
Forecasting analysis
Cost reduction
Evaluation of performance to corporate strategic goals
Discovery of revenue opportunities
7. What are the business benefits of data warehousing?
Business benefits of a Data Warehouse
A. Immediate information delivery
Data warehouses shrink the length of time it takes between when business events occurrence and executive alert. For example, in many corporations, sales reports are printed once a month - about a week after the end of each month. Thus, the June sales reports are delivered during the first week in July. Using a warehouse, those same reports are available on a daily basis. Given this data delivery time compression, business decision makers can exploit opportunities that they would otherwise miss.
B. Data integration from across, and even outside, the organization
To provide a complete picture, warehouses typically combine data from multiple sources such as a company's order entry and warranty systems. Thus, with a warehouse, it may be possible to track all interactions a company has with each customer - from that customer's first inquiry, through the terms of their purchase all the way through any warranty or service interactions. This makes it possible for managers to have answers to questions like, "Is there a correlation between where a customer buys our product and the amount typically spent in supporting that customer?"
C. Future vision from historical trends
Effective business analysis frequently includes trend and seasonality analysis. To support this, warehouses typically contain multiple years of data.
D. Tools for looking at data in new ways
Instead of paper reports, warehouses give users tools for looking at data differently. They also allow those users to manipulate their data. There are times when a color-coded map speaks volumes over a simple paper report. An interactive table that allows the user to drill down into detail data with the click of a mouse can answer questions that might take months to answer in a traditional system.
E. Freedom from IS department resource limitations
One of the problems with computer systems is that they usually require computer experts to use them. When a report is needed, the requesting manager calls the IS department. IS then assigns a programmer to write a program to produce the report. The report can be created in a few days or, in extreme cases, in over a year. With a warehouse, users create most of their reports themselves. Thus, if a manager needs a report for a meeting in half an hour, they, or their assistant, can create that report in a matter of minutes.
8. What can we do with a warehouse that we can't do today?
Organizations turn to warehouses to answer a limitless variety of questions and in almost all fields. Data warehouse applications include:
Sales Analysis
Determine "moment in time" product sales to make vital pricing and distribution decisions. Analyze past product sales to determine success or failure attributes
Evaluate successful products and determine key success factors. Use corporate data to understand the margin as well as the revenue implications of a decision. Rapidly identify a preferred customer profile based on revenue and margin. Quickly isolate past preferred customers who no longer buy. Identify daily where product is in the manufacturing and distribution pipeline. Instantly determine which salespeople are performing, on both a revenue and margin basis, and which are behind.
Financial Analysis
Compare actual expenses to budgets on an annual, monthly and month-to-date basis
Review past cash flow trends and forecast future needs. Identify and analyze key expense generators Instantly generate a current set of key financial ratios and indicators
Receive near-real-time, interactive financial statements. Human Resource Analysis
Evaluate trends in benefit program use. Identify the wage and benefits costs to determine company-wide variation. Review compliance levels for EEOC and other regulated activities.
Customer Analysis
Identify the ‘Buyer Behavior’
Analyze the Demographic and Psychographics patterns of Customers. Identify the heavy users of your product
Other Areas
Warehouses have also been applied to areas such as: logistics, inventory, purchasing, detailed transaction analysis and load balancing.
9. What are the Risks and Pitfalls involved in building a data warehouse?
Risks and Pitfalls
Target the wrong or an incomplete data set
Lack of end-user involvement
Data Extraction, Transformation and cleansing account for up to 80% of total project effort
A clear dimensional model has to be designed
10. How can you estimate the ROI of a warehouse?
Many companies make the mistake of trying to put a value on their data warehouse. The truth is the value of the warehouse stems from the new and changed business processes that it enables. Thus, in developing a warehouse it is extremely important to envision who is going to use the warehouse, how they will use it and why using it will improve upon the current process. It is not unusual for Well-considered and designed warehouses to generate first year ROIs of 100 to 400 percent. The value of the new business processes can be determined by comparing the costs with the expected benefits.
Hardware, software, development personnel and consultant costs
Operational costs like ongoing systems maintenance. Benefits
Benefits typically fall into two categories, new revenue and reduced costs.
Added Revenue
Will the new (business objective) process generate new customers (what is the estimated value?)
Will the new (business objective) process increase the buying propensity of existing customers (by how much?)
Is the new process necessary to ensure that the competition doesn't offer a demanded service that you can't match?
Reduced costs
What costs of current systems will be eliminated?
Is the new process intended to make some operation more efficient? If so, how and what is the dollar value?
11. What can't a warehouse do?
Warehouses gather and report data that already exists. Therefore, a warehouse cannot create more data. If a company wishes to analyze its customers by zip code but addresses are not captured by the company's systems, a warehouse will not solve the problem unless some method is found to gather this address data. In addition, if a company's data is 'dirty' in that current systems are not recording the correct information or optional fields, the warehouse will not correct that data. The warehouse is useful in identifying where data problems exist but corrections to those problems must be made in the systems that capture the data.
12. How do you approach a Data Warehousing project?
Plan! Plan! Plan!
Resources
Experienced Data Warehouse Architect
Programmers familiar with Legacy systems
Users familiar with Legacy data
Code Generator
How long will it take?
3 to 6 months per source system
13. What is a Data Mart?
Data marts can be viewed as a more specialized data warehouse. The size of a data mart is much smaller than that of a data warehouse and therefore the time required to implement a data mart is less than the time required to implement a data warehouse. The cost of implementing a data mart is less than that of a data warehouse.
14. Data Warehouse vs Data Mart
Data Warehouse vs Data Mart
Data Warehouse is larger
Data Mart is smaller
Data Warehouse has information about multiple subject areas
Data Mart is restricted (usually) to a single subject area
Data Warehouse may be used to feed one (or more) data marts
Data Mart may feed Data Warehouse
Data Warehouse takes longer time to build.
Data Marts are easier to build.
Involves large sum of money to build
Comparatively cheaper to build
Takes long time to realize ROI
Quicker and higher ROI
15. What are the two different approaches to building a data warehouse?
The Top-Down Approach
This approach requires building of an enterprise-wide data warehouse, with one or more underlying dependent data marts. This approach creates a data warehouse that gives end-users the ability to have an enterprise-wide perspective of business operations, issues and potential opportunities for business development. Also, it minimizes integration problems between data warehouse projects.
However, history has shown that IT-driven top-down projects often lead to long delivery schedules, high capital investments, cost overruns, and poor end-user functionality. Further, data warehousing projects, done using this approach, have a long delivery time and are not fast enough to satisfy the demand by organizations for solutions that enable them to respond rapidly to changing business conditions and quickly capitalize on the new revenue opportunities.
The Bottom-Up Approach
The alternative approach for an enterprise-wide data warehouse top-down development is to build independent data marts first, and then construct an enterprise data warehouse. The limitation perceived by the experts in this approach is that, an uncontrolled proliferation of independent data marts would result in integration problems between these data marts and the future enterprise data warehouse.
Thus, it is clear that both the top-down and bottom-up approaches to data warehouse development have their strengths and weaknesses. Hence, the ideal approach would be the one, which offers the low cost and rapid Return on Investment advantages of the data mart approach, without the problems of data integration in the future. To achieve this, the design and the development of the data marts must be managed and must be based on a common shared information model of data warehouse decision-processing requirements.
16. What are the advantages and disadvantages of top down approach?
Advantages:
1.Planned, integrated multi-tier solution.
2.Single, Central Meta Data Repository the enterprise data warehouse provides a central Meta data repository for the system, making the maintenance of the system much less complex than multiple repositories.
3.Centralized Rules and Control the "top-down" approach ensures that there is only one set of data extraction, scrubbing and integration jobs or processes to monitor and maintain.
Disadvantages:
1.costly and takes a longer time
2.built in an iterative manner, subject area by subject area
3.High Exposure/Risk Enterprise data warehouses are inherently high exposure ventures.
17. What are the advantages and disadvantages of bottom up approach?
Advantages:
1.Quick results
2.Immediate and good ROI
3.Data marts are built to a tightly focused scope, they can usually be brought to production status in six to nine months.
4.Tools, technologies, consultants and vendors can be subjected to a "test and toss" process. Those that work can be retained for the next step; those that don't can be replaced, with a natural breaking point between incremental steps.
Disadvantages:
1.These non-architected data marts become legacy data marts, or LegaMarts, and are infinitely challenging to integrate at a later date. LegaMarts are part of the problem, not part of the solution.
2.This can lead to management challenges in trying to coordinate the efforts and resources of multiple teams, especially in the areas of business rules and semantics
3.Eventually yields a disintegrated warehouse
18. What is Metadata?
Meta data is data about data. Meta data will be used by those who are going to administer and maintain the data warehouse as well as by those who are going to use the information stored in the data warehouse. There are two kinds of metadata that will come along with a data warehouse. They are technical metadata and business metadata. People who administer and maintain the data warehouse will use technical metadata and those who are going to use the data warehouse will use business metadata.
19. What do you mean by Dimensional Modeling?
A dimensional data model (DDM) represents business metrics as data structures. This is a form of E-R modeling that is limited to working with "measurement" types of subjects and has a prescribed set of allowable structures.
20. What is Star Schema? Why does one design this way?
A star schema is a set of tables comprised of a single, central fact table surrounded by de-normalized dimensions. Each dimension is represented in a single table. Star schema implement dimensional data structures with de-normalized dimensions. Snowflake schemas are an alternative to star schema.
Why?
It allows for the highest level of flexibility of metadata
Low maintenance as the data warehouse matures
Best possible performance
21. What is Snowflake Schema?
A snowflake schema is a set of tables comprised of a single, central fact table surrounded by normalized dimension hierarchies. Each dimension level is represented in a table. Snowflake schema implement dimensional data structures with fully normalized dimensions. Star schemas are an alternative to snowflake schema.