Operational Data Store
An operational data store (ODS) is a system that stores data from multiple sources. The data maintained in an ODS is near real time and is usually at the transaction level. ODS technology is widely used because transactions are trickle-fed into the store as they happen, so it holds the most up-to-date state of the enterprise, including historical information, as depicted by the figure below (GRAVIC, n.d.).
An ODS does, however, have some advantages and disadvantages, and they are as follows.
Advantage. Integrating systems into an ODS produces positive results because a single ODS supports both tactical and strategic decision making for upper management. The biggest advantage is that corporate IT systems that were once isolated no longer need to interact through enterprise application integration (EAI); each system communicates only with the ODS.
Disadvantage. An ODS has obstacles to reaching these goals. An ODS implementation is very expensive in terms of hardware, application licensing, and development. Much more powerful data-mining engines are required, and rule engines will need to be integrated. Lastly, applications may need to be re-architected to fit into the ODS, which is expensive and very time consuming.
Data Mart
Data marts allow users to access common data much faster by utilizing a technique called dimensional data modeling, which optimizes the data for creating reports. Because the data is prepared in a common format, an end user with little or no technical background can browse the data mart and find the information he or she is looking for. Below is a diagram of a data mart.
But just like the ODS, a data mart has its own advantages and disadvantages, and they are as follows.
Advantage. A data mart can improve a user's response time because it contains raw data focused on a single task, which in turn improves performance. A data mart can also store historical data, which allows users to analyze trends. Unlike an ODS, data marts are not expensive or complex to configure and implement, because few technical issues arise from a data mart.
Disadvantage. Compared with the other two systems discussed in this paper, data marts have their own problems, including functionality, data size, scalability, performance, data access, and consolidation. Because the data in a data mart is broken out by department so each can concentrate on its individual requirements, data mart data can be difficult to access, consolidate, and cleanse.
Enterprise Data Warehouse
An enterprise data warehouse is a system that holds all of the organization's data and makes it accessible across the corporation. What makes enterprise data warehouse technology attractive to organizations is that the data is always available for analysis and planning by upper management. Below is a graph that depicts an enterprise data warehouse (Sheldon, 2008).
An enterprise data warehouse has its own advantages and disadvantages, like its counterparts in this paper.
Advantage. An enterprise data warehouse stores its data in a relational format that enables management to analyze trends across consolidated databases holding more consistent, accurate, and subject-oriented data. The data sources are controlled by business rules that govern which consolidation techniques will be used, code standardization for developers, data cleansing, and how historical data is tracked.
Disadvantage. An enterprise data warehouse is very expensive to implement because it requires concentrating on data modeling and management instead of the users' needs. As a result, an enterprise data warehouse requires more resources, unique support tools, and more time to implement.
The ETL process
The process of taking source data and moving it into the data warehouse is called ETL (Extract, Transform, and Load). An ETL application connects to the source data, then extracts, scrubs, transforms, and loads the data into the data warehouse. The ETL process can take the majority of the project time.
The extraction process will import the data from the Lafleur spreadsheets. The data will be cleaned of duplicates, missing values, and other erroneous data, and then kept in a staging repository until the transformation process picks it up.
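The extraction and cleaning step described above can be sketched in a few lines of Python. This is only an illustration, not the actual Lafleur extraction: it assumes the spreadsheets have been exported to CSV, and the field names in the sample are hypothetical.

```python
import csv
import io

def extract_and_clean(csv_text, required_fields):
    """Extract rows from a CSV export, dropping duplicates and rows
    with missing required fields, as a simple staging step."""
    reader = csv.DictReader(io.StringIO(csv_text))
    seen = set()
    staged = []
    for row in reader:
        # Skip rows that are missing any required field.
        if any(not (row.get(f) or "").strip() for f in required_fields):
            continue
        # Skip exact duplicates that were already staged.
        key = tuple(sorted(row.items()))
        if key in seen:
            continue
        seen.add(key)
        staged.append(row)
    return staged

# Hypothetical sample export with one duplicate and one incomplete row.
sample = """customer,product,amount
Acme,Salmon,100
Acme,Salmon,100
Bates,,50
Cole,Maple Syrup,75
"""
rows = extract_and_clean(sample, ["customer", "product", "amount"])
print(len(rows))  # the duplicate and the incomplete row are dropped
```

In a real project the staged rows would be written to the repository (a staging table or file) rather than returned in memory.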
In the transformation process, the data will be converted from its current data type, length, and composition, depending on how the ETL software is configured to modify the data. Transformation takes place through both record-level and field-level functions. Record-level functions use selection and join operations to manipulate and combine data; they also normalize and aggregate it. Field-level functions convert data from the source format to the target format on single or multiple fields; transformations can combine fields or separate one field into multiple fields.
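The field-level transformations just described can be sketched as a small Python function. The field names and formats here are assumptions for illustration, not the actual Lafleur schema.

```python
def transform_record(src):
    """Field-level transformations on one extracted record:
    splitting a field, reformatting a value, converting a type,
    and combining two fields (all field names are hypothetical)."""
    first, _, last = src["full_name"].partition(" ")
    return {
        "first_name": first,  # split: one source field -> two target fields
        "last_name": last,
        "order_date": src["date"].replace("/", "-"),  # reformat a date string
        "amount_cents": int(round(float(src["amount"]) * 100)),  # type conversion
        "source_key": f'{src["region"]}-{src["order_id"]}',  # combine two fields
    }

rec = transform_record({
    "full_name": "Jane Doe",
    "date": "2012/05/01",
    "amount": "19.99",
    "region": "WEST",
    "order_id": "1042",
})
```

Record-level functions (joins, aggregation) would then operate across many such transformed records before loading.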
Reporting Strategies
There are many different reporting strategies to consider. The choice depends on which end users will be accessing the data warehouse and the information they need to pull. There are ad-hoc, static, parameter-driven (drill-down and drill-across), dashboard, and analysis reports to consider.
Ad-hoc Reporting
Ad-hoc reporting allows end users to pick and choose what information they need and build a report from it. To keep processing and memory usage down, the DBA usually restricts such queries to a searchable time range (e.g., 1 day or 30 days).
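The time-range restriction mentioned above can be enforced where the ad-hoc query is built. This is a minimal sketch under assumed table and column names; the 30-day cap stands in for whatever limit the DBA sets.

```python
from datetime import date

MAX_RANGE_DAYS = 30  # assumed DBA-imposed cap on ad-hoc queries

def build_adhoc_query(columns, table, start, end):
    """Build a parameterized ad-hoc query, rejecting date ranges wider
    than the cap so one report cannot scan the whole warehouse."""
    if (end - start).days > MAX_RANGE_DAYS:
        raise ValueError(f"ad-hoc queries are limited to {MAX_RANGE_DAYS} days")
    cols = ", ".join(columns)
    sql = f"SELECT {cols} FROM {table} WHERE order_date BETWEEN ? AND ?"
    return sql, (start.isoformat(), end.isoformat())

sql, params = build_adhoc_query(
    ["customer", "amount"], "sales",
    date(2012, 5, 1), date(2012, 5, 15))
```

The query text and parameters are kept separate so the database driver can bind them safely.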
Static Reporting
According to Microsoft's "Static reports" (2005) website, "A static report is a report that is run immediately upon request and then stored, with the data, in the Data Warehouse" (para. 1). This type of reporting works well when users pull large amounts of data that must be referenced by multiple users or over a time range, thus reducing the number of queries that need to be run against the data warehouse.
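The run-once, store-with-the-data behavior described in that definition can be sketched as follows. This is an assumption-laden illustration: `run_query` stands in for whatever actually hits the warehouse, and the in-memory dictionary stands in for the warehouse's report storage.

```python
import time

class StaticReportStore:
    """Minimal sketch of static reporting: run a report once on first
    request, store the result with the data, and serve later requests
    from the stored copy instead of re-querying the warehouse."""

    def __init__(self, run_query):
        self._run_query = run_query  # callable that queries the warehouse
        self._stored = {}

    def get(self, report_id):
        if report_id not in self._stored:
            # First request: run immediately, then store with the data.
            self._stored[report_id] = {
                "generated_at": time.time(),
                "data": self._run_query(report_id),
            }
        return self._stored[report_id]["data"]

calls = []
def fake_query(report_id):
    calls.append(report_id)
    return [("Acme", 100), ("Cole", 75)]

store = StaticReportStore(fake_query)
first = store.get("monthly_sales")
second = store.get("monthly_sales")  # served from storage; no second query
```

Every user after the first gets the stored report, which is exactly how static reporting reduces load on the warehouse.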
Parameter-driven Reporting
Commonly referred to as "standard reports," parameter-driven reports use a predefined format and accept parameters supplied by the end user. These are usually the reports most users look at every day.
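A parameter-driven report can be sketched as a fixed query whose parameters are the only part the end user supplies. The table, columns, and sample data below are hypothetical, using an in-memory SQLite database in place of the warehouse.

```python
import sqlite3

# Predefined (standard) report: the format is fixed, only parameters vary.
SALES_BY_REGION = """
    SELECT region, SUM(amount) AS total
    FROM sales
    WHERE region = :region AND order_date >= :since
    GROUP BY region
"""

def run_standard_report(conn, region, since):
    """Run the predefined report with end-user supplied parameters."""
    return conn.execute(
        SALES_BY_REGION, {"region": region, "since": since}).fetchall()

# Hypothetical warehouse slice for demonstration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, order_date TEXT, amount REAL)")
conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", [
    ("WEST", "2012-05-01", 100.0),
    ("WEST", "2012-05-10", 50.0),
    ("EAST", "2012-05-02", 75.0),
])
rows = run_standard_report(conn, "WEST", "2012-05-01")  # -> [("WEST", 150.0)]
```

Drill-down and drill-across variants would follow the same pattern with additional grouping columns or joined tables.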
Dashboard Reporting
Like the dashboard in your car provides an at-a-glance overview of what is going on with your car, business dashboards provide a simple, overall view of key performance indicators (KPIs). A dashboard should be simple to use, contain only minimal distractions, and provide meaningful information.
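The KPIs a dashboard surfaces reduce to simple aggregates over warehouse data. The metrics and inputs below are illustrative assumptions, not a prescribed KPI set.

```python
def dashboard_kpis(orders):
    """Compute a few example KPIs for a simple dashboard view.
    `orders` is a list of (customer, amount) pairs."""
    total = sum(amount for _, amount in orders)
    count = len(orders)
    return {
        "total_revenue": total,
        "order_count": count,
        "avg_order_value": round(total / count, 2) if count else 0.0,
    }

kpis = dashboard_kpis([("Acme", 100.0), ("Cole", 75.0), ("Bates", 50.0)])
```

A dashboard front end would then render these few numbers prominently rather than exposing the underlying detail rows.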
Analysis Reporting
Analysis reports compare one set of data with another to identify trends, forecast, and support critical business decisions. These are usually custom reports based on user requirements, and their designs will need to be coordinated with the end users.
Data Mining
Data mining, also referred to as "knowledge discovery," is the process of analyzing data from different perspectives and summarizing it into useful information. This information helps an organization increase revenue, cut costs, or both. In a nutshell, data mining technology is focused more on consumer-oriented companies.
But before a data mining system can be implemented, it needs to be well thought out for its efficiency and practicality for the organization. There are several components that need to be completed and considered.
Data Gathering
The very first step in building a productive data mining system is to gather the data. Many organizations have performed this task on other projects, but unlike other systems where you gather all of your data, in a data mining system you need to ensure the data is critical to the business. Once you identify and locate the organization's critical data, it needs to be refined and prepared for the data mining process.
Algorithm
The next phase after data gathering is choosing which data mining algorithm to use. The choice depends on the type of data, the problem you are trying to solve, and the tools that will be used. Given the growing technology and users' need for easy-to-read reports, the classification algorithm will be used.
Classification. This algorithm allows a wider variety of data to be processed, and its outputs are much easier to interpret as well.
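To make the classification idea concrete, here is a deliberately tiny sketch using a nearest-neighbor rule, one of the simplest classification techniques. The customer features and segment labels are invented for illustration; a real system would use a dedicated mining tool rather than hand-rolled code.

```python
def classify(sample, training):
    """Label a new record with the class of its closest training record
    (1-nearest-neighbor). Features are numeric tuples."""
    def dist(a, b):
        # Squared Euclidean distance between two feature tuples.
        return sum((x - y) ** 2 for x, y in zip(a, b))
    label, _ = min(
        ((lbl, dist(sample, feats)) for feats, lbl in training),
        key=lambda pair: pair[1])
    return label

# Hypothetical training data: (orders_per_year, avg_spend) -> segment
training = [
    ((2, 20.0), "occasional"),
    ((3, 25.0), "occasional"),
    ((24, 80.0), "frequent"),
    ((30, 95.0), "frequent"),
]
segment = classify((26, 85.0), training)  # -> "frequent"
```

The output is a plain label per record, which is what makes classification results easy for non-technical users to read.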
Infrastructure
The infrastructure plays a big role in a data mining system. Though you can configure a data mining application on various platforms, the client/server infrastructure is the chosen path because of its maturity and reliability, as depicted below.
Client/server technology brings reliability, and because it is a very mature technology, the skill sets for it are widely available. This gives the company peace of mind in employing the right technical staff for the job. Client/server also offers multiple configuration paths.
Desktop. The application is loaded on each user's desktop without having to keep a copy of the database at each desktop. CPU usage occurs primarily at the desktop, not the server: the server provides data aggregated to an extent, and the application loaded at the desktop does the rest.
Web. In today's technical world, interfacing with the server is as easy as calling a web site and requesting the information you seek, and data mining is no exception. The server can be configured with IIS to support web access.
References
University of Phoenix. (n.d.). Lafleur Trading Company. Retrieved from https://ecampus.phoenix.edu/secure/aapd/CIST/VOP/Business/Lafleur/internet/index.asp
GRAVIC. (n.d.). Part 7 – Operational Data Store (ODS): The Next Evolutionary Step. Retrieved from http://www.gravic.com/shadowbase/uses/theoperationalstore.html
Sheldon, R. (2008). SQL Server Data Warehouse Cribsheet. Retrieved from https://www.simple-talk.com/sql/learn-sql-server/sql-server-data-warehouse-cribsheet/
Static reports. (2005). Retrieved from http://msdn.microsoft.com/en-us/library/ee784360%28v=cs.20%29.aspx