Special interest activity
ITK-478
Market Basket Analysis and its prominence in strategic decisions in current day business
Sowjanya Alaparthi
Table of Contents
- Introduction pg: 3
- Market Basket Analysis activity pg: 3
- Apriori algorithm pg: 5
- Designing the product dimension pg: 12
- Conclusion pg: 16
- References pg: 17
- Tools pg: 17
- Proposed deliverables pg: 17
9.Special Interest Activity Task Plan Summary pg: 18
Introduction[6]:
Market Basket Analysis is a mathematical modeling technique based upon the theory that if you buy a certain group of items, you are likely to buy another group of items. It is used to analyze the customer purchasing behavior and helps in increasing the sales and maintain inventory. There are several methods of performing this analysis, Association analysis being one of them. In Association analysis some rules are framed depending upon the type of business and data is mined for meaning.Most people are aware of the concepts of a shopping cart from buying products online, and it can be extremely useful to track the contents of these baskets. Each basket can provide valuable insight into customer buying patterns and provide the opportunity to identify options for cross selling. To analyze the behavior, the customer purchase information is needed. The association rules are then applied to this information to get the end result.At the completion of this activity we will be able to analyze the customer-purchasing behavior.
The algorithms for performing this analysis are fairly straight forward, the complexity comes when we avoid combinatorial explosions and dealing with large transaction data. The computational complexity involved in calculating the results of Markey basket analysis is at least square of the number of transaction item-lines(records of every item purchased.) In our example schema we have around trillion values. This analysis is often promoted as a means to obtain product associations to base a retailer’s promotion strategy on. Predictive market basket analysis can be used to identify sets of item purchases that generally occur in sequence.
Other applications areas of this Market Basket Analysis are
- Analysis of credit card purchases
- Analysis of telephone calling patterns
- Identification of fraudulent medical insurance claims
- Analysis of telecom service purchases.
Market Basket Analysis Activity[3]
We accomplish Market Basket Analysis using Oracle 10g database, Oracle Data Miner Client and Oracle’s OLAP product Analytic Workspace Manager the market basket analysis. The sources for this example are tables that are shipped during the installation of Oracle Database. We have used Oracle 10g for this activity because it is packed with several features and gives the benefit from the investment in Enterprise edition. Our work is made simpler as it comes with Data mining capabilities.
For this demonstration we use the common schema (SH) that is shipped when we custom install the Oracle 10g. SH stands for Sales History and contains the point of sale data. This forms the main input for our analysis. Remember to select the Oracle Data Mining Select (ODM) option (default) during the installation.
To get started,
- Download Oracle 10g and install it
- Select the sample schema option during the custom installation
- Launch ODM client.
- One needs to be an administrator to perform some grants, in order to make it work. Execute the following command in SQLPLUSW. For simplicity sake, I have set all my passwords to ‘password’ and username is ‘sh’ and ‘system’. ‘system’ being the top level username, used to view all schema’s
alteruser sh account unlock;
alteruser sh identified by password;
grantcreatetableto sh;
grantcreate sequence to sh;
grantcreate session to sh;
grantcreateviewto sh;
grantcreateprocedureto sh;
grantcreate job to sh;
grantcreatetypeto sh;
grantcreatesynonymto sh;
grantexecuteon ctxsys.ctx_ddl to sh;
The points to be noted before starting the activity are:
- Make sure the oracle listening is started
- Database instance ‘ORCL’ is started.
- The port used is 1521
- Give the hostname as 127.0.0.1, which is a general hostname.
Apriori Algorithm:[7]
Now lets us learn something about Apriori algorithm and how the rules are based on it and where is it used in our activity.
Apriori algorithm is the most established algorithm for finding frequent item sets mining. Mining for associations among items in a large database of sales transaction is an important database mining function and this algorithm is used in such scenarios.
For example: the information that a customer who purchases a pen, also tends to buy papers at the same time is represented in association rule as
Pen =>Paper [ Support=6%, Confidence= 70%]
Confidence is the measure of uncertainty or trust worthiness associated with each discovered pattern.
Support of an associated pattern is the percentage of task-relevant data transactions for which the pattern is true.
A transaction T is a set of items(Itemset). T is a subset of I, where I is a set of all possible items. An association rule for this is in the form of, p->Q, where P is a subset of I and Q is a subset of I and p.
In our example, we input database of transactions, each transaction is a list of items. We find all rules that associate the presence of one set of items with that of another set of items. There are no restrictions on the antecedent or precedent of the rules. This algorithm finds all rules x&y->z with minimum confidence and support. Support s is he probability that a transaction contains{x,y,z} and confidence c is the conditional probability that a transaction having {x,y} also contain z.
The basic principle of Apriori is “Any subset of a frequent itemset must be frequent”.
The itemset that has minimum support is a frequent itemset. Use these frequent itemsets to generate association rules. Algorithm is as follows
Ck: Candidate itemset of size k
Lk: Frequent itemset of size k
L1={frequent items};
For (k=1; Lk!=0; k++) do begin
Ck+1= Candidates generated from Lk;
For each transaction t in the database do
Increment the count of all candidates in Ck+1 that are contained in t
Lk+1=candidates in Ck+1 with min_support
End
Return UkLk;
Pictorial representation of Apriori Algorithm[8]
5. Open ODM, click Build from Activity pull down menu to launch the activity and select Association rules from the pull down menu. It automatically sets the Algorithm field to Apriori as shown in the figure. We are on the first step of New activity wizard.
We are using associations rules as they discover relationship among items and apriori is used as it supports transactional data.
Click Next to continue. The transactions (market baskets) are contained in the SALES table; select PROD_ID as the identifier for the items purchased. However, the products are identified only by an item code; the names of the products are in the PRODUCTS table, so click the checkbox indicating that there is a Name Lookup table, and select the table, the item identifier (PROD_ID), and the column containing the item description (PROD_NAME). This will then create meaningful product descriptions in the results tablethat can then used to populate the description for the dimension.
Click Next
Click ‘Next’
Select the option ‘run upon finish’ and click ‘finish’ button on the new activity wizardFinish
Click over the ‘Result’ and then click over ‘get Rules’.
To filter the result, minimum confidence is set to 80% and Minimum Support is set to 5%. The results are sorted in the descending order for Confidence and Support. Click OK on that screen. This will take us to the final screen, which has Rule ID, Antecedent, Precedent, Confidence and Support. Each rule is clearly explained in bottom pane.
Finally, the results from the model can be published to a table. This table will then form the raw source for the new OLAP product dimension. Notice at this point there is no information relating to revenue, costs or quantity. All that information is already contained in the existing fact table that is used to feed the standard sales revenue, sales costs and sales quantity measures. All that needs to be done is to link the product bundles to the original fact table to extract the revenue, costs and quantity for each product within a bundle. The process for doing this will be explained later.
As well as the details of the products that comprise the rule two additional columns are provided: Confidence (%) and Support (%).
The Support for the rule is the percentage of baskets containing the items in the rule. In the example, Support is the percentage of all baskets containing the three items mouse, keyboard and printer.
The Confidence for the rule is the percentage of baskets containing the item(s) in the antecedents that also contain the consequent. In the example, consider only baskets containing keyboard and mouse and calculate the percentage of those baskets that contain printers
Designing the Product Dimension[3]
Now the results obtained from the market basket analysis should be transformed into more understandable or meaning form. We have to correctly format the results for dimension mapping.This can be done using OLAP DML or PL/SQL. I have used OLAP DML. Now create a new dimension table that confirms to the normal requirements for source table. Source table is the table that is published to the discoverer during market basket analysis. To accomplish this, a top level has to be added to provide a total for ‘ALL Rules’ and columns need to populated for each attributes. Each rule has to be broken down into its constituent parts
- Rule itself
- Precedents- products that are part of initial purchase
- Antecedents- products that are normally purchases in conjunction with the original items.
By doing this, we can track and analyze product bundles as part of normal reporting process. We create the attributes Support, Confidence and Product_src. Using the attributes with each rule, OLAP query steps can be created, which refine the analysis of product bundles by allowing business users to filter the results based on the value % confidence associated with the rule.
In our activity we create a separate dimension that can hold the results from algorithm. For each dimension we can create Levels, hierarchies, attributes and mappings. We created 4 levels for our activity.
Top level – All Baskets
Rule – Rule derived from the algorithm
Group – contains if, then parts of the rule.
Product – Base product
When we click on attributes, we get short description, long description, by default. We add other attributes for our demo, like Support, confidence and source product.
We should change the Datatype of support and confidence to decimal from text during their creation.
Launch Analytic workspace and give the login details as
Username- sh
Connection information- 127.0.0.1:1521:orcl
This connects us to ‘SH’ schema on port 1521 and local host 127.0.0.1, with database instance ‘ORCL’. Click next. Click on the ‘system’ on the left hand side tree to create a new analytic workspace. Market_Basket is the analytic workspace we have created for our activity. This contains its own levels, hierarchies, attributes and mappings. Following figure shows the model view of the analytic workspace Manager after creating the Market_Basket workspace.
Following figure shows the overview of Market_Basket analytic workspace.
Create all the levels, hierarchies and attributes as shown in the picture below.
Click on the mappings and then we have graphical mapping view on the right side, select the table u have published in the oracle data miner client and drop it on right side graphic view, and map All Rule ID’s to all members in MB_group, MB_ product, MB_rule and MB_Top.
Then map all supports from Sales_fact_table to all supports in all attributes and next map all confidence in Sales_fact_table to all confidence in all attributes. Our finished mapping looks like above screen.
The table mapping view of above mappings is shown below.
Due to some unknown reasons, my dimension viewer doesn’t show up the dimensions.It show an error message.
Theoritically, in order to view dimension, select the dimension name and right click it to find maintain dimension option. Click on it.After the dimension is maintained, select dimension view by right clicking the dimension. If everything run’s well we should get the following screen.
Conclusion: We have shown how Market basket analysis using association rules works in determining the customer buying patterns. This can be further extended using OLAP Analytic workspace, to add dimensions and cube to identify other measures like costs, revenue and quantity. I tried to extend it till the design of Product dimension, but due to some unfigured technical problerms I couldn’t accomplish this.Hopefully, I will try to give the supplementary copy, if I can extend to OLAP.
References
Books:
[1] Michael J. A. Berry, Gordon Linoff.”Data Mining Techniques: For Marketing, Sales, and Customer Support (Paperback)”.
[2] J.Han,M. Kamber(2001) “Data Mining”, Morgan Kaufmann publishers,San Francisco,CA
Links:
[3].
[4].
[5].
[6]
[7]
[8]
Tools:
We would be using the following tools to build our application:
Oracle 10g enterprise edition
SQL Plus
Oracle Data Miner Client
Oracle OLAP product Analytic workspace
Proposed Deliverables:
- A written document consisting of association rules framed for a generic market basket analysis application, and the result of analysis and its prominence in forecasting future trends.
- Demo on how Market Basket Analysis works, using a casestudy.
ITK 478 Special Interest Activity Task Plan Summary
Student: Sowjanya Alaparthi
Special Interest Activity Title: Market basket analysis and its prominence in strategic decisions in current day business
Activity Abstract: Market Basket Analysis is a mathematical modeling tool that is used to analyze the customer purchasing behavior and help in increasing the sales. This activity is performed using Association rulers, framed using Apriori algorithm. At the end of the Activity we can know the customer buying behavior that is, if a customer is buying a certain group of items, what other group of items he is likely to buy.
Activity Task Plan
Task/step Description /Hours spent
Reading the material and gaining the knowledge about installations / 20Designing and implementing the database and other requirements / 30
Configuring the database and performing market basket analysis / 20
Adding OLAP value to Market Basket Analysis in designing the product dimension / 15
Documenting the Activity / 10
Total / 95
- 1 -