SEG4630 E-CommerceData Mining

(2009-10 First Term)

Assignment 2

(10 points)

Due time and date: 5pm Nov19 Thursday

Submit to assignment box D03, 5/F ERB

Submission Requirements

  • The hand-in version must be ordered correctly and stapled in the top left corner.
  • The hand-in version must include a header page (or with sufficient space) indicating: student name, student ID and assignment number.

Question 1.A database has four transactions. Let min_sup=60% and min_conf=80%.

Cust_ID / TID / Items (in the form of brand-category)
01 / 100 / (King’s-Crab, Sunset-Milk, Dairy-Cheese, Best-Bread)
02 / 200 / (Best-Cheese, Dairy-Milk, Golden-Apple, Tasty-Pie, Wonder-Bread)
01 / 300 / (West-Apple, Dairy-Milk, Wonder-Bread, Tasty-Pie)
03 / 400 / (Wonder-Bread, Sunset-Milk, Dairy-Cheese)

1.At the granularity of category (e.g., itemi is “Milk”, itemj is “Bread”), list the frequent k-itemsets for the largest k, andfor the following rule template at the transaction level, list all the strong association rules with support and confidence from your reported k-itemsets. (2 pt)

[support, confidence].

2.At the granularity of brand-category (e.g., itemi is “Sunset-Milk”, itemj is “West-Apple”), for the following itemset template at the customer level, list the frequent k-itemsets for the largest k only. (2 pt) [Hint: Union the two transactions of customer 01 into one set. Treat the database size as 3 at the customer level. ]

[support].

Question 2.Suppose that frequent itemsets are mined and saved for a large transaction database. Discuss how to efficiently mine the global frequent itemsets under the same relative support threshold if a set of new transactions is added into the database.Note: Mining from scratch as the database grows is not considered an efficient solution. (2 pt)

Question 3.In association mining, besides the lift measure, there are some other measures to test whether two items ai and aj in a transaction database are correlated, i.e., all-confidence, Kulczynski, cosine, and max-confidence as defined below.

  1. Prove that all-confidence =< cosine =< Kulczynski =< max-confidence.(2 pt)
  1. If a user wants to mine strongly correlated frequent itemsets with the Kluc measure, i.e., given a constant c, find all itemsets X such that Kluc(X)>=c. Is Kluc a monotonic or anti-monotonic constraint? If the items are sorted in the increasing order of support count, what type of constraint Kluc is?(2 pt) Note: the general definition of Kulc on a k-itemset is