Mining Frequent Patterns, Associations and Correlations
Week 3
Team Homework Assignment #2
– Read pp. 285–300 of the textbook.
– Do Example 6.1.
– Prepare for the results of the
http://www.lucyluvs.com/images/fittedXLpooh.JPG
http://www.mondobirra.org/sfondi/BudLight.sized.jpg
cell_cycle ‐> [+]Exp1, [+]Exp2, [+]Exp3, [+]Exp4, support=52.94% (9 genes)
apoptosis ‐> [+]Exp6, [+]Exp7, [+]Exp8, support=76.47% (13 genes)
http://www.cnb.uam.es/~pcarmona/assocrules/imag4.JPG
Table 8.3 The substitution matrix of amino acids.
Figure 8.8 Scoring two potential pairwise alignments, (a) and (b), of amino acids.
Figure 9.1 A sample graph data set.
Figure 9.2 Frequent graph.
Figure 9.14 A chemical database.
– What products were often purchased together? Beer and diapers?
– What are the subsequent purchases after buying a PC?
– What kinds of DNA are sensitive to this new drug?
– Can we automatically classify web documents?
– Apriori property: All nonempty subsets of a frequent itemset must also be frequent.
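The Apriori property is what lets candidate itemsets be discarded without counting them: if any (k-1)-subset of a k-item candidate is not frequent, the candidate cannot be frequent either. A minimal Python sketch of that check follows; the function name `has_infrequent_subset` and the sample itemsets are illustrative assumptions, not taken from the slides.

```python
from itertools import combinations


def has_infrequent_subset(candidate, frequent_smaller):
    """Return True if some (k-1)-subset of the k-item candidate is not frequent."""
    k = len(candidate)
    return any(
        frozenset(subset) not in frequent_smaller
        for subset in combinations(candidate, k - 1)
    )


# Hypothetical frequent 2-itemsets, used only for illustration.
frequent_2 = {
    frozenset(pair)
    for pair in [("A", "B"), ("A", "C"), ("B", "C"), ("B", "D")]
}

# {A, B, C}: all of AB, AC, BC are frequent, so the candidate survives.
print(has_infrequent_subset(frozenset("ABC"), frequent_2))  # False
# {A, B, D}: AD is not frequent, so the candidate can be pruned.
print(has_infrequent_subset(frozenset("ABD"), frequent_2))  # True
```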
Figure 5.4 The Apriori algorithm for discovering frequent itemsets for mining Boolean association rules.
TID    List of item_IDs
T100   I1, I2, I5
T200   I2, I4
T300   I2, I3
T400   I1, I2, I4
T500   I1, I3
T600   I2, I3
T700   I1, I3
T800   I1, I2, I3, I5
T900   I1, I2, I3

Table 5.1 Transactional data for an AllElectronics branch.
Minimum support count = 2
Figure 5.2 Generation of candidate itemsets and frequent itemsets, where the minimum support count is 2.
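The level-wise generation of candidate and frequent itemsets can be sketched in Python on the nine AllElectronics transactions (Table 5.1) with a minimum support count of 2. This is a simplified sketch, not the textbook pseudocode: the join step here unions any two frequent (k-1)-itemsets rather than using the prefix-based join, and the names `TRANSACTIONS` and `apriori` are assumptions made for illustration.

```python
from itertools import combinations

# The nine transactions of Table 5.1.
TRANSACTIONS = {
    "T100": {"I1", "I2", "I5"},
    "T200": {"I2", "I4"},
    "T300": {"I2", "I3"},
    "T400": {"I1", "I2", "I4"},
    "T500": {"I1", "I3"},
    "T600": {"I2", "I3"},
    "T700": {"I1", "I3"},
    "T800": {"I1", "I2", "I3", "I5"},
    "T900": {"I1", "I2", "I3"},
}


def apriori(transactions, min_support_count):
    """Return {itemset: support_count} for every frequent itemset."""
    baskets = [frozenset(t) for t in transactions]

    # Level 1: count single items and keep the frequent ones.
    counts = {}
    for basket in baskets:
        for item in basket:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    current = {s: c for s, c in counts.items() if c >= min_support_count}
    frequent = dict(current)

    k = 2
    while current:
        # Join: union pairs of frequent (k-1)-itemsets that form a k-itemset.
        # Prune: keep a candidate only if all its (k-1)-subsets are frequent.
        candidates = set()
        for a, b in combinations(list(current), 2):
            union = a | b
            if len(union) == k and all(
                frozenset(s) in current for s in combinations(union, k - 1)
            ):
                candidates.add(union)

        # Count: scan the transactions once per level.
        counts = {c: sum(1 for b in baskets if c <= b) for c in candidates}
        current = {s: c for s, c in counts.items() if c >= min_support_count}
        frequent.update(current)
        k += 1
    return frequent


frequent = apriori(TRANSACTIONS.values(), min_support_count=2)
for itemset, count in sorted(
    frequent.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))
):
    print(sorted(itemset), count)
```

On this data the sketch finds five frequent 1-itemsets, six frequent 2-itemsets, and two frequent 3-itemsets ({I1, I2, I3} and {I1, I2, I5}); the only possible 4-itemset is pruned because {I1, I3, I5} is not frequent.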
$$\mathrm{Confidence}(A \Rightarrow B) = P(B \mid A) = \frac{\mathrm{support}(A \cup B)}{\mathrm{support}(A)} = \frac{\mathrm{support\_count}(A \cup B)}{\mathrm{support\_count}(A)}$$
– I1 ^ I2 ‐> I5, confidence = 2/4 = 50%
– I1 ^ I5 ‐> I2, confidence = 2/2 = 100%
– I2 ^ I5 ‐> I1, confidence = 2/2 = 100%
– I1 ‐> I2 ^ I5, confidence = 2/6 = 33%
– I2 ‐> I1 ^ I5, confidence = 2/7 = 29%
– I5 ‐> I1 ^ I2, confidence = 2/2 = 100%
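The six confidences listed for the frequent itemset {I1, I2, I5} can be reproduced by enumerating every nonempty proper subset as an antecedent and dividing support counts. The sketch below is a minimal illustration: the support counts are read off the AllElectronics transactions (Table 5.1), while the function name `rules_from_itemset` and the `min_confidence` parameter (with 0.7 as an example threshold) are assumptions, not values given in the slides.

```python
from itertools import combinations

# Support counts derived from the nine transactions of Table 5.1.
SUPPORT_COUNT = {
    frozenset({"I1"}): 6,
    frozenset({"I2"}): 7,
    frozenset({"I5"}): 2,
    frozenset({"I1", "I2"}): 4,
    frozenset({"I1", "I5"}): 2,
    frozenset({"I2", "I5"}): 2,
    frozenset({"I1", "I2", "I5"}): 2,
}


def rules_from_itemset(itemset, support_count, min_confidence=0.0):
    """List (antecedent, consequent, confidence) for rules meeting the threshold."""
    itemset = frozenset(itemset)
    rules = []
    for size in range(1, len(itemset)):
        for antecedent in combinations(sorted(itemset), size):
            antecedent = frozenset(antecedent)
            consequent = itemset - antecedent
            # confidence(A => B) = support_count(A u B) / support_count(A)
            confidence = support_count[itemset] / support_count[antecedent]
            if confidence >= min_confidence:
                rules.append((antecedent, consequent, confidence))
    return rules


all_rules = rules_from_itemset({"I1", "I2", "I5"}, SUPPORT_COUNT)
strong_rules = rules_from_itemset({"I1", "I2", "I5"}, SUPPORT_COUNT, min_confidence=0.7)

for antecedent, consequent, confidence in all_rules:
    print(f"{sorted(antecedent)} -> {sorted(consequent)}: {confidence:.0%}")
```

With the illustrative 70% threshold, only the three rules with 100% confidence remain in `strong_rules`.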
TID    Items_bought
T100   {M, O, N, K, E, Y}
T200   {D, O, N, K, E, Y}
T300   {M, A, K, E}
T400   {M, U, C, K, Y}
T500   {C, O, O, K, I, E}
Table 5.6 Task‐relevant data D.
Figure 5.10 A concept hierarchy for AllElectronics computer items.
Figure 5.11 Multilevel mining with uniform support.
Figure 5.12 Multilevel mining with reduced support.
Multilevel mining with group-based support.
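The difference between uniform and reduced support can be sketched as a choice of minimum-support threshold per level of the concept hierarchy. In the Python sketch below, the item supports, the thresholds, and the function name `frequent_items` are all hypothetical values chosen for illustration; the figures' actual numbers are not reproduced in this transcript.

```python
# Hypothetical supports (fraction of transactions), keyed by (item, hierarchy level).
SUPPORT = {
    ("computer", 1): 0.10,
    ("laptop computer", 2): 0.06,
    ("desktop computer", 2): 0.04,
}


def frequent_items(support, min_support_by_level):
    """Return the items whose support meets the threshold set for their level."""
    return sorted(
        item
        for (item, level), value in support.items()
        if value >= min_support_by_level[level]
    )


# Uniform support: the same threshold at every level.
uniform = frequent_items(SUPPORT, {1: 0.05, 2: 0.05})
# Reduced support: a lower threshold at the deeper (more specific) level.
reduced = frequent_items(SUPPORT, {1: 0.05, 2: 0.03})

print("uniform:", uniform)
print("reduced:", reduced)
```

Under these assumed numbers, the uniform threshold misses the lower-level item with 4% support, while the reduced threshold keeps it. Group-based support would go one step further and set thresholds per item or per group rather than per level.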
Figure 5.14 A 2-D grid for tuples representing customers who purchase high-definition TVs.
age(X, 34) ∧ income(X, "31‐40K") ⇒ buys(X, "high resolution TV")
age(X, 35) ∧ income(X, "31‐40K") ⇒ buys(X, "high resolution TV")
age(X, 34) ∧ income(X, "41‐50K") ⇒ buys(X, "high resolution TV")
age(X, 35) ∧ income(X, "41‐50K") ⇒ buys(X, "high resolution TV")
These four rules can be combined into one:
age(X, "34‐35") ∧ income(X, "31‐50K") ⇒ buys(X, "high resolution TV")
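Combining the four grid cells into a single rule amounts to checking that the cells fill a rectangle on the 2-D grid and then reporting that rectangle's bounds. The following Python sketch shows the idea under simplifying assumptions: incomes are given as (low, high) ranges in thousands, neighbouring ages and income bins are taken to be adjacent on the grid, and the names `CELLS` and `merge_cells` are hypothetical.

```python
# Grid cells that satisfy the rule: (age, (income_low_K, income_high_K)).
CELLS = {
    (34, (31, 40)),
    (35, (31, 40)),
    (34, (41, 50)),
    (35, (41, 50)),
}


def merge_cells(cells):
    """Merge cells into one (age_range, income_range) if they fill a rectangle.

    Returns None when some cell of the bounding rectangle is missing.
    Assumes the listed ages and income bins are adjacent on the grid.
    """
    ages = sorted({age for age, _ in cells})
    incomes = sorted({income for _, income in cells})
    rectangle = {(age, income) for age in ages for income in incomes}
    if set(cells) != rectangle:
        return None
    return (ages[0], ages[-1]), (incomes[0][0], incomes[-1][1])


merged = merge_cells(CELLS)
if merged is not None:
    (age_lo, age_hi), (inc_lo, inc_hi) = merged
    print(
        f'age(X, "{age_lo}-{age_hi}") ^ income(X, "{inc_lo}-{inc_hi}K") '
        '=> buys(X, "high resolution TV")'
    )
```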
Table 5.7 A 2 × 2 contingency table summarizing the transactions with respect to game and video purchases.
$$\mathrm{lift}(A, B) = \frac{P(A \cup B)}{P(A)\,P(B)} = \frac{P(B \mid A)}{P(B)} = \frac{\mathrm{conf}(A \Rightarrow B)}{\mathrm{sup}(B)}$$
P({game}) =
P({video}) =
P({game, video}) =
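Lift can be computed directly from the counts in a 2 × 2 contingency table. The contents of Table 5.7 are not reproduced in this transcript, so the counts in the Python sketch below are illustrative assumptions, and the function name `lift` and its parameter names are likewise hypothetical.

```python
def lift(n_total, n_a, n_b, n_ab):
    """lift(A, B) = P(A u B) / (P(A) * P(B)), estimated from transaction counts."""
    p_a = n_a / n_total
    p_b = n_b / n_total
    p_ab = n_ab / n_total
    return p_ab / (p_a * p_b)


# Illustrative counts (assumed, not taken from the transcript):
# 10,000 transactions, 6,000 with games, 7,500 with videos, 4,000 with both.
value = lift(n_total=10_000, n_a=6_000, n_b=7_500, n_ab=4_000)
print(round(value, 2))  # 0.89
```

A lift below 1 indicates that the two itemsets are negatively correlated, a lift of 1 that they are independent, and a lift above 1 that they are positively correlated.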
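$$\chi^2 = \sum_{i=1}^{c} \sum_{j=1}^{r} \frac{(o_{ij} - e_{ij})^2}{e_{ij}}$$
$$e_{ij} = \frac{\mathrm{count}(A = a_i) \times \mathrm{count}(B = b_j)}{N}$$
where o_ij is the observed count and e_ij the expected count of cell (i, j), and N is the total number of transactions.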
with the cell is negatively correlated.
Table 5.8 The above contingency table, now shown with the expected value.
$$\chi^2 = \sum \frac{(\mathrm{Observed} - \mathrm{Expected})^2}{\mathrm{Expected}}$$
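The χ² statistic can be computed from a contingency table by deriving each cell's expected count from the row and column totals and summing the scaled squared differences. The Python sketch below assumes illustrative observed counts, since the table's values are not reproduced in this transcript; the function name `chi_square` is also an assumption.

```python
def chi_square(observed):
    """Return (chi-square statistic, expected counts) for a contingency table."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)

    statistic = 0.0
    expected = []
    for i, row in enumerate(observed):
        expected_row = []
        for j, o in enumerate(row):
            # Expected count under independence: row total * column total / N.
            e = row_totals[i] * col_totals[j] / n
            expected_row.append(e)
            statistic += (o - e) ** 2 / e
        expected.append(expected_row)
    return statistic, expected


# Illustrative counts (assumed): rows = video / no video, columns = game / no game.
OBSERVED = [
    [4000, 3500],
    [2000, 500],
]

statistic, expected = chi_square(OBSERVED)
print(expected)
print(round(statistic, 1))
```

With these assumed counts, the observed value for the (game, video) cell is smaller than its expected value, which is the situation the slide describes as a negative correlation.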