An Exhaustive Covering Approach to Parameter-free Mining of Non-redundant Discriminative Itemsets
Yoshitaka Kameya Meijo University
1 DaWaK-16
Outline
Background
Our proposal
Experiments
[Figure: original data is grouped into clusters; pattern mining yields clusters labeled with discriminative patterns]
[Morishita+ 00] [Zimmermann+ 09] [Nijssen+ 09]
Dataset (positive and negative transactions)
TID  Class  Transaction
1    +      {A, B, D, E}
2    +      {A, B, C, D, E}
3    +      {A, C, D, E}
4    +      {A, B, C}
5    +      {B}
6    –      {A, B, D, E}
7    –      {B, C, D, E}
8    –      {C, D, E}
9    –      {A, D, E}
10   –      {A, D}

Top-15 patterns (+1 due to tie score)
Rank  Pattern         F-score  TIDs Covered
1     {A, C}          0.75     2, 3, 4
2     {B}             0.73     1, 2, 4, 5
3     {A}             0.67     1, 2, 3, 4
3     {A, B}          0.67     1, 2, 4
5     {A, D, E}       0.60     1, 2, 3
5     {A, E}          0.60     1, 2, 3
5     {C}             0.60     2, 3, 4
8     {A, B, C}       0.57     2, 4
8     {A, C, D}       0.57     2, 3
8     {A, C, D, E}    0.57     2, 3
8     {A, C, E}       0.57     2, 3
12    {A, D}          0.55     1, 2, 3
13    {A, B, D}       0.50     1, 2
13    {A, B, D, E}    0.50     1, 2
13    {A, B, E}       0.50     1, 2
13    {B, C}          0.50     2, 4
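The F-scores in the ranking can be reproduced from the ten transactions. The sketch below is not taken from the slides: it assumes the F-score is the usual F1 measure, 2TP / (2TP + FP + FN), where a pattern "covers" a transaction when it is a subset of it, and the names DATASET, f_score and covered_positive_tids are illustrative only.

```python
# Toy dataset from the slide: TID -> (class label, transaction).
# "+" marks a positive transaction, "-" a negative one.
DATASET = {
    1: ("+", {"A", "B", "D", "E"}),
    2: ("+", {"A", "B", "C", "D", "E"}),
    3: ("+", {"A", "C", "D", "E"}),
    4: ("+", {"A", "B", "C"}),
    5: ("+", {"B"}),
    6: ("-", {"A", "B", "D", "E"}),
    7: ("-", {"B", "C", "D", "E"}),
    8: ("-", {"C", "D", "E"}),
    9: ("-", {"A", "D", "E"}),
    10: ("-", {"A", "D"}),
}


def confusion_counts(pattern, dataset=DATASET):
    """Return (TP, FP, FN) for a pattern; a pattern covers a transaction
    when every item of the pattern occurs in the transaction."""
    tp = fp = fn = 0
    for label, items in dataset.values():
        covered = pattern <= items
        if label == "+" and covered:
            tp += 1
        elif label == "+":
            fn += 1
        elif covered:
            fp += 1
    return tp, fp, fn


def f_score(pattern, dataset=DATASET):
    """F1 score of a pattern (assumed definition, see the lead-in)."""
    tp, fp, fn = confusion_counts(pattern, dataset)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def covered_positive_tids(pattern, dataset=DATASET):
    """TIDs of the positive transactions covered by the pattern."""
    return sorted(tid for tid, (label, items) in dataset.items()
                  if label == "+" and pattern <= items)


if __name__ == "__main__":
    for pattern in ({"A", "C"}, {"B"}, {"A"}):
        print(sorted(pattern), round(f_score(pattern), 2),
              covered_positive_tids(pattern))
```

Under this assumption {A, C} covers positives 2, 3, 4 and no negatives, giving 6 / 8 = 0.75, and {B} covers positives 1, 2, 4, 5 plus negatives 6 and 7, giving 8 / 11 ≈ 0.73, which agrees with the first two rows of the ranking.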
Enumeration tree
All combinations
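As an illustration of how an enumeration tree reaches all combinations exactly once, here is a minimal sketch. It assumes a fixed item order in which each node is extended only with items that come later in that order; this is a common construction and not necessarily the exact tree drawn on the slide. The function name enumerate_itemsets is hypothetical.

```python
def enumerate_itemsets(items, prefix=()):
    """Depth-first walk over a prefix (set-enumeration) tree.

    Each node is extended only with items that come later in `items`,
    so every non-empty combination is generated exactly once.
    """
    for i, item in enumerate(items):
        node = prefix + (item,)
        yield node
        # Children of `node` may only use the items after position i.
        yield from enumerate_itemsets(items[i + 1:], node)


if __name__ == "__main__":
    for itemset in enumerate_itemsets(("A", "B", "C")):
        print(itemset)
```

For n items the walk visits 2^n - 1 non-empty itemsets, which is why a search over this tree needs pruning to stay practical.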
upper bound of x's quality
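The slide text does not say which upper bound is used, so the following is only a sketch of one standard choice for the F-score. Any superset x' of x covers a subset of the transactions x covers, so TP(x') ≤ TP(x) and FP(x') ≥ 0. With P positive transactions, F = 2TP / (TP + FP + P), which grows with TP and shrinks with FP, hence F(x') ≤ 2TP(x) / (TP(x) + P). The toy dataset and the names f_upper_bound and can_prune are assumptions made for illustration.

```python
# Toy transactions (positive and negative classes), used for illustration.
POSITIVES = [{"A", "B", "D", "E"}, {"A", "B", "C", "D", "E"},
             {"A", "C", "D", "E"}, {"A", "B", "C"}, {"B"}]
NEGATIVES = [{"A", "B", "D", "E"}, {"B", "C", "D", "E"},
             {"C", "D", "E"}, {"A", "D", "E"}, {"A", "D"}]


def counts(pattern):
    """Number of positive and negative transactions containing the pattern."""
    tp = sum(pattern <= t for t in POSITIVES)
    fp = sum(pattern <= t for t in NEGATIVES)
    return tp, fp


def f_score(pattern):
    """F1 score written as 2TP / (TP + FP + P), using FN = P - TP."""
    tp, fp = counts(pattern)
    return 2 * tp / (tp + fp + len(POSITIVES))


def f_upper_bound(pattern):
    """Optimistic F1 for any superset: keep all true positives, no false positives."""
    tp, _ = counts(pattern)
    return 2 * tp / (tp + len(POSITIVES))


def can_prune(pattern, threshold):
    """True when no superset of `pattern` can reach `threshold`."""
    return f_upper_bound(pattern) < threshold


if __name__ == "__main__":
    print(f_upper_bound({"B", "C"}), can_prune({"B", "C"}, 0.75))
```

In a branch-and-bound search over the enumeration tree, a subtree rooted at x can be skipped whenever this bound falls below the quality the search still needs to beat.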
Dataset            #Trans.  #Items
anneal             812      93
audiology          216      148
australian-credit  653      125
german-credit      1,000    112
heart-cleveland    296      95
hepatitis          137      68
hypothyroid        3,247    88
kr-vs-kp           3,196    73
lymph              148      68
mushroom           8,124    110
primary-tumor      336      31
soybean            630      50
splice-1           3,190    287
tic-tac-toe        958      28
vote               435      48
zoo-1              101      36
Rank  Pattern                                                  F-score
1     {odor=n, veil-type=p}                                    0.881
2     {gill-size=b, stalk-surface-above-ring=s, veil-type=p}   0.866
3     {gill-size=b, stalk-surface-below-ring=s, veil-type=p}   0.837
4     {gill-size=b, veil-type=p}                               0.798
5     {stalk-surface-above-ring=s, veil-type=p}                0.776
6     {ring-type=p, veil-type=p}                               0.771
7     {stalk-surface-below-ring=s, veil-type=p}                0.744
8     {veil-type=p}                                            0.682

Covers 4,112 out of 4,208 positive transactions
Covers remaining 96 positive transactions
Rank  Pattern                                                  F-score
1     {odor=n, veil-type=p}                                    0.881
2     {gill-size=b, stalk-surface-above-ring=s, veil-type=p}   0.866
3     {stalk-surface-above-ring=s, veil-type=p}                0.776
Specifying k < 5 loses information from 96 positive transactions!
We only need 3 best-covering patterns to summarize the entire dataset.
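To make the idea of best-covering patterns concrete, the sketch below applies one plausible reading to the small ten-transaction example: a pattern is kept only if it is the highest-scoring pattern covering at least one positive transaction. This reading, the F1 definition, the tie-breaking rule (fewer items first), and the name best_covering_patterns are all assumptions for illustration; the brute-force enumeration stands in for the actual search procedure, which the slide text does not spell out.

```python
from itertools import combinations

# Toy transactions keyed by TID (positive and negative classes).
POSITIVES = {1: {"A", "B", "D", "E"}, 2: {"A", "B", "C", "D", "E"},
             3: {"A", "C", "D", "E"}, 4: {"A", "B", "C"}, 5: {"B"}}
NEGATIVES = {6: {"A", "B", "D", "E"}, 7: {"B", "C", "D", "E"},
             8: {"C", "D", "E"}, 9: {"A", "D", "E"}, 10: {"A", "D"}}


def f_score(pattern):
    """F1 score written as 2TP / (TP + FP + P), using FN = P - TP."""
    tp = sum(pattern <= t for t in POSITIVES.values())
    fp = sum(pattern <= t for t in NEGATIVES.values())
    return 2 * tp / (tp + fp + len(POSITIVES))


def best_covering_patterns():
    """Map each positive TID to the best-scoring pattern that covers it."""
    best = {}
    for tid, items in POSITIVES.items():
        # Every non-empty subset of a transaction covers that transaction.
        candidates = [frozenset(c)
                      for r in range(1, len(items) + 1)
                      for c in combinations(sorted(items), r)]
        # Highest F-score wins; among equal scores prefer fewer items.
        best[tid] = max(candidates, key=lambda p: (f_score(p), -len(p)))
    return best


if __name__ == "__main__":
    best = best_covering_patterns()
    for pattern in sorted(set(best.values()), key=f_score, reverse=True):
        tids = sorted(t for t, p in best.items() if p == pattern)
        print(sorted(pattern), round(f_score(pattern), 2), tids)
```

On the toy data only two patterns survive, {A, C} for transactions 2, 3, 4 and {B} for transactions 1 and 5, and no value of k had to be chosen to obtain them.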
Productivity + Closedness + Top-10
[Plot: axis unit is seconds]