Parameter-free Mining of Non-redundant Discriminative Itemsets

An Exhaustive Covering Approach to Parameter-free Mining of Non-redundant Discriminative Itemsets. Yoshitaka Kameya, Meijo University. DaWaK-16.


  1. An Exhaustive Covering Approach to Parameter-free Mining of Non-redundant Discriminative Itemsets. Yoshitaka Kameya, Meijo University. DaWaK-16

  2. Outline
     • Background
     • Our proposal
     • Experiments


  4. Background: Discriminative Patterns (1)
     • Discriminative patterns:
       – Show differences between two groups (classes)
       – Used for:
         • Characterizing the positive class
         • Building more precise classifiers
     [Figure: discriminative pattern x: milk=True ∧ aquatic=False, pointing to the positive class. Legend: + positive class, – negative class (class labels).]

  5. Background: Discriminative Patterns (2)
     • Discriminative patterns tend to be more meaningful than frequent patterns (thanks to class labels)
     • Are class labels always available?
       – Comparing groups is a standard starting point in data analysis
       – Clustering can find groups (classes) ⇒ Cluster labeling
     [Figure: original data → 1. clustering → clusters → 2. discriminative pattern mining → clusters labeled with discriminative patterns]

  6. Background: Discriminative Patterns (3)
     • Quality score: measures the overlap between pattern x and the positive class c
       [Figure: two diagrams of x and c. Large overlap: quality is high. Small overlap: quality is low.]
     • Most popular quality scores are not anti-monotonic:
       – Confidence, Lift
       – Support difference, Weighted relative accuracy, Leverage
       – F-score, Dice, Jaccard
       – ...
       ⇒ Branch & bound pruning is often used [Morishita+ 00][Zimmermann+ 09][Nijssen+ 09]
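
As a rough illustration of why such scores are not anti-monotonic, the sketch below computes three of the listed measures from contingency counts. The `Counts` class, the function names, and the two sets of counts are hypothetical and chosen only for illustration. The F-score is taken as 2TP / (2TP + FP + FN), the standard definition, which is assumed to be the one meant on the slide.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Counts:
    """Contingency counts of one pattern (hypothetical helper, not from the slides)."""
    tp: int     # positive transactions covered by the pattern
    fp: int     # negative transactions covered by the pattern
    n_pos: int  # total number of positive transactions
    n_neg: int  # total number of negative transactions


def confidence(c: Counts) -> float:
    return c.tp / (c.tp + c.fp)


def f_score(c: Counts) -> float:
    fn = c.n_pos - c.tp
    return 2 * c.tp / (2 * c.tp + c.fp + fn)


def support_difference(c: Counts) -> float:
    return c.tp / c.n_pos - c.fp / c.n_neg


# Assumed counts: a sub-pattern and one of its super-patterns.
# The super-pattern covers fewer transactions, as it must.
sub_pattern = Counts(tp=4, fp=3, n_pos=5, n_neg=5)
super_pattern = Counts(tp=3, fp=0, n_pos=5, n_neg=5)

for name, score in [("confidence", confidence),
                    ("F-score", f_score),
                    ("support difference", support_difference)]:
    # Every score goes up when moving to the super-pattern,
    # so none of them can be used for plain anti-monotone pruning.
    print(f"{name}: {score(sub_pattern):.2f} -> {score(super_pattern):.2f}")
```

Because a super-pattern can score higher than its sub-pattern, a search cannot simply stop extending a pattern once its score drops below a threshold, which is one way to read the slide's remark that branch & bound pruning is used instead.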

  7. Background: Coping with redundancy (1)
     • Example: Item A is relevant to the positive class ⇒ Patterns containing A tend to be top-ranked in the candidate list (most of them are redundant)

     Dataset
       TID  Class  Transaction
         1    +    {A, B, D, E}
         2    +    {A, B, C, D, E}
         3    +    {A, C, D, E}
         4    +    {A, B, C}
         5    +    {B}
         6    –    {A, B, D, E}
         7    –    {B, C, D, E}
         8    –    {C, D, E}
         9    –    {A, D, E}
        10    –    {A, D}
     (TIDs 1 to 5: positive transactions; TIDs 6 to 10: negative transactions)

     Top-15 patterns (+1 due to tie score)
       Rank  Pattern        F-score  Covered TIDs
         1   {A, C}          0.75    2, 3, 4
         2   {B}             0.73    1, 2, 4, 5
         3   {A}             0.67    1, 2, 3, 4
         3   {A, B}          0.67    1, 2, 4
         5   {A, D, E}       0.60    1, 2, 3
         5   {A, E}          0.60    1, 2, 3
         5   {C}             0.60    2, 3, 4
         8   {A, B, C}       0.57    2, 4
         8   {A, C, D}       0.57    2, 3
         8   {A, C, D, E}    0.57    2, 3
         8   {A, C, E}       0.57    2, 3
        12   {A, D}          0.55    1, 2, 3
        13   {A, B, D}       0.50    1, 2
        13   {A, B, D, E}    0.50    1, 2
        13   {A, B, E}       0.50    1, 2
        13   {B, C}          0.50    2, 4
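
The table entries can be re-derived with a short sketch. It assumes the F-score is computed as 2TP / (2TP + FP + FN) against the positive class and that "Covered TIDs" lists the covered positive transactions only. Itemsets are written as strings of item letters, and the helper names `cover` and `f_score` are illustrative, not from the slides.

```python
# Dataset from the slide: positive transactions (TIDs 1-5), negative (TIDs 6-10).
POS = {1: "ABDE", 2: "ABCDE", 3: "ACDE", 4: "ABC", 5: "B"}
NEG = {6: "ABDE", 7: "BCDE", 8: "CDE", 9: "ADE", 10: "AD"}


def cover(pattern, transactions):
    """TIDs of the transactions that contain every item of the pattern."""
    return [tid for tid, items in transactions.items()
            if set(pattern) <= set(items)]


def f_score(pattern):
    """F-score of the pattern with respect to the positive class."""
    tp = len(cover(pattern, POS))
    fp = len(cover(pattern, NEG))
    fn = len(POS) - tp
    return 2 * tp / (2 * tp + fp + fn)


# A few rows of the candidate list, recomputed.
for pattern in ["AC", "B", "A", "AB", "AD", "BC"]:
    print(pattern, f"{f_score(pattern):.2f}", cover(pattern, POS))
```

Note how {A}, {A, B}, {A, D} and the other patterns containing A cover overlapping subsets of the same positive transactions, which is the redundancy the slide points at.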

  8. Background: Coping with redundancy (2)
     • Set-inclusion-based constraints
       – Closedness [Pasquier+ 99]
       – Productivity [Bayardo 00][Webb 07]

  9. Background: Coping with redundancy (2)
     • Set-inclusion-based constraints
       – Closedness [Pasquier+ 99]
       – Productivity [Bayardo 00][Webb 07]
     • Closedness: for patterns covering the same (positive) transactions, pick the largest one
     (Applied to the 16 ranked candidate patterns listed on slide 7.)


  11. Background: Coping with redundancy (2)
     • Set-inclusion-based constraints
       – Closedness [Pasquier+ 99]
       – Productivity [Bayardo 00][Webb 07]
     • Result of closedness on the candidate patterns of slide 7: 16 patterns → 8 patterns
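
The reduction from 16 to 8 patterns can be checked with a minimal sketch of the closedness rule, assuming it is applied only to the 16 listed candidates and that "covering the same transactions" refers to positive transactions. Itemsets are written as strings of item letters, and the function names are illustrative.

```python
# Positive transactions of the example dataset (TIDs 1-5).
POS = {1: "ABDE", 2: "ABCDE", 3: "ACDE", 4: "ABC", 5: "B"}

# The 16 candidate patterns in ranked order, as listed on the slide.
CANDIDATES = ["AC", "B", "A", "AB", "ADE", "AE", "C", "ABC",
              "ACD", "ACDE", "ACE", "AD", "ABD", "ABDE", "ABE", "BC"]


def pos_cover(pattern):
    """Set of positive TIDs whose transaction contains the whole pattern."""
    return frozenset(tid for tid, items in POS.items()
                     if set(pattern) <= set(items))


def closed_on_positives(candidates):
    """Among patterns with the same positive cover, keep only the largest."""
    largest = {}
    for p in candidates:
        key = pos_cover(p)
        if key not in largest or len(p) > len(largest[key]):
            largest[key] = p
    return [p for p in candidates if largest[pos_cover(p)] == p]


closed = closed_on_positives(CANDIDATES)
print(len(closed), closed)
# 8 ['AC', 'B', 'A', 'AB', 'ADE', 'ABC', 'ACDE', 'ABDE']
```

For example, {C} is dropped because {A, C} covers the same positive transactions 2, 3, 4 and is larger.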

  12. Background: Coping with redundancy (2)
     • Set-inclusion-based constraints
       – Closedness [Pasquier+ 99]
       – Productivity [Bayardo 00][Webb 07]
     • Productivity: if a super-pattern has no higher quality, remove it
     (Applied to the 16 ranked candidate patterns listed on slide 7.)


  14. Background: Coping with redundancy (2)
     • Set-inclusion-based constraints
       – Closedness [Pasquier+ 99]
       – Productivity [Bayardo 00][Webb 07]
     • Result of productivity on the candidate patterns of slide 7: 16 patterns → 4 patterns
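
A minimal sketch of the productivity rule reproduces the reduction from 16 to 4 patterns. It rests on assumptions that are not stated on the slide: a pattern is kept only if its F-score is strictly higher than that of every proper sub-pattern, only non-empty sub-patterns that appear in the candidate list are compared, and the F-score is 2TP / (2TP + FP + FN). Exact fractions are used so that tied scores compare as equal.

```python
from fractions import Fraction

POS = {1: "ABDE", 2: "ABCDE", 3: "ACDE", 4: "ABC", 5: "B"}
NEG = {6: "ABDE", 7: "BCDE", 8: "CDE", 9: "ADE", 10: "AD"}

# The 16 candidate patterns in ranked order, as listed on the slide.
CANDIDATES = ["AC", "B", "A", "AB", "ADE", "AE", "C", "ABC",
              "ACD", "ACDE", "ACE", "AD", "ABD", "ABDE", "ABE", "BC"]


def count_cover(pattern, transactions):
    return sum(set(pattern) <= set(items) for items in transactions.values())


def f_score(pattern):
    tp = count_cover(pattern, POS)
    fp = count_cover(pattern, NEG)
    fn = len(POS) - tp
    return Fraction(2 * tp, 2 * tp + fp + fn)


def productive(candidates):
    """Keep a pattern only if it beats every proper sub-pattern in the list."""
    score = {p: f_score(p) for p in candidates}
    kept = []
    for p in candidates:
        subs = [q for q in candidates if set(q) < set(p)]  # proper subsets
        if all(score[p] > score[q] for q in subs):
            kept.append(p)
    return kept


print(productive(CANDIDATES))  # ['AC', 'B', 'A', 'C']
```

{A, B} is removed because its F-score (2/3) is not higher than that of its sub-pattern {A} (also 2/3), while {A, C} survives because it improves on both {A} and {C}.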

  15. Background: Coping with redundancy (2)
     • Set-inclusion-based constraints
       – Productivity + Closedness [Kameya+ 13]
     • Result of both constraints on the candidate patterns of slide 7: 16 patterns → 3 patterns

  16. Background: Coping with redundancy (3)
     • The best-covering constraint
       – In the same spirit as the HCC (highest confidence covering) constraint in HARMONY [Wang+ 05]
     • Best-covering: every pattern must be the best for at least one positive transaction
     (Applied to the 16 ranked candidate patterns listed on slide 7.)
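
One possible reading of the best-covering rule is sketched below: for each positive transaction, find the highest-scoring candidate that covers it, and keep only the patterns that win for at least one transaction. The restriction to the 16 listed candidates, the handling of ties (all tied winners are kept), and the function names are assumptions, not taken from the slides.

```python
from fractions import Fraction

POS = {1: "ABDE", 2: "ABCDE", 3: "ACDE", 4: "ABC", 5: "B"}
NEG = {6: "ABDE", 7: "BCDE", 8: "CDE", 9: "ADE", 10: "AD"}

# The 16 candidate patterns in ranked order, as listed on the slide.
CANDIDATES = ["AC", "B", "A", "AB", "ADE", "AE", "C", "ABC",
              "ACD", "ACDE", "ACE", "AD", "ABD", "ABDE", "ABE", "BC"]


def count_cover(pattern, transactions):
    return sum(set(pattern) <= set(items) for items in transactions.values())


def f_score(pattern):
    tp = count_cover(pattern, POS)
    fp = count_cover(pattern, NEG)
    fn = len(POS) - tp
    return Fraction(2 * tp, 2 * tp + fp + fn)


def best_covering(candidates):
    """Keep patterns that are the top-scoring cover of some positive transaction."""
    score = {p: f_score(p) for p in candidates}
    winners = set()
    for items in POS.values():
        covering = [p for p in candidates if set(p) <= set(items)]
        if covering:
            top = max(score[p] for p in covering)
            winners.update(p for p in covering if score[p] == top)
    return [p for p in candidates if p in winners]


print(best_covering(CANDIDATES))  # ['AC', 'B']
```

Under these assumptions, {A, C} is the best pattern for transactions 2, 3 and 4, and {B} is the best for transactions 1 and 5, so every positive transaction is still covered by a retained pattern.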

