Dynamic Re-ordering in Mining Top-k Productive Discriminative Patterns


  1. Dynamic Re-ordering in Mining Top-k Productive Discriminative Patterns
     Yoshitaka Kameya* and Ken’ya Ito, Meijo University
     TAAI-17

  2. Outline
     • Background
     • Dynamic re-ordering in mining top-k productive discriminative patterns
     • Experiments
     • Related work and Conclusion

  4. Background: Discriminative Patterns (1)
     • Discriminative patterns:
       – Show differences between two groups (classes)
       – Used for:
         • Characterizing the positive class
         • Building more precise classifiers
     • Example of a discriminative pattern x: milk=True ∧ aquatic=False ➔ + (positive class)
     [Figure: a dataset with class labels (+: positive class, –: negative class)]

  5. Background: Discriminative Patterns (2)
     • Discriminative patterns tend to be more meaningful than frequent patterns (thanks to class labels)
     • Are class labels always available?
       – Comparing groups is a standard (and promising) starting point in data analysis
       – Clustering can find groups (classes)! → Cluster labeling
     [Figure: original data → 1. clustering → clusters → 2. discriminative pattern mining → clusters labeled with discriminative patterns]

  6. Background: Discriminative Patterns (3)
     • Quality score: measures the overlap between pattern x and positive class c
       [Figure: the overlap of x and c in two cases, labeled “Quality is high” and “Quality is low”]
     • Most popular quality scores are not anti-monotonic:
       – Confidence, Lift
       – Support difference, Weighted relative accuracy, Leverage
       – F-score, Dice, Jaccard
       – ...
     ➔ Branch & bound pruning is often used [Morishita+ 00][Zimmermann+ 09][Nijssen+ 09]

  7. Background: B&B Pruning for Top-k Patterns
     • Suppose we are visiting a pattern x in a depth-first search
     • We compute the upper bound U(x) of its quality R(x)
       (U(x) = an optimistic estimate of the qualities of x’s extensions)
     • We prune the subtree below x if U(x) < R(z), where z is the k-th candidate
     [Figure: enumeration tree over items A, B, C, D rooted at ∅, with the search currently at x = CD, shown next to the candidate list for the tentative top-k patterns (ranks 1, 2, ..., k in descending order of quality; z is the k-th entry)]
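The pruning rule on this slide can be sketched as follows. This is a minimal illustration, not the authors' implementation. It assumes the F-score as the quality R(x), takes U(x) = 2·TP / (TP + |c|) as the optimistic estimate (an extension keeps every true positive and loses every false positive), reuses the ten-transaction example dataset from slide 15, and walks a plain depth-first enumeration without the productivity constraint. The names `counts`, `quality`, `upper_bound`, and `top_k` are hypothetical.

```python
import heapq

# Example dataset from slide 15 (5 positive and 5 negative transactions).
POS = [{"A", "B"}, {"A", "C", "E"}, {"A", "D"}, {"B", "C", "E"}, {"B", "D"}]
NEG = [{"A", "B", "C"}, {"B", "E"}, {"C", "D"}, {"C", "D", "E"}, {"E"}]
ITEMS = "ABCDE"


def counts(x):
    """Return (true positives, false positives) of pattern x."""
    tp = sum(x <= t for t in POS)
    fp = sum(x <= t for t in NEG)
    return tp, fp


def quality(x):
    """R(x): F-score of x, written as 2*TP / (|x| + |c|)."""
    tp, fp = counts(x)
    return 2 * tp / (tp + fp + len(POS))


def upper_bound(x):
    """U(x): assumed optimistic estimate (all TPs kept, all FPs removed)."""
    tp, _ = counts(x)
    return 2 * tp / (tp + len(POS))


def top_k(k):
    heap = []      # min-heap of (quality, pattern); heap[0] is z, the k-th candidate
    visited = 0

    def dfs(x, start):
        nonlocal visited
        for i in range(start, len(ITEMS)):
            y = x | {ITEMS[i]}
            visited += 1
            entry = (quality(y), "".join(sorted(y)))
            if len(heap) < k:
                heapq.heappush(heap, entry)
            elif entry[0] > heap[0][0]:
                heapq.heapreplace(heap, entry)
            # Prune the subtree below y if U(y) < R(z).
            if len(heap) == k and upper_bound(y) < heap[0][0]:
                continue
            dfs(y, i + 1)

    dfs(frozenset(), 0)
    return sorted(heap, reverse=True), visited
```

Calling `top_k(3)` returns the three best patterns together with the number of patterns actually visited, which stays below the 31 non-empty patterns over five items because some subtrees are pruned.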

  8. Background: Suffix Enumeration Trees (1)
     [Figure: prefix enumeration tree over A, B, C, D, by level: ∅; A, B, C, D; AB, AC, AD, BC, BD, CD; ABC, ABD, ACD, BCD; ABCD]
     [Figure: suffix enumeration tree over A, B, C, D, by level: ∅; A, B, C, D; AB, AC, BC, AD, BD, CD; ABC, ABD, ACD, BCD; ABCD]

  9. Background: Suffix Enumeration Trees (1)
     • Beneficial for checking the productivity constraint in a depth-first search
     • Productivity constraint: no pattern may be of lower quality than any of its sub-patterns
     [Figure: the prefix and suffix enumeration trees over A, B, C, D; in the suffix enumeration tree some nodes are annotated with quality values (0.4, 0.3, 0.5, 0.2, 0.4, 0.6, 0.5), and ACD is marked “will be removed”]

  10. Background: Suffix Enumeration Trees (1)
     • Beneficial for checking the productivity constraint in a depth-first search
     • Prefix enumeration tree → NOT “sub-patterns first”
     • Suffix enumeration tree → “sub-patterns first”
     • “Sub-patterns first” property: when visiting a pattern x, we have already visited all sub-patterns of x
     [Figure: the prefix and suffix enumeration trees over A, B, C, D, with quality values on some nodes of the suffix enumeration tree]
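The contrast between the two trees can be checked with a small sketch. It assumes that a child in the suffix enumeration tree is formed by prepending an item that precedes the pattern's first item, and that a child in the prefix enumeration tree is formed by appending an item that follows the pattern's last item. The function names are hypothetical.

```python
def suffix_enumeration_order(items):
    """Depth-first visiting order of the suffix enumeration tree."""
    order = []

    def visit(pattern, first_idx):
        order.append("".join(pattern))
        # Children prepend an item that comes before the current first item.
        for i in range(first_idx):
            visit((items[i],) + pattern, i)

    for i in range(len(items)):
        visit((items[i],), i)
    return order


def prefix_enumeration_order(items):
    """Depth-first visiting order of the prefix enumeration tree."""
    order = []

    def visit(pattern, next_idx):
        order.append("".join(pattern))
        # Children append an item that comes after the current last item.
        for i in range(next_idx, len(items)):
            visit(pattern + (items[i],), i + 1)

    for i in range(len(items)):
        visit((items[i],), i + 1)
    return order


def is_sub_patterns_first(order):
    """True if every pattern is visited after all of its sub-patterns."""
    seen = {frozenset()}
    for p in order:
        s = frozenset(p)
        # Checking the immediate sub-patterns is enough: theirs were checked earlier.
        if any(s - {a} not in seen for a in s):
            return False
        seen.add(s)
    return True
```

For the items `"ABCD"`, the suffix order starts A, B, AB, C, AC, BC, ABC, D and passes the check, while the prefix order starts A, AB, ABC and fails it because AB is visited before B.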

  11. Background: Suffix Enumeration Trees (2)
     • Also beneficial for effective B&B pruning
     • Suppose: A = the highest-quality item, B = the 2nd highest-quality item, C = the 3rd highest-quality item, ...
       ➔ Items of higher quality are combined earlier
       ➔ Patterns of higher quality would be visited earlier
       ➔ B&B pruning would be more aggressive!
     • We prune the subtree below x if U(x) < R(z)
       ➔ The threshold in B&B pruning is higher if z has a higher quality
     [Figure: suffix enumeration tree over A, B, C, D with regions labeled “A only”, “A, B combined”, “A, B, C combined”, and “A, B, C, D combined”, shown next to the candidate list (ranks 1, 2, ..., k in descending order of quality; z is the k-th entry)]

  12. Outline
     ✓ Background
     • Dynamic re-ordering in mining top-k productive discriminative patterns
       – Basic idea
       – Justification
     • Experiments
     • Related work and Conclusion

  14. Our proposal: Basic idea (1)
     • Basic idea: re-order sibling patterns dynamically according to their qualities
       ➔ Patterns of higher quality will be visited even earlier
       ➔ B&B pruning will be even more aggressive
     [Figure: suffix enumeration tree over A, B, C, D in which each group of sibling patterns is marked “siblings”]

  15. Our proposal: Basic idea (2)
     • Example:
       – 10 transactions
       – Quality is measured by F-score
     Dataset:
       Class          Transaction
       + (Positive)   {A, B}
       + (Positive)   {A, C, E}
       + (Positive)   {A, D}
       + (Positive)   {B, C, E}
       + (Positive)   {B, D}
       – (Negative)   {A, B, C}
       – (Negative)   {B, E}
       – (Negative)   {C, D}
       – (Negative)   {C, D, E}
       – (Negative)   {E}

  16. Our proposal: Basic idea (4)
     • Example (same dataset as on slide 15):
       – 10 transactions
       – Quality is measured by F-score
     • For the pattern {A}:
       – Recall of {A} = 3 / 5 = 0.6
       – Precision of {A} = 3 / 4 = 0.75
       – F-score of {A} = 2 * 0.6 * 0.75 / (0.6 + 0.75) = 0.67
     • Similarly, we have:
       – F-score of {A} = 0.67
       – F-score of {B} = 0.6
       – F-score of {C} = 0.4
       – F-score of {D} = 0.44
       – F-score of {E} = 0.4
     • Static ordering among patterns (descending F-score): A < B < D < C < E
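The numbers on this slide can be reproduced with a short sketch over the example dataset. The function name `f_score` and the tie-breaking between {C} and {E} (ties keep alphabetical order, which matches the ordering given on the slide) are assumptions for illustration.

```python
# Example dataset from slide 15.
POS = [{"A", "B"}, {"A", "C", "E"}, {"A", "D"}, {"B", "C", "E"}, {"B", "D"}]
NEG = [{"A", "B", "C"}, {"B", "E"}, {"C", "D"}, {"C", "D", "E"}, {"E"}]


def f_score(pattern):
    """F-score of a pattern with respect to the positive class."""
    tp = sum(pattern <= t for t in POS)   # positive transactions containing the pattern
    fp = sum(pattern <= t for t in NEG)   # negative transactions containing the pattern
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / len(POS)
    return 2 * precision * recall / (precision + recall)


# F-scores of the single items, rounded to two decimals as on the slide.
scores = {item: round(f_score({item}), 2) for item in "ABCDE"}

# Static ordering: descending F-score; sorted() is stable, so ties stay alphabetical.
static_order = sorted("ABCDE", key=lambda item: -f_score({item}))
```

Here `scores` maps A, B, C, D, E to 0.67, 0.6, 0.4, 0.44, 0.4, and `static_order` is A, B, D, C, E.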

  17. Our proposal: Basic idea (4)
     • Example (same dataset as on slide 15; quality is measured by F-score)
     • Suffix enumeration tree under static ordering A < B < D < C < E, with F-scores:
       – Single items: A 0.67, B 0.6, D 0.44, C 0.4, E 0.4
       – Two-item patterns: AB 0.29, AD 0.33, BD 0.33, AC 0.29, BC 0.29, AE 0.33, BE 0.29, CE 0.5
       – Three-item patterns: ACE 0.33, BCE 0.33
       (Note) Patterns that do not appear in the dataset are hidden
     • The “sub-patterns first” property holds, and we have the productive patterns {A}, {B}, {C, E}, {D}, {C}, {E}
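The list of productive patterns can be checked by brute force. The sketch below is only an illustration under two assumptions: a pattern counts as productive when its F-score is not lower than that of any non-empty proper sub-pattern, and the empty pattern is not treated as a sub-pattern. Exact fractions avoid floating-point ties; the function names are hypothetical.

```python
from fractions import Fraction
from itertools import combinations

# Example dataset from slide 15.
POS = [{"A", "B"}, {"A", "C", "E"}, {"A", "D"}, {"B", "C", "E"}, {"B", "D"}]
NEG = [{"A", "B", "C"}, {"B", "E"}, {"C", "D"}, {"C", "D", "E"}, {"E"}]
ITEMS = "ABCDE"


def f_score(pattern):
    """Exact F-score, written as 2*TP / (|x| + |c|)."""
    tp = sum(pattern <= t for t in POS)
    fp = sum(pattern <= t for t in NEG)
    return Fraction(2 * tp, tp + fp + len(POS))


def proper_sub_patterns(pattern):
    """All non-empty proper subsets of the pattern."""
    for size in range(1, len(pattern)):
        for sub in combinations(sorted(pattern), size):
            yield frozenset(sub)


def is_productive(pattern):
    """Assumed reading of the constraint: never lower than any sub-pattern."""
    return all(f_score(pattern) >= f_score(sub) for sub in proper_sub_patterns(pattern))


all_patterns = [
    frozenset(c) for size in range(1, len(ITEMS) + 1) for c in combinations(ITEMS, size)
]
productive = sorted("".join(sorted(p)) for p in all_patterns if is_productive(p))
```

On this dataset `productive` contains the five single items and CE, in agreement with the slide.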

  18. Our proposal: Basic idea (4)
     • Example (same dataset as on slide 15; quality is measured by F-score)
     • Suffix enumeration tree with dynamic re-ordering, with F-scores:
       – Single items: A 0.67, B 0.6, D 0.44, C 0.4, E 0.4
       – Two-item patterns: AB 0.29, AD 0.33, BD 0.33, AE 0.33, BE 0.29, AC 0.29, BC 0.29, CE 0.5
       – Three-item patterns: CAE 0.33, CBE 0.33
     • {C, E} comes earlier than before, and it is interesting to see that the “sub-patterns first” property still holds ➔ Why?
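One plausible reading of dynamic re-ordering is sketched below; it is an assumption for illustration, not the authors' algorithm. Each node carries a list of candidate items, the k-th child may only be extended with the candidates placed before it, and in dynamic mode the candidates are re-sorted at every node by the F-score of the resulting child. No pruning is applied and no pattern is hidden, so all 31 patterns are enumerated. The function names are hypothetical.

```python
from fractions import Fraction

# Example dataset from slide 15.
POS = [{"A", "B"}, {"A", "C", "E"}, {"A", "D"}, {"B", "C", "E"}, {"B", "D"}]
NEG = [{"A", "B", "C"}, {"B", "E"}, {"C", "D"}, {"C", "D", "E"}, {"E"}]


def f_score(pattern):
    """Exact F-score, written as 2*TP / (|x| + |c|)."""
    tp = sum(pattern <= t for t in POS)
    fp = sum(pattern <= t for t in NEG)
    return Fraction(2 * tp, tp + fp + len(POS))


def visiting_order(dynamic):
    """Depth-first visiting order, with or without sibling re-ordering."""
    order = []

    def visit(pattern, candidates):
        if pattern:
            order.append("".join(sorted(pattern)))
        if dynamic:
            # Re-order the siblings below this node by descending quality.
            candidates = sorted(candidates, key=lambda c: -f_score(pattern | {c}))
        for k, item in enumerate(candidates):
            # The k-th child may only be extended with earlier siblings' items.
            visit(pattern | {item}, candidates[:k])

    # Static ordering of the items: descending F-score (A, B, D, C, E).
    root_items = sorted("ABCDE", key=lambda i: -f_score(frozenset(i)))
    visit(frozenset(), root_items)
    return order


def is_sub_patterns_first(order):
    """True if every pattern is visited after all of its sub-patterns."""
    seen = {frozenset()}
    for p in order:
        s = frozenset(p)
        if any(s - {a} not in seen for a in s):
            return False
        seen.add(s)
    return True


static = visiting_order(dynamic=False)
dynamic = visiting_order(dynamic=True)
```

Under these assumptions `dynamic.index("CE")` is smaller than `static.index("CE")`, because CE becomes the first child visited below E, and both orders still satisfy `is_sub_patterns_first`.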

  19. Outline
     ✓ Background
     • Dynamic re-ordering in mining top-k productive discriminative patterns
       ✓ Basic idea
       – Justification
     • Experiments
     • Related work and Conclusion

  20. Our proposal: Justification (1)
     • The “sub-patterns first” property is assured even with dynamic re-ordering
     • Key observation:
       The visiting order of a search is a topological order over a Hasse diagram
       ⇔ The search is “sub-patterns first”
     [Figure: Hasse diagram over A, B, C, D, by level: ∅; A, B, C, D; AB, AC, AD, BC, BD, CD; ABC, ABD, ACD, BCD; ABCD]

  21. Our proposal: Justification (2)
     • The “sub-patterns first” property is assured even with dynamic re-ordering
     • Key observation:
       The visiting order of a search is a topological order over a Hasse diagram
       ⇔ The search is “sub-patterns first”
     • Topological sorting by right-to-left traversal, using a stack
     [Figure: Hasse diagram over A, B, C, D next to the stack]

  22. Our proposal: Justification (2)
     • Topological sorting by right-to-left traversal (animation step; other text as on slide 21)
     • Stack (top to bottom): BCD, ABCD

  23. Our proposal: Justification (2)
     • Topological sorting by right-to-left traversal (animation step; other text as on slide 21)
     • Stack (top to bottom): CD, ACD, BCD, ABCD

  24. Our proposal: Justification (2)
     • The “sub-patterns first” property is assured even with dynamic re-ordering
     • Key observation:
       The visiting order of a search is a topological order over a Hasse diagram
       ⇔ The search is “sub-patterns first”
     • Stack (top to bottom): A, B, AB, C, AC, BC, ABC, D, AD, BD, ABD, CD, ACD, BCD, ABCD
     [Figure: Hasse diagram over A, B, C, D next to the completed stack]
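The stack on this slide can be reproduced with a standard depth-first topological sort over the Hasse diagram. In this sketch a node is pushed once all of its super-patterns have been handled, and successors are tried from the last item to the first, which is assumed here to play the role of the right-to-left traversal. The function name is hypothetical.

```python
def topological_order(items):
    """Topological order of the Hasse diagram of all subsets of items."""
    stack = []
    visited = set()

    def dfs(node):
        visited.add(node)
        # Try the successors from the last item to the first ("right to left").
        for item in reversed(items):
            if item not in node:
                succ = node | {item}
                if succ not in visited:
                    dfs(succ)
        # Push the node only after all of its super-patterns are on the stack.
        stack.append(node)

    dfs(frozenset())
    # Reading the stack from top to bottom gives the topological order.
    return ["".join(sorted(node)) for node in reversed(stack)]
```

`topological_order("ABCD")` starts with the empty pattern (an empty string) and then lists A, B, AB, C, AC, BC, ABC, D, AD, BD, ABD, CD, ACD, BCD, ABCD, the same sequence as the stack above.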
