
Preference-based Pattern Mining. Bruno Crémilleux, Marc Plantevit, Arnaud Soulet. Nancy, France - November 16, 2017. Introduction. Who are we? Bruno Crémilleux, Professor, Univ. Caen, France. Marc Plantevit, Associate Professor, Univ. 2/97


  1. Frequent itemset mining. Problem definition: given the objects in O described with the Boolean attributes in A, list every itemset having a frequency above a given threshold µ ∈ N. Output: every X ⊆ A such that at least µ objects have all attributes in X. R. Agrawal, T. Imielinski, A. Swami: Mining Association Rules Between Sets of Items in Large Databases, SIGMOD, 1993. 16/97
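As a sketch, the naive version of this listing task fits in a few lines of Python. The four-object dataset below is hypothetical (one layout consistent with the illustration on the next slides):

```python
from itertools import chain, combinations

def frequent_itemsets(objects, mu):
    """List every itemset X with at least `mu` supporting objects,
    by brute-force enumeration of all subsets of the attributes."""
    attributes = sorted(set().union(*objects.values()))
    all_subsets = chain.from_iterable(
        combinations(attributes, r) for r in range(len(attributes) + 1))
    result = {}
    for X in all_subsets:
        freq = sum(1 for desc in objects.values() if set(X) <= desc)
        if freq >= mu:
            result[frozenset(X)] = freq
    return result

# Hypothetical dataset: 4 objects over 3 Boolean attributes
data = {"o1": {"a1", "a2", "a3"}, "o2": {"a1", "a2"},
        "o3": {"a2"}, "o4": {"a3"}}
print(frequent_itemsets(data, mu=2))
```

Real miners (Apriori, Eclat, FP-growth) avoid this exponential scan by exploiting the anti-monotonicity of frequency discussed later in the tutorial.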

  2. Frequent itemset mining: illustration. Specifying a minimal absolute frequency µ = 2 objects (or, equivalently, a minimal relative frequency of 50%).

       a1  a2  a3
  o1   ×   ×   ×
  o2   ×   ×
  o3       ×
  o4           ×

  17/97

  3. Frequent itemset mining: illustration. Specifying a minimal absolute frequency µ = 2 objects (or, equivalently, a minimal relative frequency of 50%).

       a1  a2  a3
  o1   ×   ×   ×
  o2   ×   ×
  o3       ×
  o4           ×

  The frequent itemsets are: ∅ (4), {a1} (2), {a2} (3), {a3} (2) and {a1, a2} (2). 17/97

  4. Inductive database vision Querying data: { d ∈ D | q ( d , D ) } where: D is a dataset (tuples), q is a query. 18/97

  5. Inductive database vision Querying patterns: { X ∈ P | Q ( X , D ) } where: D is the dataset, P is the pattern space, Q is an inductive query. 18/97

  6. Inductive database vision Querying the frequent itemsets: { X ∈ P | Q ( X , D ) } where: D is the dataset, P is the pattern space, Q is an inductive query. 18/97

  7. Inductive database vision Querying the frequent itemsets: { X ∈ P | Q ( X , D ) } where: D is a subset of O × A , i. e., objects described with Boolean attributes, P is the pattern space, Q is an inductive query. 18/97

  8. Inductive database vision Querying the frequent itemsets: { X ∈ P | Q ( X , D ) } where: D is a subset of O × A , i. e., objects described with Boolean attributes, P is 2 A , Q is an inductive query. 18/97

  9. Inductive database vision Querying the frequent itemsets: { X ∈ P | Q ( X , D ) } where: D is a subset of O × A , i. e., objects described with Boolean attributes, P is 2 A , Q is ( X , D ) ↦ |{ o ∈ O | { o } × X ⊆ D }| ≥ µ . 18/97

  10. Inductive database vision Querying the frequent itemsets: { X ∈ P | Q ( X , D ) } where: D is a subset of O × A , i. e., objects described with Boolean attributes, P is 2 A , Q is ( X , D ) ↦ f ( X , D ) ≥ µ . 18/97

  11. Inductive database vision Querying the frequent itemsets: { X ∈ P | Q ( X , D ) } where: D is a subset of O × A , i. e., objects described with Boolean attributes, P is 2 A , Q is ( X , D ) ↦ f ( X , D ) ≥ µ . Listing the frequent itemsets is NP-hard. 18/97

  12. Pattern flooding, µ = 2. [table: 8 objects o1–o8 over 15 attributes a1–a15; each object contains exactly one of the three blocks {a1,…,a5}, {a6,…,a10}, {a11,…,a15}] How many frequent patterns? 19/97

  13. Pattern flooding, µ = 2. [table: 8 objects o1–o8 over 15 attributes a1–a15; each object contains exactly one of the three blocks {a1,…,a5}, {a6,…,a10}, {a11,…,a15}] How many frequent patterns? 1 + (2^5 − 1) × 3 = 94 patterns. 19/97

  14. Pattern flooding, µ = 2. [table: 8 objects o1–o8 over 15 attributes a1–a15; each object contains exactly one of the three blocks {a1,…,a5}, {a6,…,a10}, {a11,…,a15}] How many frequent patterns? 1 + (2^5 − 1) × 3 = 94 patterns, but actually 4 (potentially) interesting ones: {}, {a1, a2, a3, a4, a5}, {a6, a7, a8, a9, a10}, {a11, a12, a13, a14, a15}. 19/97

  15. Pattern flooding, µ = 2. [table: 8 objects o1–o8 over 15 attributes a1–a15; each object contains exactly one of the three blocks {a1,…,a5}, {a6,…,a10}, {a11,…,a15}] How many frequent patterns? 1 + (2^5 − 1) × 3 = 94 patterns, but actually 4 (potentially) interesting ones: {}, {a1, a2, a3, a4, a5}, {a6, a7, a8, a9, a10}, {a11, a12, a13, a14, a15}. ☞ the need to focus on a condensed representation of frequent patterns. Toon Calders, Christophe Rigotti, Jean-François Boulicaut: A Survey on Condensed Representations for Frequent Sets. Constraint-Based Mining and Inductive Databases 2004: 64-80. 19/97
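The 94-pattern count can be checked by brute force. The object-to-block assignment below (2/3/3) is an assumption; any split that puts at least two objects on each block yields the same count:

```python
from itertools import chain, combinations

# Assumed layout: three disjoint blocks of five attributes, each object
# carrying exactly one full block (any split with >= 2 objects per block works).
blocks = [{f"a{i}" for i in range(1, 6)},
          {f"a{i}" for i in range(6, 11)},
          {f"a{i}" for i in range(11, 16)}]
objects = 2 * [blocks[0]] + 3 * [blocks[1]] + 3 * [blocks[2]]  # o1..o8

attrs = sorted(set().union(*objects))
frequent = [X for X in chain.from_iterable(
                combinations(attrs, r) for r in range(len(attrs) + 1))
            if sum(1 for o in objects if set(X) <= o) >= 2]
print(len(frequent))  # 1 + (2**5 - 1) * 3 = 94
```

Every non-empty frequent itemset lies entirely inside one block (any itemset spanning two blocks occurs in no object), which is exactly what the closed-form count 1 + 3 × (2^5 − 1) expresses.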

  16. Closed and Free Patterns. Equivalence classes based on support.

       A  B  C
  o1   ×  ×  ×
  o2   ×  ×  ×
  o3      ×  ×
  o4      ×  ×
  o5         ×

  [Hasse diagram of the itemset lattice, each itemset labelled with its support: A, AB, AC, ABC → {o1,o2}; B, BC → {o1,o2,o3,o4}; ∅, C → {o1,o2,o3,o4,o5}] 20/97

  17. Closed and Free Patterns. Equivalence classes based on support.

       A  B  C
  o1   ×  ×  ×
  o2   ×  ×  ×
  o3      ×  ×
  o4      ×  ×
  o5         ×

  [Hasse diagram of the itemset lattice, each itemset labelled with its support: A, AB, AC, ABC → {o1,o2}; B, BC → {o1,o2,o3,o4}; ∅, C → {o1,o2,o3,o4,o5}]
  Closed patterns are the maximal elements of each equivalence class (Bastide et al., SIGKDD Exp. 2000): ABC, BC, and C. Generators or free patterns are the minimal elements (not necessarily unique) of each equivalence class (Boulicaut et al., DAMI 2003): {}, A and B. A strong intersection with Formal Concept Analysis (Ganter and Wille, 1999). 20/97
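On this toy context, the closure operator (and hence the closed and free patterns) can be computed directly; this is a brute-force sketch, with `closure` intersecting the descriptions of the supporting objects:

```python
from itertools import chain, combinations

db = {"o1": {"A", "B", "C"}, "o2": {"A", "B", "C"},
      "o3": {"B", "C"}, "o4": {"B", "C"}, "o5": {"C"}}

def support(X):
    return sum(1 for items in db.values() if X <= items)

def closure(X):
    """Intersection of the descriptions of all objects containing X."""
    out = set("ABC")
    for items in db.values():
        if X <= items:
            out &= items
    return out

all_sets = [set(X) for X in chain.from_iterable(
    combinations("ABC", r) for r in range(4))]
# closed: fixpoints of the closure operator (maximal in their class)
closed = {frozenset(X) for X in all_sets if closure(X) == X}
# free: no immediate proper subset has the same support (minimal in class)
free = {frozenset(X) for X in all_sets
        if all(support(X - {e}) != support(X) for e in X)}
```

Running it recovers the three classes of the slide: `closed` is {C, BC, ABC} and `free` is {∅, A, B}.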

  18. Few researchers (in DM) are aware of this strong intersection. transactional DB ≡ formal context: a formal context is a triple K = (G, M, I), where G is a set of objects, M is a set of attributes, and I ⊆ G × M is a binary relation called incidence that expresses which objects have which attributes. closed itemset ≡ concept intent. FCA gives the mathematical background about closed patterns. Algorithms: LCM is an efficient implementation of Close By One (Sergei O. Kuznetsov, 1993). 21/97

  19. The FIM era (FIMI Workshop@ICDM, 2003 and 2004): for more than a decade, only milliseconds were worth it! Even though the complete collection of frequent itemsets is known to be useless, the main objective of many algorithms was to gain milliseconds over their competitors!! What about the end-user (and pattern interestingness)? ➜ partially answered with constraints. 22/97

  20. Pattern constraints Constraints are needed for: only retrieving patterns that describe an interesting subgroup of the data making the extraction feasible 23/97

  21. Pattern constraints Constraints are needed for: only retrieving patterns that describe an interesting subgroup of the data making the extraction feasible Constraint properties are used to infer constraint values on (many) patterns without having to evaluate them individually. 23/97

  22. Pattern constraints Constraints are needed for: only retrieving patterns that describe an interesting subgroup of the data making the extraction feasible Constraint properties are used to infer constraint values on (many) patterns without having to evaluate them individually. ➜ They are defined w.r.t. the partial order ⪯ used for listing the patterns 23/97

  23. Search space traversal. [Hasse diagram of the itemset lattice over {A, B, C}] Levelwise enumeration vs depth-first enumeration. Whatever the enumeration principle, we have to derive some pruning properties from the constraints. 24/97

  24. Enumeration strategy. Binary partition: the element 'a' is enumerated. [Hasse diagram of the powerset lattice of {a, b, c, d, e}] 25/97

  25. Enumeration strategy. Binary partition: an element a ∈ R∨ \ R∧ is enumerated, splitting the interval [R∧, R∨] into [R∧ ∪ {a}, R∨] and [R∧, R∨ \ {a}]. 25/97

  26. (Anti-)Monotone Constraints. Monotone constraint: ∀φ1 ⪯ φ2, C(φ1, D) ⇒ C(φ2, D). Anti-monotone constraint: ∀φ1 ⪯ φ2, C(φ2, D) ⇒ C(φ1, D). [two Hasse diagrams of the powerset lattice of {a, b, c, d, e} with the satisfying itemsets highlighted] Examples: C(φ, D) ≡ b ∈ φ ∨ c ∈ φ (monotone); C(φ, D) ≡ a ∉ φ ∧ c ∉ φ (anti-monotone). 26/97
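Both definitions can be verified by brute force over the lattice of {a,…,e}; the helper names below are illustrative:

```python
from itertools import chain, combinations

lattice = [frozenset(X) for X in chain.from_iterable(
    combinations("abcde", r) for r in range(6))]

def is_monotone(C):
    # C(phi1) must imply C(phi2) whenever phi1 is a subset of phi2
    return all(C(p2) or not C(p1)
               for p1 in lattice for p2 in lattice if p1 <= p2)

def is_anti_monotone(C):
    # C(phi2) must imply C(phi1) whenever phi1 is a subset of phi2
    return all(C(p1) or not C(p2)
               for p1 in lattice for p2 in lattice if p1 <= p2)

mono = lambda phi: "b" in phi or "c" in phi           # slide's monotone example
anti = lambda phi: "a" not in phi and "c" not in phi  # anti-monotone example
```

The check confirms that each example satisfies exactly one of the two properties, which is what makes the corresponding pruning rules sound.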

  27. Constraint evaluation. Monotone constraint: C(R∨, D) is false. [interval diagram between R∧ and R∨] 27/97

  28. Constraint evaluation. Monotone constraint: C(R∨, D) is false ⇒ the interval [R∧, R∨] is empty. 27/97

  29. Constraint evaluation. Anti-monotone constraint: C(R∧, D) is false. [interval diagram between R∧ and R∨] 27/97

  30. Constraint evaluation. Anti-monotone constraint: C(R∧, D) is false ⇒ the interval [R∧, R∨] is empty. 27/97

  31. Convertible Constraints. Convertible constraints (Pei et al., DAMI 2004): ⪯ is extended to the prefix order ≤ so that ∀φ1 ≤ φ2, C(φ2, D) ⇒ C(φ1, D). [Hasse diagram of the powerset lattice of {a, b, c, d, e}] Example: C(φ, w) ≡ avg(w(φ)) > σ with w(a) ≥ w(b) ≥ w(c) ≥ w(d) ≥ w(e). 28/97
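The example can be checked exhaustively: with illustrative weights already sorted decreasingly, the average constraint is anti-monotone along the prefix order even though it is not along plain set inclusion:

```python
from itertools import chain, combinations

# Illustrative weights, sorted decreasingly as the definition requires
w = {"a": 5, "b": 4, "c": 3, "d": 2, "e": 1}
sigma = 3
order = sorted(w, key=w.get, reverse=True)

def holds(phi):
    return sum(w[e] for e in phi) / len(phi) > sigma

def prefixes(phi):
    """Non-empty prefixes of phi when written in the weight order."""
    seq = [e for e in order if e in phi]
    return [frozenset(seq[:i]) for i in range(1, len(seq) + 1)]

itemsets = [frozenset(X) for X in chain.from_iterable(
    combinations(order, r) for r in range(1, 6))]
# avg(w(phi)) > sigma holds on every prefix of a satisfying itemset...
convertible_am = all(all(holds(p) for p in prefixes(phi))
                     for phi in itemsets if holds(phi))
# ...but not on every subset (so it is not plainly anti-monotone)
plain_am = all(holds(phi - {e}) for phi in itemsets if holds(phi)
               for e in phi if len(phi) > 1)
```

The intuition: a prefix keeps the heaviest items of the itemset, so its average can only be at least as large as the average of the whole itemset.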

  32. Loose AM Constraints. Loose AM constraints: C(φ, D) ⇒ ∃e ∈ φ : C(φ \ {e}, D). [Hasse diagram of the powerset lattice of {a, b, c, d, e}] Example: C(φ, w) ≡ var(w(φ)) ≤ σ. Bonchi and Lucchese, DKE 2007; Uno, ISAAC 2007. 29/97

  33. Examples
  v ∈ P : M
  P ⊇ S : M
  P ⊆ S : AM
  min(P) ≤ σ : M
  min(P) ≥ σ : AM
  max(P) ≤ σ : AM
  max(P) ≥ σ : M
  range(P) ≤ σ : AM
  range(P) ≥ σ : M
  avg(P) θ σ, θ ∈ {≤, =, ≥} : Convertible
  var(w(φ)) ≤ σ : LAM
  30/97

  34. A larger class of constraints Some constraints can be decomposed into several pieces that are either monotone or anti-monotone. Piecewise monotone and anti-monotone constraints L. Cerf, J. Besson, C. Robardet, J-F. Boulicaut: Closed patterns meet n-ary relations. TKDD 3(1) (2009) Primitive-based constraints A.Soulet, B. Crémilleux: Mining constraint-based patterns using automatic relaxation. Intell. Data Anal. 13(1): 109-133 (2009) Projection-antimonotonicity A. Buzmakov, S. O. Kuznetsov, A.Napoli: Fast Generation of Best Interval Patterns for Nonmonotonic Constraints. ECML/PKDD (2) 2015: 157-172 31/97

  35. An example. With ∀e, w(e) ≥ 0: C(φ, w) ≡ avg(w(φ)) > σ ≡ (Σ_{e∈φ} w(e)) / |φ| > σ. C(φ, D) is piecewise monotone and anti-monotone with f(φ1, φ2, D) = (Σ_{e∈φ1} w(e)) / |φ2|:
  f is monotone in its first argument: ∀x ⪯ y, (Σ_{e∈x} w(e)) / |φ2| > σ ⇒ (Σ_{e∈y} w(e)) / |φ2| > σ;
  f is anti-monotone in its second argument: ∀x ⪯ y, (Σ_{e∈φ1} w(e)) / |y| > σ ⇒ (Σ_{e∈φ1} w(e)) / |x| > σ.
  32/97

  36. Piecewise constraint exploitation. Evaluation: f(R∨, R∧, D) = (Σ_{e∈R∨} w(e)) / |R∧|. Propagation: if ∃e ∈ R∨ \ R∧ such that f(R∨ \ {e}, R∧, D) ≤ σ, then e is moved into R∧; if ∃e ∈ R∨ \ R∧ such that f(R∨, R∧ ∪ {e}, D) ≤ σ, then e is removed from R∨. 33/97

  37. Piecewise constraint exploitation. Evaluation: if f(R∨, R∧, D) = (Σ_{e∈R∨} w(e)) / |R∧| ≤ σ, then the interval R is empty. Propagation: if ∃e ∈ R∨ \ R∧ such that f(R∨ \ {e}, R∧, D) ≤ σ, then e is moved into R∧; if ∃e ∈ R∨ \ R∧ such that f(R∨, R∧ ∪ {e}, D) ≤ σ, then e is removed from R∨. 33/97
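These two rules can be sketched as a fixpoint loop over an interval [R∧, R∨]; the function name, the weights, and the standalone setting are illustrative (in the tutorial the rules fire during enumeration):

```python
def propagate(r_low, r_high, w, sigma):
    """Shrink the interval [r_low, r_high] under avg(w(phi)) > sigma,
    using f(phi1, phi2) = sum of w over phi1 divided by |phi2| as a bound.
    Returns None when the interval provably contains no solution."""
    r_low, r_high = set(r_low), set(r_high)

    def f(phi1, phi2):
        return float("inf") if not phi2 else sum(w[e] for e in phi1) / len(phi2)

    changed = True
    while changed:
        changed = False
        if f(r_high, r_low) <= sigma:    # evaluation: the interval is empty
            return None
        for e in sorted(r_high - r_low):
            if f(r_high - {e}, r_low) <= sigma:    # e is mandatory
                r_low.add(e); changed = True
            elif f(r_high, r_low | {e}) <= sigma:  # e is forbidden
                r_high.discard(e); changed = True
    return r_low, r_high

w = {"a": 10, "b": 8, "c": 1, "d": 1, "e": 1}  # illustrative weights
```

With σ = 5 the interval [{a}, {a,…,e}] is already a fixpoint, while with σ = 15 propagation collapses it and proves it empty (no subset containing a can average above 15).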

  38. Tight upper-bound computation. Convex measures can be taken into account by computing some upper bounds with R∧ and R∨. Branch and bound enumeration: Shinichi Morishita, Jun Sese: Traversing Itemset Lattice with Statistical Metric Pruning. PODS 2000: 226-236. Studying constraints ≡ looking for efficient and effective upper bounds in a branch and bound algorithm! 34/97

  39. Toward declarativity Why declarative approaches? for each problem, do not write a solution from scratch Declarative approaches: CP approaches (Khiari et al., CP10, Guns et al., TKDE 2013) SAT approaches (Boudane et al., IJCAI16, Jabbour et al., CIKM13) ILP approaches (Mueller et al, DS10, Babaki et al., CPAIOR14, Ouali et al. IJCAI16) ASP approaches (Gebser et al., IJCAI16) 35/97

  40. Thresholding problem. [plot: number of patterns as a function of the threshold] A too stringent threshold: trivial patterns. A too weak threshold: too many patterns, unmanageable, and diversity not necessarily assured. Some attempts to tackle this issue: interestingness is not a dichotomy! [BB05]; taking benefit from hierarchical relationships [HF99, DPRB14]. But setting thresholds remains an issue in pattern mining. 36/97
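The curve behind this plot is easy to reproduce on a toy dataset (illustrative data; the count can only decrease as the threshold grows):

```python
from itertools import chain, combinations

db = [{"a1", "a2", "a3"}, {"a1", "a2"}, {"a2"}, {"a3"}]  # illustrative
attrs = sorted(set().union(*db))
subsets = [set(X) for X in chain.from_iterable(
    combinations(attrs, r) for r in range(len(attrs) + 1))]

# number of frequent itemsets for every absolute threshold mu
counts = {mu: sum(1 for X in subsets
                  if sum(1 for o in db if X <= o) >= mu)
          for mu in range(1, len(db) + 1)}
print(counts)  # {1: 8, 2: 5, 3: 2, 4: 1}
```

Even on four objects the count drops from "everything" to "almost nothing" within a few threshold steps, which is exactly the tuning difficulty the slide points at.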

  41. Constraint-based pattern mining: concluding remarks how to fix thresholds? how to handle numerous patterns including non-informative patterns? how to get a global picture of the set of patterns? how to design the proper constraints/preferences? 37/97

  42. Pattern mining as an optimization problem 38/97

  43. Pattern mining as an optimization problem: performance issue → quality issue; the more, the better → the less, the better; data-driven → user-driven. In this part: preferences to express the user's interests; focusing on the best patterns: dominance relation, optimal pattern sets, subjective interest. 39/97

  44. Addressing pattern mining tasks with user preferences Idea: a preference expresses a user’s interest (no required threshold) Examples based on measures/dominance relation: “the higher the frequency, growth rate and aromaticity are, the better the patterns” “I prefer pattern X 1 to pattern X 2 if X 1 is not dominated by X 2 according to a set of measures” ➥ measures/preferences: a natural criterion for ranking patterns and presenting the “best” patterns 40/97

  45. Preference-based approaches in this tutorial. In this part: preferences are explicit (typically given by the user depending on his/her interest/subjectivity); in the last part: preferences are implicit. Quantitative/qualitative preferences. Quantitative measures: constraint-based data mining (frequency, size, …), background knowledge (price, weight, aromaticity, …), statistics (entropy, p-value, …). Qualitative: “I prefer pattern X1 to pattern X2” (pairwise comparison between patterns). With qualitative preferences, two patterns can be incomparable. 41/97

  46. Measures. Many works on: interestingness measures (Geng et al. ACM Computing Surveys06), utility functions (Yao and Hamilton DKE06), statistically significant rules (Hämäläinen and Nykänen ICDM08). Examples: area(X) = frequency(X) × size(X) (tiling: surface); lift(X1 → X2) = (|D| × frequency(X1 X2)) / (frequency(X1) × frequency(X2)); utility functions: utility of the mined patterns (e.g. weighted items, weighted transactions), an example: number of products × product profit. 42/97
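Both example measures are direct to compute; the seven-transaction dataset below is the running example used in the next slides:

```python
db = [set("BEF"), set("BCD"), set("AEF"), set("ABCDE"),
      set("BCDE"), set("BCDEF"), set("ABCDEF")]  # t1..t7

def freq(X):
    return sum(1 for t in db if set(X) <= t)

def area(X):
    return freq(X) * len(set(X))

def lift(X1, X2):
    """lift(X1 -> X2) = |D| * freq(X1 X2) / (freq(X1) * freq(X2))."""
    return len(db) * freq(set(X1) | set(X2)) / (freq(X1) * freq(X2))

print(area("BCDE"), lift("B", "E"))
```

For instance area(BCDE) = 4 × 4 = 16, and lift(B → E) = 7 × 5 / (6 × 6) ≈ 0.97, i.e. B and E co-occur slightly less than independence would predict.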

  47. Casting the pattern mining task as an optimization problem. The most interesting patterns according to measures/preferences: free/closed patterns (Boulicaut et al. DAMI03, Bastide et al. SIGKDD Explorations00) ➥ given an equivalence class, I prefer the shortest/longest patterns; one measure: top-k patterns (Fu et al. Ismis00, Jabbour et al. ECML/PKDD13); several measures: how to find a trade-off between several criteria? ➥ skyline patterns (Cho et al. IJDWM05, Soulet et al. ICDM'11, van Leeuwen and Ukkonen ECML/PKDD13), dominance programming (Negrevergne et al. ICDM13), optimal patterns (Ugarte et al. ICTAI15); subjective interest/interest according to a background knowledge (De Bie DAMI2011). 43/97

  48. top-k pattern mining: an example. Goal: finding the k patterns maximizing an interestingness measure.

  Tid  Items
  t1   B E F
  t2   B C D
  t3   A E F
  t4   A B C D E
  t5   B C D E
  t6   B C D E F
  t7   A B C D E F

  The 3 most frequent patterns: B, E, BE (a) ➥ easy due to the anti-monotone property of frequency. (a) Other patterns have a frequency of 5: C, D, BC, BD, CD, BCD. 44/97

  49. top-k pattern mining: an example. Goal: finding the k patterns maximizing an interestingness measure.

  Tid  Items
  t1   B E F
  t2   B C D
  t3   A E F
  t4   A B C D E
  t5   B C D E
  t6   B C D E F
  t7   A B C D E F

  The 3 most frequent patterns: B, E, BE (a) ➥ easy due to the anti-monotone property of frequency. The 3 patterns maximizing area: BCDE, BCD, CDE ➥ branch & bound (Zimmermann and De Raedt MLJ09). (a) Other patterns have a frequency of 5: C, D, BC, BD, CD, BCD. 44/97
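A naive (non branch-and-bound) check of the area top-3 on this dataset; note that third place is a tie among several area-12 patterns:

```python
from itertools import chain, combinations

db = [set("BEF"), set("BCD"), set("AEF"), set("ABCDE"),
      set("BCDE"), set("BCDEF"), set("ABCDEF")]  # t1..t7
items = sorted(set().union(*db))

def area(X):
    return len(X) * sum(1 for t in db if X <= t)

patterns = [frozenset(X) for X in chain.from_iterable(
    combinations(items, r) for r in range(1, len(items) + 1))]
top3 = sorted(patterns, key=area, reverse=True)[:3]
# top3[0] is BCDE (area 16), top3[1] is BCD (area 15); the third slot is
# one of the area-12 patterns (CDE, BCE, BDE)
```

A branch-and-bound miner reaches the same result without enumerating all 2^|I| − 1 candidates, by pruning with an upper bound on area.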

  50. top-k pattern mining: an example of pruning condition. Top-k patterns according to area, k = 3. [transaction table t1–t7 as on the previous slide] Cand: the current set of the k best candidate patterns. A: lowest value of area for the patterns in Cand. L: size of the longest transaction in D (here: L = 6). A pattern X must satisfy frequency(X) ≥ A/L to be inserted in Cand ➥ pruning condition according to the frequency (thus anti-monotone). When a candidate pattern is inserted in Cand, a more efficient pruning condition is deduced. Example with a depth-first search approach: initialization: Cand = {B, BE, BEC} (area(BEC) = 12, area(BE) = 10, area(B) = 6) ➥ frequency(X) ≥ 6/6; new candidate BECD: Cand = {BE, BEC, BECD} (area(BECD) = 16, area(BEC) = 12, area(BE) = 10) ➥ frequency(X) ≥ 10/6, which is more efficient than frequency(X) ≥ 6/6; new candidate BECDF … 45/97

  51. top-k pattern mining in a nutshell. Advantages: compact, threshold free, best patterns. Drawbacks: complete resolution is costly, sometimes heuristic search (beam search) (van Leeuwen and Knobbe DAMI12); diversity issue: top-k patterns are often very similar; several criteria must be aggregated ➥ skypatterns: a trade-off between several criteria. 46/97

  52. Skypatterns (Pareto dominance). Notion of skylines (database) in pattern mining (Cho et al. IJDWM05, Papadopoulos et al. DAMI08, Soulet et al. ICDM11, van Leeuwen and Ukkonen ECML/PKDD13).

  Tid  Items
  t1   B E F
  t2   B C D
  t3   A E F
  t4   A B C D E
  t5   B C D E
  t6   B C D E F
  t7   A B C D E F

  Patterns  freq  area
  AB        2     4
  AEF       2     6
  B         6     6
  BCDE      4     16
  CDEF      2     8
  E         6     6
  …         …     …

  |L_I| = 2^6, but only 4 skypatterns: Sky(L_I, {freq, area}) = {BCDE, BCD, B, E}. 47/97
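Pareto dominance and the four skypatterns can be verified naively (a quadratic scan over the 2^6 candidate patterns):

```python
from itertools import chain, combinations

db = [set("BEF"), set("BCD"), set("AEF"), set("ABCDE"),
      set("BCDE"), set("BCDEF"), set("ABCDEF")]  # t1..t7
items = sorted(set().union(*db))

def measures(X):
    f = sum(1 for t in db if X <= t)
    return f, f * len(X)            # (freq, area)

def dominates(x, y):
    """x Pareto-dominates y: at least as good on every measure,
    strictly better on at least one."""
    mx, my = measures(x), measures(y)
    return all(a >= b for a, b in zip(mx, my)) and mx != my

patterns = [frozenset(X) for X in chain.from_iterable(
    combinations(items, r) for r in range(1, len(items) + 1))]
sky = {x for x in patterns if not any(dominates(y, x) for y in patterns)}
```

Note that B and E both measure (6, 6): neither dominates the other, so both survive, which is why skylines can contain ties.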

  53. Skylines vs skypatterns.
  Mining task: skylines — a set of non-dominated transactions; skypatterns — a set of non-dominated patterns.
  Size of the search space: |D| (skylines) vs |L| (skypatterns).
  Skylines: a lot of works; skypatterns: very few works.
  D: set of transactions; L: set of patterns; usually |D| << |L|. 48/97

  54. Skypatterns: how to process? A naive enumeration of all candidate patterns (L_I) and then comparing them is not feasible… Two approaches: (1) take benefit from the pattern condensed representation according to the condensable measures of the given set of measures M: skylineability gives M′ (M′ ⊆ M) leading to a more concise pattern condensed representation; the pattern condensed representation w.r.t. M′ is a superset of the representative skypatterns w.r.t. M and is much smaller than L_I. (2) Use the dominance programming framework (together with skylineability). 49/97

  55. Dominance programming. Dominance: a pattern is optimal if it is not dominated by another. Skypatterns: dominance relation = Pareto dominance. (1) Principle: starting from an initial pattern s1; searching for a pattern s2 such that s1 is not preferred to s2; searching for a pattern s3 such that s1 and s2 are not preferred to s3; … until there is no pattern satisfying the whole set of constraints. (2) Solving: constraints are dynamically posted during the mining step. Principle: increasingly reduce the dominance area by processing pairwise comparisons between patterns. Methods using Dynamic CSP (Negrevergne et al. ICDM13, Ugarte et al. CPAIOR14, AIJ 2017). 50/97

  56. Dominance programming: example of the skypatterns. [transaction table t1–t7; patterns plotted in the (freq, area) plane] M = {freq, area}; q(X) ≡ closed_M′(X). Candidates = { } 51/97

  57. Dominance programming: example of the skypatterns. [transaction table t1–t7; patterns plotted in the (freq, area) plane] M = {freq, area}; q(X) ≡ closed_M′(X). Candidates = { BCDEF (s1) } 51/97

  58. Dominance programming: example of the skypatterns. [transaction table t1–t7; patterns plotted in the (freq, area) plane] M = {freq, area}; q(X) ≡ closed_M′(X) ∧ ¬(s1 ≻M X). Candidates = { BCDEF (s1) } 51/97

  59. Dominance programming: example of the skypatterns. [transaction table t1–t7; patterns plotted in the (freq, area) plane] M = {freq, area}; q(X) ≡ closed_M′(X) ∧ ¬(s1 ≻M X). Candidates = { BCDEF (s1), BEF (s2) } 51/97

  60. Dominance programming: example of the skypatterns. [transaction table t1–t7; patterns plotted in the (freq, area) plane] M = {freq, area}; q(X) ≡ closed_M′(X) ∧ ¬(s1 ≻M X) ∧ ¬(s2 ≻M X). Candidates = { BCDEF (s1), BEF (s2) } 51/97

  61. Dominance programming: example of the skypatterns. [transaction table t1–t7; patterns plotted in the (freq, area) plane] M = {freq, area}; q(X) ≡ closed_M′(X) ∧ ¬(s1 ≻M X) ∧ ¬(s2 ≻M X) ∧ ¬(s3 ≻M X) ∧ ¬(s4 ≻M X) ∧ ¬(s5 ≻M X) ∧ ¬(s6 ≻M X) ∧ ¬(s7 ≻M X). Candidates = { BCDEF (s1), BEF (s2), EF (s3), BCDE (s4), BCD (s5), B (s6), E (s7) }; the last four form Sky(L_I, M) = {BCDE, BCD, B, E}. |L_I| = 2^6 = 64 patterns, 4 skypatterns. 51/97
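The loop of this example can be sketched generically: keep asking for a pattern that none of the candidates found so far dominates, then filter. This is an illustrative sketch; in the actual framework a CP solver answers the "next pattern" query, so the candidate order differs from a plain enumeration:

```python
from itertools import chain, combinations

db = [set("BEF"), set("BCD"), set("AEF"), set("ABCDE"),
      set("BCDE"), set("BCDEF"), set("ABCDEF")]  # t1..t7
items = sorted(set().union(*db))

def measures(X):
    f = sum(1 for t in db if X <= t)
    return f, f * len(X)            # (freq, area)

def dominates(x, y):
    mx, my = measures(x), measures(y)
    return all(a >= b for a, b in zip(mx, my)) and mx != my

patterns = [frozenset(X) for X in chain.from_iterable(
    combinations(items, r) for r in range(1, len(items) + 1))]

candidates, remaining = [], list(patterns)
while True:
    # next pattern not dominated by any candidate found so far
    s = next((x for x in remaining
              if not any(dominates(c, x) for c in candidates)), None)
    if s is None:
        break
    candidates.append(s)
    remaining.remove(s)
# post-processing: the skypatterns are the non-dominated candidates
sky = {x for x in candidates if not any(dominates(c, x) for c in candidates)}
```

Because Pareto dominance is transitive, filtering the candidates at the end is enough to recover exactly the skyline.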

  62. Dominance programming: to sum up. The dominance programming framework encompasses many kinds of patterns:
  maximal patterns — dominance relation: inclusion
  closed patterns — inclusion at same frequency
  top-k patterns — order induced by the interestingness measure
  skypatterns — Pareto dominance
  maximal patterns ⊆ closed patterns; top-k patterns ⊆ skypatterns. 52/97

  63. A step further: a preference is defined by any property between two patterns (i.e., pairwise comparison), not only by the Pareto dominance relation: measures on a set of patterns, overlapping between patterns, coverage, … ➥ preference-based optimal patterns. In the following: (1) define preference-based optimal patterns, (2) show how many local-pattern tasks fall into this framework, (3) deal with optimal pattern sets. 53/97

  64. Preference-based optimal patterns. A preference ▷ is a strict partial order relation on a set of patterns S; x ▷ y indicates that x is preferred to y (Ugarte et al. ICTAI15): a pattern x is optimal (OP) according to ▷ iff ∄ y1, …, yp ∈ S, ∀1 ≤ j ≤ p, yj ▷ x (a single y is enough for many data mining tasks). Characterisation of a set of OPs: a set of patterns { x ∈ S | fundamental(x) ∧ ∄ y1, …, yp ∈ S, ∀1 ≤ j ≤ p, yj ▷ x }. fundamental(x): x must satisfy a property defined by the user, for example: having a minimal frequency, being closed, … 54/97

  65. Local patterns: examples. [transaction table t1–t7]
  Large tiles: c(x) ≡ freq(x) × size(x) ≥ ψ_area. Example: freq(BCD) × size(BCD) = 5 × 3 = 15.
  Frequent sub-groups: c(x) ≡ freq(x) ≥ ψ_freq ∧ ∄y ∈ S : T1(y) ⊇ T1(x) ∧ T2(y) ⊆ T2(x) ∧ (T(y) = T(x) ⇒ y ⊂ x).
  Skypatterns: c(x) ≡ closed_M(x) ∧ ∄y ∈ S : y ≻M x.
  Frequent top-k patterns according to m: c(x) ≡ freq(x) ≥ ψ_freq ∧ ∄ y1, …, yk ∈ S : ⋀_{1≤j≤k} m(yj) > m(x).
  S = L_I (Mannila et al. DAMI97). 55/97

  66. Local (optimal) patterns: examples. [transaction table t1–t7]
  Large tiles: c(x) ≡ freq(x) × size(x) ≥ ψ_area.
  Frequent sub-groups: c(x) ≡ freq(x) ≥ ψ_freq ∧ ∄y ∈ S : T1(y) ⊇ T1(x) ∧ T2(y) ⊆ T2(x) ∧ (T(y) = T(x) ⇒ y ⊂ x).
  Skypatterns: c(x) ≡ closed_M(x) ∧ ∄y ∈ S : y ≻M x.
  Frequent top-k patterns according to m: c(x) ≡ freq(x) ≥ ψ_freq ∧ ∄ y1, …, yk ∈ S : ⋀_{1≤j≤k} m(yj) > m(x).
  S = L_I (Mannila et al. DAMI97). 56/97

  67. Pattern sets: sets of patterns. Pattern sets (De Raedt and Zimmermann SDM07): sets of patterns satisfying a global viewpoint (instead of evaluating and selecting patterns based on their individual merits). Search space (S): local patterns versus pattern sets. Example: I = {A, B}; all local patterns: S = L_I = {∅, A, B, AB}; all pattern sets: S = 2^{L_I} = {∅, {A}, {B}, {AB}, {A, B}, {A, AB}, {B, AB}, {A, B, AB}}. Many data mining tasks: classification (Liu et al. KDD98), clustering (Ester et al. KDD96), database tiling (Geerts et al. DS04), pattern summarization (Xin et al. KDD06), pattern teams (Knobbe and Ho PKDD06), … Many inputs (“preferences”) can be given by the user: coverage, overlapping between patterns, syntactical properties, measures, number of local patterns, … 57/97

  68. Coming back on OP (Ugarte et al. ICTAI15). Pattern sets of length k: examples.
  Conceptual clustering (without overlapping): clus(x) ≡ ⋀_{i∈[1..k]} closed(x_i) ∧ ⋃_{i∈[1..k]} T(x_i) = T ∧ ⋀_{i,j∈[1..k], i≠j} T(x_i) ∩ T(x_j) = ∅.
  Conceptual clustering with optimisation: c(x) ≡ clus(x) ∧ ∄y ∈ 2^{L_I} : min_{j∈[1..k]} {freq(y_j)} > min_{i∈[1..k]} {freq(x_i)}.
  Pattern teams: c(x) ≡ size(x) = k ∧ ∄y ∈ 2^{L_I} : Φ(y) > Φ(x), with S ⊂ 2^{L_I} (sets of length k). 58/97

  69. Coming back on OP (Ugarte et al. ICTAI15). (Optimal) pattern sets of length k: examples.
  Conceptual clustering (without overlapping): clus(x) ≡ ⋀_{i∈[1..k]} closed(x_i) ∧ ⋃_{i∈[1..k]} T(x_i) = T ∧ ⋀_{i,j∈[1..k], i≠j} T(x_i) ∩ T(x_j) = ∅.
  Conceptual clustering with optimisation: c(x) ≡ clus(x) ∧ ∄y ∈ 2^{L_I} : min_{j∈[1..k]} {freq(y_j)} > min_{i∈[1..k]} {freq(x_i)}.
  Pattern teams: c(x) ≡ size(x) = k ∧ ∄y ∈ 2^{L_I} : Φ(y) > Φ(x), with S ⊂ 2^{L_I} (sets of length k). 59/97

  70. Relax the dogma “must be optimal”: soft patterns. Stringent aspect of the classical constraint-based pattern mining framework: what about a pattern that slightly violates a query? Example: introducing softness in skypattern mining ➥ soft-skypatterns: put the user in the loop to determine the best patterns w.r.t. his/her preferences. Introducing softness is easy with Constraint Programming ➥ same process: it is enough to update the posted constraints. 60/97

  71. Many other works in this broad field. Example: heuristic approaches; pattern sets based on the Minimum Description Length principle: a small set of patterns that compress - Krimp (Siebes et al. SDM06). L(D, CT): the total compressed size of the encoded database and the code table: L(D, CT) = L(D | CT) + L(CT | D). Many usages: characterizing the differences and the norm between given components in the data - DiffNorm (Budhathoki and Vreeken ECML/PKDD15); causal discovery (Budhathoki and Vreeken ICDM16); missing values (Vreeken and Siebes ICDM08); handling sequences (Bertens et al. KDD16); … and many other works on data compression/summarization (e.g. Kiernan and Terzi KDD08), … Nice results based on the frequency. How to handle other measures? 61/97

  72. Pattern mining as an optimization problem: concluding remarks. In the approaches indicated in this part: measures/preferences are explicit and must be given by the user… (but there is no threshold :-); diversity issue: top-k patterns are often very similar; complete approaches (optimal w.r.t. the preferences) ➥ stop completeness? “Please, please stop making new algorithms for mining all patterns” Toon Calders (ECML/PKDD 2012, most influential paper award). A further step: interactive pattern mining (including the instant data mining challenge), implicit preferences and learning preferences. 62/97

  73. Interactive pattern mining 63/97

  74. Interactive pattern mining Idea: “I don’t know what I am looking for, but I would definitely know if I see it.” ➠ preference acquisition In this part: Easier: no user-specified parameters (constraint, threshold or measure)! Better: learn user preferences from user feedback Faster: instant pattern discovery 64/97
