Parameter-free Mining of Non-redundant Discriminative Itemsets

An Exhaustive Covering Approach to Parameter-free Mining of Non-redundant Discriminative Itemsets
Yoshitaka Kameya, Meijo University
DaWaK-16

Outline
• Background
• Our proposal
• Experiments

Background: Discriminative Patterns (1)
• Discriminative patterns:
  – Show differences between two groups (classes)
  – Used for:
    • Characterizing the positive class
    • Building more precise classifiers
[Figure: a discriminative pattern x (milk=True, aquatic=False) shown against transactions with class labels; +: positive class, –: negative class]

Background: Discriminative Patterns (2)
• Discriminative patterns tend to be more meaningful than frequent patterns (thanks to class labels)
• Are class labels always available?
  – Comparing groups is a standard starting point in data analysis
  – Clustering can find groups (classes) → Cluster labeling
[Figure: Original data → 1. Clustering → Clusters → 2. Discriminative pattern mining → Clusters labeled with discriminative patterns]

Background: Discriminative Patterns (3)
• Quality score: Measures the overlap between pattern x and positive class c
[Figure: two diagrams of x and c, one labeled "Quality is high" and one labeled "Quality is low"]
• Most popular quality scores are not anti-monotonic:
  – Confidence, Lift
  – Support difference, Weighted relative accuracy, Leverage
  – F-score, Dice, Jaccard
  – ...
  → Branch & bound pruning is often used [Morishita+ 00][Zimmermann+ 09][Nijssen+ 09]
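
To make the "not anti-monotonic" point concrete, here is a minimal Python sketch. It borrows the ten-transaction toy dataset from the redundancy example later in the deck. The function names are illustrative, and the upper bound shown is a generic optimistic estimate for the F-score (keep all covered positives, drop all covered negatives), not necessarily the exact bound used in the cited papers.

```python
# Toy dataset from the redundancy example (5 positive, 5 negative transactions).
POS = [set(t) for t in ("ABDE", "ABCDE", "ACDE", "ABC", "B")]
NEG = [set(t) for t in ("ABDE", "BCDE", "CDE", "ADE", "AD")]

def counts(pattern):
    """Return (true positives, false positives) for an itemset."""
    tp = sum(pattern <= t for t in POS)
    fp = sum(pattern <= t for t in NEG)
    return tp, fp

def f_score(pattern):
    tp, fp = counts(pattern)
    # F = 2TP / (2TP + FP + FN), with FN = |POS| - TP
    return 2 * tp / (tp + fp + len(POS))

def f_upper_bound(pattern):
    """Optimistic F-score of any super-pattern: same TP, zero FP (assumed bound)."""
    tp, _ = counts(pattern)
    return 2 * tp / (tp + len(POS))

a, ac = set("A"), set("AC")
# Adding item C to {A} raises the score, so the F-score is not anti-monotonic.
print(round(f_score(a), 2), round(f_score(ac), 2))
# The bound for {A} still dominates the score of its super-pattern {A, C}.
print(round(f_upper_bound(a), 2))
```

In a branch & bound search, a branch rooted at a pattern can be skipped when this kind of bound falls below the best score found so far.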

Background: Coping with redundancy (1)
• Example: Item A is relevant to the positive class
  → Patterns containing A tend to be top-ranked in the candidate list (most of them are redundant)

Dataset
TID  Class  Transaction
1    +      {A, B, D, E}
2    +      {A, B, C, D, E}
3    +      {A, C, D, E}
4    +      {A, B, C}
5    +      {B}
6    –      {A, B, D, E}
7    –      {B, C, D, E}
8    –      {C, D, E}
9    –      {A, D, E}
10   –      {A, D}
(TIDs 1-5: positive transactions; TIDs 6-10: negative transactions)

Top-15 patterns (+1 due to tie score)
Rank  Pattern        F-score  Covered TIDs (positive)
1     {A, C}         0.75     2, 3, 4
2     {B}            0.73     1, 2, 4, 5
3     {A}            0.67     1, 2, 3, 4
3     {A, B}         0.67     1, 2, 4
5     {A, D, E}      0.60     1, 2, 3
5     {A, E}         0.60     1, 2, 3
5     {C}            0.60     2, 3, 4
8     {A, B, C}      0.57     2, 4
8     {A, C, D}      0.57     2, 3
8     {A, C, D, E}   0.57     2, 3
8     {A, C, E}      0.57     2, 3
12    {A, D}         0.55     1, 2, 3
13    {A, B, D}      0.50     1, 2
13    {A, B, D, E}   0.50     1, 2
13    {A, B, E}      0.50     1, 2
13    {B, C}         0.50     2, 4
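
The F-scores and covered TIDs in the pattern list can be recomputed from the dataset. The sketch below is a minimal Python illustration; the helper names are hypothetical, and it only scores patterns that are handed to it rather than reproducing the mining procedure that generated the candidate list.

```python
# Toy dataset transcribed from the slide (TIDs 1-5 positive, 6-10 negative).
POS = [set(t) for t in ("ABDE", "ABCDE", "ACDE", "ABC", "B")]
NEG = [set(t) for t in ("ABDE", "BCDE", "CDE", "ADE", "AD")]

def f_score(pattern):
    """F-score of `pattern` as a predictor of the positive class."""
    tp = sum(pattern <= t for t in POS)  # positive transactions covered
    fp = sum(pattern <= t for t in NEG)  # negative transactions covered
    fn = len(POS) - tp                   # positive transactions missed
    return 2 * tp / (2 * tp + fp + fn)

def covered_positive_tids(pattern):
    """TIDs (1-based) of the positive transactions containing `pattern`."""
    return [tid for tid, t in enumerate(POS, start=1) if pattern <= t]

# Recompute the first four rows of the pattern list.
for name in ("AC", "B", "A", "AB"):
    p = set(name)
    print(sorted(p), round(f_score(p), 2), covered_positive_tids(p))
```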

Background: Coping with redundancy (2)
• Set-inclusion-based constraints
  – Closedness [Pasquier+ 99]
  – Productivity [Bayardo 00][Webb 07]
• Closedness: For patterns covering the same (positive) transactions, pick the largest one
  → 16 patterns → 8 patterns
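
The closedness filter can be sketched in Python as follows. This is an illustrative reading of the slide, assuming closedness is judged on the positive transactions only: a pattern is kept when it equals the intersection of the positive transactions it covers, which makes it the largest pattern with that cover. The candidate list is the 16 top-ranked patterns from the example; the names `closure` and `closed` are hypothetical.

```python
# Positive transactions (TIDs 1-5) of the toy dataset.
POS = [frozenset(t) for t in ("ABDE", "ABCDE", "ACDE", "ABC", "B")]

# The 16 top-ranked candidate patterns from the example, in rank order.
CANDIDATES = [frozenset(p) for p in (
    "AC", "B", "A", "AB", "ADE", "AE", "C", "ABC", "ACD", "ACDE",
    "ACE", "AD", "ABD", "ABDE", "ABE", "BC")]

def closure(pattern):
    """Largest itemset shared by all positive transactions covering `pattern`."""
    cover = [t for t in POS if pattern <= t]
    return frozenset.intersection(*cover) if cover else pattern

# Keep only patterns that are their own closure.
closed = [p for p in CANDIDATES if closure(p) == p]
print(len(closed), ["".join(sorted(p)) for p in closed])
```

On this dataset the filter keeps 8 of the 16 candidates, in line with the count on the slide.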

Background: Coping with redundancy (2)
• Set-inclusion-based constraints
  – Productivity [Bayardo 00][Webb 07]: If a super-pattern has no higher quality than one of its sub-patterns, remove it
  → 16 patterns → 4 patterns
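
A minimal Python sketch of the productivity filter follows. It assumes a pattern is productive when its F-score is strictly higher than that of every non-empty proper sub-pattern; leaving the empty pattern out of the comparison is an assumption made here because it reproduces the count on the slide. Exact fractions are used so that tied scores compare reliably.

```python
from fractions import Fraction
from itertools import combinations

POS = [frozenset(t) for t in ("ABDE", "ABCDE", "ACDE", "ABC", "B")]
NEG = [frozenset(t) for t in ("ABDE", "BCDE", "CDE", "ADE", "AD")]
CANDIDATES = [frozenset(p) for p in (
    "AC", "B", "A", "AB", "ADE", "AE", "C", "ABC", "ACD", "ACDE",
    "ACE", "AD", "ABD", "ABDE", "ABE", "BC")]

def f_score(pattern):
    tp = sum(pattern <= t for t in POS)
    fp = sum(pattern <= t for t in NEG)
    # 2TP / (2TP + FP + FN) with FN = |POS| - TP, kept exact.
    return Fraction(2 * tp, tp + fp + len(POS))

def proper_subpatterns(pattern):
    """All non-empty proper subsets of `pattern`."""
    items = sorted(pattern)
    for k in range(1, len(items)):
        for sub in combinations(items, k):
            yield frozenset(sub)

def is_productive(pattern):
    return all(f_score(pattern) > f_score(s) for s in proper_subpatterns(pattern))

productive = [p for p in CANDIDATES if is_productive(p)]
print(["".join(sorted(p)) for p in productive])
```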

Background: Coping with redundancy (2)
• Set-inclusion-based constraints
  – Productivity + Closedness [Kameya+ 13]
  → 16 patterns → 3 patterns
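
Applying both constraints together can be sketched as a single pass over the candidates. This Python example is an assumption-laden illustration, not the algorithm of [Kameya+ 13]: closedness is checked on positive transactions, productivity ignores the empty pattern, and all names are hypothetical.

```python
from fractions import Fraction
from itertools import combinations

POS = [frozenset(t) for t in ("ABDE", "ABCDE", "ACDE", "ABC", "B")]
NEG = [frozenset(t) for t in ("ABDE", "BCDE", "CDE", "ADE", "AD")]
CANDIDATES = [frozenset(p) for p in (
    "AC", "B", "A", "AB", "ADE", "AE", "C", "ABC", "ACD", "ACDE",
    "ACE", "AD", "ABD", "ABDE", "ABE", "BC")]

def f_score(p):
    tp = sum(p <= t for t in POS)
    fp = sum(p <= t for t in NEG)
    return Fraction(2 * tp, tp + fp + len(POS))

def is_closed(p):
    # Closed on positives: p equals the intersection of its covering positives.
    cover = [t for t in POS if p <= t]
    return bool(cover) and frozenset.intersection(*cover) == p

def is_productive(p):
    # Strictly better than every non-empty proper sub-pattern.
    items = sorted(p)
    subs = (frozenset(s) for k in range(1, len(items))
            for s in combinations(items, k))
    return all(f_score(p) > f_score(s) for s in subs)

kept = [p for p in CANDIDATES if is_closed(p) and is_productive(p)]
print(["".join(sorted(p)) for p in kept])
```

Here {C} passes the productivity test but is not closed, since it covers the same positive transactions as {A, C}, which leaves three patterns.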

Background: Coping with redundancy (3)
• The best-covering constraint
  – In the same spirit as the HCC (highest confidence covering) constraint in HARMONY [Wang+ 05]
  – Best-covering: Every pattern must be the best for at least one positive transaction
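
One way to read the best-covering constraint is sketched below in Python. For each positive transaction, the highest-scoring candidate that covers it is marked, and only marked patterns are kept. The sketch works over the 16 candidates from the example, keeps all patterns tied for best, and uses hypothetical names; the resulting pattern set is computed here and is not stated on the slide.

```python
from fractions import Fraction

POS = [frozenset(t) for t in ("ABDE", "ABCDE", "ACDE", "ABC", "B")]
NEG = [frozenset(t) for t in ("ABDE", "BCDE", "CDE", "ADE", "AD")]
CANDIDATES = [frozenset(p) for p in (
    "AC", "B", "A", "AB", "ADE", "AE", "C", "ABC", "ACD", "ACDE",
    "ACE", "AD", "ABD", "ABDE", "ABE", "BC")]

def f_score(p):
    tp = sum(p <= t for t in POS)
    fp = sum(p <= t for t in NEG)
    return Fraction(2 * tp, tp + fp + len(POS))

def best_covering(candidates):
    """Patterns that are the top-scoring cover of at least one positive transaction."""
    kept = set()
    for t in POS:
        covering = [p for p in candidates if p <= t]
        if not covering:
            continue  # no candidate covers this transaction
        top = max(f_score(p) for p in covering)
        kept.update(p for p in covering if f_score(p) == top)
    return kept

best = best_covering(CANDIDATES)
print(sorted("".join(sorted(p)) for p in best))
```

Under these assumptions, {A, C} is the best cover for TIDs 2, 3, and 4, and {B} is the best cover for TIDs 1 and 5, so every other candidate is dropped.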
