
On detecting differences between groups
Yi Yang, Department of Computing Science, University of Alberta

Contrast-Set Mining
● Understanding the differences between contrasting groups is a fundamental task in data analysis.
● "Contrast-set mining": S. D. Bay and M. J. Pazzani, Detecting change in categorical data: Mining contrast sets, 1999.
● A new technique in data mining? If yes, is it somehow related to previous data mining techniques such as association rule mining, classification, etc.?

On detecting differences between groups
Geoffrey I. Webb, Shane M. Butler, Douglas Newlands, ACM SIGKDD 2003
● A study is undertaken to compare contrast-set mining with existing rule-discovery techniques.
● Collaboration with a retail store.
● Surprise...?

Outline
● Introduction
● The three techniques
    ● STUCCO
    ● Magnum Opus
    ● C4.5rules
● Comparison
● Rule Quality Assessment
● Conclusion

Introduction
● Based on a project to evaluate how contrast-set mining differs from pre-existing forms of rule-discovery in an applied context:
    ● One of Australia's largest discount department store companies
    ● Retail activities of two different days
    ● 6 stores; several departments
    ● Task: to highlight how the "baskets" of departments differed between the 2 days

Three Techniques
● STUCCO
    ● Search and Testing for Understandable Consistent Contrasts
    ● Specialized for mining contrast sets
    ● Proposed by Bay and Pazzani
● Magnum Opus
    ● A commercial implementation of the OPUS_AR rule-discovery algorithm
    ● Rules: antecedent --> consequent
● C4.5rules
    ● Classification-rule discovery
    ● Treats groups as classes

STUCCO
● Find contrasts that are "significant" and "large".
● Significant: $\exists\, i, j:\ P(cset \mid G_i) \neq P(cset \mid G_j)$
● Large: $\max_{i,j} \lvert support(cset, G_i) - support(cset, G_j) \rvert \geq \delta$, where $\delta$ is a user-defined threshold called the minimum support-difference.
● Rule filter: chi-square test
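The two STUCCO conditions can be illustrated for the two-group case with a minimal sketch. It assumes transactions are represented as Python sets of department identifiers, and the function names (`support`, `chi_square_2x2`, `is_contrast_set`) are hypothetical, chosen for illustration. It checks a single candidate contrast set only and does not reproduce STUCCO's search procedure.

```python
import math

def support(cset, transactions):
    """Fraction of transactions (sets of items) that contain every item in cset."""
    if not transactions:
        return 0.0
    hits = sum(1 for t in transactions if cset <= t)
    return hits / len(transactions)

def chi_square_2x2(a, b, c, d):
    """Pearson chi-square statistic and p-value (1 degree of freedom)
    for the contingency table [[a, b], [c, d]]."""
    n = a + b + c + d
    row1, row2, col1, col2 = a + b, c + d, a + c, b + d
    if 0 in (row1, row2, col1, col2):
        return 0.0, 1.0  # degenerate table: no evidence of a difference
    stat = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # Survival function of the chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value

def is_contrast_set(cset, group1, group2, alpha=0.05, delta=0.01):
    """True if cset is both 'significant' and 'large' between two groups."""
    n1 = sum(1 for t in group1 if cset <= t)
    n2 = sum(1 for t in group2 if cset <= t)
    # Rows are groups; columns are "contains cset" / "does not contain cset".
    _, p_value = chi_square_2x2(n1, len(group1) - n1, n2, len(group2) - n2)
    significant = p_value < alpha
    large = abs(support(cset, group1) - support(cset, group2)) >= delta
    return significant and large
```

The defaults `alpha=0.05` and `delta=0.01` mirror the significance level and minimum support-difference reported in the Application slide.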

Magnum Opus
● OPUS algorithm (Optimized Pruning for Unordered Search):
    ● search tree;
    ● identifies excluded operators;
    ● prunes descendant trees;
    ● ...
● Magnum Opus
    ● performs association-rule-like search
    ● does NOT find frequent itemsets
    ● no requirement for minimum support, but requires a rule value measure and a maximum number of rules

Magnum Opus (cont.)
● Rule: antecedent --> consequent
    ● antecedent = cond1 ∧ cond2 ∧ ...
● Measures of rule value:
    ● Support
    ● Confidence (called strength)
    ● Lift
    ● Coverage: support of the antecedent
    ● Leverage (default measure): the degree to which the observed joint frequency of the antecedent and consequent differs from the joint frequency expected if they were independent
      $leverage(a \rightarrow c) = support(a \cup c) - support(a) \times support(c)$
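The rule-value measures listed above can be computed directly from transaction counts. The following is a minimal sketch using the standard definitions of these measures, not Magnum Opus itself; the function name `rule_measures` and the set-based transaction representation are assumptions made for illustration.

```python
def rule_measures(antecedent, consequent, transactions):
    """Compute common rule-value measures for the rule antecedent --> consequent.

    antecedent, consequent: sets of items; transactions: list of sets of items.
    """
    n = len(transactions)
    if n == 0:
        raise ValueError("at least one transaction is required")
    both = antecedent | consequent
    sup_a = sum(1 for t in transactions if antecedent <= t) / n
    sup_c = sum(1 for t in transactions if consequent <= t) / n
    sup_ac = sum(1 for t in transactions if both <= t) / n
    return {
        "coverage": sup_a,                        # support of the antecedent
        "support": sup_ac,                        # support of antecedent and consequent together
        "confidence": sup_ac / sup_a if sup_a else 0.0,
        "lift": sup_ac / (sup_a * sup_c) if sup_a and sup_c else 0.0,
        "leverage": sup_ac - sup_a * sup_c,       # observed minus expected-under-independence
    }
```

A positive leverage means the antecedent and consequent co-occur more often than independence would predict, which corresponds to a lift greater than 1.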

C4.5rules
● Discovers classification rules:
    1. discovers a decision tree
    2. converts the tree to a set of rules
    3. simplifies those rules
● Different from contrast-set/association-rule discovery:
    ● CS/AR find all rules that satisfy some constraint
    ● CR finds rules that are sufficient to predict classes
● Adaptation to contrast-set mining:
    ● Groups are encoded as a class variable
    ● Learn rules to distinguish the groups
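The adaptation step, encoding groups as a class variable, can be sketched as a small data-preparation routine. This is only an illustration of the encoding under assumed names (`encode_as_classification`, baskets as sets of department identifiers); it does not implement C4.5rules, and the resulting rows would still need to be passed to a classification-rule learner.

```python
def encode_as_classification(groups, departments):
    """Turn grouped baskets into (features, class_label) rows.

    groups: dict mapping a group label (e.g. a day) to a list of baskets,
            where each basket is a set of department identifiers.
    departments: ordered list of all department identifiers (the feature columns).
    """
    rows = []
    for label, baskets in groups.items():
        for basket in baskets:
            # One binary attribute per department: 1 if purchased from, else 0.
            features = [1 if dept in basket else 0 for dept in departments]
            rows.append((features, label))
    return rows
```

With this encoding, a learned rule may test for `dept = 0` as readily as `dept = 1`, which is one way conditions on the absence of a department can arise in classification rules.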

Application
● Data
    ● 2 days of transactions
    ● 6 stores, aggregated to the department level
    ● To contrast the purchasing behavior of customers on the two days
● Configuration and parameters
    ● STUCCO
        ✔ Significance level = 0.05
        ✔ Minimum support-difference = 0.01
    ● C4.5rules
        ✔ Default settings
    ● Magnum Opus
        ✔ Rule value: leverage
        ✔ Maximum number of rules: 1000

Comparison

                              STUCCO   Magnum Opus   C4.5rules
Total # of rules                  19            83          24
# of single-value rules           19            56           5
# of two-value rules               0            23           2
# of three-value rules             0             4           3
# of multi(>3)-value rules         0             0          14

● Rules discovered by STUCCO are all single-value rules;
● Magnum Opus discovered all rules found by STUCCO;
● C4.5rules discovered rules with up to 51 conditions (51-value rules).

Example of rules: STUCCO
[Figure: example STUCCO output for dept 220, annotated with: the contrast set; the proportion of transactions on each day that contained dept 220; the number of transactions; the chi-square test of significance.]

Example of rules: Magnum Opus
● Rules 1-2: the proportion of customers buying from each of dept. 851 and 855 was higher on the 2nd day than on the 1st.
● Rule 3: this effect was heightened when customers that bought from both departments in a single transaction were considered.
● Rules 4-6: whereas items from dept. 220 and 355 were each purchased more frequently on day 1 than day 2, a greater proportion of customers bought items from both departments on day 2 than on day 1.

Example of rules: C4.5rules
● The value in brackets is the confidence of the rule.
● Most rules contain many "negative" conditions where dept=0.
● Are negative conditions useful? This will be assessed by domain experts.

Relationship between STUCCO and Magnum Opus
● STUCCO: $\exists\, i, j:\ P(cset \mid G_i) \neq P(cset \mid G_j)$
● Magnum Opus rule filter: for a rule $a \rightarrow c$, $P(c \mid a) > P(c)$
● If the antecedents are treated as contrast sets and the consequents as groups: $\exists\, i:\ P(G_i \mid cset) > P(G_i)$
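One way to see why the two conditions coincide is the short derivation below. It is a sketch under assumptions not stated on the slide: every group has $P(G_j) > 0$, $P(cset) > 0$, and the Magnum Opus filter is read as the strict inequality $P(c \mid a) > P(c)$.

```latex
\begin{align*}
P(G_i \mid cset) > P(G_i)
  &\iff \frac{P(cset \mid G_i)\, P(G_i)}{P(cset)} > P(G_i)
  && \text{(Bayes' rule)} \\
  &\iff P(cset \mid G_i) > P(cset)
   = \sum_j P(G_j)\, P(cset \mid G_j)
  && \text{(law of total probability)}
\end{align*}
```

Because $P(cset)$ is a weighted average of the values $P(cset \mid G_j)$, some group exceeds that average exactly when the values are not all equal, that is, when $\exists\, i, j:\ P(cset \mid G_i) \neq P(cset \mid G_j)$.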

Relationship between STUCCO and Magnum Opus
● This led to the realization that contrast-set mining is a special case of the more general rule-discovery task.
