Course Content Week 2 (March 17) and Week 3 (March 24) 33459-01

33459-01 Principles of Knowledge Discovery in Data
Lecture 2: Association Rule Mining
Lecture by: Dr. Osmar R. Zaïane
33459-01: Principles of Knowledge Discovery in Data – March-June, 2006 (Dr. O. Zaiane)

Course Content: Week 2 (March 17) and Week 3 (March 24)
• Introduction to Data Mining
• Association Analysis
• Sequential Pattern Analysis
• Classification and Prediction
• Contrast Sets
• Data Clustering
• Outlier Detection
• Web Mining

What Is Association Rule Mining?
• Association rule mining searches for relationships between items in a dataset:
  – it aims at discovering associations between items in a transactional database;
  – it finds combinations of items that typically occur together.
• Rule form: "Body → Head [support, confidence]"
  – buys(x, "bread") → buys(x, "milk") [0.6%, 65%]
  – major(x, "CS") ∧ takes(x, "DB") → grade(x, "A") [1%, 75%]

Transactional Databases
Transaction | Frequent itemset | Rule
{bread, milk, beer, …} | (Bread, milk) | Bread → milk
{term1, term2, …, termn} (text documents) | (term2, term25) | term2 → term25
{f1, f2, …, Ca} | (f3, f5, fα) | f3 ∧ f5 → fα
Other transaction examples: Store {a, b, c, d, …}; Automatic diagnostic {x, y, z}.

Lecture Outline
Part I: Concepts (30 minutes)
• Basic concepts
• Support and Confidence
• Naïve approach
Part II: The Apriori Algorithm (30 minutes)
• Principles
• Algorithm
• Running Example
Part III: The FP-Growth Algorithm (30 minutes)
• FP-tree structure
• Running Example
Part IV: More Advanced Concepts (30 minutes)
• Database layout and space search approach
• Other types of patterns and constraints

Association Rule Mining
• Mining association rules (Agrawal et al., SIGMOD'93)
• Fast algorithm (Agrawal et al., VLDB'94)
• Partitioning (Navathe et al., VLDB'95)
• Hash-based (Park et al., SIGMOD'95)
• Multilevel A.R. (Han et al., VLDB'95)
• Generalized A.R. (Srikant et al., VLDB'95)
• Quantitative A.R. (Srikant et al., SIGMOD'96)
• Incremental mining (Cheung et al., ICDE'96)
• Parallel mining (Agrawal et al., TKDE'96)
• Distributed mining (Cheung et al., PDIS'96)
• Meta-rule-guided mining (Kamber et al., KDD'97)
• Dynamic Itemset Counting (Brin et al., SIGMOD'97)
• N-dimensional A.R. (Lu et al., DMKD'98)
• Constraint A.R. (Ng et al., SIGMOD'98)
• A.R. with recurrent items (Zaïane et al., ICDE'00)
• FP without candidate generation (Han et al., SIGMOD'00)
• DualMiner (Bucilă et al., KDD'02)
• COFI algorithm (El-Hajj et al., DaWaK'03)
• And many, many others: spatial AR; sequence associations; AR for multimedia; AR in time series; AR with progressive refinement; etc.
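The rule form "Body → Head [support, confidence]" can be captured in a small record type. The sketch below is a minimal Python illustration, assuming items are plain strings and that support and confidence are stored as fractions in [0, 1]; the `Rule` class and its string format are hypothetical, not taken from any library. It also rejects rules whose body and head share an item, since the body and head of an association rule are disjoint sets of items.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    """Hypothetical record for a rule of the form Body -> Head [support, confidence]."""
    body: frozenset     # antecedent itemset
    head: frozenset     # consequent itemset
    support: float      # assumed: fraction of transactions containing body and head
    confidence: float   # assumed: fraction of body-containing transactions that also contain head

    def __post_init__(self):
        # Body and head of an association rule must not share any item.
        if self.body & self.head:
            raise ValueError("body and head must be disjoint")

    def __str__(self):
        # Render in the slide's "Body -> Head [support, confidence]" style.
        body = ", ".join(sorted(self.body))
        head = ", ".join(sorted(self.head))
        return f"{body} -> {head} [{self.support:.1%}, {self.confidence:.0%}]"


rule = Rule(frozenset({"bread"}), frozenset({"milk"}), support=0.006, confidence=0.65)
print(rule)  # bread -> milk [0.6%, 65%]
```

Storing the two measures as fractions and formatting them only for display keeps later arithmetic (for example, comparing against thresholds) free of percentage conversions.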

Finding Rules in Transaction Data Set
• 6 transactions
• 5 items: {Beer, Bread, Jelly, Milk, PeanutButter}

Transaction | Items
T1 | Bread, Jelly, PeanutButter
T2 | Bread, PeanutButter
T3 | Bread, Milk, PeanutButter
T4 | Beer, Bread
T5 | Beer, Milk
T6 | Bread, Milk

• Searching for rules of the form X → Y, where X and Y are sets of items
  – e.g. Bread → Jelly; Bread, Jelly → PeanutButter
• Design an efficient algorithm for mining association rules in large data sets
• Develop an effective approach for distinguishing interesting rules from irrelevant ones

Basic Concepts
• A transaction is a set of items: $T = \{i_a, i_b, \dots, i_t\}$
• $T \subseteq I$, where $I = \{i_1, i_2, \dots, i_d\}$ is the set of all possible items
• D, the task-relevant data, is a set of transactions $D = \{T_1, T_2, \dots, T_n\}$
• An association rule is of the form $P \rightarrow Q$, where $P \subset I$, $Q \subset I$, and $P \cap Q = \emptyset$

Basic Concepts (con't)
• A set of items is referred to as an itemset.
• An itemset containing k items is called a k-itemset; for example, {Jelly, Milk, Bread} is a 3-itemset.
• An itemset can also be seen as a conjunction of items (or a predicate).
• $P \rightarrow Q$ holds in D with support s and has confidence c in the transaction set D, where
  – $\mathrm{Support}(P \rightarrow Q) = \Pr(P \cup Q)$, the probability that a transaction contains all items of both P and Q
  – $\mathrm{Confidence}(P \rightarrow Q) = \Pr(Q \mid P)$

Support of an Itemset
• The support of $P = P_1 \wedge P_2 \wedge \dots \wedge P_k$ in D, $\sigma(P/D)$, is the probability that P occurs in D: the percentage of transactions T in D satisfying P (the number of such transactions divided by the cardinality of D).
• That is, the support of an item (or itemset) X is the percentage of transactions in which that item (or those items) occur: $\mathrm{support}(X) = \frac{\#X}{n}$, where #X is the number of transactions containing X and $n = |D|$.
• Support for all subsets of items (percentages rounded):
  – Note the exponential growth in the number of itemsets: 5 items give $2^5 - 1 = 31$ non-empty sets.

Itemset | Support | Itemset | Support
Beer | 33% | Beer, Bread, Milk | 0%
Bread | 83% | Beer, Bread, PeanutButter | 0%
Jelly | 17% | Beer, Jelly, Milk | 0%
Milk | 50% | Beer, Jelly, PeanutButter | 0%
PeanutButter | 50% | Beer, Milk, PeanutButter | 0%
Beer, Bread | 17% | Bread, Jelly, Milk | 0%
Beer, Jelly | 0% | Bread, Jelly, PeanutButter | 17%
Beer, Milk | 17% | Bread, Milk, PeanutButter | 17%
Beer, PeanutButter | 0% | Jelly, Milk, PeanutButter | 0%
Bread, Jelly | 17% | Beer, Bread, Jelly, Milk | 0%
Bread, Milk | 33% | Beer, Bread, Jelly, PeanutButter | 0%
Bread, PeanutButter | 50% | Beer, Bread, Milk, PeanutButter | 0%
Jelly, Milk | 0% | Beer, Jelly, Milk, PeanutButter | 0%
Jelly, PeanutButter | 17% | Bread, Jelly, Milk, PeanutButter | 0%
Milk, PeanutButter | 17% | Beer, Bread, Jelly, Milk, PeanutButter | 0%
Beer, Bread, Jelly | 0% |  | 

Support and Confidence of an Association Rule
• The support of an association rule X → Y is the percentage of transactions that contain X ∪ Y:
  $\mathrm{support}(X \rightarrow Y) = \frac{\#(X \cup Y)}{n}$
• The confidence of an association rule X → Y is the ratio of the number of transactions that contain X ∪ Y to the number of transactions that contain X:
  $\mathrm{confidence}(X \rightarrow Y) = \frac{\#(X \cup Y)}{\#X} = \frac{\mathrm{support}(X \rightarrow Y)}{\mathrm{support}(X)}$
• Equivalently, the confidence of a rule P → Q in database D, $\varphi(P \rightarrow Q / D)$, is the ratio of $\sigma((P \wedge Q)/D)$ to $\sigma(P/D)$.
• Support measures how often the rule occurs in the database.
• Confidence measures the strength of the rule.

Support and Confidence – cont.
• What are the support and confidence of the following rules?
  – Beer → Bread
  – {Bread, PeanutButter} → Jelly
• Support and confidence for some association rules:

Rule | Support | Confidence
Bread → PeanutButter | 50% | 60%
PeanutButter → Bread | 50% | 100%
Beer → Bread | 17% | 50%
PeanutButter → Jelly | 17% | 33%
Jelly → PeanutButter | 17% | 100%
Jelly → Milk | 0% | 0%
{Bread, PeanutButter} → Jelly | 17% | 33%

• Why do Bread → PeanutButter and PeanutButter → Bread have the same support but different confidence?