
Data Mining for Knowledge Management
Association Rules
Themis Palpanas
University of Trento
http://disi.unitn.eu/~themis

Thanks for slides to: Jiawei Han, George Kollios, Zhenyu Lu, Osmar R. Zaïane, Mohammad El-Hajj, Yu-ting Kung

Frequent Pattern Mining
Given a transaction database DB and a minimum support threshold ξ, find all frequent patterns (itemsets) with support no less than ξ.

Input: DB:
TID   Items bought
100   {f, a, c, d, g, i, m, p}
200   {a, b, c, f, l, m, o}
300   {b, f, h, j, o}
400   {b, c, k, s, p}
500   {a, f, c, e, l, p, m, n}
Minimum support: ξ = 3

Output: all frequent patterns, i.e., f, a, …, fa, fac, fam, …
Problem: How to efficiently find all frequent patterns?

Apriori
• The core of the Apriori algorithm:
  • Use frequent (k-1)-itemsets (L_{k-1}) to generate candidates of frequent k-itemsets C_k
  • Scan the database and count each pattern in C_k to get the frequent k-itemsets (L_k)
• E.g., Apriori iterations on the transaction database above:
  C1: f, a, c, d, g, i, m, p, l, o, h, j, k, s, b, e, n
  L1: f, a, c, m, b, p
  C2: fa, fc, fm, fp, ac, am, …, bp
  L2: fa, fc, fm, …
  …
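The two Apriori steps (generate C_k from L_{k-1}, then scan and count) can be sketched in Python on the example database. This is a minimal, unoptimized illustration rather than the implementation behind the slides; the function name apriori and the helper count are assumptions made for the sketch.

```python
from itertools import combinations

# Example transaction database from the slides (TID -> items bought).
DB = {
    100: set("facdgimp"),
    200: set("abcflmo"),
    300: set("bfhjo"),
    400: set("bcksp"),
    500: set("afcelpmn"),
}


def apriori(transactions, min_support):
    """Return {frozenset(itemset): support} for all frequent itemsets."""
    transactions = [frozenset(t) for t in transactions]

    def count(candidates):
        # One database scan: count every candidate, keep the frequent ones.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        return {c: n for c, n in counts.items() if n >= min_support}

    # C1: every single item that occurs in the database.
    items = set().union(*transactions)
    level = count({frozenset([i]) for i in items})  # L1
    frequent = dict(level)
    k = 2
    while level:
        prev = set(level)
        # Join step: unions of two frequent (k-1)-itemsets that have size k.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in prev for s in combinations(c, k - 1))
        }
        level = count(candidates)  # L_k
        frequent.update(level)
        k += 1
    return frequent


patterns = apriori(DB.values(), min_support=3)
l1 = sorted("".join(p) for p in patterns if len(p) == 1)
print(l1)  # the frequent single items (L1)
```

Each pass of the while loop corresponds to one C_k / L_k iteration and costs one full scan of the database, which is the cost the following slides set out to avoid.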

Performance Bottlenecks of Apriori
• The bottleneck of Apriori: candidate generation
• Huge candidate sets:
  • 10^4 frequent 1-itemsets will generate more than 10^7 candidate 2-itemsets
  • To discover a frequent pattern of size 100, e.g., {a_1, a_2, …, a_100}, one needs to generate 2^100 ≈ 10^30 candidates.
• Multiple scans of database: each candidate

Ideas
• Compress a large database into a compact, Frequent-Pattern tree (FP-tree) structure
  • highly condensed, but complete for frequent pattern mining
  • avoid costly database scans
• Develop an efficient, FP-tree-based frequent pattern mining method (FP-growth)
  • A divide-and-conquer methodology: decompose mining tasks into smaller ones
  • Avoid candidate generation: sub-database test only.
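The candidate counts quoted above can be checked with a few lines of Python. This is only a back-of-the-envelope check; the variable names are illustrative.

```python
import math

# Candidate 2-itemsets generated from 10^4 frequent 1-itemsets:
# every unordered pair of frequent items is a candidate.
n_frequent_items = 10**4
candidate_pairs = math.comb(n_frequent_items, 2)
print(candidate_pairs)

# A frequent pattern of size 100 has 2^100 - 1 non-empty subsets,
# all of which are frequent and must be generated as candidates.
pattern_length = 100
n_subsets = 2**pattern_length - 1
print(f"{n_subsets:.2e}")
```

The pair count is roughly 5 × 10^7, and the subset count is on the order of 10^30, which matches the orders of magnitude on the slide.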

Mining Frequent Patterns Without Candidate Generation
• Grow long patterns from short ones using local frequent items
  • "abc" is a frequent pattern
  • Get all transactions having "abc": DB|abc
  • "d" is a local frequent item in DB|abc
  • so abcd is a frequent pattern
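The pattern-growth idea can be sketched on the example database from the earlier slides. The sketch below projects the database on the frequent pattern "fc" instead of the abstract "abc"; the helper names project and local_frequent_items are assumptions, and the projection drops the pattern's own items for convenience.

```python
from collections import Counter

# Example transaction database from the earlier slides.
DB = [
    set("facdgimp"),
    set("abcflmo"),
    set("bfhjo"),
    set("bcksp"),
    set("afcelpmn"),
]


def project(transactions, pattern):
    """DB|pattern: transactions containing the pattern, minus the pattern."""
    pattern = set(pattern)
    return [t - pattern for t in transactions if pattern <= t]


def local_frequent_items(projected, min_support):
    """Items whose support inside the projected database meets the threshold."""
    counts = Counter(item for t in projected for item in t)
    return {item: n for item, n in counts.items() if n >= min_support}


projected = project(DB, "fc")
local = local_frequent_items(projected, min_support=3)
# Each local frequent item extends the pattern by one item.
grown = sorted("fc" + item for item in local)
print(grown)
```

No candidate set is generated here: the only test is a count inside the smaller, projected database.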

FP-tree Construction from a Transactional DB
min_support = 3

TID   Items bought               (ordered) frequent items
100   {f, a, c, d, g, i, m, p}   {f, c, a, m, p}
200   {a, b, c, f, l, m, o}      {f, c, a, b, m}
300   {b, f, h, j, o}            {f, b}
400   {b, c, k, s, p}            {c, b, p}
500   {a, f, c, e, l, p, m, n}   {f, c, a, m, p}

Item   frequency
f      4
c      4
a      3
b      3
m      3
p      3

Steps:
1. Scan DB once, find frequent 1-itemsets (single item patterns)
2. Order frequent items in descending order of their frequency
3. Scan DB again, construct FP-tree
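Steps 1 and 2 can be sketched as follows. Items with equal frequency (f and c; a, b, m and p) can be ordered either way, so the sketch assumes the tie-break order shown in the slides' item frequency table; TIE_BREAK and rank are names introduced here for illustration.

```python
from collections import Counter

DB = {
    100: "facdgimp",
    200: "abcflmo",
    300: "bfhjo",
    400: "bcksp",
    500: "afcelpmn",
}
MIN_SUPPORT = 3
# Assumption: ties in frequency are broken in the order used on the slides.
TIE_BREAK = "fcabmp"

# Step 1: one scan to count items and keep the frequent 1-itemsets.
counts = Counter(item for items in DB.values() for item in set(items))
frequent = {item: n for item, n in counts.items() if n >= MIN_SUPPORT}


def rank(item):
    # Descending frequency first, then the assumed tie-break order.
    return (-frequent[item], TIE_BREAK.index(item))


# Step 2: drop infrequent items and sort each transaction by rank.
ordered = {
    tid: sorted((i for i in set(items) if i in frequent), key=rank)
    for tid, items in DB.items()
}
for tid, items in ordered.items():
    print(tid, items)
```

Using one fixed global order for every transaction is what lets transactions with common frequent items share a prefix in the tree built in step 3.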

FP-tree Construction
min_support = 3

TID   freq. Items bought
100   {f, c, a, m, p}
200   {f, c, a, b, m}
300   {f, b}
400   {c, b, p}
500   {f, c, a, m, p}

Item   frequency
f      4
c      4
a      3
b      3
m      3
p      3

After inserting TID 100 {f, c, a, m, p}:
root
  f:1
    c:1
      a:1
        m:1
          p:1

After inserting TID 200 {f, c, a, b, m}:
root
  f:2
    c:2
      a:2
        m:1
          p:1
        b:1
          m:1

FP-tree Construction

After inserting TID 300 {f, b} and TID 400 {c, b, p}:
root
  f:3
    c:2
      a:2
        m:1
          p:1
        b:1
          m:1
    b:1
  c:1
    b:1
      p:1

After inserting TID 500 {f, c, a, m, p}:
root
  f:4
    c:3
      a:3
        m:2
          p:2
        b:1
          m:1
    b:1
  c:1
    b:1
      p:1

FP-tree Construction

Final FP-tree with its header table:

Header Table
Item   freq   head
f      4      (node-link into the tree)
c      4      (node-link into the tree)
a      3      (node-link into the tree)
b      3      (node-link into the tree)
m      3      (node-link into the tree)
p      3      (node-link into the tree)

root
  f:4
    c:3
      a:3
        m:2
          p:2
        b:1
          m:1
    b:1
  c:1
    b:1
      p:1

FP-Tree Definition
FP-tree is a frequent pattern tree, defined below:
• It consists of
  • one root labeled as "null",
  • a set of item prefix subtrees as the children of the root, and
  • a frequent-item header table.
• Each node in the item prefix subtrees has three fields:
  • item-name, to register which item this node represents,
  • count, the number of transactions represented by the portion of the path reaching this node, and
  • node-link, which links to the next node in the FP-tree carrying the same item-name, or null if there is none.
• Each entry in the frequent-item header table has two fields:
  • item-name, and
  • head of node-link, which points to the first node in the FP-tree carrying the item-name.
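The definition maps directly onto a small data structure. The following is a minimal sketch that builds the tree from the ordered frequent items of the running example; the class name FPNode, the parent and children fields, and the helper chain are assumptions added for the sketch and are not part of the definition.

```python
class FPNode:
    def __init__(self, item, parent=None):
        self.item = item        # item-name (None stands for the "null" root)
        self.count = 0          # transactions sharing the path to this node
        self.node_link = None   # next node carrying the same item-name
        self.parent = parent    # assumption: kept for upward traversal
        self.children = {}      # item-name -> child FPNode


def build_fp_tree(ordered_transactions):
    """Insert each ordered transaction as a path; return (root, header)."""
    root = FPNode(None)
    header = {}  # item-name -> head of node-link
    tails = {}   # item-name -> last node in the chain (construction helper)
    for transaction in ordered_transactions:
        node = root
        for item in transaction:
            child = node.children.get(item)
            if child is None:
                # New branch: create the node and append it to the
                # node-link chain of its item.
                child = FPNode(item, parent=node)
                node.children[item] = child
                if item in tails:
                    tails[item].node_link = child
                else:
                    header[item] = child
                tails[item] = child
            child.count += 1  # shared prefix: only the count grows
            node = child
    return root, header


def chain(header, item):
    """Counts of all nodes carrying `item`, following the node-links."""
    counts = []
    node = header.get(item)
    while node is not None:
        counts.append(node.count)
        node = node.node_link
    return counts


# Ordered frequent items of TIDs 100-500 from the construction slides.
ORDERED = ["fcamp", "fcabm", "fb", "cbp", "fcamp"]
root, header = build_fp_tree(ORDERED)
print({item: chain(header, item) for item in "fcabmp"})
```

Summing the counts along an item's node-link chain gives that item's frequency in the header table, which is a quick consistency check on the constructed tree.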
