
slide-1
SLIDE 1

Data Mining: Association Rules 46


  • Contingency table for X → Y

               Y      ¬Y
      X       f11    f10    f1+
      ¬X      f01    f00    f0+
              f+1    f+0    |T|

f11: support of X and Y
f10: support of X and ¬Y
f01: support of ¬X and Y
f00: support of ¬X and ¬Y

Used to define various measures: support, confidence, lift, Gini, J-measure, etc.
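As a minimal sketch of how such measures follow from the four cells of the contingency table, the function below derives relative support, confidence, and lift for X → Y. The function name `rule_measures` and the example counts are hypothetical and chosen only for illustration.

```python
def rule_measures(f11, f10, f01, f00):
    """Support, confidence and lift of X -> Y from 2x2 contingency counts."""
    n = f11 + f10 + f01 + f00            # |T|, total number of transactions
    support = f11 / n                    # P(X, Y)
    confidence = f11 / (f11 + f10)       # P(Y | X) = f11 / f1+
    p_y = (f11 + f01) / n                # P(Y) = f+1 / |T|
    lift = confidence / p_y              # P(Y | X) / P(Y)
    return {"support": support, "confidence": confidence, "lift": lift}


# Hypothetical counts (not taken from the slides).
measures = rule_measures(f11=30, f10=10, f01=20, f00=40)
print(measures)
```

Working from raw counts rather than precomputed probabilities keeps every measure traceable to a cell of the table.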

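  • Contingency table for X → Y (100 transactions):

               Y      ¬Y
      X        15      5     20
      ¬X       75      5     80
               90     10    100

  • Association rule X → Y: confidence = P(Y|X) = 15/20 = 0.75, but P(Y) = 0.9

⇒ Although the confidence is high, the rule is misleading: P(Y|¬X) = 75/80 = 0.9375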

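  • Example 1 (Aggarwal & Yu, PODS98)

Among 1000 students, 600 play basketball, 750 eat cereal, and 400 do both.
play basketball ⇒ eat cereal [40%, 66.7%] is misleading, because the overall percentage of students eating cereal is 75%, which is higher than 66.7%.
play basketball ⇒ not eat cereal [20%, 33.3%] is far more accurate, although with lower support and confidence.

                 basketball   not basketball   sum(row)
  cereal            400            350            750
  not cereal        200             50            250
  sum(col.)         600            400           1000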
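The support and confidence figures for this example can be re-derived from the table counts. The sketch below is illustrative only; the variable names are assumptions and not part of the slides.

```python
# Counts read from the basketball/cereal table.
n = 1000
basketball = 600
cereal = 750
both = 400                             # basketball and cereal
only_basketball = basketball - both    # basketball and not cereal

# Each rule maps to (support, confidence).
rules = {
    "basketball => cereal": (both / n, both / basketball),
    "basketball => not cereal": (only_basketball / n, only_basketball / basketball),
}
p_cereal = cereal / n                  # baseline: overall share eating cereal

for rule, (support, confidence) in rules.items():
    print(f"{rule}: support={support:.1%}, confidence={confidence:.1%}")
print(f"P(cereal) = {p_cereal:.1%}")
```

Comparing the confidence of the first rule with `p_cereal` makes the problem visible: playing basketball lowers the chance of eating cereal relative to the baseline, even though the rule passes typical support and confidence thresholds.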

  • Example 2

X and Y: positively correlated
X and Z: negatively related
Support and confidence of X ⇒ Z dominates

  X   1 1 1 1 0 0 0 0
  Y   1 1 0 0 0 0 0 0
  Z   0 1 1 1 1 1 1 1

  Rule      Support   Confidence
  X ⇒ Y     25%       50%
  X ⇒ Z     37.5%     75%
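The rule table can be checked directly against the three binary vectors, treating each position as one transaction. This is a small sketch; the helper names `support` and `confidence` are illustrative.

```python
X = [1, 1, 1, 1, 0, 0, 0, 0]
Y = [1, 1, 0, 0, 0, 0, 0, 0]
Z = [0, 1, 1, 1, 1, 1, 1, 1]


def support(*items):
    """Fraction of transactions in which every given item is present."""
    n = len(items[0])
    return sum(all(bits) for bits in zip(*items)) / n


def confidence(lhs, rhs):
    """Confidence of lhs => rhs: P(rhs | lhs)."""
    return support(lhs, rhs) / support(lhs)


for name, rhs in (("X=>Y", Y), ("X=>Z", Z)):
    print(f"{name}: support={support(X, rhs):.2%}, confidence={confidence(X, rhs):.0%}")
```

X ⇒ Z scores higher on both measures largely because Z is present in 7 of the 8 transactions, not because X makes Z more likely.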

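$$\mathrm{corr}_{A,B} = \frac{P(A \cup B)}{P(A)\,P(B)}$$

Here P(A ∪ B) denotes the probability that a transaction contains both A and B.

P(B|A)/P(B) is also called the lift of the rule A ⇒ B.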

slide-2
SLIDE 2

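  • Interest (correlation, lift)

$$\mathrm{Interest}(A,B) = \frac{P(A \wedge B)}{P(A)\,P(B)}$$

Takes both P(A) and P(B) into consideration.
P(A ∧ B) = P(A) · P(B) if A and B are independent events.
A and B are negatively correlated if the value is less than 1; otherwise A and B are positively correlated.

  X   1 1 1 1 0 0 0 0
  Y   1 1 0 0 0 0 0 0
  Z   0 1 1 1 1 1 1 1

  Itemset   Support   Interest
  X,Y       25%       2
  X,Z       37.5%     0.86
  Y,Z       12.5%     0.57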
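The interest column can be recomputed from the binary vectors as P(A ∧ B) / (P(A) P(B)). The sketch below assumes each vector position is one transaction; `prob` and `interest` are illustrative helper names.

```python
from itertools import combinations

items = {
    "X": [1, 1, 1, 1, 0, 0, 0, 0],
    "Y": [1, 1, 0, 0, 0, 0, 0, 0],
    "Z": [0, 1, 1, 1, 1, 1, 1, 1],
}


def prob(*vectors):
    """Fraction of transactions in which all given items occur together."""
    return sum(all(bits) for bits in zip(*vectors)) / len(vectors[0])


def interest(a, b):
    """Interest (lift) of the pair: P(A and B) / (P(A) * P(B))."""
    return prob(a, b) / (prob(a) * prob(b))


results = {}
for (name_a, a), (name_b, b) in combinations(items.items(), 2):
    results[f"{name_a},{name_b}"] = (prob(a, b), interest(a, b))
    print(f"{name_a},{name_b}: support={prob(a, b):.2%}, interest={interest(a, b):.2f}")
```

Although X,Z has the highest support, its interest is below 1, which marks the pair as negatively correlated; X,Y has lower support but interest 2.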

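  • Population of 1000:

600 have attribute S
700 have attribute B
420 have both (S, B)

P(S ∧ B) = 420/1000 = 0.42
P(S) × P(B) = 0.6 × 0.7 = 0.42

P(S ∧ B) = P(S) × P(B) => statistical independence
P(S ∧ B) > P(S) × P(B) => positively correlated
P(S ∧ B) < P(S) × P(B) => negatively correlated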

$$\mathrm{Lift} = \frac{P(Y \mid X)}{P(Y)}$$

$$\mathrm{Interest} = \frac{P(X,Y)}{P(X)\,P(Y)}$$

$$PS = P(X,Y) - P(X)\,P(Y)$$

$$\phi\text{-coefficient} = \frac{P(X,Y) - P(X)\,P(Y)}{\sqrt{P(X)\,[1 - P(X)]\,P(Y)\,[1 - P(Y)]}}$$
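A minimal sketch of the four measures (Lift, Interest, PS, φ-coefficient) as functions of P(X,Y), P(X), and P(Y). The function name and the input probabilities are hypothetical, chosen only to exercise the formulas.

```python
import math


def dependence_measures(p_xy, p_x, p_y):
    """Return (lift, interest, PS, phi) from joint and marginal probabilities."""
    lift = (p_xy / p_x) / p_y                      # P(Y|X) / P(Y)
    interest = p_xy / (p_x * p_y)                  # P(X,Y) / (P(X) P(Y))
    ps = p_xy - p_x * p_y                          # P(X,Y) - P(X) P(Y)
    phi = ps / math.sqrt(p_x * (1 - p_x) * p_y * (1 - p_y))
    return lift, interest, ps, phi


# Hypothetical probabilities (not taken from the slides).
lift, interest, ps, phi = dependence_measures(p_xy=0.3, p_x=0.4, p_y=0.5)
print(f"lift={lift:.3f} interest={interest:.3f} PS={ps:.3f} phi={phi:.3f}")
```

Lift and Interest agree up to floating-point rounding because P(Y|X) = P(X,Y)/P(X), so the two expressions are algebraically the same quantity.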

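  • Example: Lift/Interest

               Y      ¬Y
      X        15      5     20
      ¬X       75      5     80
               90     10    100

  • Association rule X → Y: confidence = P(Y|X) = 0.75, but P(Y) = 0.9

⇒ Lift = 0.75/0.9 = 0.8333 (< 1, therefore X and Y are negatively associated)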

               Y      ¬Y
      X        10      0     10
      ¬X        0     90     90
               10     90    100

$$\mathrm{Lift} = \frac{0.1}{(0.1)(0.1)} = 10$$

               Y      ¬Y
      X        90      0     90
      ¬X        0     10     10
               90     10    100

$$\mathrm{Lift} = \frac{0.9}{(0.9)(0.9)} = 1.11$$

Statistical independence: If P(X,Y)=P(X)P(Y) => Lift = 1
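The sketch below evaluates lift for three cases: two perfectly co-occurring pairs with P(X,Y) = P(X) = P(Y) equal to 0.1 and to 0.9 (matching the two Lift computations above, expressed as counts out of 100, which is an assumption), and one hypothetical independent pair. The function name `lift` and its parameters are illustrative.

```python
def lift(n_xy, n_x, n_y, n):
    """Lift = P(X,Y) / (P(X) P(Y)), computed from counts to limit rounding."""
    return (n_xy * n) / (n_x * n_y)


rare = lift(n_xy=10, n_x=10, n_y=10, n=100)         # P(X,Y) = P(X) = P(Y) = 0.1
common = lift(n_xy=90, n_x=90, n_y=90, n=100)       # P(X,Y) = P(X) = P(Y) = 0.9
independent = lift(n_xy=20, n_x=40, n_y=50, n=100)  # hypothetical: 0.2 = 0.4 * 0.5

print(f"rare={rare:.2f} common={common:.2f} independent={independent:.2f}")
```

Both co-occurring pairs are perfectly associated, yet the frequent pair scores barely above the independence baseline of 1 while the rare pair scores 10, so lift rewards rarity as much as association.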

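  • Piatetsky-Shapiro: 3 properties a good measure M must satisfy:

M(A,B) = 0 if A and B are statistically independent
M(A,B) increases monotonically with P(A,B) when P(A) and P(B) remain unchanged
M(A,B) decreases monotonically with P(A) [or P(B)] when P(A,B) and P(B) [or P(A)] remain unchanged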

slide-3
SLIDE 3

              B     ¬B                   A     ¬A
      A       p      q           B       p      r
      ¬A      r      s           ¬B      q      s

Does M(A,B) = M(B,A)?

Symmetric measures:

support, lift, collective strength, cosine, Jaccard, etc

Asymmetric measures:

confidence, conviction, Laplace, J-measure, etc
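To illustrate the question M(A,B) = M(B,A), the sketch below evaluates lift and confidence on a table (p, q, r, s) and on its permuted version, in which q and r trade places. The cell counts and function names are hypothetical.

```python
import math


def confidence(p, q, r, s):
    """Confidence of A -> B: P(B | A) = p / (p + q)."""
    return p / (p + q)


def lift(p, q, r, s):
    """Lift of A -> B: P(A,B) / (P(A) P(B))."""
    n = p + q + r + s
    return (p / n) / (((p + q) / n) * ((p + r) / n))


def permute(p, q, r, s):
    """Swap the roles of A and B: off-diagonal cells q and r trade places."""
    return p, r, q, s


table = (30, 10, 20, 40)   # hypothetical counts p, q, r, s
swapped = permute(*table)

print("lift:", lift(*table), lift(*swapped))
print("confidence:", confidence(*table), confidence(*swapped))
```

Lift is unchanged by the permutation, while confidence differs because its denominator depends on which variable is the antecedent.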

Grade-Gender Example (Mosteller, 1968):

              Male   Female                       Male   Female
      High      2       3       5         High      4      30      34
      Low       1       4       5         Low       2      40      42
                3       7      10                   6      70      76

In the second table the Male column is scaled by 2x and the Female column by 10x.

Mosteller: Underlying association should be independent of the relative number of male and female students in the samples

[Figure: binary item vectors A and B (panel a), C and D (panel b), and E and F (panel c), listed over Transaction 1 through Transaction N]

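  • Example: φ-coefficient

The φ-coefficient is analogous to the correlation coefficient for continuous variables.

               Y      ¬Y                         Y      ¬Y
      X        60     10     70         X        20     10     30
      ¬X       10     20     30         ¬X       10     60     70
               70     30    100                  30     70    100

$$\phi = \frac{0.6 - 0.7 \times 0.7}{\sqrt{0.7 \times 0.3 \times 0.7 \times 0.3}} = 0.5238$$

$$\phi = \frac{0.2 - 0.3 \times 0.3}{\sqrt{0.7 \times 0.3 \times 0.7 \times 0.3}} = 0.5238$$

The φ-coefficient is the same for both tables.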
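The two φ computations can be reproduced from cell counts. The sketch assumes the first table has cells (f11, f10, f01, f00) = (60, 10, 10, 20) and the second (20, 10, 10, 60), which is what the probabilities 0.6, 0.7 and 0.2, 0.3 in the formulas imply for 100 transactions.

```python
import math


def phi(f11, f10, f01, f00):
    """Phi-coefficient of a 2x2 contingency table given as cell counts."""
    n = f11 + f10 + f01 + f00
    p_xy = f11 / n
    p_x = (f11 + f10) / n
    p_y = (f11 + f01) / n
    return (p_xy - p_x * p_y) / math.sqrt(p_x * (1 - p_x) * p_y * (1 - p_y))


phi_first = phi(60, 10, 10, 20)    # P(X,Y) = 0.6, P(X) = P(Y) = 0.7
phi_second = phi(20, 10, 10, 60)   # P(X,Y) = 0.2, P(X) = P(Y) = 0.3

print(round(phi_first, 4), round(phi_second, 4))
```

The second table is the first with presence and absence exchanged for both variables, and φ treats co-presence and co-absence alike, which is why the two values coincide.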

              B     ¬B                   B     ¬B
      A       p      q           A       p      q
      ¬A      r      s           ¬A      r      s + k

Invariant measures:

support, cosine, Jaccard, etc

Non-invariant measures:

correlation, Gini, mutual information, odds ratio, etc
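A short sketch of the invariance check: add k transactions that contain neither A nor B to cell s and compare each measure before and after. The cell counts and k are hypothetical; cosine and Jaccard stand in for the invariant group, and the φ-coefficient (correlation) for the non-invariant group.

```python
import math


def cosine(p, q, r, s):
    """Cosine measure: P(A,B) / sqrt(P(A) P(B)); the total count cancels out."""
    return p / math.sqrt((p + q) * (p + r))


def jaccard(p, q, r, s):
    """Jaccard measure: transactions with both items over those with either."""
    return p / (p + q + r)


def phi(p, q, r, s):
    """Phi-coefficient (correlation) expressed in cell counts."""
    n = p + q + r + s
    return (n * p - (p + q) * (p + r)) / math.sqrt((p + q) * (r + s) * (p + r) * (q + s))


p, q, r, s = 30, 10, 20, 40    # hypothetical cell counts
k = 1000                       # hypothetical number of added null transactions

for measure in (cosine, jaccard, phi):
    before = measure(p, q, r, s)
    after = measure(p, q, r, s + k)
    print(f"{measure.__name__}: before={before:.4f} after={after:.4f}")
```

Cosine and Jaccard never read s, so they cannot change; φ depends on s through the marginals of ¬A and ¬B, so it shifts as k grows.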

