COST-SENSITIVE MEASURES OF INSTANCE HARDNESS (Carlos Melo), PowerPoint presentation



SLIDE 1

COST-SENSITIVE MEASURES OF INSTANCE HARDNESS

Carlos Melo Ricardo Prudêncio Centro de Informática – UFPE Recife-Brazil

SLIDE 2

INTRODUCTION

• Instance hardness
    • Which instances are more difficult in a dataset?
• Motivation
    • Data cleaning, ensemble methods, ...
• Aspects considered in our work
    • Misclassification costs
    • Decision threshold choice methods

SLIDE 3

Context A: Cost of FN = Cost of FP
Context B: Cost of FN > Cost of FP

Instance hardness depends on the observed context (the misclassification costs) and on how that context is handled (the decision threshold choice method).

SLIDE 4

DEVELOPED WORK

• Framework to define cost curves and hardness measures for instances
• Questions:
    • Given a context and an algorithm, how hard is an instance?
    • How hard is an instance in general?
    • Which algorithm is the best for each instance?
• Different curves for different decision threshold choice methods

[Figure: loss curve for an instance x (vertical axis: Loss)]

SLIDE 5

NOTATION AND BASIC DEFINITIONS

• Instances can be either positive (y = 0) or negative (y = 1)
• Learned model m is a scoring function
    • s = m(x) is high for negative instances
• Decision threshold (t)

$$
\hat{y} = \begin{cases} 1, & \text{if } s > t \text{ (i.e., } x \text{ is classified as negative)} \\ 0, & \text{otherwise (i.e., } x \text{ is classified as positive)} \end{cases}
$$

Example with t = 0.5:

s      ŷ    y
0.92   1    1
0.71   1    0
0.54   1    0
0.36   0    0
0.21   0    1
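A minimal Python sketch of the decision rule above, using the scores and true labels from the example table with t = 0.5. The function name `predict` is an illustrative choice and does not come from the slides.

```python
# Scores and true labels from the example table (0 = positive, 1 = negative).
scores = [0.92, 0.71, 0.54, 0.36, 0.21]
labels = [1, 0, 0, 0, 1]


def predict(score, threshold):
    """Return 1 (negative) if the score exceeds the threshold, else 0 (positive)."""
    return 1 if score > threshold else 0


predictions = [predict(s, 0.5) for s in scores]
errors = [int(p != y) for p, y in zip(predictions, labels)]

print(predictions)  # [1, 1, 1, 0, 0]
print(errors)       # [0, 1, 1, 0, 1]
```

With t = 0.5 the two positive instances scored 0.71 and 0.54 become false negatives, and the negative instance scored 0.21 becomes a false positive.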

SLIDE 6

INSTANCE COST CURVES

• Cost model:

$$
Q(t, c) = 2\{c \, FN(t) + (1 - c) \, FP(t)\}
$$

• Positive instances:

$$
QI(x, t, c) = 2c \, f_n(x, t)
$$

• Negative instances:

$$
QI(x, t, c) = 2(1 - c) \, f_p(x, t)
$$
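The per-instance costs can be sketched in Python as follows. The sketch assumes that fn(x, t) = 1 when a positive instance (y = 0) has score s > t, and that fp(x, t) = 1 when a negative instance (y = 1) has score s ≤ t, which follows the decision rule on the notation slide. The function name `instance_cost` is hypothetical.

```python
def instance_cost(score, label, t, c):
    """QI(x, t, c) for one instance; label 0 = positive, 1 = negative."""
    predicted = 1 if score > t else 0  # decision rule: negative if s > t
    if label == 0:
        fn = 1 if predicted == 1 else 0  # positive instance predicted negative
        return 2 * c * fn
    fp = 1 if predicted == 0 else 0  # negative instance predicted positive
    return 2 * (1 - c) * fp


# Scores and labels from the example table, evaluated at t = 0.5 and c = 0.5.
print(instance_cost(0.54, 0, t=0.5, c=0.5))  # false negative: 1.0
print(instance_cost(0.36, 0, t=0.5, c=0.5))  # correct prediction: 0.0
print(instance_cost(0.21, 1, t=0.5, c=0.5))  # false positive: 1.0
```

A correctly classified instance always costs 0; a misclassified one costs 2c if it is positive and 2(1 - c) if it is negative.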

SLIDE 7

INSTANCE COST CURVES - SCORE-DRIVEN THRESHOLD

• Threshold is set equal to the cost proportion
    • t = T(c) = c

Example with c = 0.4 (higher cost for false positives), so t = 0.4:

s      ŷ    y
0.92   1    1
0.71   1    0
0.54   1    0
0.36   0    0
0.21   0    1

$$
QI(x, T(c), c) = 2c \, f_n(x, t)
$$

[Figure: instance cost curve QI versus c for the instance x with score 0.54; f_n(x, t) = 1 for c < 0.54]

SLIDE 8

INSTANCE HARDNESS - SCORE-DRIVEN THRESHOLD

• Instance cost curves (positive instances)

$$
QI(x, T(c), c) = 2c \, f_n(x, t)
$$

$$
IH(x) = \int_0^s 2c \, dc = s^2 = (y - s)^2
$$

IH is the squared error

[Figure: instance cost curve QI versus c, reaching 2s at c = s]
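As a numerical check of the integral above, the following Python sketch integrates the score-driven cost curve of a positive instance over c in [0, 1] with a midpoint rule and compares the result with s². It assumes fn(x, t) = 1 exactly when s > t, with t = c; the function names and the step count are illustrative choices.

```python
def score_driven_cost(score, c):
    """QI(x, T(c), c) for a positive instance when t = T(c) = c."""
    fn = 1 if score > c else 0
    return 2 * c * fn


def score_driven_hardness(score, steps=100_000):
    """Approximate the integral of the cost curve over c in [0, 1] (midpoint rule)."""
    width = 1.0 / steps
    return sum(score_driven_cost(score, (i + 0.5) * width) * width
               for i in range(steps))


approx = score_driven_hardness(0.54)
print(round(approx, 4))     # 0.2916
print(round(0.54 ** 2, 4))  # 0.2916
```

The cost curve is the line 2c up to c = s and zero afterwards, so the area under it is the triangle with base s and height 2s.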

SLIDE 9

INSTANCE HARDNESS - RATE-DRIVEN THRESHOLD

• Threshold is chosen to yield a desired rate of positive predictions R(t)
    • $t = T(c) = R^{-1}(c)$

Example with R = 0.80 (80% of positive predictions):

s      ŷ    y
0.92   1    1
0.71   0    0
0.54   0    0
0.36   0    0
0.21   0    1

[Figure: instance cost curve QI versus c for an instance x, with R(0.36) = 0.40 and R(0.54) = 0.60; f_n(x, t) = 1 for c < 0.4 and f_n(x, t) = 0 for c > 0.6]
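The rate-driven choice can be sketched in Python on the five scores of the example table. The sketch assumes R(t) is the fraction of instances with score ≤ t, which agrees with R(0.36) = 0.40 and R(0.54) = 0.60 above. The discrete inverse below (the smallest observed score whose rate reaches c) is an assumption made for illustration and does not model interpolation between observed scores; both function names are hypothetical.

```python
scores = [0.92, 0.71, 0.54, 0.36, 0.21]


def positive_rate(t, scores):
    """R(t): fraction of instances predicted positive, i.e. with score <= t."""
    return sum(1 for s in scores if s <= t) / len(scores)


def rate_driven_threshold(c, scores):
    """Smallest observed score t with R(t) >= c (a discrete stand-in for R^-1(c))."""
    for t in sorted(scores):
        if positive_rate(t, scores) >= c:
            return t
    return max(scores)


t = rate_driven_threshold(0.8, scores)
predictions = [1 if s > t else 0 for s in scores]

print(positive_rate(0.36, scores), positive_rate(0.54, scores))  # 0.4 0.6
print(t)            # 0.71
print(predictions)  # [1, 0, 0, 0, 0]
```

With a desired rate of 0.80, four of the five instances are predicted positive, matching the ŷ column of the table.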

SLIDE 10

INSTANCE HARDNESS - RATE-DRIVEN THRESHOLD

• Instance cost curves (positive instances)

$$
QI(x, t, c) = 2c \, f_n(x, t)
$$

$$
IH(x) = R(s)^2 - \frac{1}{n}\left(R(s) - \frac{1}{3n}\right)
$$

IH is the squared positive rate:

$$
IH(x) \to R(s)^2 \quad \text{as } n \to \infty
$$

[Figure: instance cost curve QI versus c; f_n(x, t) = 0 for c > R(s)]
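One way to arrive at a closed form for the rate-driven hardness of a positive instance, sketched under two assumptions that the slides do not state explicitly: R(s) is a multiple of 1/n, and fn(x, t) falls linearly from 1 to 0 as c goes from R(s) - 1/n to R(s). The second assumption matches the n = 5 example, where fn(x, t) = 1 up to c = 0.4 and fn(x, t) = 0 from c = 0.6.

```latex
\begin{aligned}
IH(x) &= \int_0^{R(s)-\frac{1}{n}} 2c \, dc
       + \int_{R(s)-\frac{1}{n}}^{R(s)} 2c \, n \, (R(s) - c) \, dc \\
      &= \left(R(s) - \frac{1}{n}\right)^2
       + \frac{1}{n}\left(R(s) - \frac{1}{n}\right)
       + \frac{1}{3n^2} \\
      &= R(s)^2 - \frac{1}{n}\left(R(s) - \frac{1}{3n}\right)
\end{aligned}
```

The correction term is of order 1/n, so for large datasets the hardness approaches R(s)².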

SLIDE 11

ILLUSTRATIVE EXAMPLE

x      m1(x)   y
x1     0.92    1
x2     0.71    1
x3     0.34    0
x4     0.31    1
x5     0.23    1
x6     0.20    1
x7     0.15    0
x8     0.13    0
x9     0.11    1
x10    0.05    0

Instance x3: IH_RD = (0.7)^2, IH_SD = (0 - 0.34)^2 = (0.34)^2

Well calibrated score but poor rank

SLIDE 12

ENSEMBLE INSTANCE HARDNESS

• Average cost curves and instance hardness over a pool L of learning models

$$
QI(x, t, c) = \frac{1}{|L|} \sum_{j=1}^{|L|} QI_j(x, t, c)
$$

$$
IH(x) = \frac{1}{|L|} \sum_{j=1}^{|L|} IH_j(x)
$$

Strong assumption: all learning models are equally probable and reliable
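A minimal Python sketch of the averaging step, using the score-driven hardness IH_j(x) = s_j² of a positive instance under each model j. The pool scores are hypothetical values chosen for illustration, and the function name `ensemble_hardness` is not from the slides.

```python
def ensemble_hardness(scores):
    """Average the score-driven hardness s_j^2 of a positive instance over a pool."""
    per_model = [s ** 2 for s in scores]    # IH_j(x) for each model j
    return sum(per_model) / len(per_model)  # uniform weight 1/|L| per model


pool_scores = [0.2, 0.4, 0.6]  # hypothetical scores m_j(x) from three models
result = ensemble_hardness(pool_scores)
print(round(result, 4))  # 0.1867
```

The uniform weights encode the assumption stated above: every model in the pool counts equally, whatever its reliability.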

SLIDE 13

ILLUSTRATIVE EXAMPLE - ENSEMBLE HARDNESS

[Figure: ensemble instance cost curves for the positive instances, in two panels (Score-Driven and Rate-Driven), with curves labeled x5, x6, x7, x8, x9, x10]

SLIDE 14

ILLUSTRATIVE EXAMPLE - CLASS HARDNESS

[Figure: class hardness for the Positive Class and the Negative Class, with Score-Driven and Rate-Driven panels]

SLIDE 15

CONCLUSION

• Instance hardness measures and cost curves considering different scenarios
• Other threshold choice methods
    • Probabilistic methods (rate-uniform and score-uniform), rate-fixed and score-fixed
• Future work
    • Integrate instance hardness into classification methods (ensemble learning)
    • Empirical and meta-learning studies

SLIDE 16

COST-SENSITIVE MEASURES OF INSTANCE HARDNESS

Questions???