Cost-Sensitive Measures of Instance Hardness



  1. Cost-Sensitive Measures of Instance Hardness
     Carlos Melo, Ricardo Prudêncio
     Centro de Informática, UFPE, Recife, Brazil

  2. Introduction
     - Instance hardness
       - Which instances are more difficult in a dataset?
     - Motivation
       - Data cleaning, ensemble methods, ...
     - Aspects considered in our work
       - Misclassification costs
       - Decision threshold choice methods

  3. Figure: two contexts (Context A and Context B), one with Cost of FN = Cost of FP and one with Cost of FN > Cost of FP.
     Instance hardness depends on the observed context (misclassification costs) and on how that context is handled (decision threshold choice method).

  4. Developed work
     - Framework to define cost curves and hardness measures for instances
     - Questions:
       - Given a context and an algorithm, how hard is an instance?
       - How hard is an instance in general?
       - Which algorithm is the best for each instance?
     - Different curves for different decision threshold choice methods
     - Figure: loss curve for an instance x

  5. Notation and basic definitions
     - Instances can be either positive ($y = 0$) or negative ($y = 1$)
     - The learned model $m$ is a scoring function
       - $s = m(x)$ is high for negative instances
     - Decision threshold ($t$):
       - $\hat{y} = 1$ if $s > t$ (i.e., $x$ is predicted negative)
       - $\hat{y} = 0$ otherwise (i.e., $x$ is predicted positive)
     - Example with $t = 0.5$ (the threshold falls between the scores 0.54 and 0.36):

       s      ŷ    y
       0.92   1    1
       0.71   1    0
       0.54   1    0
       0.36   0    0
       0.21   0    1
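The decision rule on this slide can be sketched in Python as follows. This is a minimal illustration rather than the authors' code: the function name `predict` is hypothetical, and the scores are the five values from the slide's example.

```python
def predict(score, t):
    """Return 1 (negative) if the score exceeds threshold t, else 0 (positive)."""
    return 1 if score > t else 0


# Scores from the slide's example, thresholded at t = 0.5.
scores = [0.92, 0.71, 0.54, 0.36, 0.21]
predictions = [predict(s, 0.5) for s in scores]
print(predictions)
```

The three scores above 0.5 are labeled negative (1) and the remaining two positive (0), following the slide's convention that high scores indicate the negative class.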

  6. Instance cost curves
     - Cost model: $Q(t, c) = 2\{c\,\pi_0\,FN(t) + (1 - c)\,\pi_1\,FP(t)\}$
     - Positive instances: $QI(x, t, c) = 2c\,f_n(x, t)$
     - Negative instances: $QI(x, t, c) = 2(1 - c)\,f_p(x, t)$
     - Here $f_n(x, t)$ and $f_p(x, t)$ indicate whether $x$ is a false negative or a false positive at threshold $t$
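A minimal sketch of the per-instance cost, assuming the slide's conventions ($y = 0$ positive, $y = 1$ negative, predicted negative when $s > t$). The function name `instance_cost` is hypothetical and not taken from the presentation.

```python
def instance_cost(score, y, t, c):
    """QI(x, t, c) for a single instance.

    y = 0 is a positive instance, y = 1 a negative one; the instance is
    predicted negative when score > t. c is the cost proportion in [0, 1].
    """
    predicted_negative = score > t
    if y == 0:
        # Positive instance: costs 2c when it is a false negative.
        return 2 * c if predicted_negative else 0.0
    # Negative instance: costs 2(1 - c) when it is a false positive.
    return 2 * (1 - c) if not predicted_negative else 0.0


# Positive instance scored 0.54, threshold 0.5, equal costs (c = 0.5).
print(instance_cost(0.54, 0, 0.5, 0.5))
```

With equal costs ($c = 0.5$) every misclassified instance costs 1 and every correctly classified instance costs 0.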

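  7. Instance cost curves: score-driven threshold
     - The threshold is set equal to the cost proportion: $t = T(c) = c$
     - $QI(x, T(c), c) = 2c\,f_n(x, t)$
     - Example with $c = 0.4$ (higher cost for false positives), so $t = 0.4$; x marks the positive instance with $s = 0.54$:

       s      ŷ    y
       0.92   1    1
       0.71   1    0
       0.54   1    0    (x)
       0.36   0    0
       0.21   0    1

     - For this instance, $c < 0.54 \Rightarrow f_n(x, t) = 1$
     - Figure: $QI$ plotted against $c \in [0, 1]$, with a break at $c = 0.54$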
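The score-driven cost curve for a positive instance can be traced point by point. This is a small sketch under the slide's definitions; `score_driven_cost` is a hypothetical helper name.

```python
def score_driven_cost(score, c):
    """Cost of a positive instance (y = 0) when the threshold is t = T(c) = c."""
    t = c
    false_negative = 1 if score > t else 0
    return 2 * c * false_negative


# The instance marked x on the slide has score 0.54.
s = 0.54
for c in (0.2, 0.4, 0.54, 0.8):
    print(c, score_driven_cost(s, c))
```

The cost grows as $2c$ while $c$ is below the instance's score and drops to zero once the threshold reaches the score.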

  8. Instance hardness: score-driven threshold
     - Instance cost curves (positive instances): $QI(x, T(c), c) = 2c\,f_n(x, t)$
     - Hardness is the area under the instance cost curve: $IH(x) = \int_0^s 2c\,dc = s^2$
     - $s^2 = (s - 0)^2 = (s - y)^2$
     - IH is the squared error
     - Figure: $QI$ plotted against $c \in [0, 1]$, rising to $2s$ at $c = s$
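As a numerical check of the closed form, the sketch below integrates the score-driven cost curve of a positive instance with a midpoint rule and compares the result with $s^2$. The function name and the step count are assumptions chosen for illustration.

```python
def score_driven_hardness(score, steps=100_000):
    """Approximate the integral over c in [0, 1] of 2c * f_n for a positive instance."""
    width = 1.0 / steps
    total = 0.0
    for i in range(steps):
        c = (i + 0.5) * width  # midpoint of the i-th slice
        if score > c:          # threshold t = c: false negative while c < score
            total += 2 * c * width
    return total


s = 0.54
approx = score_driven_hardness(s)
print(approx, s ** 2)
```

Both printed values should agree to several decimal places, which matches the slide's statement that the score-driven hardness of a positive instance is its squared error.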

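  9. Instance hardness: rate-driven threshold
     - The threshold is chosen to give a desired rate of positive predictions $R(t)$: $t = T(c) = R^{-1}(c)$
     - Example with $R = 0.80$ (80% of positive predictions):

       s      ŷ    y
       0.92   1    1
       0.71   0    0
       0.54   0    0    R(0.54) = 0.60
       0.36   0    0    R(0.36) = 0.40
       0.21   0    1

     - For the positive instance x with $s = 0.54$: $c \le 0.4 \Rightarrow f_n(x, t) = 1$ and $c \ge 0.6 \Rightarrow f_n(x, t) = 0$
     - Figure: $QI$ plotted against $c \in [0, 1]$, with marks at $c = 0.4$ and $c = 0.6$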
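The rate $R(t)$ and a simple inverse can be sketched as below. The inclusive comparison `s <= t` follows the decision rule (an instance is predicted positive when its score does not exceed the threshold) and reproduces the slide's values $R(0.54) = 0.60$ and $R(0.36) = 0.40$. The inversion rule in `rate_driven_threshold` is an assumption, since the slide does not say how $R^{-1}$ is computed between observed scores.

```python
def positive_rate(scores, t):
    """R(t): fraction of instances predicted positive, i.e. with score <= t."""
    return sum(1 for s in scores if s <= t) / len(scores)


def rate_driven_threshold(scores, c):
    """Assumed inverse of R: the smallest observed score t with R(t) >= c."""
    for t in sorted(scores):
        if positive_rate(scores, t) >= c:
            return t
    return max(scores)


scores = [0.92, 0.71, 0.54, 0.36, 0.21]
print(positive_rate(scores, 0.54))
print(positive_rate(scores, 0.36))
print(rate_driven_threshold(scores, 0.8))
```

For $c = 0.8$ the selected threshold is 0.71, so only the instance scored 0.92 is predicted negative, as in the slide's table.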

  10. Instance hardness: rate-driven threshold
     - Instance cost curves (positive instances): $QI(x, t, c) = 2c\,f_n(x, t)$
     - $c \ge R(s) \Rightarrow f_n(x, t) = 0$
     - For a dataset of $n$ instances: $IH(x) = R(s)^2 - \frac{R(s)}{n} + \frac{1}{3n^2}$
     - For large $n$: $IH(x) \approx R(s)^2$
     - IH is the squared positive rate
     - Figure: $QI$ plotted against $c \in [0, 1]$, with marks at $R(s_{l-1})$ and $R(s)$
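The sketch below integrates a rate-driven cost curve numerically and compares it with $R(s)^2$. It relies on assumptions not stated on the slide: scores are distinct, and for each $c$ the threshold is the smallest observed score $t$ with $R(t) \ge c$ (a step function, with no interpolation between consecutive scores). The exact finite-sample value depends on that choice, but the result approaches $R(s)^2$ as $n$ grows. The function name is hypothetical.

```python
def rate_driven_hardness(scores, score, steps=20_000):
    """Integrate 2c * f_n over c in [0, 1] for a positive instance.

    Assumes distinct scores and a step-function threshold: for each c the
    threshold is the smallest observed score t with R(t) >= c.
    """
    ordered = sorted(scores)
    n = len(ordered)
    width = 1.0 / steps
    total = 0.0
    for i in range(steps):
        c = (i + 0.5) * width           # midpoint of the i-th slice
        k = min(n - 1, int(c * n))      # index of the smallest score with R >= c
        if score > ordered[k]:          # predicted negative: false negative
            total += 2 * c * width
    return total


small = [0.92, 0.71, 0.54, 0.36, 0.21]
print(rate_driven_hardness(small, 0.54))       # n = 5, R(s) = 0.6

large = [(k + 1) / 1000 for k in range(1000)]  # n = 1000 evenly spaced scores
s = large[599]                                 # R(s) = 0.6
print(rate_driven_hardness(large, s), 0.6 ** 2)
```

With five instances the step-function value is far from $0.36$, while with a thousand instances it is already close, which illustrates why the squared positive rate is a large-sample reading of the hardness.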

  11. Illustrative example

       x      m1(x)   y
       x1     0.92    1
       x2     0.71    1
       x3     0.34    0
       x4     0.31    1
       x5     0.23    1
       x6     0.20    1
       x7     0.15    0
       x8     0.13    0
       x9     0.11    1
       x10    0.05    0

     - For x3: $IH_{SD} = (0 - 0.34)^2 = (0.34)^2$ and $IH_{RD} = (0.7)^2$
     - Well-calibrated score but poor rank
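The score-driven hardness of every instance in this table can be computed as a squared error. The slides state this result for positive instances; for negative instances the same form, $(s - 1)^2$, follows from integrating $2(1 - c)$ over $c \ge s$, which is a derivation added here rather than something shown in the slides. Variable and function names are hypothetical.

```python
# (name, score m1(x), label y) rows from the illustrative example.
table = [
    ("x1", 0.92, 1), ("x2", 0.71, 1), ("x3", 0.34, 0), ("x4", 0.31, 1),
    ("x5", 0.23, 1), ("x6", 0.20, 1), ("x7", 0.15, 0), ("x8", 0.13, 0),
    ("x9", 0.11, 1), ("x10", 0.05, 0),
]


def squared_error_hardness(score, y):
    """Score-driven hardness as the squared error between score and label."""
    return (score - y) ** 2


hardness = {name: squared_error_hardness(s, y) for name, s, y in table}
for name, value in hardness.items():
    print(name, round(value, 4))
```

Instances whose score is far from their label, such as the negative instance x9 with score 0.11, come out as the hardest under this measure.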

  12. Ensemble instance hardness
     - Average the cost curves and instance hardness over a pool $L$ of learning models:
       - $QI(x, t, c) = \frac{1}{|L|} \sum_{j=1}^{|L|} QI_j(x, t, c)$
       - $IH(x) = \frac{1}{|L|} \sum_{j=1}^{|L|} IH_j(x)$
     - Strong assumption: all learning models are equally probable and reliable
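Averaging hardness over a pool of models can be sketched as follows. The example uses the score-driven (squared error) hardness for each model; the three scores are hypothetical values for one positive instance and do not come from the presentation.

```python
def ensemble_hardness(model_scores, y):
    """Mean of the per-model squared-error hardness for one instance.

    model_scores holds the score each model in the pool assigns to the
    instance; every model gets equal weight, as the slide assumes.
    """
    per_model = [(s - y) ** 2 for s in model_scores]
    return sum(per_model) / len(per_model)


# Hypothetical scores from three models for one positive instance (y = 0).
hypothetical_scores = [0.34, 0.10, 0.60]
print(ensemble_hardness(hypothetical_scores, 0))
```

Equal weighting mirrors the slide's strong assumption; a weighted mean would be the natural change if some models were considered more reliable than others.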

  13. Illustrative example: ensemble hardness
     - Figure: ensemble instance cost curves for the positive instances (curves labeled x5 to x10); panels: Score-Driven, Rate-Driven

  14. Illustrative example: class hardness
     - Figure: cost curves for the Positive Class and the Negative Class; panels: Score-Driven, Rate-Driven

  15. Conclusion
     - Instance hardness measures and cost curves considering different scenarios
     - Other threshold choice methods
       - Probabilistic methods (rate-uniform and score-uniform), rate-fixed and score-fixed
     - Future work
       - Integrate instance hardness into classification methods (ensemble learning)
       - Empirical and meta-learning studies

  16. Cost-Sensitive Measures of Instance Hardness: Questions?
