THE PREDICTION ADVANTAGE: A UNIVERSALLY MEANINGFUL PERFORMANCE MEASURE
Ran El-Yaniv, Yonatan Geifman, Yair Wiener
THE PREDICTION ADVANTAGE - EL-YANIV, GEIFMAN, WIENER
OUTLINE
Introduction and motivation
The prediction advantage
Bayesian marginal prediction
PA for several loss functions
Related measures
Empirical results
Future research and open questions
Conclusion
INTRODUCTION
Consider an imbalanced problem: is 99% accuracy good enough when the minority class is only 0.5%?
Can 70% accuracy on a 3-class problem be compared to 70% accuracy on a 4-class problem?
Haberman - a dataset with a 26.4% minority class and reported results of 27%
We are looking for a universal measure that captures the complexity and the bias of the problem.
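A back-of-the-envelope sketch of the slide's scenario (hypothetical numbers), previewing the measure defined later in the deck:

```python
# Sketch with the slide's hypothetical numbers: minority class = 0.5%,
# so always predicting the majority class errs only 0.5% of the time.
minority_rate = 0.005   # 0/1 risk of the trivial majority-class baseline
model_risk = 0.01       # 0/1 risk of a model with "99% accuracy"

# Prediction advantage: 1 - risk(f) / risk(baseline)
pa = 1 - model_risk / minority_rate
print(pa)  # -1.0: the model is actually worse than the trivial baseline
```

A negative value makes explicit that "99% accuracy" carries no advantage here.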
MAIN IDEA
Let us quantify the performance advantage of the prediction function over the "random" function.
Challenges:
What is the "random classifier"?
How can we compare two classifiers? Which loss? Subtract? Divide?
Does it generalize to both regression and classification? To any loss function?
PREDICTION ADVANTAGE
PA_\ell(f) = 1 - \frac{R_\ell(f)}{R_\ell(f_0)} = 1 - \frac{\mathbb{E}_{X,Y}\left[\ell(f(X), Y)\right]}{\mathbb{E}_{X,Y}\left[\ell(f_0(X), Y)\right]}
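A minimal empirical sketch of this definition, assuming per-example losses for f and for the BMP f0 are given as arrays:

```python
import numpy as np

def prediction_advantage(loss_f, loss_f0):
    """Empirical PA: 1 - (mean loss of f) / (mean loss of the BMP f0)."""
    return 1.0 - np.mean(loss_f) / np.mean(loss_f0)

# Example: f halves the baseline's average loss, so PA = 0.5.
print(prediction_advantage([1.0, 3.0], [4.0, 4.0]))  # 0.5
```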
BAYESIAN MARGINAL PREDICTION (BMP)
The optimal prediction function with respect to the marginal distribution of Y.
The BMP predicts a constant value/class while being oblivious to X and P(Y|X).
We expect the BMP to capture only the complexity of the problem latent in P(Y).
THE BMP IS CONSTANT
Why is the BMP a constant?
Lemma: consider a general function g ∼ Q and a convex loss function ℓ. Intuitively, by Jensen's inequality the risk of any randomized X-oblivious prediction g ∼ Q is at least the risk of the constant prediction E_Q[g], so the optimal X-oblivious predictor can be taken to be constant.
PREDICTION ADVANTAGE - PROPERTIES
Order preservation - the PA induces a weak ordering of prediction functions, matching the ordering induced by the loss
Boundedness - the PA is bounded above by 1; PA = 1 is achieved only by the perfect predictor
Meaningfulness - PA = 0 when f has no advantage over the BMP
PA FOR CROSS ENTROPY LOSS

Multi-class problem with k classes
The BMP is the marginal probability of each class
Labels are given in one-hot representation

\ell(f(X), Y) = -\sum_{i \in C} \Pr\{Y = i\} \log\left(\Pr\{f(X) = i\}\right), \qquad f : \mathcal{X} \to \mathbb{R}^k, \qquad f_0(X)_i = \Pr\{Y = e_i\}
PA FOR CROSS ENTROPY LOSS - PROOF

Let us define an arbitrary distribution Q and f_Q(X) \sim Q. We calculate:

R_\ell(f_Q) = \mathbb{E}\,\ell(f_Q(X), Y) = \sum_{i \in C} \Pr\{Y = e_i\}\,\ell(f_Q(X), e_i) = \sum_{i \in C} -\Pr\{Y = e_i\} \log\left(f_{Q_i}(X)\right)

The BMP loss:

R_\ell(f_0) = \mathbb{E}\,\ell(f_0(X), Y) = \sum_{i \in C} \Pr\{Y = e_i\}\,\ell(f_0(X), e_i) = \sum_{i \in C} -\Pr\{Y = e_i\} \log\left(\Pr\{Y = e_i\}\right) = H(Y)
PA FOR CROSS ENTROPY LOSS - PROOF

R_\ell(f_Q) - R_\ell(f_0) = \sum_{i \in C} -\Pr\{Y = e_i\} \log\left(f_{Q_i}(X)\right) + \sum_{i \in C} \Pr\{Y = e_i\} \log\left(\Pr\{Y = e_i\}\right)
= \sum_{i \in C} \Pr\{Y = e_i\} \log\left(\frac{\Pr\{Y = e_i\}}{f_{Q_i}(X)}\right) = D_{\mathrm{KL}}\left(f_0(X) \,\|\, f_Q(X)\right) \ge 0

The BMP risk:
R_\ell(f_0) = H(P(Y))

The PA:
PA_\ell(f) = 1 - \frac{R_\ell(f)}{H(P(Y))}
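An empirical sketch of this result (assumes predicted class distributions `probs` and integer `labels`; H(Y) is estimated from the empirical label marginal):

```python
import numpy as np

def pa_cross_entropy(probs, labels):
    """PA under cross-entropy loss: 1 - (mean cross-entropy of f) / H(Y)."""
    probs, labels = np.asarray(probs, dtype=float), np.asarray(labels)
    n, k = probs.shape
    ce = -np.mean(np.log(probs[np.arange(n), labels]))  # empirical risk of f
    marginal = np.bincount(labels, minlength=k) / n     # empirical P(Y)
    nz = marginal[marginal > 0]
    h = -np.sum(nz * np.log(nz))                        # empirical H(Y) = BMP risk
    return 1.0 - ce / h

# Predicting the marginal itself gives PA = 0; perfect one-hot predictions give PA = 1.
labels = np.array([0, 1, 0, 1])
print(pa_cross_entropy(np.full((4, 2), 0.5), labels))  # 0.0
print(pa_cross_entropy(np.eye(2)[labels], labels))     # 1.0
```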
PA FOR 0/1 LOSS

The BMP:
f_0 = \arg\max_{i \in C} \Pr\{Y = i\} (call this class j)

The BMP risk:
R_{\ell_{0\text{-}1}}(f_0) = 1 - \max_{i \in C} \Pr\{Y = i\} = 1 - \Pr\{Y = j\}

The PA:
PA_\ell(f) = 1 - \frac{R_\ell(f)}{1 - \max_{i \in C} \Pr\{Y = i\}}
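A sketch of the empirical version, with the marginal P(Y) estimated from the test labels:

```python
import numpy as np

def pa_zero_one(y_true, y_pred):
    """PA under 0/1 loss: 1 - err(f) / (1 - max_i Pr{Y = i})."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    err = np.mean(y_true != y_pred)
    baseline_err = 1.0 - np.bincount(y_true).max() / len(y_true)  # BMP risk
    return 1.0 - err / baseline_err

# Always predicting the majority class scores 75% accuracy here, yet PA = 0.
print(pa_zero_one([0, 0, 0, 1], [0, 0, 0, 0]))  # 0.0
```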
PA FOR SQUARED LOSS

The BMP:
f_0 = \mathbb{E}[Y]

The BMP risk:
R_\ell(f_0) = \mathbb{E}_Y\left[(Y - f_0)^2\right] = \mathbb{E}_Y\left[(Y - \mathbb{E}[Y])^2\right] = \mathrm{Var}(Y)

The PA:
PA_\ell(f) = 1 - \frac{R_\ell(f)}{R_\ell(f_0)} = 1 - \frac{R_\ell(f)}{\mathrm{Var}(Y)}
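A sketch of the empirical version; note that 1 - MSE/Var(Y) coincides in form with the familiar R² statistic:

```python
import numpy as np

def pa_squared(y_true, y_pred):
    """PA under squared loss: 1 - MSE / Var(Y)."""
    y_true = np.asarray(y_true, dtype=float)
    mse = np.mean((y_true - np.asarray(y_pred, dtype=float)) ** 2)
    return 1.0 - mse / np.var(y_true)  # np.var(y_true) is the BMP risk

y = np.array([1.0, 2.0, 3.0, 4.0])
print(pa_squared(y, y))                     # 1.0: perfect predictor
print(pa_squared(y, np.full(4, y.mean())))  # 0.0: predicting E[Y], the BMP
```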
PA FOR ABSOLUTE LOSS

The BMP:
f_0 = \mathrm{median}(Y)

The BMP risk:
R_\ell(f_0) = \mathbb{E}_Y\left[\left|Y - \mathrm{median}(Y)\right|\right] = D_{\mathrm{med}}

The PA:
PA_\ell(f) = 1 - \frac{R_\ell(f)}{R_\ell(f_0)} = 1 - \frac{R_\ell(f)}{D_{\mathrm{med}}}
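A sketch of the empirical version, with the median and D_med estimated from the sample:

```python
import numpy as np

def pa_absolute(y_true, y_pred):
    """PA under absolute loss: 1 - MAE / E|Y - median(Y)|."""
    y_true = np.asarray(y_true, dtype=float)
    mae = np.mean(np.abs(y_true - np.asarray(y_pred, dtype=float)))
    d_med = np.mean(np.abs(y_true - np.median(y_true)))  # BMP risk D_med
    return 1.0 - mae / d_med

y = np.array([0.0, 1.0, 2.0, 9.0])               # skewed: median 1.5, mean 3.0
print(pa_absolute(y, y))                         # 1.0: perfect predictor
print(pa_absolute(y, np.full(4, np.median(y))))  # 0.0: the BMP itself
```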
RELATION TO OTHER MEASURES
Some measures are defined by two numbers (e.g., precision and recall); we seek a single number
We compare the PA to the F-score, Cohen's kappa, and balanced accuracy
The PA lower-bounds all of these measures
EMPIRICAL RESULTS
We compared several relevant performance measures at different noise and imbalance levels on the breast cancer dataset
Measures:
Balanced accuracy - (TPR + TNR) / 2
F-measure - the harmonic mean of precision and recall
Cohen's kappa - an inter-rater agreement measure
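A self-contained sketch of the three comparison measures for binary labels (pure NumPy; assumes a non-degenerate confusion matrix so no denominator vanishes):

```python
import numpy as np

def comparison_measures(y_true, y_pred):
    """Balanced accuracy, F-measure, and Cohen's kappa for binary labels."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    n = len(y_true)
    tp = np.sum((y_true == 1) & (y_pred == 1))
    tn = np.sum((y_true == 0) & (y_pred == 0))
    fp = np.sum((y_true == 0) & (y_pred == 1))
    fn = np.sum((y_true == 1) & (y_pred == 0))
    tpr, tnr = tp / (tp + fn), tn / (tn + fp)
    balanced_acc = (tpr + tnr) / 2
    precision = tp / (tp + fp)
    f1 = 2 * precision * tpr / (precision + tpr)
    p_o = (tp + tn) / n                                            # observed agreement
    p_e = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / n**2   # chance agreement
    kappa = (p_o - p_e) / (1 - p_e)
    return balanced_acc, f1, kappa

bal, f1, kappa = comparison_measures([1, 1, 0, 0], [1, 0, 0, 0])
print(bal, f1, kappa)  # balanced accuracy 0.75, F-measure ~0.667, kappa 0.5
```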
EMPIRICAL RESULTS

[Figures: the measures above plotted across noise and imbalance levels on the breast cancer dataset]
PA AND SELECTIVE PREDICTION
In selective prediction, every coverage rate induces a different P(Y)
Risk-coverage curves are therefore misleading
We argue that in this case the objective has to be the PA, and we should measure the PA-coverage curve
It is still not clear how to construct a rejection mechanism that optimizes the PA
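One way to make the PA-coverage curve concrete (a sketch; `confidence` is an assumed per-example score used to rank rejections, and the 0/1-loss PA is recomputed on each accepted subset with its own empirical P(Y)):

```python
import numpy as np

def pa_coverage_curve(y_true, y_pred, confidence, coverages):
    """0/1-loss PA at each coverage; the BMP baseline is re-estimated
    from the labels of the accepted subset, since P(Y) changes with coverage."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    order = np.argsort(-np.asarray(confidence))  # most confident first
    curve = []
    for c in coverages:
        k = max(1, int(round(c * len(y_true))))
        yt, yp = y_true[order[:k]], y_pred[order[:k]]
        err = np.mean(yt != yp)
        base = 1.0 - np.bincount(yt).max() / k   # BMP risk on the subset
        if base == 0:                            # pure subset: BMP already perfect
            curve.append(1.0 if err == 0 else float("-inf"))
        else:
            curve.append(1.0 - err / base)
    return curve

curve = pa_coverage_curve(
    np.array([0, 0, 1, 1]), np.array([0, 0, 1, 0]),
    np.array([0.9, 0.8, 0.7, 0.1]), [0.5, 1.0])
print(curve)  # top half already perfect; full coverage has PA = 0.5
```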
CONCLUSION AND FUTURE WORK
We presented a universal performance measure
It is still not clear how best to estimate some of the required quantities (entropy, median, etc.)
Can the PA be used as an optimization objective? Where is it needed? How can it be optimized? (It is non-convex.)