Calibrated Surrogate Maximization of Linear-fractional Utility (PowerPoint PPT Presentation)



SLIDE 1

Calibrated Surrogate Maximization of Linear-fractional Utility

7th Feb.

Han Bao (The University of Tokyo / RIKEN AIP)

SLIDE 2

■ Our focus: binary classification

Is accuracy appropriate?

[Figure: two toy sets of positive and negative points, each classified with accuracy 0.8]

May cause severe issues! (e.g. in medical diagnosis)

SLIDE 3

Is accuracy appropriate?

[Figure: the same two classifiers, both with accuracy 0.8, but F-measure 0.75 vs. 0]

$\mathrm{TP} = \mathbb{E}_{X,Y=+1}[\mathbf{1}\{f(X)>0\}]$
$\mathrm{TN} = \mathbb{E}_{X,Y=-1}[\mathbf{1}\{f(X)<0\}]$
$\mathrm{FP} = \mathbb{E}_{X,Y=-1}[\mathbf{1}\{f(X)>0\}]$
$\mathrm{FN} = \mathbb{E}_{X,Y=+1}[\mathbf{1}\{f(X)<0\}]$

F-measure: $F_1 = \dfrac{2\,\mathrm{TP}}{2\,\mathrm{TP} + \mathrm{FP} + \mathrm{FN}}$
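To make the gap concrete, here is a minimal numeric sketch in Python. The exact counts behind the slide's figure are not recoverable, so the 5-positive/5-negative and 2-positive/8-negative splits below are assumptions chosen to reproduce the stated scores (accuracy 0.8 in both cases, F-measure 0.75 vs. 0):

```python
def accuracy(tp, tn, fp, fn):
    return (tp + tn) / (tp + tn + fp + fn)

def f1(tp, fp, fn):
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom > 0 else 0.0

# Balanced data (5 pos / 5 neg): 3 positives found, all negatives correct.
print(accuracy(tp=3, tn=5, fp=0, fn=2), f1(tp=3, fp=0, fn=2))  # 0.8 0.75
# Imbalanced data (2 pos / 8 neg): predict everything negative.
print(accuracy(tp=0, tn=8, fp=0, fn=2), f1(tp=0, fp=0, fn=2))  # 0.8 0.0
```

The second classifier never finds a positive case yet scores the same accuracy, which is exactly the medical-diagnosis failure mode the slide warns about.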

SLIDE 4

Training and Evaluation

■ Usual empirical risk minimization (ERM)

▶ training: minimize the 0/1-error; evaluation: $\mathrm{Acc} = \mathrm{TP} + \mathrm{TN} = 1 - (\text{0/1-risk})$ → compatible

▶ training: minimize the 0/1-error; evaluation: $F_1 = \dfrac{2\,\mathrm{TP}}{2\,\mathrm{TP} + \mathrm{FP} + \mathrm{FN}}$ → incompatible

■ Training with accuracy but evaluating with F1 is mismatched; we want a training objective compatible with evaluating $F_1$.

■ Why not direct optimization of the evaluation metric?

SLIDE 5

$\mathrm{Acc} = \mathrm{TP} + \mathrm{TN}$ (Accuracy)

$F_1 = \dfrac{2\,\mathrm{TP}}{2\,\mathrm{TP} + \mathrm{FP} + \mathrm{FN}}$ (F-measure)

$\mathrm{Jac} = \dfrac{\mathrm{TP}}{\mathrm{TP} + \mathrm{FP} + \mathrm{FN}}$ (Jaccard index)

$\mathrm{WAcc} = \dfrac{w_1\,\mathrm{TP} + w_2\,\mathrm{TN}}{w_1\,\mathrm{TP} + w_2\,\mathrm{TN} + w_3\,\mathrm{FP} + w_4\,\mathrm{FN}}$ (Weighted Accuracy)

$\mathrm{BER} = \dfrac{1}{\pi}\,\mathrm{FN} + \dfrac{1}{1-\pi}\,\mathrm{FP}$ (Balanced Error Rate)

$\mathrm{GL} = \dfrac{\mathrm{TP} + \mathrm{TN}}{\mathrm{TP} + \alpha(\mathrm{FP} + \mathrm{FN}) + \mathrm{TN}}$ (Gower-Legendre index)

$\mathrm{FM} = \dfrac{\mathrm{TP}}{\sqrt{\pi\,(\mathrm{TP} + \mathrm{FP})}}$ (Fowlkes-Mallows index)

$\mathrm{MCC} = \dfrac{\mathrm{TP}\cdot\mathrm{TN} - \mathrm{FP}\cdot\mathrm{FN}}{\sqrt{\pi(1-\pi)(\mathrm{TP}+\mathrm{FP})(\mathrm{TN}+\mathrm{FN})}}$ (Matthews Correlation Coefficient)

Wanna unify!!

SLIDE 6

Unification of Metrics

Actual metrics, e.g. $F_1 = \dfrac{2\,\mathrm{TP}}{2\,\mathrm{TP} + \mathrm{FP} + \mathrm{FN}}$ and $\mathrm{Jac} = \dfrac{\mathrm{TP}}{\mathrm{TP} + \mathrm{FP} + \mathrm{FN}}$, fit a common linear-fractional form:

$U(f) = \dfrac{a_0\,\mathrm{TP} + b_0\,\mathrm{FP} + c_0}{a_1\,\mathrm{TP} + b_1\,\mathrm{FP} + c_1}$

Note: $\mathrm{TN} = \mathbb{P}(Y=-1) - \mathrm{FP}$ and $\mathrm{FN} = \mathbb{P}(Y=+1) - \mathrm{TP}$, so TN and FN can be eliminated; $a_k, b_k, c_k$ are constants.
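As a sketch of this unification: eliminating FN via $\mathrm{FN} = \pi - \mathrm{TP}$ (with $\pi = \mathbb{P}(Y=+1)$) gives $F_1 = 2\,\mathrm{TP}/(\mathrm{TP} + \mathrm{FP} + \pi)$ and $\mathrm{Jac} = \mathrm{TP}/(\mathrm{FP} + \pi)$. The Python check below uses arbitrary example rates; the helper `linear_fractional` is ours, not the talk's notation:

```python
def linear_fractional(tp, fp, coeffs):
    (a0, b0, c0), (a1, b1, c1) = coeffs
    return (a0 * tp + b0 * fp + c0) / (a1 * tp + b1 * fp + c1)

pi = 0.3                       # class prior P(Y = +1); arbitrary test value
tp, fp = 0.2, 0.1              # example rates (tp <= pi)
fn = pi - tp

f1_coeffs  = ((2, 0, 0), (1, 1, pi))   # F1  = 2TP / (TP + FP + pi)
jac_coeffs = ((1, 0, 0), (0, 1, pi))   # Jac = TP / (FP + pi)

print(linear_fractional(tp, fp, f1_coeffs),  2 * tp / (2 * tp + fp + fn))  # both 0.666...
print(linear_fractional(tp, fp, jac_coeffs), tp / (tp + fp + fn))          # both 0.5
```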

SLIDE 7

Unification of Metrics

■ TP, FP = expectations of the 0/1-loss
▶ e.g. $\mathrm{TP} = \mathbb{P}(Y=+1,\, f(X)>0) = \mathbb{E}_{X,Y=+1}[\mathbf{1}\{f(X)>0\}]$

$U(f) = \dfrac{a_0\,\mathbb{E}_P[\mathbf{1}\{f(X)>0\}] + b_0\,\mathbb{E}_N[\mathbf{1}\{f(X)>0\}] + c_0}{a_1\,\mathbb{E}_P[\mathbf{1}\{f(X)>0\}] + b_1\,\mathbb{E}_N[\mathbf{1}\{f(X)>0\}] + c_1} =: \dfrac{\mathbb{E}_X[W_0(f(X))]}{\mathbb{E}_X[W_1(f(X))]}$

SLIDE 8

Goal of This Talk

Given a metric (utility) $U(f) = \dfrac{a_0\,\mathrm{TP} + b_0\,\mathrm{FP} + c_0}{a_1\,\mathrm{TP} + b_1\,\mathrm{FP} + c_1}$ and a labeled sample $\{(x_i, y_i)\}_{i=1}^n \overset{\text{i.i.d.}}{\sim} \mathbb{P}$, find a classifier $f: \mathcal{X} \to \mathbb{R}$ such that $U(f) = \sup_{f'} U(f')$.

▶ Q. How to optimize $U(f)$ directly?
▶ without estimating the class-posterior probability

SLIDE 9

Outline

■ Introduction
■ Preliminary
▶ Convex Risk Minimization
▶ Plug-in Principle vs. Cost-sensitive Learning
■ Key Idea
▶ Quasi-concave Surrogate
■ Calibration Analysis & Experiments

SLIDE 10

Formulation of Classification

■ Goal of classification: maximize accuracy = minimize the mis-classification rate

$\hat{R}(f) = \dfrac{1}{n}\sum_{i=1}^n \mathbf{1}[y_i \neq \mathrm{sign}(f(x_i))] = \dfrac{1}{n}\sum_{i=1}^n \ell(y_i f(x_i))$

■ (Empirical) surrogate risk: make the 0/1-loss smoother

$\hat{R}_\phi(f) = \dfrac{1}{n}\sum_{i=1}^n \phi(y_i f(x_i))$

[Figure: 0/1, logistic, and hinge losses plotted against the margin $m = y_i f(x_i)$; $m > 0$ means classified correctly, $m < 0$ incorrectly]

Examples of $\phi$ (convex in $f$!):
▶ logistic loss
▶ hinge loss ⇒ SVM
▶ exponential loss ⇒ AdaBoost
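A small self-contained sketch of these definitions (the toy data and linear model are illustrative assumptions, not the talk's setup):

```python
import numpy as np

# Common convex surrogates, as functions of the margin m = y * f(x).
def zero_one(m):    return (m <= 0).astype(float)
def logistic(m):    return np.log1p(np.exp(-m))
def hinge(m):       return np.maximum(0.0, 1.0 - m)    # -> SVM
def exponential(m): return np.exp(-m)                  # -> AdaBoost

def empirical_surrogate_risk(phi, f, X, y):
    """(1/n) * sum_i phi(y_i * f(x_i))."""
    return phi(y * f(X)).mean()

# Toy usage with a linear model f(x) = <w, x>.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
y = np.sign(X[:, 0] + 0.1 * rng.normal(size=100))
w = np.array([1.0, 0.0])
f = lambda X: X @ w
print(empirical_surrogate_risk(logistic, f, X, y))
```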

SLIDE 11

3 Actors in Risk Minimization

■ Minimize the classification risk (= 1 − Accuracy)

$R(f) = \mathbb{E}[\ell(Y f(X))]$, where $\ell$ is the 0/1-loss and $Y f(X)$ the prediction margin; the 0/1-loss represents whether $X$ is correctly classified by $f$.

■ A surrogate loss makes it tractable

$R_\phi(f) = \mathbb{E}[\phi(Y f(X))]$ (surrogate risk), where $\phi$ is a differentiable upper bound of the 0/1-loss.

■ Sample approximation (M-estimation)

$\hat{R}_\phi(f) = \dfrac{1}{n}\sum_{i=1}^n \phi(y_i f(x_i))$ (empirical (surrogate) risk): what we actually minimize.

[Figure: 0/1, logistic, and hinge losses against the margin $m = y_i f(x_i)$]

SLIDE 12

Convexity & Statistical Property

$R(f) = \mathbb{E}[\ell(Y f(X))]$ (intractable) → $R_\phi(f) = \mathbb{E}[\phi(Y f(X))]$ (tractable, convex) → $\hat{R}_\phi(f) = \dfrac{1}{n}\sum_{i=1}^n \phi(y_i f(x_i))$ (generalizes)

▶ Q. $\mathrm{argmin}\, R_\phi = \mathrm{argmin}\, R$?
▶ A. Yes, with a calibrated surrogate.

Theorem (informal) [Bartlett+ 2006]. Assume $\phi$ is convex. Then $\mathrm{argmin}_f R_\phi(f) = \mathrm{argmin}_f R(f)$ iff $\phi'(0) < 0$.

• P. L. Bartlett, M. I. Jordan, & J. D. McAuliffe. (2006). Convexity, classification, and risk bounds. Journal of the American Statistical Association, 101(473), 138-156.

SLIDE 13

Related Work: Plug-in Rule

■ Classifier based on the class-posterior probability [Koyejo+ NIPS2014; Yan+ ICML2018]

▶ Bayes-optimal classifier (accuracy): $\mathrm{sign}\big(\mathbb{P}(Y=+1 \mid x) - \tfrac{1}{2}\big)$
▶ Bayes-optimal classifier (general case): $\mathrm{sign}\big(\mathbb{P}(Y=+1 \mid x) - \delta^*\big)$
⇒ estimate $\mathbb{P}(Y=+1 \mid x)$ and $\delta^*$ independently

[Figure: thresholding the class-posterior $\mathbb{P}(Y=+1 \mid X)$ at $1/2$ vs. at a metric-dependent threshold $\delta^*$]

• O. O. Koyejo, N. Natarajan, P. K. Ravikumar, & I. S. Dhillon. Consistent binary classification with generalized performance metrics. In NIPS, 2014.
• B. Yan, O. Koyejo, K. Zhong, & P. Ravikumar. Binary classification with Karmic, threshold-quasi-concave metrics. In ICML, 2018.
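For contrast with the direct approach developed next, here is a hedged sketch of the plug-in recipe: fit any posterior estimator, then grid-search the threshold $\delta$ on the target metric. The synthetic data, the use of scikit-learn's `LogisticRegression`, and the helper `f1_at_threshold` are our illustrative assumptions, not the cited papers' algorithms:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def f1_at_threshold(eta, y, delta):
    # F1 of the classifier sign(eta(x) - delta), labels in {-1, +1}.
    pred = np.where(eta > delta, 1, -1)
    tp = np.sum((pred == 1) & (y == 1))
    fp = np.sum((pred == 1) & (y == -1))
    fn = np.sum((pred == -1) & (y == 1))
    return 2 * tp / (2 * tp + fp + fn) if tp > 0 else 0.0

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 2))
y = np.where(X[:, 0] + 0.5 * rng.normal(size=500) > 0.8, 1, -1)  # imbalanced labels

clf = LogisticRegression().fit(X, y)
eta = clf.predict_proba(X)[:, 1]                 # estimate of P(Y=+1|x)
deltas = np.linspace(0.05, 0.95, 19)
best = max(deltas, key=lambda d: f1_at_threshold(eta, y, d))
print("delta* =", best, "F1 =", f1_at_threshold(eta, y, best))
```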

SLIDE 14

Outline

■ Introduction
■ Preliminary
▶ Convex Risk Minimization
▶ Plug-in Principle vs. Cost-sensitive Learning
■ Key Idea
▶ Quasi-concave Surrogate
■ Calibration Analysis & Experiments

SLIDE 15

Convexity & Statistical Property

For the risk: $R(f) = \mathbb{E}[\ell(Y f(X))]$ (intractable) → calibration → $R_\phi(f) = \mathbb{E}[\phi(Y f(X))]$ (tractable, convex) → generalization → $\hat{R}_\phi(f) = \dfrac{1}{n}\sum_{i=1}^n \phi(y_i f(x_i))$

For the utility: $U(f) = \dfrac{\mathbb{E}_X[W_0(f(X))]}{\mathbb{E}_X[W_1(f(X))]}$ is intractable.

▶ Q. Can we design an objective that is ① tractable and ② calibrated, so that surrogate maximization recovers the maximizer of $U$, just as $\mathrm{argmin}\, R_\phi = \mathrm{argmin}\, R$ does for the risk?

SLIDE 16

Non-concave, but Quasi-concave

Idea: concave / convex = quasi-concave.

$\dfrac{f(x)}{g(x)}$ is quasi-concave if $f$ is concave, $g$ is convex, and $f(x) \geq 0$, $g(x) > 0$ for all $x$.

(proof) Show that the super-level set $\{x \mid f/g \geq \alpha\}$ is convex for all $\alpha \geq 0$: $\dfrac{f(x)}{g(x)} \geq \alpha \iff f(x) - \alpha g(x) \geq 0$, and $f - \alpha g$ is concave; the super-level set of a concave function is convex, so $\{x \mid f/g \geq \alpha\}$ is convex. ∎

■ quasi-concave ⊋ concave
■ super-level sets are convex ⇒ non-concave, but unimodal ⇒ efficiently optimized
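A quick numeric illustration of the proof idea, checking on a grid that the super-level sets of a concave/convex ratio are intervals (the particular $f$ and $g$ are arbitrary choices satisfying the assumptions):

```python
import numpy as np

x = np.linspace(-2.0, 2.0, 2001)
f = 4.0 - x**2            # concave, >= 0 on [-2, 2]
g = np.exp(x) + 1.0       # convex, > 0
h = f / g                 # should be quasi-concave by the claim above

for alpha in [0.2, 0.5, 1.0]:
    level = np.where(h >= alpha)[0]
    # A convex set on the line is an interval: grid indices are consecutive.
    is_interval = level.size == 0 or np.all(np.diff(level) == 1)
    print(f"alpha={alpha}: super-level set is an interval -> {is_interval}")
```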

SLIDE 17

Surrogate Utility

■ Idea: bound the true utility from below

$U(f) = \dfrac{a_0\,\mathbb{E}_P[\mathbf{1}\{f(X)>0\}] + b_0\,\mathbb{E}_N[\mathbf{1}\{f(X)>0\}] + c_0}{a_1\,\mathbb{E}_P[\mathbf{1}\{f(X)>0\}] + b_1\,\mathbb{E}_N[\mathbf{1}\{f(X)>0\}] + c_1}$

≥ bound the numerator from below and the denominator from above

▶ non-negative sum of concave functions ⇒ concave
▶ non-negative sum of convex functions ⇒ convex

SLIDE 18

Surrogate Utility

■ Idea: bound the true utility from below, replacing each 0/1 indicator with a surrogate loss $\phi$:

$U(f) \geq U_\phi(f) = \dfrac{a_0\,\mathbb{E}_P[1 - \phi(f(X))] + b_0\,\mathbb{E}_N[-\phi(-f(X))] + c_0}{a_1\,\mathbb{E}_P[1 + \phi(f(X))] + b_1\,\mathbb{E}_N[\phi(-f(X))] + c_1} =: \dfrac{\mathbb{E}[W_{0,\phi}]}{\mathbb{E}[W_{1,\phi}]}$

($U_\phi$: surrogate utility)
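A minimal empirical sketch of $U_\phi$, instantiated for F1 with the slide-6 coefficients $(a_0,b_0,c_0)=(2,0,0)$, $(a_1,b_1,c_1)=(1,1,\pi)$. The base-2 logistic loss is one choice of $\phi$ that upper-bounds the 0/1-loss (so $\phi(0) = 1$); the synthetic scores are illustrative assumptions:

```python
import numpy as np

def phi(m):
    # Base-2 logistic loss: an upper bound of the 0/1-loss with phi(0) = 1.
    return np.log2(1.0 + np.exp(-m))

def surrogate_utility(f_vals, y, coeffs):
    (a0, b0, c0), (a1, b1, c1) = coeffs
    n = len(y)
    P, N = f_vals[y == 1], f_vals[y == -1]
    # Sums are divided by n (not class sizes): TP and FP are joint,
    # not class-conditional, expectations (slide 7).
    num = a0 * np.sum(1.0 - phi(P)) / n + b0 * np.sum(-phi(-N)) / n + c0
    den = a1 * np.sum(1.0 + phi(P)) / n + b1 * np.sum(phi(-N)) / n + c1
    return num / den

rng = np.random.default_rng(0)
y = rng.choice([-1, 1], size=200, p=[0.7, 0.3])
f_vals = 2.0 * y + rng.normal(size=200)     # scores correlated with the labels
pi_hat = np.mean(y == 1)
print(surrogate_utility(f_vals, y, ((2, 0, 0), (1, 1, pi_hat))))
```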

SLIDE 19

Hybrid Optimization Strategy

■ Note: the numerator of $U_\phi$ can be negative
▶ $U_\phi$ isn't quasi-concave if the numerator < 0
▶ so maximize the numerator first (concave), then maximize the fractional form (quasi-concave)

SLIDE 20

Hybrid Optimization Strategy

① maximize the numerator ② maximize the fraction, using the normalized gradient for quasi-concave optimization [Hazan+ NeurIPS2015]

• Hazan, E., Levy, K., & Shalev-Shwartz, S. (2015). Beyond convexity: Stochastic quasi-convex optimization. In Advances in Neural Information Processing Systems (pp. 1594-1602).
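A sketch of the two phases on a toy concave/convex ratio (the objective, step sizes, and finite-difference gradients are illustrative assumptions; the talk's method would apply this to the surrogate utility of slide 18):

```python
import numpy as np

def num(w):  return 1.0 - np.sum((w - 1.0) ** 2)    # concave numerator
def den(w):  return 1.0 + np.sum(w ** 2)            # convex, positive denominator

def grad(fun, w, eps=1e-6):
    # Finite-difference gradient; enough for a toy demo.
    g = np.zeros_like(w)
    for i in range(len(w)):
        e = np.zeros_like(w)
        e[i] = eps
        g[i] = (fun(w + e) - fun(w - e)) / (2 * eps)
    return g

w = np.array([-2.0, -2.0])
while num(w) <= 0:                      # phase 1: concave ascent until numerator > 0
    w = w + 0.1 * grad(num, w)
for _ in range(500):                    # phase 2: normalized gradient ascent on the ratio
    g = grad(lambda v: num(v) / den(v), w)
    w = w + 0.05 * g / (np.linalg.norm(g) + 1e-12)
print(w, num(w) / den(w))
```

Normalizing the gradient is the key trick from Hazan et al. (2015): for quasi-concave objectives the gradient direction still points toward the super-level sets even where the magnitude is uninformative.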

SLIDE 21

Outline

■ Introduction
■ Preliminary
▶ Convex Risk Minimization
▶ Plug-in Principle vs. Cost-sensitive Learning
■ Key Idea
▶ Quasi-concave Surrogate
■ Calibration Analysis & Experiments

SLIDE 22

Justify Surrogate Optimization

■ For the classification risk (note: informal) [Bartlett+ 2006]: if $\phi$ is a classification-calibrated loss, then for every sequence $\{f_n\}$,

$R_\phi(f_n) - \inf_f R_\phi(f) \to 0 \implies R(f_n) - \inf_f R(f) \to 0$ as $n \to \infty$

■ For the fractional utility (note: informal), we want the analogous guarantee:

$\sup_f U_\phi(f) - U_\phi(f_n) \to 0 \implies \sup_f U(f) - U(f_n) \to 0$ as $n \to \infty$, for every $\{f_n\}$

▶ Q. What kind of conditions does $\phi$ need to satisfy for this to hold?

• P. L. Bartlett, M. I. Jordan, & J. D. McAuliffe. (2006). Convexity, classification, and risk bounds. Journal of the American Statistical Association, 101(473), 138-156.

SLIDE 23

Special Case: F1-measure

Theorem (note: informal). The calibration

$\sup_f U_\phi(f) - U_\phi(f_n) \to 0 \implies \sup_f U(f) - U(f_n) \to 0 \quad \forall \{f_n\}$

holds if $\phi$ satisfies:
▶ $\phi$ is convex
▶ $\phi$ is non-increasing
▶ $\exists c \in (0,1)$ s.t. $\sup_f U_\phi(f) \geq \dfrac{2c}{1-c}$ and $\lim_{m \to +0} \phi'(m) \geq c \lim_{m \to -0} \phi'(m)$ (a gradient discrepancy; such a $\phi$ is non-differentiable at $m=0$)

These conditions are merely sufficient!

■ Example: glue $\phi_{-1}(m) = \log(1 + e^{-m})$ and $\phi_{+1}(m) = \log(1 + e^{-cm})$, so that

$\lim_{m \to +0} \phi'(m) = -\dfrac{c}{2}, \qquad \lim_{m \to -0} \phi'(m) = -\dfrac{1}{2}$

[Figure: the surrogate loss $\phi(m)$, with a kink at $m = 0$]
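The loss used in the next slide's experiment, $\phi(m) = \max\{\log(1+e^{-m}), \log(1+e^{-m/3})\}$, instantiates this with $c = 1/3$. A quick numeric check of the one-sided slopes at $m = 0$:

```python
import numpy as np

def phi(m):
    # c = 1/3: logistic piece active on the left, flattened piece on the right.
    return np.maximum(np.log1p(np.exp(-m)), np.log1p(np.exp(-m / 3.0)))

eps = 1e-6
left  = (phi(0.0) - phi(-eps)) / eps    # one-sided slope from the left
right = (phi(eps) - phi(0.0)) / eps     # one-sided slope from the right
print(left, right)                      # ~ -0.5 and ~ -1/6 = -c/2
```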

SLIDE 24

Experiment: F1-measure

[Figure: F1-measure results]

surrogate loss: $\phi(m) = \max\{\log(1 + e^{-m}),\, \log(1 + e^{-m/3})\}$
model: linear-in-parameter

SLIDE 25

Experiment: Jaccard index

[Figure: Jaccard index results]

surrogate loss: $\phi(m) = \max\{\log(1 + e^{-m}),\, \log(1 + e^{-3m/4})\}$
model: linear-in-parameter

SLIDE 26

■ Goal: maximize a linear-fractional utility

$U(f) = \dfrac{a_0\,\mathrm{TP} + b_0\,\mathrm{FP} + c_0}{a_1\,\mathrm{TP} + b_1\,\mathrm{FP} + c_1}$

■ Tractable Optimization: a surrogate utility (concave numerator / convex denominator = quasi-concave) enables quasi-concave optimization

■ Calibrated Surrogate: if the loss $\phi$ satisfies ① a gradient discrepancy at $m = 0$, ② non-increasing, ③ convex, then

$\mathrm{argmax}_f\, U_\phi(f) = \mathrm{argmax}_f\, U(f)$

■ Open Problems
▶ necessary and sufficient condition of calibration
▶ explicit convergence rate
▶ theoretical comparison with probability estimation