Calibrated Surrogate Losses for Adversarially Robust Classification



SLIDE 1

Calibrated Surrogate Losses for Adversarially Robust Classification

Han Bao 1,2   Clayton Scott 3   Masashi Sugiyama 2,1

1 The University of Tokyo  2 RIKEN AIP  3 University of Michigan

  • Jul. 9th - 12th @ COLT 2020

SLIDE 2

Adversarial Attacks


Adding imperceptible small noise can fool classifiers!

Goodfellow, I. J., Shlens, J., & Szegedy, C. (2015). Explaining and harnessing adversarial examples. In ICLR, 2015.

[Goodfellow+ 2015]

  • (Figure: original data + small perturbation → perturbed data)

SLIDE 3

Penalize Vulnerable Prediction


Usual Classification vs. Robust Classification

usual 0-1 loss (no penalty near the decision boundary):

ℓ01(x, y, f) = { 1 if yf(x) ≤ 0; 0 otherwise }

robust 0-1 loss (a prediction too close to the boundary is penalized):

ℓγ(x, y, f) = { 1 if ∃Δ ∈ 𝔹2(γ) . yf(x + Δ) ≤ 0; 0 otherwise }

where 𝔹2(γ) = {x ∈ ℝd ∣ ∥x∥2 ≤ γ} is the γ-ball.

SLIDE 4

In Case of Linear Predictors


For linear predictors ℱlin = {x ↦ θ⊤x ∣ ∥θ∥2 = 1}, the margin of x is θ⊤x: no penalty if θ⊤x > γ, penalized if θ⊤x ≤ γ (for a positive example). The robust 0-1 loss

ℓγ(x, y, f) = { 1 if ∃Δ ∈ 𝔹2(γ) . yf(x + Δ) ≤ 0; 0 otherwise }

then reduces to a margin condition:

ℓγ(x, y, f) = 1{yf(x) ≤ γ} =: ϕγ(yf(x))
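The reduction above can be checked numerically. The sketch below (our own illustration, not from the slides) compares the margin form 1{yf(x) ≤ γ} against the explicit worst case over the γ-ball, using the fact that for a unit-norm linear predictor the worst perturbation shifts the margin by exactly −γ:

```python
import numpy as np

# Sketch: for a unit-norm linear predictor f(x) = theta @ x, the worst-case
# perturbation inside the gamma-ball lowers the margin y*theta@x by gamma,
# so the robust 0-1 loss reduces to the margin condition 1{y f(x) <= gamma}.

def robust_01_margin(theta, x, y, gamma):
    """Closed form: 1{y * theta^T x <= gamma}."""
    return float(y * theta @ x <= gamma)

def robust_01_worst_case(theta, x, y, gamma):
    """Explicit worst case: min over Delta in the gamma-ball of
    y * theta^T (x + Delta) equals y * theta^T x - gamma * ||theta||_2."""
    worst_margin = y * theta @ x - gamma * np.linalg.norm(theta)
    return float(worst_margin <= 0)

rng = np.random.default_rng(0)
for _ in range(1000):
    theta = rng.normal(size=3)
    theta /= np.linalg.norm(theta)   # enforce ||theta||_2 = 1
    x = rng.normal(size=3)
    y = rng.choice([-1, 1])
    gamma = rng.uniform(0, 1)
    assert robust_01_margin(theta, x, y, gamma) == robust_01_worst_case(theta, x, y, gamma)
print("margin form matches worst-case form")
```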

SLIDE 5

Formulation of Classification


Usual Classification: minimize the 0-1 risk. Robust Classification: minimize the γ-robust 0-1 risk (restricted to linear predictors).

Rϕ01(f) = 𝔼[ϕ01(Yf(X))],  Rϕγ(f) = 𝔼[ϕγ(Yf(X))]

0-1 loss: ϕ01(α) = 1{α ≤ 0}  (wrong for α ≤ 0, correct for α > 0)
robust 0-1 loss: ϕγ(α) = 1{α ≤ γ}  (also penalizes the non-robust region 0 < α ≤ γ)

Neither ϕ01 nor ϕγ is easy to optimize!

SLIDE 6

What surrogate is desirable?

Surrogate loss ϕ: easily optimizable. Target loss ψ (e.g., the 0-1 loss ϕ01): the final learning criterion.

Calibrated surrogate: along a sequence fm → f∞, convergence of the surrogate risk Rϕ(fm) → R*ϕ implies convergence of the target risk Rψ(fm) → R*ψ.

SLIDE 7

What surrogate is calibrated?


Usual Classification: target = 0-1 loss ϕ01. A surrogate ϕ that is convex with ϕ′(0) < 0 is calibrated [Bartlett+ 2006].

  • P. L. Bartlett, M. I. Jordan, & J. D. McAuliffe. (2006). Convexity, classification, and risk bounds. Journal of the American Statistical Association, 101(473), 138-156.

Robust Classification: target = robust 0-1 loss ϕγ (with its extra non-robust region). Which surrogates ϕ are calibrated?

SLIDE 8

Short Course on Calibration Analysis

- how to analyze the calibration property of a loss -

Ingo Steinwart. How to compare different loss functions and their risks. Constructive Approximation, 2007.

SLIDE 9

Conditional Risk and Calibration


Conditional Risk = Risk at a single x

Rϕ(f) = 𝔼X [ ℙ(Y = +1|X) ϕ(f(X)) + ℙ(Y = −1|X) ϕ(−f(X)) ]

Cϕ(α, η) := ηϕ(α) + (1 − η)ϕ(−α), where η := ℙ(Y = +1|X) (class prob.) and α := f(X) (prediction).

Definition. ϕ is (ψ, ℱ)-calibrated for a target loss ψ if for any ε > 0, there exists δ > 0 such that for all α ∈ Aℱ and η ∈ [0, 1],

Cϕ(α, η) − C*ϕ,ℱ(η) < δ  ⟹  Cψ(α, η) − C*ψ,ℱ(η) < ε

(small surrogate excess conditional risk ⟹ small target excess conditional risk), where Aℱ := {f(x) ∣ f ∈ ℱ, x ∈ 𝒳}.
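As a concrete instance of the conditional risk, the sketch below (our own illustration, not from the slides) evaluates Cϕ(α, η) for the 0-1 loss and recovers the minimal value min(η, 1 − η), attained by predicting the majority class:

```python
import numpy as np

# Conditional risk of the 0-1 loss phi01(alpha) = 1{alpha <= 0}.
# Its minimum over alpha is C*(eta) = min(eta, 1 - eta).

def phi01(alpha):
    return float(alpha <= 0)

def conditional_risk(phi, alpha, eta):
    """C_phi(alpha, eta) = eta * phi(alpha) + (1 - eta) * phi(-alpha)."""
    return eta * phi(alpha) + (1 - eta) * phi(-alpha)

eta = 0.7
alphas = np.linspace(-1, 1, 201)
risks = [conditional_risk(phi01, a, eta) for a in alphas]
# Predicting the majority class (alpha > 0 since eta > 1/2) attains
# min(eta, 1 - eta) = 0.3.
print(round(min(risks), 6))   # -> 0.3
```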

SLIDE 10

Main Tool: Calibration Function

  • Definition. (calibration function)

δ(ε) = inf over η ∈ [0,1] and α ∈ Aℱ of  Cϕ(α, η) − C*ϕ,ℱ(η)  s.t.  Cψ(α, η) − C*ψ,ℱ(η) ≥ ε

i.e., minimize the surrogate excess conditional risk subject to the target excess conditional risk being at least ε, where Aℱ := {f(x) ∣ f ∈ ℱ, x ∈ 𝒳}.

■ Provides an iff condition ▶ ϕ is (ψ, ℱ)-calibrated ⟺ δ(ε) > 0 for all ε > 0

■ Provides an excess risk bound ▶ ϕ is (ψ, ℱ)-calibrated ⟹ Rψ(f) − R*ψ ≤ (δ**)−1( Rϕ(f) − R*ϕ )

(target excess risk bounded by a monotonically increasing function of the surrogate excess risk, where δ** is the biconjugate of δ)

SLIDE 11

Example: Binary Classification (ϕ01)

  • Theorem. If a surrogate ϕ is convex, it is (ϕ01, ℱall)-calibrated iff
▶ ϕ is differentiable at 0
▶ ϕ′(0) < 0
(ℱall: all measurable functions)

hinge loss ϕ(α) = [1 − α]+ :  δ(ε) = ε
squared loss ϕ(α) = (1 − α)² :  δ(ε) = ε²

  • P. L. Bartlett, M. I. Jordan, & J. D. McAuliffe. (2006).

Convexity, classification, and risk bounds. Journal of the American Statistical Association, 101(473), 138-156.

[Bartlett+ 2006]
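The two calibration functions quoted above can be estimated numerically from the definition. The sketch below (a rough grid computation of our own, not the paper's code) minimizes the surrogate excess conditional risk subject to the 0-1 excess conditional risk being at least ε:

```python
import numpy as np

# Grid estimate of delta(eps) = inf surrogate excess s.t. 0-1 excess >= eps.
# Expect delta(eps) close to eps for the hinge loss and eps**2 for squared loss.

def estimate_delta(phi, eps, etas, alphas):
    best = np.inf
    for eta in etas:
        # conditional risks C_phi(alpha, eta) on the alpha grid
        c = eta * phi(alphas) + (1 - eta) * phi(-alphas)
        excess_surrogate = c - c.min()
        # 0-1 conditional excess risk (predicting class -1 when alpha <= 0)
        excess_target = np.where(alphas <= 0, eta, 1 - eta) - min(eta, 1 - eta)
        feasible = excess_target >= eps - 1e-9   # tolerance for float round-off
        if feasible.any():
            best = min(best, excess_surrogate[feasible].min())
    return best

hinge = lambda a: np.maximum(1 - a, 0.0)
squared = lambda a: (1 - a) ** 2

etas = np.linspace(0, 1, 201)      # includes eta = (1 + eps)/2
alphas = np.linspace(-2, 2, 401)   # includes alpha = 0 and alpha = 2*eta - 1

for eps in (0.1, 0.2, 0.5):
    print(f"eps={eps}: hinge ~ {estimate_delta(hinge, eps, etas, alphas):.3f}, "
          f"squared ~ {estimate_delta(squared, eps, etas, alphas):.3f}")
```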

SLIDE 12

Analysis of Robust Classification

Target: robust 0-1 loss ϕγ (correct / non-robust / wrong regions). Is any convex surrogate ϕ calibrated when restricted to linear predictors?

SLIDE 13

No convex calibrated surrogate

  • Theorem. Any convex surrogate is not (ϕγ, ℱlin)-calibrated.

Proof Sketch. Plot the surrogate conditional risk Cϕ(·, η) for η ≈ 0, η ≈ 1/2, and η ≈ 1; the marks ±γ on the α-axis separate the correct, non-robust, and wrong regions. Cϕ(·, η) is convex in α, and at η ≈ 1/2 its minimizer falls in the non-robust region |α| ≤ γ. Hence, in the calibration function

δ(ε) = inf over η ∈ [0,1] and α ∈ Aℱ of  Cϕ(α, η) − C*ϕ,ℱ(η)  s.t.  Cϕγ(α, η) − C*ϕγ,ℱ(η) ≥ ε

the constraint can be met at zero surrogate excess (a non-robust minimizer!), so δ(ε) = 0.
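The proof idea can be seen concretely with the (convex) squared loss. The sketch below (numbers are our own illustration, not from the paper) shows that near η = 1/2 the conditional-risk minimizer lands inside the non-robust region |α| ≤ γ:

```python
import numpy as np

# For the squared loss, the conditional risk eta*(1-a)^2 + (1-eta)*(1+a)^2
# is minimized at alpha = 2*eta - 1, which lies in the non-robust region
# |alpha| <= gamma whenever eta is close enough to 1/2.

gamma = 0.3
eta = 0.55                                   # class probability near 1/2
alphas = np.linspace(-2, 2, 4001)
squared = lambda a: (1 - a) ** 2
cond_risk = eta * squared(alphas) + (1 - eta) * squared(-alphas)

alpha_min = alphas[np.argmin(cond_risk)]     # closed form: 2*eta - 1 = 0.1
print(round(alpha_min, 3))                   # -> 0.1
assert abs(alpha_min) <= gamma               # minimizer is non-robust
```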

SLIDE 14

How to find calibrated surrogate?

(The three surrogate conditional-risk plots for η ≈ 0, η ≈ 1/2, and η ≈ 1 are shown again, with correct / non-robust / wrong regions.)

  • Idea. Make the conditional risk not minimized in the non-robust area: consider a surrogate ϕ whose conditional risk is quasiconcave, i.e., all of its superlevel sets are convex.

SLIDE 15

Example: Shifted Ramp Loss

Ramp loss: ϕ(α) = clip[0,1]( (1 − α)/2 ), with kinks at α = −1 and α = 1.

Shifted ramp loss: ϕβ(α) = clip[0,1]( (1 − α + β)/2 ), the ramp loss shifted by +β, with kinks at α = −1 + β and α = 1 + β; assume 0 < β < 1 − γ.

Its conditional risk (for η > 1/2) is quasiconcave, and its calibration function is positive.
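The quasiconcavity of the shifted ramp's conditional risk can be checked on a grid. The sketch below is our own verification (not the paper's code); the values of γ, β, and η are illustrative, chosen to satisfy 0 < β < 1 − γ:

```python
import numpy as np

# Shifted ramp loss phi_beta(alpha) = clip((1 - alpha + beta)/2, 0, 1).
# Check that its conditional risk is quasiconcave in alpha, i.e., every
# superlevel set on the grid is an interval.

def shifted_ramp(alpha, beta):
    return np.clip((1 - alpha + beta) / 2, 0.0, 1.0)

def is_quasiconcave(values, tol=1e-12):
    """Quasiconcave on a grid iff each value equals the smaller of the
    running maxima to its left and right (no interior dip)."""
    left = np.maximum.accumulate(values)
    right = np.maximum.accumulate(values[::-1])[::-1]
    return bool(np.all(values >= np.minimum(left, right) - tol))

beta = 0.2                       # satisfies 0 < beta < 1 - gamma for gamma = 0.3
alphas = np.linspace(-1, 1, 2001)
for eta in (0.5, 0.6, 0.8, 0.95):
    cond_risk = eta * shifted_ramp(alphas, beta) + (1 - eta) * shifted_ramp(-alphas, beta)
    assert is_quasiconcave(cond_risk)
print("conditional risk is quasiconcave for all tested eta")
```

A convex conditional risk can hide its minimizer in the non-robust band, while a quasiconcave one is pushed to the edges of the prediction range, which is why this shape matters here.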

SLIDE 16

Calibrated Surrogate Losses for Adversarially Robust Classification

Robust classification = minimize the robust 0-1 loss (correct / non-robust / wrong regions).

Calibrated surrogate loss: minimizing the surrogate ⟹ minimizing the target.

No convex calibrated surrogate under restriction to linear predictors: at ℙ(Y = +1|X) = 1/2, the conditional-risk minimizer lies in the non-robust area.

Quasiconcavity is important. Example: shifted ramp loss.