SLIDE 1

Adversarial Surrogate Losses
for General Multiclass Classification

Rizal Zaini Ahmad Fathony

Committee:
  • Prof. Brian Ziebart (Chair)
  • Prof. Bhaskar DasGupta
  • Prof. Lev Reyzin
  • Prof. Xinhua Zhang
  • Prof. Simon Lacoste-Julien
SLIDE 2

Data

Sample distribution: (π’š, 𝑧) ∼ Q̃(π’š, 𝑧)

Training samples: (π’š₁, 𝑧₁), (π’šβ‚‚, 𝑧₂), …, (π’š_o, 𝑧_o)

Training: supervised learning.
Testing: predict the labels of new inputs π’š_{o+1}, π’š_{o+2}, …

→ Multiclass classification: the label 𝑧 takes one of the possible values 1, 2, 3, …, |𝒡|.

Loss function: loss(ẑ, 𝑧), the cost of predicting ẑ when the true label is 𝑧.
SLIDE 3

Multiclass Classification → Zero-One Loss

Digit recognition: classes 1, 2, 3, …

Loss function: loss(ẑ, 𝑧) = 1[ẑ ≠ 𝑧], the indicator of a misclassification.

General multiclass classification → any loss

SLIDE 4

Multiclass Classification → Ordinal Classification

Movie rating prediction: ordered classes 1, 2, …, 5

Loss function (example): loss(ẑ, 𝑧) = |ẑ − 𝑧|

Predicted vs. actual label: a distance loss.
SLIDE 5

Multiclass Classification → Taxonomy Classification

Object classification. Loss function (example): loss(ẑ, 𝑧) = h − w(ẑ, 𝑧) + 1, where h is the tree height (here h = 4) and w(ẑ, 𝑧) is the level of the deepest common ancestor of ẑ and 𝑧.

Taxonomy:
  • Object
    • Nonlife
      • Two-wheeled: 1: Bicycle, 2: Motorbike
      • Four-wheeled: 3: Bus, 4: Car
    • Life
      • 5: Person
      • Animal
        • Carnivore: 6: Cat, 7: Dog
        • Herbivore: 8: Cow

Examples: loss(Cat, Dog) = 1, loss(Cat, Cow) = 2, loss(Cow, Person) = 3, loss(Cow, Motorbike) = 4, loss(Bus, Car) = 2, loss(Bus, Bicycle) = 3, loss(Car, Cow) = 4, loss(Bus, Person) = 4.
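To make the definition concrete, here is a minimal sketch that computes this taxonomy loss from parent pointers, using the tree above; the dictionary encoding, the `level` helper, and the function names are illustrative assumptions, not the thesis implementation.

```python
# Sketch: taxonomy loss loss(ẑ, z) = h − w(ẑ, z) + 1 from parent pointers.
PARENT = {
    "Bicycle": "Two-wheeled", "Motorbike": "Two-wheeled",
    "Bus": "Four-wheeled", "Car": "Four-wheeled",
    "Two-wheeled": "Nonlife", "Four-wheeled": "Nonlife",
    "Cat": "Carnivore", "Dog": "Carnivore", "Cow": "Herbivore",
    "Carnivore": "Animal", "Herbivore": "Animal",
    "Person": "Life", "Animal": "Life",
    "Nonlife": "Object", "Life": "Object",
}

def level(node):
    """Level of a node, counting the root 'Object' as level 1."""
    return 1 if node == "Object" else 1 + level(PARENT[node])

def taxonomy_loss(z_hat, z, h=4):
    """h − w(ẑ, z) + 1, where w is the level of the deepest common ancestor."""
    a, b = z_hat, z
    while level(a) > level(b):
        a = PARENT[a]
    while level(b) > level(a):
        b = PARENT[b]
    while a != b:                      # walk up until the paths meet
        a, b = PARENT[a], PARENT[b]
    return h - level(a) + 1

assert taxonomy_loss("Cat", "Dog") == 1
assert taxonomy_loss("Cow", "Motorbike") == 4
```

The asserts reproduce two of the slide's example values.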

SLIDE 6

Multiclass Classification → Loss Matrix

Any loss function loss(ẑ, 𝑧) can be represented as a loss matrix 𝑴 with entries 𝑴_{ẑ,𝑧} = loss(ẑ, 𝑧):
  • Zero-one loss: loss(ẑ, 𝑧) = 1[ẑ ≠ 𝑧]
  • Ordinal classification loss: loss(ẑ, 𝑧) = |ẑ − 𝑧|
  • Taxonomy-based loss: loss(ẑ, 𝑧) = h − w(ẑ, 𝑧) + 1
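A small sketch of the loss-matrix construction (the helper name `loss_matrix` is mine; the taxonomy loss from the earlier sketch plugs in the same way):

```python
import numpy as np

def loss_matrix(loss, num_classes):
    """M[i, j] = loss(ẑ = i+1, z = j+1) for 1-based class labels."""
    return np.array([[loss(zh, z) for z in range(1, num_classes + 1)]
                     for zh in range(1, num_classes + 1)])

zero_one = loss_matrix(lambda zh, z: int(zh != z), 4)   # 0/1 off/on diagonal
ordinal  = loss_matrix(lambda zh, z: abs(zh - z), 4)    # absolute-error loss
```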

SLIDE 7

Empirical Risk Minimization (ERM)

  • Assume a family of parametric hypothesis functions 𝑔 (e.g., a linear discriminator).
  • Find the hypothesis 𝑔* that minimizes the empirical risk, i.e., the average loss over the training samples: 𝑔* = argmin_𝑔 E_{(π’š,𝑧)∼Q̃}[loss(𝑔(π’š), 𝑧)].

This optimization is intractable: the objective is non-convex and non-continuous, so a convex surrogate loss needs to be employed.

Example surrogate losses for the binary zero-one loss (sketched in code below):

  • Hinge loss (used by SVM)
  • Log loss (used by Logistic Regression)
  • Exponential loss (used by AdaBoost)
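For reference, a minimal sketch of these three standard surrogates, written as functions of the margin m = 𝑧·f(π’š) with 𝑧 ∈ {+1, −1}; these are the textbook forms, not code from the thesis.

```python
import numpy as np

def hinge_loss(margin):        # used by SVM
    return np.maximum(0.0, 1.0 - margin)

def log_loss(margin):          # used by logistic regression (natural log)
    return np.log1p(np.exp(-margin))

def exp_loss(margin):          # used by AdaBoost
    return np.exp(-margin)

# All three are convex upper bounds of the zero-one loss 1[margin <= 0].
```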
SLIDE 8

ERM under Hinge Loss and Log Loss

Binary SVM (hinge loss) and binary logistic regression (log loss) are both Fisher consistent: they produce the Bayes optimal decision in the limit.

Logistic regression additionally provides a probabilistic prediction Q̂(𝑧|π’š); binary SVM additionally provides dual parameter sparsity.

Surrogate losses for the multiclass case extend binary surrogates such as the hinge loss and the log loss to multiple classes.

SLIDE 9

Adversarial Prediction (Asif et al., 2015)

Empirical Risk Minimization approximates the original loss (non-convex, non-continuous) with convex surrogates and evaluates the probabilistic prediction against the empirical data.

Adversarial Prediction keeps the original loss but evaluates the prediction against an adversary's probabilistic prediction instead of the empirical data, constraining the statistics of the adversary's distribution to match the empirical statistics.

In short:
  • Empirical Risk Minimization: approximate loss, exact training data.
  • Adversarial Prediction: exact loss, approximate training data (only its statistics are used).
SLIDE 10

Adversarial Prediction → Optimization

Applying minimax and Lagrangian duality, the adversarial prediction problem becomes a minimization over many zero-sum games, one per training sample (e.g., with the game matrix built from the zero-one loss). The inner optimization can be solved using linear programming with complexity O(|𝒡|^3.5).
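To illustrate the LP step, here is a minimal sketch that computes the maximizing player's equilibrium strategy for a given game matrix using scipy's `linprog`; the function name and the matrix orientation (rows = maximizer's actions) are my assumptions, not the thesis code.

```python
import numpy as np
from scipy.optimize import linprog

def solve_zero_sum_game(M):
    """Equilibrium strategy p and game value v for the row (maximizing)
    player of the zero-sum game with payoff matrix M (m x n).
    LP: maximize v subject to p^T M[:, j] >= v for all j, p >= 0, sum(p) = 1."""
    m, n = M.shape
    c = np.zeros(m + 1)
    c[-1] = -1.0                                # linprog minimizes, so min -v
    A_ub = np.hstack([-M.T, np.ones((n, 1))])   # v - p^T M[:, j] <= 0
    b_ub = np.zeros(n)
    A_eq = np.zeros((1, m + 1))
    A_eq[0, :m] = 1.0                           # probabilities sum to one
    b_eq = np.array([1.0])
    bounds = [(0, None)] * m + [(None, None)]   # p >= 0, v free
    res = linprog(c, A_ub, b_ub, A_eq, b_eq, bounds=bounds)
    return res.x[:m], res.x[-1]
```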
SLIDE 11

Adversarial Prediction → ERM Perspective

The adversarial prediction optimization can be rewritten as empirical risk minimization with a surrogate loss:

Adversarial surrogate loss = the Nash equilibrium value of the zero-sum game characterized by the augmented game matrix 𝑴′ (the loss matrix augmented with the Lagrangian potential terms).
SLIDE 12

Outline

1. The Adversarial Surrogate Loss for Multiclass Zero-One Classification (AL0-1)
2. Adversarial Surrogate Losses for Multiclass Ordinal Classification (ALord)
3. Ongoing and Future Works

SLIDE 13

Based on: Rizal Fathony, Anqi Liu, Kaiser Asif, Brian D. Ziebart. "Adversarial Multiclass Classification: A Risk Minimization Perspective". Advances in Neural Information Processing Systems 29 (NIPS), 2016.

The Adversarial Surrogate Loss
for Multiclass Zero-One Classification
SLIDE 14

Multiclass Zero-One: Related Works

Multiclass Support Vector Machines:
  • 1. The WW model (Weston et al., 2002): relative margin model
  • 2. The CS model (Crammer and Singer, 1999): relative margin model
  • 3. The LLW model (Lee et al., 2004): absolute margin model

Two questions distinguish them: Is the model Fisher consistent (Tewari and Bartlett, 2007; Liu, 2007)? Does it perform well in low-dimensional feature spaces (Dogan et al., 2016)? The relative margin models (WW, CS) perform well in low-dimensional feature spaces but are not Fisher consistent; the absolute margin model (LLW) is Fisher consistent but performs poorly there.
SLIDE 15

Adversarial Prediction: Multiclass Zero-One Loss

The adversarial game for the zero-one loss, written in shorter notation, and its Nash equilibrium.

SLIDE 16

Adversarial Zero-Sum Game (Zero-One Loss)

The augmented game for 4 classes pairs the adversary's probabilities q̌₁, …, q̌₄ against the predictor's probabilities q̂₁, …, q̂₄. When some adversary probability is zero (e.g., q̌₃ = 0), the game reduces to the remaining classes, and each sub-game can be solved as a completely mixed game. Considering all possible sets of the adversary's non-zero-probability classes gives:

AL0-1: a maximization over 2^|𝒡| − 1 hyperplanes (one per non-empty subset of classes).
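A brute-force sketch of this maximization, assuming the subset form AL0-1 = max over non-empty S of (Σ_{k∈S} ψ_k + |S| − 1)/|S|, where ψ_k is the potential difference for class k (so ψ at the true label is 0); function and variable names are mine.

```python
from itertools import combinations

def al01_bruteforce(psi):
    """Evaluate AL0-1 by enumerating all 2^|Z| - 1 non-empty subsets S.
    psi[k] is the potential difference for class k (psi[true label] == 0)."""
    classes = range(len(psi))
    return max((sum(psi[k] for k in S) + len(S) - 1) / len(S)
               for size in classes
               for S in combinations(classes, size + 1))
```

This enumeration is exponential; slide 23 gives the greedy shortcut.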

SLIDE 17

AL0-1 (Adversarial Surrogate Loss) → Binary Classification

For binary zero-one classification (shown for true label 𝑧 = 1), changing the classification notation to 𝑧 ∈ {+1, −1}, the parameters to 𝒙 and 𝑐, and adding L2 regularization shows that binary AL0-1 coincides with the soft-margin SVM.

SLIDE 18

SLIDE 19
SLIDE 20

AL0-1 → 3-Class Classification

AL0-1 for 3-class zero-one classification is a maximization over 7 hyperplanes (2³ − 1; shown for true label 𝑧 = 1).
SLIDE 21

AL0-1 → Fisher Consistency → Property of the Minimizer

Fisher consistency in multiclass zero-one classification: a constraint over all class labels (a sum-to-zero constraint on the potentials) is employed to remove redundant solutions. The minimizer π’ˆ* lies in the region defined by the class-label constraints (example shown for 𝑧 = 1 and 𝒯 = {1, 2}).
SLIDE 22

AL0-1 → Fisher Consistency

Finding the minimizer π’ˆ* based on the properties above yields a solution that produces the Bayes optimal decision: AL0-1 is Fisher consistent.

SLIDE 23

AL0-1 → Optimization → Primal

Optimization of AL0-1 (empirical risk minimization). Let 𝑆 be the set that maximizes AL0-1 for sample π’š_j; the sub-gradient for π’š_j is built from the members of 𝑆.

Finding the set 𝑆 with a greedy algorithm (see the sketch below):
1. Compute all potential differences ω_k ≜ f_k(π’š_j) − f_{𝑧_j}(π’š_j).
2. Sort the ω_k largest-first.
3. Start with the empty set 𝑆 = βˆ….
4. Repeat: incrementally add the next k to 𝑆 and update the value of AL0-1,
5. until adding another element decreases the value of AL0-1.
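A minimal sketch of this greedy evaluation, under the same subset form of AL0-1 assumed earlier (names are mine, not the thesis code):

```python
import numpy as np

def al01_greedy(psi):
    """Greedy evaluation of AL0-1 = max_S (sum_{k in S} psi_k + |S| - 1)/|S|.
    psi[k] are the potential differences omega_k, so psi[true label] == 0."""
    w = np.sort(psi)[::-1]                 # largest-first
    best = acc = w[0]                      # |S| = 1: value is the top psi
    for m in range(2, len(w) + 1):
        acc += w[m - 1]                    # running sum of the top m psi's
        val = (acc + m - 1) / m
        if val < best:                     # the slide's stopping rule:
            break                          # adding more decreases the value
        best = val
    return best
```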
SLIDE 24

AL0-1 → Optimization → Dual

With L2 regularization, AL0-1 has a constrained primal quadratic programming (QP) formulation; by Lagrangian duality it also has a dual QP formulation over per-constraint dual variables.
SLIDE 25

AL0-1 → Optimization → Dual → Kernel Trick

In the dual QP formulation the inputs appear only through dot products, so the kernel trick applies: map each input π’š_j into a rich feature space φ(π’š_j) and compute the dot products implicitly, where 𝑆_{j,l} is the set of labels included in the constraint Δ_{j,l}.
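As a generic illustration of the implicit dot products, a Gaussian kernel matrix sketch (the kernel choice and function name are assumptions; any positive-definite kernel works the same way in the dual):

```python
import numpy as np

def gaussian_kernel_matrix(X, gamma=1.0):
    """K[i, j] = exp(-gamma * ||x_i - x_j||^2) for rows x_i of X.
    Replaces every dot product phi(x_i) . phi(x_j) in the dual objective."""
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    return np.exp(-gamma * np.maximum(d2, 0.0))   # clip tiny negatives
```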
SLIDE 26

AL0-1 → Optimization → Dual → Constraint Generation

The primal has an exponential number of constraints, and the dual an exponential number of variables. A constraint generation algorithm addresses this: a polynomial-time convergence guarantee is provided, and experiments show an even better convergence rate in practice.
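A skeleton of such a constraint generation loop; the two callables `solve_restricted_qp` and `most_violated_set` are hypothetical helpers (the former would solve the QP restricted to the active constraints, the latter would run the greedy search from slide 23 and return a set with its violation), so this is a sketch of the control flow only.

```python
def constraint_generation(num_samples, labels, solve_restricted_qp,
                          most_violated_set, tol=1e-6):
    """Generic constraint-generation loop (sketch; helpers are assumed)."""
    # Start each sample with the trivial constraint: the true-label singleton.
    active = [[frozenset([labels[j]])] for j in range(num_samples)]
    while True:
        theta = solve_restricted_qp(active)        # QP over active constraints
        added = False
        for j in range(num_samples):
            S, violation = most_violated_set(theta, j)
            if violation > tol and frozenset(S) not in active[j]:
                active[j].append(frozenset(S))     # grow the constraint pool
                added = True
        if not added:
            return theta                           # no violated constraint left
```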

SLIDE 27

AL0-1 → Experiments

Dataset properties and AL0-1 constraints

SLIDE 28

AL0-1 → Experiments → Results

Results for Linear Kernel and Gaussian Kernel

The mean (standard deviation) of the accuracy is reported; bold numbers mark the best result or results not significantly worse than the best.
SLIDE 29

Multiclass Zero-One Classification

  • 1. The SVM WW model (Weston et al., 2002): relative margin; performs well in low-dimensional feature spaces, not Fisher consistent.
  • 2. The SVM CS model (Crammer and Singer, 1999): relative margin; performs well in low-dimensional feature spaces, not Fisher consistent.
  • 3. The SVM LLW model (Lee et al., 2004): absolute margin; Fisher consistent, but performs poorly in low-dimensional feature spaces.
  • 4. AL0-1 (the adversarial surrogate loss): relative margin; both Fisher consistent and performs well in low-dimensional feature spaces.
SLIDE 30

Based on: Rizal Fathony, Mohammad Bashiri, Brian D. Ziebart. "Adversarial Surrogate Losses for Ordinal Regression". Advances in Neural Information Processing Systems 30 (NIPS), 2017.

Adversarial Surrogate Losses
for Multiclass Ordinal Classification
SLIDE 31

Ordinal Classification: Related Works

  • A. Threshold methods (Shashua & Levin, 2003; Chu & Keerthi, 2005; Rennie & Srebro, 2005): support vector machines for ordinal classification that extend the hinge loss to the ordinal setting.
    • 1. All Threshold (also called SVORIM)
    • 2. Immediate Threshold (also called SVOREX)
    (the surrogate inside both can be any binary classification surrogate, e.g., the hinge loss)
  • B. Reduction framework (Li & Lin, 2007): create |𝒡| − 1 weighted extended samples for each training sample, then run binary classification with a binary surrogate loss (e.g., the hinge loss) on the extended samples.
  • C. Cost-sensitive classification based methods (Lin, 2008; Tu & Lin, 2010; Lin, 2014):
    • 1. Cost-Sensitive One-Versus-All (CSOVA)
    • 2. Cost-Sensitive One-Versus-One (CSOVO)
    • 3. Cost-Sensitive One-Sided-Regression (CSOSR)
SLIDE 32

Adversarial Surrogate Loss: Ordinal Classification

Solving the adversarial game for the ordinal (absolute error) loss and taking its Nash equilibrium yields ALord: a maximization over pairs of class labels, whose two coordinates can be independently realized.
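Assuming the pairwise form ALord(f, 𝑧) = max over pairs (i, j) of (f_i + f_j + (j − i))/2 − f_𝑧, the two maximizations decouple into max_i (f_i − i) and max_j (f_j + j); a minimal numpy sketch (names are mine, not the thesis code):

```python
import numpy as np

def al_ord(f, z):
    """ALord sketch for class potentials f (0-indexed) and true label index z.
    The max over pairs (i, j) splits into two independent maxes; the 0-based
    index offsets cancel between the two terms."""
    idx = np.arange(len(f))
    return 0.5 * (np.max(f - idx) + np.max(f + idx)) - f[z]
```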

SLIDE 33

ALord → Feature Representation

  • Thresholded regression representation: a single shared vector of feature weights plus a set of thresholds; parameter size n + |𝒡| − 1, where n is the dimension of the input space.
  • Multiclass representation: class-specific feature weights; parameter size n|𝒡|.

There are examples where the multiclass representation is useful and examples where the thresholded regression representation is useful.
SLIDE 34

ALord → Thresholded Regression Representation

ALord-th, the version of ALord for the thresholded regression representation, is based on averaging the threshold label predictions for the potentials 𝒙 · π’š_j + 1 and 𝒙 · π’š_j − 1.

SLIDE 35

ALord → Multiclass Representation

ALord-mc, the version of ALord for the multiclass representation, is a maximization over |𝒡|(|𝒡| + 1)/2 hyperplanes (one per pair of labels i ≤ j).

SLIDE 36

ALord → Fisher Consistency

Fisher consistency in ordinal classification: as before, a constraint is employed to remove redundant solutions. The minimizer π’ˆ* satisfies the loss-reflective property: its entries mirror the negated absolute-error loss around a zero-valued class, e.g., [−1, 0, −1, −2], [−2, −1, 0, −1, −2], [0, −1, −2, −3], [−3, −2, −1, 0, −1, −2].

Finding the minimizer π’ˆ* based on the loss-reflective property is equivalent to finding k*, the class at which a loss-reflective π’ˆ has value 0. This yields the Bayes optimal decision: ALord is Fisher consistent.

SLIDE 37

ALord → Optimization → Primal

Optimization of ALord (empirical risk minimization) with Stochastic Average Gradient (SAG) (Schmidt et al., 2013, 2015): SAG averages the gradient of each example from the last iteration in which it was selected, which normally requires storing the gradient of each sample.

SAG for ALord-mc: the gradient for a single sample π’š_j (assuming k* ≠ m* ≠ 𝑧_j, where (k*, m*) is the maximizing pair) touches only the rows for k*, m*, and 𝑧_j, so it suffices to store k* and m* instead of the full gradient.
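Under the same pairwise form assumed above, a hedged sketch of the per-sample sub-gradient and the (k*, m*) bookkeeping SAG would store (function and variable names are mine):

```python
import numpy as np

def alord_mc_subgradient(W, x, z):
    """Sub-gradient of ALord-mc at one sample (pairwise form assumed).
    W: (num_classes, n) class-specific weights; x: input vector; z: true label.
    Returns the sub-gradient and the pair (k, m) that suffices to rebuild it."""
    f = W @ x
    idx = np.arange(len(f))
    k = int(np.argmax(f - idx))        # realizes max_i (f_i - i)
    m = int(np.argmax(f + idx))        # realizes max_j (f_j + j)
    G = np.zeros_like(W, dtype=float)
    G[k] += 0.5 * x                    # from (f_k)/2; += also handles k == m
    G[m] += 0.5 * x                    # from (f_m)/2
    G[z] -= x                          # from -f_z
    return G, (k, m)                   # SAG stores only (k, m) per sample
```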

SLIDE 38

ALord → Optimization → Dual

With L2 regularization, ALord has a constrained primal quadratic programming formulation and a corresponding dual QP formulation; the kernel trick can also be easily applied.
SLIDE 39

ALord → Experiments

Dataset properties

SLIDE 40

ALord → Experiments → Linear Kernel

Results for Linear Kernel

SLIDE 41

ALord → Experiments → Gaussian Kernel

Results for the Gaussian kernel (the threshold-method baselines are All Threshold and Immediate Threshold).
SLIDE 42

Conclusion
SLIDE 43

Conclusion

A. Establish connections between adversarial prediction and ERM: optimizing the adversarial loss in the ERM framework = optimizing the original loss in the adversarial prediction framework.

B. Propose adversarial surrogate losses that:
1. align better with the original loss,
2. guarantee Fisher consistency,
3. enable computational efficiency for rich feature spaces via the kernel trick and dual parameter sparsity,
4. perform well in practice.

SLIDE 44

Ongoing and Future Works
SLIDE 45

[1.] Taxonomy-based Classification

Adversarial surrogate loss for taxonomy-based classification. Example: a game over 5 classes with 𝒯 = {1, 2, 3, 5}. Analyze the adversary's non-zero-probability strategies at the Nash equilibrium; the worked example yields small rational adversary probabilities (values such as 1/2, 4/7, 3/7, 21/34, and 13/34 appear in the slide's computation).
SLIDE 46

[1.] Taxonomy-based Classification → Algorithm

Algorithm for finding the adversarial loss. Example: potentials 𝝎 = [0.1, 1.2, 0.5, 1.5, 0.7] over classes 1-5; the algorithm keeps a sorted index list for each non-leaf node of the taxonomy (root: [4, 2, 5, 3, 1], split into [4, 5] and [2, 3, 1] at its children, then [4], [5], [2, 1], [3] further down the tree).

Complexity: O(n u² l²), where n is the maximum number of children, u the depth of the tree, and l the number of classes. Reference: the LP complexity is O(l^3.5).

Completed:

  • Game analysis
  • Algorithm
  • Complexity analysis
  • Algorithm implementation

Future:

  • Formal proof
  • More efficient implementation
  • Real data experiments
SLIDE 47

[2.] Sequence Prediction with Ordinal Classification Loss

Adversarial surrogate loss for sequence prediction with an ordinal classification loss: an adversarial game over joint distributions and its Nash equilibrium.

There exists an equilibrium in which only two strategies have non-zero probability (* formal proof is ongoing work). Although the game matrix 𝑴 has exponential size, the game can be solved using dynamic programming with complexity O(U|𝒡|⁴); the double oracle method (Li et al., 2016) applies as well but has no polynomial convergence guarantee.

Completed:

  • Game analysis for ordinal classification loss
  • Algorithm
  • Complexity analysis
  • Algorithm implementation

Future:

  • Formal proof
  • Real data experiments
  • Extend analysis to other additive multiclass losses
(e.g. zero-one loss)
SLIDE 48

[3.] Adversarial Graphical Model

Focus on tree structures with an additive loss. The adversarial game over marginal distributions can be solved as a linear program with O(|𝒡|²|F|) variables and complexity O(|𝒡|⁷|F|^3.5), where |𝒡| is the number of classes and |F| the number of edges in the graph.

Completed:

  • Linear Programming formulation
  • Polynomial runtime O(|𝒡|⁷|F|^3.5)
  • NaΓ―ve implementation

Future:

  • Better ways to solve the LP
  • Adversarial surrogate loss for graphical models
  • Real data experiments
SLIDE 49

[*.] Timeline

[1.] Taxonomy Classification

  • Formal proof
  • More efficient implementation
  • Real data experiments

[2.] Adversarial Loss for Sequence Prediction

  • Formal proof for ordinal classification case
  • Real data experiments
  • Extend analysis to other additive multiclass losses

[3.] Adversarial Graphical Models (with focus on tree structures)

  • Better ways to solve the LP
  • Adversarial surrogate loss for graphical models
  • Real data experiments

Timeline: milestones at 3, 6, and 9 months.

SLIDE 50

Thank You