Provably Robust Boosted Decision Stumps and Trees against Adversarial Attacks


1. Provably Robust Boosted Decision Stumps and Trees against Adversarial Attacks
Maksym Andriushchenko (EPFL*), Matthias Hein (University of Tübingen)
*Work done at the University of Tübingen
SMLD 2019, NeurIPS 2019
[Figure: decision boundaries on a 2D toy dataset; panels: plain boosted stumps, robust boosted stumps, plain boosted trees, robust boosted trees]

2. Adversarial vulnerability
Source: Goodfellow et al., "Explaining and Harnessing Adversarial Examples", 2014
Problem: small changes in the input ⇒ large changes in the output.
This has been a topic of active research for neural networks and image recognition, but what about other domains and other classifiers?

3. Motivation: other domains (going beyond images)
Some input feature values can be incorrect: measurement noise, a human mistake, an adversarially crafted change, etc.
For high-stakes decision making, it is necessary to ensure a reasonable worst-case error rate under possible noise perturbations.
The expected perturbation range can be specified by domain experts.

4. Motivation: other classifiers
Our paper: we concentrate on boosted decision stumps and trees.
They are widely adopted in practice: implementations such as XGBoost or LightGBM are used in almost every Kaggle competition.
Moreover, boosted trees are interpretable, which is also an important practical aspect. Who wants to deploy a black box?
⇒ It is important to develop boosted trees that are robust, but first we need to understand the reason for their vulnerability.
So why do adversarial examples exist?

5. Understanding adversarial vulnerability: what goes wrong and how to fix it?
We would like to have a large geometric margin for every point.
[Figure: decision boundaries of plain vs. robust boosted stumps on a 2D toy dataset]
Empirical risk minimization does not distinguish between the two types of solutions ⇒ we need to use a robust objective.
Let's formalize the problem!

6. Adversarial robustness
What is an adversarial example? Consider $x \in \mathbb{R}^d$, $y \in \{-1, 1\}$, a classifier $f: \mathbb{R}^d \to \mathbb{R}$, and some $L_p$-norm threshold $\epsilon$:
$$\min_{\delta \in \mathbb{R}^d} \; y f(x + \delta) \quad \text{s.t.} \quad \|\delta\|_p \le \epsilon, \; x + \delta \in C$$
Assume $x$ is correctly classified ($y f(x) > 0$); then $x + \delta^*$ is an adversarial example if $x + \delta^*$ is incorrectly classified ($y f(x + \delta^*) < 0$).
How to measure robustness? Robust test error (RTE):
$$\underbrace{\frac{1}{n} \sum_{i=1}^n \mathbb{1}_{y_i f(x_i) < 0}}_{\text{standard zero-one loss}} \;\longrightarrow\; \underbrace{\frac{1}{n} \sum_{i=1}^n \mathbb{1}_{y_i f(x_i + \delta_i^*) < 0}}_{\text{robust zero-one loss}}$$
Finding $\delta^*$ is a non-convex optimization problem for NNs and boosted trees. Exact mixed-integer formulations exist for ReLU NNs and boosted trees (slow).
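For boosted decision stumps this inner minimization can be solved exactly under the $L_\infty$ threat model: every stump depends on a single coordinate, and the $L_\infty$ ball is a product of per-coordinate intervals, so the problem decomposes coordinate-wise. The following Python sketch illustrates the idea (it is not the authors' code; the stump representation and the function names worst_case_margin and robust_test_error are ours):

```python
import numpy as np

def worst_case_margin(stumps, x, y, eps):
    """Exact min_{||delta||_inf <= eps} y * f(x + delta) for a stump ensemble.

    Each stump is a tuple (j, b, w_l, w_r) contributing w_l if x[j] <= b
    and w_r otherwise.  Because every stump looks at one coordinate only,
    the minimization splits into independent per-coordinate problems.
    """
    per_coord = {}
    for j, b, w_l, w_r in stumps:
        per_coord.setdefault(j, []).append((b, w_l, w_r))

    margin = 0.0
    for j, coord_stumps in per_coord.items():
        lo, hi = x[j] - eps, x[j] + eps
        # f restricted to coordinate j is piecewise constant, so it suffices
        # to check the interval ends and both sides of every threshold that
        # falls inside the interval.
        candidates = [lo, hi]
        for b, _, _ in coord_stumps:
            if lo <= b <= hi:
                candidates += [b, np.nextafter(b, np.inf)]
        worst = np.inf
        for v in candidates:
            val = sum(w_l if v <= b else w_r for b, w_l, w_r in coord_stumps)
            worst = min(worst, y * val)
        margin += worst
    return margin

def robust_test_error(stumps, X, Y, eps):
    """Fraction of test points misclassified under the worst-case delta."""
    return np.mean([worst_case_margin(stumps, x, y, eps) < 0
                    for x, y in zip(X, Y)])
```

For tree ensembles the minimization no longer decomposes over coordinates, which is why exact certification is much more expensive there and the paper works with efficiently computable bounds instead.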

7. Training adversarially robust models
Robust optimization problem with respect to the perturbation set $\Delta(\epsilon)$:
$$\min_{\theta} \sum_{i=1}^n \max_{\delta \in \Delta(\epsilon)} L(f(x_i + \delta; \theta), y_i)$$
$L$ is a usual margin-based loss function (cross-entropy, exponential loss, etc.).
$\epsilon = 0$ ⇒ just the well-known Empirical Risk Minimization.
Goal: a small loss (⇒ a large margin) not only at $x_i$, but for every $x_i + \delta$ with $\delta \in \Delta(\epsilon)$.
Adversarial training: approximately solve the inner maximization ⇒ one minimizes a lower bound on the robust objective.
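Since the exponential loss $\exp(-m)$ is monotonically decreasing in the margin $m$, the inner maximization over $\delta$ is attained exactly at the worst-case margin. For stump ensembles the robust objective can therefore be evaluated with the earlier sketch; a hedged illustration (again not the authors' implementation, reusing the hypothetical worst_case_margin helper from above):

```python
def robust_exp_loss(stumps, X, Y, eps):
    """(1/n) * sum_i max_{||delta||_inf <= eps} exp(-y_i f(x_i + delta)).

    exp(-m) is decreasing in the margin m, so the inner max equals
    exp(-worst_case_margin(...)).
    """
    return np.mean([np.exp(-worst_case_margin(stumps, x, y, eps))
                    for x, y in zip(X, Y)])
```

Robust boosting then greedily adds the stump (or tree) that reduces this robust objective rather than the plain exponential loss; for stumps this can be done exactly, while for trees the paper resorts to an upper bound.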
