Provably Robust Boosted Decision Stumps and Trees against Adversarial Attacks (PowerPoint PPT Presentation)


SLIDE 1

Provably Robust Boosted Decision Stumps and Trees against Adversarial Attacks

Maksym Andriushchenko (EPFL∗)   Matthias Hein (University of Tübingen)

∗Work done at the University of Tübingen

SMLD 2019, NeurIPS 2019

[Figure: decision boundaries on [0, 1]² for plain boosted stumps, robust boosted stumps, plain boosted trees, and robust boosted trees]

Maksym Andriushchenko (EPFL) 1


SLIDE 4

Adversarial vulnerability

Source: Goodfellow et al., “Explaining and Harnessing Adversarial Examples”, 2014

Problem: small changes in the input ⇒ large changes in the output

Topic of active research for neural networks and image recognition, but what about other domains and other classifiers?


SLIDE 7

Motivation: other domains (going beyond images)

Some input feature values can be incorrect: measurement noise, a human mistake, an adversarially crafted change, etc.

For high-stakes decision making, it is necessary to ensure a reasonable worst-case error rate under possible noise perturbations

The expected perturbation range can be specified by domain experts


SLIDE 12

Motivation: other classifiers

Our paper: we concentrate on boosted decision stumps and trees

They are widely adopted in practice: implementations like XGBoost or LightGBM are used in almost every Kaggle competition

Moreover, boosted trees are interpretable, which is also an important practical aspect. Who wants to deploy a black box?

⇒ it is important to develop boosted trees that are robust, but first we need to understand the reason for their vulnerability

So why do adversarial examples exist?


SLIDE 16

Understanding adversarial vulnerability

What goes wrong and how to fix it? We would like to have a large geometric margin for every point

[Figure: decision boundaries on [0, 1]²: plain boosted stumps vs. robust boosted stumps]

Empirical risk minimization does not distinguish the two types of solutions ⇒ we need to use a robust objective

Let’s formalize the problem!


SLIDE 20

Adversarial robustness

What is an adversarial example? Consider x ∈ R^d, y ∈ {−1, 1}, a classifier f : R^d → R, and some Lp-norm threshold ε:

min_{δ ∈ R^d} y f(x + δ)   subject to   ‖δ‖_p ≤ ε,   x + δ ∈ C

Assume x is correctly classified (y f(x) > 0); then x + δ* is an adversarial example if x + δ* is incorrectly classified (y f(x + δ*) < 0)

How to measure robustness? Robust test error (RTE), obtained by replacing the standard zero-one loss with its robust counterpart:

(1/n) Σ_{i=1}^{n} 1[y_i f(x_i) < 0]   (standard zero-one loss)   →   (1/n) Σ_{i=1}^{n} 1[y_i f(x_i + δ_i*) < 0]   (robust zero-one loss)

Finding δ*: a non-convex optimization problem for neural networks and boosted trees. Exact mixed-integer formulations exist for ReLU networks and boosted trees, but they are slow.

Maksym Andriushchenko (EPFL) 6
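The RTE definition above is easy to operationalize once the worst-case margins y_i f(x_i + δ_i*) are available. A minimal sketch (not the paper's code; the margin values below are hypothetical, made up for illustration):

```python
import numpy as np

# Robust test error (RTE) from per-point margins. `clean_margins` holds
# y_i * f(x_i); `worst_margins` holds the worst-case margins
# min_{||delta||_p <= eps} y_i * f(x_i + delta), which in practice come
# from an exact (mixed-integer) solver or a certified bound.

def zero_one_error(margins):
    """Fraction of points with margin < 0, i.e. misclassified."""
    return float(np.mean(np.asarray(margins) < 0))

clean_margins = [0.9, 0.4, -0.1, 1.2]   # hypothetical values
worst_margins = [0.2, -0.3, -0.8, 0.5]  # hypothetical values

standard_error = zero_one_error(clean_margins)     # 0.25
robust_test_error = zero_one_error(worst_margins)  # 0.5
```

Since δ = 0 is always feasible, each worst-case margin is at most the clean margin, so RTE can never fall below the standard test error.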


SLIDE 26

Training adversarially robust models

Robust optimization problem with respect to the perturbation set ∆(ε):

min_θ Σ_{i=1}^{n} max_{δ ∈ ∆(ε)} L(f(x_i + δ; θ), y_i)

L is a usual margin-based loss function (cross-entropy, exponential loss, etc.)

ε = 0 ⇒ just the well-known Empirical Risk Minimization

Goal: small loss (⇒ large margin) not only at x_i, but for every x_i + δ with δ ∈ ∆(ε)

Adversarial training: approximately solve the inner maximization ⇒ minimization of a lower bound on the objective

Provable defenses: upper bound the robust loss ⇒ minimization of an upper bound on the objective

SLIDE 27

Robustness Certification and Robust Optimization for Boosted Trees


SLIDE 33

Tree ensemble: robustness certification

Exact certification is NP-hard [Kantchelian et al., ICML 2016]

But we can derive a tractable lower bound G̃(x, y) on G(x, y) for an ensemble of T trees with leaf values u^(t) and leaf-selection functions q_t:

G(x, y) := min_{‖δ‖_∞ ≤ ε} y F(x + δ) = min_{‖δ‖_∞ ≤ ε} Σ_{t=1}^{T} y u^(t)_{q_t(x+δ)} ≥ Σ_{t=1}^{T} min_{‖δ‖_∞ ≤ ε} y u^(t)_{q_t(x+δ)} =: G̃(x, y)

G̃(x, y) ≥ 0 ⇒ G(x, y) ≥ 0, i.e. x is provably robust.

G̃(x, y) < 0 ⇒ inconclusive: x is either robust or non-robust.

We get an upper bound on the number of non-robust points, which yields an upper bound on the robust test error.

For a single decision tree, min_{‖δ‖_∞ ≤ ε} y u^(t)_{q_t(x+δ)} can be found exactly by checking all leaves reachable within B_∞(x, ε) (O(l) time for l leaves)
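The per-tree minimization defining G̃(x, y) can be sketched in a few lines. This is an illustrative reimplementation, not the paper's code: a tree is assumed to be given as a list of leaves, each leaf an axis-aligned box (lower, upper) with its prediction value u.

```python
import numpy as np

# Tractable lower bound G~(x, y) for a tree ensemble: sum over trees of
# the per-tree worst case over the L_inf ball B(x, eps).

def tree_min(leaves, x, y, eps):
    """min_{||delta||_inf <= eps} y * u_{q(x + delta)} for one tree:
    minimum of y * u over all leaves whose box intersects
    [x - eps, x + eps] (the leaves reachable inside the ball)."""
    x = np.asarray(x, dtype=float)
    return min(
        y * val
        for lo, hi, val in leaves
        if np.all(x + eps >= np.asarray(lo)) and np.all(x - eps <= np.asarray(hi))
    )

def certify_lower_bound(trees, x, y, eps):
    """G~(x, y) = sum_t min_delta y * u^(t); >= 0 implies provable robustness."""
    return sum(tree_min(leaves, x, y, eps) for leaves in trees)

# A 1-d stump as a two-leaf tree: predicts -1 for x <= 0.5, else +1.
stump = [((-np.inf,), (0.5,), -1.0), ((0.5,), (np.inf,), 1.0)]
print(certify_lower_bound([stump], x=[0.3], y=-1, eps=0.1))  # 1.0: provably robust
```

With eps = 0.3 the ball [0.0, 0.6] crosses the threshold, both leaves become reachable, and the bound drops to -1.0 (inconclusive), matching the case distinction on the slide.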


SLIDE 37

Tree ensemble: from certification to robust optimization

Now we know how to lower bound the certification problem: min_{‖δ‖_∞ ≤ ε} y F(x + δ)

Does it help to solve the min-max problem?

min_θ Σ_{i=1}^{n} max_{‖δ‖_∞ ≤ ε} L(f(x_i + δ; θ), y_i)

Yes! For a monotonically decreasing L (e.g. the exponential loss):

max_{‖δ‖_∞ ≤ ε} L(y F(x + δ)) = L( min_{‖δ‖_∞ ≤ ε} y F(x + δ) )

⇒ we can calculate an upper bound on the robust loss.

Now: come up with a proper update for a new weak learner.
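The identity above can be checked numerically. A toy sketch assuming only that L is monotonically decreasing (exponential loss here); the score function F below is an arbitrary stand-in, not an ensemble from the paper:

```python
import numpy as np

# For monotonically decreasing L, max_delta L(y F(x + delta)) equals
# L(min_delta y F(x + delta)): the worst loss is attained at the worst
# margin. Verified by brute force over a fine grid of deltas.

L = lambda margin: np.exp(-margin)   # exponential loss

def F(x):                            # arbitrary toy score function
    return 0.5 * x - 0.2

x, y, eps = 1.0, 1, 0.3
deltas = np.linspace(-eps, eps, 10001)

lhs = max(L(y * F(x + d)) for d in deltas)   # max of the loss
rhs = L(min(y * F(x + d) for d in deltas))   # loss at the minimal margin
assert np.isclose(lhs, rhs)
```

This is why certifying the worst-case margin immediately gives a (tight, for exact minimization) bound on the robust loss.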


SLIDE 40

Tree ensemble: robust optimization

The robust loss for a tree ensemble can be upper bounded as: [equation on slide]

For a particular node during the tree construction process, the robust objective is (I: the set of points that can reach the current leaf): [equation on slide]

How to solve the minimization problem? Just a case distinction: [table on slide]


SLIDE 45

Tree ensemble: robust optimization

Denoting the case distinction as 1(x_i, y_i; w_r), our final robust objective is: [equation on slide]

The minimization with respect to w_l, w_r can be done using coordinate descent (the objective is convex in w_l, w_r)

Important: we are guaranteed to decrease the robust loss after every weak learner

Complexity: O(n²), while XGBoost has O(n log n)

That’s it for boosted trees. Now, what is so special about boosted stumps (one-level trees)?
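The coordinate descent step can be illustrated on a generic convex surrogate. The quadratic below is a hypothetical stand-in, not the paper's robust objective with the case distinction:

```python
# Coordinate descent for a function that is convex in each of the two
# leaf weights (w_l, w_r): alternate exact 1-d minimizations, each done
# here with a simple ternary search on a bounded interval.

def ternary_min(g, lo=-10.0, hi=10.0, iters=200):
    """Minimize a unimodal 1-d function g on [lo, hi]."""
    for _ in range(iters):
        m1, m2 = lo + (hi - lo) / 3, hi - (hi - lo) / 3
        if g(m1) < g(m2):
            hi = m2
        else:
            lo = m1
    return (lo + hi) / 2

def coordinate_descent(obj, wl=0.0, wr=0.0, sweeps=30):
    for _ in range(sweeps):
        wl = ternary_min(lambda v: obj(v, wr))  # minimize over w_l
        wr = ternary_min(lambda v: obj(wl, v))  # minimize over w_r
    return wl, wr

# Hypothetical convex objective; its exact minimizer is (1.6, -2.4).
obj = lambda wl, wr: (wl - 1) ** 2 + (wr + 2) ** 2 + 0.5 * wl * wr
wl, wr = coordinate_descent(obj)
```

For a jointly convex objective like this, alternating exact 1-d minimizations converges to the global minimizer, which is what makes the per-leaf update cheap.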


SLIDE 50

Results for boosted stumps

The certification problem can be solved exactly!

min_{‖δ‖_∞ ≤ ε} y F(x + δ)

Proof idea: the objective is separable over the input dimensions ⇒ just solve d simple one-dimensional optimization problems

As a result, the robust loss can also be calculated exactly:

max_{δ ∈ ∆_∞(ε)} L(y F(x + δ)) = L( min_{δ ∈ ∆_∞(ε)} y F(x + δ) )

Moreover, we also derive an efficient update of the ensemble.

⇒ an interesting result, since previously exact certification and robust optimization were known only for linear classifiers
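The separability argument translates directly into code. A sketch under assumptions (the stump format (coordinate, threshold, w_left, w_right) and the constant offset w0 are made up for illustration; this is not the paper's implementation):

```python
import numpy as np

# Exact certification for boosted stumps: the margin is a sum of terms,
# each depending on a single coordinate, so the min over the L_inf ball
# splits into d independent 1-d problems. Per coordinate, the sum of
# stumps is piecewise constant, so it suffices to evaluate it at the
# ball endpoints and on both sides of every threshold inside the ball.

def exact_min_margin(stumps, x, y, eps, w0=0.0):
    """min_{||delta||_inf <= eps} y * F(x + delta) for a stump ensemble.
    A stump (j, thr, wl, wr) outputs wl if x[j] <= thr else wr."""
    x = np.asarray(x, dtype=float)
    total = y * w0
    for j in range(len(x)):
        group = [s for s in stumps if s[0] == j]
        if not group:
            continue
        cands = {x[j] - eps, x[j] + eps}
        for _, thr, _, _ in group:
            if x[j] - eps <= thr <= x[j] + eps:
                cands.update({thr, np.nextafter(thr, np.inf)})
        total += min(
            sum(y * (wl if z <= thr else wr) for _, thr, wl, wr in group)
            for z in cands
        )
    return total

# Two stumps on two coordinates (made-up weights):
stumps = [(0, 0.5, -1.0, 1.0), (1, 0.2, 0.3, -0.3)]
print(exact_min_margin(stumps, x=[0.7, 0.0], y=1, eps=0.1))   # 1.3  (robust)
print(exact_min_margin(stumps, x=[0.7, 0.0], y=1, eps=0.25))  # -1.3 (not robust)
```

Each coordinate is handled in time linear in the number of its candidate points, which is how the exact d-dimensional certification stays tractable.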

SLIDE 51

Experiments


SLIDE 53

Experiments

We test our methods on various datasets, including some image classification datasets (to compare to the literature).

However, our methods are primarily suitable for tabular data


SLIDE 58

Boosted trees: results

Main metric: RTE (obtained via a mixed-integer solver)

Better RTE on 8/8 datasets compared to adversarial training (baseline) and Chen et al. (ICML 2019)

Adversarial training doesn’t work well for boosted trees (a conclusion that differs from the neural network literature)

The heuristic robust training of Chen et al. works better, but not as well as our approach

Note: the upper bounds (URTE) are remarkably close to RTE!

SLIDE 59

Multi-class comparison to provable defenses for CNNs

We outperform almost all provable defenses for CNNs, except one recent method (Gowal et al., 2018)!


SLIDE 61

Distribution of splitting thresholds

Robust training changes the threshold distribution dramatically!

Adversarial training also changes it, but still leaves non-robust splits


SLIDE 63

Adversarial examples for boosted trees

Models: normal, adversarially trained, and our robust boosted trees.

Adversarial training leads to adversarial examples with ‖δ‖_∞ < 0.3; our method consistently leads to ‖δ‖_∞ ≥ 0.3

SLIDE 64

Conclusions and outlook


SLIDE 69

Outlook

Our results put the provable defenses for CNNs into perspective ⇒ so far they have achieved only limited success

Shallow models (i.e. without a layer-wise structure) are easy to certify!

Lp-robustness for image data has found no applications so far

Tabular data matters and is ubiquitous; the real applications of Lp-robustness are rather to be found there.

Robust and interpretable models are needed!

SLIDE 70

Thanks for your attention! Questions?