Adversarial Attacks on Probabilistic Autoregressive Forecasting Models - PowerPoint PPT Presentation
ICML 2020
Raphaël Dang-Nhu, Gagandeep Singh, Pavol Bielik, Martin Vechev
Department of Computer Science, ETH Zürich
dangnhur@ethz.ch

Adversarial Attacks on Probabilistic Autoregressive Forecasting Models
Neural architectures with stochastic behavior

[Figure: (i) a probabilistic forecasting model and (ii) a Bayesian neural network]

- Multiple sources of noise: (i) at each timestep, (ii) in each weight¹
- The resulting output distribution is complex and is approximated via Monte-Carlo sampling

¹Blundell et al., Weight Uncertainty in Neural Networks, ICML 2015
Focus of this work: probabilistic forecasting models

- Stochastic sequence model
- Generates several prediction traces

[Figure: sampled prediction traces for a time series]

Traditionally used as a generative model: WaveNet for raw audio, handwriting generation
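As a concrete, heavily simplified illustration of trace generation, the sketch below samples several prediction traces from a toy autoregressive Gaussian model. The function name, random-walk transition, and noise scale are all illustrative assumptions; the paper's models use learned neural transitions instead.

```python
import numpy as np

def sample_traces(history, horizon, n_traces, scale=0.1, seed=0):
    """Draw Monte-Carlo prediction traces from a toy stochastic
    autoregressive model: each new value is sampled from a Gaussian
    centered at the previous value (a learned network would replace
    this transition in practice)."""
    rng = np.random.default_rng(seed)
    traces = np.empty((n_traces, horizon))
    last = np.full(n_traces, history[-1], dtype=float)
    for t in range(horizon):
        last = rng.normal(loc=last, scale=scale)  # stochastic step
        traces[:, t] = last
    return traces

traces = sample_traces([1.0, 1.1, 1.05], horizon=20, n_traces=100)
print(traces.shape)  # (100, 20)
```

Each row is one trace; because every step is sampled, different rows diverge from the same history, which is exactly the behavior the attack has to handle.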
Probabilistic forecasting models for decision-making²

- Allow predicting the volatility of the time series.
- Useful with a low signal-to-noise ratio.

Key idea: use the generated traces as Monte-Carlo samples to estimate the evolution of the time series.

Applications: stock prices, electricity consumption, business sales.

Integrated in Amazon SageMaker (DeepAR architecture).

²Salinas et al., DeepAR: Probabilistic forecasting with autoregressive recurrent networks, International Journal of Forecasting, 2020
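To make the Monte-Carlo idea concrete, the snippet below summarizes a set of sampled traces into a point forecast, a volatility estimate, and a prediction interval. The synthetic Gaussian traces and all variable names are illustrative stand-ins for real model output.

```python
import numpy as np

# Summarize sampled prediction traces of shape (n_traces, horizon):
# the empirical mean gives a point forecast, the empirical std a
# volatility estimate, and empirical quantiles a prediction interval.
rng = np.random.default_rng(1)
traces = rng.normal(loc=1.0, scale=0.2, size=(5000, 10))  # stand-in samples

mean_forecast = traces.mean(axis=0)               # per-step point forecast
volatility = traces.std(axis=0)                   # per-step volatility
lo, hi = np.quantile(traces, [0.1, 0.9], axis=0)  # 80% prediction interval

print(mean_forecast.shape, volatility.round(2))
```

This is why low signal-to-noise settings benefit: the traces carry the full predictive distribution, not just its mean.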
Contributions

- A new class of attack objectives based on output statistics.
- An adaptation of gradient-based adversarial attacks to these new attack objectives for stochastic models.
- Main technical aspect: developing estimators for propagating the objective gradient through the Monte-Carlo approximation.

We aim at providing an off-the-shelf methodology for these attacks.
Class of attack objectives

Stochastic model with input x and output y ∼ q_x(·). Previously considered attack objectives:

- Untargeted attacks on an information divergence D from the original predicted distribution:
  max_δ D(q_{x+δ} ∥ q_x)
- Untargeted/targeted attacks on the mean of the distribution:
  min_δ distance(E_{q_{x+δ}}[y], target)
Framework

We perform a targeted attack on a statistic χ(y) of the output. This corresponds to minimizing distance(E_{q_{x+δ}}[χ(y)], target).

Extensions:
- Bayesian setting q_x(y|z).
- Generalization to simultaneous attacks on several statistics.
- Statistics depending on x.
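Under simplifying assumptions (a reparametrized model whose per-sample gradients are available in closed form), the attack reduces to projected gradient descent on the Monte-Carlo estimate of the objective. The sketch below shows this loop for the mean statistic on a toy linear model; `sample_fn`, `grad_fn`, and all hyperparameters are illustrative assumptions, not the paper's setup.

```python
import numpy as np

def attack_mean(x, target, sample_fn, grad_fn, eps=0.5, steps=200,
                lr=0.01, n_mc=200, seed=0):
    """Minimize (E[y] - target)^2 over a perturbation delta with
    ||delta||_2 <= eps, estimating the expectation and its gradient
    with reparametrized Monte-Carlo samples."""
    rng = np.random.default_rng(seed)
    delta = np.zeros_like(x, dtype=float)
    for _ in range(steps):
        eta = rng.standard_normal(n_mc)       # reparametrization noise
        y = sample_fn(x + delta, eta)         # sampled outputs, shape (n_mc,)
        est = y.mean()                        # Monte-Carlo estimate of E[y]
        # chain rule: grad = 2 * (est - target) * E[dy/dx]
        g = 2.0 * (est - target) * grad_fn(x + delta, eta).mean(axis=0)
        delta -= lr * g                       # gradient step
        norm = np.linalg.norm(delta)          # project onto the eps-ball
        if norm > eps:
            delta *= eps / norm
    return delta

# Toy model: y = sum(x) + 0.1 * eta, so E[y] = sum(x) and dy/dx_j = 1.
sample_fn = lambda x, eta: x.sum() + 0.1 * eta
grad_fn = lambda x, eta: np.ones((len(eta), len(x)))
delta = attack_mean(np.zeros(5), target=0.2, sample_fn=sample_fn, grad_fn=grad_fn)
print(delta.sum())  # close to the target shift of 0.2
```

The projection step implements a bounded additive threat model; in the stochastic setting of the paper, the gradient of the expectation must itself be estimated, which is what the two estimators on the following slides provide.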
Motivation 1: option pricing in finance

Consider a stock with
- past prices x = (p_1, ..., p_{t−1})
- predicted future prices y = (p_t, ..., p_T).

Name                   χ(y)
European call option   max(0, y_h)
Asian call option      average_i(y_i)
Limit sell order       1[max_i y_i ≥ threshold]
Barrier option         y_h · 1[max_i y_i ≥ threshold]

Our framework allows us to specifically target one of these options.
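The payoffs in the table can be written directly as statistics over sampled price traces. The sketch below is an illustrative implementation, assuming traces of shape (n_traces, horizon), a maturity index `h`, and a `threshold`; the barrier payoff is reconstructed here as y_h gated by the threshold event.

```python
import numpy as np

def european_call(y, h):            # max(0, y_h)
    return np.maximum(0.0, y[:, h])

def asian_call(y):                  # average_i(y_i)
    return y.mean(axis=1)

def limit_sell(y, threshold):       # 1[max_i y_i >= threshold]
    return (y.max(axis=1) >= threshold).astype(float)

def barrier(y, h, threshold):       # y_h * 1[max_i y_i >= threshold]
    return y[:, h] * (y.max(axis=1) >= threshold)

# Two toy traces; the Monte-Carlo estimate of E[chi(y)] is then
# simply the mean of each statistic over the traces.
y = np.array([[1.0, 2.0, 1.5],
              [0.5, 0.4, 0.6]])
print(asian_call(y))       # [1.5 0.5]
print(limit_sell(y, 1.0))  # [1. 0.]
```

Targeting one option then means choosing the corresponding χ in the attack objective while leaving the others unchanged.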
Motivation 2: attacking model uncertainty

Some defenses use prediction uncertainty to detect adversarial examples. New attacks bypass these defenses by enforcing uncertainty constraints on the adversarial example. Our framework can express these constraints via:

- The entropy E_{q_x}[− log q(y|x)].
- The distribution's moments E_{q_x}[y^k].
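Both uncertainty statistics can be estimated from samples whenever the log-density is available. The sketch below does so for a toy Gaussian predictive distribution, where the entropy also has a closed form to check against; the Gaussian setup is purely illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
mu, sigma = 0.0, 2.0
y = rng.normal(mu, sigma, size=100_000)  # samples from q(y|x) = N(mu, sigma^2)

# Monte-Carlo estimate of the entropy E[-log q(y|x)]
log_q = -0.5 * ((y - mu) / sigma) ** 2 - np.log(sigma * np.sqrt(2 * np.pi))
entropy_est = -log_q.mean()

# Monte-Carlo estimate of the moment E[y^k] with k = 2
second_moment_est = (y ** 2).mean()

# Closed-form Gaussian entropy, 0.5 * log(2*pi*e*sigma^2), for comparison
entropy_true = 0.5 * np.log(2 * np.pi * np.e * sigma ** 2)
print(round(entropy_est, 2), round(entropy_true, 2))
```

An attacker constraining these estimates during optimization keeps the adversarial input's predictive uncertainty close to that of a clean input.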
Details about the estimators

Technical challenge

Gradient-based attacks require computing the gradient ∇_x E_{y∼q_x}[χ(y)]. The expectation and its gradient have no analytical closed form. We provide two different estimators to approximate the gradient.
Approach 1: REINFORCE

- Also known as the log-derivative trick and the score-function estimator.
- Based on interchanging expectation and derivative.

REINFORCE estimator:
∇_x E_{y∼q_x}[χ(y)] = E_{y∼q_x}[χ(y) ∇_x log q_x(y)] ≈ (1/n) Σ_{i=1}^n χ(y_i) ∇_x log q_x(y_i), with y_i ∼ q_x.
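On a toy problem where the true gradient is known, the estimator can be checked numerically. Below, q_x = N(x, 1) and χ(y) = y², so ∇_x E[χ(y)] = 2x and the score is ∇_x log q_x(y) = y − x; the whole setup is an illustrative stand-in for the forecasting model.

```python
import numpy as np

def reinforce_grad(x, n=200_000, seed=0):
    """REINFORCE estimate of d/dx E_{y~N(x,1)}[y^2] (true value: 2x)."""
    rng = np.random.default_rng(seed)
    y = rng.normal(x, 1.0, size=n)
    score = y - x                   # grad_x log N(y; x, 1)
    return (y ** 2 * score).mean()  # mean of chi(y) * score

print(reinforce_grad(1.5))  # close to 2 * 1.5 = 3.0
```

Note that χ never needs to be differentiated here, which is why REINFORCE also applies to non-differentiable statistics such as indicator payoffs.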
Approach 2: Reparametrization

- Mitigates the high variance of REINFORCE.
- Typically used for variational inference.
- Assumes a reparametrization y = g(x, η) with η ∼ p(η), where g is deterministic.

Reparametrization estimator:
∇_x E_{y∼q_x}[χ(y)] = E_{η∼p}[∇_x χ(g(x, η))] ≈ (1/n) Σ_{i=1}^n ∇_x χ(g(x, η_i)), with η_i ∼ p.
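On the same toy problem as for REINFORCE (q_x = N(x, 1), χ(y) = y², true gradient 2x), the reparametrization y = g(x, η) = x + η with η ∼ N(0, 1) gives ∇_x χ(g(x, η)) = 2(x + η). This setup is again purely illustrative.

```python
import numpy as np

def reparam_grad(x, n=200_000, seed=0):
    """Reparametrization estimate of d/dx E_{y~N(x,1)}[y^2]:
    with y = x + eta, grad_x chi(y) = 2 * (x + eta)."""
    rng = np.random.default_rng(seed)
    eta = rng.standard_normal(n)
    return (2.0 * (x + eta)).mean()

print(reparam_grad(1.5))  # close to 2 * 1.5 = 3.0
```

In this toy setting the per-sample variance is 4 regardless of x, far smaller than that of the REINFORCE samples on the same problem, which illustrates the variance-mitigation bullet above.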
Comparison

Respective advantages of the gradient estimators:

Property                                   REINFORCE   Reparametrization
Applies to non-differentiable statistics       ✔
Requires no reparametrization                  ✔
Applies to the Bayesian setting                ✔
Yields the best gradient estimates                             ✔

Detailed comparison and conditions in the paper!
Experimental evaluation
Experiments: stock prices

Algorithmic trading scenario, standard additive threat model, maximum Euclidean norm of 0.1 for the perturbation³.

- The attack is successful on 90% of test inputs.
- The network incurs a daily financial loss of 13%.

³Corresponds to perturbing one value by 10%, 10 values by 3.3%, 100 values by 1%.