An Abstract Domain for Certifying Neural Networks - PowerPoint PPT Presentation


SLIDE 1

An Abstract Domain for Certifying Neural Networks

Gagandeep Singh, Timon Gehr, Markus Püschel, Martin Vechev (Department of Computer Science, ETH Zürich)


SLIDE 2

Adversarial input perturbations

[Figure: an input image x₀ is classified as 8 by neural network f; a perturbed image x ∈ L∞(x₀, Ο΅) is classified as 7; a rotated image x ∈ Rotate(x₀, Ο΅, Ξ±, Ξ²) is classified as 9.]

SLIDE 3

Neural network robustness

Given:

  • Neural network f: ℝᡐ ⟢ ℝⁿ
  • Perturbation region β„›(x₀, Ο•):
      L∞(x₀, Ο΅): all images x where pixel values in x and x₀ differ by at most Ο΅
      Rotate(x₀, Ο΅, Ξ±, Ξ²): all images x in L∞(x₀, Ο΅) rotated by an angle ΞΈ ∈ [Ξ±, Ξ²]

Prove:

  • βˆ€x ∈ β„›(x₀, Ο•). f(x)_c > f(x)_j, where c is the correct output and j is any other output

Challenges:

  • The size of β„›(x₀, Ο•) grows exponentially in the number of pixels: we cannot compute f(x) for every x ∈ β„›(x₀, Ο•) separately

Prior work:

  • Precise but does not scale:
      SMT solving [CAV'17]
      Input refinement [USENIX'18]
      Semidefinite relaxations [ICLR'18]
  • Scales but imprecise:
      Linear relaxations [ICML'18]
      Abstract interpretation [S&P'18, NIPS'18]
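To make the scale of the problem concrete, here is a minimal Python sketch, not from the slides: the function name `f` and the 8-bit pixel discretization are assumptions. It states the property to prove and why pointwise enumeration is hopeless.

```python
import numpy as np

def linf_region_count(x0: np.ndarray, eps: float, levels: int = 256) -> int:
    """Number of discrete images in L_inf(x0, eps) for `levels`-level pixels:
    each pixel independently takes about 2*eps*(levels-1) + 1 values, so the
    count is exponential in the number of pixels."""
    per_pixel = 2 * int(eps * (levels - 1)) + 1
    return per_pixel ** x0.size

def is_robust_at(f, x: np.ndarray, c: int) -> bool:
    """The property to prove, checked at one point: the score of the
    correct class c beats the score of every other class j."""
    scores = f(x)
    return all(scores[c] > scores[j] for j in range(len(scores)) if j != c)

# For a 28x28 image and eps = 0.035 there are ~17 values per pixel,
# i.e. ~17**784 images in the region: pointwise checking cannot work.
```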


SLIDE 4

This work: contributions

A new abstract domain combining floating-point Polyhedra with Intervals:

  • custom transformers for common functions in neural networks such as affine transforms and ReLU, sigmoid, tanh, and maxpool activations
  • scalable and precise analysis

First approach to certify robustness under rotation combined with linear interpolation:

  • based on refinement of the abstract input
  • e.g., Ο΅ = 0.001, Ξ± = −45°, Ξ² = 65°

DeepPoly:

  • complete and parallelized end-to-end implementation based on ELINA
  • https://github.com/eth-sri/eran

Network                   Ο΅       NIPS'18               DeepPoly
6 layers, 3,010 units     0.035   proves 21% (15.8 s)   proves 64% (4.8 s)
6 layers, 34,688 units    0.3     proves 37% (17 s)     proves 43% (88 s)


SLIDE 5

Our Abstract Domain

Shape: associate a lower polyhedral constraint a_i^≤ and an upper polyhedral constraint a_i^β‰₯ with each neuron x_i:

  • less precise than Polyhedra, a restriction needed to ensure scalability
  • captures affine transformations precisely, unlike Octagon or TVPI
  • custom transformers for ReLU, sigmoid, tanh, and maxpool activations

Concretization of abstract element a: Ξ³(a) = { x ∈ ℝⁿ ∣ x satisfies a_i^≤ and a_i^β‰₯ for all i }

Domain invariant: store auxiliary concrete lower and upper bounds l_i, u_i for each x_i
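As a concrete picture of this shape, here is a minimal Python sketch of one abstract neuron. The field names are assumptions for illustration; this is not ELINA's API.

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class AbstractNeuron:
    lower_coeffs: np.ndarray  # a_i^<= : x_i >= lower_coeffs . x_prev + lower_bias
    lower_bias: float
    upper_coeffs: np.ndarray  # a_i^>= : x_i <= upper_coeffs . x_prev + upper_bias
    upper_bias: float
    l: float                  # concrete lower bound l_i (domain invariant)
    u: float                  # concrete upper bound u_i
```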


Transformer   Polyhedra      Our domain
Affine        O(n · m²)      O(w_max² · L)
ReLU          O(exp(n, m))   O(1)

n: #neurons, m: #constraints, w_max: max #neurons in a layer, L: #layers

SLIDE 6

Example: Analysis of a Toy Neural Network

[Figure: toy fully connected network.
Input layer: x₁, x₂ ∈ [−1, 1].
Hidden layers: x₃ = x₁ + x₂, x₄ = x₁ − x₂; x₅ = max(0, x₃), x₆ = max(0, x₄); x₇ = x₅ + x₆, xβ‚ˆ = x₅ − x₆; x₉ = max(0, x₇), x₁₀ = max(0, xβ‚ˆ).
Output layer: x₁₁ = x₉ + x₁₀ + 1, x₁₂ = x₁₀.]
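For reference, the same network written out as plain Python, a sketch based on the reconstruction above:

```python
def toy_net(x1: float, x2: float):
    """The toy network from the figure, as a plain function."""
    x3, x4 = x1 + x2, x1 - x2
    x5, x6 = max(0.0, x3), max(0.0, x4)
    x7, x8 = x5 + x6, x5 - x6
    x9, x10 = max(0.0, x7), max(0.0, x8)
    return x9 + x10 + 1, x10  # outputs (x11, x12)
```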


SLIDE 7

[Figure: step-by-step analysis of the toy network (same network as on SLIDE 6), annotated with the abstract constraints computed for each neuron.]


SLIDE 8

ReLU activation

[Figure: ReLU layer computing x₅ = max(0, x₃) and x₆ = max(0, x₄).]

Pointwise transformer for x_j := max(0, x_i) that uses l_i, u_i:

(a) if u_i ≤ 0: a_j^≤ = a_j^β‰₯ = 0, l_j = u_j = 0
(b) if l_i β‰₯ 0: a_j^≤ = a_j^β‰₯ = x_i, l_j = l_i, u_j = u_i
(c) if l_i < 0 and u_i > 0: the upper constraint is the line a_j^β‰₯ = u_i(x_i − l_i)/(u_i − l_i); for the lower constraint, choose a_j^≤ = 0 or a_j^≤ = x_i, whichever relaxation has the smaller area

Constant runtime.


SLIDE 9

Affine transformation after ReLU

[Figure: affine neuron x₇ = x₅ + x₆ after the ReLU layer.]

Imprecise upper bound u₇ obtained by substituting u₅ and u₆ for x₅ and x₆ in a₇^β‰₯
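In the running example this plays out as follows; the numbers are derived from the toy network reconstructed above, so treat this as a sketch:

```python
# Interval-only upper bound for x7 = x5 + x6 in the toy network:
u5 = u6 = 2.0   # upper bounds of the ReLU outputs for x3, x4 in [-2, 2]
u7 = u5 + u6    # 4.0: sound but loose; backsubstitution tightens it to 3.0
```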


SLIDE 10

Backsubstitution

[Figure: the constraint for x₇ = x₅ + x₆ is backsubstituted through the ReLU constraints for x₅, x₆ down to the inputs x₁, x₂.]
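A minimal Python sketch of the idea, under an assumed per-layer matrix representation (this is not ELINA's API): to bound an affine expression, substitute each variable with its upper or lower constraint, depending on the sign of its coefficient, layer by layer until reaching the inputs.

```python
import numpy as np

def backsub_upper(coeffs, bias, layers):
    """Upper-bound the expression coeffs . x^(K) + bias by substituting
    backwards through `layers`, a list of (Lw, lb, Uw, ub) per layer with
    Lw x^(k-1) + lb <= x^(k) <= Uw x^(k-1) + ub (row i constrains neuron i).
    Positive coefficients take the upper constraint, negative ones the lower."""
    for Lw, lb, Uw, ub in reversed(layers):
        pos = np.maximum(coeffs, 0.0)
        neg = np.minimum(coeffs, 0.0)
        bias = bias + pos @ ub + neg @ lb
        coeffs = pos @ Uw + neg @ Lw
    return coeffs, bias  # an expression over the input neurons

def concretize_upper(coeffs, bias, l_in, u_in):
    # evaluate the final expression over the input box [l_in, u_in]
    return bias + np.maximum(coeffs, 0.0) @ u_in + np.minimum(coeffs, 0.0) @ l_in
```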


SLIDE 11

Affine transformation with backsubstitution is pointwise; complexity: O(w_max² · L)

[Figure: backsubstitution of x₇ = x₅ + x₆ through x₅ = max(0, x₃), x₆ = max(0, x₄) and x₃ = x₁ + x₂, x₄ = x₁ − x₂ down to the inputs x₁, x₂.]


SLIDE 12

[Figure: the toy network with the abstract constraints for all neurons after the full analysis.]


SLIDE 13

Checking for robustness

Prove x₁₁ − x₁₂ > 0 for all inputs in [−1, 1] × [−1, 1].

Computing a lower bound for x₁₁ − x₁₂ from the concrete bounds, l₁₁ − u₁₂, gives −1, an imprecise result.

With backsubstitution, x₁₁ − x₁₂ = (x₉ + x₁₀ + 1) − x₁₀ = x₉ + 1 and x₉ β‰₯ 0, so one gets 1 as the lower bound for x₁₁ − x₁₂, proving robustness.
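To make the two bounds concrete, here is a hypothetical encoding of the toy network's relaxation, fed to the `backsub_upper` sketch from SLIDE 10 (using lower(e) = −upper(−e)). The layer matrices follow the relaxations chosen in the example, so they are a reconstruction, not the paper's code.

```python
import numpy as np

# Constraints of the toy network as (Lw, lb, Uw, ub) per layer, with the
# lower constraint 0 chosen for the crossing ReLUs.
layers = [
    # x3 = x1 + x2, x4 = x1 - x2 (exact affine)
    (np.array([[1., 1.], [1., -1.]]), np.zeros(2),
     np.array([[1., 1.], [1., -1.]]), np.zeros(2)),
    # ReLU on x3, x4 in [-2, 2]: 0 <= x5, x6 <= 0.5 * x + 1
    (np.zeros((2, 2)), np.zeros(2), 0.5 * np.eye(2), np.ones(2)),
    # x7 = x5 + x6, x8 = x5 - x6 (exact affine)
    (np.array([[1., 1.], [1., -1.]]), np.zeros(2),
     np.array([[1., 1.], [1., -1.]]), np.zeros(2)),
    # ReLU: x9 = x7 (x7 in [0, 3], always active); 0 <= x10 <= 0.5 * x8 + 1
    (np.array([[1., 0.], [0., 0.]]), np.zeros(2),
     np.array([[1., 0.], [0., 0.5]]), np.array([0., 1.])),
]

# x11 - x12 = x9 + 1 over (x9, x10); lower(e) = -upper(-e)
coeffs, bias = backsub_upper(np.array([-1., 0.]), -1.0, layers)
lower = -concretize_upper(coeffs, bias, np.array([-1., -1.]), np.array([1., 1.]))
print(lower)  # 1.0 > 0: robustness proved, vs. l11 - u12 = -1 with intervals
```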


SLIDE 14

More complex perturbations: rotations


Challenge: Rotate(x₀, Ο΅, Ξ±, Ξ²) is non-linear and cannot be captured in our domain, unlike L∞(x₀, Ο΅)

Solution: over-approximate Rotate(x₀, Ο΅, Ξ±, Ξ²) with boxes and use input refinement for precision

Result: prove robustness for networks under Rotate(x₀, 0.001, −45°, 65°)
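A minimal sketch of the refinement idea in Python; `rotate_interval` is an assumed helper, not part of the paper's tooling:

```python
import numpy as np

def rotate_region_boxes(x0, eps, alpha, beta, splits, rotate_interval):
    """Split [alpha, beta] into `splits` sub-ranges and over-approximate the
    images reachable in each sub-range with one box. `rotate_interval` is an
    assumed helper returning a per-pixel box (lo, hi) that contains every
    rotation of L_inf(x0, eps) by an angle in [a, b]."""
    edges = np.linspace(alpha, beta, splits + 1)
    return [rotate_interval(x0, eps, a, b) for a, b in zip(edges, edges[1:])]

# Robustness under Rotate(x0, eps, alpha, beta) follows if the analysis
# certifies every box; boxes that fail can be split further (refinement).
```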

SLIDE 15

More in the paper


  • Sigmoid transformer
  • Tanh transformer
  • Maxpool transformer
  • Floating point soundness

SLIDE 16

Experimental evaluation

  • Neural network architectures:
      fully connected feedforward (FFNN)
      convolutional (CNN)
  • Training:
      trained to be robust with DiffAI [ICML'18] and PGD [CVPR'18]
      without adversarial training
  • Datasets:
      MNIST
      CIFAR10
  • DeepPoly vs. state-of-the-art DeepZ [NIPS'18] and Fast-Lin [ICML'18]


SLIDE 17

Results


SLIDE 18

MNIST FFNN (3,010 hidden units)


SLIDE 19

CIFAR10 CNNs (4,852 hidden units)


SLIDE 20

Large Defended CNNs

trained via DiffAI [ICML’18]

Dataset   Model      #hidden units   Ο΅       %verified (DeepZ)   %verified (DeepPoly)   Avg runtime s (DeepZ)   Avg runtime s (DeepPoly)
MNIST     ConvBig    34,688          0.1     97                  97                     5                       50
MNIST     ConvBig    34,688          0.2     79                  78                     7                       61
MNIST     ConvBig    34,688          0.3     37                  43                     17                      88
MNIST     ConvSuper  88,500          0.1     97                  97                     133                     400
CIFAR10   ConvBig    62,464          0.006   50                  52                     39                      322
CIFAR10   ConvBig    62,464          0.008   33                  40                     46                      331


SLIDE 21

Conclusion

DeepPoly:

  • complete and parallelized end-to-end implementation based on ELINA
  • https://github.com/eth-sri/eran


A new abstract domain combining floating point Polyhedra with Intervals:

Transformer   Polyhedra      Our domain
Affine        O(n · m²)      O(w_max² · L)
ReLU          O(exp(n, m))   O(1)

n: #neurons, m: #constraints, w_max: max #neurons in a layer, L: #layers