Testing Deep Neural Networks, Xiaowei Huang, University of Liverpool (PowerPoint presentation)


SLIDE 1

Testing Deep Neural Networks

Xiaowei Huang, University of Liverpool

SLIDE 2

Outline

Safety Problem of AI
Verification (brief)
Testing
Conclusions and Future Works

SLIDE 3

Human-Level Intelligence

SLIDE 4

Robotics and Autonomous Systems

SLIDE 5

Deep neural networks

All of the above are implemented with deep neural networks.

SLIDE 6

Figure: safety in image classification networks

SLIDE 7

Figure: safety in natural language processing networks

SLIDE 8

Figure: safety in voice recognition networks

SLIDE 9

Figure: safety in security systems

SLIDE 10

Safety Definition: Human Driving vs. Autonomous Driving

Traffic image from “The German Traffic Sign Recognition Benchmark”

SLIDE 11

Safety Definition: Human Driving vs. Autonomous Driving

Image generated from our tool

SLIDE 12

Safety Problem: Incidents

SLIDE 13

Safety Definition: Illustration

SLIDE 14

Safety Requirements

◮ Pointwise Robustness (this talk)
  ◮ if the decision of a pair (input, network) is invariant with respect to perturbations of the input
◮ Network Robustness
  ◮ or, more fundamentally, Lipschitz continuity, mutual information, etc.
◮ model interpretability

SLIDE 15

Certification of DNN

https://github.com/TrustAI

SLIDE 16

Outline

Safety Problem of AI
Verification (brief)
Testing
Conclusions and Future Works

SLIDE 17

Safety Definition: Traffic Sign Example

SLIDE 18

Maximum Safe Radius

Definition

The maximum safe radius problem is to compute the minimum distance from the original input α to an adversarial example, i.e.,

MSR(α) = min_{α′ ∈ D} { ||α − α′||_k | α′ is an adversarial example }    (1)
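
Such a radius can be bounded from above by search: any adversarial example found at distance d witnesses MSR(α) ≤ d. A minimal sketch of this idea, where the linear `classify` function and the random-perturbation search are illustrative assumptions rather than the methods from the talk:

```python
import random

def classify(x):
    # toy stand-in for a trained network: a linear decision rule (assumption)
    return int(x[0] + x[1] > 1.0)

def msr_upper_bound(alpha, k=2, trials=2000, radius=1.0, seed=0):
    """Upper bound on MSR(alpha): the L_k distance to the closest
    randomly found input whose label differs from the original one."""
    rng = random.Random(seed)
    original = classify(alpha)
    best = float("inf")
    for _ in range(trials):
        delta = [rng.uniform(-radius, radius) for _ in alpha]
        candidate = [a + d for a, d in zip(alpha, delta)]
        if classify(candidate) != original:   # adversarial example found
            dist = sum(abs(d) ** k for d in delta) ** (1.0 / k)
            best = min(best, dist)
    return best
```

Random search only ever shrinks an upper bound; the verification approaches on the next slides aim at sound lower bounds as well.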

SLIDE 19

SLIDE 20

Existing Approaches

◮ layer-by-layer exhaustive search, see e.g. [2]1
◮ SMT, MILP, SAT based constraint solving, see e.g. [3]2
◮ global optimisation, see e.g. [6]3
◮ abstract interpretation, see e.g. [1]4

1Huang, Kwiatkowska, Wang, Wu, CAV2017
2Katz, Barrett, Dill, Julian, Kochenderfer, CAV2017
3Ruan, Huang, Kwiatkowska, IJCAI2018
4Gehr, Mirman, Drachsler-Cohen, Tsankov, Chaudhuri, Vechev, S&P2018

SLIDE 21

Outline

Safety Problem of AI
Verification (brief)
Testing
  Test Coverage Criteria
  Test Case Generation
Conclusions and Future Works

SLIDE 22

Deep Neural Networks (DNNs)

Figure: a fully connected DNN with input layer (v1,1, v1,2), hidden layers (n2,1 n2,2 n2,3 and n3,1 n3,2 n3,3), and output layer (u4,1, u4,2)

label = argmax_{1≤l≤s_K} u_{K,l}

SLIDE 23

Deep Neural Networks (DNNs)

Figure: the same fully connected DNN, with input layer (v1,1, v1,2), hidden layers (n2,1 n2,2 n2,3 and n3,1 n3,2 n3,3), and output layer (u4,1, u4,2)

label = argmax_{1≤l≤s_K} u_{K,l}

1) neuron activation value:

u_{k,i} = b_{k,i} + Σ_{1≤h≤s_{k−1}} w_{k−1,h,i} · v_{k−1,h}

a weighted sum plus a bias; w, b are the parameters learned

2) rectified linear unit (ReLU):

v_{k,i} = max{u_{k,i}, 0}

SLIDE 24

DNN as a program

. . .
// 1) neuron activation value
u_{k,i} = b_{k,i}
for (unsigned h = 1; h ≤ s_{k−1}; h += 1) {
  u_{k,i} += w_{k−1,h,i} · v_{k−1,h}
}
// 2) ReLU
v_{k,i} = 0
if (u_{k,i} > 0) {
  v_{k,i} = u_{k,i}
}
. . .
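
Run end to end over all layers, the same two steps give a complete forward pass. A minimal Python sketch, in which the 2-3-2 network and its weight values are made up for illustration:

```python
def forward(x, weights, biases):
    """Forward pass of a fully connected ReLU network.
    weights[k][h][i] connects neuron h of layer k to neuron i of layer k+1."""
    v = x
    for w, b in zip(weights, biases):
        u = [b[i] + sum(w[h][i] * v[h] for h in range(len(v)))
             for i in range(len(b))]      # 1) activation value
        v = [max(ui, 0.0) for ui in u]    # 2) ReLU
    return u, v  # pre- and post-activation values of the last layer

def label(x, weights, biases):
    u, _ = forward(x, weights, biases)
    return max(range(len(u)), key=lambda l: u[l])  # argmax over outputs

# a made-up 2-3-2 network
W = [[[1.0, -1.0, 0.5], [0.5, 1.0, -0.5]],
     [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]]
B = [[0.0, 0.0, 0.0], [0.0, 0.0]]
```

The classification label is taken as the argmax of the final pre-activation values, matching the slide's definition.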

SLIDE 25

Testing Framework

◮ Test Coverage Criteria
◮ Test Case Generation

SLIDE 26

Examples of Test Coverage Criteria

◮ Neuron coverage [5]5
◮ Neuron boundary coverage [4]6
◮ MC/DC for DNNs [8]7
◮ Lipschitz continuity

5Pei, Cao, Yang, Jana, SOSP2017
6Ma, Xu, Zhang, Sun, Xue, Li, Chen, Su, Li, Liu, Zhao, Wang, ASE2018
7Sun, Huang, Kroening, ASE2018

SLIDE 27

Neuron coverage

For any hidden neuron n_{k,i}, there exists a test case t ∈ T such that n_{k,i} is activated: u_{k,i} > 0.

Test coverage conditions: {∃x. u[x]_{k,i} > 0 | 2 ≤ k ≤ K − 1, 1 ≤ i ≤ s_k}

SLIDE 28

Neuron coverage

For any hidden neuron n_{k,i}, there exists a test case t ∈ T such that n_{k,i} is activated: u_{k,i} > 0.

Test coverage conditions: {∃x. u[x]_{k,i} > 0 | 2 ≤ k ≤ K − 1, 1 ≤ i ≤ s_k}

◮ ≈ statement (line) coverage

. . .
// 1) neuron activation value
u_{k,i} = b_{k,i}
for (unsigned h = 1; h ≤ s_{k−1}; h += 1) {
  u_{k,i} += w_{k−1,h,i} · v_{k−1,h}
}
// 2) ReLU
v_{k,i} = 0
if (u_{k,i} > 0) {
  v_{k,i} = u_{k,i}   ⇐ this line is covered
}
. . .
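
Measuring neuron coverage of a test suite then amounts to recording, per hidden neuron, whether any test drives its activation value above zero. A sketch over a toy one-hidden-layer network (the weights are made up):

```python
def hidden_activations(x, w, b):
    """Pre-activation values u of one hidden layer (weighted sum plus bias)."""
    return [b[i] + sum(w[h][i] * x[h] for h in range(len(x)))
            for i in range(len(b))]

def neuron_coverage(tests, w, b):
    """Fraction of hidden neurons activated (u > 0) by at least one test."""
    covered = set()
    for t in tests:
        for i, u in enumerate(hidden_activations(t, w, b)):
            if u > 0:
                covered.add(i)
    return len(covered) / len(b)

# a made-up 2-input, 2-neuron hidden layer
w = [[1.0, -1.0], [0.0, 1.0]]
b = [0.0, 0.0]
```

With these weights, the single test [1, 0] activates only the first neuron (coverage 0.5); adding [0, 1] activates the second as well (coverage 1.0), which illustrates why full coverage is easy to reach.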

SLIDE 29

Neuron Coverage

Problem of neuron coverage:

◮ too easy to reach 100% coverage

SLIDE 30

MC/DC in Software Testing

Developed by NASA, MC/DC has been widely adopted in, e.g., avionics software development guidance to ensure adequate testing of applications with the highest criticality.

Idea: if a choice can be made, all the possible factors (conditions) that contribute to that choice (decision) must be tested.

For traditional software, both the conditions and the decision are usually Boolean variables or Boolean expressions.

SLIDE 31

MC/DC Example

Example: the decision

d ⇔ ((a > 3) ∨ (b = 0)) ∧ (c = 4)    (2)

contains the three conditions (a > 3), (b = 0) and (c = 4). The following two test cases provide 100% condition coverage (i.e., all possibilities of the conditions are exploited):

1. (a > 3)=True, (b = 0)=True, (c = 4)=True, d = True
2. (a > 3)=False, (b = 0)=False, (c = 4)=False, d = False
SLIDE 32

MC/DC Example

Example: the decision

d ⇔ ((a > 3) ∨ (b = 0)) ∧ (c = 4)    (3)

contains the three conditions (a > 3), (b = 0) and (c = 4). The following six test cases provide 100% MC/DC coverage:

1. (a > 3)=True, (b = 0)=True, (c = 4)=True, d = True
2. (a > 3)=False, (b = 0)=False, (c = 4)=False, d = False
3. (a > 3)=False, (b = 0)=False, (c = 4)=True, d = False
4. (a > 3)=False, (b = 0)=True, (c = 4)=True, d = True
5. (a > 3)=False, (b = 0)=True, (c = 4)=False, d = False
6. (a > 3)=True, (b = 0)=False, (c = 4)=True, d = True
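
These test cases can be checked mechanically: each condition needs a pair of tests in which only that condition flips and the decision flips with it. A sketch of such a check (encoding each test case as a triple of condition truth values is an assumption of this sketch):

```python
from itertools import combinations

def decision(a_gt_3, b_eq_0, c_eq_4):
    # d <=> ((a > 3) or (b = 0)) and (c = 4)
    return (a_gt_3 or b_eq_0) and c_eq_4

# the six test cases from the slide, as (a > 3, b = 0, c = 4) truth values
tests = [
    (True, True, True), (False, False, False), (False, False, True),
    (False, True, True), (False, True, False), (True, False, True),
]

def mcdc_covered(tests):
    """Check condition independence: for every condition there is a pair of
    tests differing only in that condition, with differing decisions."""
    shown = set()
    for t1, t2 in combinations(tests, 2):
        diff = [i for i in range(3) if t1[i] != t2[i]]
        if len(diff) == 1 and decision(*t1) != decision(*t2):
            shown.add(diff[0])
    return shown == {0, 1, 2}
```

The six test cases pass this check, while the first two alone (which already give 100% condition coverage) do not.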
SLIDE 33

MC/DC for DNNs – General Idea

The core idea of our criteria is to ensure that not only the presence of a feature is tested, but also the effects of less complex features on a more complex feature.

Figure: a fully connected DNN with input (v1,1, v1,2), hidden neurons (n2,1 n2,2 n2,3 and n3,1 n3,2 n3,3), and output (v4,1, v4,2)

For example, check the impact of n2,1, n2,2, n2,3 on n3,1.

SLIDE 34

MC/DC for DNNs – Neuron Pair and Sign Change

A neuron pair (n_{k,i}, n_{k+1,j}) is a pair of neurons in adjacent layers k and k + 1 such that 1 ≤ k ≤ K − 1, 1 ≤ i ≤ s_k, and 1 ≤ j ≤ s_{k+1}.

(Sign Change of a neuron) Given a neuron n_{k,l} and two test cases x1 and x2, we say that the sign change of n_{k,l} is exploited by x1 and x2, denoted as sc(n_{k,l}, x1, x2), if sign(v_{k,l}[x1]) ≠ sign(v_{k,l}[x2]).

SLIDE 35

MC/DC for DNNs – Value Change and Distance Change

(Value Change of a neuron) Given a neuron nk,l and two test cases x1 and x2, we say that the value change of nk,l is exploited with respect to a value function g by x1 and x2, denoted as vc(g, nk,l, x1, x2), if g(uk,l[x1], uk,l[x2])=True .

SLIDE 36

MC/DC for DNNs – Sign-Sign Cover, or SS Cover

A neuron pair α = (n_{k,i}, n_{k+1,j}) is SS-covered by two test cases x1, x2, denoted as covSS(α, x1, x2), if the following conditions are satisfied by the network instances N[x1] and N[x2]:

◮ sc(n_{k,i}, x1, x2);
◮ ¬sc(n_{k,l}, x1, x2) for all n_{k,l} ∈ P_k \ {i};
◮ sc(n_{k+1,j}, x1, x2).
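
Given the activation values of two test cases, SS cover is a direct check. A sketch, where the activation tables v[layer][neuron] are made-up values and sign is taken as v > 0, matching the ReLU setting:

```python
def sc(v1, v2, k, l):
    """Sign change of neuron n_{k,l} between two test cases."""
    return (v1[k][l] > 0) != (v2[k][l] > 0)

def ss_covered(v1, v2, k, i, j):
    """SS cover of the pair (n_{k,i}, n_{k+1,j}): both change sign,
    while every other neuron in layer k keeps its sign."""
    if not sc(v1, v2, k, i) or not sc(v1, v2, k + 1, j):
        return False
    return all(not sc(v1, v2, k, l)
               for l in range(len(v1[k])) if l != i)

# made-up activation values for two test cases (two layers, 0-indexed)
v_x1 = [[0.5, -0.2, 0.3], [0.7, -0.1]]
v_x2 = [[-0.4, -0.3, 0.2], [-0.2, -0.5]]
```

Here only the first neuron of layer 0 flips sign between the two cases, so the pair with the first neuron of layer 1 (which also flips) is SS-covered, while the pair with the second is not.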

SLIDE 37

MC/DC for DNNs – Other Covering Methods

◮ Value-Sign Cover, or VS Cover
◮ Sign-Value Cover, or SV Cover
◮ Value-Value Cover, or VV Cover

SLIDE 38

Relation

MN denotes the neuron coverage metric; arrows represent the “weaker than” relation between metrics.

SLIDE 39

Activation Pattern8

◮ Given a concrete input x, N[x] corresponds to a linear model C
◮ C represents the set of inputs following the same activation pattern
◮ one DNN activation pattern corresponds to a program execution path
◮ traversal of all activation patterns ⇒ formal verification
◮ too many patterns: e.g., 2^{10,000} ...

8Sun, Huang, Kroening. “Testing Deep Neural Networks.” (2018)

SLIDE 40

Safety Coverage [10]9

Definition

Let each hyper-rectangle rec contain those inputs with the same ReLU pattern, i.e., for all x1, x2 ∈ rec we have sign(n_{k,l}, x1) = sign(n_{k,l}, x2) for all n_{k,l} ∈ H(N). A hyper-rectangle rec is safe-covered by a test case x, denoted as covS(rec, x), if x ∈ rec.

9Wicker, Huang, Kwiatkowska, TACAS2018

SLIDE 41

Relation

MS denotes the safety coverage metric

SLIDE 42

Safety Coverage

Problem of safety coverage:

◮ exponential number of hyper-rectangles to be covered

Therefore, our MC/DC based criteria strike a balance between intensive testing and computational feasibility (justified by the experimental results).

SLIDE 43

Relation with a few other criteria from [4]

◮ MMN: multi-section neuron coverage
◮ MNB: neuron boundary coverage
◮ MTN: top-k neuron coverage

SLIDE 44

What can we do?

◮ bug finding
◮ DNN safety statistics
◮ testing efficiency
◮ DNN internal structure analysis

SLIDE 45

Test Case Generation

◮ optimisation based (symbolic) approach
◮ concolic testing
◮ Monte Carlo tree search based input mutation testing

SLIDE 46

Optimisation based symbolic approach

Formalise the search for the next test case as an optimisation problem, which can then be solved by, e.g.,

◮ Linear Programming (LP) based, see e.g. [8]10
◮ Global Optimisation (GO) based, see e.g. [7]11

10Sun, Huang, Kroening. Testing Deep Neural Networks. https://arxiv.org/abs/1803.04792
11Sun, Wu, Ruan, Huang, Kwiatkowska, Kroening. Global Robustness Evaluation of Deep Neural Networks with Provable Guarantees for L0 Norm. http://cn.arxiv.org/abs/1805.00089

SLIDE 47

Concolic approach [9]12

Concolic testing: concrete execution + symbolic analysis

Figure: the concolic testing loop. From a seed input {t0} and the set R of test coverage conditions, a heuristic δ ranks R; for the top-ranked requirement, a pair (t, r) is selected, symbolic analysis produces a new input t′, and an oracle checks the generated inputs for adversarial examples.

12Sun, Wu, Ruan, Huang, Kwiatkowska, Kroening, ASE2018

SLIDE 48

Concrete execution (neuron coverage)

◮ The (t, r) pair is chosen by concrete executions such that, although the specified neuron is not activated by t, it is very close to being activated; intuitively, find the neuron that is closest to being activated.
◮ E.g., u_{k,i} = −1.0 is ranked higher than u_{k,j} = −100.0

SLIDE 49

Concrete execution (neuron coverage)

◮ The t, r pair is chosen by

concrete executions such that though the specified neuron is not activated by t, it should be really close to be activated. Intuitively, to find the neuron that is closest to be activated

◮ E.g., uk,i = −1.0 is ranked

higher than uk,j = −100.0

. . . // 1) neuron a c t i v a t i o n v a l u e uk,i = bk,i for (unsigned h = 0; h ≤ sk−1; h += 1) { uk,i += wk−1,h,i · vk−1,h } vk,i = 0 // 2) ReLU i f ( uk,i > 0 ) ⇐ not satisfied { vk,i = uk,i } . . .

◮ to select the branching

point that is most likely to be satisfied
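
The ranking heuristic above can be sketched as follows; the `activations` table standing in for the concrete executions is hypothetical:

```python
def rank_requirements(activations):
    """activations[t][(k, i)] = u_{k,i} observed on test t.
    Return (test, neuron) pairs for un-activated neurons, those
    closest to activation (largest u <= 0) first."""
    candidates = []
    for t, us in activations.items():
        for neuron, u in us.items():
            if u <= 0:                 # not activated by t
                candidates.append((u, t, neuron))
    candidates.sort(reverse=True)      # u = -1.0 ranks above u = -100.0
    return [(t, neuron) for u, t, neuron in candidates]

# made-up concrete-execution results for two tests
acts = {"t0": {(2, 1): -1.0, (2, 2): 3.0},
        "t1": {(2, 1): -100.0, (2, 2): -0.5}}
```

The top-ranked pair is then handed to the symbolic engine on the next slides.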

SLIDE 50

Symbolic execution (neuron coverage)

Given t, r, find a new input t′ such that r is satisfied:

{ u′_{k,i} > 0 ∧ ∀k1 < k : ⋀_{0≤i1≤s_{k1}} ap′_{k1,i1} = ap[t]_{k1,i1} }

SLIDE 51

Symbolic execution (neuron coverage)

Given t, r, find a new input t′ such that r is satisfied:

{ u′_{k,i} > 0 ∧ ∀k1 < k : ⋀_{0≤i1≤s_{k1}} ap′_{k1,i1} = ap[t]_{k1,i1} }

∧ min ||t′ − t||_p

SLIDE 52

Symbolic execution (neuron coverage)

Given t, r, find a new input t′ such that r is satisfied:

{ u′_{k,i} > 0 ∧ ∀k1 < k : ⋀_{0≤i1≤s_{k1}} ap′_{k1,i1} = ap[t]_{k1,i1} }

∧ min ||t′ − t||_p ⇒ the symbolic engine

SLIDE 53

Symbolic execution (neuron coverage)

Given t, r, find a new input t′ such that r is satisfied:

{ u′_{k,i} > 0 ∧ ∀k1 < k : ⋀_{0≤i1≤s_{k1}} ap′_{k1,i1} = ap[t]_{k1,i1} }

∧ min ||t′ − t||_p ⇒ the symbolic engine

◮ The CPLEX Linear Programming (LP) solver13
  ◮ L∞-norm: maximum difference among all pixels
◮ The global optimisation method14
  ◮ L0-norm: the number of pixels that have been changed

13Sun, Huang, Kroening. Testing Deep Neural Networks. https://arxiv.org/abs/1803.04792
14Sun, Wu, Ruan, Huang, Kwiatkowska, Kroening. Global Robustness Evaluation of Deep Neural Networks with Provable Guarantees for L0 Norm. http://cn.arxiv.org/abs/1805.00089

SLIDE 54

Comparison with DeepXplore

           DeepConcolic         DeepXplore
           L∞-norm  L0-norm     light   occlusion  blackout
MNIST      97.89%   97.24%      80.5%   82.5%      81.6%
CIFAR-10   89.59%   99.69%      77.9%   86.8%      89.5%

SLIDE 55

Monte Carlo tree search based test case generation [10]15

15Wicker, Huang, Kwiatkowska, TACAS2018

SLIDE 56

Pixel Manipulation

Define pixel manipulations δ_{X,i} : D → D, for X ⊆ P0 a subset of input dimensions and i ∈ I:

δ_{X,i}(α)(x, y, z) =
  α(x, y, z) + τ, if (x, y) ∈ X and i = +
  α(x, y, z) − τ, if (x, y) ∈ X and i = −
  α(x, y, z),     otherwise
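
The manipulation δ_{X,i} translates directly into code. A sketch in which an image is represented as a map from (x, y, z) to intensity (this representation, and the sample values, are assumptions of the sketch):

```python
def delta(alpha, X, i, tau=0.1):
    """Pixel manipulation delta_{X,i}: add (i == '+') or subtract
    (i == '-') tau on all channels of pixels (x, y) in X; leave the rest."""
    out = {}
    for (x, y, z), val in alpha.items():
        if (x, y) in X and i == "+":
            out[(x, y, z)] = val + tau
        elif (x, y) in X and i == "-":
            out[(x, y, z)] = val - tau
        else:
            out[(x, y, z)] = val
    return out

# a made-up 1x2 single-channel image
img = {(0, 0, 0): 0.5, (0, 1, 0): 0.3}
```

These manipulations are the moves available to the players in the game on the next slide.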
SLIDE 57

Safety Testing as Two-Player Turn-based Game

SLIDE 58

Rewards under Strategy Profile σ = (σ1, σ2)

◮ For terminal nodes ρ ∈ Path^F_I:

R(σ, ρ) = 1 / sev_α(α′_ρ)

where sev_α(α′) is the severity of an image α′ compared to the original image α

◮ For non-terminal nodes, simply compute the reward by applying the suitable strategy σ_i to the rewards of the children nodes

SLIDE 59

Players’ Objectives

The goal of the game is for player I to choose a strategy σI to maximise the reward R((σI, σII), s0) of the initial state s0, based on the strategy σII of player II, i.e.,

arg max_{σI} opt_{σII} R((σI, σII), s0)    (4)

where the option opt_{σII} can be max_{σII}, min_{σII}, or nat_{σII}, according to whether player II acts as a cooperator, an adversary, or nature, who samples the distribution G(Λ(α)) for pixels and randomly chooses the manipulation instruction.

SLIDE 60

Outline

Safety Problem of AI
Verification (brief)
Testing
Conclusions and Future Works

SLIDE 61

Conclusions and Future Works

◮ Conclusions
  ◮ Testing DNNs is a one-year-old baby.
  ◮ It has attracted attention from both academia and industry.
  ◮ Both the criteria and the test case generation need further validation.
◮ Future Works
  ◮ safety problems other than robustness
  ◮ DNN-specific criteria, to complement the existing ones which borrow ideas from traditional software engineering
  ◮ more light-weight test case generation algorithms
  ◮ ...

SLIDE 62

SLIDE 63

Reference

[1] T. Gehr, M. Mirman, D. Drachsler-Cohen, P. Tsankov, S. Chaudhuri, and M. Vechev. AI2: Safety and robustness certification of neural networks with abstract interpretation. In 2018 IEEE Symposium on Security and Privacy (SP), pages 948–963, 2018.
[2] Xiaowei Huang, Marta Kwiatkowska, Sen Wang, and Min Wu. Safety verification of deep neural networks. In CAV 2017, pages 3–29, 2017.
[3] Guy Katz, Clark Barrett, David Dill, Kyle Julian, and Mykel Kochenderfer. Reluplex: An efficient SMT solver for verifying deep neural networks. In CAV 2017, 2017.
[4] L. Ma, F. Juefei-Xu, F. Zhang, J. Sun, M. Xue, B. Li, C. Chen, T. Su, L. Li, Y. Liu, J. Zhao, and Y. Wang. DeepGauge: Multi-granularity testing criteria for deep learning systems. ArXiv e-prints, March 2018.
[5] Kexin Pei, Yinzhi Cao, Junfeng Yang, and Suman Jana. DeepXplore: Automated whitebox testing of deep learning systems. CoRR, abs/1705.06640, 2017.
[6] Wenjie Ruan, Xiaowei Huang, and Marta Kwiatkowska. Reachability analysis of deep neural networks with provable guarantees. In IJCAI 2018, 2018.
[7] Wenjie Ruan, Min Wu, Youcheng Sun, Xiaowei Huang, Daniel Kroening, and Marta Kwiatkowska. Global robustness evaluation of deep neural networks with provable guarantees for L0 norm. CoRR, abs/1804.05805, 2018.
[8] Youcheng Sun, Xiaowei Huang, and Daniel Kroening. Testing deep neural networks. arXiv:1803.04792, 2018.
[9] Youcheng Sun, Min Wu, Wenjie Ruan, Xiaowei Huang, Marta Kwiatkowska, and Daniel Kroening. Concolic testing for deep neural networks. CoRR, abs/1805.00089, 2018.
[10] Matthew Wicker, Xiaowei Huang, and Marta Kwiatkowska. Feature-guided black-box safety testing of deep neural networks. In TACAS 2018.