
Not all Neurons are created equal: Towards a feature level Deep Neural Network Test Coverage Metric

Nils Wenzler - CSC2125: Topics in Software Engineering, Winter 2019


  1. Not all Neurons are created equal: Towards a feature level Deep Neural Network Test Coverage Metric Nils Wenzler - CSC2125: Topics in Software Engineering Winter 2019

  2. Problem DNN

  3. Problem Does it work? Does it really work? DNN

  4. Problem Steer left! DNN

  5. Problem Steer right! DNN

  6. Problem Go straight! DNN

  7. Problem Did I test it enough? Did I test it in the right way? DNN

  8. Structure 1. Problem 2. Current DNN Test Coverage Metrics 3. α-Bin Coverage 4. Practical Evaluation

  9. General Approach Use a test coverage metric for • Building test suites that • Cover all significant behaviours of a deep neural network Not a proof of correctness but evidence towards correctness!

  10. Current DNN Test Coverage Metrics

  11. Current DNN Test Coverage Metrics • High research interest • White-box testing • Focused on single neurons

  12. Current DNN Test Coverage Metrics $\mathit{low}_n$: lowest output value during training. $\mathit{high}_n$: highest output value during training.

  13. Current DNN Test Coverage Metrics [Figure: a neuron's output range between $\mathit{low}_n$ and $\mathit{high}_n$, illustrating Neuron Coverage (0.2), k-multisection Neuron Coverage (k=6), Neuron Boundary Coverage and Strong Neuron Activation Coverage]
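The four metrics named on this slide can be sketched in a few lines. This is a minimal illustration that follows the commonly cited definitions of these metrics, not code from the talk: the slides do not spell the definitions out, and the function names, the use of raw (unscaled) outputs, and the flat one-row-per-test-input layout are all assumptions.

```python
from typing import List, Sequence, Set, Tuple

Rows = Sequence[Sequence[float]]  # one row of neuron outputs per test input


def neuron_coverage(rows: Rows, threshold: float) -> float:
    """Share of neurons whose output exceeds `threshold` for at least one input."""
    n = len(rows[0])
    return sum(any(row[j] > threshold for row in rows) for j in range(n)) / n


def k_multisection_coverage(rows: Rows, low: List[float], high: List[float], k: int) -> float:
    """Share of the k equal sections of [low_n, high_n] hit, over all neurons."""
    covered: Set[Tuple[int, int]] = set()
    for row in rows:
        for j, value in enumerate(row):
            span = high[j] - low[j]
            if span > 0 and low[j] <= value <= high[j]:
                section = min(int((value - low[j]) / span * k), k - 1)
                covered.add((j, section))
    return len(covered) / (k * len(low))


def _corner_neurons(rows: Rows, low: List[float], high: List[float]):
    """Neurons seen above high_n (upper) and below low_n (lower)."""
    upper = {j for row in rows for j, v in enumerate(row) if v > high[j]}
    lower = {j for row in rows for j, v in enumerate(row) if v < low[j]}
    return upper, lower


def neuron_boundary_coverage(rows: Rows, low: List[float], high: List[float]) -> float:
    """Share of the 2 * N corner regions (below low_n, above high_n) that were hit."""
    upper, lower = _corner_neurons(rows, low, high)
    return (len(upper) + len(lower)) / (2 * len(low))


def strong_neuron_activation_coverage(rows: Rows, low: List[float], high: List[float]) -> float:
    """Share of neurons that were pushed above high_n at least once."""
    upper, _ = _corner_neurons(rows, low, high)
    return len(upper) / len(low)
```

Every neuron carries the same weight in each of these ratios, which is the property the following slides question.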

  14. Structure 1. Problem 2. Current DNN Test Coverage Metrics 3. α-Bin Coverage 4. Practical Evaluation

  15. Yet another metric? Less than 1 ‰ of total coverage metric! [Figure: number of neurons per layer in AlexNet]

  16. Not all Neurons are created equal Current metrics put equal emphasis on each neuron, but: Is a first layer neuron as important as an output layer neuron? Make use of domain specific knowledge concerning layer architectures!

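  17. [Figure: a neuron's output range between $\mathit{low}_n$ and $\mathit{high}_n$, illustrating Neuron Coverage (0.2), k-multisection Neuron Coverage (k=6), Neuron Boundary Coverage, Strong Neuron Activation Coverage, and Bin Coverage (number of bins dependent on layer)]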

  18. α-Bin Coverage Equally distribute so-called bins throughout layers. Each layer contributes approximately the same share to the coverage metric. [Figure: k-multisection Neuron Coverage (k=6) compared with Bin Coverage, where the number of bins depends on the layer]

  19. α-Bin Coverage Let $L_i$ denote the number of neurons in layer $i$. Let $L_{max}$ be the maximum of all $L_i$. Let $\alpha \in (0, \infty]$. The minimum number of bins per layer for α-Bin Coverage is defined as: $\mathit{Bins} = L_{max} \cdot \alpha$. The number of bins per neuron in layer $i$ is defined as: $k_i = \frac{\mathit{Bins}}{L_i}$.
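The definition above can be turned into a short calculation. The sketch below is an illustration only: the function name is hypothetical, and rounding the quotient up to a whole number of bins is an assumption (the slide gives only the fraction), chosen so that every layer ends up with at least the stated minimum number of bins.

```python
import math
from typing import List


def bins_per_neuron(layer_sizes: List[int], alpha: float) -> List[int]:
    """Number of bins for each neuron of every layer under alpha-Bin Coverage.

    layer_sizes[i] is the neuron count of layer i.
    Rounding up is an assumption, not stated on the slide.
    """
    if alpha <= 0:
        raise ValueError("alpha must be positive")
    min_bins_per_layer = max(layer_sizes) * alpha
    return [math.ceil(min_bins_per_layer / size) for size in layer_sizes]


# Example: a wide first layer, a narrower hidden layer and a single output neuron.
sizes = [1000, 100, 1]
per_neuron = bins_per_neuron(sizes, alpha=0.5)
print(per_neuron)  # [1, 5, 500]
print([k * n for k, n in zip(per_neuron, sizes)])  # bins per layer: [1000, 500, 500]
```

In this toy example the single output neuron gets 500 bins while each first-layer neuron gets one, so the narrow layers are no longer drowned out by the wide ones.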

  20. Structure 1. Problem 2. Current DNN Test Coverage Metrics 3. α-Bin Coverage 4. Practical Evaluation

  21. Practical Evaluation The main questions: 1. Can α-Bin Coverage be implemented in a practically feasible way? 2. Can α-Bin Coverage be optimized with a greedy search approach? 3. How does α-Bin Coverage relate to other DNN coverage metrics? 4. Can α-Bin Coverage be used to find wrong behaviours?

  22. Practical Evaluation The main questions: 1. Can α-Bin Coverage be implemented in a practically feasible way? 2. Can α-Bin Coverage be optimized with a greedy search approach? 3. How does α-Bin Coverage relate to other DNN coverage metrics? 4. Can α-Bin Coverage be used to find wrong behaviours?

  23. Practically feasible? Test setup (1/2): • 10-layer DNN inspired by the NVIDIA End to End approach, using ReLU • Trained on 45,500 publicly available labeled images • Implemented in Python using TensorFlow

  24. Practically feasible? Test setup (2/2): • Created a greedy optimizer that uses image transforms to optimize the coverage metric • Compare the behaviour of α-Bin Coverage & Neuron Coverage

  25. Performance Greedy search: Determine $\mathit{low}_n$ and $\mathit{high}_n$ → Select random image → Add transforms to image → Evaluate coverage → Add image to test suite (iterate on transforms). Determining $\mathit{low}_n$ and $\mathit{high}_n$ only needs to be done once and can be approximated through random sampling. Calculating α-Bin Coverage incrementally: constant time (dependent on network size).
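One way to read the two performance remarks is sketched below: the per-neuron ranges are estimated once from a random sample of inputs, and the set of covered bins is then updated per image at a cost that depends only on the number of neurons, not on the size of the test suite. The class and function names, and the flat list of neuron outputs across all layers, are assumptions made for illustration; the talk's TensorFlow implementation is not shown in the slides.

```python
from typing import List, Sequence, Set, Tuple


def estimate_ranges(samples: Sequence[Sequence[float]]) -> Tuple[List[float], List[float]]:
    """Approximate low_n / high_n per neuron from a random sample of outputs."""
    columns = list(zip(*samples))
    return [min(c) for c in columns], [max(c) for c in columns]


class BinCoverageTracker:
    """Keeps the set of (neuron, bin) pairs hit so far."""

    def __init__(self, low: Sequence[float], high: Sequence[float], bins: Sequence[int]):
        self.low, self.high, self.bins = list(low), list(high), list(bins)
        self.covered: Set[Tuple[int, int]] = set()
        self.total = sum(self.bins)

    def _bins_hit(self, outputs: Sequence[float]) -> Set[Tuple[int, int]]:
        hit = set()
        for n, value in enumerate(outputs):
            span = self.high[n] - self.low[n]
            if span <= 0 or not (self.low[n] <= value <= self.high[n]):
                continue  # outside the observed range: no bin is covered
            index = min(int((value - self.low[n]) / span * self.bins[n]), self.bins[n] - 1)
            hit.add((n, index))
        return hit

    def gain(self, outputs: Sequence[float]) -> int:
        """Number of new bins this input would cover (one pass over the neurons)."""
        return len(self._bins_hit(outputs) - self.covered)

    def add(self, outputs: Sequence[float]) -> None:
        """Record the bins covered by this input."""
        self.covered |= self._bins_hit(outputs)

    @property
    def coverage(self) -> float:
        return len(self.covered) / self.total
```

A greedy optimizer can call `gain` on each candidate image and `add` only on the images it keeps, so coverage never has to be recomputed over the whole suite.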

  26. Greedy search: Transforms Transformations: Translation, Brightness, Contrast, Blur

  27. Practical Evaluation The main questions: 1. Can α-Bin Coverage be implemented in a practically feasible way? 2. Can α-Bin Coverage be optimized with a greedy search approach? 3. How does α-Bin Coverage relate to other DNN coverage metrics? 4. Can α-Bin Coverage be used to find wrong behaviours?

  28. Greedy Optimization: Bin Coverage

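  29. Greedy Optimization: Bin Coverage ReLU activations: Neuron Boundary Coverage is practically limited to 50% (a ReLU output cannot fall below 0, so for a neuron whose lowest training output is 0 the lower boundary can never be crossed)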

  30. Greedy Optimization: Bin Coverage Obtain 74% 0.05-Bin Coverage with ~220 images

  31. Greedy Optimization: Neuron Coverage

  32. Neuron Coverage Optimization: Layer View

  33. Neuron Coverage Optimization: Layer View Output layer is "fully tested" for an image with a steering angle > 11.5°

  34. Bin Coverage Optimization: Layer View

  35. Bin Coverage Optimization: Layer View Output layer is "fully tested" after testing 3656 images which correspond to 0.2° steps in -360° to +360°

  36. Practical Evaluation The main questions: 1. Can α-Bin Coverage be implemented in a practically feasible way? 2. Can α-Bin Coverage be optimized with a greedy search approach? 3. How does α-Bin Coverage relate to other DNN coverage metrics? 4. Can α-Bin Coverage be used to find wrong behaviours?

  37. Deviation from target labels in test suite Example: transformed image, output: 234°, target: 160°
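Checking a generated test suite for such deviations could look roughly like the sketch below. The function name, the tuple layout and the tolerance value are hypothetical; the slides do not say which threshold was used to call an output wrong.

```python
from typing import List, Tuple

Case = Tuple[str, float, float]  # (image name, predicted angle, target angle) in degrees


def find_deviations(cases: List[Case], tolerance_deg: float) -> List[Tuple[str, float]]:
    """Return (name, absolute deviation) for every case outside the tolerance."""
    flagged = []
    for name, output, target in cases:
        deviation = abs(output - target)
        if deviation > tolerance_deg:
            flagged.append((name, deviation))
    return flagged


cases = [
    ("transformed", 234.0, 160.0),  # the example from the slide
    ("untransformed", 161.0, 160.0),  # made-up case for contrast
]
print(find_deviations(cases, tolerance_deg=10.0))  # [('transformed', 74.0)]
```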

  38. Conclusions • Current DNN test coverage metrics treat all neurons equally • This introduces an intrinsic focus on the neurons of low layers in modern architectures • α-Bin Coverage is a practically feasible approach to equally distribute a test coverage metric over all layers • First evidence shows that α-Bin Coverage can be used for finding erroneous behaviours and creating test suites automatically

  39. Let's discuss! Some points to consider: • Only one model in evaluation • Limited number of test runs • Only one domain • Why greedy search? • What is this strange α value? Why do we need it? • How about classification tasks?

  40. Greedy search Stack transformations on randomly selected images to optimize the coverage metric. Add an image to the test suite if it significantly increases the coverage metric. Transformations: Translation, Brightness, Contrast, Blur
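The greedy search described on this slide can be sketched as follows. This is a simplified illustration under several assumptions: images are flat lists of pixel intensities in [0, 1], the four transforms are toy stand-ins for real image operations, `covered_by` is a hypothetical callback that returns the coverage items (for example (neuron, bin) pairs) an image hits, and "significantly increases" is modelled as a minimum number of newly covered items.

```python
import random
from typing import Callable, List, Set, Tuple

Image = List[float]
Item = Tuple[int, int]


def _clamp(p: float) -> float:
    return min(1.0, max(0.0, p))


def brightness(img: Image) -> Image:
    return [_clamp(p + 0.1) for p in img]


def contrast(img: Image) -> Image:
    return [_clamp((p - 0.5) * 1.2 + 0.5) for p in img]


def translate(img: Image) -> Image:
    return img[-1:] + img[:-1]  # shift by one pixel, wrapping around


def blur(img: Image) -> Image:
    last = len(img) - 1
    return [(img[max(i - 1, 0)] + img[i] + img[min(i + 1, last)]) / 3 for i in range(len(img))]


TRANSFORMS = [translate, brightness, contrast, blur]


def greedy_search(
    images: List[Image],
    covered_by: Callable[[Image], Set[Item]],
    min_gain: int,
    steps: int,
    rng: random.Random,
) -> Tuple[List[Image], Set[Item]]:
    """Build a test suite by keeping transformed images that add enough coverage."""
    suite: List[Image] = []
    covered: Set[Item] = set()
    for _ in range(steps):
        candidate = list(rng.choice(images))
        for _ in range(rng.randint(1, 3)):  # stack a few random transforms
            candidate = rng.choice(TRANSFORMS)(candidate)
        new_items = covered_by(candidate) - covered
        if len(new_items) >= min_gain:  # keep only images with a significant gain
            suite.append(candidate)
            covered |= new_items
    return suite, covered
```

In a real setup `covered_by` would run the network on the image and map its neuron outputs to bins, and the transforms would come from an image library.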
