Incremental and Approximate Inference for Faster Occlusion-based Deep CNN Explanations
Supun Nakandala, Arun Kumar, and Yannis Papakonstantinou (University of California, San Diego)
*Simplified representation of a CNN: an input image (a 3-D array) passes through a series of convolution layer transformations; each convolution layer maps an input 3-D array to an output 3-D array, and the network finally predicts a class probability, e.g., P(pneumonia).
Source: http://blog.qure.ai/notes/visualizing_deep_learning
E.g., Inception3: 35 MFLOPS, ResNet152: 65 MFLOPS
*MFLOPS: Mega Floating Point Operations
*IVM: Incremental View Maintenance *MQO: Multi-Query Optimization
*AQP: Approximate Query Processing
Recap (*simplified representation of a CNN): the input image (a 3-D array) passes through a series of convolution layer transformations to predict a class probability such as P(pneumonia); each convolution layer maps an input 3-D array to an output 3-D array.
Inside a convolution layer: the input 3-D array X is convolved with a set of learned 3-D filter kernels K1, K2, …, Kn. Each kernel Ki slides over X; at every position, the element-wise product of Ki with the local input window X′ is summed, SUM(Ki ∘ X′), to produce one value in a 2-D slice of the output. Stacking the n slices gives the output 3-D array.
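The SUM(Ki ∘ X′) computation above can be sketched in plain Python for a single 2-D slice (stride 1, zero padding; a minimal illustration I'm adding for clarity, not Krypton's implementation):

```python
def conv2d_same(X, K):
    """Convolve a 2-D input X (H x W) with an odd-sized kernel K (f x f),
    stride 1 and zero padding, so the output has the same H x W shape.
    Each output value is SUM(K o X'): the element-wise product of K with
    the local input window X' centered at that position, summed up."""
    H, W = len(X), len(X[0])
    f = len(K)
    r = f // 2  # kernel radius
    out = [[0.0] * W for _ in range(H)]
    for y in range(H):
        for x in range(W):
            s = 0.0
            for dy in range(-r, r + 1):
                for dx in range(-r, r + 1):
                    yy, xx = y + dy, x + dx
                    if 0 <= yy < H and 0 <= xx < W:  # zero padding outside
                        s += X[yy][xx] * K[dy + r][dx + r]
            out[y][x] = s
    return out

# Toy input with pixel value x + 4*y and one all-ones 3x3 kernel.
X = [[x + 4 * y for x in range(4)] for y in range(4)]
K1 = [[1.0] * 3 for _ in range(3)]
print(conv2d_same(X, K1)[1][1])  # -> 45.0 (sum of the interior 3x3 window)
```

A real layer repeats this for every kernel Ki and every depth slice, summing over the depth dimension as well.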
Input: A(X, Y, Z, V)    Filter Kernels: K(X, Y, Z, N, V)

SELECT A_K.X AS X, A_K.Y AS Y, K.N AS Z, SUM(A_K.V * K.V) AS V
FROM K,
     (SELECT T.X AS X, T.Y AS Y, A.Z AS Z, A.V AS V,
             A.X - T.X + FW/2 AS KX,
             A.Y - T.Y + FH/2 AS KY
      FROM A, A AS T
      WHERE ABS(A.X - T.X) <= FW/2
        AND ABS(A.Y - T.Y) <= FH/2) AS A_K
WHERE A_K.KX = K.X AND A_K.KY = K.Y AND A_K.Z = K.Z
GROUP BY A_K.X, A_K.Y, K.N
*FW: Filter Width, FH: Filter Height
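As a sanity check, the schema and query above can be exercised end-to-end with SQLite (a toy check I'm adding, not from the talk; FW = FH = 3 are substituted as integer literals, and zero padding falls out of the missing border terms):

```python
import sqlite3

FW = FH = 3  # filter width and height (odd, so FW // 2 centers the kernel)

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE A (X INT, Y INT, Z INT, V REAL)")
cur.execute("CREATE TABLE K (X INT, Y INT, Z INT, N INT, V REAL)")

# 4x4 single-channel input with pixel value V = X + 4*Y
for y in range(4):
    for x in range(4):
        cur.execute("INSERT INTO A VALUES (?, ?, 0, ?)", (x, y, x + 4 * y))

# One 3x3 all-ones filter kernel (N = 0)
for ky in range(FH):
    for kx in range(FW):
        cur.execute("INSERT INTO K VALUES (?, ?, 0, 0, 1.0)", (kx, ky))

# Convolution-as-a-query; SQLite's integer division makes 3/2 == 1.
query = f"""
SELECT A_K.X AS X, A_K.Y AS Y, K.N AS Z, SUM(A_K.V * K.V) AS V
FROM K,
     (SELECT T.X AS X, T.Y AS Y, A.Z AS Z, A.V AS V,
             A.X - T.X + {FW}/2 AS KX,
             A.Y - T.Y + {FH}/2 AS KY
      FROM A, A AS T
      WHERE ABS(A.X - T.X) <= {FW}/2
        AND ABS(A.Y - T.Y) <= {FH}/2) AS A_K
WHERE A_K.KX = K.X AND A_K.KY = K.Y AND A_K.Z = K.Z
GROUP BY A_K.X, A_K.Y, K.N
"""
out = {(x, y): v for x, y, z, v in cur.execute(query)}

# Interior output (1,1) sums the full 3x3 neighborhood of values.
print(out[(1, 1)])  # -> 45.0
```

Each output position aggregates one group, which is exactly why the convolution can be treated with database-style optimizations such as IVM.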
Key IVM insight: when only a small patch of the input changes (due to the occluding patch), most of each layer's output stays the same; only a small changed region differs. Layer 1 … Layer N: the changed region grows slowly from layer to layer. (*Only a cross section is shown; the changed region spans the depth dimension.)
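The growth of the changed region can be sketched with simple interval arithmetic (assuming stride-1, zero-padded "same" layers; the function name is illustrative, not from the talk):

```python
def propagate_changed_region(lo, hi, filter_widths):
    """Track the 1-D extent [lo, hi] (inclusive) of the changed region as it
    passes through a stack of stride-1, zero-padded ('same') conv layers.
    A layer with odd filter width f dilates the region by f // 2 per side,
    because every output within that radius reads at least one changed input."""
    for f in filter_widths:
        r = f // 2
        lo, hi = lo - r, hi + r  # clip to [0, width - 1] in a real system
    return lo, hi

# A 3-column change grows to 9 columns after three 3x3 layers:
print(propagate_changed_region(10, 12, [3, 3, 3]))  # -> (7, 15)
```

This is why the incremental savings shrink with network depth: deeper layers see a larger changed region, which in turn motivates the approximate inference step.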
IVM: materialize the layer outputs of the original (unoccluded) image as views; an algebraic framework for incremental propagation then updates only the changed regions, with no redundant computations.
[Figure: bar charts of OBE runtime, one on a seconds scale (3 to 12 s) and one on a minutes scale (4.5 to 18 min), for VGG16, ResNet18, and Inception3 under Naive, Incremental (Exact), and Incremental + Approximate inference. Annotated speedups over Naive: 5.4x, 2.1x, 1.5x, 13.8x, 4.9x, 3.7x (first chart); 3.9x, 1.6x, 0.7x, 8.6x, 3.1x, 2.3x (second chart).]
Project Web Page: https://adalabucsd.github.io/krypton.html
Contact: snakanda@eng.ucsd.edu
Video: https://tinyurl.com/y2oy9hqq
[Architecture diagram (Krypton): user code in Python calls PyTorch; through Python's foreign function interface (FFI), calls cross into a Custom Kernel Interface written in C, which forwards to a Custom Kernel Implementation written in CUDA; the CUDA code uses the cuDNN library and GPU memory. Arrows mark invocations and the flow of data, numbered to match the steps below.]
0: Invoke incremental inference.
1: Initialize the input tensors, kernel weights, and output buffer in GPU memory.
2: Invoke the Custom Kernel Interface (written in C) through Python's foreign function interface (FFI), passing memory references to the input tensors, kernel weights, and output buffer.
3: Forward the call to the Custom Kernel Implementation (written in CUDA).
4: In parallel, copy the relevant memory regions from the input tensor into an intermediate memory buffer.
5: Invoke the CNN transformation using cuDNN.
6: cuDNN reads the input from the intermediate buffer and writes the transformed output to the output buffer.
7: Read the output back to main memory, or pass its reference as the input to the next transformation.
[Figure legend: Input; Output; updated patch in the input; updated patch in the output; the input patch that needs to be read into the transformation operator; padding; filter kernel.]
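The patch geometry in the figure follows from standard convolution index arithmetic. A 1-D sketch (apply the same formulas per dimension; function names are illustrative, not from the talk):

```python
import math

def affected_output_range(x0, x1, fw, stride=1, pad=0):
    """Output positions whose receptive window [o*stride - pad,
    o*stride - pad + fw - 1] overlaps the updated input range [x0, x1]."""
    o_lo = math.ceil((x0 - fw + 1 + pad) / stride)
    o_hi = (x1 + pad) // stride
    return max(o_lo, 0), o_hi  # also clip o_hi to the output width in practice

def input_read_range(o_lo, o_hi, fw, stride=1, pad=0):
    """Input range (in unpadded coordinates; negatives fall in the padding)
    that must be read to recompute outputs [o_lo, o_hi]."""
    return o_lo * stride - pad, o_hi * stride - pad + fw - 1

# Updating input columns [3, 4] under a 3-wide, stride-1, pad-1 filter:
o = affected_output_range(3, 4, fw=3, stride=1, pad=1)
print(o)                                             # -> (2, 5)
print(input_read_range(*o, fw=3, stride=1, pad=1))   # -> (1, 6)
```

Note that the input region to read is wider than the updated patch itself: the extra halo is exactly what step 4 of the execution flow copies into the intermediate buffer.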
Observation: Explainability of CNN predictions is increasingly important, and occlusion-based explainability (OBE) is widely used.
Problem: OBE is highly compute-intensive.
This Work: Cast OBE as an instance of the view-materialization problem and perform incremental and approximate inference, yielding ~5x and ~35x speedups for exact and approximate heatmaps, respectively.
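For reference, the naive OBE baseline can be sketched as follows (pure Python with a stand-in predict function; Krypton's contribution is avoiding the full re-inference inside this loop):

```python
def occlusion_heatmap(image, predict, patch=2, fill=0.0):
    """Naive occlusion-based explanation: place an occluding patch at every
    position, re-run inference on the occluded image, and record the class
    probability. Positions where the probability drops the most mark the
    regions the CNN relied on for its prediction."""
    H, W = len(image), len(image[0])
    heat = [[0.0] * (W - patch + 1) for _ in range(H - patch + 1)]
    for y in range(H - patch + 1):
        for x in range(W - patch + 1):
            occluded = [row[:] for row in image]     # copy the image
            for dy in range(patch):
                for dx in range(patch):
                    occluded[y + dy][x + dx] = fill  # occlude the patch
            heat[y][x] = predict(occluded)           # full re-inference
    return heat

# Toy stand-in "model": mean pixel value as the class probability.
img = [[1.0] * 4 for _ in range(4)]
heat = occlusion_heatmap(img, lambda im: sum(map(sum, im)) / 16.0)
print(heat[0][0])  # -> 0.75 (occluding 4 of the 16 unit pixels)
```

The cost is one full CNN inference per patch position, which is what makes OBE compute-intensive and what incremental and approximate inference attack.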
Supun Nakandala, Arun Kumar, and Yannis Papakonstantinou (University of California, San Diego)